* Re: Compiler error messages
From: Nick Roberts @ 1998-01-23 0:00 UTC

-- 
Nick Roberts
Croydon, UK

Proprietor, ThoughtWing Software; Independent Software Development Consultant
* Nick.Roberts@dial.pipex.com * Voicemail & Fax +44 181-405 1124 *
*** Always game for a verbal joust (usually as the turkey) ***

Robert A Duff <robertduff@world.std.com> wrote in article
<En96AJ.JxL@world.std.com>...
> >...I always prefer compilers which simply stop at the first
> >error.
>
> If the compiler is fast enough, this is acceptable.  But even on today's
> fast machines, I think it's worthwhile to try to do better.  It's pretty
> annoying to run a 15-minute build, go off and eat lunch, and come back
> to find that the compiler (or make facility) stopped on some silly error
> in the first file.

On what machine does an Ada 'rebuild' take 15 minutes? Ouch! Even a huge
project shouldn't (generally) take that long to do a minimum rebuild. This
is an issue which goes far beyond error messages, to programmer
productivity in general. One of the things that slows down compilers is
trying to be ultra-clever about errors. Which would you prefer: compiler
(a), which takes 10 minutes to compile but stops at the first error (and
costs less, and is more reliable); or compiler (b), which takes 15 minutes
and has recovery?

> >My advice to compiler writers would be: make SURE that the compiler reports
> >any error 100% accurately. That means making NO assumptions about what
> >caused the error ("oh, it was _probably_ because the user forgot to type a
> >semicolon", etc...).
>
> What do you mean by this?  It seems to me that the only way to avoid
> making such assumptions would be for the compiler to have a single error
> message, "This is illegal."  Any "better" error message is necessarily
> making some assumption about what the programmer must have meant to
> write.

See my other post for the answer to this one.

> Here's an example where a compiler really ought to try to be "clever":
> In Ada, if two packages have a type called Something_Or_Other, and you
> have a use_clause on both packages, then the two Something_Or_Other's
> cancel each other out -- neither one is directly visible.  If I refer to
> Something_Or_Other, I don't expect the compiler to correctly guess which
> one I meant, but I definitely want it to tell me that this
> use_clause-cancellation is happening, and where the two
> Something_Or_Other's are.  I don't want just "No directly visible
> declaration of Something_Or_Other."

I would be really surprised if a compiler giving a "name not visible" error
didn't list all the candidates (the entities in the dictionary whose names
could include the offending name). You mean to say some compilers don't do
this? Now busy testing...

* Re: Compiler error messages
From: Nick Roberts @ 1998-01-24 0:00 UTC

Robert A Duff <robertduff@world.std.com> wrote in article
<EnAqpo.2oJ@world.std.com>...
> I'm currently using a 500 MHz Alpha workstation with 128 MB of RAM.  I
> didn't say *minimum* rebuild, by which I assume you mean I've changed
> one file that nothing else depends on.  I don't see how you can be
> surprised at 15 minute builds when I didn't tell you how much source
> code I have, nor how much of it was changed.  Surely you will agree that
> *some* rebuilds on *some* projects take 15 minutes or more even on a
> modern fast workstation?
>
> (By the way, my case is somewhat special: I work on a compiler written
> in its own language, and it's important from time to time to bootstrap
> the compiler -- compile it with itself, then use the resulting
> executable to compile it again, and then check that the executables are
> identical.  Sometimes three recompiles are necessary.  So this means
> that I'm doing a fair number of recompiles from scratch, as opposed to
> making a single change and recompiling just one file.)

How much source code do you have? On a machine like that -- and I'm going
to regret saying this, I just know it -- you must have a godawful compiler
to take 15 minutes, even if it is doing a full rebuild three times. Does
yours do ultra-heavy optimization? Phew!

> >is an issue which goes far beyond error messages, to programmer
> >productivity in general. One of the things that slows down compilers is
> >trying to be ultra-clever about errors. Which would you prefer: compiler
> >(a) which takes 10 minutes to compile but stops at the first error (and
> >costs less, and is more reliable); or compiler (b) which takes 15 minutes
> >and has recovery?
>
> I prefer (a).  Of course, the real bottleneck is often the linker, not
> the compiler.  Either way, I won't be satisfied until I can make various
> changes to a large body of code, and rebuild it in less than about 0.2
> second.

I agree about the linker. Happiness is a linker hand-coded in assembly.

> In any case, I don't think error recovery is a substantial portion of
> compile time.  In many cases it's possible to design the compiler so
> that most of the error recovery work happens only when there actually is
> an error, in which case it doesn't matter so much.

This kind of design is even more complicated. Trust your uncle Nicky, he's
been there, seen the film, read the book, bought the customised coffee mug,
_and_ the special edition set of figurines. True, it's a neat trick if you
can do it!

> >I would be really surprised if a compiler giving a "name not visible"
> >didn't list all the candidates (the entities in the dictionary whose names
> >could include the offending name). You mean to say some compilers don't do
> >this? Now busy testing...
>
> Sure, compilers do something along those lines, but there are various
> choices of what to print out, and not all compilers do an equally good
> job.  You say "the dictionary", as if it were one monolithic thing.  In
> Ada, there are all kinds of complicated scope rules.  If I declare X in
> a private part, and refer to X from outside, should the compiler point
> out that perhaps I meant to declare it in the visible part?  What if X
> is in the body (in which case the compiler won't normally be looking at
> that file at all, during this particular run)?

I've always thought that all compilers (written in conventional languages)
have only one dictionary, with flags and things to distinguish the nitty
gritty. Am I wrong?

> Here's an interesting example: Suppose I have a generic package
> List_Generic, which has a Push procedure in it.  And I instantiate that
> generic 17 times, for 17 different element types.  (The reason I'm
> talking about generics is that they cause heavy overloading.)  And I try
> to call Push on a list-of-integers, but I forgot to have a use_clause
> for the list-of-integers package.  An Ada compiler might try to point
> out missing use_clauses, but in this case, I believe most will not be
> smart enough to tell me which package -- it's obvious which package I
> meant, by looking at the types of the arguments to Push, but that's not
> easy to teach the compiler, especially in the presence of overloaded
> arguments.  Instead, I suspect most compilers will give me a list of 17
> Pushes I might have meant, or else will just point at the generic itself
> and let me track down the instances myself.  I believe I even remember a
> compiler that would print 17 references to the generic itself.

I was thinking along the lines of listing all the entities in the
dictionary of whose full name the offending name could be a part. So, in
the above example, you would get all 17 instantiations of the Push
procedure, and you may well get other Pushes too. With each full name would
go an indication of the kind of entity, e.g.

   Name "Push" not visible, candidates are as follows:
   - MyCom.ATCS.Airobject_Count_Stacking.Push  procedure (in Airobject_Count)
   - MyCom.ATCS.Tracker_Count_Stacking.Push    procedure (in Tracker_Count)
   - MyCom.ATCS.Acquiror_Count_Stacking.Push   procedure (in Acquiror_Count)
   <and so on>
   - MyCom.ATCS.Front_Panel.Buttons.Push       type tagged record

This would have the obvious problem of verbosity. But what, really, is the
sensible alternative? Printing nothing at all? Surely not. Selecting
entities (e.g. only procedures, in this case)? Dangerous, surely (the
selection could be wrong). I think everything has to be listed! I have to
say, I also think that in most cases, in practice, there wouldn't be so
many candidates.
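
The candidate listing proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "dictionary" is modelled as a flat list of fully qualified names, a candidate is taken to be any entity whose dotted full name ends with the offending name, and the names `Entity`, `candidates` and `not_visible_message` (as well as the `Pushed` entry) are hypothetical rather than taken from any real compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    full_name: str  # dotted expanded name, e.g. "Pkg.Sub.Push"
    kind: str       # free-text description of the kind of entity


def candidates(dictionary, name):
    """Return every entity whose full name has `name` as a dotted suffix."""
    wanted = name.lower().split(".")  # Ada identifiers are case-insensitive
    hits = []
    for entity in dictionary:
        parts = entity.full_name.lower().split(".")
        if parts[-len(wanted):] == wanted:
            hits.append(entity)
    return hits


def not_visible_message(dictionary, name):
    """Format a 'not visible' message that lists all candidates, unfiltered."""
    lines = [f'Name "{name}" not visible, candidates are as follows:']
    for entity in candidates(dictionary, name):
        lines.append(f"- {entity.full_name}  {entity.kind}")
    return "\n".join(lines)


# Hypothetical dictionary contents, reusing names from the message above.
DICTIONARY = [
    Entity("MyCom.ATCS.Airobject_Count_Stacking.Push", "procedure (in Airobject_Count)"),
    Entity("MyCom.ATCS.Tracker_Count_Stacking.Push", "procedure (in Tracker_Count)"),
    Entity("MyCom.ATCS.Front_Panel.Buttons.Push", "type tagged record"),
    Entity("MyCom.ATCS.Front_Panel.Buttons.Pushed", "function"),  # must not match
]

print(not_visible_message(DICTIONARY, "Push"))
```

Note that the sketch deliberately does not filter by kind, so the tagged record type is listed next to the procedures; that is the "list everything" position argued for above, with verbosity as its price.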

Nick Roberts
Croydon, UK

Proprietor, ThoughtWing Software; Independent Software Development Consultant
* Nick.Roberts@dial.pipex.com * Voicemail & Fax +44 181-405 1124 *
*** Always game for a verbal joust (usually as the turkey) ***

* Re: Compiler error messages
From: Robert Dewar @ 1998-01-23 0:00 UTC

Some other points that Nick makes

<<I do think there are a lot of things that compilers can do to help users.
Always making reference to the appropriate section(s) of the manual, for
example (something that precious few compilers actually do -- why???).>>

Let's assume "the manual" means the Ada RM in this case. Indeed many Ada
compilers do make RM references. We very deliberately decided in GNAT not
to, except in a few unusual cases. Our reasoning is that for expert users,
this is unlikely to be necessary, since you know the language and you know
what is right and wrong. For naive users, going and reading the RM tends to
add confusion on top of confusion.

Yes, we know all the arguments on both sides of this issue. No need to
rehash them. If you want, go to DejaNews, there have been long threads on
this issue before. Many people agree with us, some do not.

Many people comment that they like the error messages in GNAT (it was
certainly, for example, one of the reasons that the Air Force Academy chose
GNAT over other competing compilers for teaching Ada). We think this is at
least partly a reflection of the fact that we are forced to try to come up
with a clear error message without relying on the RM reference for an
explanation (one that is often inaccessible to beginners).

As I say, the interesting thing here is not general discussions but
particular examples. Interestingly, when we challenge people who think that
RM references are a good thing to come up with specific examples where an
RM reference would help, we have got virtually no input (that does not
surprise us!)

<<Other ideas are: it is occasionally helpful for the compiler to report
how long it took to process each file, and usually this is very easy to
do>>

Does not seem very useful to me, though it would be easy to program. You
can find out how long you spend in each phase of gcc, using the standard
gcc option. (Read the gcc manual, it has lots of useful stuff!)

<<Something that most compiler writers could provide which would be
extremely useful to their users -- [extremely odd ethnic reference removed]
-- is to provide a section in the manual which discusses each error (or
those where it would be useful) in some detail.>>

Actually we don't think a section of the manual as such is the right
solution here. We have a design for this, it is a program called GNOME
(Gnat On-line error Message Explanation). The idea is to bonk on an error
message from your editor or IDE or whatever, and you get a menu pointing to
a full explanation of the error, cross-linked via hypertext to the RM,
Rationale, and whatever other useful reference materials are around.

Isn't that a nice idea? Unfortunately all we have in place so far is the
great name, and since error messages are pretty good in GNAT, it is not
something that is at the top of the priority list.

Our users don't complain about error messages. In fact we would like them
to complain more, or at least send in constructive suggestions. "I know
this is wrong, but it seems the error message could have been more helpful"
are useful reports for us.

Robert Dewar
Ada Core Technologies

* Re: Compiler error messages
From: Robert Dewar @ 1998-01-23 0:00 UTC

Nick Roberts said

<<My advice to compiler writers would be: make SURE that the compiler
reports any error 100% accurately. That means making NO assumptions about
what caused the error ("oh, it was _probably_ because the user forgot to
type a semicolon", etc...). It means reporting everything that could
possibly have caused the error (directly!), even if this means a humungous
error message. It means producing a technically precise message, even if
you feel some users would prefer something more 'down to earth' (because
'down to earth' invariably means inaccurate/incomplete/vague/wrong).
>>

The trouble is that this reasonable prescription is meaningless. A program
is either right or wrong from a formal point of view. Especially when it
comes to syntax errors, the only possible syntax error that the above
principle could permit is

   "The above program does not meet the syntax in the Ada RM"

without any indication of where or what is wrong. To give *any* more
detailed indication of what is wrong requires that you make assumptions of
the kind that you say you don't like.

I don't know how much you know about compiler techniques, but a compiler
never really knows anything about what is wrong in the absence of
assumptions of some kind. The question always boils down to how to make
these assumptions.

It is of course huge and unuseful hyperbole to say that compilers that
attempt to give a clear message "invariably [result in]
inaccurate/incomplete/vague/wrong [messages]".

Your mention of a "technically precise" message is not thought through
carefully. It makes me think that you are a user and not a builder of
compilers, since if you built them, you would be more aware of this obvious
point.

For example, in the discussion at hand

   a := b & + c;

all the following messages are technically precise in the only possible
sense that this can be meaningful

   Missing operand between & and +
   + c must be parenthesized
   Redundant + ignored

These are relatively reasonable, the following are just as precise from a
formal point of view

   identifier You_Did_Not_Want_This_Here missing between & and +
   above statement should have been "accept abc"
   & + replaced by minus operator

etc.

The only reason these "technically correct" messages are "wrong" is because
they are making less likely assumptions than the first set.

Let's take an example where GNAT does a lot of work in trying to come up
with a correct message (try this on various Ada compilers).

Write a big package body that looks like

   package body XYZ is
      procedure A;
      procedure B;
      procedure C;
      ...
      procedure Z;

      procedure A is ...
      procedure B is ...
      ...
      procedure Z is ...
   end;

That's fine, now change the semicolon after the procedure spec for M to an
is:

      procedure L;
      procedure M is
      procedure N;

That's an *easy* cut and paste error. GNAT will tell you that the is should
be a semicolon. This is obvious to a human, but not at all obvious to a
compiler. Why not? Well, the text from procedure M is, up to and including
the final end statement, is a valid procedure body.

Oops, slight mistake: for this to be 100% true, finish the package with a
null package body:

   begin
      null;
   end;

The favorite Ada compiler that I used for years before GNAT simply said
"unexpected end of file", pointing to the end of the program for this. Easy
to see why: it scanned out what it thought was the body of M successfully,
and then planned on resuming the scan of the package body and was surprised
to find an end of file. This was a truly horrid error.
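
One conceivable clue for catching this slip is layout. The Python sketch below flags a subprogram line that ends in `is` when the very next line is another subprogram declaration at the same indentation, on the assumption that declarations nested inside a real body would be indented more deeply. It is a hypothetical heuristic written for illustration, not a description of how GNAT actually detects the error, and the name `suspicious_is` is made up.

```python
import re

# A subprogram line whose last token is the keyword `is`.
SPEC_WITH_IS = re.compile(r"^(\s*)(procedure|function)\b.*\bis\s*$", re.IGNORECASE)
# Any line that starts a subprogram declaration or body.
SUBPROGRAM = re.compile(r"^(\s*)(procedure|function)\b", re.IGNORECASE)


def suspicious_is(lines):
    """Return 1-based line numbers where a trailing `is` looks like a slip for `;`."""
    suspects = []
    for index in range(len(lines) - 1):
        here = SPEC_WITH_IS.match(lines[index])
        following = SUBPROGRAM.match(lines[index + 1])
        # Same indentation on both lines suggests two sibling specs,
        # not a body followed by its nested declarations.
        if here and following and len(here.group(1)) == len(following.group(1)):
            suspects.append(index + 1)
    return suspects


SOURCE = [
    "package body XYZ is",
    "   procedure L;",
    "   procedure M is",   # the slip: should read `procedure M;`
    "   procedure N;",
    "   procedure A is",   # a genuine body: next line is indented further
    "      X : Integer;",
    "   begin",
    "      null;",
    "   end A;",
    "end XYZ;",
]

print(suspicious_is(SOURCE))
```

Like any layout-based guess, this depends on the programmer indenting consistently, which is exactly the kind of assumption the message above argues a compiler cannot avoid making.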

After a while you got to know it meant that somewhere you had is in place
of semicolon, and sometimes I would have to do edits in a binary search to
find the bad one.

Note that the GNAT message and the other compiler's message are both
technically valid error messages, but one is MUCH more helpful than the
other.

My experience in error messages is that it is not something that can be
addressed by simplistic principles of the type Nick is reaching for. On the
contrary, getting to the point of generating useful error messages is
extremely difficult. Most people are pleasantly surprised at how well GNAT
does in pinning down messages (one of the students in my compiler class
last semester, where everyone was using Ada, sent some email asking how
GNAT manages to give such accurate error messages).

Now when that student was asking that question, what did he mean by
accurate? Technically accurate? Not at all. He meant messages that
corresponded to the error he had made. Now only the programmer knows the
true fix for an error message. An informative error message means guessing
correctly at something that is close enough to this "real" reason to click.

This is difficult. A huge amount of effort in the GNAT sources goes into
this.

Let's take another example. Suppose during parsing you encounter a junk end
line, i.e. one that is not what is expected. There are three possibilities

   1. It is a piece of junk that should be ignored

   2. It is a corruption of the currently expected end line, and should be
      accepted as such

   3. There is a missing end line, and this one belongs to an outer scope

It is absolutely crucial to make the "right" decision here, since an error
will cause chaos in cascaded messages. Of course you can't always make the
right decision, but you can try. GNAT uses all sorts of heuristics. It pays
close attention to any tokens used, to help match up end lines, and it even
looks at the indentation for a clue as to what was meant.
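
The three-way decision above can be sketched as follows in Python, assuming the parser keeps a stack of open scopes together with their names and starting columns. This is not GNAT's algorithm; the `Scope` class, the `classify_end` function, the result labels and the order in which the clues are tried are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scope:
    name: str    # name of the open construct, e.g. a procedure name
    column: int  # column at which the construct started


def classify_end(stack: List[Scope], end_name: str, end_column: int) -> str:
    """Classify a named `end` line against the open scopes (innermost last)."""
    if not stack:
        return "junk"
    inner, outers = stack[-1], stack[:-1]
    name = end_name.lower()  # Ada identifiers are case-insensitive

    if name == inner.name.lower():
        return "matches"          # the expected end line, nothing to recover
    # Clue 1: the name on the end line belongs to an enclosing scope.
    if any(name == outer.name.lower() for outer in outers):
        return "missing-end"      # possibility 3
    # Clue 2: wrong name, but lined up with the innermost construct.
    if end_column == inner.column:
        return "corrupted"        # possibility 2: accept as the expected end
    # Clue 3: unknown name, but lined up with an enclosing construct.
    if any(end_column == outer.column for outer in outers):
        return "missing-end"      # possibility 3
    return "junk"                 # possibility 1: ignore the line


# Hypothetical state: inside procedure M (column 3) within package XYZ (column 0).
open_scopes = [Scope("XYZ", 0), Scope("M", 3)]
for end_name, column in [("M", 3), ("XYZ", 0), ("N", 3), ("Q", 0), ("Q", 7)]:
    print(end_name, column, classify_end(open_scopes, end_name, column))
```

The name clue is tried before the indentation clue here because a matching name seems the stronger evidence; a real parser would presumably weigh many more signals before committing to one recovery.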

If you are interested in pursuing this, have a look at unit par-endh.adb in
the GNAT sources.

Of course GNAT does not do a perfect job in generating error messages. This
is not possible, in the sense that it is not a well defined task. But it
does pretty well, and we work on improving it all the time.

It is much more instructive to look at specific examples than to speak in
generalities here. I certainly agree with Nick that many compilers have
appallingly bad error message generation. In particular, I have never seen
a C compiler that I thought was even vaguely acceptable in this regard.

Ada compilers have generally been better, partly because Gerry Fisher's
interest in error detection meant that the original Ada Ed was pretty good,
and as a result the ACVC tests came to expect pretty decent error recovery.
Many of the Ada 83 compilers actually directly borrowed some of the NYU
work here.

We think GNAT takes the generation of good error messages to a stage that
is a definite notch better than what has been there previously, but there
is lots of room for improvement. We are always happy to get error message
suggestions, and examples where things did not work well. Sometimes the
answer is "sorry, we can't be this telepathic", other times the answer is
"this may surprise you, but actually this case is easy to fix!"

Robert Dewar
Ada Core Technologies

* Re: Compiler error messages
From: Nick Roberts @ 1998-01-23 0:00 UTC

Robert Dewar gives a long reply to my post about compiler error messages,
for which many thanks earnestly.

His reply is specific to GNAT (with which I am not, in fact, very
familiar). I have no doubt that GNAT is very clever at producing error
messages, and I am not trying to criticise either GNAT or any other
specific compiler.

Robert introduces an example of a highly displaced error locus, very much a
classic case. I will never forget the day I got an error at the end of a
multi-thousand line Pascal program -- a highly convoluted one at that --
corresponding to an actual error buried deep in the middle. It took me days
to find it. It still hurts just thinking about that one.

However, I can assuredly remember many times when various 'smart' compilers
have 'guessed' an error locus, always wrongly. They never actually help to
find the error. In fact, good language design -- such as that of Ada -- is
the only thing which does help.

I am, in fact, a compiler writer, not a user; a fact which, of course, puts
me at a disadvantage when it comes to error messages: messages which mean a
lot to me may well be gibberish for a user.

Referring to the example Robert gives,

   a := b & + c;

the technically correct error would be "term expected" at the plus sign.
This is precisely what the Ada RM specifies at this point in the syntax.
Contrary to what Robert suggests, there is no need for any guesswork or
assumptions. Of course, this error message might not actually be very
helpful to the user, and I've no doubt that other examples would show up
this idea much more poignantly.

The essence of the problem, to me, is how to be helpful to the user in a
solid way, rather than just making a few blind stabs in the dark as to what
went wrong. Whilst I've no doubt that certain compilers may be very clever
at making these guesses, a guess is a guess. I'm wanting a bit more science
to it!

Once again, contributions from any and all would be most welcome!

-- 
Nick Roberts
Croydon, UK

Proprietor, ThoughtWing Software; Independent Software Development Consultant
* Nick.Roberts@dial.pipex.com * Voicemail & Fax +44 181-405 1124 *
*** Always game for a verbal joust (usually as the turkey) ***

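
To make the "term expected" idea concrete, here is a minimal Python sketch of a checker that reports only what the grammar required at the point of failure, with no guess about the intended fix. It assumes a toy grammar loosely modelled on the shape of Ada's simple_expression rule (an optional sign, then terms separated by +, - or &), with bare identifiers standing in for terms; the names `SyntaxProblem`, `tokenize` and `first_error` are hypothetical.

```python
import re

# One token per match: ":=", an identifier, or a single operator/semicolon.
TOKEN = re.compile(r"\s*(:=|[A-Za-z_][A-Za-z0-9_]*|[-+&;])")


class SyntaxProblem(Exception):
    def __init__(self, expected, column):
        super().__init__(f"{expected} expected at column {column}")


def tokenize(text):
    """Return a list of (token, 1-based column) pairs, ending with <eof>."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxProblem("token", pos + 1)
        tokens.append((match.group(1), match.start(1) + 1))
        pos = match.end()
    tokens.append(("<eof>", len(text) + 1))
    return tokens


def is_identifier(token):
    return token[0].isalpha() or token[0] == "_"


def first_error(text):
    """Check `name := [sign] term {op term} ;` and return the first error, or None."""
    try:
        tokens = tokenize(text)
        index = 0

        def expect(test, what):
            nonlocal index
            token, column = tokens[index]
            if not test(token):
                raise SyntaxProblem(what, column)
            index += 1

        expect(is_identifier, "name")
        expect(lambda t: t == ":=", '":="')
        if tokens[index][0] in ("+", "-"):      # optional unary sign
            index += 1
        expect(is_identifier, "term")
        while tokens[index][0] in ("+", "-", "&"):
            index += 1                          # binary operator
            expect(is_identifier, "term")       # must be followed by a term
        expect(lambda t: t == ";", '";"')
        return None
    except SyntaxProblem as problem:
        return str(problem)


print(first_error("a := b & + c;"))
```

The message names the expected syntactic category and the position, which is as far as a parser can go without assumptions; whether the programmer meant to drop the `+`, add an operand, or parenthesize `+ c` is exactly the part it leaves unsaid.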
* Re: Compiler error messages
From: Larry Kilgallen @ 1998-01-23 0:00 UTC

In article <01bd278c$bea48680$9dfc82c1@xhv46.dial.pipex.com>, "Nick Roberts"
<Nick.Roberts@dial.pipex.com> writes:
> I've been most interested in the thread about compiler error messages.
>
> Having used many many compilers (BASIC, PASCAL, C, Ada, and all sorts of
> others) for many many years, I've come to the conclusion that, almost
> always, the cleverer the compiler tries to be about error messages, the
> less helpful it ends up being, in reality.
>
> Many is the time when a compiler has reported an error to me, most
> elaborately and cleverly, and been completely and 100% wrong about the true
> nature/source of the error. And boy does it make me spit. Hands up who
> hasn't been infuriated by a 'smart' compiler producing reams of completely
> spurious errors (after one legitimate one), presumably because the compiler
> writer thought it would be really clever for the compiler to 'ignore' the
> first error. I always prefer compilers which simply stop at the first
> error. What a sad waste of effort.

I have seen some compilers which do a horrid job (DEC Scan and Bliss-32)
and some which do a wonderful job (DEC Ada). I suppose this depends on what
sort of errors one is making, but if I knew enough to categorize my errors,
I wouldn't make them!

Larry Kilgallen

* Re: Compiler error messages
From: Robert Dewar @ 1998-01-23 0:00 UTC

Larry says

<<> Many is the time when a compiler has reported an error to me, most
> elaborately and cleverly, and been completely and 100% wrong about the true
> nature/source of the error. And boy does it make me spit. Hands up who
> hasn't been infuriated by a 'smart' compiler producing reams of completely
> spurious errors (after one legitimate one), presumably because the compiler
> writer thought it would be really clever for the compiler to 'ignore' the
> first error. I always prefer compilers which simply stop at the first
> error. What a sad waste of effort.
>>

The idea of stopping on the first error is not an unreasonable one, but in
practice most people prefer to find more than one error on each compilation
if possible. If you really like to get only the first error, then use the
GNAT switch -gnatm1, that is what it is there for. But doing good error
recovery is definitely not wasted effort, for two reasons:

   1. Figuring out what really went wrong and recovering from it are
      closely related tasks.

   2. As I say above, many, I would guess most, people prefer multiple
      error messages.

By the way, in the Ada case, it is of course impractical to validate a
compiler that does not have pretty good error recovery!