From: dewar@merv.cs.nyu.edu (Robert Dewar)
Subject: Re: Compiler error messages
Date: 1998/01/23
References: <01bd278c$bea48680$9dfc82c1@xhv46.dial.pipex.com>
Organization: New York University
Newsgroups: comp.lang.ada

Nick Roberts said <>

The trouble is that this reasonable prescription is meaningless. A program is either right or wrong from a formal point of view. Especially when it comes to syntax errors, the only possible syntax error message that the above principle could permit is "The above program does not meet the syntax in the Ada RM" without any indication of where or what is wrong. To give *any* more detailed indication of what is wrong requires that you make assumptions of the kind that you say you don't like.

I don't know how much you know about compiler techniques, but a compiler never really knows anything about what is wrong in the absence of assumptions of some kind. The question always boils down to how to make these assumptions.

It is of course huge and unuseful hyperbole to say that compilers that attempt to give a clear message "invariably [result in] inaccurate/incomplete/vague/wrong [messages]".

Your mention of "technically precise" messages is not thought through carefully. It makes me think that you are a user and not a builder of compilers, since if you built them, you would be more aware of this obvious point.

For example, in the discussion at hand

   a := b & + c;

all the following messages are technically precise in the only possible sense that this can be meaningful

   Missing operand between & and +
   + c must be parenthesized
   Redundant + ignored

These are relatively reasonable; the following are just as precise from a formal point of view

   identifier You_Did_Not_Want_This_Here missing between & and +
   above statement should have been "accept abc"
   & + replaced by minus operator

etc. The only reason these "technically correct" messages are "wrong" is that they make less likely assumptions than the first set.

Let's take an example where GNAT does a lot of work in trying to come up with a correct message (try this on various Ada compilers). Write a big package body that looks like

   package body XYZ is
      procedure A;
      procedure B;
      procedure C;
      ...
      procedure Z;

      procedure A is ...
      procedure B is ...
      ...
      procedure Z is ...
   end;

That's fine. Now change the semicolon after the procedure spec for M to an is:

      procedure L;
      procedure M is
      procedure N;

That's an *easy* cut and paste error. GNAT will tell you that the is should be a semicolon. This is obvious to a human, but not at all obvious to a compiler. Why not? Well, the text from "procedure M is" up to and including the final end is a valid procedure body. OOOPS, slight mistake: for this to be 100% true, add a null statement part to the end of the package body:

   begin
      null;
   end;

The favorite Ada compiler that I used for years before GNAT simply said "unexpected end of file", pointing to the end of the program for this. Easy to see why: it scanned out what it thought was the body of M successfully, then planned on resuming the scan of the package body and was surprised to find an end of file. This was a truly horrid error. After a while you got to know it meant that somewhere you had is in place of semicolon, and sometimes I would have to do edits in a binary search to find the bad one.
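
To make the "is in place of semicolon" case concrete, here is a minimal sketch of one assumption a compiler could make. It is an illustration only and is not a description of what GNAT actually does: the function name suspicious_is_lines and the layout rule it applies (a subprogram header ending in "is" that is immediately followed by another subprogram declaration at the same indentation is probably a typo) are hypothetical, and the sketch is written in Python rather than Ada for brevity.

```python
import re

# Hypothetical heuristic, not GNAT's algorithm: a header such as
# "procedure M is" followed by another subprogram declaration at the
# SAME indentation is more likely a mistyped ";" than a real body.
HEADER = re.compile(r"^(\s*)(procedure|function)\b.*\bis\s*$", re.IGNORECASE)
DECL = re.compile(r"^(\s*)(procedure|function)\b", re.IGNORECASE)


def suspicious_is_lines(source):
    """Return 1-based line numbers where 'is' probably should be ';'."""
    lines = source.splitlines()
    hits = []
    for i, line in enumerate(lines):
        header = HEADER.match(line)
        if not header:
            continue
        # Look at the next non-blank line after the header.
        following = next((l for l in lines[i + 1:] if l.strip()), None)
        if following is None:
            continue
        decl = DECL.match(following)
        # A genuine nested declaration would normally be indented further.
        if decl and len(decl.group(1)) == len(header.group(1)):
            hits.append(i + 1)
    return hits


SAMPLE = """\
package body XYZ is
   procedure L;
   procedure M is
   procedure N;

   procedure L is
   begin
      null;
   end L;
end XYZ;
"""

if __name__ == "__main__":
    for number in suspicious_is_lines(SAMPLE):
        print(f"line {number}: 'is' should probably be ';'")
```

On SAMPLE this flags line 3, the "procedure M is" line, and leaves the genuine body of L alone. The guess is only as good as the assumption behind it: a program with unusual indentation would defeat it, which is exactly why such messages depend on choosing likely assumptions rather than on formal correctness.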

Note that the GNAT and other compiler errors are both technically valid error messages, but one is MUCH more helpful than the other.

My experience in error messages is that it is not something that can be addressed by simplistic principles of the type Nick is reaching for. On the contrary, getting to the point of generating useful error messages is extremely difficult. Most people are pleasantly surprised at how well GNAT does in pinning down messages (one of the students in my compiler class last semester, where everyone was using Ada, sent some email asking how GNAT manages to give such accurate error messages).

Now when that student was asking that question, what did he mean by accurate? Technically accurate? Not at all. He meant messages that corresponded to the error he had made. Now only the programmer knows the true fix for an error. An informative error message means guessing correctly at something that is close enough to this "real" reason to click. This is difficult. A huge amount of effort in the GNAT sources goes into this.

Let's take another example. Suppose during parsing you encounter a junk end line, i.e. one that is not what is expected. There are three possibilities:

  1. It is a piece of junk that should be ignored

  2. It is a corruption of the currently expected end line, and should be accepted as such

  3. There is a missing end line, and this one belongs to an outer scope

It is absolutely crucial to make the "right" decision here, since an error will cause chaos in cascaded messages. Of course you can't always make the right decision, but you can try. GNAT uses all sorts of heuristics. It pays close attention to any tokens used, to help match up end lines, and it even looks at the indentation for a clue as to what was meant. If you are interested in pursuing this, have a look at unit par-endh.adb in the GNAT sources.

Of course GNAT does not do a perfect job in generating error messages.
This is not possible, in the sense that it is not a well defined task. But it does pretty well, and we work on improving it all the time. It is much more instructive to look at specific examples than to speak in generalities here.

I certainly agree with Nick that many compilers have incredibly appallingly bad error message generation. In particular, I have never seen a C compiler that I thought was even vaguely acceptable in this regard. Ada compilers have generally been better, partly because Gerry Fisher's interest in error detection meant that the original Ada Ed was pretty good, and as a result the ACVC tests came to expect pretty decent error recovery. Many of the Ada 83 compilers actually directly borrowed some of the NYU work here.

We think GNAT takes the generation of good error messages to a stage that is a definite notch better than what has been there previously, but there is lots of room for improvement. We are always happy to get error message suggestions, and examples where things did not work well. Sometimes the answer is "sorry, we can't be this telepathic"; other times the answer is "this may surprise you, but actually this case is easy to fix!"

Robert Dewar
Ada Core Technologies