Re: Compiler error messages

comp.lang.ada

* Re: Compiler error messages
From: Nick Roberts @ 1998-01-23  0:00 UTC




-- 

Nick Roberts
Croydon, UK

Proprietor, ThoughtWing Software; Independent Software Development
Consultant
* Nick.Roberts@dial.pipex.com * Voicemail & Fax +44 181-405 1124 *
*** Always game for a verbal joust (usually as the turkey) ***


Robert A Duff <robertduff@world.std.com> wrote in article
<En96AJ.JxL@world.std.com>...
> >...I always prefer compilers which simply stop at the first
> >error.
> 
> If the compiler is fast enough, this is acceptable.  But even on today's
> fast machines, I think it's worthwhile to try to do better.  It's pretty
> annoying to run a 15-minute build, go off and eat lunch, and come back
> to find that the compiler (or make facility) stopped on some silly error
> in the first file.

On what machine does an Ada 'rebuild' take 15 minutes?  Ouch!  Even a huge
project shouldn't take that long to do a minimum rebuild (generally): this
is an issue which goes far beyond error messages, to programmer
productivity in general.  One of the things that slows down compilers is
trying to be ultra-clever about errors.  Which would you prefer: compiler
(a) which takes 10 minutes to compile but stops at the first error (and
costs less, and is more reliable); or compiler (b) which takes 15 minutes
and has recovery?

> >My advice to compiler writers would be: make SURE that the compiler reports
> >any error 100% accurately.  That means making NO assumptions about what
> >caused the error ("oh, it was _probably_ because the user forgot to type a
> >semicolon", etc...).
> 
> What do you mean by this?  It seems to me that the only way to avoid
> making such assumptions would be for the compiler to have a single error
> message, "This is illegal."  Any "better" error message is necessarily
> making some assumption about what the programmer must have meant to
> write.

See my other post for the answer to this one.

> Here's an example where a compiler really ought to try to be "clever":
> In Ada, if two packages have a type called Something_Or_Other, and you
> have a use_clause on both packages, then the two Something_Or_Other's
> cancel each other out -- neither one is directly visible.  If I refer to
> Something_Or_Other, I don't expect the compiler to correctly guess which
> one I meant, but I definitely want it to tell me that this
> use_clause-cancellation is happening, and where the two
> Something_Or_Other's are.  I don't want just "No directly visible
> declaration of Something_Or_Other."
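A minimal compilable sketch of the cancellation described above, assuming two hypothetical local packages Pkg_A and Pkg_B. A type declaration is not overloadable, so with both use clauses in force neither Something_Or_Other is directly visible and the commented-out declaration would be rejected; expanded names make the choice explicit.

```ada
with Ada.Text_IO;

procedure Use_Clash is

   --  Two hypothetical packages that both declare a type with this name.
   package Pkg_A is
      type Something_Or_Other is range 0 .. 10;
   end Pkg_A;

   package Pkg_B is
      type Something_Or_Other is (Red, Green);
   end Pkg_B;

   use Pkg_A, Pkg_B;

   --  Illegal if uncommented: type declarations are not overloadable, so
   --  the two potentially use-visible declarations cancel each other out
   --  and neither is directly visible (RM 8.4).
   --  X : Something_Or_Other;

   --  Legal: an expanded name says which declaration is meant.  The
   --  enumeration literal Green is overloadable, so it stays use-visible.
   A : constant Pkg_A.Something_Or_Other := 3;
   B : constant Pkg_B.Something_Or_Other := Green;

begin
   Ada.Text_IO.Put_Line (Pkg_A.Something_Or_Other'Image (A));
   Ada.Text_IO.Put_Line (Pkg_B.Something_Or_Other'Image (B));
end Use_Clash;
```

A helpful message for the commented-out line would name both declarations, which is the behaviour being asked for here.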

I would be really surprised if a compiler giving a "name not visible"
didn't list all the candidates (the entities in the dictionary whose names
could include the offending name).  You mean to say some compilers don't do
this?  Now busy testing...





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Compiler error messages
From: Robert Dewar @ 1998-01-23  0:00 UTC



Nick Roberts said

<<My advice to compiler writers would be: make SURE that the compiler reports
any error 100% accurately.  That means making NO assumptions about what
caused the error ("oh, it was _probably_ because the user forgot to type a
semicolon", etc...).  It means reporting everything that could possibly
have caused the error (directly!), even if this means a humungous error
message.  It means producing a technically precise message, even if you
feel some users would prefer something more 'down to earth' (because 'down
to earth' invariably means inaccurate/incomplete/vague/wrong).
>>

The trouble is that this reasonable prescription is meaningless.

A program is either right or wrong from a formal point of view. Especially
when it comes to syntax errors, the only syntax error message that the
above principle could permit is

"The above program does not meet the syntax in the Ada RM"

without any indication of where or what is wrong. To give *any* more 
detailed indication of what is wrong requires that you make assumptions
of the kind that you say you don't like.

I don't know how much you know about compiler techniques, but a compiler
never really knows anything about what is wrong in the absence of
assumptions of some kind.

The question always boils down to how to make these assumptions.
It is of course huge and unhelpful hyperbole to say that compilers
that attempt to give a clear message "invariably [result in]
inaccurate/incomplete/vague/wrong [messages]".

Your mention of a "technically precise" message is not thought through
carefully. It makes me think that you are a user and not a builder of
compilers, since if you built them, you would be more aware of this
obvious point.

For example, in the discussion at hand

  a := b & + c;

all the following messages are technically precise in the only possible
sense that this can be meaningful

  Missing operand between & and +
  + c must be parenthesized
  Redundant + ignored
  
These are relatively reasonable, the following are just as precise
from a formal point of view

  identifier You_Did_Not_Want_This_Here missing between & and +
  above statement should have been "accept abc"
  & + replaced by minus operator

etc. The only reason these "technically correct" messages are "wrong" is
because they are making less likely assumptions than the first set.

Let's take an example where GNAT does a lot of work in trying to come
up with a correct message (try this on various Ada compilers).

Write a big package body that looks like

    package body XYZ is
      procedure A;
      procedure B;
      procedure C;
      ...
      procedure Z;
      
      procedure A is ...
      procedure B is ...
      ...
      procedure Z is ...
    end;

that's fine, now change the semicolon after the procedure spec for M to
an is:

      procedure L;
      procedure M is
      procedure N;

that's an *easy* cut and paste error.

GNAT will tell you that the "is" should be a semicolon.

This is obvious to a human, but not at all obvious to a compiler.
Why not?

Well, the text from "procedure M is" up to and including the final end
statement is a valid procedure body.

OOOPS, slight mistake: for this to be 100% true, add a null statement
part to the package body, just before the final end:

    begin
       null;
    end;


The favorite Ada compiler that I used for years before GNAT simply said

"unexpected end of file" pointing to the end of the program for this.

It is easy to see why: it scanned out what it thought was the body of M
successfully, and then planned on resuming the scan of the package
body and was surprised to find an end of file.

This was a truly horrid error. After a while you got to know that it meant
that somewhere you had an "is" in place of a semicolon, and sometimes I
would have to do edits in a binary search to find the bad one.

Note that the GNAT message and the other compiler's message are both
technically valid error messages, but one is MUCH more helpful than the other.

My experience in error messages is that it is not something that can
be addressed by simplistic principles of the type Nick is reaching
for. On the contrary, getting to the point of generating useful
error messages is extremely difficult.

Most people are pleasantly surprised at how well GNAT does in pinning
down errors. (One of the students in my compiler class last semester,
where everyone was using Ada, sent some email asking how GNAT manages
to give such accurate error messages.)

Now when that student was asking that question, what did he mean by
accurate?

Technically accurate?

Not at all. He meant messages that corresponded to the error he had made.

Now, only the programmer knows the true fix for an error.

An informative error message means guessing correctly at something that
is close enough to this "real" reason to click.

This is difficult. A huge amount of effort in the GNAT sources goes into
this. Let's take another example.


Suppose during parsing you encounter a junk end line, i.e. one that is
not what is expected.

There are three possibilities

  1. It is a piece of junk that should be ignored

  2. It is a corruption of the currently expected end line, and should
	be accepted as such

  3. There is a missing end line, and this one belongs to an outer scope

It is absolutely crucial to make the "right" decision here, since a
wrong decision will cause chaos in the form of cascaded messages.

Of course you can't always make the right decision, but you can try.
GNAT uses all sorts of heuristics. It pays close attention to any
tokens used, to help match up end lines, and it even looks at the
indentation for a clue as to what was meant. If you are interested
in pursuing this, have a look at unit par-endh.adb in the GNAT sources.
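The three-way choice above can be sketched as a small classification function. This is a toy under stated assumptions, not the logic of par-endh.adb: the scope-stack layout, Classify, and its ordering (a matching name wins first, then a matching start column, otherwise the line is skipped as junk) are all hypothetical.

```ada
with Ada.Text_IO;             use Ada.Text_IO;
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Strings.Unbounded;   use Ada.Strings.Unbounded;

procedure End_Line_Demo is

   --  The three possibilities listed above.
   type End_Decision is (Ignore_As_Junk, Accept_As_Current, Close_Outer_Scope);

   --  One open construct on the parser's scope stack (hypothetical layout).
   type Scope_Entry is record
      Name   : Unbounded_String;  --  name expected after "end", e.g. "XYZ"
      Column : Positive;          --  column where the construct started
   end record;

   --  Index 'First is the outermost scope, 'Last the innermost.
   type Scope_Stack is array (Positive range <>) of Scope_Entry;

   --  Ada identifiers are case-insensitive, so compare in lower case.
   function Same_Name (S : Scope_Entry; End_Name : String) return Boolean is
   begin
      return End_Name /= ""
        and then To_Lower (To_String (S.Name)) = To_Lower (End_Name);
   end Same_Name;

   --  Toy heuristic: name evidence first, then indentation, else junk.
   --  Stack must not be empty.
   function Classify
     (Stack      : Scope_Stack;
      End_Name   : String;     --  "" when the end line carries no name
      End_Column : Positive) return End_Decision
   is
      Inner : constant Positive := Stack'Last;
   begin
      if Same_Name (Stack (Inner), End_Name) then
         return Accept_As_Current;
      end if;
      for I in Stack'First .. Inner - 1 loop
         if Same_Name (Stack (I), End_Name) then
            return Close_Outer_Scope;   --  an inner end line is missing
         end if;
      end loop;

      if End_Column = Stack (Inner).Column then
         return Accept_As_Current;      --  likely a corrupted current end
      end if;
      for I in Stack'First .. Inner - 1 loop
         if End_Column = Stack (I).Column then
            return Close_Outer_Scope;
         end if;
      end loop;

      return Ignore_As_Junk;            --  no evidence: skip the line
   end Classify;

   --  Package body XYZ starting in column 1, procedure M in column 7.
   Scopes : constant Scope_Stack :=
     ((Name => To_Unbounded_String ("XYZ"), Column => 1),
      (Name => To_Unbounded_String ("M"),   Column => 7));

begin
   Put_Line (End_Decision'Image (Classify (Scopes, "M", 7)));
   Put_Line (End_Decision'Image (Classify (Scopes, "xyz", 1)));
   Put_Line (End_Decision'Image (Classify (Scopes, "Q", 20)));
end End_Line_Demo;
```

Letting name evidence outrank indentation is one arbitrary choice; a real parser would also look at the tokens around the end line before committing.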

Of course GNAT does not do a perfect job in generating error messages.
This is not possible, in the sense that it is not a well defined task.

But it does pretty well, and we work on improving it all the time.

It is much more instructive to look at specific examples than to
speak in generalities here.

I certainly agree with Nick that many compilers have incredibly appallingly
bad error message generation. In particular, I have never seen a C compiler
that I thought was even vaguely acceptable in this regard.

Ada compilers have generally been better, partly because Gerry Fisher's
interest in error detection meant that the original Ada Ed was pretty
good, and as a result the ACVC tests came to expect pretty decent
error recovery. Many of the Ada 83 compilers actually directly borrowed
some of the NYU work here.

We think GNAT takes the generation of good error messages to a stage
that is a definite notch better than what has been there previously,
but there is lots of room for improvement. 

We are always happy to get error message suggestions, and examples where
things did not work well. Sometimes the answer is "sorry, we can't be
this telepathic", other times the answer is "this may surprise you,
but actually this case is easy to fix!"

Robert Dewar
Ada Core Technologies






* Re: Compiler error messages
From: Robert Dewar @ 1998-01-23  0:00 UTC



Some other points that Nick makes

  <<I do think there are a lot of things that compilers can do to help users.
    Always making reference to the appropriate section(s) of the manual, for
    example (something that precious few compilers actually do -- why???).>>

Let's assume "the manual" means the Ada RM in this case. Indeed many Ada
compilers do make RM references. We very deliberately decided in GNAT not
to, except in a few unusual cases. Our reasoning is that for expert users,
this is unlikely to be necessary, since you know the language and you know
what is right and wrong. For naive users, going and reading the RM tends
to add confusion on top of confusion.

Yes, we know all the arguments on both sides of this issue. No need to
rehash them. If you want, go to DejaNews, there have been long threads on
this issue before. Many people agree with us, some do not. Many people
comment that they like the error messages in GNAT (it was certainly for
example one of the reasons that the Air Force Academy chose GNAT over other
competing compilers for teaching Ada). We think this is at least partly a
reflection of the fact that we are forced to try to come up with a
clear error message without relying on the RM reference for an explanation
(which is often inaccessible to beginners).

As I say, the interesting thing here is not general discussions but
particular examples. Interestingly, when we challenge people who think
that RM references are a good thing to come up with specific examples
where an RM reference would help, we have got virtually no input (that
does not surprise us!)

  <<Other ideas are: it is occasionally helpful for the compiler to
    report how long it took to process each file, and usually this
    is very easy to do>>

Does not seem very useful to me, though it would be useful to program. You
can find out how long you spend in each phase of gcc, using the standard
gcc option. (read the gcc manual, it has lots of useful stuff!)

  <<Something that most compiler writers could provide which would be extremely
    useful to their users -- [extremely odd ethnic reference removed] --
    is to provide a section in the manual which discusses each
    error (or those where it would be useful) in some detail.>>

Actually we don't think a section of the manual as such is the right
solution here. We have a design for this, it is a program called GNOME
(Gnat On-line error Message Explanation). The idea is to bonk on an
error message from your editor or IDE or whatever, and you get a menu
pointing to a full explanation of the error, cross-linked via hypertext
to the RM, Rationale, and whatever other useful reference materials
are around.

Isn't that a nice idea?

Unfortunately all we have in place so far is the great name, and since
error messages are pretty good in GNAT, it is not something that is on 
the top of the priority list.

Our users don't complain about error messages. In fact we would like them
to complain more, or at least send in constructive suggestions. Reports
like "I know this is wrong, but it seems the error message could have been
more helpful" are useful to us.

Robert Dewar
Ada Core Technologies


  
  








* Re: Compiler error messages
From: Larry Kilgallen @ 1998-01-23  0:00 UTC



In article <01bd278c$bea48680$9dfc82c1@xhv46.dial.pipex.com>, "Nick Roberts" <Nick.Roberts@dial.pipex.com> writes:
> I've been most interested in the thread about compiler error messages.
> 
> Having used many many compilers (BASIC, PASCAL, C, Ada, and all sorts of
> others) for many many years, I've come to the conclusion that, almost
> always, the cleverer the compiler tries to be about error messages, the
> less helpful it ends up being, in reality.
> 
> Many is the time when a compiler has reported an error to me, most
> elaborately and cleverly, and been completely and 100% wrong about the true
> nature/source of the error.  And boy does it make me spit.  Hands up who
> hasn't been infuriated by a 'smart' compiler producing reams of completely
> spurious errors (after one legitimate one), presumably because the compiler
> writer thought it would be really clever for the compiler to 'ignore' the
> first error.  I always prefer compilers which simply stop at the first
> error.  What a sad waste of effort.

I have seen some compilers which do a horrid job (DEC Scan and Bliss-32)
and some which do a wonderful job (DEC Ada).  I suppose this depends on
what sort of errors one is making, but if I knew enough to categorize
my errors, I wouldn't make them !

Larry Kilgallen





* Re: Compiler error messages
From: Robert Dewar @ 1998-01-23  0:00 UTC



Larry quotes Nick:

<<> Many is the time when a compiler has reported an error to me, most
> elaborately and cleverly, and been completely and 100% wrong about the true
> nature/source of the error.  And boy does it make me spit.  Hands up who
> hasn't been infuriated by a 'smart' compiler producing reams of completely
> spurious errors (after one legitimate one), presumably because the compiler
> writer thought it would be really clever for the compiler to 'ignore' the
> first error.  I always prefer compilers which simply stop at the first
> error.  What a sad waste of effort.
>>

The idea of stopping on the first error is not an unreasonable one, but
in practice most people prefer to find more than one error on each
compilation if possible.

If you really like to get only the first error, then use the GNAT
switch -gnatm1, that is what it is there for. 

But doing good error recovery is definitely not wasted effort for
two reasons:

1. Figuring out what really went wrong and recovering from it are
closely related tasks.

2. As I said above, many people (I would guess most) prefer multiple
error messages.

By the way, in the Ada case, it is of course impractical to validate
a compiler that does not have pretty good error recovery!






* Re: Compiler error messages
From: Nick Roberts @ 1998-01-23  0:00 UTC



Robert Dewar gives a long reply to my post about compiler error messages,
for which many thanks earnestly.  His reply is specific to GNAT (with which
I am not, in fact, very familiar).  I have no doubt that GNAT is very
clever at producing error messages, and I am not trying to criticise either
GNAT or any other specific compiler.

Robert introduces an example of a highly displaced error locus, very much a
classic case.  I will never forget the day I got an error at the end of a
multi-thousand line Pascal program -- a highly convoluted one at that --
corresponding to an actual error buried deep in the middle.  It took me
days to find it.  It still hurts just thinking about that one.  

However, I can assuredly remember many times when various 'smart' compilers
have 'guessed' an error locus, always wrongly.  They never actually help to
find the error.  In fact, good language design -- such as that of Ada -- is
the only thing which does help.

I am, in fact, a compiler writer, not a user; a fact which, of course, puts
me at a disadvantage when it comes to error messages: messages which mean a
lot to me may well be gibberish for a user.  Referring to the example
Robert gives,

   a := b & + c;

the technically correct error would be "term expected" at the plus sign. 
This is precisely what the Ada RM specifies at this point in the syntax. 
Contrary to what Robert suggests, there is no need for any guesswork or
assumptions.
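Nick's point can be illustrated with a toy recursive-descent parser for a simplified subset of the RM expression syntax. The subset (single-letter names, no spaces) and the function Error_Position are assumptions made for illustration. After the binary operator "&" the grammar requires a term, and "+" cannot begin one, so a parser that follows the grammar stops at the plus sign without guessing what was meant.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Term_Expected is

   --  Toy subset of the RM 4.4 expression syntax:
   --    simple_expression ::= [unary_adding_operator] term
   --                          {binary_adding_operator term}
   --    term              ::= a single lower-case letter (simplification)
   --  '+' and '-' are unary or binary adding operators; '&' is binary only.

   --  Returns 0 if S is a valid simple_expression, otherwise the position
   --  of the first error ("term expected" when a required term is absent).
   function Error_Position (S : String) return Natural is
      Pos          : Integer := S'First;
      Missing_Term : exception;

      procedure Term is
      begin
         if Pos <= S'Last and then S (Pos) in 'a' .. 'z' then
            Pos := Pos + 1;
         else
            raise Missing_Term;      --  "term expected" at Pos
         end if;
      end Term;

      function Is_Binary_Op (C : Character) return Boolean is
      begin
         return C = '+' or else C = '-' or else C = '&';
      end Is_Binary_Op;

   begin
      if Pos <= S'Last and then (S (Pos) = '+' or else S (Pos) = '-') then
         Pos := Pos + 1;             --  optional leading unary operator
      end if;
      Term;
      while Pos <= S'Last and then Is_Binary_Op (S (Pos)) loop
         Pos := Pos + 1;             --  binary adding operator ...
         Term;                       --  ... must be followed by a term
      end loop;
      if Pos <= S'Last then
         return Pos;                 --  unexpected trailing character
      end if;
      return 0;
   exception
      when Missing_Term =>
         return Pos;
   end Error_Position;

begin
   Put_Line ("b&+c =>" & Natural'Image (Error_Position ("b&+c")));
   Put_Line ("b&c  =>" & Natural'Image (Error_Position ("b&c")));
end Term_Expected;
```

For "b&+c" the sketch reports position 3, the plus sign; whether "term expected" is a helpful wording there is exactly the question the thread is debating.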

Of course, this error message might not actually be very helpful to the
user, and I've no doubt that other examples would show up this idea much
more poignantly.  The essence of the problem, to me, is how to be helpful
to the user in a solid way, rather than just making a few blind stabs in
the dark as to what went wrong.  Whilst I've no doubt that certain
compilers may be very clever at making these guesses, a guess is a guess. 
I'm wanting a bit more science to it!

Once again, contributions from any and all would be most welcome!

-- 

Nick Roberts
Croydon, UK

Proprietor, ThoughtWing Software; Independent Software Development
Consultant
* Nick.Roberts@dial.pipex.com * Voicemail & Fax +44 181-405 1124 *
*** Always game for a verbal joust (usually as the turkey) ***






* Re: Compiler error messages
From: Nick Roberts @ 1998-01-24  0:00 UTC




Robert A Duff <robertduff@world.std.com> wrote in article
<EnAqpo.2oJ@world.std.com>...
> I'm currently using a 500 MHz Alpha workstation with 128 MB of RAM.  I
> didn't say *minimum* rebuild, by which I assume you mean I've changed
> one file that nothing else depends on.  I don't see how you can be
> surprised at 15 minute builds when I didn't tell you how much source
> code I have, nor how much of it was changed.  Surely you will agree that
> *some* rebuilds on *some* projects take 15 minutes or more even on a
> modern fast workstation?
> 
> (By the way, my case is somewhat special: I work on a compiler written
> in its own language, and it's important from time to time to bootstrap
> the compiler -- compile it with itself, then use the resulting
> executable to compile it again, and then check that the executables are
> identical.  Sometimes three recompiles are necessary.  So this means
> that I'm doing a fair number of recompiles from scratch, as opposed to
> making a single change and recompiling just one file.)

How much source code do you have?  On a machine like that -- and I'm going
to regret saying this, I just know it -- you must have a godawful compiler
to take 15 minutes, even if it is doing a full rebuild three times.  Does
yours do ultra-heavy optimization?  Phew!

> >is an issue which goes far beyond error messages, to programmer
> >productivity in general.  One of the things that slows down compilers is
> >trying to be ultra-clever about errors.  Which would you prefer: compiler
> >(a) which takes 10 minutes to compile but stops at the first error (and
> >costs less, and is more reliable); or compiler (b) which takes 15 minutes
> >and has recovery?
> 
> I prefer (a).  Of course, the real bottleneck is often the linker, not
> the compiler.  Either way, I won't be satisfied until I can make various
> changes to a large body of code, and rebuild it in less than about 0.2
> second.

I agree about the linker.  Happiness is a linker hand-coded in assembly.

> In any case, I don't think error recovery is a substantial portion of
> compile time.  In many cases it's possible to design the compiler so
> that most of the error recovery work happens only when there actually is
> an error, in which case it doesn't matter so much.

This kind of design is even more complicated.  Trust your uncle Nicky, he's
been there, seen the film, read the book, bought the customised coffee mug,
_and_ the special edition set of figurines.  True, it's a neat trick if you
can do it!

> >I would be really surprised if a compiler giving a "name not visible"
> >didn't list all the candidates (the entities in the dictionary whose names
> >could include the offending name).  You mean to say some compilers don't do
> >this?  Now busy testing...
> 
> Sure, compilers do something along those lines, but there are various
> choices of what to print out, and not all compilers do an equally good
> job.  You say "the dictionary", as if it were one monolithic thing.  In
> Ada, there are all kinds of complicated scope rules.  If I declare X in
> a private part, and refer to X from outside, should the compiler point
> out that perhaps I meant to declare it in the visible part?  What if X
> is in the body (in which case the compiler won't normally be looking at
> that file at all, during this particular run)?

I've always thought that all compilers (written in conventional languages)
have only one dictionary, with flags and things to distinguish the nitty
gritty.  Am I wrong?

> Here's an interesting example: Suppose I have a generic package
> List_Generic, which has a Push procedure in it.  And I instantiate that
> generic 17 times, for 17 different element types.  (The reason I'm
> talking about generics is that they cause heavy overloading.)  And I try
> to call Push on a list-of-integers, but I forgot to have a use_clause
> for the list-of-integers package.  An Ada compiler might try to point
> out missing use_clauses, but in this case, I believe most will not be
> smart enough to tell me which package -- it's obvious which package I
> meant, by looking at the types of the arguments to Push, but that's not
> easy to teach the compiler, especially in the presence of overloaded
> arguments.  Instead, I suspect most compilers will give me a list of 17
> Pushes I might have meant, or else will just point at the generic itself
> and let me track down the instances myself.  I believe I even remember a
> compiler that would print 17 references to the generic itself.
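A cut-down version of this scenario, assuming a hypothetical List_Generic that only counts pushes, and two instances instead of seventeen. Without a use clause for Integer_Lists, the commented-out call names no directly visible Push, even though the argument types single out one instance; the expanded name compiles.

```ada
with Ada.Text_IO;

procedure Missing_Use is

   --  Hypothetical generic in the spirit of List_Generic; to keep the
   --  sketch short it only counts how many elements were pushed.
   generic
      type Element is private;
   package List_Generic is
      type List is private;
      procedure Push (L : in out List; E : Element);
      function Length (L : List) return Natural;
   private
      type List is record
         Count : Natural := 0;
      end record;
   end List_Generic;

   package body List_Generic is
      procedure Push (L : in out List; E : Element) is
      begin
         L.Count := L.Count + 1;   --  E itself is not stored
      end Push;

      function Length (L : List) return Natural is
      begin
         return L.Count;
      end Length;
   end List_Generic;

   --  Two of the "17" instances; each declares its own Push.
   package Integer_Lists is new List_Generic (Integer);
   package Float_Lists   is new List_Generic (Float);

   L : Integer_Lists.List;

begin
   --  Illegal if uncommented: there is no use clause for Integer_Lists,
   --  so no Push is directly visible here, even though the argument
   --  types make the intended instance obvious.
   --  Push (L, 42);

   --  Legal: an expanded name selects the instance.
   Integer_Lists.Push (L, 42);
   Ada.Text_IO.Put_Line (Natural'Image (Integer_Lists.Length (L)));
end Missing_Use;
```

Float_Lists is declared only so that a second, unrelated Push exists for a compiler to consider.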

I was thinking along the lines of listing all the entities in the
dictionary of whose full name the offending name could be a part.  So, in
the above example, you would get all 17 instantiations of the Push
procedure, and you may well get other Pushes too.  With each full name
would go an indication of the kind of entity, e.g.

   Name "Push" not visible, candidates are as follows:
   - MyCom.ATCS.Airobject_Count_Stacking.Push procedure (in Airobject_Count)
   - MyCom.ATCS.Tracker_Count_Stacking.Push procedure (in Tracker_Count)
   - MyCom.ATCS.Acquiror_Count_Stacking.Push procedure (in Acquiror_Count)
   <and so on>
   - MyCom.ATCS.Front_Panel.Buttons.Push type tagged record

This would have the obvious problem of verbosity.  But what, really, is the
sensible alternative?  Printing nothing at all?  Surely not.  To select
entities (e.g. only procedures, in this case)?  Dangerous, surely (could be
wrong).  I think everything has to be listed!  I have to say, I also think
that in most cases, in practice, there wouldn't be so many candidates.
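The listing strategy described above can be sketched as a linear scan of a flat dictionary, keeping every entry whose expanded name ends in the offending simple name. The dictionary contents, Report, and Ends_With_Component are hypothetical, and a real compiler's symbol tables need not be organized this way.

```ada
with Ada.Text_IO;             use Ada.Text_IO;
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Strings.Unbounded;   use Ada.Strings.Unbounded;

procedure List_Candidates is

   --  One entry of a flat, hypothetical "dictionary": the full expanded
   --  name of a declared entity plus a short description of its kind.
   type Dict_Entry is record
      Full_Name : Unbounded_String;
      Kind      : Unbounded_String;
   end record;

   type Dictionary is array (Positive range <>) of Dict_Entry;

   function Make (Full_Name, Kind : String) return Dict_Entry is
   begin
      return (To_Unbounded_String (Full_Name), To_Unbounded_String (Kind));
   end Make;

   --  Sample contents modelled on the listing above (hypothetical).
   Dict : constant Dictionary :=
     (Make ("MyCom.ATCS.Tracker_Count_Stacking.Push", "procedure"),
      Make ("MyCom.ATCS.Acquiror_Count_Stacking.Push", "procedure"),
      Make ("MyCom.ATCS.Front_Panel.Buttons.Push", "type tagged record"),
      Make ("MyCom.ATCS.Front_Panel.Pusher", "procedure"));

   --  True if Simple is the last component of the expanded name Full,
   --  ignoring case (Ada identifiers are case-insensitive).
   function Ends_With_Component (Full, Simple : String) return Boolean is
      F : constant String := To_Lower (Full);
      S : constant String := To_Lower (Simple);
   begin
      if F = S then
         return True;
      elsif F'Length <= S'Length then
         return False;
      else
         return F (F'Last - S'Length + 1 .. F'Last) = S
           and then F (F'Last - S'Length) = '.';
      end if;
   end Ends_With_Component;

   --  Print every candidate for the invisible name; return how many.
   function Report (Name : String) return Natural is
      Found : Natural := 0;
   begin
      Put_Line ("Name """ & Name & """ not visible, candidates are as follows:");
      for I in Dict'Range loop
         if Ends_With_Component (To_String (Dict (I).Full_Name), Name) then
            Put_Line ("   - " & To_String (Dict (I).Full_Name) & " "
                      & To_String (Dict (I).Kind));
            Found := Found + 1;
         end if;
      end loop;
      return Found;
   end Report;

   N : Natural;

begin
   N := Report ("Push");
   Put_Line (Natural'Image (N) & " candidate(s)");
end List_Candidates;
```

Matching whole name components means Front_Panel.Pusher is not listed for "Push"; the filtering question raised above (only procedures, or everything) is left open.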

Nick Roberts
Croydon, UK

Proprietor, ThoughtWing Software; Independent Software Development
Consultant
* Nick.Roberts@dial.pipex.com * Voicemail & Fax +44 181-405 1124 *
*** Always game for a verbal joust (usually as the turkey) ***





