lexical ambiguity

comp.lang.ada

* lexical ambiguity
From: bla_bla1357 @ 2006-06-02 22:13 UTC


I'm doing a lexical analysis of Ada using Lex as part of a student project.
The emphasis is on using Lex, not on the Ada language itself, and
I'm not familiar with Ada. So what I would like to find out is whether
there is any lexical ambiguity in Ada (like the ambiguity in C with the
unary and binary plus and minus). Thanks in advance...




* Re: lexical ambiguity
From: Frank J. Lhota @ 2006-06-02 22:35 UTC


The biggest lexical issue with Ada is the multiple uses of the single quote:

- Single quotes surround character literals (e.g. 'A'),
- prefix attributes (for example List'First), and
- are used in aggregates, such as Rational'(Num => 1, Denom => 2).






* Re: lexical ambiguity
From: Keith Thompson @ 2006-06-02 23:27 UTC


bla_bla1357 <bla_bla1357MaknispaM@yahoo.com> writes:
> I'm doing a lexical analysis of Ada using Lex as part of a student project.
> The emphasis is on using Lex, not on the Ada language itself, and
> I'm not familiar with Ada. So what I would like to find out is whether
> there is any lexical ambiguity in Ada (like the ambiguity in C with the
> unary and binary plus and minus). Thanks in advance...

I suppose it depends on what you mean by "lexical ambiguity".

Strictly speaking, there are no grammatical ambiguities in either
language.  There are plenty of things that look like ambiguities, but
they're all resolved by the rules of the language.

In C, for example, this:
    x+++++y
looks like it could be parsed as
    x ++ + ++ y
which would be a legal expression, but in fact it's tokenized as
    x ++ ++ + y
which is rejected, since x++ is not an lvalue and cannot itself be
incremented.  (C's typedef names do cause some interesting lexical
problems, but that's another topic.)
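For illustration, here is a minimal Python sketch of the maximal-munch rule. The function name and the tiny operator set are assumptions, not a real C lexer: at each position the scanner simply takes the longest operator that matches, which is why the five plus signs split as ++ ++ +.

```python
import re

# Hypothetical subset of C operators, listed longest first so that the
# longest match ("maximal munch") always wins.
OPERATORS = ["++", "--", "+", "-"]
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize_c(text):
    """Split text into identifiers and operators using maximal munch."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        match = IDENTIFIER.match(text, i)
        if match:
            tokens.append(match.group())
            i = match.end()
            continue
        for op in OPERATORS:
            if text.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            raise ValueError(f"unexpected character {text[i]!r} at {i}")
    return tokens


print(tokenize_c("x+++++y"))  # ['x', '++', '++', '+', 'y']
```

Writing the spaces explicitly, as in "x ++ + ++ y", is the only way to get the other split.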

Ada, like C, has unary and binary "+" and "-" operators, but each
operator is easily identified based on the syntactic context in which
it appears.  One well-known case of a near ambiguity is:
    Character'('x')
If Ada followed C's "maximal munch" rule, this would be tokenized
as
    Character '(' x '...
leading to a syntax error; instead, it's tokenized as:
    Character ' ( 'x' )
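A minimal Python sketch of a scanner that applies maximal munch to the single quote reproduces the problem: whenever it sees "'", any character, "'", it takes a character literal. The function name and the "TICK" marker are hypothetical, and only identifiers, quotes and single-character delimiters are handled.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def naive_tokens(text):
    """Maximal munch: "'" + any character + "'" is always a character literal."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == "'" and i + 2 < len(text) and text[i + 2] == "'":
            tokens.append(text[i:i + 3])  # taken as a character literal
            i += 3
        elif ch == "'":
            tokens.append("TICK")  # isolated apostrophe
            i += 1
        else:
            match = IDENTIFIER.match(text, i)
            if match:
                tokens.append(match.group())
                i = match.end()
            else:
                tokens.append(ch)  # single-character delimiter such as ( ) ,
                i += 1
    return tokens
```

For "Character'('x')" it returns Character, '(', x, TICK, ), which is the broken token stream described above. The same greedy rule also mis-splits a long run of ',' literals.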

So, there are no real ambiguities in either language, but each uses
different rules to resolve things that would otherwise have been
ambiguous.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
We must do something.  This is something.  Therefore, we must do this.




* Re: lexical ambiguity
From: Jeffrey R. Carter @ 2006-06-03  5:20 UTC


Frank J. Lhota wrote:
> 
> - are used in aggregates, such as Rational'(Num => 1, Denom => 2).

This is a qualified expression, as is Integer'(I). It just happens that 
the expression is an aggregate. Aggregates themselves don't use the 
apostrophe:

R : Rational := (Num => 1, Denom => 2);

-- 
Jeff Carter
"Why don't you bore a hole in yourself and let the sap run out?"
Horse Feathers
49




* Re: lexical ambiguity
From: Frank J. Lhota @ 2006-06-04 17:33 UTC


"Jeffrey R. Carter" <spam.not.jrcarter@acm.not.spam.org> wrote in message 
news:w_8gg.760526$084.110855@attbi_s22...
> Frank J. Lhota wrote:
>>
>> - are used in aggregates, such as Rational'(Num => 1, Denom => 2).
>
> This is a qualified expression, as is Integer'(I). It just happens that 
> the expression is an aggregate. Aggregates themselves don't use the 
> apostrophe:
>
> R : Rational := (Num => 1, Denom => 2);

Yes, of course you're right. The main point is that the multiple uses of
the single quote are the one thing that the Ada lexer needs to be especially
careful about. Make sure that your lexer can handle the following expression
properly:

    Foo'(',',',',',' ... ) 






* Re: lexical ambiguity
From: Jeffrey R. Carter @ 2006-06-05  1:36 UTC


Frank J. Lhota wrote:
> 
> Yes, of course you're right. The main point is that the multiple uses of
> the single quote are the one thing that the Ada lexer needs to be especially
> careful about. Make sure that your lexer can handle the following expression
> properly:

Yes, but it's good to be precise.

>     Foo'(',',',',',' ... ) 

Clearly you have an evil mind :)

-- 
Jeff Carter
"What I wouldn't give for a large sock with horse manure in it."
Annie Hall
42




* Re: lexical ambiguity
From: Frank J. Lhota @ 2006-06-05 18:30 UTC

Jeffrey R. Carter wrote:
> Yes, but it's good to be precise.

Absolutely! I should have said "qualified expression" in my original 
post. Sorry for any confusion that I may have caused.

>>     Foo'(',',',',',' ... ) 
> 
> Clearly you have an evil mind :)

Well, there is a good reason to consider this worst case scenario. I 
have seen quick and dirty Ada lexers that try to determine if a single 
quote starts a character literal by looking ahead 2 characters. As this
scenario shows, this approach is not guaranteed to work.


* Re: lexical ambiguity
From: Keith Thompson @ 2006-06-05 20:27 UTC


"Frank J. Lhota" <flhota@NOSPAM.ll.mit.edu> writes:
> Jeffrey R. Carter wrote:
>> Yes, but it's good to be precise.
>
> Absolutely! I should have said "qualified expression" in my original
> post. Sorry for any confusion that I may have caused.
>
>>>     Foo'(',',',',',' ... )
>> Clearly you have an evil mind :)
>
> Well, there is a good reason to consider this worst case scenario. I
> have seen quick and dirty Ada lexers that try to determine if a single
> quote starts a character literal by looking ahead 2 characters. As this
> scenario shows, this approach is not guaranteed to work.

If I recall correctly, it's sufficient to remember what the previous
token was.  A character literal cannot follow an identifier.

I think that might break down if an implementation chooses to define
an attribute with a single-character name, but I don't remember the
details; presumably no implementation will actually do this.
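A minimal Python sketch of this previous-token rule follows. The names are hypothetical, only identifiers, quotes and single-character delimiters are handled, and there is no reserved-word table, so a word like "if" is treated as an identifier.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def lookback_tokens(text):
    """Tokenize, deciding each "'" from the kind of the previous token."""
    tokens = []  # (kind, text) pairs
    i = 0
    while i < len(text):
        ch = text[i]
        prev_kind = tokens[-1][0] if tokens else None
        if ch.isspace():
            i += 1
        elif ch == "'" and prev_kind == "identifier":
            # A character literal cannot follow an identifier, so this
            # apostrophe starts an attribute or a qualified expression.
            tokens.append(("tick", ch))
            i += 1
        elif ch == "'":
            # Anywhere else, "'" must open a three-character literal.
            if i + 2 >= len(text) or text[i + 2] != "'":
                raise ValueError(f"malformed character literal at offset {i}")
            tokens.append(("char", text[i:i + 3]))
            i += 3
        else:
            match = IDENTIFIER.match(text, i)
            if match:
                # No reserved-word table: "if" is treated like an identifier.
                tokens.append(("identifier", match.group()))
                i = match.end()
            else:
                tokens.append(("delimiter", ch))  # e.g. ( ) ,
                i += 1
    return [value for _, value in tokens]
```

With this rule, "Character'('x')" comes out as Character, ', (, 'x', ), and no character lookahead is needed.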


* Re: lexical ambiguity
From: Jeffrey R. Carter @ 2006-06-05 22:11 UTC

Keith Thompson wrote:
> 
> If I recall correctly, it's sufficient to remember what the previous
> token was.  A character literal cannot follow an identifier.

Right, so it must be either an attribute, a qualified expression, or an
error. An attribute designator must be an identifier (or one of a few
reserved words such as Access or Digits), and "(" is neither, so it can't
be an attribute; it's either a qualified expression or an error. In this
case, it's an error, since you can't have "..." as part of an aggregate :)

-- 
Jeff Carter
"Nobody expects the Spanish Inquisition!"
Monty Python's Flying Circus
22


* Re: lexical ambiguity
From: Jeffrey R. Carter @ 2006-06-05 22:16 UTC

Frank J. Lhota wrote:
> 
> Well, there is a good reason to consider this worst case scenario. I 
> have seen quick and dirty Ada lexers that try to determine if a single 
> quote starts a character literal by looking ahead 2 characters. As this
> scenario shows, this approach is not guaranteed to work.

That's too simple minded. A character literal can't follow an 
identifier, so this must be either an attribute or a qualified 
expression (presuming it's not an error). Since "(" can't be an 
attribute, it must be a qualified expression. I'm not sure how to parse 
"...", though.

You still have an evil mind, since you didn't include any spaces between 
the components of the aggregate, making it even harder for humans to 
parse (lack of spaces shouldn't make any difference to machine parsing).


* Re: lexical ambiguity
From: Georg Bauhaus @ 2006-06-06 10:39 UTC


On Mon, 2006-06-05 at 22:11 +0000, Jeffrey R. Carter wrote:
> Keith Thompson wrote:
> > 
> > If I recall correctly, it's sufficient to remember what the previous
> > token was.  A character literal cannot follow an identifier.
> 
> Right, so it must be either an attribute, a qualified expression, or an 
> error. 

Though the previous token shouldn't be a reserved word, as in

 if'('="-"("="('='=',',','=','))


-- Georg 






* Re: lexical ambiguity
From: M E Leypold @ 2006-06-06 11:38 UTC



Georg Bauhaus <bauhaus@futureapps.de> writes:

> On Mon, 2006-06-05 at 22:11 +0000, Jeffrey R. Carter wrote:
> > Keith Thompson wrote:
> > > 
> > > If I recall correctly, it's sufficient to remember what the previous
> > > token was.  A character literal cannot follow an identifier.
> > 
> > Right, so it must be either an attribute, a qualified expression, or an 
> > error. 
> 
> Though the previous token shouldn't be a reserved word, as in
> 
>  if'('="-"("="('='=',',','=','))

Or 

   return'a'; 

So now (question to all): Is the following rule enough?

   - "'" is the beginning of a character literal if the token before
     "'" has not been an identifier (reserved words not counted as
     identifier in this case).

Regards -- Markus







* Re: lexical ambiguity
From: Frank J. Lhota @ 2006-06-06 13:20 UTC


Jeffrey R. Carter wrote:
> Frank J. Lhota wrote:
>>
>> Well, there is a good reason to consider this worst case scenario. I 
>> have seen quick and dirty Ada lexers that try to determine if a single 
>> quote starts a character literal by looking ahead 2 characters. As this
>> scenario shows, this approach is not guaranteed to work.
> 
> That's too simple minded. A character literal can't follow an 
> identifier, so this must be either an attribute or a qualified 
> expression (presuming it's not an error). Since "(" can't be an 
> attribute, it must be a qualified expression. I'm not sure how to parse 
> "...", though.

That is precisely my point: the character look-ahead is too simple 
minded. As you and other posters have pointed out, if we simply keep 
track of the last token, we can use that information to determine how to 
handle the single quote.

> You still have an evil mind, since you didn't include any spaces between 
> the components of the aggregate, making it even harder for humans to 
> parse (lack of spaces shouldn't make any difference to machine parsing).

This example was to illustrate a worst case scenario for an Ada lexer. 
It was *not* presented as an example of recommended programming style, 
which it clearly is not.




* Re: lexical ambiguity
From: Simon Clubley @ 2006-06-06 13:50 UTC


In article <1149590366.8521.5.camel@localhost>, Georg Bauhaus <bauhaus@futureapps.de> writes:
> 
> Though the previous token shouldn't be a reserved word, as in
> 
>  if'('="-"("="('='=',',','=','))
> 

Hmmm. :-)

Perhaps somebody should run an "Obfuscated Ada" contest...

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
If Google's motto is "don't be evil", then how did we get Google Groups 2 ?




* Re: lexical ambiguity
From: Peter C. Chapin @ 2006-06-06 18:56 UTC


Georg Bauhaus <bauhaus@futureapps.de> wrote in 
news:1149590366.8521.5.camel@localhost:

>  if'('="-"("="('='=',',','=','))

Now *that* is evil. :-)

Peter




* Re: lexical ambiguity
From: Georg Bauhaus @ 2006-06-06 19:41 UTC

On Tue, 2006-06-06 at 18:56 +0000, Peter C. Chapin wrote:
> Georg Bauhaus <bauhaus@futureapps.de> wrote in 
> news:1149590366.8521.5.camel@localhost:
> 
> >  if'('="-"("="('='=',',','=','))
> 
> Now *that* is evil. :-)

;)

When it came to the tick mark in ASnip's tokenizer,
I had to consider the case when there isn't a token at
which to look back (a source snippet might well
start with 'x'). So the solution isn't perfect.

I should add another piece of history, for better
classification of tokens where possible.

-- Georg 


* Re: lexical ambiguity
From: Dmitry A. Kazakov @ 2006-06-07  9:02 UTC


On 06 Jun 2006 13:38:06 +0200, M E Leypold wrote:

> Georg Bauhaus <bauhaus@futureapps.de> writes:
> 
>> On Mon, 2006-06-05 at 22:11 +0000, Jeffrey R. Carter wrote:
>>> Keith Thompson wrote:
>>> > 
>>> > If I recall correctly, it's sufficient to remember what the previous
>>> > token was.  A character literal cannot follow an identifier.
>>> 
>>> Right, so it must be either an attribute, a qualified expression, or an 
>>> error. 
>> 
>> Though the previous token shouldn't be a reserved word, as in
>> 
>>  if'('="-"("="('='=',',','=','))
> 
> Or 
> 
>    return'a'; 
> 
> So now (question to all): Is the following rule enough?
> 
>    - "'" is the beginning of a character literal if the token before
>      "'" has not been an identifier (reserved words not counted as
>      identifier in this case).

It does not differ from the case of +/-. In the infix context, i.e. after
an operand (whatever it might be), ' is an infix operation, as are +/-.
In the prefix context, where an operand is expected, ' introduces a
character literal (= an operand), and +/- denote a unary prefix operation.

Your rule is wrong: 'A' and 'B'. "and" is a reserved word. Then, of course,
"..." string literals have to be scanned first, which gives you a nice
vicious circle around '"' and "'". (:-))

The bottom line: parsing has state.
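The prefix/infix view can be sketched in Python as a single predicate on the previous token that serves both "+"/"-" and "'". The function names, the partial reserved-word set, and the choice of operand-ending tokens (identifiers, literals, ")" and "all") are assumptions for illustration only.

```python
import re

# Hypothetical, partial set of Ada reserved words for this illustration.
RESERVED = {"abs", "all", "and", "if", "in", "mod", "not", "or",
            "rem", "return", "then", "xor"}
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")


def ends_operand(prev):
    """True if the previous token can end an operand (assumed, partial rule)."""
    if not prev:                            # start of input
        return False
    if prev == ")" or prev.lower() == "all":
        return True
    if IDENTIFIER.match(prev):              # identifiers, but not reserved words
        return prev.lower() not in RESERVED
    if prev[0].isdigit() or prev[0] == '"':
        return True                         # numeric or string literal
    return len(prev) == 3 and prev[0] == prev[2] == "'"  # character literal


def classify(prev, symbol):
    """Infix context after an operand, prefix context otherwise."""
    infix = ends_operand(prev)
    if symbol in ("+", "-"):
        return "binary" if infix else "unary"
    if symbol == "'":
        return "tick" if infix else "char_literal"
    raise ValueError(f"unsupported symbol {symbol!r}")
```

For example, classify("(", "-") gives "unary" and classify("and", "'") gives "char_literal", while classify("all", "'") gives "tick".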

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: lexical ambiguity
From: Georg Bauhaus @ 2006-06-07 13:15 UTC


On Tue, 2006-06-06 at 13:38 +0200, M E Leypold wrote:


>    - "'" is the beginning of a character literal if the token before
>      "'" has not been an identifier (reserved words not counted as
>      identifier in this case).

You could change the wording of the rule slightly by considering
   '''

Georg 






* Re: lexical ambiguity
From: Robert A Duff @ 2006-06-07 14:49 UTC


M E Leypold <development-2006-8ecbb5cc8a-REMOVETHIS@m-e-leypold.de> writes:

> So now (question to all): Is the following rule enough?
> 
>    - "'" is the beginning of a character literal if the token before
>      "'" has not been an identifier (reserved words not counted as
>      identifier in this case).

Not quite:

    function F(X: Integer) return String;

    Length: constant Natural := F(123)'Length;

    Y: access T'Class := ...;
    Z: access T2'Class := Y.all'Access;

For reserved words, I think you have to study the grammar, and determine
which ones can precede a tick mark.

- Bob




* Re: lexical ambiguity
From: M E Leypold @ 2006-06-07 17:18 UTC

Robert A Duff <bobduff@shell01.TheWorld.com> writes:

> M E Leypold <development-2006-8ecbb5cc8a-REMOVETHIS@m-e-leypold.de> writes:
> 
> > So now (question to all): Is the following rule enough?
> > 
> >    - "'" is the beginning of a character literal if the token before
> >      "'" has not been an identifier (reserved words not counted as
> >      identifier in this case).
> 
> Not quite:
> 
>     function F(X: Integer) return String;
> 
>     Length: constant Natural := F(123)'Length;

Ouch. 

OK. First a message to Dmitry A. Kazakov and Georg Bauhaus: Sorry, I
understood neither all of what you said nor the exact
implications. But thanks!

Then: The original poster asked a question about 'lexical
ambiguity'. The ensuing discussion leaves me more and more doubtful:
Can lexical analysis (grouping characters into tokens) and grammatical
analysis (building a parse tree from a token sequence) be separated
cleanly in Ada?

My first approach would have been (no, I'm not implementing an Ada
parser, but since compiler construction has been a favorite subject of
mine for a number of years, I'm a bit curious about the position of Ada
in all this) -- now: my first approach would have been to write a
lexer with a minimal amount of state. It would shift into a
collect-string state when encountering a '"' (I mean a double quote
:-) and, especially, into a maybe-now-comes-a-character-literal state
at certain points. My first take was that the "certain points" are
always after identifiers. In view of the case quoted above
(F(123)'Length), I could amend this rule by adding ')' to the certain
points.

But now things become rather ad hoc. Well -- as I said, it's just
curiosity driving me, so I'm not going to examine the RM now, nor am I
going to reverse engineer GNAT to find out how it is done in reality.

But if anyone in c.l.a. has the answer to the following questions, I'd
be eternally grateful. Well, grateful, anyway. :-)

  - Is it possible (for Ada parsers) to separate lexical analysis and
    grammatical analysis into separate phases without tricky feedback
    from parser to lexer, possibly by using a lexer with a finite
    number of states?

  - What is the complete rule for deciding when the next token might
    be a character literal? Or is that undecidable by just looking at
    past input (i.e. using lexer state)?

BTW: The "evil" case 

    if'('="-"("="('='=',',','=','))

is not parsed correctly by the syntax highlighting in Emacs ada-mode (I
wouldn't have expected it to be, actually). The rule there seems to be my
incomplete rule without the reserved-words exception. Everything falls
magically into place if a " " is inserted immediately after "if".

> 
>     Y: access T'Class := ...;
>     Z: access T2'Class := Y.all'Access;
> 
> For reserved words, I think you have to study the grammar, and determine
> which ones can precede a tick mark.

OK. That I understand now. 

Regards -- Markus


* Re: lexical ambiguity
From: Robert A Duff @ 2006-06-08 21:30 UTC

M E Leypold <development-2006-8ecbb5cc8a-REMOVETHIS@m-e-leypold.de> writes:

> Robert A Duff <bobduff@shell01.TheWorld.com> writes:
> 
> > M E Leypold <development-2006-8ecbb5cc8a-REMOVETHIS@m-e-leypold.de> writes:
> > 
> > > So now (question to all): Is the following rule enough?
> > > 
> > >    - "'" is the beginning of a character literal if the token before
> > >      "'" has not been an identifier (reserved words not counted as
> > >      identifier in this case).
> > 
> > Not quite:
> > 
> >     function F(X: Integer) return String;
> > 
> >     Length: constant Natural := F(123)'Length;
> 
> Ouch. 

It's not a BIG ouch.  To determine whether a single quote begins a
character literal versus a tick, it is sufficient to look back one
token.  Some tokens can be followed by a tick, some by a char_lit,
and some by neither.  None can be followed by both.  It's fairly
straightforward to study the grammar and determine which are which.
Or look at the GNAT sources.

It might be wise to include a sentinel token at the start of the token
stream (Begin_File_Token or whatever), just in case ' comes first
(that would be illegal, but you don't want to crash on it).

It can all be done in the lexer, with no feedback from the parser -- the
lexer just needs to keep track of the previous token, and check it when
it sees a single quote.  Lookahead will get you in trouble; look-back
is the better answer here.
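A minimal Python sketch of this look-back lexer, under stated assumptions: the reserved-word list is partial, the set of tokens that may precede a tick (identifiers, ")" and "all") is taken from the examples in this thread rather than from a full study of the grammar, based literals are not handled, and all names are hypothetical. It is not GNAT's scanner. A "begin_file" sentinel covers a leading "'", and string literals are consumed whole, so a quote inside one is never misread.

```python
import re

# Hypothetical, deliberately partial list of Ada reserved words.
RESERVED = {
    "abs", "access", "all", "and", "begin", "case", "constant", "else",
    "end", "function", "if", "in", "is", "loop", "mod", "not", "null",
    "of", "or", "others", "procedure", "range", "rem", "return", "then",
    "when", "while", "xor",
}
COMPOUND = ("=>", "..", "**", ":=", "/=", ">=", "<=", "<<", ">>", "<>")
WORD = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
NUMBER = re.compile(r"[0-9][0-9_]*(\.[0-9][0-9_]*)?")  # no based literals


def tick_may_follow(kind, value):
    """Assumed rule: "'" is a tick after an identifier, ")" or "all"."""
    return (kind == "identifier"
            or value == ")"
            or (kind == "reserved" and value.lower() == "all"))


def tokenize(text):
    """Tokenize a fragment of Ada, deciding each "'" by looking back one token."""
    tokens = [("begin_file", "")]  # sentinel: a leading "'" still has a predecessor
    i = 0
    while i < len(text):
        ch = text[i]
        prev_kind, prev_value = tokens[-1]
        if ch.isspace():
            i += 1
        elif text.startswith("--", i):  # comment: skip to end of line
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
        elif ch == '"':  # string literal; "" stands for one embedded quote
            j = i + 1
            while True:
                j = text.index('"', j)  # raises ValueError if unterminated
                if not text.startswith('""', j):
                    break
                j += 2
            tokens.append(("string", text[i:j + 1]))
            i = j + 1
        elif ch == "'":
            if tick_may_follow(prev_kind, prev_value):
                tokens.append(("tick", ch))
                i += 1
            elif i + 2 < len(text) and text[i + 2] == "'":
                tokens.append(("char", text[i:i + 3]))
                i += 3
            else:
                raise ValueError(f"malformed character literal at offset {i}")
        elif (m := WORD.match(text, i)):
            word = m.group()
            kind = "reserved" if word.lower() in RESERVED else "identifier"
            tokens.append((kind, word))
            i = m.end()
        elif (m := NUMBER.match(text, i)):
            tokens.append(("number", m.group()))
            i = m.end()
        else:
            op = next((c for c in COMPOUND if text.startswith(c, i)), ch)
            tokens.append(("delimiter", op))
            i += len(op)
    return tokens[1:]  # drop the sentinel
```

The "evil" if'('=... example, "return'a';", "F(123)'Length" and "Y.all'Access" all tokenize as intended with this table, and the parser never has to talk back to the lexer.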

> OK. First a message to Dmitry A. Kazakov and Georg Bauhaus: Sorry, I
> understood neither all of what you said nor the exact
> implications. But thanks!

I didn't entirely understand that, either.

> Then: The original poster asked a question about 'lexical
> ambiguity'. The ensuing discussion leaves me more and more doubtful:
> Can lexical analysis (grouping characters into tokens) and grammatical
> analysis (building a parse tree from a token sequence) be separated
> cleanly in Ada?

Yes.  The look-back is localized to the lexer (which is not "clean", but
at least it's localized (separated from the parser)).

> My first approach would have been (no, I'm not implementing an Ada
> parser, but since compiler construction has been a favorite subject of
> mine for a number of years, I'm a bit curious about the position of Ada
> in all this) -- now: my first approach would have been to write a
> lexer with a minimal amount of state. It would shift into a
> collect-string state when encountering a '"' (I mean a double quote
> :-) and, especially, into a maybe-now-comes-a-character-literal state
> at certain points. My first take was that the "certain points" are
> always after identifiers. In view of the case quoted above
> (F(123)'Length), I could amend this rule by adding ')' to the certain
> points.

Right.  But you have to study the grammar to know which tokens have this
property.  It's not that big of a deal.

> But now things become rather ad hoc. Well -- as I said, it's just
> curiosity driving me, so I'm not going to examine the RM now, nor am I
> going to reverse engineer GNAT to find out how it is done in reality.
> 
> But if anyone in c.l.a. has the answer to the following questions, I'd
> be eternally grateful. Well, grateful, anyway. :-)
> 
>   - Is it possible (for Ada parsers) to separate lexical analysis and
>     grammatical analysis into separate phases without tricky feedback
>     from parser to lexer, possibly by using a lexer with a finite
>     number of states?

Yes.  Just a tiny bit of state -- the previous token.  The lexer writer
needs to understand the grammar, but the lexer does not need to
understand the parser.

>   - What is the complete rule for deciding when the next token might
>     be a character literal? Or is that undecidable by just looking at
>     past input (i.e. using lexer state)?

It is decidable by looking at the previous token.  I forget the exact
rule, but it can be deduced easily from the grammar.

> BTW: The "evil" case 
> 
>     if'('="-"("="('='=',',','=','))
> 
> is not parsed correctly by the syntax highlighting in Emacs ada-mode (I
> wouldn't have expected it to be, actually). The rule there seems to be my
> incomplete rule without the reserved-words exception. Everything falls
> magically into place if a " " is inserted immediately after "if".

I'm not surprised.  Emacs ada-mode uses some ad-hoc technique that
doesn't always work properly.  Anyway, Emacs is trying to parse bits and
pieces of things without seeing the whole file, and that's a whole
'nother thing.  It is certainly easy to parse the above "evil" thing
properly, but not necessarily if you start in the middle of it.

> >     Y: access T'Class := ...;
> >     Z: access T2'Class := Y.all'Access;
> > 
> > For reserved words, I think you have to study the grammar, and determine
> > which ones can precede a tick mark.
> 
> OK. That I understand now. 
> 
> Regards -- Markus

- Bob


* Re: lexical ambiguity
From: Jeffrey R. Carter @ 2006-06-09  4:41 UTC


M E Leypold wrote:
> 
> But now things become rather ad hoc. Well -- as I said, it's just
> curiosity driving me, so I'm not going to examine the RM now, nor am I
> going to reverse engineer GNAT to find out how it is done in reality.

You don't need to reverse engineer it.  The sources are freely available.

-- 
Jeff Carter
"Run away! Run away!"
Monty Python and the Holy Grail
58




* Re: lexical ambiguity
From: Georg Bauhaus @ 2006-06-09  8:23 UTC


M E Leypold wrote:

>> M E Leypold <development-2006-8ecbb5cc8a-REMOVETHIS@m-e-leypold.de> writes:
>>
>>> So now (question to all): Is the following rule enough?
>>>
>>>    - "'" is the beginning of a character literal if the token before
>>>      "'" has not been an identifier (reserved words not counted as
>>>      identifier in this case).

> OK. First a message to Dmitry A. Kazakov and Georg Bauhaus: Sorry, I
> understood neither all of what you said nor the exact
> implications. But thanks!

Just a sloppy remark that in ''' the second single quote
isn't the beginning of a character literal even though the
token before it has not been an identifier. Just another case I
could think of.



Georg 


