From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Thread: 103376,1f96acbbf1e7e66a
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII-7-bit
Newsgroups: comp.lang.ada
Subject: Re: lexical ambiguity
References: <1nozvv83n7lhc.1b3qf0olmyllp$.dlg@40tude.net>
 <n-6dnQKIUdPzIB3Z4p2dnA@rcn.net> <w_8gg.760526$084.110855@attbi_s22>
 <z6ydnZgK5-jrhB7ZnZ2dneKdnZydnZ2d@rcn.net>
 <nULgg.1005626$xm3.320354@attbi_s21>
 <9M_gg.1598$O5.554@llslave.llan.ll.mit.edu> <lnirnfnyao.fsf@nuthaus.mib.org>
 <t_1hg.764258$084.649755@attbi_s22> <1149590366.8521.5.camel@localhost>
 <tz1wu2tsyp.fsf@hod.lan.m-e-leypold.de>
 <wccd5dl6mx0.fsf@shell01.TheWorld.com>
From: M E Leypold <development-2006-8ecbb5cc8a-REMOVETHIS@m-e-leypold.de>
Date: 07 Jun 2006 19:18:57 +0200
Message-ID: <euirncvq7y.fsf@hod.lan.m-e-leypold.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Some cool user agent (SCUG)
NNTP-Posting-Host: 88.72.239.162
X-Trace: news.arcor-ip.de 1149700413 88.72.239.162 (7 Jun 2006 19:13:33 +0200)
X-Complaints-To: abuse@arcor-ip.de
Path: 
 g2news2.google.com!news2.google.com!news.germany.com!news.unit0.net!newsfeed.arcor-ip.de!news.arcor-ip.de!not-for-mail
Xref: g2news2.google.com comp.lang.ada:4713
Date: 2006-06-07T19:18:57+02:00
List-Id: <comp.lang.ada>


Robert A Duff <bobduff@shell01.TheWorld.com> writes:

> M E Leypold <development-2006-8ecbb5cc8a-REMOVETHIS@m-e-leypold.de> writes:
> 
> > So now (question to all): Is the following rule enough?
> > 
> >    - "'" is the beginning of a character literal if the token before
> >      "'" has not been an identifier (reserved words not counted as
> >      identifier in this case).
> 
> Not quite:
> 
>     function F(X: Integer) return String;
> 
>     Length: constant Natural := F(123)'Length;

Ouch. 

OK. First a message to Dmitry A. Kazakov and Georg Bauhaus: Sorry, I
understood neither everything you said nor its exact
implications. But thanks!

Then: The original poster asked a question about 'lexical
ambiguity'. The ensuing discussion leaves me more and more doubtful:
Can lexical analysis (grouping characters into tokens) and grammatical
analysis (building a parse tree from a token sequence) be separated
cleanly in Ada?

My first approach (no, I'm not implementing an Ada parser, but since
compiler construction has been a favorite subject of mine for a number
of years, I'm a bit curious about the position of Ada in all this)
would have been to write a lexer with a minimal amount of state. It
would shift into collect-string state when encountering a '"' (I mean
a double quote :-) and into a tick-is-not-a-character-literal state
at certain points. My first take was that the "certain points" are
always after identifiers. In view of the case quoted above
(F(123)'Length) I could amend this rule by adding ')' to the certain
points.
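
A minimal sketch of such a lexer, in Python rather than Ada to keep it
short. The only state is the kind of the previous token. The function
name lex and the token kinds are made up for illustration; reserved
words are not distinguished from identifiers, and based literals,
comments and "" escapes in strings are ignored.

```python
def lex(text):
    """Split text into (kind, lexeme) pairs using one token of history."""
    tokens = []
    prev = None  # kind of the last token emitted: the lexer's whole state
    i = 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
            continue
        if c.isalpha():
            j = i
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            kind = "ident"
        elif c.isdigit():
            j = i
            while j < len(text) and (text[j].isdigit() or text[j] == "_"):
                j += 1
            kind = "number"
        elif c == '"':
            # Assumption: the string is terminated and has no "" escapes.
            j = text.index('"', i + 1) + 1
            kind = "string"
        elif c == "'" and prev not in ("ident", ")") and text[i + 2:i + 3] == "'":
            # Not after an identifier or ')': treat 'x' as a character literal.
            j = i + 3
            kind = "char"
        elif text[i:i + 2] == ":=":
            j = i + 2
            kind = ":="
        else:
            # Any other single character, including a lone tick.
            j = i + 1
            kind = c
        tokens.append((kind, text[i:j]))
        prev = kind
        i = j
    return tokens


for sample in ["F(123)'Length", "C := 'a';", "T'('a')"]:
    print(sample, "->", [lexeme for _, lexeme in lex(sample)])
```

With ')' in the set, the tick in F(123)'Length comes out as a lone
tick, and in T'('a') the first tick is a lone tick while 'a' is still
collected as a character literal.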

But now things become rather ad hoc. Well -- as I said, it's just
curiosity driving me, so I'm neither going to examine the RM now nor
going to reverse engineer GNAT to find out how it is done in reality.

But if anyone in c.l.a. has the answer to the following questions, I'd
be eternally grateful. Well, grateful, anyway. :-)

  - Is it possible (for Ada parsers) to separate lexical analysis and
    grammatical analysis into separate phases without tricky feedback
    from parser to lexer, possibly by using a lexer with a finite
    number of states?

  - What is the complete rule for deciding when the next token might
    be a character literal? Or is that undecidable by just looking at
    past input (i.e. using lexer state)?


BTW: The "evil" case 

    if'('="-"("="('='=',',','=','))

is not parsed correctly by the syntax highlighting in emacs ada-mode
(I wouldn't have expected it to be, actually). The rule there seems to
be my incomplete rule without the reserved words exception. Everything
falls magically into place if a " " is inserted immediately after "if".
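
To see what the reserved words exception changes on the "evil" case,
here is a small Python sketch of the identifier-only rule with the
exception switched on and off. It makes no attempt to model what
ada-mode really does. The RESERVED set is a deliberately incomplete
stand-in, and the names split and reserved_exception are made up for
this example.

```python
import re

# Hypothetical, incomplete stand-in for Ada's reserved word list.
RESERVED = {"if", "then", "else", "and", "or", "is", "return"}

# A word (identifier or reserved word) or a string literal without "" escapes.
WORD_OR_STRING = re.compile(r'[A-Za-z]\w*|"[^"]*"')


def split(text, reserved_exception):
    """Return the lexemes of text under the identifier-only tick rule."""
    out = []
    prev_is_identifier = False
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        m = WORD_OR_STRING.match(text, i)
        if m:
            lexeme = m.group()
            is_word = lexeme[0] != '"'
            is_reserved = reserved_exception and lexeme.lower() in RESERVED
            is_identifier = is_word and not is_reserved
        elif text[i] == "'" and not prev_is_identifier and text[i + 2:i + 3] == "'":
            # Tick not preceded by an identifier: take 'x' as a character literal.
            lexeme, is_identifier = text[i:i + 3], False
        else:
            lexeme, is_identifier = text[i], False
        out.append(lexeme)
        prev_is_identifier = is_identifier
        i += len(lexeme)
    return out


evil = """if'('="-"("="('='=',',','=','))"""
print(" ".join(split(evil, reserved_exception=True)))
print(" ".join(split(evil, reserved_exception=False)))
```

With the exception, "if" does not count as an identifier and '(' is
collected as a character literal. Without it, the tick after "if" is
taken as a lone tick and the two token streams only agree again from
"-" onwards.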

> 
>     Y: access T'Class := ...;
>     Z: access T2'Class := Y.all'Access;
> 
> For reserved words, I think you have to study the grammar, and determine
> which ones can precede a tick mark.

OK. That I understand now. 

Regards -- Markus