From: Stephen Leake
Newsgroups: comp.lang.ada
Subject: Re: OpenToken: Parsing Ada (subset)?
Date: Wed, 17 Jun 2015 12:58:03 -0500

"Randy Brukardt" writes:

> "Stephen Leake" wrote in message
> news:85k2v3aeyv.fsf@stephe-leake.org...
>> One way to handle this is to provide for feedback from the parser to
>> the lexer; if a parse fails, push back the character literal, tell the
>> lexer to treat the first single quote as a TICK, and proceed. I'll
>> work on implementing that in FastToken with the Aflex lexer; it will
>> be a good example.
>>
>> Another way is to treat this particular sequence of tokens as a valid
>> expression, but rewrite it before handing off to the rest of the
>> parser. That requires identifying all such special cases; not too
>> hard.
>>
>> A third choice is to not define a CHARACTER_LITERAL token; then the
>> sequence of tokens is always
>>
>> IDENTIFIER TICK LEFT_PAREN TICK IDENTIFIER TICK RIGHT_PAREN
>>
>> and the parser must identify the character literal, or the grammar
>> must be rewritten in the same manner. That may be the simplest
>> solution.
>>
>> If I recall correctly, this issue has been discussed here before, and
>> the proposed solutions were similar. I don't know how GNAT handles
>> this.
>
> I don't think you identified the solution that is typically used:
> remember the previous token identified by the lexer. Then, when
> encountering an apostrophe, the token is unconditionally an apostrophe
> if the preceding token is "all", an identifier, a character or string
> literal, or an rparen; else it might be a character literal.

That's the third choice above; the lexer returns TICK (= apostrophe) for
all cases, and the parser deals with further classification.

Hmm. Unless you are saying that logic is in the lexer; I don't see that
it matters much. Aflex does have a provision for adding some logic in a
lexer, although I'm not sure it supports "remember the previous token".

> No "feedback from the parser" needed (that seems like a nightmare to
> me). The method was originally proposed by Tischler in Ada Letters in
> July 1983, pg 36. (I got this out of the comments of Janus/Ada, of
> course.)
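For concreteness, here is roughly what I take "remember the previous
token" to mean. This is an untested sketch; all the names in it are
mine, not from Aflex, FastToken, or Janus/Ada.

procedure Tick_Demo is

   type Token_ID is
     (Identifier, String_Lit, Char_Lit, Tick, Left_Paren, Right_Paren,
      All_Keyword, Other);

   --  True if, after Prev, an apostrophe can only be the attribute /
   --  qualified-expression tick, never the start of a character
   --  literal (the rule quoted above).
   function Apostrophe_Is_Tick (Prev : Token_ID) return Boolean is
     (Prev in
        All_Keyword | Identifier | Char_Lit | String_Lit | Right_Paren);

   --  Classify the apostrophe at Input (First), given the token lexed
   --  just before it.
   function Classify
     (Input : String;
      First : Positive;
      Prev  : Token_ID)
     return Token_ID
   is
   begin
      if Apostrophe_Is_Tick (Prev) then
         return Tick;

      elsif First + 2 <= Input'Last and then Input (First + 2) = ''' then
         --  'x' : the lexer consumes all three characters.
         return Char_Lit;

      else
         return Tick;
      end if;
   end Classify;

begin
   --  Character'('a') : the first apostrophe follows an identifier, so
   --  it is a tick; the second follows a left paren, and the lookahead
   --  finds the closing apostrophe two characters on.
   pragma Assert (Classify ("Character'('a')", 10, Identifier) = Tick);
   pragma Assert (Classify ("Character'('a')", 12, Left_Paren) = Char_Lit);
end Tick_Demo;

If Aflex can be persuaded to carry that one Token_ID of state between
calls, this seems simple enough; if not, that is one more argument for
leaving the classification to the parser.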
> I tend to agree with Dmitry; for lexing Ada, regular expressions are
> just not going to work; you'll need too many fixups to make them worth
> the trouble. Just write the thing in Ada; it won't take you any longer
> than figuring out the correct regular expression for an identifier.
> And that makes it easy to handle the weird special cases of Ada.
>
> Other approaches are going to lex some programs incorrectly; how
> important that is will vary depending on what kind of tool you are
> writing, but since the effort is similar, it's hard to see the
> advantage of a regular expression or other "automatic" lexer. (It
> makes much more sense for a parser, where the effort can be orders of
> magnitude different.)

Ok. I guess I'd like to see some actual examples of hand-written
lexers. The one in OpenToken is not inspiring to me; that's why I got
rid of it for FastToken (it's definitely easier for me to write regexps
than to write another OpenToken recognizer (= lexer module)).

I have looked briefly at the GNAT lexer. It is highly optimized, and is
apparently generated from some SNOBOL sources (i.e., _not_ "hand
written"). For example, it uses nested if-then-else on each character of
each keyword; not something you want to do by hand.

--
-- Stephe
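To make "nested if-then-else on each character" concrete, this is the
shape I have in mind; a toy fragment of my own for a few of the keywords
starting with 'a', not GNAT's actual code, which I have only skimmed.

procedure Keyword_Demo is

   type Keyword_ID is (Kw_Abort, Kw_Abs, Kw_All, Kw_And, Not_A_Keyword);

   --  Assumes Word is already lower case, Word'First = 1, and the
   --  caller has seen that Word (1) = 'a'.
   function Match_A_Keyword (Word : String) return Keyword_ID is
   begin
      if Word'Length < 2 then
         return Not_A_Keyword;
      elsif Word (2) = 'b' then
         if Word'Length = 3 and then Word (3) = 's' then
            return Kw_Abs;
         elsif Word'Length = 5 and then Word (3 .. 5) = "ort" then
            return Kw_Abort;
         end if;
      elsif Word (2) = 'l' then
         if Word'Length = 3 and then Word (3) = 'l' then
            return Kw_All;
         end if;
      elsif Word (2) = 'n' then
         if Word'Length = 3 and then Word (3) = 'd' then
            return Kw_And;
         end if;
      end if;
      return Not_A_Keyword;
   end Match_A_Keyword;

begin
   pragma Assert (Match_A_Keyword ("abort") = Kw_Abort);
   pragma Assert (Match_A_Keyword ("also") = Not_A_Keyword);
end Keyword_Demo;

Multiply that by the full set of reserved words and it is easy to
believe it was machine-generated; scanning the whole identifier and then
doing a table lookup is what I would actually want to write by hand.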