From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: OpenToken: Parsing Ada (subset)?
Date: Wed, 17 Jun 2015 21:03:59 +0200
Organization: cbb software GmbH
Message-ID: <1ucuzb8jv2ibe.4awaxtp8eab6.dlg@40tude.net>
Reply-To: mailbox@dmitry-kazakov.de

On Wed, 17 Jun 2015 12:29:48 -0500, Stephen Leake wrote:

> "Dmitry A. Kazakov" writes:
>
>> On Tue, 16 Jun 2015 07:43:49 -0500, Stephen Leake wrote:
>>
>>> "Dmitry A. Kazakov" writes:
>>>
>>> Here's the regular expression I use for Ada numeric literals:
>>>
>>> "\([0-9]+#\)?[-+0-9a-fA-F.]+\(#\)?"
>>>
>>> Given that you are at least a little familiar with regular expressions,
>>> there's nothing hard about that.
>>
>> It is hard.
>
> Ok, I gather you are not "at least a little familiar with regular
> expressions".

*Nobody* is familiar enough with regular expressions to be sure that the
language generated by a pattern like the one above is that of Ada numeric
literals. Note "like", because your pattern obviously does not generate
the Ada numeric literal.

Things are actually much worse than mere complexity. It is a combination
of complexity and weakness. Regular expressions cannot do stuff like Ada
literals. Thus the patterns actually used are only approximations of what
is required. The designer must know how the generated language differs
from the required one. And the reader must read not only the program but
also the mind of the pattern designer.

>>> It does not enforce all the lexical rules for numbers; it allows
>>> repeated, leading, and trailing underscores; it doesn't enforce pairs of
>>> '#'.
>>
>> That is exactly the point. It does not parse literal right
>
> It's _not_ a "parser"; it's a "lexer".
>
> Define "right".

def Right: No false positives, no false negatives <=> Rejects only illegal
literals, accepts only legal literals.

> The line between lexer and parser is a design decision,
> not set in stone.

True, but we are not talking about higher-level things like the maximum
fraction length supported. Simple lexical stuff like:

- matching '#'s
- non-repeating '_'s
- valid base number
- the set of digits corresponding to the base

etc. is all beyond the power of regular expressions. (Unlike SNOBOL
patterns.)

>> and you have to
>> reparse the matched chunk of text once again. What was the gain?
>
> Doing it this way allows reusing a regexp engine, which is easier than
> writing a lexer from scratch.

You still have to parse it again. With or without a regular expression,
you have to do it. The only difference is in detecting the end of the
lexeme, which is not a problem at all for a manually written scanner.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
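
As a rough illustration of the four checks listed above (matching '#'s, non-repeating '_'s, a valid base, digits that fit the base), here is a minimal sketch of a hand-written scanner in Python. It is not code from this thread: the names scan_numeric_literal, is_legal and APPROX are made up for the example, APPROX is the quoted Emacs-style pattern transcribed to Python syntax, and the grammar is one reading of the Ada numeric literal rules (the obsolescent ':' form is ignored), so treat it as an approximation rather than a reference implementation.

```python
import re

DIGITS = "0123456789ABCDEF"

# The quoted pattern, transcribed from Emacs to Python regex syntax (assumption).
APPROX = re.compile(r"([0-9]+#)?[-+0-9a-fA-F.]+(#)?")


def _numeral(text, pos, base):
    """Scan digit {[_] digit} in the given base; return end index or None."""
    expect_digit = True  # must start with a digit; a digit must follow each '_'
    while pos < len(text):
        ch = text[pos].upper()
        if ch in DIGITS[:base]:
            expect_digit = False
        elif ch == "_" and not expect_digit:
            expect_digit = True
        else:
            break
        pos += 1
    return None if expect_digit else pos


def _exponent(text, pos, allow_minus):
    """Scan an optional exponent E [+|-] numeral; return end index or None."""
    if pos < len(text) and text[pos] in "eE":
        pos += 1
        if pos < len(text) and text[pos] in "+-":
            if text[pos] == "-" and not allow_minus:
                return None  # integer literals may not have a negative exponent
            pos += 1
        return _numeral(text, pos, 10)
    return pos  # no exponent present


def scan_numeric_literal(text, pos=0):
    """Return the index just past a legal literal starting at pos, else None."""
    end = _numeral(text, pos, 10)
    if end is None:
        return None
    has_point = False
    if end < len(text) and text[end] == "#":
        base = int(text[pos:end].replace("_", ""))
        if not 2 <= base <= 16:          # valid base number
            return None
        end = _numeral(text, end + 1, base)  # digits must fit the base
        if end is None:
            return None
        if end < len(text) and text[end] == ".":
            has_point = True
            end = _numeral(text, end + 1, base)
            if end is None:
                return None
        if end >= len(text) or text[end] != "#":  # matching '#'
            return None
        end += 1
    elif end + 1 < len(text) and text[end] == "." and text[end + 1] in DIGITS[:10]:
        has_point = True                 # "1..10" is left alone: '.' needs a digit
        end = _numeral(text, end + 1, 10)
    return _exponent(text, end, has_point)


def is_legal(text):
    """True if the whole string is exactly one legal numeric literal."""
    return scan_numeric_literal(text) == len(text)
```

For example, APPROX.fullmatch accepts "16#FF" and "2#102#", while is_legal rejects both; scan_numeric_literal("1..10") stops after the "1", which is the end-of-lexeme detection mentioned above.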