From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=unavailable autolearn_force=no version=3.4.4 Path: eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!feeder.eternal-september.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail From: Niklas Holsti Newsgroups: comp.lang.ada Subject: Re: Interesting article on ARG work Date: Tue, 10 Apr 2018 00:17:39 +0300 Organization: Tidorum Ltd Message-ID: References: <1b44444f-c1b3-414e-84fb-8798961487c3@googlegroups.com> <62ee0aac-49da-4925-b9aa-a16695b3fc45@googlegroups.com> <9879872e-c18a-4667-afe5-41ce0f54559f@googlegroups.com> <80db2d05-744f-4201-ba1b-4436f8040491@googlegroups.com> <59f9ab6d-d6ba-45ff-a6f0-c5699983d9e8@googlegroups.com> <1a390e22-f49f-4028-8e58-ca4d0f51e4b6@googlegroups.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Trace: individual.net SkwMrQmV2zk5Sy2mbuwdpQzL2GJ2WaCFa6/UNbv993tZi/oIxT Cancel-Lock: sha1:cPpd/LVVbkBY788om9ki8hleIkQ= User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 In-Reply-To: Xref: reader02.eternal-september.org comp.lang.ada:51425 Date: 2018-04-10T00:17:39+03:00 List-Id: On 18-04-09 21:08 , Dan'l Miller wrote: > On Monday, April 9, 2018 at 11:12:56 AM UTC-5, Niklas Holsti wrote: >> On 18-04-08 22:48 , Dan'l Miller wrote: ... but the GLR parser >> would apply this nondeterministic choice separately at each >> occurrence of a "@", so there would be an exponentially large set >> of alternative parse attempts. > > Good Lord! What an especially inappropriate design to intentionally > cause combinatorial explosion. How kind you are. That is AIUI what plain GLR would result in, if one does not extend GLR with something more. > The following URL's design would be wiser and practical > (e.g., where @ is isomorphic to #if DEBUG in the C preprocessor and > thus exactly one CONFIG…X in this paper). The reference Dan'l introduced is: [1] https://cs.nyu.edu/rgrimm/papers/pldi12.pdf Quote from that paper, describing the SuperC parsing method, called Fork-Merge LR: "our work is inspired by GLR parsing ... which also forks and merges subparsers. But whereas GLR parsers match different productions to the same input fragment, FMLR matches the same production to different input fragments. Furthermore, unlike GLR and TypeChef, FMLR parsers can reuse existing LR grammars and parser table generators; only the parser engine is new." It seems to be rather different from ordinary GLR. > The reduced-production stack will temporarily bifurcate into exactly > 2 parallel reduced-production stacks (and have exactly 2 alternative > AST subtrees extant concurrently for the affected lines of source code). If there are 2 alternative AST subtrees at each choice point, the total number of possible complete ASTs (which is what the later stages of an ordinary compiler would have to process) is exponential in the number of choice points (that is, "@" instances) -- assuming that the alternatives can be chosen independently, which is what plain GLR would give, as I understand it. > Then those 2 bifurcations will be > merged again into a traditional LR reduced-production stack upon lack > of an @ prefix after a contiguous sequence of @-code lines. I didn't claim that the parser would take exponential resources (although it may happen in [1]) just that the number of possible parses (ASTs) is exponential. If the parser produces an "AST-with-alternatives" structure with alternative subtrees at N points, this "AST-with-alternatives" structure itself is not exponential in size, but it represents 2**N ordinary ASTs. Assuming, again, that we are allowed to choose any combination of the N alternative subtrees for producing the ordinary ASTs from the "AST-with-alternatives". > Niklas, you seem to be > thinking that @ would appear in the GLRed BNF. No, it would not, at > least not in an exemplary implementation. In GLR, the origin of the > ambiguity can be from a variety of origins: codified in the grammar > itself, I suppose that means nondeterministic (ambiguous) productions, which is what I was talking of. With "@", or the equivalent, visible in the grammar, to make it ambiguous where a choice between alternatives is wanted. > dynamically injected by the lexical-analysis layer, The parser sees what the lexer produces. If the lexer produces an "@", or the equivalent "#ifdef" or whatever, then that must be in the grammar in order to be parsed. If the lexer somehow compels the parser to fork or join at certain points in the token stream, this can just as well be described formally as inserting a special token or token sequence in the stream. In [1] it seems that the lexer and preprocessor stages produce C source which still contains #ifdef-#else-#endif structures, forcing the parser to fork at the #ifdef/#else, and join/merge at the #endif. > a programmer's unwisely-excessively-deeply nested > conditional-compilation via a preprocessor, and so forth. Preprocessing is not parsing. Ref. [1] indeed describes extensive automatic restructuring of the source code, mostly at the lexer and preprocessor levels, before the actual parsing. One of the automatic transformations applied in [1] hoists #ifdef conditionals to a level where their branches are complete C syntactical "constructs". For example, in this original source code the #ifdef conditional part is not a complete C statement: #ifdef A if (cond) x = y; else #endif x = z; In [1], this code is transformed (by the parser, it is said) into an equivalent form where the ifdef-controlled alternatives are C statements: #ifdef A if (cond) x = y; else x = z; #else x = z; #endif It seems to me that this is similar to what Dmitry is suggesting, and that Dmitry would require programmers to write the second form rather than the first, although it is longer and has some duplicated code. For readers unused to #ifdef's (such as I) the second form is clearer. The essential point in which [1] departs from basic GLR is that [1] produces an AST-with-alternatives structure where each alternative is labelled with the logical condition (#ifdef condition) for choosing this alternative. If we assume, as in [1], that the "configuration variables" in the #ifdef conditions are constant (that is, have the same value at all choice points in the AST-with-alternatives), then the alternative subtrees chosen at the various choice points are correlated, and we no longer have an exponential number of ASTs (unless there are exponentially many combinations of control-variable values, which may happen). This is exactly the kind of "uniform choice" principle that I spoke of, in my preceding post, as a necessary extension of GLR, for this job. Thus I think [1] reinforces the point I made there. Note that [1] only claims to *parse* C in an "all configurations at once" manner. It explicitly excludes *compilation* from its scope. The parser in [1] does do a little semantic analysis (classifying identifiers as "typdef" or "other"), but it certainly cannot guarantee that a successfully parsed C program will also compile successfully in some configuration, let alone in all configurations. For an Ada "conditional compilation" facility, one would like to have the latter guarantee, or at least a guarantee of legality in all configurations. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .