From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
	autolearn=unavailable autolearn_force=no version=3.4.4
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!feeder.eternal-september.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: Niklas Holsti <niklas.holsti@tidorum.invalid>
Newsgroups: comp.lang.ada
Subject: Re: Interesting article on ARG work
Date: Tue, 10 Apr 2018 00:17:39 +0300
Organization: Tidorum Ltd
Message-ID: <fj23jkFfushU1@mid.individual.net>
References: <p9s890$mmt$1@franka.jacob-sparre.dk>
 <1b44444f-c1b3-414e-84fb-8798961487c3@googlegroups.com>
 <62ee0aac-49da-4925-b9aa-a16695b3fc45@googlegroups.com>
 <pa0vga$rbe$1@franka.jacob-sparre.dk>
 <9879872e-c18a-4667-afe5-41ce0f54559f@googlegroups.com>
 <pa3jmj$9n3$1@franka.jacob-sparre.dk> <pa4ip3$1ton$1@gioia.aioe.org>
 <pa67ci$t34$1@franka.jacob-sparre.dk> <pa77nc$1pu1$1@gioia.aioe.org>
 <pa9a6q$ii$1@franka.jacob-sparre.dk> <paa5gk$knn$1@gioia.aioe.org>
 <80db2d05-744f-4201-ba1b-4436f8040491@googlegroups.com>
 <paaptu$1pce$1@gioia.aioe.org>
 <59f9ab6d-d6ba-45ff-a6f0-c5699983d9e8@googlegroups.com>
 <pachc4$mkr$1@gioia.aioe.org>
 <1a390e22-f49f-4028-8e58-ca4d0f51e4b6@googlegroups.com>
 <fj1ho6Fbv06U1@mid.individual.net>
 <b1f76a33-a41b-46bf-a334-f8c8fbbbbb24@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: individual.net SkwMrQmV2zk5Sy2mbuwdpQzL2GJ2WaCFa6/UNbv993tZi/oIxT
Cancel-Lock: sha1:cPpd/LVVbkBY788om9ki8hleIkQ=
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:45.0)
 Gecko/20100101 Thunderbird/45.8.0
In-Reply-To: <b1f76a33-a41b-46bf-a334-f8c8fbbbbb24@googlegroups.com>
Xref: reader02.eternal-september.org comp.lang.ada:51425
Date: 2018-04-10T00:17:39+03:00
List-Id: <comp.lang.ada>

On 18-04-09 21:08 , Dan'l Miller wrote:
> On Monday, April 9, 2018 at 11:12:56 AM UTC-5, Niklas Holsti wrote:
>> On 18-04-08 22:48 , Dan'l Miller wrote: ... but the GLR parser
>> would apply this nondeterministic choice separately at each
>> occurrence of a "@", so there would be an exponentially large set
>> of alternative parse attempts.
>
> Good Lord!  What an especially inappropriate design to intentionally
> cause combinatorial explosion.

How kind you are.

That is AIUI what plain GLR would result in, if one does not extend GLR 
with something more.

> The following URL's design would be wiser and practical
> (e.g., where @ is isomorphic to #if DEBUG in the C preprocessor and
> thus exactly one CONFIG…X in this paper).

The reference Dan'l introduced is:

   [1] https://cs.nyu.edu/rgrimm/papers/pldi12.pdf

Quote from that paper, describing the SuperC parsing method, called 
Fork-Merge LR: "our work is inspired by GLR parsing ... which also forks 
and merges subparsers. But whereas GLR parsers match different 
productions to the same input fragment, FMLR matches the same 
production to different input fragments. Furthermore, unlike GLR and 
TypeChef, FMLR parsers can reuse existing LR grammars and parser table 
generators; only the parser engine is new."

It seems to be rather different from ordinary GLR.

> The reduced-production stack will temporarily bifurcate into exactly
> 2 parallel reduced-production stacks (and have exactly 2 alternative
> AST subtrees extant concurrently for the affected lines of source code).

If there are 2 alternative AST subtrees at each choice point, the total 
number of possible complete ASTs (which is what the later stages of an 
ordinary compiler would have to process) is exponential in the number of 
choice points (that is, "@" instances) -- assuming that the alternatives 
can be chosen independently, which is what plain GLR would give, as I 
understand it.

> Then those 2 bifurcations will be
> merged again into a traditional LR reduced-production stack upon lack
> of an @ prefix after a contiguous sequence of @-code lines.

I didn't claim that the parser would take exponential resources 
(although it may happen in [1]) just that the number of possible parses 
(ASTs) is exponential.

If the parser produces an "AST-with-alternatives" structure with 
alternative subtrees at N points, this "AST-with-alternatives" structure 
itself is not exponential in size, but it represents 2**N ordinary ASTs. 
Assuming, again, that we are allowed to choose any combination of the N 
alternative subtrees for producing the ordinary ASTs from the 
"AST-with-alternatives".

> Niklas, you seem to be
> thinking that @ would appear in the GLRed BNF.  No, it would not, at
> least not in an exemplary implementation.  In GLR, the origin of the
> ambiguity can be from a variety of origins:  codified in the grammar
> itself,

I suppose that means nondeterministic (ambiguous) productions, which is 
what I was talking of. With "@", or the equivalent, visible in the 
grammar, to make it ambiguous where a choice between alternatives is wanted.

> dynamically injected by the lexical-analysis layer,

The parser sees what the lexer produces. If the lexer produces an "@", 
or the equivalent "#ifdef" or whatever, then that must be in the grammar 
in order to be parsed.

If the lexer somehow compels the parser to fork or join at certain 
points in the token stream, this can just as well be described formally 
as inserting a special token or token sequence in the stream. In [1] it 
seems that the lexer and preprocessor stages produce C source which 
still contains #ifdef-#else-#endif structures, forcing the parser to 
fork at the #ifdef/#else, and join/merge at the #endif.

> a programmer's unwisely-excessively-deeply nested
> conditional-compilation via a preprocessor, and so forth.

Preprocessing is not parsing.

Ref. [1] indeed describes extensive automatic restructuring of the 
source code, mostly at the lexer and preprocessor levels, before the 
actual parsing.

One of the automatic transformations applied in [1] hoists #ifdef 
conditionals to a level where their branches are complete C syntactical 
"constructs". For example, in this original source code the #ifdef 
conditional part is not a complete C statement:

    #ifdef A
       if (cond)
          x = y;
       else
    #endif
          x = z;

In [1], this code is transformed (by the parser, it is said) into an 
equivalent form where the ifdef-controlled alternatives are C statements:

    #ifdef A
       if (cond)
          x = y;
       else
          x = z;
    #else
       x = z;
    #endif

It seems to me that this is similar to what Dmitry is suggesting, and 
that Dmitry would require programmers to write the second form rather 
than the first, although it is longer and has some duplicated code. For 
readers unused to #ifdef's (such as I) the second form is clearer.

The essential point in which [1] departs from basic GLR is that [1] 
produces an AST-with-alternatives structure where each alternative is 
labelled with the logical condition (#ifdef condition) for choosing this 
alternative.

If we assume, as in [1], that the "configuration variables" in the 
#ifdef conditions are constant (that is, have the same value at all 
choice points in the AST-with-alternatives), then the alternative 
subtrees chosen at the various choice points are correlated, and we no 
longer have an exponential number of ASTs (unless there are 
exponentially many combinations of control-variable values, which may 
happen).

This is exactly the kind of "uniform choice" principle that I spoke of, 
in my preceding post, as a necessary extension of GLR, for this job. 
Thus I think [1] reinforces the point I made there.

Note that [1] only claims to *parse* C in an "all configurations at 
once" manner. It explicitly excludes *compilation* from its scope. The 
parser in [1] does do a little semantic analysis (classifying 
identifiers as "typdef" or "other"), but it certainly cannot guarantee 
that a successfully parsed C program will also compile successfully in 
some configuration, let alone in all configurations. For an Ada 
"conditional compilation" facility, one would like to have the latter 
guarantee, or at least a guarantee of legality in all configurations.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .