From: Niklas Holsti <niklas.holsti@tidorum.invalid>
Subject: Re: Interesting article on ARG work
Date: Tue, 10 Apr 2018 00:17:39 +0300
Date: 2018-04-10T00:17:39+03:00 [thread overview]
Message-ID: <fj23jkFfushU1@mid.individual.net> (raw)
In-Reply-To: <b1f76a33-a41b-46bf-a334-f8c8fbbbbb24@googlegroups.com>
On 18-04-09 21:08 , Dan'l Miller wrote:
> On Monday, April 9, 2018 at 11:12:56 AM UTC-5, Niklas Holsti wrote:
>> On 18-04-08 22:48 , Dan'l Miller wrote: ... but the GLR parser
>> would apply this nondeterministic choice separately at each
>> occurrence of a "@", so there would be an exponentially large set
>> of alternative parse attempts.
>
> Good Lord! What an especially inappropriate design to intentionally
> cause combinatorial explosion.
How kind you are.
That is AIUI what plain GLR would result in, if one does not extend GLR
with something more.
> The following URL's design would be wiser and practical
> (e.g., where @ is isomorphic to #if DEBUG in the C preprocessor and
> thus exactly one CONFIG…X in this paper).
The reference Dan'l introduced is:
[1] https://cs.nyu.edu/rgrimm/papers/pldi12.pdf
Quote from that paper, describing the SuperC parsing method, called
Fork-Merge LR: "our work is inspired by GLR parsing ... which also forks
and merges subparsers. But whereas GLR parsers match different
productions to the same input fragment, FMLR matches the same
production to different input fragments. Furthermore, unlike GLR and
TypeChef, FMLR parsers can reuse existing LR grammars and parser table
generators; only the parser engine is new."
It seems to be rather different from ordinary GLR.
> The reduced-production stack will temporarily bifurcate into exactly
> 2 parallel reduced-production stacks (and have exactly 2 alternative
> AST subtrees extant concurrently for the affected lines of source code).
If there are 2 alternative AST subtrees at each choice point, the total
number of possible complete ASTs (which is what the later stages of an
ordinary compiler would have to process) is exponential in the number of
choice points (that is, "@" instances) -- assuming that the alternatives
can be chosen independently, which is what plain GLR would give, as I
understand it.
> Then those 2 bifurcations will be
> merged again into a traditional LR reduced-production stack upon lack
> of an @ prefix after a contiguous sequence of @-code lines.
I didn't claim that the parser would take exponential resources
(although it may happen in [1]) just that the number of possible parses
(ASTs) is exponential.
If the parser produces an "AST-with-alternatives" structure with
alternative subtrees at N points, this "AST-with-alternatives" structure
itself is not exponential in size, but it represents 2**N ordinary ASTs.
Assuming, again, that we are allowed to choose any combination of the N
alternative subtrees for producing the ordinary ASTs from the
"AST-with-alternatives".
> Niklas, you seem to be
> thinking that @ would appear in the GLRed BNF. No, it would not, at
> least not in an exemplary implementation. In GLR, the origin of the
> ambiguity can be from a variety of origins: codified in the grammar
> itself,
I suppose that means nondeterministic (ambiguous) productions, which is
what I was talking of. With "@", or the equivalent, visible in the
grammar, to make it ambiguous where a choice between alternatives is wanted.
> dynamically injected by the lexical-analysis layer,
The parser sees what the lexer produces. If the lexer produces an "@",
or the equivalent "#ifdef" or whatever, then that must be in the grammar
in order to be parsed.
If the lexer somehow compels the parser to fork or join at certain
points in the token stream, this can just as well be described formally
as inserting a special token or token sequence in the stream. In [1] it
seems that the lexer and preprocessor stages produce C source which
still contains #ifdef-#else-#endif structures, forcing the parser to
fork at the #ifdef/#else, and join/merge at the #endif.
> a programmer's unwisely-excessively-deeply nested
> conditional-compilation via a preprocessor, and so forth.
Preprocessing is not parsing.
Ref. [1] indeed describes extensive automatic restructuring of the
source code, mostly at the lexer and preprocessor levels, before the
actual parsing.
One of the automatic transformations applied in [1] hoists #ifdef
conditionals to a level where their branches are complete C syntactical
"constructs". For example, in this original source code the #ifdef
conditional part is not a complete C statement:
#ifdef A
if (cond)
x = y;
else
#endif
x = z;
In [1], this code is transformed (by the parser, it is said) into an
equivalent form where the ifdef-controlled alternatives are C statements:
#ifdef A
if (cond)
x = y;
else
x = z;
#else
x = z;
#endif
It seems to me that this is similar to what Dmitry is suggesting, and
that Dmitry would require programmers to write the second form rather
than the first, although it is longer and has some duplicated code. For
readers unused to #ifdef's (such as I) the second form is clearer.
The essential point in which [1] departs from basic GLR is that [1]
produces an AST-with-alternatives structure where each alternative is
labelled with the logical condition (#ifdef condition) for choosing this
alternative.
If we assume, as in [1], that the "configuration variables" in the
#ifdef conditions are constant (that is, have the same value at all
choice points in the AST-with-alternatives), then the alternative
subtrees chosen at the various choice points are correlated, and we no
longer have an exponential number of ASTs (unless there are
exponentially many combinations of control-variable values, which may
happen).
This is exactly the kind of "uniform choice" principle that I spoke of,
in my preceding post, as a necessary extension of GLR, for this job.
Thus I think [1] reinforces the point I made there.
Note that [1] only claims to *parse* C in an "all configurations at
once" manner. It explicitly excludes *compilation* from its scope. The
parser in [1] does do a little semantic analysis (classifying
identifiers as "typdef" or "other"), but it certainly cannot guarantee
that a successfully parsed C program will also compile successfully in
some configuration, let alone in all configurations. For an Ada
"conditional compilation" facility, one would like to have the latter
guarantee, or at least a guarantee of legality in all configurations.
--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .
next prev parent reply other threads:[~2018-04-09 21:17 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-02 3:32 Interesting article on ARG work Randy Brukardt
2018-04-02 14:49 ` Dan'l Miller
2018-04-03 16:34 ` Bojan Bozovic
2018-04-03 22:33 ` Randy Brukardt
2018-04-04 2:12 ` Bojan Bozovic
2018-04-04 15:05 ` Dan'l Miller
2018-04-04 15:30 ` gerdien.de.kruyf
2018-04-04 16:09 ` Dan'l Miller
2018-04-04 22:30 ` Randy Brukardt
2018-04-04 22:43 ` Paul Rubin
2018-04-05 0:44 ` Mehdi Saada
2018-04-05 21:23 ` Randy Brukardt
2018-04-05 2:05 ` Bojan Bozovic
2018-04-05 22:12 ` Randy Brukardt
2018-04-06 13:35 ` Bojan Bozovic
2018-04-07 2:01 ` Randy Brukardt
2018-04-05 7:21 ` Dmitry A. Kazakov
2018-04-05 22:18 ` Randy Brukardt
2018-04-06 7:30 ` Dmitry A. Kazakov
2018-04-07 2:25 ` Randy Brukardt
2018-04-07 10:11 ` Dmitry A. Kazakov
2018-04-07 15:27 ` Dan'l Miller
2018-04-07 15:59 ` Dmitry A. Kazakov
2018-04-08 0:14 ` Dan'l Miller
2018-04-08 7:46 ` Dmitry A. Kazakov
2018-04-08 19:48 ` Dan'l Miller
2018-04-08 20:09 ` Dmitry A. Kazakov
2018-04-09 3:50 ` Dan'l Miller
2018-04-09 6:40 ` Jan de Kruyf
2018-04-09 7:43 ` Dmitry A. Kazakov
2018-04-09 13:40 ` Dan'l Miller
2018-04-09 14:13 ` Dmitry A. Kazakov
2018-04-09 14:36 ` Dan'l Miller
2018-04-09 14:44 ` Dmitry A. Kazakov
2018-04-09 15:03 ` Dan'l Miller
2018-04-09 16:12 ` Niklas Holsti
2018-04-09 16:30 ` Dmitry A. Kazakov
2018-04-09 16:45 ` Niklas Holsti
2018-04-09 17:33 ` Dan'l Miller
2018-04-09 19:47 ` Dmitry A. Kazakov
2018-04-09 20:24 ` Randy Brukardt
2018-04-10 8:17 ` Dmitry A. Kazakov
2018-04-09 18:08 ` Dan'l Miller
2018-04-09 21:17 ` Niklas Holsti [this message]
2018-04-09 22:09 ` Dan'l Miller
2018-04-10 19:23 ` Niklas Holsti
2018-04-10 19:46 ` Dan'l Miller
2018-04-15 7:50 ` Niklas Holsti
2018-04-15 13:31 ` Dan'l Miller
2018-04-15 18:37 ` Niklas Holsti
2018-04-09 20:14 ` Randy Brukardt
2018-04-06 23:49 ` Dan'l Miller
2018-04-12 10:21 ` Marius Amado-Alves
2018-04-15 13:07 ` Ada conditional compilation and program variants Niklas Holsti
2018-05-07 8:41 ` Jacob Sparre Andersen
2018-04-06 13:35 ` Interesting article on ARG work Marius Amado-Alves
2018-04-07 2:15 ` Randy Brukardt
replies disabled
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox