comp.lang.ada
From: "Randy Brukardt" <randy@rrsoftware.com>
Subject: Re: [Slightly OT] How to process lightweight text markup languages?
Date: Tue, 20 Jan 2015 16:00:10 -0600
Message-ID: <m9mj5b$92m$1@loke.gir.dk>
In-Reply-To: 1wclq766iu82d.b0k1hx30rgrt.dlg@40tude.net

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
news:1wclq766iu82d.b0k1hx30rgrt.dlg@40tude.net...
> On Tue, 20 Jan 2015 18:47:13 +0000 (UTC), Natasha Kerensikova wrote:
>
>> On 2015-01-18, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote:
>>> On Sun, 18 Jan 2015 18:04:08 +0000 (UTC), Natasha Kerensikova wrote:
>>>
>>> [...]
>>>
>>>> My latest attempts involve keeping the online architecture with
>>>> separate input and output types and streams, and keeping a stack of
>>>> currently opened constructs, with a dynamically dispatching ending test
>>>> on each character for each construct on the stack. It feels horribly
>>>> inefficient and complicated.
>>>
>>> Nothing complicated, and most efficient in the sense that, depending on
>>> the formal language classification, you might not be able to eliminate
>>> the stack (memory) in some form or other. You could think of it in terms
>>> of possible states of the parser. If the number of states is infinite you
>>> must have a stack, or else the parser itself must be infinite. A simple
>>> example is parsing the language of balanced brackets: ()(()).
>>
>> Well, it feels insurmountably complicated to me, that's why I posted in
>> the first place -- to be enlightened.
>
> Nothing complicated to me, so far.
>
>> What I still can't make fit with what I know is how to deal
>> simultaneously with the precedence and the "implicit escaping", which is
>> further muddied by the fact that the interpretation of what is inside a
>> construct depends on the particular current construct.
>>
>> To put it in a grammar-like way (even though I doubt the considered
>> language has a grammar), I would have something like:
>>
>>    formatted-text ::= code-fragment | link | ... | literal-text
>>    code-fragment ::= '`' literal-text '`'
>>    link ::= '[' formatted-text ']' '(' url [ link-title ] ')'
>>    link-title ::= '"' literal-text '"'
>
> [] - brackets
> () - brackets
> `` - quotation marks
> "" - quotation marks
>
>> So if you remember my example,
>>
>>    [alpha`](http://example.com/)`](http://other.com)
>
> Since `` are quotation marks, the above should be:
>
>  +
>  |_ []
>  |   |_ +
>  |       |_ alpha
>  |       |_ ](http://example.com/)
>  |_ ()
>       |_ http://other.com
>
> + is an assumed infix catenation operation. No backtracking needed.
>
> [...]
>> Am I right so far? Am I missing something?
>
> Distinguishing lexical and syntactical elements? You don't bother with
> operators until expression terms (lexemes) are matched. Once you have
> matched them you never go back. They are all on the stack if not already
> bound by an operation. If `` is declared literal, it is a term of the
> expression, atomically matched. It naturally takes precedence over
> anything else.

I agree with Dmitry; "standard" parsing has two stages, lexing (converting 
the input into a token stream) and parsing. You're trying to make do with 
only one, which complicates the problem a lot for no reason.
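
For what it's worth, here is a minimal sketch of that two-stage split applied 
to your example. It is in Python rather than Ada purely for brevity, and every 
name in it (lex, parse, the token kinds TEXT and LIT) is made up for 
illustration; it is not taken from any existing markup library. It also 
assumes that an unmatched closing bracket is an error, whereas a real markup 
processor would more likely demote it to plain text.

```python
PAIRS = {"[": "]", "(": ")"}      # opening bracket -> closing bracket
CLOSERS = set(PAIRS.values())


def lex(source):
    """Stage 1: turn characters into tokens; a backtick span is one atomic token."""
    tokens = []
    pending = []  # plain characters not yet emitted as a TEXT token

    def flush():
        if pending:
            tokens.append(("TEXT", "".join(pending)))
            pending.clear()

    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "`":
            end = source.find("`", i + 1)
            if end != -1:
                # Everything up to the closing backtick is literal text,
                # brackets included, so the parser never sees them.
                flush()
                tokens.append(("LIT", source[i + 1:end]))
                i = end + 1
                continue
            # Unmatched backtick: fall through and treat it as plain text.
        elif ch in PAIRS or ch in CLOSERS:
            flush()
            tokens.append((ch, ch))
            i += 1
            continue
        pending.append(ch)
        i += 1
    flush()
    return tokens


def parse(tokens):
    """Stage 2: nest tokens using a stack of currently open brackets."""
    root = []
    stack = [(None, root)]  # (expected closer, children of the open node)
    for kind, value in tokens:
        if kind in PAIRS:
            children = []
            stack[-1][1].append((kind + PAIRS[kind], children))
            stack.append((PAIRS[kind], children))
        elif kind in CLOSERS:
            if stack[-1][0] != kind:
                raise ValueError("unexpected " + kind)
            stack.pop()
        else:
            stack[-1][1].append((kind, value))
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    return root


EXAMPLE = "[alpha`](http://example.com/)`](http://other.com)"
tree = parse(lex(EXAMPLE))
```

On the example, the lexer swallows ](http://example.com/) as a single LIT 
token, so the parser only ever sees the outer [ ] and ( ) and the resulting 
tree has the same shape as the one Dmitry drew. No backtracking is involved.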

Also note that an LR parser acts similarly to your "parsing all 
possibilities at once". The parser state encodes all of the possibilities at 
the current point in the parse, so it can generally handle quite a bit of 
complication. LR parsers are usually generated by a tool, so if the grammar 
has no unique solution, that is detected when the parser is generated 
rather than when it runs. As Dmitry says, such parsers (like the one we use 
in Janus/Ada) make error correction more challenging. The parser generator 
we use originated as a research project into automated error correction; we 
don't use that error correction, which tells you how well that worked :-). 
But LR parsers can be quite small and very fast (especially on larger 
grammars like Ada's).
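
To make "the parser state encodes all of the possibilities" a bit more 
concrete, here is a toy table-driven LR parser for Dmitry's balanced-bracket 
language. Again it is Python only for brevity. The grammar and the SLR(1) 
tables below were worked out by hand for this sketch (a generator would 
normally produce them), the names are hypothetical, and none of it reflects 
how the Janus/Ada parser is actually built.

```python
# Grammar assumed for this sketch:
#   production 0:  S -> ( S ) S
#   production 1:  S -> <empty>
PRODUCTIONS = [("S", 4), ("S", 0)]  # (left-hand side, length of right-hand side)

END = "<end>"  # end-of-input marker; longer than one character, so no clash

# ACTION[(state, lookahead)] -> what to do; a missing entry is a syntax error.
ACTION = {
    (0, "("): ("shift", 2), (0, ")"): ("reduce", 1), (0, END): ("reduce", 1),
    (1, END): ("accept",),
    (2, "("): ("shift", 2), (2, ")"): ("reduce", 1), (2, END): ("reduce", 1),
    (3, ")"): ("shift", 4),
    (4, "("): ("shift", 2), (4, ")"): ("reduce", 1), (4, END): ("reduce", 1),
    (5, ")"): ("reduce", 0), (5, END): ("reduce", 0),
}

# GOTO[(state, nonterminal)] -> state to enter after a reduction.
GOTO = {(0, "S"): 1, (2, "S"): 3, (4, "S"): 5}


def balanced(text):
    """Return True if text is a string of properly nested parentheses."""
    stack = [0]                      # stack of parser states
    symbols = list(text) + [END]
    pos = 0
    while True:
        action = ACTION.get((stack[-1], symbols[pos]))
        if action is None:
            return False             # no table entry: syntax error
        if action[0] == "shift":
            stack.append(action[1])
            pos += 1
        elif action[0] == "reduce":
            lhs, length = PRODUCTIONS[action[1]]
            if length:
                del stack[-length:]  # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return True              # accept
```

The whole "program" is the two tables; the driver loop is the same for any 
grammar, and the state stack is exactly the memory Dmitry says you cannot 
eliminate for this language.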

                                    Randy.



