comp.lang.ada
 help / color / mirror / Atom feed
From: Natasha Kerensikova <lithiumcat@instinctive.eu>
Subject: Re: [Slightly OT] How to process lightweight text markup languages?
Date: Tue, 20 Jan 2015 18:47:13 +0000 (UTC)
Date: 2015-01-20T18:47:13+00:00	[thread overview]
Message-ID: <slrnmbt8mf.19vl.lithiumcat@nat.rebma.instinctive.eu> (raw)
In-Reply-To: ynm6coktfevl.1esu61g1n9477.dlg@40tude.net

On 2015-01-18, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote:
> On Sun, 18 Jan 2015 18:04:08 +0000 (UTC), Natasha Kerensikova wrote:
>
> [...] 
>
>> My latest attempts involve keeping the online architecture with separate
>> input and output types and streams, and keeping a stack of currently
>> opened constructs, with a dynamically dispatching ending test on each
>> character for each construct on the stack. It feels horribly
>> inefficient and complicated.
>
> Nothing complicated and most efficient in the sense that depending on the
> formal language classification you could not be able eliminate the stack
> (memory) in some or other form. You could think of it in terms of possible
> states of the parser. If the number of states is infinite you must have
> stack or else the parser itself must be infinite. Simple example is parsing
> the language of balanced brackets: ()(()).

Well it feels insurmountably complicated to me, that's why I posted in
first place -- to be enlightened.

What I still can't make fit with what I know is how to deal
simultaneously with the precedence and the "implicit escaping", which is
further mudded by the interpretation of what is in the constructs
depends on the particular current construct.

To put it in a grammar-like way (even though I doubt the considered
language has a grammar), I would have something like:

   formatted-text ::= code-fragment | link | ... | literal-text
   code-fragment ::= '`' literal-text '`'
   link ::= '[' formatted-text ']' '(' url [ link-title ] ')'
   link-title ::= '"' literal-text '"'

So if you remember my example,

   [alpha`](http://example.com/)`](http://other.com)

the part "](http://example.com/)" matches the end of a link, but only in
a formatted-text context, not inside a code fragment.

There for, considering an online algorithm, when the part
"[alpha`](http://example.com/)" has been read, there is no way to decide
whether it's a link or not, until the closing backtick is encountered,
or there is proof there is no closing backtick by encountering the end
of the text (or of the enclosing even-higher-precedence structure).

This led me to the conclusion that the only solutions are backtracking
and simultaneously parsing all available possibilities (i.e. using a
nondeterministic automaton). Considering the current state of computing,
I should probably go for backtracking.

Basic backtracking would be pushing the current state when encountering
an opening marker, and if the end is reached with a non-empty stack, pop
the top-most state, replace the opening marker by the literal
sequence of its representation, and restart parsing.

If I'm not mistaken, adding precedences to the mix would just change the
meaning of "end is reached" in the previous paragraph: it would not only
mean the end of input, but also the ending marker of any
higher-precedence construct currently in the stack.

Am I right so far? Am I missing something?


Thanks for your help,
Natasha

  parent reply	other threads:[~2015-01-20 18:47 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-18 18:04 [Slightly OT] How to process lightweight text markup languages? Natasha Kerensikova
2015-01-18 20:21 ` Dmitry A. Kazakov
2015-01-19 11:09   ` G.B.
2015-01-19 13:21     ` Dmitry A. Kazakov
2015-01-19 16:58       ` G.B.
2015-01-19 17:58         ` Dmitry A. Kazakov
2015-01-20 14:41           ` Robert A Duff
2015-01-19 20:12         ` Randy Brukardt
2015-01-19 21:37           ` gautier_niouzes
2015-01-20  8:44             ` Dmitry A. Kazakov
2015-01-20 12:36               ` G.B.
2015-01-20 13:14                 ` Dmitry A. Kazakov
2015-01-20 20:36               ` Shark8
2015-01-20 21:16                 ` Dmitry A. Kazakov
2015-01-20 22:55                   ` J-P. Rosen
2015-01-21  8:35                     ` Dmitry A. Kazakov
2015-01-20 19:19             ` Natasha Kerensikova
2015-01-20 21:43             ` Randy Brukardt
2015-01-20 19:16           ` Natasha Kerensikova
2015-01-20 18:47   ` Natasha Kerensikova [this message]
2015-01-20 19:44     ` Dmitry A. Kazakov
2015-01-20 22:00       ` Randy Brukardt
2015-01-22 13:41         ` Natasha Kerensikova
2015-01-22 18:38           ` Dmitry A. Kazakov
2015-01-22 21:48             ` Randy Brukardt
2015-01-23 10:24     ` Stephen Leake
2015-01-21 14:54 ` Stephen Leake
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox