comp.lang.ada
From: "G.B." <bauhaus@futureapps.invalid>
Subject: Re: [Slightly OT] How to process lightweight text markup languages?
Date: Mon, 19 Jan 2015 17:58:38 +0100
Message-ID: <m9jd2u$j08$1@dont-email.me>
In-Reply-To: <c9058n5dlu56.608mrt8042o0$.dlg@40tude.net>

On 19.01.15 14:21, Dmitry A. Kazakov wrote:
> On Mon, 19 Jan 2015 12:09:40 +0100, G.B. wrote:
>
>> On 18.01.15 21:21, Dmitry A. Kazakov wrote:
>>> This is a pretty straightforward and simple technique.
>>
>> The trouble is with expectations:
>>
>> Input:
>>
>>    ((){)([()[[]])]
>>
>> Typical parsers will respond with such useless results
>> as "error at EOF". Not something that a (close to)
>> natural language processor can afford, I think.
>
> Not with the technique I described. In your example, the operator stack
> will contain:
>
>    (  at pos. 2   <--- stack top
>    (  at pos. 1
>
> when } will try to wind it up by popping the last unmatched (. Since } does
> not match ( you will easily generate "the closing curly bracket at pos. 3
> does not match the opening round bracket at pos. 2"

That's a possible answer, but may not be what should
have happened next if the brackets weren't tied together
properly and something is in need of recovery. See also
http://www.youtube.com/watch?v=cog2a3YeDMM
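
For reference, the stack technique quoted above can be sketched roughly
as follows. This is a minimal sketch in Python, not Dmitry's actual
implementation: the names check_brackets, PAIRS and NAMES are
hypothetical, and the recovery policy after a mismatch (drop the stray
closer, keep the opener) is an assumption made only so that scanning
can continue.

```python
# Hypothetical sketch of an operator-stack bracket check.
PAIRS = {")": "(", "]": "[", "}": "{"}            # closer -> expected opener
NAMES = {"(": "round", "[": "square", "{": "curly"}


def check_brackets(text):
    """Return a list of diagnostics; positions are 1-based."""
    stack = []   # (opener, position) of brackets still waiting for a match
    errors = []
    for pos, ch in enumerate(text, start=1):
        if ch in NAMES:
            stack.append((ch, pos))
        elif ch in PAIRS:
            kind = NAMES[PAIRS[ch]]
            if not stack:
                errors.append(
                    f"the closing {kind} bracket at pos. {pos} "
                    "has no opening bracket")
                continue
            opener, opener_pos = stack[-1]
            if opener == PAIRS[ch]:
                stack.pop()
            else:
                # Assumed recovery: report, drop the closer, keep the opener.
                errors.append(
                    f"the closing {kind} bracket at pos. {pos} does not "
                    f"match the opening {NAMES[opener]} bracket "
                    f"at pos. {opener_pos}")
    # Whatever is left on the stack was never closed.
    for opener, opener_pos in stack:
        errors.append(
            f"the opening {NAMES[opener]} bracket at pos. {opener_pos} "
            "is never closed")
    return errors


if __name__ == "__main__":
    for message in check_brackets("((}"):
        print(message)
```

On the simplified input "((}" (chosen here for illustration, not taken
from the thread) the first message is "the closing curly bracket at
pos. 3 does not match the opening round bracket at pos. 2". What to do
with the two round brackets still on the stack is exactly the recovery
question.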

> Your experience probably come from grammar-generated parsers. The
> straightforward technique is so much better for all practical purposes, and
> for error messages generation especially.

Leaving aside issues such as right brackets being far away,
missing altogether, superfluous because they were placed twice
as in Natasha's example, or structured and misspelled, this
setup falls a little short of what is to be achieved. In
particular, in a live system with no human involved,
something must be produced. If

  [alpha`]beta`

is a legitimate input, although possibly ungrammatical,
then what is to be produced?

A good translator needs to make the best of it. The
output should reflect the intention. That's only possible
when there is a likely or legitimate interpretation,
as judged after the fact by readers of the output. What
they will recognize should be what the author had wanted
them to recognize.

If it was the writer's intention to write "`]", then the parser
must not touch the input and a non-translation is the best
solution. If not, then maybe error correction could switch
the positions of "`" and "]", maybe when looking ahead reveals
a likely match for "`". In any case, the input could be
shown alongside the translation, or at least be available
for checking.
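
One way to picture the swap is the following minimal sketch in Python.
Everything in it is an assumption made for illustration: the function
name recover_swapped_close, the rule that the "`" must be an opening
one (an even number of "`" before it), and the choice to look only at
the first "`]" in the input. It returns the untouched input together
with the candidate, so that both can be shown or kept for checking.

```python
def recover_swapped_close(text, tick="`", close="]", open_="["):
    """Return (original, candidate).

    The candidate has the first "`]" rewritten as "]`", but only when
    look-ahead finds a later "`" that the moved one could pair with.
    Otherwise the candidate is the input unchanged (a non-translation).
    """
    i = text.find(tick + close)
    if i == -1:
        return text, text

    before = text[:i]
    # The tick must open a span: an even number of ticks precede it.
    tick_opens_span = before.count(tick) % 2 == 0
    # The "]" must have a "[" still open to close.
    bracket_is_open = before.count(open_) > before.count(close)
    # Look ahead for a likely match for the tick.
    has_later_tick = text.find(tick, i + 2) != -1

    if tick_opens_span and bracket_is_open and has_later_tick:
        candidate = text[:i] + close + tick + text[i + 2:]
        return text, candidate
    return text, text


if __name__ == "__main__":
    original, candidate = recover_swapped_close("[alpha`]beta`")
    print(original)
    print(candidate)   # [alpha]`beta`
```

Whether the candidate or the original is what the writer meant is
still a judgment that the code cannot make; it only proposes the swap.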

I think the best solution is to come to terms with computers
and use them for text editing! Do not again start an even
more ad-hoc markup business than the one against which they
drew up GML in 1969.  I guess :-)

