From mboxrd@z Thu Jan  1 00:00:00 1970
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!mx02.eternal-september.org!.POSTED!not-for-mail
From: "G.B." <bauhaus@futureapps.invalid>
Newsgroups: comp.lang.ada
Subject: Re: [Slightly OT] How to process lightweight text markup languages?
Date: Mon, 19 Jan 2015 17:58:38 +0100
Organization: A noiseless patient Spider
Message-ID: <m9jd2u$j08$1@dont-email.me>
References: <slrnmbntdm.19vl.lithiumcat@nat.rebma.instinctive.eu>
 <ynm6coktfevl.1esu61g1n9477.dlg@40tude.net> <m9iokj$upl$1@dont-email.me>
 <c9058n5dlu56.608mrt8042o0$.dlg@40tude.net>
Reply-To: nonlegitur@futureapps.de
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 19 Jan 2015 16:58:06 +0000 (UTC)
Injection-Info: mx02.eternal-september.org;
 posting-host="b96887e80893c84a90c3007226ca0d1c";
	logging-data="19464"; mail-complaints-to="abuse@eternal-september.org";
	posting-account="U2FsdGVkX1+Iz8vFscjJID2bteJOyteGIHvAjss+n1w="
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9;
 rv:24.0) Gecko/20100101 Thunderbird/24.6.0
In-Reply-To: <c9058n5dlu56.608mrt8042o0$.dlg@40tude.net>
Cancel-Lock: sha1:lbBmszYP3CKD2++8oaCBHHruoZA=
Xref: news.eternal-september.org comp.lang.ada:24605
Date: 2015-01-19T17:58:38+01:00
List-Id: <comp.lang.ada>

On 19.01.15 14:21, Dmitry A. Kazakov wrote:
> On Mon, 19 Jan 2015 12:09:40 +0100, G.B. wrote:
>
>> On 18.01.15 21:21, Dmitry A. Kazakov wrote:
>>> This is a pretty straightforward and simple technique.
>>
>> The trouble is with expectations:
>>
>> Input:
>>
>>    ((){)([()[[]])]
>>
>> Typical parsers will respond with such useless results
>> as "error at EOF". Not something that a (close to)
>> natural language processor can afford, I think.
>
> Not with the technique I described. In your example, the operator stack
> will contain:
>
>    (  at pos. 2   <--- stack top
>    (  at pos. 1
>
> when } will try to wind it up by popping the last unmatched (. Since } does
> not match ( you will easily generate "the closing curly bracket at pos. 3
> does not match the opening round bracket at pos. 2"

That's a possible answer, but it may not be what should
happen next if the brackets weren't tied together properly
and something needs recovery. See also
http://www.youtube.com/watch?v=cog2a3YeDMM
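
For reference, a minimal sketch of the quoted stack technique,
in Python for brevity. The names first_bracket_error, PAIRS and
NAMES and the message wording are my assumptions, not Dmitry's
code. It reports the first mismatch and then stops, so it makes
no attempt at recovery:

```python
# Hypothetical sketch: closing bracket -> the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}
NAMES = {"(": "round", ")": "round",
         "[": "square", "]": "square",
         "{": "curly", "}": "curly"}


def first_bracket_error(text):
    """Return a message for the first bracket error, or None if balanced."""
    stack = []  # entries are (opening bracket, 1-based position)
    for pos, ch in enumerate(text, start=1):
        if ch in PAIRS.values():
            stack.append((ch, pos))
        elif ch in PAIRS:
            if not stack:
                return (f"the closing {NAMES[ch]} bracket at pos. {pos} "
                        f"has no opening bracket")
            opener, opener_pos = stack.pop()
            if opener != PAIRS[ch]:
                return (f"the closing {NAMES[ch]} bracket at pos. {pos} "
                        f"does not match the opening {NAMES[opener]} "
                        f"bracket at pos. {opener_pos}")
    if stack:
        opener, opener_pos = stack[-1]
        return (f"the opening {NAMES[opener]} bracket at pos. {opener_pos} "
                f"is never closed")
    return None
```

For "((}" this yields the message quoted above. What to do with
the rest of the input is still open.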

> Your experience probably come from grammar-generated parsers. The
> straightforward technique is so much better for all practical purposes, and
> for error messages generation especially.

Leaving aside some issues, such as right brackets being far away,
or missing altogether, or superfluous because they were placed
twice as in Natasha's example, or structured and misspelled, this
setup falls a little short of what is to be achieved. In
particular, in a live system with no human involved,
something must be produced: If

  [alpha`]beta`

is a legitimate input, although possibly ungrammatical,
then what is to be produced?

A good translator needs to make the best of it. The
output should reflect the intention. That's only possible
when there is a likely or legitimate interpretation,
as judged after the fact by readers of the output. What
they will recognize should be what the author had wanted
them to recognize.

If it was the writer's intention to write "`]", then the parser
must not touch the input and a non-translation is the best
solution. If not, then maybe error correction could switch
the positions of "`" and "]", maybe when looking ahead reveals
a likely match for "`". In any case, the input could be
shown alongside the translation, or at least be available
for checking.
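
A minimal sketch of that one correction, again in Python.
Everything here is hypothetical: the name
repair_crossed_delimiters, and the heuristic that "`]" is a
transposed "]`" only when a "[" is still open and a later "`"
exists. It ignores whether the "`" already sits inside a span.
It returns the input alongside the candidate, so the original
stays available for checking:

```python
from typing import Tuple


def repair_crossed_delimiters(text: str) -> Tuple[str, str]:
    """Return (original, candidate); candidate equals original if no repair applies."""
    out = []
    open_brackets = 0  # number of '[' seen and not yet closed
    i = 0
    while i < len(text):
        ch = text[i]
        if (ch == "`" and text[i + 1:i + 2] == "]"
                and open_brackets > 0 and "`" in text[i + 2:]):
            # Lookahead found a later backtick: assume "`]" was meant as "]`".
            out.append("]`")
            open_brackets -= 1
            i += 2
            continue
        if ch == "[":
            open_brackets += 1
        elif ch == "]" and open_brackets > 0:
            open_brackets -= 1
        out.append(ch)
        i += 1
    return text, "".join(out)
```

For [alpha`]beta` the candidate is [alpha]`beta`. Without a
later "`", or without an open "[", the input comes back
untouched, which is the non-translation case.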

I think the best solution is to come to terms with computers
and use them for text editing! Do not again start an even
more ad-hoc markup business than the one against which they
drew up GML in 1969.  I guess :-)