comp.lang.ada

From: Natasha Kerensikova <lithiumcat@gmail.com>
Subject: Re: Parser interface design
Date: Wed, 13 Apr 2011 08:20:59 +0000 (UTC)
Message-ID: <slrniqan7b.2fnq.lithiumcat@sigil.instinctive.eu>
In-Reply-To: m2tye3qomy.fsf@pushface.org

Hello,

On 2011-04-12, Simon Wright <simon@pushface.org> wrote:
> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>
>> On Tue, 12 Apr 2011 16:13:30 +0000 (UTC), Natasha Kerensikova wrote:
>>
>>> On 2011-04-08, Simon Wright <simon@pushface.org> wrote:
>>
>>>> In other words, HTML.Emphasize ("bar") would return "<b>bar</b>", but
>>>> No_Op ("bar") would just return "bar".
>>> 
>>> And that's exactly why disabling emphasis should be done on the parser
>>> level: the callback only has the semantic information ("render bar
>>> emphasized") without any knowledge of how it was obtained (the source
>>> could have been "foo *bar* baz" or "foo _bar_ baz"). Therefore no
>>> callback can reconstruct the intact input.
>>
>> That looks wrong to me. I think that obvious design is to move it to the
>> renderer:
>>
>>    HTML.Emphasize ("bar")  -> "<b>bar</b>"
>>    ASCII.Emphasize ("bar")  -> "*bar*"
>>    Plain_Text.Emphasize ("bar")  -> "bar"
>>    Gtk_Text_Buffer.Emphasize ("bar")  -> sets tags around the text slice
>>    ...
>>
>> Parser just calls On_Emphasize, the implementation of routes it to the
>> renderer's Emphasize, which figures out what to do.
>
> Clearly you and I think similarly here. But Natasha is (as well as being
> the potential implementer) the customer.

I'm afraid you're both missing the issue. Dmitry's example is clearly
about different kinds of emphasis, while I'm talking about *disabling*
emphasis.

I understand that emphasis is such a mild feature that you might not see
why anyone would disable it at all, so let's consider another Markdown
feature: inline HTML tags. According to the Markdown specification, when
"<script>" is encountered in the input text, it is supposed to be output
as-is, i.e. generate an HTML script tag. If Markdown is used to format
untrusted input, for example blog comments, it makes sense to prevent
the whole system from outputting arbitrary HTML tags, right?

Moreover, in a context where it's clear that HTML tags are not active,
for example here, there are legitimate uses of text that looks like a
tag. So the "right" thing to do is, in my opinion, to treat the tag as
regular text and escape whatever needs to be escaped depending on the
target. Obviously, the first part happens at the parser level (e.g.
calling the Normal_Text callback) while the second happens at the
renderer level. Therefore, to make the first part happen, the parser
needs to be somehow instructed that the inline HTML feature is disabled.
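
To make the split concrete, here is a minimal sketch, in Python rather
than Ada purely for brevity. Every name in it (INLINE_HTML, HtmlRenderer,
parse_span, the callback names) is hypothetical and only illustrates the
idea; the tag pattern is deliberately naive and is not how a real
Markdown parser recognises inline HTML.

```python
import html
import re

# Hypothetical feature name, invented for this illustration.
INLINE_HTML = "inline_html"

# Deliberately naive: anything between '<' and '>' counts as a tag.
TAG = re.compile(r"<[^<>]+>")


class HtmlRenderer:
    """Renderer side: only knows how to emit things for its target."""

    def normal_text(self, text):
        # Escaping is a renderer concern, it depends on the target.
        return html.escape(text, quote=False)

    def raw_html(self, tag):
        # Inline HTML is passed through untouched.
        return tag


def parse_span(source, renderer, enabled):
    """Parser side: decides which callback each piece of input goes to."""
    out = []
    pos = 0
    # With the feature disabled the parser never looks for tags at all,
    # so "<script>" simply stays part of the surrounding normal text.
    matches = TAG.finditer(source) if INLINE_HTML in enabled else ()
    for m in matches:
        out.append(renderer.normal_text(source[pos:m.start()]))
        out.append(renderer.raw_html(m.group()))
        pos = m.end()
    out.append(renderer.normal_text(source[pos:]))
    return "".join(out)
```

In this sketch the renderer is identical in both configurations; only
the set of enabled features handed to the parser changes which callback
the tag reaches.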

Another feature that I disabled in blog comments is headers, but for
aesthetic rather than security reasons. It's an interesting example to
set beside inline HTML. With inline HTML, there is technically no
information lost in the parser, so the renderer can theoretically
reconstruct the input to make it look as if the feature were disabled.
It would be a convoluted renderer, though: whether a tag is span-level
or block-level is difficult (but not impossible) to reconstruct without
peeking into the parser's internal state, which we agree is not an
option. Headers, on the other hand, can be written in two formats that
are indistinguishable in the renderer, which makes it impossible for it
to fake the feature removal.
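
The information loss for headers can be shown with a small sketch, again
in Python for brevity and with hypothetical names (parse_blocks,
on_header, on_paragraph). It handles only level-1 headers in the two
Markdown spellings, "# Title" and a line underlined with "=", and is far
simpler than a real block parser.

```python
def parse_blocks(lines, on_header, on_paragraph):
    """Toy block parser: both header spellings reach the same callback."""
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.startswith("# "):
            # "# Title" form: strip the marker, keep only the title.
            out.append(on_header(1, line[2:]))
            i += 1
        elif line.strip() and nxt and set(nxt) == {"="}:
            # Underlined form: the "=====" line is consumed and dropped.
            out.append(on_header(1, line))
            i += 2
        else:
            out.append(on_paragraph(line))
            i += 1
    return out
```

Both spellings arrive at on_header with the same arguments, so a
renderer that wanted to print the header back as plain text has no way
of knowing which source form to reproduce.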

These dangerous features are what made me want to cripple the parser in
the first place, and I thought it made no sense to allow only a few
features to be disabled when I could just as easily allow all of them to
be independently turned on or off -- hence my example of disabling
emphasis.

Are my motivations clearer now, or is it still just a whim of the
customer imposing a fragile design?

Thanks for your comments,
Natasha
