Re: Parser interface design - Natasha Kerensikova

comp.lang.ada

From: Natasha Kerensikova <lithiumcat@gmail.com>
Subject: Re: Parser interface design
Date: Thu, 14 Apr 2011 06:55:11 +0000 (UTC)
Message-ID: <slrniqd6if.2fnq.lithiumcat@sigil.instinctive.eu>
In-Reply-To: io58cj$7rn$1@munin.nbi.dk

Hello,

On 2011-04-13, Randy Brukardt <randy@rrsoftware.com> wrote:
> "Natasha Kerensikova" <lithiumcat@gmail.com> wrote in message 
> news:slrniqan7b.2fnq.lithiumcat@sigil.instinctive.eu...
> ...
>> These dangerous features are what made me want to cripple the parser in
>> the first place, and I thought it makes no sense to allow only a few
>> features to be disabled when I can just as easily allow all of them to
>> be independently turned on or off -- hence my example of disabling
>> emphasis.
>>
>> Are my motivations clearer now, or is it still just a whim of the
>> customer imposing a fragile design?
>
> Your intentions are fine, but I still don't think you should be trying to 
> modify the behavior of the parser; that's the job for the "interpretation" 
> layer. Maybe that's because of my compiler background, but what you are 
> trying to do is very similar to a compiler, or to the Ada Standard 
> formatter, or many other batch-oriented tools.

Well, I intended to do both: modify the parser's behavior and put some
logic in the interpretation/output layer.

Isn't it the parser's role to tell whether the string "<script>" is
normal text or an HTML tag? That's the kind of modification I was
thinking about.

Isn't it the HTML renderer's role to escape angle brackets when the
string "<script>" is normal text? I believe it is, because the escaping
is HTML-specific. It wouldn't need the same escaping if the output were
PDF, for example.

Isn't it again the renderer's role to make whatever sense it can out of
a "<script>" tag depending on the output format? For HTML output it's a
simple copy, but it seems non-trivial for PDF output, and impossible
for plain-text output. But that's not something for the parser to
worry about.
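
A minimal sketch of that division of labour, in Python rather than Ada
for brevity. The node types Text and HtmlTag and the functions
render_html and render_plain are names made up for this illustration,
not part of any existing library: the parser has already decided what
each node is, and each renderer only decides how to emit it.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Text:
    value: str  # the parser decided this is normal text


@dataclass
class HtmlTag:
    value: str  # the parser decided this is an inline HTML tag


def render_html(nodes):
    """HTML output: escape normal text, copy tags through unchanged."""
    out = []
    for node in nodes:
        if isinstance(node, Text):
            # Escaping is HTML-specific, so it lives in this renderer.
            out.append(escape(node.value, quote=False))
        else:
            out.append(node.value)
    return "".join(out)


def render_plain(nodes):
    """Plain-text output: tags have no meaning here, so they are dropped."""
    return "".join(n.value for n in nodes if isinstance(n, Text))
```

With this split, render_html([Text("<script>")]) gives escaped text,
while render_html([HtmlTag("<script>")]) copies the tag through. Which
of the two nodes the input produces is entirely the parser's decision.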

> In your specific case, I believe that preventing "execution" of embedded 
> HTML and the like is the job of the output layer (renderer), because that 
> way it is impossible to forget a case and allow something through. In the RM 
> Formatter tool, that is accomplished by having all text that is intended to 
> be visible in the output format go through a particular output interface: 
> "Ordinary_Text". And that interface is responsible for quoting any 
> characters that might be interpreted as commands ("<", ">", "&" for HTML, 
> "\" for RTF, and so on.) You would have a separate interface for anything 
> that you wanted to output directly (so that it could be executed), such as 
> your script example.

In my case, escaping special characters like angle brackets, so that
normal text is still treated as normal text in the output, is indeed
something for the renderer level. But this is different from enabling
or disabling language features.

> If the rule is that the renderer should always make everything it
> outputs harmless unless it is explicitly instructed otherwise, you'll
> have a lot less trouble.

I never intended not to follow that rule. But a script tag *is*
harmless, if the input can be trusted.

Now if it were a matter of specifically forbidding the script tag while
allowing others deemed "harmless", then I agree it should be done at the
renderer level. But changing the language grammar to wipe out the very
concept of inline HTML tags is definitely something to be handled in the
parser.

> To take an example, an Ada compiler doesn't "modify the behavior of the 
> parser" to deal with comments or strings in the source; these are treated as 
> single elements and aren't parsed at all. If one of these needs to be 
> output, it will just be output with the renderer making any transformations 
> needed to keep the output safe. Thus, there is no need to look inside of 
> these constructs to see what is in them.

Does an Ada compiler modify the behavior of the parser when selecting
Ada83 vs Ada95 vs Ada05? That's exactly what this is about here: these
are different feature sets, except that for convenience and coherence
the features are not enabled or disabled individually.
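
To make the analogy concrete, here is a rough sketch (Python, for
brevity) of how the selected language version can change what the front
end recognizes. The words listed as added in Ada 95 and Ada 2005 are the
reserved words those revisions introduced; the Ada 83 set is only a
small sample, and the names RESERVED and classify are invented for this
example.

```python
# Only a handful of Ada 83 reserved words, to keep the sketch short.
ADA83_SAMPLE = {"begin", "end", "if", "then", "package", "procedure"}

# Reserved words introduced by later revisions of the language.
ADDED_IN_ADA95 = {"abstract", "aliased", "protected", "requeue", "tagged", "until"}
ADDED_IN_ADA05 = {"interface", "overriding", "synchronized"}

RESERVED = {
    "Ada83": ADA83_SAMPLE,
    "Ada95": ADA83_SAMPLE | ADDED_IN_ADA95,
    "Ada05": ADA83_SAMPLE | ADDED_IN_ADA95 | ADDED_IN_ADA05,
}


def classify(word, version):
    """Classify a word as 'reserved' or 'identifier' for a given version."""
    # Ada is case-insensitive, so compare in lower case.
    return "reserved" if word.lower() in RESERVED[version] else "identifier"
```

The same source word, for instance "tagged", is an ordinary identifier
under one setting and a reserved word under another, without any change
to the later stages.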

The standard Markdown grammar might look like this:

...
Span_Element ::= Normal_Text | Emphasis | Code_Span | ...
Emphasis ::= "*" Span_Element "*" | "_" Span_Element "_"
Code_Span ::= "`" Inner_Code_Span "`"
Inner_Code_Span ::= Code_Text | Code_Span
...

Now when I'm talking about "disabling emphasis", I mean parsing the
following grammar instead:

...
Span_Element ::= Normal_Text | Code_Span | ...
Code_Span ::= "`" Inner_Code_Span "`"
Inner_Code_Span ::= Code_Text | Code_Span
...

This is of course very different from "rendering emphasis spans like
normal text" or "applying no formatting to mark emphasis" or whatever.
It's just ensuring that the feature cannot cause any harm by preventing
its very existence. How can you make it any safer than that?
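
As a rough illustration of the two grammars above, here is a small
Python sketch of a span parser driven by a set of enabled features. It
is deliberately simplified (no nesting, unlike the grammar sketched
above), and parse_spans and the feature names "emphasis" and "code_span"
are hypothetical names chosen for this example.

```python
def parse_spans(text, features=frozenset({"emphasis", "code_span"})):
    """Return a list of (kind, content) pairs for one line of text."""
    # Only enabled features contribute delimiters to the grammar.
    delimiters = {}
    if "emphasis" in features:
        delimiters["*"] = "emphasis"
        delimiters["_"] = "emphasis"
    if "code_span" in features:
        delimiters["`"] = "code_span"

    spans, buffer, i = [], [], 0
    while i < len(text):
        ch = text[i]
        # Look for a matching closing delimiter only if ch is one we know.
        close = text.find(ch, i + 1) if ch in delimiters else -1
        if close != -1:
            if buffer:
                spans.append(("text", "".join(buffer)))
                buffer = []
            spans.append((delimiters[ch], text[i + 1:close]))
            i = close + 1
        else:
            # Unknown or unmatched character: plain text.
            buffer.append(ch)
            i += 1
    if buffer:
        spans.append(("text", "".join(buffer)))
    return spans
```

With the default features, "a *b* `c`" yields an emphasis span and a
code span. With features={"code_span"}, the asterisks are never
delimiters at all, so "a *b* " comes out as one piece of normal text and
no emphasis node ever reaches the renderer.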

Thanks for your insights,
Natasha
