Re: Parser interface design

comp.lang.ada

From: "Randy Brukardt" <randy@rrsoftware.com>
Subject: Re: Parser interface design
Date: Wed, 13 Apr 2011 17:33:52 -0500
Message-ID: <io58cj$7rn$1@munin.nbi.dk>
In-Reply-To: slrniqan7b.2fnq.lithiumcat@sigil.instinctive.eu

"Natasha Kerensikova" <lithiumcat@gmail.com> wrote in message 
news:slrniqan7b.2fnq.lithiumcat@sigil.instinctive.eu...
...
> These dangerous features are what made me want to cripple the parser in
> the first place, and I thought it makes no sense to allow only a few
> features to be disabled when I can just as easily allow all of them to
> be independently turned on or off -- hence my example of disabling
> emphasis.
>
> Are my motivations clearer now, or is it still just a whim of the
> customer imposing a fragile design?

Your intentions are fine, but I still don't think you should be trying to 
modify the behavior of the parser; that's the job for the "interpretation" 
layer. Maybe that's because of my compiler background, but what you are 
trying to do is very similar to a compiler, or to the Ada Standard 
formatter, or many other batch-oriented tools.

In those sorts of tools, the parser (input layer) simply organizes the 
information from the input into a common form. It's the layer that sits 
between the input and the output layer (render in your case) that does the 
operations that depend on things other than the input itself. It's highly 
unlikely that you could avoid having such a layer at all (something has to 
connect the input and the output), and this is the place to do stuff that 
does not clearly have to do with the input or the output (such as 
transformations).
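
To make the layering concrete, here is a minimal sketch in Python. Everything
in it is hypothetical: the names Element, parse, interpret and render, the toy
*emphasis* syntax, and the allow_emphasis setting are made up for illustration
and are not taken from your library or from any real tool. The point is only
that the parser never sees the settings; the middle layer is the one place
that consults them.

```python
from dataclasses import dataclass
import html
import re


@dataclass
class Element:
    kind: str   # "text" or "emphasis" (hypothetical element kinds)
    text: str


def parse(source):
    """Input layer: organize the input into elements; knows no settings."""
    elements = []
    # re.split with a capturing group leaves the *...* contents at odd indices.
    for i, part in enumerate(re.split(r"\*([^*]+)\*", source)):
        if part:
            elements.append(Element("emphasis" if i % 2 else "text", part))
    return elements


def interpret(elements, allow_emphasis=True):
    """Middle layer: the only place that looks at option settings."""
    if allow_emphasis:
        return list(elements)
    # Feature disabled: put the original markup back as plain text.
    return [Element("text", f"*{e.text}*") if e.kind == "emphasis" else e
            for e in elements]


def render(elements):
    """Output layer: escapes every piece of visible text."""
    out = []
    for e in elements:
        safe = html.escape(e.text)
        out.append(f"<em>{safe}</em>" if e.kind == "emphasis" else safe)
    return "".join(out)
```

Turning emphasis off is then render(interpret(parse(text),
allow_emphasis=False)), with parse itself unchanged.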

In your specific case, I believe that preventing "execution" of embedded 
HTML and the like is the job of the output layer (renderer), because that 
way it is impossible to forget a case and allow something through. In the RM 
Formatter tool, that is accomplished by having all text that is intended to 
be visible in the output format go through a particular output interface: 
"Ordinary_Text". And that interface is responsible for quoting any 
characters that might be interpreted as commands ("<", ">", "&" for HTML, 
"\" for RTF, and so on.) You would have a separate interface for anything 
that you wanted to output directly (so that it could be executed), such as 
your script example.
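
A rough sketch of that split, in Python for brevity. This is not the RM
Formatter's code; the class HtmlOutput and its methods are invented for this
example, with ordinary_text merely echoing the name "Ordinary_Text" above.
The assumption is an HTML target, so only "<", ">" and "&" are quoted.

```python
class HtmlOutput:
    """Hypothetical output object: all output goes through one of two methods."""

    # Characters HTML could interpret as commands, and their quoted forms.
    _QUOTED = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}

    def __init__(self):
        self._parts = []

    def ordinary_text(self, text):
        """Default path: quote anything that could be read as a command."""
        self._parts.append("".join(self._QUOTED.get(ch, ch) for ch in text))

    def raw(self, markup):
        """Explicit opt-in path for markup that is meant to be executed."""
        self._parts.append(markup)

    def result(self):
        return "".join(self._parts)
```

With only these two entry points, anything dangerous in the output can be
traced to a call to raw, which is easy to search for.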

It's very important that you isolate all of the rendering in a single 
interface, so that if you have to track down a bug caused by allowing 
something bad into the output (and trust me, you will :-), you only need to 
look in a single place for the problem. You don't want to have to try to 
figure out whether the parser should have prevented the problem, or the 
output layer, or something else, because it's really easy to think that some 
other layer should handle something. (This is especially a problem in 
multi-person projects, where fixing something is always someone else's 
responsibility.) If the rule is that the renderer should always make 
everything it outputs harmless unless it is explicitly instructed otherwise, 
you'll have a lot less trouble.

To take an example, an Ada compiler doesn't "modify the behavior of the 
parser" to deal with comments or strings in the source; these are treated as 
single elements and aren't parsed at all. If one of these needs to be 
output, it will just be output with the renderer making any transformations 
needed to keep the output safe. Thus, there is no need to look inside of 
these constructs to see what is in them.
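
As an illustration only (a toy tokenizer in Python, not how any real Ada
compiler is written; the token kinds and the function names tokenize and
to_html are assumptions of this sketch), comments and string literals can be
matched as single tokens and handed to the output side untouched:

```python
import html
import re

# Comments run to end of line; string literals use "" for an embedded quote,
# as in Ada. Everything else becomes a word token or a one-off "other" token.
TOKEN = re.compile(r'''
    (?P<comment>--[^\n]*)
  | (?P<string>"(?:[^"\n]|"")*")
  | (?P<word>\w+)
  | (?P<other>\s+|.)
''', re.VERBOSE)


def tokenize(source):
    """Yield (kind, text) pairs; comments and strings are never looked inside."""
    for m in TOKEN.finditer(source):
        yield m.lastgroup, m.group()


def to_html(source):
    """Output side: escape every token's text, whatever it happens to contain."""
    out = []
    for kind, text in tokenize(source):
        safe = html.escape(text, quote=False)
        if kind in ("comment", "string"):
            safe = f'<span class="{kind}">{safe}</span>'
        out.append(safe)
    return "".join(out)
```

The tokenizer has no idea that a string might contain "<b>"; the escaping in
to_html is what keeps it harmless.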

Similarly, the handling of the command language for the RM formatter doesn't 
change. What option settings do is change the actual effect of the various 
commands, and choose particular input and output formats (such as the source 
files to use, and whether to output in HTML or RTF or something else).

As previously suggested, look at the design of the RM Formatter to see one 
way to do this.

                                           Randy.
