From: "Randy Brukardt"
Newsgroups: comp.lang.ada
Subject: Re: Parser interface design
Date: Wed, 13 Apr 2011 17:33:52 -0500

"Natasha Kerensikova" wrote in message news:slrniqan7b.2fnq.lithiumcat@sigil.instinctive.eu...
...
> These dangerous features are what made me want to cripple the parser in
> the first place, and I thought it makes no sense to allow only a few
> features to be disabled when I can just as easily allow all of them to
> be independently turned on or off -- hence my example of disabling
> emphasis.
>
> Are my motivations clearer now, or is it still just a whim of the
> customer imposing a fragile design?

Your intentions are fine, but I still don't think you should be trying to modify the behavior of the parser; that's the job of the "interpretation" layer. Maybe that's because of my compiler background, but what you are trying to do is very similar to a compiler, or to the Ada Standard formatter, or to many other batch-oriented tools.

In those sorts of tools, the parser (input layer) simply organizes the information from the input into a common form. It's the layer that sits between the input layer and the output layer (the renderer in your case) that does the operations that depend on things other than the input itself. It's highly unlikely that you could avoid having such a layer at all (something has to connect the input and the output), and this is the place to do anything that does not clearly belong to the input or the output (such as transformations).

In your specific case, I believe that preventing "execution" of embedded HTML and the like is the job of the output layer (renderer), because that way it is impossible to forget a case and allow something through. In the RM Formatter tool, that is accomplished by having all text that is intended to be visible in the output format go through a particular output interface: "Ordinary_Text". That interface is responsible for quoting any characters that might be interpreted as commands ("<", ">", "&" for HTML, "\" for RTF, and so on). You would have a separate interface for anything that you wanted to output directly (so that it could be executed), such as your script example.

It's very important that you isolate all of the rendering in a single interface, so that if you have to track down a bug caused by allowing something bad into the output (and trust me, you will :-), you only need to look in a single place for the problem.
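
To make that concrete, here is a minimal sketch in Python of an output layer with one interface for visible text and a separate, explicit one for text that is meant to be interpreted. This is only an illustration under stated assumptions: the class HtmlRenderer, its method names, and the allow_raw flag are hypothetical and are not the RM Formatter's actual interface; only the name "Ordinary_Text" comes from that tool.

```python
import html


class HtmlRenderer:
    """Hypothetical HTML output layer (names are illustrative assumptions)."""

    def __init__(self, allow_raw=False):
        # allow_raw is an assumed option: it is the only way to let
        # unquoted markup reach the output.
        self.allow_raw = allow_raw
        self._chunks = []

    def ordinary_text(self, text):
        # All visible text goes through here; "<", ">", "&" and quote
        # characters are quoted so they cannot act as markup.
        self._chunks.append(html.escape(text, quote=True))

    def raw_markup(self, markup):
        # Separate interface for output that is meant to be interpreted.
        # Unless explicitly enabled, it falls back to the safe path.
        if self.allow_raw:
            self._chunks.append(markup)
        else:
            self.ordinary_text(markup)

    def result(self):
        return "".join(self._chunks)


if __name__ == "__main__":
    safe = HtmlRenderer()
    safe.ordinary_text("if a < b & c then")
    safe.raw_markup("<script>alert(1)</script>")
    print(safe.result())
```

With this shape, the parser never needs to know whether embedded HTML is allowed; that decision lives in one flag on the renderer, and anything that reaches the output unquoted has to have passed through raw_markup.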
You don't want to have to try to figure out whether the parser should have prevented the problem, or the output layer, or something else, because it's really easy to think that some other layer should handle something. (This is especially a problem in multi-person projects, where fixing something is always someone else's responsibility.) If the rule is that the renderer should always make everything it outputs harmless unless it is explicitly instructed otherwise, you'll have a lot less trouble.

To take an example, an Ada compiler doesn't "modify the behavior of the parser" to deal with comments or strings in the source; these are treated as single elements and aren't parsed at all. If one of these needs to be output, it will just be output, with the renderer making any transformations needed to keep the output safe. Thus, there is no need to look inside these constructs to see what is in them.

Similarly, the handling of the command language for the RM Formatter doesn't change. What option settings do is change the actual effect of the various commands, and choose particular input and output formats (such as the source files to use, and whether to output in HTML or RTF or something else).

As previously suggested, look at the design of the RM Formatter to see one way to do this.

Randy.