From: Natasha Kerensikova
Newsgroups: comp.lang.ada
Subject: Re: Parser interface design
Date: Fri, 8 Apr 2011 15:30:13 +0000 (UTC)

Hello,

On 2011-04-07, Georg Bauhaus wrote:
> The part between "generic" and "function" indeed corresponds to function
> pointers, so that *is* the Renderer_Callbacks part, in Ada. Hence, the
> parameter Renderer is not needed.

Yes, I was aware of that; I just cut'n'pasted too hastily when I was too tired to check properly that I had written it right.

> generic
>    with function Emphasis (Contents: String) return String is<>;
>    ...
> function Parser (Input: String) return String;
>
> The body of Parser will call Emphasis as needed (which is very much like
> a function pointer), etc.
>
> I'm assuming that the string returned is the rendered text.

Yes; however, it's a quick mock-up, and I honestly have no idea whether it's a good way of returning the rendered text, or whether I should go for something else, like Unbounded_String or Streams. I still believe such a design decision lies downstream of this discussion (even though I'm actually interested in reading arguments towards an answer).

> In a C struct of function pointers, you'd make them point
> to any function that fits the bill.
> You do the same with generics.

Yes, that's how I picked up the generic approach so easily :-)

> OK. There was one more actor. I'll remove it in the next
> step. But first, assume that Tokens stand for atoms
> or composites,
>
> +--------+             +---------+
> | Parser | -> Token -> | Printer | -> Rendered Token
> +--------+             +---------+
>                             |
>                             v
>                 knows output format (polymorphic)

OK. What that Token object could possibly look like was the reason I couldn't see, in my OP, how to use a design based on offline parsing or one controlled by the renderer. More precisely, what I was missing (which is much easier to put into words now that I have caught it -- see below) was how I could possibly encode the semantics other than through the choice of a relevant callback function.

> I understand you wanted the parser to be in control of the
> printer, i.e., have Parser call Emphasis etc., instead.
> I'll come to that.

Actually, I didn't really *want* to let the parser control the code flow. It was more that I couldn't imagine how to do it otherwise; but the whole point of this thread was to discover how this could be done in Ada, whether parser-controlled or not.
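To make the generic approach discussed above concrete, here is a minimal compilable sketch (Ada 2012, e.g. GNAT), not the design from this thread. The names Parse, To_HTML, To_Text, HTML_Emphasis, Text_Emphasis and Plain are hypothetical, and the parser only understands a toy grammar (an assumption, not Markdown) in which the first `*...*` pair is emphasis. It shows the generic formal functions acting like the C struct of function pointers, bound once per output format at instantiation.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;

procedure Generic_Parser_Demo is

   --  The formal functions play the role of the C struct of function
   --  pointers: each instantiation binds them to one output format.
   generic
      with function Emphasis (Contents : String) return String is <>;
      with function Normal_Text (Contents : String) return String is <>;
   function Parse (Input : String) return String;

   --  Toy grammar (hypothetical): the first "*...*" pair is emphasis,
   --  everything else is normal text; the rest is parsed recursively.
   function Parse (Input : String) return String is
      use Ada.Strings.Fixed;
      Open  : constant Natural := Index (Input, "*");
      Close : constant Natural :=
        (if Open = 0 then 0 else Index (Input (Open + 1 .. Input'Last), "*"));
   begin
      if Close = 0 then
         return Normal_Text (Input);
      end if;
      return Normal_Text (Input (Input'First .. Open - 1))
        & Emphasis (Input (Open + 1 .. Close - 1))
        & Parse (Input (Close + 1 .. Input'Last));
   end Parse;

   --  Hypothetical rendering functions for two output formats.
   function HTML_Emphasis (Contents : String) return String is
     ("<em>" & Contents & "</em>");
   function Text_Emphasis (Contents : String) return String is
     ("_" & Contents & "_");
   function Plain (Contents : String) return String is (Contents);

   --  Two instantiations of the same parser, one per output format;
   --  Plain is reused by both without any overriding.
   function To_HTML is new Parse
     (Emphasis => HTML_Emphasis, Normal_Text => Plain);
   function To_Text is new Parse
     (Emphasis => Text_Emphasis, Normal_Text => Plain);

begin
   Ada.Text_IO.Put_Line (To_HTML ("plain *key* text"));
   Ada.Text_IO.Put_Line (To_Text ("plain *key* text"));
end Generic_Parser_Demo;
```

The first line printed should be `plain <em>key</em> text` and the second `plain _key_ text`. The trade-off is one instantiation of Parse per output format, in exchange for sharing any reusable function freely between formats.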

> procedure Parsing is
>    Output_Format: Format := Configuration.Choose;
> begin
>    loop
>       declare
>          X : Token'Class := Parser.Next_Token;
>       begin
>          Print (X, Output_Format);
>       end;
>    end loop;
> end Parsing;
>
> This control structure pulls tokens. Parser could be a "lazy"
> object, producing tokens as needed. (At this point, can you
> see why I thought that, for example, the loop above could be
> a task? Or the parser? But that is a different story and of
> no concern here.)

Indeed, so far I understand, and I do see how this could be turned into a task, though I still don't know what benefit that would bring. However, I still have trouble deciding in which box this loop would live: the parser, the client or (unlikely) the renderer.

> Then, next in the control structure, is a call on Print.
> Print takes two arguments whose run-time type is not known.
> By design, Print renders any kind of token into any format.
> Print will therefore look at both X and Format, and find out what
> they are (inspect their run-time tags).
> Both X and Format could share O-O "interfaces", respectively.
> Or X may be a variant record, the variants reflecting the different
> kinds of token.

The last phrase above was exactly what opened my eyes to another kind of design, besides the record of callbacks I already knew. Huge thanks for that.

Here is that second design, which I think is slightly different from your proposition, but at least I feel I master it well enough to actually implement it.

with Ada.Strings;           use Ada.Strings;
with Ada.Strings.Fixed;     use Ada.Strings.Fixed;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

--  (The declarations below belong inside a package.)

type Token_Kind is (Normal_Text, Emphasis, Header, Paragraph, Rule);
subtype Level_Range is Positive range 1 .. 6;

--  A record component cannot be an unconstrained String, and two
--  variants cannot declare components with the same name, so the
--  text is stored once, before the variant part.
type Token (Kind : Token_Kind := Normal_Text) is record
   Contents : Unbounded_String;  --  left empty for Rule
   case Kind is
      when Header =>
         Level : Level_Range := 1;
      when others =>
         null;
   end case;
end record;

--  Now in the renderer:

function HTML_Tag (Tag : String; Contents : Unbounded_String) return String is
begin
   return "<" & Tag & ">" & To_String (Contents) & "</" & Tag & ">";
end HTML_Tag;

function HTML_Render (T : Token) return String is
begin
   case T.Kind is
      when Normal_Text => return To_String (T.Contents);
      when Emphasis    => return HTML_Tag ("em", T.Contents);
      when Header      =>
         --  'Image puts a leading space before non-negative numbers.
         return HTML_Tag ("h" & Trim (Level_Range'Image (T.Level), Left),
                          T.Contents);
      when Paragraph   => return HTML_Tag ("p", T.Contents);
      when Rule        => return "<hr>";
   end case;
end HTML_Render;

function XHTML_Render (T : Token) return String is
begin
   case T.Kind is
      when Rule   => return "<hr />";
      when others => return HTML_Render (T);
   end case;
end XHTML_Render;

I'm not sure this actually works; there might be some syntax errors or some other low-level Ada issue. I'm writing this more to convey the idea than to have something compilable (I'd first learn a lot more about actual Ada syntax before bothering you with something that is supposed to compile).

This is still conceived in a parser-controlled way, with the parser procedure building Token objects (hence the mutable variant record, if I actually managed to write it right), but handling Tokens this way opens up new forms of parsers. For example, the loop you proposed above could live in the client realm, with a parser producing one such Token after another. Or the loop could go into the renderer, with (X)HTML_Render being renderer-internal helper functions. And using something more complex than a String for Contents, such as some kind of list of Tokens, would allow offline parsing too.

And for more complex renderers, instead of a huge Case, I could use a map linking Token_Kind to the correct function, or something like that. A whole new world of designs is opening up to me \o/

> In the C style O-O you mentioned, Print would do that by
> calling functions available through function pointers in
> both X's struct and Format's struct. Or, to learn about X,
> Print could just look at a distinguishing field of X.

Actually, I don't understand that part yet. Basically, as I see it, the whole problem of designing an interface between a parser and a renderer can be reduced to how to encode the semantics that the parser extracts from the input text, and how to hand them over to the renderer, which encodes them into the output text.

So my very first design used the parser's choice of callback to encode the semantics, with the callback arguments being the parameters associated with the type of token being processed.
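The "map linking Token_Kind to the correct function" idea mentioned earlier can be sketched as an array of access-to-function values indexed by Token_Kind. This is a minimal sketch under stated assumptions (Ada 2012): the array stands in for a map, Contents is an Unbounded_String, and Render_Function, Renderer, Make_XHTML, Render and the Render_* helpers are all hypothetical names. XHTML is built by copying the HTML table and replacing the single Rule entry.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Token_Table_Demo is

   type Token_Kind is (Normal_Text, Emphasis, Header, Paragraph, Rule);
   subtype Level_Range is Positive range 1 .. 6;

   type Token (Kind : Token_Kind := Normal_Text) is record
      Contents : Unbounded_String;   --  left empty for Rule
      case Kind is
         when Header => Level : Level_Range := 1;
         when others => null;
      end case;
   end record;

   --  The "map" (hypothetical): one access-to-function per Token_Kind.
   type Render_Function is access function (T : Token) return String;
   type Renderer is array (Token_Kind) of Render_Function;

   function Tag (Name : String; T : Token) return String is
     ("<" & Name & ">" & To_String (T.Contents) & "</" & Name & ">");

   function Render_Text (T : Token) return String is (To_String (T.Contents));
   function Render_Em (T : Token) return String is (Tag ("em", T));
   function Render_Para (T : Token) return String is (Tag ("p", T));
   function Render_Head (T : Token) return String is
     (Tag ("h" & Ada.Strings.Fixed.Trim
                   (Level_Range'Image (T.Level), Ada.Strings.Left), T));
   function HTML_Rule (T : Token) return String is ("<hr>");
   function XHTML_Rule (T : Token) return String is ("<hr />");

   HTML : constant Renderer :=
     (Normal_Text => Render_Text'Access,
      Emphasis    => Render_Em'Access,
      Header      => Render_Head'Access,
      Paragraph   => Render_Para'Access,
      Rule        => HTML_Rule'Access);

   --  XHTML reuses the HTML table and overrides a single entry.
   function Make_XHTML return Renderer is
      Result : Renderer := HTML;
   begin
      Result (Rule) := XHTML_Rule'Access;
      return Result;
   end Make_XHTML;

   XHTML : constant Renderer := Make_XHTML;

   --  Dispatch through the table instead of a big case statement.
   function Render (Table : Renderer; T : Token) return String is
     (Table (T.Kind).all (T));

   Doc : constant array (1 .. 3) of Token :=
     ((Kind => Header, Contents => To_Unbounded_String ("Title"), Level => 2),
      (Kind => Rule, Contents => Null_Unbounded_String),
      (Kind => Emphasis, Contents => To_Unbounded_String ("note")));

begin
   for T of Doc loop
      Ada.Text_IO.Put_Line (Render (HTML, T) & "   " & Render (XHTML, T));
   end loop;
end Token_Table_Demo;
```

Adding an output format then means building one more table rather than writing one more case statement; here XHTML differs from HTML in exactly one slot.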

There is obviously one set of callbacks, and that set is conceptually indexed by Markdown feature. So only one dispatch table (per renderer) is needed, which means an abstract tagged type or an interface, and one level of inheritance. Neither the parser nor the client would have such tables.

Now, with my recent enlightenment, I can see the semantics encoded in the variant record, with the parameters stored inside that object. But then I don't feel the need for any dynamic dispatch -- unless you consider my big Case a kind of dynamic dispatch; no matter what, at some point the renderer will have to "decide" which actual printing function to use. However, I still don't see why two sets of dynamic dispatch tables would be needed...

> Your idea seems different in that the parser will call the rendering
> procedures whenever *it* thinks it has something ready to be rendered.
> That's fine, I think.

Indeed, I think so too. I was just worried by the fact that I wasn't able to come up with anything I could implement with a flow controlled by the client or the renderer. However, I can't think of any actual situation where the parser must not control the flow, at least for Markdown-to-whatever-markup processing.

> Keeping your current design, just for the record, you could establish
> an interface in the logical sense,
>
> type Renderer_Callbacks is abstract tagged private;
>
> function Emphasis
>   (This: Renderer_Callbacks; Contents: String) return String
>   is abstract;
>
> function Normal_Text
>   (This : Renderer_Callbacks; Contents: String) return String
>   is abstract;
>
> function Paragraph
>   (This : Renderer_Callbacks; Contents: String) return String
>   is abstract;
>
> Add derived types that will override behavior for
> different output formats. Then
>
> function Parser
>   (R : Renderer_Callbacks'Class; Input : String) return String;
>
> The calls within Parser will be dispatching, depending on the
> type of the actual renderer object.
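For comparison with the generic version, the tagged-type design quoted just above can be made compilable roughly as follows. This is a minimal sketch (Ada 2012), assuming a null-record root instead of the quoted private type and only two of the callbacks; Renderers, HTML_Renderer, Text_Renderer and the toy `*...*` grammar in Parse are hypothetical. Because R is class-wide, each R.Emphasis or R.Normal_Text call dispatches on the run-time tag of the renderer passed in.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;

procedure Dispatch_Demo is

   package Renderers is
      --  One abstract primitive per Markdown feature (only two shown).
      type Renderer_Callbacks is abstract tagged null record;
      function Emphasis
        (This : Renderer_Callbacks; Contents : String) return String
        is abstract;
      function Normal_Text
        (This : Renderer_Callbacks; Contents : String) return String
        is abstract;

      --  Hypothetical concrete format.
      type HTML_Renderer is new Renderer_Callbacks with null record;
      overriding function Emphasis
        (This : HTML_Renderer; Contents : String) return String;
      overriding function Normal_Text
        (This : HTML_Renderer; Contents : String) return String;

      --  Inherits Normal_Text from HTML_Renderer, overrides only Emphasis.
      type Text_Renderer is new HTML_Renderer with null record;
      overriding function Emphasis
        (This : Text_Renderer; Contents : String) return String;
   end Renderers;

   package body Renderers is
      overriding function Emphasis
        (This : HTML_Renderer; Contents : String) return String is
        ("<em>" & Contents & "</em>");
      overriding function Normal_Text
        (This : HTML_Renderer; Contents : String) return String is
        (Contents);
      overriding function Emphasis
        (This : Text_Renderer; Contents : String) return String is
        ("_" & Contents & "_");
   end Renderers;

   --  R is class-wide, so the calls on R below are dispatching calls.
   function Parse
     (R : Renderers.Renderer_Callbacks'Class; Input : String) return String
   is
      use Ada.Strings.Fixed;
      Open  : constant Natural := Index (Input, "*");
      Close : constant Natural :=
        (if Open = 0 then 0 else Index (Input (Open + 1 .. Input'Last), "*"));
   begin
      if Close = 0 then
         return R.Normal_Text (Input);
      end if;
      return R.Normal_Text (Input (Input'First .. Open - 1))
        & R.Emphasis (Input (Open + 1 .. Close - 1))
        & Parse (R, Input (Close + 1 .. Input'Last));
   end Parse;

   Html : Renderers.HTML_Renderer;
   Text : Renderers.Text_Renderer;

begin
   Ada.Text_IO.Put_Line (Parse (Html, "plain *key* text"));
   Ada.Text_IO.Put_Line (Parse (Text, "plain *key* text"));
end Dispatch_Demo;
```

Deriving Text_Renderer from HTML_Renderer rather than from the abstract root is one way to limit the duplication discussed below: only the callbacks that actually differ get overridden.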

Yes, so far this is within the realm of what I can understand.

> And yes, if you don't start the hierarchy from reusable
> functions that do not need to be overridden, there can be
> code bloat.

This is the tough spot: I haven't been able to figure out how to avoid code bloat in the situation I described (with three sets of extensions and two markup targets).

> This would not be the case with generics, which you
> instantiate with whatever reusable functions you like!

Indeed, and that's why I'm still leaning towards the generics, record-of-access-to-subprogram, or variant-record approaches :-)

Thanks a lot for your help,
Natasha