From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.9 required=5.0 tests=BAYES_00,FORGED_GMAIL_RCVD,
	FREEMAIL_FROM autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,50e705cdf2767cc6
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII-7-bit
Path: 
 g2news2.google.com!news3.google.com!feeder2.cambriumusenet.nl!feed.tweaknews.nl!194.134.4.91.MISMATCH!news2.euro.net!feeder.news-service.com!85.214.198.2.MISMATCH!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: Natasha Kerensikova <lithiumcat@gmail.com>
Newsgroups: comp.lang.ada
Subject: Re: Parser interface design
Date: Fri, 8 Apr 2011 15:30:13 +0000 (UTC)
Organization: A noiseless patient Spider
Message-ID: <slrnipuag5.2fnq.lithiumcat@sigil.instinctive.eu>
References: <slrnipof32.2fnq.lithiumcat@sigil.instinctive.eu>
 <4d9c8c19$0$6769$9b4e6d93@newsspool3.arcor-online.net>
 <slrnips50m.2fnq.lithiumcat@sigil.instinctive.eu>
 <4d9e36fd$0$6760$9b4e6d93@newsspool3.arcor-online.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 8 Apr 2011 15:30:13 +0000 (UTC)
Injection-Info: mx03.eternal-september.org;
 posting-host="Mda950WjNwNLAFOE7yJXQw";
	logging-data="18474"; mail-complaints-to="abuse@eternal-september.org";
	posting-account="U2FsdGVkX18xWmHDGQjYNa+rGfGhHg57"
User-Agent: slrn/0.9.9p1 (FreeBSD)
Cancel-Lock: sha1:SNH0tFgHNIVfH1COgbAPOD1pNEw=
Xref: g2news2.google.com comp.lang.ada:19695
Date: 2011-04-08T15:30:13+00:00
List-Id: <comp.lang.ada>

Hello,

On 2011-04-07, Georg Bauhaus <rm-host.bauhaus@maps.futureapps.de> wrote:
> The part between "generic" and "function" indeed corresponds to function
> pointers, so that *is* the Renderer_Callbacks part, in Ada. Hence, the
> parameter Renderer is not needed.

Yes, I was aware of that; I just cut and pasted too hastily when I was
too tired to check properly that I had written it right.

>       generic
>           with function Emphasis (Contents: String) return String is<>;
>           ...
>       function Parser (Input: String) return String;
>
> The body of Parser will call Emphasis as needed (which is very much like
> a function pointer)etc.
>
> I'm assuming that the string returned is the rendered text.

Yes, however it's a quick mock-up, and I honestly have no idea whether
it's a good way of returning the rendered text, or whether I should go
for something else, like Unbounded_String or Streams.
I still believe such a design decision lies downstream of this
discussion (even though I'm actually interested in reading arguments
towards an answer).

> In a C struct of function pointers, you'd make them point
> to any function that fits the bill.
> You do the same with generics.

Yes, that's how I picked up the generic approach so easily :-)
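
A minimal sketch of that generic approach, for the record. The toy body
of Parser, the names Star_Emphasis, HTML_Parser and Text_Parser, and the
output strings are all made up for illustration; only the shape of the
generic declaration comes from the quoted text.

```ada
with Ada.Text_IO;

procedure Generic_Demo is

   --  The formal function plays the role of a function pointer;
   --  "is <>" picks up a visible function of the same name by default.
   generic
      with function Emphasis (Contents : String) return String is <>;
   function Parser (Input : String) return String;

   --  Toy body (hypothetical): treats the whole input as emphasised text.
   function Parser (Input : String) return String is
   begin
      return Emphasis (Input);
   end Parser;

   function Emphasis (Contents : String) return String is
   begin
      return "<em>" & Contents & "</em>";
   end Emphasis;

   function Star_Emphasis (Contents : String) return String is
   begin
      return "*" & Contents & "*";
   end Star_Emphasis;

   --  Default actual: the visible Emphasis above is used.
   function HTML_Parser is new Parser;

   --  Explicit actual: any function with a matching profile will do.
   function Text_Parser is new Parser (Emphasis => Star_Emphasis);

begin
   Ada.Text_IO.Put_Line (HTML_Parser ("hello"));
   Ada.Text_IO.Put_Line (Text_Parser ("hello"));
end Generic_Demo;
```

Each instantiation binds the callbacks at compile time, so there is no
renderer object to pass around at run time.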

> OK.  There was one more actor.  I'll remove it in the next
> step.  But first, assume that Tokens stand for atoms
> or composites,
>
>
> +--------+               +---------+
> | Parser |  -> Token ->  | Printer | -> Rendered Token
> +--------+               +---------+
>                            |
>                            v
>               knows output format (polymorphic)

OK. What that Token object could possibly look like was the reason I
couldn't see in my OP how to use a design based on offline parsing or
on a renderer-controlled flow. More precisely, what I was missing (which
is much easier to word now that I caught it -- see below) was how I
could possibly encode the semantics except through the choice of a
relevant callback function.

> I understand you wanted the parser to be in control of the
> printer, i.e., have Parser call Emphasis etc., instead.
> I'll come to that.

Actually, I didn't really *want* to let the parser control the code
flow. It was more that I couldn't imagine how to do it otherwise; but
the whole point of this thread was to discover how this could be done in
Ada, whether parser-controlled or not.

> procedure Parsing is
>      Output_Format: Format := Configuration.Choose;
> begin
>      loop
>         declare
>             X : Token'Class := Parser.Next_Token;
>         begin
>             Print (X, Output_Format);
>         end;
>      end loop;
> end Parsing;
>
> This control structure pulls tokens. Parser could be a "lazy"
> object, producing tokens as needed.  (At this point, can you
> see why I thought that, for example, the loop above could be
> a task? Or the parser?  But that is a different story and of
> no concern here.)

Indeed, so far I understand, and I do see how this can be turned into a
task, though I still don't know what benefit it could possibly have.
However, I still have trouble deciding in which box this loop would
live: parser, client or (unlikely) renderer.

> Then, next in the control structure, is a call on Print.
> Print takes two arguments whose run-time type is not known.
> By design, Print renders any kind of token into any format.
> Print will therefore look at both X and Format, and find out what
> they are (inspect their run-time tags).
> Both X and Format could share O-O "interfaces", respectively.
> Or X may be a variant record, the variants reflecting the different
> kinds of token.

The last sentence above was exactly what opened my eyes to another type
of design, beyond the record of callbacks I already knew. Huge thanks
for that.

Here is that second design, which I think is slightly different from
your proposition, but at least I feel I understand it well enough to
actually implement it.

    --  Assumes "with Ada.Strings.Fixed; with Ada.Strings.Unbounded;"
    --  and "use Ada.Strings.Unbounded;".

    type Token_Kind is (Normal_Text, Emphasis, Header, Paragraph, Rule);

    subtype Level_Range is Positive range 1 .. 6;  --  assumed: h1 .. h6

    --  A component name may appear only once in a record, even across
    --  variants, and an unconstrained String cannot be a component.
    type Token (Kind : Token_Kind := Normal_Text) is record
       case Kind is
          when Rule =>
             null;
          when others =>
             Contents : Unbounded_String;
             case Kind is
                when Header => Level : Level_Range;
                when others => null;
             end case;
       end case;
    end record;

    --  Now in the renderer:

    function HTML_Tag (Tag : String; Contents : String) return String is
    begin
       return "<" & Tag & ">" & Contents & "</" & Tag & ">";
    end HTML_Tag;

    function HTML_Render (T : Token) return String is
    begin
       case T.Kind is
          when Normal_Text =>
             return To_String (T.Contents);
          when Emphasis =>
             return HTML_Tag ("em", To_String (T.Contents));
          when Header =>
             --  'Image yields a leading space for non-negative values.
             return HTML_Tag
               ("h" & Ada.Strings.Fixed.Trim
                        (Level_Range'Image (T.Level), Ada.Strings.Left),
                To_String (T.Contents));
          when Paragraph =>
             return HTML_Tag ("p", To_String (T.Contents));
          when Rule =>
             return "<hr>";
       end case;
    end HTML_Render;

    function XHTML_Render (T : Token) return String is
    begin
       case T.Kind is
          when Rule =>
             return "<hr />";
          when others =>
             return HTML_Render (T);
       end case;
    end XHTML_Render;

I'm not sure this actually works, there might be some syntax errors or
some other low-level Ada issue; I'm writing this more to convey the
idea than to have something compilable (I'd first learn a lot more about
the actual Ada syntax before bothering you with something that is
supposed to compile).

This is still conceived in a parser-controlled way, with the parser
procedure making the Token objects (hence the mutable variant record,
if I actually managed to write it right), but handling a Token this way
opens up new forms of parsers. For example, the loop you proposed above
could live in the client realm, with a parser that produces one such
Token after the other. Or the loop could go in the renderer, with
(X)HTML_Render being a renderer-internal helper function.

And using something more complex than a String for Contents, like some
kind of list of Tokens, this would allow offline parsing too.
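
To make the "list of Tokens" idea concrete, here is a rough sketch of
what an offline parser could hand over: a tree built from sibling and
child links. The Node type, its Next/Children components and the Render
function are hypothetical names, and the tree is built by hand where a
real parser would build it from the input.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Tree_Demo is

   type Token_Kind is (Normal_Text, Emphasis, Paragraph, Rule);

   type Node (Kind : Token_Kind);
   type Node_Access is access Node;

   type Node (Kind : Token_Kind) is record
      Next : Node_Access;                 --  next sibling, or null
      case Kind is
         when Normal_Text =>
            Text : Unbounded_String;
         when Emphasis | Paragraph =>
            Children : Node_Access;       --  first child, or null
         when Rule =>
            null;
      end case;
   end record;

   --  Renders a node, its children, and all following siblings.
   function Render (N : Node_Access) return String is
   begin
      if N = null then
         return "";
      end if;
      case N.Kind is
         when Normal_Text =>
            return To_String (N.Text) & Render (N.Next);
         when Emphasis =>
            return "<em>" & Render (N.Children) & "</em>" & Render (N.Next);
         when Paragraph =>
            return "<p>" & Render (N.Children) & "</p>" & Render (N.Next);
         when Rule =>
            return "<hr>" & Render (N.Next);
      end case;
   end Render;

   --  Hand-built tree for: a paragraph holding "Hello, " and *world*.
   World_Node : constant Node_Access :=
     new Node'(Kind => Normal_Text, Next => null,
               Text => To_Unbounded_String ("world"));
   Em_Node    : constant Node_Access :=
     new Node'(Kind => Emphasis, Next => null, Children => World_Node);
   Hello_Node : constant Node_Access :=
     new Node'(Kind => Normal_Text, Next => Em_Node,
               Text => To_Unbounded_String ("Hello, "));
   Doc        : constant Node_Access :=
     new Node'(Kind => Paragraph, Next => null, Children => Hello_Node);

begin
   Ada.Text_IO.Put_Line (Render (Doc));
end Tree_Demo;
```

The whole document exists before any rendering starts, so the renderer
is free to walk it in whatever order it likes.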

And for more complex renderers, instead of a huge Case, I could use a
map linking Token_Kind to the correct function, or something like that.
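
One possible shape for such a map is an array indexed by Token_Kind
whose elements are access-to-function values. This is only a sketch:
the handler profile is simplified to a single String, and every name
below (Render_Function, Render_Table, the Render_* handlers) is invented
for the example.

```ada
with Ada.Text_IO;

procedure Table_Demo is

   type Token_Kind is (Normal_Text, Emphasis, Paragraph, Rule);

   --  Simplified, hypothetical handler profile: each handler receives
   --  the already extracted contents of the token.
   type Render_Function is
     access function (Contents : String) return String;

   function Render_Text (Contents : String) return String is
   begin
      return Contents;
   end Render_Text;

   function Render_Emphasis (Contents : String) return String is
   begin
      return "<em>" & Contents & "</em>";
   end Render_Emphasis;

   function Render_Paragraph (Contents : String) return String is
   begin
      return "<p>" & Contents & "</p>";
   end Render_Paragraph;

   --  Rules carry no contents, so the argument is ignored.
   function Render_HTML_Rule (Contents : String) return String is
   begin
      return "<hr>";
   end Render_HTML_Rule;

   function Render_XHTML_Rule (Contents : String) return String is
   begin
      return "<hr />";
   end Render_XHTML_Rule;

   type Render_Table is array (Token_Kind) of Render_Function;

   HTML : constant Render_Table :=
     (Normal_Text => Render_Text'Access,
      Emphasis    => Render_Emphasis'Access,
      Paragraph   => Render_Paragraph'Access,
      Rule        => Render_HTML_Rule'Access);

   --  Same handlers as HTML, except for Rule.
   XHTML : constant Render_Table :=
     (Normal_Text => Render_Text'Access,
      Emphasis    => Render_Emphasis'Access,
      Paragraph   => Render_Paragraph'Access,
      Rule        => Render_XHTML_Rule'Access);

   function Render
     (Table : Render_Table; Kind : Token_Kind; Contents : String)
      return String is
   begin
      return Table (Kind).all (Contents);
   end Render;

begin
   Ada.Text_IO.Put_Line (Render (HTML, Emphasis, "hello"));
   Ada.Text_IO.Put_Line (Render (XHTML, Rule, ""));
end Table_Demo;
```

Compared with the big Case, the table makes it explicit that the two
renderers share every handler but one.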

A whole new world of designs is opening up to me \o/

> In the C style O-O you mentioned, Print would do that by
> calling functions available through function pointers in
> both X's struct and Format's struct.  Or, to learn about X,
> Print could just look at a distinguishing field  of X.

Actually I don't understand that part yet.

Basically, the way I see it, the whole problem of designing an interface
between a parser and a renderer can be reduced to how to encode the
semantics that the parser extracts from the input text, and hand them
over to the renderer, which encodes them into the output text.

So my very first design used the choice of callback by the parser to
encode the semantics, with the callback arguments being the parameters
associated with the type of token being processed. There is obviously
one set of callbacks, and that set is conceptually indexed by Markdown
feature. So there is a need for only one dispatch table (per renderer),
which means an abstract tagged type or an interface, and one level of
inheritance. The parser or the client wouldn't have such tables.

Now with my recent enlightenment, I can see the semantics encoded in the
variant record, with parameters stored inside that object. But then I
don't feel the need for any dynamic dispatch -- unless you consider my
big Case as a kind of dynamic dispatch, but no matter what, at some
point the renderer will have to "decide" which actual printing function
to use.

However, I still don't see why I would need two sets of dynamic dispatch
tables...

> Your idea seems different in that the parser will call the rendering
> procedures whenever *it* thinks it has something ready to be rendered.
> That's fine, I think.

Indeed, I think so too. I was just worried by the fact that I wasn't
able to come up with anything I could implement with a flow controlled
by the client or the renderer. However, I can't think of any actual
situation where the parser must not control the flow, at least for
Markdown-to-whatever-markup processing.

> Keeping your current design, just for the record,  you could establish
> an interface in the logical sense,
>
>     type Renderer_Callbacks is abstract tagged private;
>
>     function Emphasis
>        (This: Renderer_Callbacks; Contents: String) return String
>     is abstract;
>
>     function Normal_Text
>        (This : Renderer_Callbacks; Contents: String) return String
>     is abstract;
>
>     function Paragraph
>        (This : Renderer_Callbacks; Contents: String) return String
>     is abstract;
>
> Add derive types that will override behavior for
> different output formats. Then
>
>     function Parser
>       (R : Renderer_Callbacks'Class; Input : String) return String;
>
> The calls within Parser will be dispatching, depending on the
> type of the actual renderer object.

Yes, so far this is inside the realm of what I can understand.
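
For completeness, a small sketch of how the derived types and the
dispatching calls could fit together. It assumes a null record instead
of the private type so that everything fits in one file; HTML_Renderer,
Plain_Renderer and the toy body of Parser are hypothetical.

```ada
with Ada.Text_IO;

procedure Dispatch_Demo is

   package Renderers is
      type Renderer_Callbacks is abstract tagged null record;

      function Emphasis
        (This : Renderer_Callbacks; Contents : String) return String
      is abstract;

      function Paragraph
        (This : Renderer_Callbacks; Contents : String) return String
      is abstract;

      --  One derived type per output format.
      type HTML_Renderer is new Renderer_Callbacks with null record;

      overriding function Emphasis
        (This : HTML_Renderer; Contents : String) return String;
      overriding function Paragraph
        (This : HTML_Renderer; Contents : String) return String;

      type Plain_Renderer is new Renderer_Callbacks with null record;

      overriding function Emphasis
        (This : Plain_Renderer; Contents : String) return String;
      overriding function Paragraph
        (This : Plain_Renderer; Contents : String) return String;
   end Renderers;

   package body Renderers is
      function Emphasis
        (This : HTML_Renderer; Contents : String) return String is
      begin
         return "<em>" & Contents & "</em>";
      end Emphasis;

      function Paragraph
        (This : HTML_Renderer; Contents : String) return String is
      begin
         return "<p>" & Contents & "</p>";
      end Paragraph;

      function Emphasis
        (This : Plain_Renderer; Contents : String) return String is
      begin
         return "*" & Contents & "*";
      end Emphasis;

      function Paragraph
        (This : Plain_Renderer; Contents : String) return String is
      begin
         return Contents;
      end Paragraph;
   end Renderers;

   --  Toy body (hypothetical): one emphasised paragraph. Both calls
   --  dispatch on the run-time tag of R.
   function Parser
     (R : Renderers.Renderer_Callbacks'Class; Input : String)
      return String is
   begin
      return Renderers.Paragraph (R, Renderers.Emphasis (R, Input));
   end Parser;

   HTML  : Renderers.HTML_Renderer;
   Plain : Renderers.Plain_Renderer;

begin
   Ada.Text_IO.Put_Line (Parser (HTML, "hi"));
   Ada.Text_IO.Put_Line (Parser (Plain, "hi"));
end Dispatch_Demo;
```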

> And yes, if you don't start the hierarchy from reusable
> functions that do not need to be overridden, there can be
> code bloat.

This is the tough spot: I haven't been able to find out how to avoid
code bloat in the situation I described (with three sets of extensions
and two markup targets).

> This would not be the case with generics, which you
> instantiate with whatever reusable functions you like!

Indeed, and that's why I'm still leaning towards the generics, record
of access-to-subprogram, or variant record approaches :-)


Thanks a lot for your help,
Natasha