Parser interface design

comp.lang.ada

* Parser interface design
@ 2011-04-06 10:11 Natasha Kerensikova
  2011-04-06 12:17 ` Georg Bauhaus
                   ` (5 more replies)
  0 siblings, 6 replies; 37+ messages in thread
From: Natasha Kerensikova @ 2011-04-06 10:11 UTC


Hello,

Before taking up too much of anybody's time, I want to make it clear that
I'm asking about interface design in Ada which might never turn into
real code. I'm still unsure about ever writing Ada code, and how badly
my previous thread went plays no small part in that. However, I'm still
curious about how this particular problem might be solved in Ada.

A while back I wrote a C library to parse Markdown text. I call "parser"
the code that takes a text assumed to be formatted in Markdown as input,
and communicates in a still-to-be-defined way with another component,
which I call "renderer", and which outputs the same text but using
another formatting, for example HTML or PDF.

The problem on which I want your opinion is designing the interface
between the parser and the renderer. The point is to be able to "plug"
any renderer into the parser, and thereby obtain a different kind of
output, with as little code rewrite as possible (hence the idea of
refactoring the "parser" into a re-usable part).

I can see three kinds of interfaces:
  * an "offline" interfacing, where the parser returns an in-memory
    abstract representation of the input, which is then processed by the
    renderer to produce its output;
  * an "event-based" interfacing, which is basically an "online"
    renderer-driven interfacing: the parser is first fed the input text,
    and then various functions allow querying its state, the current
    "event" (an event being "that kind of element has been encountered", or
    "the input file is over"), the event parameters, etc.;
  * a "callback-based" interfacing, which is an "online" parser-driven
    interfacing, where the parser is provided both the input text and a
    set of callbacks from the renderer.
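The second, event-based option can be sketched in Ada as a parser object that the renderer polls for events. This is only a sketch under stated assumptions: Event_Kind, Pull_Parser and Next_Event are hypothetical names, and the toy rule that each '*' toggles emphasis stands in for real Markdown parsing.

```ada
with Ada.Text_IO;

procedure Pull_Demo is
   --  Hypothetical events for a toy subset: plain text, and emphasis
   --  delimited by '*'.
   type Event_Kind is (Text, Emphasis_Start, Emphasis_End, End_Of_Input);

   --  The parser object keeps its own state; the renderer pulls events.
   type Pull_Parser (Length : Natural) is record
      Input    : String (1 .. Length);
      Position : Positive := 1;
      In_Emph  : Boolean := False;
   end record;

   --  Advances P and reports the next event. For Text events, the
   --  event text is P.Input (First .. Last).
   procedure Next_Event
     (P           : in out Pull_Parser;
      Kind        : out Event_Kind;
      First, Last : out Natural) is
   begin
      First := P.Position;
      if P.Position > P.Length then
         Kind := End_Of_Input;
         Last := P.Length;
      elsif P.Input (P.Position) = '*' then
         if P.In_Emph then
            Kind := Emphasis_End;
         else
            Kind := Emphasis_Start;
         end if;
         P.In_Emph  := not P.In_Emph;
         Last       := P.Position;
         P.Position := P.Position + 1;
      else
         Kind := Text;
         Last := P.Position;
         while Last < P.Length and then P.Input (Last + 1) /= '*' loop
            Last := Last + 1;
         end loop;
         P.Position := Last + 1;
      end if;
   end Next_Event;

   Input  : constant String := "foo *bar* baz";
   Parser : Pull_Parser := (Length   => Input'Length, Input => Input,
                            Position => 1, In_Emph => False);
   Kind        : Event_Kind;
   First, Last : Natural;
begin
   --  The renderer (here an HTML-ish one) drives the loop.
   loop
      Next_Event (Parser, Kind, First, Last);
      case Kind is
         when Text           => Ada.Text_IO.Put (Parser.Input (First .. Last));
         when Emphasis_Start => Ada.Text_IO.Put ("<em>");
         when Emphasis_End   => Ada.Text_IO.Put ("</em>");
         when End_Of_Input   => exit;
      end case;
   end loop;
   Ada.Text_IO.New_Line;  --  output: foo <em>bar</em> baz
end Pull_Demo;
```

The renderer owns the control flow here, which is what distinguishes this option from the callback-based one.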

I can see how to go forward to a real implementation only for the
third kind, which is what I will discuss below. While the different merits
(both intrinsic and with regard to implementing them in Ada) of these
kinds interest me, it's a secondary question compared to the following
one.

Assuming the third kind is chosen, the C interface (which I already
coded a while back) is easy to come up with: a bunch of function
pointers and a void pointer for renderer-defined state, put together in
a struct.

I guess the same thing can be done in Ada, except it would be a record
of access-to-procedure components (and the renderer state has to come
from elsewhere). But is it the best way of doing it?

I thought the most idiomatic way of putting together a bunch of
callbacks in Ada would be to use an Interface, and then rely on dynamic
dispatch. This also provides a nice way to embed the renderer state, in
the tagged type that implements the Interface.
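A minimal sketch of that idea, assuming a two-callback subset of the interface: the renderer state (the output built so far) is a component of HTML_Renderer, the tagged type implementing the interface. All names are hypothetical, and Parse hard-codes the callback sequence for "foo *bar* baz" instead of really parsing.

```ada
with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Interface_Demo is

   package Renderers is
      --  Hypothetical callback set, reduced to two operations.
      type Renderer is interface;
      procedure Normal_Text (R : in out Renderer; Text : String) is abstract;
      procedure Emphasis (R : in out Renderer; Text : String) is abstract;

      --  A concrete renderer: its state is a component of the tagged
      --  type implementing the interface.
      type HTML_Renderer is new Renderer with record
         Output : Unbounded_String;
      end record;
      overriding procedure Normal_Text
        (R : in out HTML_Renderer; Text : String);
      overriding procedure Emphasis
        (R : in out HTML_Renderer; Text : String);
   end Renderers;

   package body Renderers is
      overriding procedure Normal_Text
        (R : in out HTML_Renderer; Text : String) is
      begin
         Append (R.Output, Text);
      end Normal_Text;

      overriding procedure Emphasis
        (R : in out HTML_Renderer; Text : String) is
      begin
         Append (R.Output, "<em>" & Text & "</em>");
      end Emphasis;
   end Renderers;

   use Renderers;

   --  Stand-in for the parser: it only sees Renderer'Class, so every
   --  call below dispatches to whatever renderer the client passed.
   procedure Parse (R : in out Renderer'Class) is
   begin
      Normal_Text (R, "foo ");
      Emphasis (R, "bar");
      Normal_Text (R, " baz");
   end Parse;

   HTML : HTML_Renderer;
begin
   Parse (HTML);
   Ada.Text_IO.Put_Line (To_String (HTML.Output));  --  foo <em>bar</em> baz
end Interface_Demo;
```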

Now when thinking a bit deeper about it, Ada interfaces seemed to have a
few drawbacks compared to the first C-ish record idea, as far as code
re-use goes. It felt like it boiled down to extension vs composition,
but I might be wrong on this and using terms I don't fully understand.

The C implementation I mentioned included a few example renderers, which
target either HTML or XHTML, and which implement either vanilla
Markdown, or some Discount extensions on top of it, or some personal
extensions on previous Discount extensions.

This means a total of 6 example renderers. Some callbacks are the same
for all of them, some callbacks are specific to HTML or XHTML but used
for all extension sets, some callbacks are specific to an extension set
but independent of the output format, and a few callbacks are specific
to a combination of both target and extension set.

The C or Ada record approaches seem to handle it nicely by defining each
callback once, and then selecting them for each example renderer
depending on the chosen features.
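A sketch of the record approach with three hypothetical callbacks: Emphasis is shared by every renderer, Line_Break depends on the output format, and Superscript only exists in an extension set (a null component meaning the feature is absent). Each callback is written once; each renderer is just an aggregate selecting the ones matching its features.

```ada
with Ada.Text_IO;

procedure Record_Demo is
   type Callback is access function (Text : String) return String;

   --  Hypothetical subset of the callback record.
   type Renderer_Callbacks is record
      Emphasis    : Callback;  --  same for every renderer
      Line_Break  : Callback;  --  depends on the output format
      Superscript : Callback;  --  extension only; null = not supported
   end record;

   --  Each callback is written exactly once ...
   function Emphasis (Text : String) return String is
   begin
      return "<em>" & Text & "</em>";
   end Emphasis;

   function HTML_Break (Text : String) return String is
   begin
      return Text & "<br>";
   end HTML_Break;

   function XHTML_Break (Text : String) return String is
   begin
      return Text & "<br />";
   end XHTML_Break;

   function Superscript (Text : String) return String is
   begin
      return "<sup>" & Text & "</sup>";
   end Superscript;

   --  ... and each renderer selects the callbacks matching its features.
   HTML_Vanilla : constant Renderer_Callbacks :=
     (Emphasis    => Emphasis'Access,
      Line_Break  => HTML_Break'Access,
      Superscript => null);
   XHTML_Extended : constant Renderer_Callbacks :=
     (Emphasis    => Emphasis'Access,
      Line_Break  => XHTML_Break'Access,
      Superscript => Superscript'Access);
begin
   Ada.Text_IO.Put_Line (HTML_Vanilla.Line_Break ("end"));    --  end<br>
   Ada.Text_IO.Put_Line (XHTML_Extended.Line_Break ("end"));  --  end<br />
   Ada.Text_IO.Put_Line (XHTML_Extended.Superscript ("2"));   --  <sup>2</sup>
end Record_Demo;
```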

The approach using tagged types implementing an interface seems heavier.
Part of the problem is single inheritance: one hierarchy has to be
chosen, either HTML/XHTML or extension sets, and the other feature has
to be reimplemented independently for each instance of the chosen
feature. This sounds bad as far as code reuse goes, and heavy to write
and to maintain, at least compared to the simple record idea.

However I have the feeling that even with multiple inheritance the
tagged type approach doesn't fare much better, because I can only come
up with clumsy and heavy ways to specify from which parent each of the
callbacks comes (because that is what it boils down to).

I also thought of implementing each callback once, as standalone
procedures, and then using primitive operations of the tagged type to
call them, but that seems very messy too.

So what would be the best approach to interface a parser and a renderer?


Thanks in advance for your ideas,
Natasha




* Re: Parser interface design
  2011-04-06 10:11 Parser interface design Natasha Kerensikova
@ 2011-04-06 12:17 ` Georg Bauhaus
  2011-04-07 18:56   ` Natasha Kerensikova
  2011-04-06 12:20 ` Dmitry A. Kazakov
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 37+ messages in thread
From: Georg Bauhaus @ 2011-04-06 12:17 UTC


On 06.04.11 12:11, Natasha Kerensikova wrote:

> I can see three kinds of interfaces:
>   * an "offline" interfacing, where the parser returns an in-memory
>     abstract representation of the input, which is then processed by the
>     renderer to produce its output;
>   * an "event-based" interfacing, which is basically an "online"
>     renderer-driven interfacing: the parser is first fed the input text,
>     and then various functions allow querying its state, the current
>     "event" (an event being "that kind of element has been encountered", or
>     "the input file is over"), the event parameters, etc.;
>   * a "callback-based" interfacing, which is an "online" parser-driven
>     interfacing, where the parser is provided both the input text and a
>     set of callbacks from the renderer.

Before offering an answer to the other questions, I'd like to
throw in a fourth option:

 * use a task that serves as an event queue. The task will collect
parsing events and forward them as necessary.  The parser is then
completely ignorant of any rendering.

As there are several input formats, one might either activate the
task matching the input format, or make a task exhibit different
behavior depending on the input format.
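A rough sketch of what such a task might look like, assuming a hypothetical Event_Kind type and an HTML-ish renderer. It uses a plain rendezvous, so the parser waits until each event has been accepted; a real event queue would add buffering (for instance a protected object between parser and task). The "parser" here is just the main program emitting the events for "foo *bar* baz".

```ada
with Ada.Text_IO;

procedure Task_Demo is
   --  Hypothetical events a parser could emit.
   type Event_Kind is (Text, Emphasis_Start, Emphasis_End, Done);

   --  The renderer task: the parser only calls Put_Event and knows
   --  nothing about how (or whether) events are rendered.
   task HTML_Renderer is
      entry Put_Event (Kind : Event_Kind; Contents : String);
   end HTML_Renderer;

   task body HTML_Renderer is
      Finished : Boolean := False;
   begin
      while not Finished loop
         accept Put_Event (Kind : Event_Kind; Contents : String) do
            case Kind is
               when Text           => Ada.Text_IO.Put (Contents);
               when Emphasis_Start => Ada.Text_IO.Put ("<em>");
               when Emphasis_End   => Ada.Text_IO.Put ("</em>");
               when Done           =>
                  Ada.Text_IO.New_Line;
                  Finished := True;
            end case;
         end accept;
      end loop;
   end HTML_Renderer;

begin
   --  Stand-in for the parser: the events for "foo *bar* baz".
   HTML_Renderer.Put_Event (Text, "foo ");
   HTML_Renderer.Put_Event (Emphasis_Start, "");
   HTML_Renderer.Put_Event (Text, "bar");
   HTML_Renderer.Put_Event (Emphasis_End, "");
   HTML_Renderer.Put_Event (Text, " baz");
   HTML_Renderer.Put_Event (Done, "");
end Task_Demo;
```

Swapping the rendering then means starting a different task body behind the same entry, without touching the code that emits events.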





* Re: Parser interface design
  2011-04-06 10:11 Parser interface design Natasha Kerensikova
  2011-04-06 12:17 ` Georg Bauhaus
@ 2011-04-06 12:20 ` Dmitry A. Kazakov
  2011-04-07 19:14   ` Natasha Kerensikova
  2011-04-06 15:51 ` Georg Bauhaus
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 37+ messages in thread
From: Dmitry A. Kazakov @ 2011-04-06 12:20 UTC

On Wed, 6 Apr 2011 10:11:46 +0000 (UTC), Natasha Kerensikova wrote:

> I thought the most idiomatic way of putting together a bunch of
> callbacks in Ada would be to use an Interface, and then rely on dynamic
> dispatch.

It also can be an abstract base type with primitive operations defined
null. The advantage is that you can have a "null function" and non-null
implementations with a type, which you cannot with an interface.
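A minimal sketch of that suggestion, with hypothetical names: Renderer is an abstract tagged type whose primitive functions have real default bodies (an Ada 2005 interface only allows abstract operations and null procedures), Plain keeps all the defaults, and HTML overrides only Emphasis.

```ada
with Ada.Text_IO;

procedure Base_Type_Demo is

   package Renderers is
      --  Abstract base type instead of an interface: primitives may
      --  have default bodies, including functions.
      type Renderer is abstract tagged null record;
      function Normal_Text (R : Renderer; Text : String) return String;
      function Emphasis (R : Renderer; Text : String) return String;

      --  Uses every default.
      type Plain is new Renderer with null record;

      --  Overrides only what differs from the defaults.
      type HTML is new Renderer with null record;
      overriding function Emphasis (R : HTML; Text : String) return String;
   end Renderers;

   package body Renderers is
      function Normal_Text (R : Renderer; Text : String) return String is
      begin
         return Text;
      end Normal_Text;

      function Emphasis (R : Renderer; Text : String) return String is
      begin
         return Text;  --  default: emphasis rendered as plain text
      end Emphasis;

      overriding function Emphasis (R : HTML; Text : String) return String is
      begin
         return "<em>" & Text & "</em>";
      end Emphasis;
   end Renderers;

   use Renderers;

   --  Parser stand-in: dispatches on whatever renderer it is given.
   function Render (R : Renderer'Class) return String is
   begin
      return Normal_Text (R, "foo ") & Emphasis (R, "bar");
   end Render;

   P : Plain;
   H : HTML;
begin
   Ada.Text_IO.Put_Line (Render (P));  --  foo bar
   Ada.Text_IO.Put_Line (Render (H));  --  foo <em>bar</em>
end Base_Type_Demo;
```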

> The approach using tagged types implementing an interface seems heavier.

You should not try to pack everything into one type hierarchy. The parser
and renderer should likely be two different hierarchies glued together
through a mix-in in a third object aggregating both implementations.
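One possible reading of this advice, sketched with hypothetical names: a Renderer hierarchy and a Parser hierarchy that only meet through Renderer'Class, plus a third type, Converter, aggregating one implementation of each. Star_Parser is a toy that treats text between '*' characters as emphasis and nothing else.

```ada
with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Aggregate_Demo is

   package Renderers is
      type Renderer is interface;
      procedure Text (R : in out Renderer; S : String) is abstract;
      procedure Emphasis (R : in out Renderer; S : String) is abstract;

      --  One concrete renderer; it collects HTML in a buffer.
      type HTML_Buffer is new Renderer with record
         Output : Unbounded_String;
      end record;
      overriding procedure Text (R : in out HTML_Buffer; S : String);
      overriding procedure Emphasis (R : in out HTML_Buffer; S : String);
   end Renderers;

   package body Renderers is
      overriding procedure Text (R : in out HTML_Buffer; S : String) is
      begin
         Append (R.Output, S);
      end Text;

      overriding procedure Emphasis (R : in out HTML_Buffer; S : String) is
      begin
         Append (R.Output, "<em>" & S & "</em>");
      end Emphasis;
   end Renderers;

   package Parsers is
      --  Parsers only know the Renderer interface, no concrete renderer.
      type Parser is interface;
      procedure Parse (P : in out Parser; Input : String;
                       R : in out Renderers.Renderer'Class) is abstract;

      type Star_Parser is new Parser with null record;
      overriding procedure Parse (P : in out Star_Parser; Input : String;
                                  R : in out Renderers.Renderer'Class);
   end Parsers;

   package body Parsers is
      overriding procedure Parse (P : in out Star_Parser; Input : String;
                                  R : in out Renderers.Renderer'Class) is
         Start   : Positive := Input'First;
         In_Emph : Boolean := False;
      begin
         for I in Input'Range loop
            if Input (I) = '*' then
               if In_Emph then
                  R.Emphasis (Input (Start .. I - 1));
               else
                  R.Text (Input (Start .. I - 1));
               end if;
               In_Emph := not In_Emph;
               Start := I + 1;
            end if;
         end loop;
         R.Text (Input (Start .. Input'Last));
      end Parse;
   end Parsers;

   --  The third object, aggregating one implementation of each hierarchy.
   type Converter is record
      P : Parsers.Star_Parser;
      R : Renderers.HTML_Buffer;
   end record;

   C : Converter;
begin
   C.P.Parse ("foo *bar* baz", C.R);
   Ada.Text_IO.Put_Line (To_String (C.R.Output));  --  foo <em>bar</em> baz
end Aggregate_Demo;
```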

> So what would be the best approach to interface a parser and a renderer?

The parsers I implemented used primitive operations as semantic callbacks.
As for the renderer, I did one built upon GtkSourceView buffer. But that
had its own parser, so there was no interaction between them.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


* Re: Parser interface design
  2011-04-06 10:11 Parser interface design Natasha Kerensikova
  2011-04-06 12:17 ` Georg Bauhaus
  2011-04-06 12:20 ` Dmitry A. Kazakov
@ 2011-04-06 15:51 ` Georg Bauhaus
  2011-04-07 19:44   ` Natasha Kerensikova
  2011-04-07  0:36 ` Randy Brukardt
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 37+ messages in thread
From: Georg Bauhaus @ 2011-04-06 15:51 UTC


On 06.04.11 12:11, Natasha Kerensikova wrote:

> I thought the most idiomatic way of putting together a bunch of
> callbacks in Ada would be to use an Interface, and then rely on dynamic
> dispatch. This also provides a nice way to embed the renderer state, in
> the tagged type that implements the Interface.

Can be done, and one example is given further below.

Another way is to use "generic interfaces" for callback
communication, a more traditional way I think. For example,

generic
  type T is private;
  type List_of_T is array (Positive range <>) of T;
  with function Left_Context (X : T) return String is <>;
  with function Text (X : T) return String is <>;
  with function Right_Context (X : T) return String is <>;
package Printer is

  procedure Print_One (X : T);
  procedure Print_Many (XS : List_of_T);

end Printer;


Here, T is the type of a token. The generic functions need not be
primitive operations, so the parser can be fully decoupled from
rendering. After a Printer is instantiated, say,

  package HTML_Printer is new Printer (
        Left_Context => Opening_Tag,
        etc.);

the parser will call

   HTML_Printer.Print_One (Some_Token);

passing the current token. The Print_One procedure in turn will call
the generic actual functions Left_Context, Text, and Right_Context.
These would be functions rendering the token's content as needed
for HTML.  For example, the actual for Left_Context, i.e. Opening_Tag,
will print a suitable start tag depending on the token passed
to it.

(Another option is to use generic formal packages.)
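A self-contained version of this generic, assuming the token type and the list type become generic formals; Token, Opening_Tag, Closing_Tag and the sample tokens are hypothetical. Only the instantiation knows about HTML, and Text is picked up by the "is <>" default.

```ada
with Ada.Text_IO;

procedure Printer_Demo is

   --  The generic from the post, with the token and list types as formals.
   generic
      type T is private;
      type List_Of_T is array (Positive range <>) of T;
      with function Left_Context (X : T) return String is <>;
      with function Text (X : T) return String is <>;
      with function Right_Context (X : T) return String is <>;
   package Printer is
      procedure Print_One (X : T);
      procedure Print_Many (XS : List_Of_T);
   end Printer;

   package body Printer is
      procedure Print_One (X : T) is
      begin
         Ada.Text_IO.Put (Left_Context (X) & Text (X) & Right_Context (X));
      end Print_One;

      procedure Print_Many (XS : List_Of_T) is
      begin
         for I in XS'Range loop
            Print_One (XS (I));
         end loop;
      end Print_Many;
   end Printer;

   --  Hypothetical token type produced by the parser; no HTML in it.
   type Token_Kind is (Plain, Emphasis);
   type Token is record
      Kind  : Token_Kind;
      Value : Character;
   end record;
   type Token_List is array (Positive range <>) of Token;

   function Text (X : Token) return String is
   begin
      return String'(1 => X.Value);
   end Text;

   --  HTML-specific rendering functions, unknown to the parser.
   function Opening_Tag (X : Token) return String is
   begin
      if X.Kind = Emphasis then
         return "<em>";
      end if;
      return "";
   end Opening_Tag;

   function Closing_Tag (X : Token) return String is
   begin
      if X.Kind = Emphasis then
         return "</em>";
      end if;
      return "";
   end Closing_Tag;

   --  Text is not listed: the "is <>" default picks up Text above.
   package HTML_Printer is new Printer
     (T             => Token,
      List_Of_T     => Token_List,
      Left_Context  => Opening_Tag,
      Right_Context => Closing_Tag);
begin
   --  What the parser would call, for a hypothetical token sequence.
   HTML_Printer.Print_Many (((Plain, 'a'), (Emphasis, 'b'), (Plain, 'c')));
   Ada.Text_IO.New_Line;  --  output: a<em>b</em>c
end Printer_Demo;
```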


Anyway, here is another approach, probably not very original,
certainly lacking proper modularization and other things
that I didn't see, sorry, but it offers another idea.  There aren't
any generics in it.

But there is a procedure

    procedure Print (Item : Token'Class;
                     Output : in out Format'Class);

The idea is that given any token, Print renders the token
Item into any format passed for Output.

The example assumes that there is a parser capable of producing
values of types in Token'Class. Funny_Token names one such type.
The interface of type Token does not really reach into parsing,
although for the sake of this example, there is a ubiquitous
type Printable that connects type Token and Format. (As
Dmitry said, I'm using the type system here; but maybe---since
everything is just abstract---the coupling is tolerable.)

The test case is written from the perspective of a controlling
program which could be the parser or some program that drives
the parser.

$ ./test_rendering
<pre>=</pre>(pre =)
$

with Rendering.XHTML, Rendering.Parentheses;

procedure Test_Rendering is
    use Rendering;

    X1: Funny_Token(Eq);                --  a token
    Web : XHTML.XHTML;                  --  a format
    Computer : Parentheses.LList;       --  another format
begin
    Print (X1, Web);
    Print (X1, Computer);
end Test_Rendering;

-- Thus, calling Print with a token renders it into
-- the format specified by the second parameter.


package Rendering is

    -- only Printables can be rendered. Ubiquitous type:

    type Printable is abstract tagged null record;
    function Image (P : Printable) return String is abstract;


    --  establish a maximum number of different kinds of token:
    type Token_Kind is range 0 .. 1_000;

    type Token(Kind : Token_Kind) is abstract tagged null record;
    function Contents (T: Token) return Printable'Class is abstract;
    --  what is in T, for rendering

    -- some real tokens:

    subtype Funny_Token_Kind is Token_Kind range 1 .. 2;
    Star : constant Funny_Token_Kind := 1;
    Eq : constant Funny_Token_Kind := 2;
    type Funny_Token(Kind : Funny_Token_Kind) is new Token(Kind) with
      null record;
    overriding function Contents (T: Funny_Token) return Printable'Class;

    -- common procedures for rendering into a parenthesized format:

    type Format is abstract tagged null record;
    procedure Tag_O(T : Token'Class; Output : in out Format) is abstract;
    procedure Tag_C(T : Token'Class; Output : in out Format) is abstract;
    --  output opening and closing "brackets", e.g. HTML tags

    -- Finally, every token can be rendered into any format:

    procedure Print (Item : Token'Class;
                     Output : in out Format'Class);
end Rendering;


with Write_String;
package body Rendering is

    procedure Print(Item : Token'Class; Output : in out Format'Class) is
        Value : constant Printable'Class := Contents (Item);
        --  token content
    begin
        Tag_O(Item, Output);
        Write_String(Image(Value));
        Tag_C(Item, Output);
    end Print;

    --  provide for printable versions of Funny_Tokens:

    type Funny_Printable is new Printable with record
        Repr : Character;
    end record;

    overriding function Image (P : Funny_Printable) return String is
      -- String representation of (presumably) a Funny_Token
    begin
        return String'(1 => P.Repr);
    end Image;

    function Contents (T: Funny_Token) return Printable'Class is
    begin
        case T.Kind is
            when Star => return Funny_Printable'(Repr => '*');
            when Eq => return Funny_Printable'(Repr => '=');
        end case;
    end Contents;

end Rendering;


package Rendering.XHTML is

    --  Overridings for XHTML output.  Each token's representation is
    --  placed within XHTML tags.  For example, a Star kind of input
    --  token might look like:
    --
    --   "<pre>*</pre>"

    type HTML_Tag is new Printable with private;
    overriding function Image (P : HTML_Tag) return String;

    type XHTML is new Format with
        record
            null;                       --  config data?
        end record;
    procedure Emit_Tag (Paren : HTML_Tag'Class; Output : in out XHTML);

    overriding
    procedure Tag_O(T : Token'Class; Output : in out XHTML);
    overriding
    procedure Tag_C(T : Token'Class; Output : in out XHTML);

private
    type Name_Ref is access constant String;
    type HTML_Tag is new Printable with record
        Is_Opening : Boolean;
        Name : Name_Ref;
    end record;
end Rendering.XHTML;


package Rendering.Parentheses is

    --  Overridings for parenthesized output.  Output format is
    --  intended to look something like this (for a Star kind of
    --  input token):
    --
    --   "(pre *)"

    type Label is new Printable with private;
    overriding function Image (P : Label) return String;

    type LList is new Format with
        record
            null;                       --  config data?
        end record;
    procedure Emit_Label (Paren : Label'Class; Output : in out LList);

    overriding
    procedure Tag_O(T : Token'Class; Output : in out LList);
    overriding
    procedure Tag_C(T : Token'Class; Output : in out LList);

private
    type Name_Ref is access constant String;
    type Label is new Printable with record
        Name : Name_Ref;
    end record;
end Rendering.Parentheses;



with Write_String;
package body Rendering.XHTML is


    --  Build a table that associates a Printable (of tag names) to be
    --  placed around each possible kind of token for rendering.

    Pre_name : aliased constant String := "pre";
    Code_name : aliased constant String := "code";
    Default_Name : aliased constant String := "";

    Tags : constant array (Token_Kind) of HTML_Tag :=
      (Eq => HTML_Tag'(Is_Opening => True, Name => Pre_Name'Access),
       Star => HTML_Tag'(Is_Opening => True, Name => Code_Name'Access),
       others => HTML_Tag'(Is_Opening => True, Name => Default_Name'Access));

    procedure Tag_O(T : Token'Class; Output : in out XHTML) is
        To_Be_Put : HTML_Tag := Tags(T.Kind);
    begin
        To_Be_Put.Is_Opening := True;
        Emit_Tag(To_Be_Put, Output);
    end Tag_O;

    procedure Tag_C(T : Token'Class; Output : in out XHTML) is
        To_Be_Put : HTML_Tag := Tags(T.Kind);
    begin
        To_Be_Put.Is_Opening := False;
        Emit_Tag(To_Be_Put, Output);
    end Tag_C;

    function Image (P : HTML_Tag) return String is
    begin
        case P.Is_Opening is
            when True => return "<" & P.Name.all & ">";
            when False => return "</" & P.Name.all & ">";
        end case;
    end Image;

    procedure Emit_Tag (Paren : HTML_Tag'Class; Output : in out XHTML) is
    begin
        Write_String(Image(Paren));
    end Emit_Tag;

end Rendering.XHTML;


with Write_String;
package body Rendering.Parentheses is

    --  First, build a table that associates a Printable for labelling
    --  a rendered token, for each possible kind of token.

    Pre_name : aliased constant String := "pre";
    Code_name : aliased constant String := "code";
    Default_Name : aliased constant String := "";

    Labels : constant array (Token_Kind) of Label :=
      (Eq => Label'(Name => Pre_Name'Access),
       Star => Label'(Name => Code_Name'Access),
       others => Label'(Name => Default_Name'Access));

    procedure Tag_O(T : Token'Class; Output : in out LList) is
    begin
        Write_String("(");
        Emit_Label(Labels(T.Kind), Output);
    end Tag_O;

    procedure Tag_C(T : Token'Class; Output : in out LList) is
    begin
        Write_String(")");
    end Tag_C;

    function Image (P : Label) return String is
    begin
        return P.Name.all;
    end Image;

    procedure Emit_Label (Paren : Label'Class; Output : in out LList) is
    begin
        Write_String(Image(Paren) & " ");
    end Emit_Label;

end Rendering.Parentheses;


with Ada.Text_IO;
procedure Write_String (Text : String) is
begin
    Ada.Text_IO.Put(Text);
end Write_String;




* Re: Parser interface design
  2011-04-06 10:11 Parser interface design Natasha Kerensikova
                   ` (2 preceding siblings ...)
  2011-04-06 15:51 ` Georg Bauhaus
@ 2011-04-07  0:36 ` Randy Brukardt
  2011-04-08 11:16 ` Brian Drummond
  2011-04-19  9:08 ` Natasha Kerensikova
  5 siblings, 0 replies; 37+ messages in thread
From: Randy Brukardt @ 2011-04-07  0:36 UTC


"Natasha Kerensikova" <lithiumcat@gmail.com> wrote in message 
news:slrnipof32.2fnq.lithiumcat@sigil.instinctive.eu...
...
> The problem on which I want your opinion is designing the interface
> between the parser and the renderer. The point is to be able to "plug"
> any renderer into the parser, and thereby obtain a different kind of
> output, with as little code rewrite as possible (hence the idea of
> refactoring the "parser" into a re-usable part).

The tool that creates the formatted versions of the Ada Standard (and other 
documents) has pretty much this same problem. It had to take source files 
coded for an obsolete tool (Scribe) and convert them to modern (in 2000) 
formats. I defined a pair of abstract tagged types to serve as the bottom of 
the hierarchy, Input_Type and Output_Type, and then defined a set of 
operations for each. There then are multiple concrete implementations of 
each (file and buffer for Input, Text, HTML, and RTF for Output).

There is a central part of the code that uses dispatching calls to each to 
implement the actual command set of the source text files. (Note that in my 
design, the meaning of the text is fixed, only the source of the text 
changes, while you want a bit more flexibility in the parsing -- which could 
be accomplished by putting more of the parsing into the Input object. In 
hindsight, I probably should have done that as well.)
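That design could be sketched roughly as follows. Input_Type and Output_Type follow the description above, but every operation (Get_Char, Put_Char, Set_Bold), the concrete types and the one-command "language" where '*' toggles bold are hypothetical stand-ins, not taken from the actual formatting tool.

```ada
with Ada.Text_IO;

procedure Format_Tool_Demo is

   package Formatting is
      --  The two abstract roots: where text comes from, where it goes.
      type Input_Type is abstract tagged null record;
      procedure Get_Char (Input  : in out Input_Type;
                          Ch     : out Character;
                          At_End : out Boolean) is abstract;

      type Output_Type is abstract tagged null record;
      procedure Put_Char (Output : in out Output_Type; Ch : Character)
        is abstract;
      procedure Set_Bold (Output : in out Output_Type; On : Boolean)
        is abstract;

      --  Concrete input: an in-memory buffer.
      type Buffer_Input (Length : Natural) is new Input_Type with record
         Data : String (1 .. Length);
         Next : Positive := 1;
      end record;
      overriding procedure Get_Char (Input  : in out Buffer_Input;
                                     Ch     : out Character;
                                     At_End : out Boolean);

      --  Concrete outputs: plain text ignores bold; HTML emits tags
      --  (and reuses Put_Char; a real one would escape '<' and '&').
      type Text_Output is new Output_Type with null record;
      overriding procedure Put_Char (Output : in out Text_Output;
                                     Ch     : Character);
      overriding procedure Set_Bold (Output : in out Text_Output;
                                     On     : Boolean) is null;

      type HTML_Output is new Text_Output with null record;
      overriding procedure Set_Bold (Output : in out HTML_Output;
                                     On     : Boolean);
   end Formatting;

   package body Formatting is
      overriding procedure Get_Char (Input  : in out Buffer_Input;
                                     Ch     : out Character;
                                     At_End : out Boolean) is
      begin
         At_End := Input.Next > Input.Length;
         Ch := ' ';
         if not At_End then
            Ch := Input.Data (Input.Next);
            Input.Next := Input.Next + 1;
         end if;
      end Get_Char;

      overriding procedure Put_Char (Output : in out Text_Output;
                                     Ch     : Character) is
      begin
         Ada.Text_IO.Put (Ch);
      end Put_Char;

      overriding procedure Set_Bold (Output : in out HTML_Output;
                                     On     : Boolean) is
      begin
         if On then
            Ada.Text_IO.Put ("<b>");
         else
            Ada.Text_IO.Put ("</b>");
         end if;
      end Set_Bold;
   end Formatting;

   use Formatting;

   --  The central part: implements the (toy) command set using only
   --  dispatching calls, so it works with any input and any output.
   procedure Process (Input  : in out Input_Type'Class;
                      Output : in out Output_Type'Class) is
      Ch     : Character;
      At_End : Boolean;
      Bold   : Boolean := False;
   begin
      loop
         Get_Char (Input, Ch, At_End);
         exit when At_End;
         if Ch = '*' then
            Bold := not Bold;
            Set_Bold (Output, Bold);
         else
            Put_Char (Output, Ch);
         end if;
      end loop;
      Ada.Text_IO.New_Line;
   end Process;

   Source : Buffer_Input := (Length => 7, Data => "a *b* c", Next => 1);
   Plain  : Text_Output;
   Web    : HTML_Output;
begin
   Process (Source, Plain);  --  a b c
   Source.Next := 1;
   Process (Source, Web);    --  a <b>b</b> c
end Format_Tool_Demo;
```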

You can find the entire source code to the tool at 
http://www.ada-auth.org/arm.html (look for the formatting tool about halfway 
down the page). Note that this was a "working" program not necessarily 
designed for use as an example, so it is pretty large.

                                   Randy.






* Re: Parser interface design
  2011-04-06 12:17 ` Georg Bauhaus
@ 2011-04-07 18:56   ` Natasha Kerensikova
  2011-04-08 11:49     ` Stephen Leake
  0 siblings, 1 reply; 37+ messages in thread
From: Natasha Kerensikova @ 2011-04-07 18:56 UTC

Hello,

On 2011-04-06, Georg Bauhaus <rm.dash-bauhaus@futureapps.de> wrote:
>  * use a task that serves as an event queue. The task will collect
> parsing events and forward them as necessary.  The parser is then
> completely ignorant of any rendering.

I have absolutely no idea what it might even remotely look like. I
guess my C past, where a task means a thread, which means complex,
unreadable and extremely difficult to debug, doesn't help me conceive
of this possibility.

I wouldn't mind a deeper description of this idea, but I don't want to
take up too much of your time.

> As there are several input formats, one might either activate the
> task matching the input format, or make a task exhibit different
> behavior depending on the input format.

I might have been too tainted by my C implementation, but back then I
found that attempting to use another input format, even a close one (like
Textile or Creole), required so much added flexibility and so many
unknowns that I couldn't figure out how to define an interface. So I gave
up on supporting several input formats (except through several
independent projects).

Natasha


* Re: Parser interface design
  2011-04-06 12:20 ` Dmitry A. Kazakov
@ 2011-04-07 19:14   ` Natasha Kerensikova
  2011-04-07 20:31     ` Dmitry A. Kazakov
  0 siblings, 1 reply; 37+ messages in thread
From: Natasha Kerensikova @ 2011-04-07 19:14 UTC

Hello,

On 2011-04-06, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote:
> On Wed, 6 Apr 2011 10:11:46 +0000 (UTC), Natasha Kerensikova wrote:
>
>> I thought the most idiomatic way of putting together a bunch of
>> callbacks in Ada would be to use an Interface, and then rely on dynamic
>> dispatch.
>
> It also can be an abstract base type with primitive operations defined
> null. The advantage is that you can have a "null function" and non-null
> implementations with a type, which you cannot with an interface.

That's an interesting idea, however that prevents a renderer object from
being based on another type, like Controlled or Limited_Controlled -
unless I base the abstract type on them.

The null function fact seems very interesting though. Is it possible to
test externally whether a given dispatched function is null? I'm asking
because in my C implementation, a NULL callback was meant to
communicate to the parser that the associated active character should
no longer be considered active, thereby switching off that particular
feature (which is very different from having a no-op callback).

>> The approach using tagged types implementing an interface seems heavier.
>
> You should not try to pack everything into one type hierarchy. The parser
> and renderer should likely be two different hierarchies glued together
> through a mix-in in a third object aggregating both implementations.

Actually I don't understand why there would be a second or even a third
object in there. Would you mind expanding?

In the example I proposed, the only object was the parser, which is
tagged in order to provide dynamic dispatching. That's what would be in
the renderer package. The parser package would be only a procedure, to
which the client hands over the flow control, and which uses the
callbacks from the tagged instance provided by the client. I don't see
how to get from this to something where the parser and/or the client has
a tagged object.

>> So what would be the best approach to interface a parser and a renderer?
>
> The parsers I implemented used primitive operations as semantic callbacks.

That sounds exactly like what I proposed, doesn't it?

Natasha


* Re: Parser interface design
  2011-04-06 15:51 ` Georg Bauhaus
@ 2011-04-07 19:44   ` Natasha Kerensikova
  2011-04-07 20:52     ` Dmitry A. Kazakov
                       ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Natasha Kerensikova @ 2011-04-07 19:44 UTC

Hello,

On 2011-04-06, Georg Bauhaus <rm.dash-bauhaus@futureapps.de> wrote:
> On 06.04.11 12:11, Natasha Kerensikova wrote:
>
>> I thought the most idiomatic way of putting together a bunch of
>> callbacks in Ada would be to use an Interface, and then rely on dynamic
>> dispatch. This also provides a nice way to embed the renderer state, in
>> the tagged type that implements the Interface.
>
> Can be done, and one example is given further below.
>
> Another way is to use "generic interfaces" for callback
> communication, a more traditional way I think. For example,

I have trouble abstracting concepts out of your examples, so if you don't
mind I will provide my own, which I understand, and maybe from there we
can work out a common ground that I understand.

The direct transposition of my C implementation would be something like

   type Renderer_Callbacks is record
      Emphasis:    access function (Contents: String) return String;
      Normal_Text: access function (Contents: String) return String;
      Paragraph:   access function (Contents: String) return String;
   end record;

   function Parser (Renderer: Renderer_Callbacks; Input: String)
     return String;

(except with 19 callbacks instead of 3)

Now at least I understand the idea of using generics instead:

   generic
      with function Emphasis (Contents: String) return String is <>;
      with function Normal_Text (Contents: String) return String is <>;
      with function Paragraph (Contents: String) return String is <>;
   function Parser (Input: String)
     return String;
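A runnable sketch of the generic variant, assuming a toy rule where only the first "*...*" pair counts as emphasis; the HTML_* callbacks are hypothetical. In this sketch the generic Parser takes only the input text, since the callbacks are fixed at instantiation time.

```ada
with Ada.Text_IO;

procedure Generic_Parser_Demo is

   generic
      with function Emphasis (Contents : String) return String is <>;
      with function Normal_Text (Contents : String) return String is <>;
      with function Paragraph (Contents : String) return String is <>;
   function Parser (Input : String) return String;

   --  Toy parser: only the first "*...*" pair counts as emphasis.
   function Parser (Input : String) return String is
      Open, Close : Natural := 0;
   begin
      for I in Input'Range loop
         if Input (I) = '*' then
            if Open = 0 then
               Open := I;
            elsif Close = 0 then
               Close := I;
            end if;
         end if;
      end loop;

      if Close = 0 then  --  no complete pair: the whole input is plain text
         return Paragraph (Normal_Text (Input));
      end if;

      return Paragraph
        (Normal_Text (Input (Input'First .. Open - 1))
         & Emphasis (Normal_Text (Input (Open + 1 .. Close - 1)))
         & Normal_Text (Input (Close + 1 .. Input'Last)));
   end Parser;

   --  Hypothetical HTML callbacks.
   function HTML_Emphasis (Contents : String) return String is
   begin
      return "<em>" & Contents & "</em>";
   end HTML_Emphasis;

   function HTML_Text (Contents : String) return String is
   begin
      return Contents;  --  a real renderer would escape '<' and '&'
   end HTML_Text;

   function HTML_Paragraph (Contents : String) return String is
   begin
      return "<p>" & Contents & "</p>";
   end HTML_Paragraph;

   function HTML_Parser is new Parser
     (Emphasis    => HTML_Emphasis,
      Normal_Text => HTML_Text,
      Paragraph   => HTML_Paragraph);
begin
   Ada.Text_IO.Put_Line (HTML_Parser ("foo *bar* baz"));
   --  prints <p>foo <em>bar</em> baz</p>
end Generic_Parser_Demo;
```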

However, I am a bit skeptical about whether generics are actually a
better approach than the record of accesses. Generics are certainly a
higher-level feature than records and accesses, but I don't think that's
enough to prefer one over the other. On the other hand, I understand the
criticism of generics posted here (with my apologies for not remembering
who made which arguments), with all the problems caused by the function
not actually existing until it's instantiated.

I believe such generics to be a lesser evil than generic types, and
similarly I believe access to subprograms to be a lesser evil than access
to data. However, I wouldn't be surprised if parser instantiation meant
duplicating the code for each parser. Even if it's the compiler that
actually performs the code duplication, I'm not very comfortable with it.

All in all, I can't find any argument in favor of replacing the record
of accesses with the generic approach.

> Anyway, here is another approach, probably not very original,
> certainly lacking proper modularization and other things
> that I didn't see, sorry, but it offers another idea.  There aren't
> any generics in it.
>
> But there is a procedure
>
>     procedure Print (Item : Token'Class;
>                      Output : in out Format'Class);
>
> The idea is that given any token, Print renders the token
> Item into any format passed for Output.

I genuinely read your example several times, and I can't figure out what
performs what.

Here are the things I do understand: there is a parser, which reads the
input data, there is a renderer, which outputs the formatted data, and
there is a client, which provides the input data and does something
useful with the formatted data. I don't really see where the Print
procedure fits in that picture.

In the example above, things are simple enough for me to understand: the
client hands over control to the parser, along with a reference to the
renderer and the input data. From there, the parser hands over control
to the renderer through callbacks for specific semantic units.

For example, imagine the input "foo *bar* baz". The parser would be
implemented so that the following callback sequence happens:

   part1 = Normal_Text("foo ");
   subpart2 = Normal_Text("bar");
   part2 = Emphasis(subpart2);
   part3 = Normal_Text(" baz");
   result = Paragraph(part1 & part2 & part3);

I mentioned in another post that I used NULL callbacks to turn off a
feature; using this example, if Emphasis is NULL, the star becomes an
inactive character, and the parser would trigger the following callback
sequence instead:

   part = Normal_Text("foo *bar* baz");
   result = Paragraph(part);
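The record-of-accesses design, including the convention that a null callback switches a feature off, translates almost literally to Ada. This sketch makes assumptions: it uses a named access type (Text_Callback) rather than anonymous access components, the HTML_* callbacks are hypothetical, and the parser applies a toy rule where only the first "*...*" pair is emphasis. Both callback sequences above come out of the same Parser.

```ada
with Ada.Text_IO;

procedure Callback_Record_Demo is
   type Text_Callback is access function (Contents : String) return String;

   type Renderer_Callbacks is record
      Emphasis    : Text_Callback;  --  null: '*' is not an active character
      Normal_Text : Text_Callback;
      Paragraph   : Text_Callback;
   end record;

   function Parser (Renderer : Renderer_Callbacks; Input : String)
     return String
   is
      Open, Close : Natural := 0;
   begin
      --  Only look for '*' when the renderer supports emphasis.
      if Renderer.Emphasis /= null then
         for I in Input'Range loop
            if Input (I) = '*' then
               if Open = 0 then
                  Open := I;
               elsif Close = 0 then
                  Close := I;
               end if;
            end if;
         end loop;
      end if;

      if Close = 0 then
         return Renderer.Paragraph (Renderer.Normal_Text (Input));
      end if;

      return Renderer.Paragraph
        (Renderer.Normal_Text (Input (Input'First .. Open - 1))
         & Renderer.Emphasis (Renderer.Normal_Text (Input (Open + 1 .. Close - 1)))
         & Renderer.Normal_Text (Input (Close + 1 .. Input'Last)));
   end Parser;

   function HTML_Emphasis (Contents : String) return String is
   begin
      return "<em>" & Contents & "</em>";
   end HTML_Emphasis;

   function HTML_Text (Contents : String) return String is
   begin
      return Contents;
   end HTML_Text;

   function HTML_Paragraph (Contents : String) return String is
   begin
      return "<p>" & Contents & "</p>";
   end HTML_Paragraph;

   Full : constant Renderer_Callbacks :=
     (HTML_Emphasis'Access, HTML_Text'Access, HTML_Paragraph'Access);
   No_Emphasis : constant Renderer_Callbacks :=
     (null, HTML_Text'Access, HTML_Paragraph'Access);
begin
   Ada.Text_IO.Put_Line (Parser (Full, "foo *bar* baz"));
   --  <p>foo <em>bar</em> baz</p>
   Ada.Text_IO.Put_Line (Parser (No_Emphasis, "foo *bar* baz"));
   --  <p>foo *bar* baz</p>
end Callback_Record_Demo;
```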

So these are the boxes I know of, and a design I'm familiar with.

Could you base an explanation of your example on that, one that dumb
C-brain-washed little me can understand?

Thanks in advance,
Natasha


* Re: Parser interface design
  2011-04-07 19:14   ` Natasha Kerensikova
@ 2011-04-07 20:31     ` Dmitry A. Kazakov
  2011-04-08 13:51       ` Natasha Kerensikova
  0 siblings, 1 reply; 37+ messages in thread
From: Dmitry A. Kazakov @ 2011-04-07 20:31 UTC


On Thu, 7 Apr 2011 19:14:33 +0000 (UTC), Natasha Kerensikova wrote:

> Hello,
> 
> On 2011-04-06, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote:
>> On Wed, 6 Apr 2011 10:11:46 +0000 (UTC), Natasha Kerensikova wrote:
>>
>>> I thought the most idiomatic way of putting together a bunch of
>>> callbacks in Ada would be to use an Interface, and then rely on dynamic
>>> dispatch.
>>
>> It also can be an abstract base type with primitive operations defined
>> null. The advantage is that you can have a "null function" and non-null
>> implementations with a type, which you cannot with an interface.
> 
> That's an interesting idea, however that prevents a renderer object from
> being based on another type, like Controlled or Limited_Controlled -
> unless I base the abstract type on them.
> 
> The null function fact seems very interesting though. Is it possible to
> test externally whether a given dispatched function is null?

I meant "null" in the sense of an elementary implementation, e.g.

   function Get_Line (...) return String is
   begin
      raise End_Error;
   end Get_Line;

With interfaces you can do:

   procedure Bar (...) is null;

but cannot

   function Get_Line (...) return String is null;

and furthermore you cannot provide any implementation for Get_Line.
Interface is such a broken concept.

> I'm asking
> because in my C implementation, a NULL callback was meant to
> communicate to the parser that the associated active character should
> no longer be considered active, thereby switching off that particular
> feature (which is very different from having a no-op callback).

Why should you test for anything? Just make the implementation do what is
required to do in order to achieve the desired effect.

>>> The approach using tagged types implementing an interface seems heavier.
>>
>> You should not try to pack everything into one type hierarchy. The parser
>> and renderer should likely be two different hierarchies glued together
>> through a mix-in in a third object aggregating both implementations.
> 
> Actually I don't understand why there would be a second or even a third
> object in there. Would you mind expanding?

Parser should not know anything about rendering. Likewise it should know
nothing about the source. It is better to decouple such things into
independent hierarchies. Objects instantiating each hierarchy can be
aggregated or mixed-in by the user.

> In the example I proposed, the only object was the renderer, which is
> tagged in order to provide dynamic dispatching. That's what would be in
> the renderer package. The parser package would be only a procedure, to
> which the client hands over the flow control, and which uses the
> callbacks from the tagged instance provided by the client. I don't see
> how to get from this to something where the parser and/or the client has
> a tagged object.

Parser is such an object encapsulating the parsing state. You certainly
would like to have it reentrant, so you need to keep that state somewhere.
The source object encapsulates the state of the source being parsed, e.g. a
file keeps its current position. The renderer object holds the rendering
context etc. The ability to keep hidden state is a key advantage of an OO
design over naked procedures tossed here and there.

>>> So what would be the best approach to interface a parser and a renderer?
>>
>> The parsers I implemented used primitive operations as semantic callbacks.
> 
> That sounds exactly like what I proposed, doesn't it?

Yes it does.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Parser interface design
  2011-04-07 19:44   ` Natasha Kerensikova
@ 2011-04-07 20:52     ` Dmitry A. Kazakov
  2011-04-07 22:09     ` Simon Wright
  2011-04-07 22:13     ` Georg Bauhaus
  2 siblings, 0 replies; 37+ messages in thread
From: Dmitry A. Kazakov @ 2011-04-07 20:52 UTC (permalink / raw)


On Thu, 7 Apr 2011 19:44:22 +0000 (UTC), Natasha Kerensikova wrote:

> The direct transposition of my C implementation would be something like
> 
>    type Renderer_Callbacks is record
>       Emphasis:    access function (Contents: String) return String;
>       Normal_Text: access function (Contents: String) return String;
>       Paragraph:   access function (Contents: String) return String;
>    end record;
> 
>    function Parser (Renderer: Renderer_Callbacks; Input: String)
>      return String;

An OO approach:

   type Abstract_Context is abstract
       new Ada.Finalization.Limited_Controlled with private;
   procedure On_Emphasis
             (  Context : in out Abstract_Context;
                Token   : String
             )  is null;
   ...

   type Abstract_Source is abstract ...;
   function Get_Next_Line (Source : in out Abstract_Source)
       return String is abstract;
   ...

   type Parser is ... -- Abstract if many parsing methods are to coexist
   procedure Parse
             (  Object  : in out Parser;
                Source  : in out Abstract_Source'Class;
                Context : in out Abstract_Context'Class
             );

or using mix-in:

   type Parser
        (  Source  : not null access Abstract_Source'Class;
           Context : not null access Abstract_Context'Class
        )  is ...; 
   procedure Parse (Object : in out Parser);

A renderer may directly implement Abstract_Context, but usually one makes
it a mix-in, so that many different renderers can have a hierarchy of
their own, independent of any parsing issues. This is, I suppose, what
Georg's example of the Print procedure is meant to illustrate.

[ Abstract_Context may render, generate AST, interpret. Abstract_Source
may encapsulate a text file, text stream, string, XML-ish garbage etc. ]
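Fleshed out into a complete program, this three-way split might look like the sketch below. It makes several assumptions that are not in the post: the roots are plain tagged limited null records rather than Limited_Controlled, the source is read through a hypothetical Next_Line procedure with a Found flag, and the only markup recognised is *...* emphasis. Parse sees only the two class-wide parameters, so the source and the renderer know nothing about each other.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Three_Hierarchies is
   package Contexts is  --  what to do with recognised elements
      type Abstract_Context is abstract tagged limited null record;
      procedure On_Text (Context : in out Abstract_Context; Token : String) is null;
      procedure On_Emphasis (Context : in out Abstract_Context; Token : String) is null;
   end Contexts;
   package Sources is  --  where the text comes from
      type Abstract_Source is abstract tagged limited null record;
      procedure Next_Line (Source : in out Abstract_Source;
                           Line   : out Unbounded_String;
                           Found  : out Boolean) is abstract;
      type String_Source is new Abstract_Source with record  --  one in-memory line
         Text : Unbounded_String;
         Done : Boolean := False;
      end record;
      overriding procedure Next_Line (Source : in out String_Source;
                                      Line   : out Unbounded_String;
                                      Found  : out Boolean);
   end Sources;
   package body Sources is
      overriding procedure Next_Line (Source : in out String_Source;
                                      Line   : out Unbounded_String;
                                      Found  : out Boolean) is
      begin
         Found := not Source.Done;
         Line := Source.Text;
         Source.Done := True;
      end Next_Line;
   end Sources;
   package Renderers is  --  one concrete context, accumulating HTML
      type HTML_Renderer is new Contexts.Abstract_Context with record
         Output : Unbounded_String;
      end record;
      overriding procedure On_Text (Context : in out HTML_Renderer; Token : String);
      overriding procedure On_Emphasis (Context : in out HTML_Renderer; Token : String);
   end Renderers;
   package body Renderers is
      overriding procedure On_Text (Context : in out HTML_Renderer; Token : String) is
      begin
         Append (Context.Output, Token);
      end On_Text;
      overriding procedure On_Emphasis (Context : in out HTML_Renderer; Token : String) is
      begin
         Append (Context.Output, "<em>" & Token & "</em>");
      end On_Emphasis;
   end Renderers;
   package Parsers is  --  knows only the two abstract hierarchies
      type Parser is tagged limited null record;  --  parsing state would live here
      procedure Parse (Object  : in out Parser;
                       Source  : in out Sources.Abstract_Source'Class;
                       Context : in out Contexts.Abstract_Context'Class);
   end Parsers;
   package body Parsers is
      procedure Parse (Object  : in out Parser;
                       Source  : in out Sources.Abstract_Source'Class;
                       Context : in out Contexts.Abstract_Context'Class) is
         pragma Unreferenced (Object);
         Line, Buffer : Unbounded_String;
         Found        : Boolean;
         In_Emphasis  : Boolean := False;
         procedure Flush is  --  hand the pending run to the context (dispatching)
         begin
            if Length (Buffer) > 0 then
               if In_Emphasis then
                  Context.On_Emphasis (To_String (Buffer));
               else
                  Context.On_Text (To_String (Buffer));
               end if;
               Buffer := Null_Unbounded_String;
            end if;
         end Flush;
      begin
         loop
            Source.Next_Line (Line, Found);  --  dispatching
            exit when not Found;
            for I in 1 .. Length (Line) loop
               if Element (Line, I) = '*' then
                  Flush;
                  In_Emphasis := not In_Emphasis;
               else
                  Append (Buffer, Element (Line, I));
               end if;
            end loop;
         end loop;
         Flush;
      end Parse;
   end Parsers;
   Source   : Sources.String_Source;
   Renderer : Renderers.HTML_Renderer;
   P        : Parsers.Parser;
begin
   Source.Text := To_Unbounded_String ("foo *bar* baz");
   P.Parse (Source, Renderer);
   Put_Line (To_String (Renderer.Output));  --  foo <em>bar</em> baz
end Three_Hierarchies;
```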

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Parser interface design
  2011-04-07 19:44   ` Natasha Kerensikova
  2011-04-07 20:52     ` Dmitry A. Kazakov
@ 2011-04-07 22:09     ` Simon Wright
  2011-04-08 14:03       ` Natasha Kerensikova
  2011-04-07 22:13     ` Georg Bauhaus
  2 siblings, 1 reply; 37+ messages in thread
From: Simon Wright @ 2011-04-07 22:09 UTC (permalink / raw)


Natasha Kerensikova <lithiumcat@gmail.com> writes:

> Now at least I understand the idea of using generics instead:
>
>    generic
>       with function Emphasis (Contents: String) return String is <>;
>       with function Normal_Text (Contents: String) return String is <>;
>       with function Paragraph (Contents: String) return String is <>;
>    function Parser (Renderer: Renderer_Callbacks; Input: String)
>      return String;

I think the Renderer argument shouldn't be there?

Don't see what's wrong with providing a No_Op function which returns its
input unchanged. Unless you do something specific for a null?

> However I am a bit skeptical about whether or not generics is actually a
> better approach than the record of accesses. Generics is certainly a
> higher-level feature than record and access, but I don't think that's
> enough to prefer one over the other. On the other hand, I understand the
> criticism of generics posted here (with my apologies for not remembering
> who had which arguments), with all the problems caused by the function
> not actually existing until it's instantiated.

Not sure I remember that, generics aren't _that_ bad!

> I believe such generics to be a lesser evil than generic types, and
> similarly I believe access to subprograms to be lesser evils than access
> to data. However I wouldn't be surprised if parser instantiation means
> duplicating the code for each parser. Even if it's the compiler that
> actually performs the code duplication, I'm not very at ease with it.

I'd let it get on with it. You're unlikely to have many instantiations
in one program, I'd have thought?

Personally I tend to feel uneasy with access-to-subprogram.

> All in all, I can't find any argument in favor of replacing the record
> of accesses with the generic approach.

Nor can I.


In your original post, you said

> I also thought of implementing each callback once, as standalone
> procedures, and then use tagged procedures to call them, but that
> seems very messy too.

Presumably you'll implement each callback once, as a standalone function,
and pop them in the record/call the generic as appropriate?
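Simon's suggestion can be sketched as follows. Each callback is written once as a standalone function and then plugged into as many callback records as needed, with Identity shared by both records. This assumes a toy parser that handles a single paragraph with *...* emphasis; the renderer functions (HTML_Emphasis, HTML_Paragraph, Text_Emphasis) are made-up names.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Callback_Record_Demo is

   type Renderer_Callbacks is record
      Emphasis    : access function (Contents : String) return String;
      Normal_Text : access function (Contents : String) return String;
      Paragraph   : access function (Contents : String) return String;
   end record;

   --  Standalone callbacks, each implemented once and shared freely.
   function Identity (Contents : String) return String is
   begin
      return Contents;
   end Identity;

   function HTML_Emphasis (Contents : String) return String is
   begin
      return "<em>" & Contents & "</em>";
   end HTML_Emphasis;

   function HTML_Paragraph (Contents : String) return String is
   begin
      return "<p>" & Contents & "</p>";
   end HTML_Paragraph;

   function Text_Emphasis (Contents : String) return String is
   begin
      return "_" & Contents & "_";
   end Text_Emphasis;

   --  Toy parser: a single paragraph in which '*' toggles emphasis.
   function Parser (Renderer : Renderer_Callbacks; Input : String) return String is
      Result      : Unbounded_String;
      Start       : Positive := Input'First;  --  first character of the pending run
      In_Emphasis : Boolean  := False;
   begin
      for I in Input'Range loop
         if Input (I) = '*' then
            if In_Emphasis then
               Append (Result, Renderer.Emphasis (Input (Start .. I - 1)));
            else
               Append (Result, Renderer.Normal_Text (Input (Start .. I - 1)));
            end if;
            In_Emphasis := not In_Emphasis;
            Start := I + 1;
         end if;
      end loop;
      Append (Result, Renderer.Normal_Text (Input (Start .. Input'Last)));
      return Renderer.Paragraph (To_String (Result));
   end Parser;

   HTML  : constant Renderer_Callbacks :=
     (Emphasis    => HTML_Emphasis'Access,
      Normal_Text => Identity'Access,
      Paragraph   => HTML_Paragraph'Access);
   Plain : constant Renderer_Callbacks :=
     (Emphasis    => Text_Emphasis'Access,
      Normal_Text => Identity'Access,   --  same function reused
      Paragraph   => Identity'Access);
begin
   Put_Line (Parser (HTML, "foo *bar* baz"));   --  <p>foo <em>bar</em> baz</p>
   Put_Line (Parser (Plain, "foo *bar* baz"));  --  foo _bar_ baz
end Callback_Record_Demo;
```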





* Re: Parser interface design
  2011-04-07 19:44   ` Natasha Kerensikova
  2011-04-07 20:52     ` Dmitry A. Kazakov
  2011-04-07 22:09     ` Simon Wright
@ 2011-04-07 22:13     ` Georg Bauhaus
  2011-04-08 15:30       ` Natasha Kerensikova
  2 siblings, 1 reply; 37+ messages in thread
From: Georg Bauhaus @ 2011-04-07 22:13 UTC (permalink / raw)



> I have trouble abstracting concept out of your examples, so if you don't
> mind I will provide my own, which I understand, and maybe we can work
> out a common ground that I understand from there.

Understood. (I know that example would be a bit involved. Too
much of it will confuse me, too, reliably.)


> Now at least I understand the idea of using generics instead:
>
>     generic
>        with function Emphasis (Contents: String) return String is<>;
>        with function Normal_Text (Contents: String) return String is<>;
>        with function Paragraph (Contents: String) return String is<>;
>     function Parser (Renderer: Renderer_Callbacks; Input: String)
>       return String;

I'll write about generics first because, even if you decide against
them (or their "spirit"), I think it is worth knowing what they do.
(There is no problem with functions not existing, for example; Dmitry's
war against generics is built on a different foundation :-) Dmitry, I
apologize!

The part between "generic" and "function" indeed corresponds to function
pointers, so that *is* the Renderer_Callbacks part, in Ada. Hence, the
parameter Renderer is not needed.

      generic
          with function Emphasis (Contents: String) return String is<>;
          ...
      function Parser (Input: String) return String;

The body of Parser will call Emphasis etc. as needed (which is very
much like calling through a function pointer).

I'm assuming that the string returned is the rendered text.

In a C struct of function pointers, you'd make them point
to any function that fits the bill.
You do the same with generics.

    function HTML_Rendering_Parser is new Parser
       (Emphasis => EM_printing_function,
        Normal_Text => Identity,
        Paragraph => P_printing_function);

where EM_printing_function, Identity, and P_printing_function
are defined anywhere you like.  Or, in C99,

    struct Parser HTMLRenderingParser = {
       .Emphasis = EM_printing_function,
       .Normal_Text = Identity,
       .Paragraph = P_printing_function
    };

The difference is that in C (or equivalent Ada), the function
Parser is not a template to be instantiated and done, possibly
at compile time, but instead needs an object HTMLRenderingParser.
The similarity is that the functions are available to function
Parser in both cases.
In C, you pass an object that holds function pointers.
In Ada, using generics, you just pass function "addresses"
when instantiating.

In O-O only programming, you'd pass an object that corresponds
to a struct with pointers to functions and possibly a pointer
to another such struct (for inheritance of implementation).

Generic body, then:

      function Parser (Input: String) return String is
      begin
         loop
             ...
             subpart2 := Normal_Text("bar");
             part2 := Emphasis (subpart2);
             ...
         end loop;
      end Parser;

Where Emphasis stands for whatever function is passed
when instantiating Parser, etc.

But, away with generics, even though they will provide checking
and performance gains.  (No indirections through function pointers,
and guaranteed function existence! Instantiate with NOP functions
as needed.)
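For comparison, Georg's generic version can be completed into a runnable sketch with the same kind of toy parser (one paragraph, *...* emphasis). The functions are bound at instantiation, so there is no record of pointers to fill in and no null to check, and Identity is reused by both instances. Plain_Parser is a hypothetical second instantiation.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Generic_Parser_Demo is

   function Identity (Contents : String) return String is
   begin
      return Contents;
   end Identity;

   function EM_Printing_Function (Contents : String) return String is
   begin
      return "<em>" & Contents & "</em>";
   end EM_Printing_Function;

   function P_Printing_Function (Contents : String) return String is
   begin
      return "<p>" & Contents & "</p>";
   end P_Printing_Function;

   generic
      with function Emphasis    (Contents : String) return String is <>;
      with function Normal_Text (Contents : String) return String is <>;
      with function Paragraph   (Contents : String) return String is <>;
   function Parser (Input : String) return String;

   function Parser (Input : String) return String is
      Result      : Unbounded_String;
      Start       : Positive := Input'First;
      In_Emphasis : Boolean  := False;
   begin
      for I in Input'Range loop
         if Input (I) = '*' then
            --  Direct calls to whatever functions were supplied at instantiation.
            if In_Emphasis then
               Append (Result, Emphasis (Input (Start .. I - 1)));
            else
               Append (Result, Normal_Text (Input (Start .. I - 1)));
            end if;
            In_Emphasis := not In_Emphasis;
            Start := I + 1;
         end if;
      end loop;
      Append (Result, Normal_Text (Input (Start .. Input'Last)));
      return Paragraph (To_String (Result));
   end Parser;

   function HTML_Rendering_Parser is new Parser
     (Emphasis    => EM_Printing_Function,
      Normal_Text => Identity,
      Paragraph   => P_Printing_Function);

   function Plain_Parser is new Parser
     (Emphasis => Identity, Normal_Text => Identity, Paragraph => Identity);
begin
   Put_Line (HTML_Rendering_Parser ("foo *bar* baz"));  --  <p>foo <em>bar</em> baz</p>
   Put_Line (Plain_Parser ("foo *bar* baz"));           --  foo bar baz
end Generic_Parser_Demo;
```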



>> Anyway, here is another approach, probably not very original,
>> certainly lacking proper modularization and other things
>> that I didn't see, sorry, but it offers another idea.  There aren't
>> any generics in it.
>>
>> But there is a procedure
>>
>>      procedure Print (Item : Token'Class;
>>                       Output : in out Format'Class);
>>
>> The idea is that given any token, Print renders the token
>> Item into any format passed for Output.
>
> I genuinely read several times your example, and I can't figure out what
> thing performs what.

OK.  There was one more actor.  I'll remove it in the next
step.  But first, assume that Tokens stand for atoms
or composites,


+--------+               +---------+
| Parser |  -> Token ->  | Printer | -> Rendered Token
+--------+               +---------+
                           |
                           v
              knows output format (polymorphic)

I understand you wanted the parser to be in control of the
printer, i.e., have Parser call Emphasis etc., instead.
I'll come to that.

Here is the parsing scheme I had in mind. The following is
simplified because it omits the composites (the semantic
units, which would be sort of handled via Printables, to be
extracted from "tokens").  The example establishes a different
kind of loose coupling. It shows Print at the center of
polymorphic behavior:

procedure Parsing is
     Output_Format : Format'Class := Configuration.Choose;
begin
     loop
        declare
            X : Token'Class := Parser.Next_Token;
        begin
            Print (X, Output_Format);
        end;
     end loop;
end Parsing;

This control structure pulls tokens. Parser could be a "lazy"
object, producing tokens as needed.  (At this point, can you
see why I thought that, for example, the loop above could be
a task? Or the parser?  But that is a different story and of
no concern here.)

Then, next in the control structure, is a call on Print.
Print takes two arguments whose run-time type is not known.
By design, Print renders any kind of token into any format.
Print will therefore look at both X and Format, and find out what
they are (inspect their run-time tags).
Both X and Format could share O-O "interfaces", respectively.
Or X may be a variant record, the variants reflecting the different
kinds of token.

In the C style O-O you mentioned, Print would do that by
calling functions available through function pointers in
both X's struct and Format's struct.  Or, to learn about X,
Print could just look at a distinguishing field  of X.
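Because Ada dispatches on one tag per call, a Print that depends on both the token kind and the output format can be written as two successive single dispatches. This is the usual "double dispatch" technique, chosen here for illustration rather than taken from the post: Print dispatches on the token, and the token's Render then dispatches on the format. All names (Word, Emphasized, HTML, Markdown, Render, Plain) are hypothetical, and the formats accumulate text in memory instead of writing to a file.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Print_Demo is

   package Formats is
      type Format is abstract tagged record
         Text : Unbounded_String;  --  rendered output so far
      end record;
      procedure Plain (Output : in out Format; Item : String);  --  shared default
      procedure Emphasis (Output : in out Format; Item : String) is abstract;

      type HTML is new Format with null record;
      overriding procedure Emphasis (Output : in out HTML; Item : String);

      type Markdown is new Format with null record;
      overriding procedure Emphasis (Output : in out Markdown; Item : String);
   end Formats;

   package body Formats is
      procedure Plain (Output : in out Format; Item : String) is
      begin
         Append (Output.Text, Item);
      end Plain;

      overriding procedure Emphasis (Output : in out HTML; Item : String) is
      begin
         Append (Output.Text, "<em>" & Item & "</em>");
      end Emphasis;

      overriding procedure Emphasis (Output : in out Markdown; Item : String) is
      begin
         Append (Output.Text, "*" & Item & "*");
      end Emphasis;
   end Formats;

   package Tokens is
      type Token (Length : Natural) is abstract tagged record
         Text : String (1 .. Length);
      end record;
      --  First dispatch: on the kind of token.
      procedure Render (Item : Token; Output : in out Formats.Format'Class) is abstract;

      type Word is new Token with null record;
      overriding procedure Render (Item : Word; Output : in out Formats.Format'Class);

      type Emphasized is new Token with null record;
      overriding procedure Render (Item : Emphasized; Output : in out Formats.Format'Class);
   end Tokens;

   package body Tokens is
      --  Second dispatch: Output is class-wide, so these calls dispatch on the format.
      overriding procedure Render (Item : Word; Output : in out Formats.Format'Class) is
      begin
         Output.Plain (Item.Text);
      end Render;

      overriding procedure Render (Item : Emphasized; Output : in out Formats.Format'Class) is
      begin
         Output.Emphasis (Item.Text);
      end Render;
   end Tokens;

   procedure Print (Item : Tokens.Token'Class; Output : in out Formats.Format'Class) is
   begin
      Item.Render (Output);
   end Print;

   procedure Print_Sample (Output : in out Formats.Format'Class) is
   begin
      Print (Tokens.Word'(Length => 4, Text => "foo "), Output);
      Print (Tokens.Emphasized'(Length => 3, Text => "bar"), Output);
      Print (Tokens.Word'(Length => 4, Text => " baz"), Output);
   end Print_Sample;

   As_HTML     : Formats.HTML;
   As_Markdown : Formats.Markdown;
begin
   Print_Sample (As_HTML);
   Print_Sample (As_Markdown);
   Put_Line (To_String (As_HTML.Text));      --  foo <em>bar</em> baz
   Put_Line (To_String (As_Markdown.Text));  --  foo *bar* baz
end Print_Demo;
```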

Your idea seems different in that the parser will call the rendering
procedures whenever *it* thinks it has something ready to be rendered.
That's fine, I think.

Keeping your current design, just for the record, you could establish
an interface in the logical sense,

    type Renderer_Callbacks is abstract tagged private;

    function Emphasis
       (This: Renderer_Callbacks; Contents: String) return String
    is abstract;

    function Normal_Text
       (This : Renderer_Callbacks; Contents: String) return String
    is abstract;

    function Paragraph
       (This : Renderer_Callbacks; Contents: String) return String
    is abstract;

Add derived types that will override behavior for
different output formats. Then

    function Parser
      (R : Renderer_Callbacks'Class; Input : String) return String;

The calls within Parser will be dispatching, depending on the
type of the actual renderer object.

And yes, if you don't start the hierarchy from reusable
functions that do not need to be overridden, there can be
code bloat.

This would not be the case with generics, which you
instantiate with whatever reusable functions you like!
:-)
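Here is that tagged design as a runnable sketch, with one assumption beyond the outline above: the root type gives Normal_Text and Paragraph concrete default bodies, so each derived renderer overrides only what differs. That is one way to start the hierarchy from reusable functions. HTML_Renderer, Plain_Renderer and the one-paragraph *...* parser are hypothetical.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Tagged_Renderer_Demo is

   package Renderers is
      type Renderer_Callbacks is abstract tagged null record;
      function Emphasis (This : Renderer_Callbacks; Contents : String) return String
        is abstract;
      --  Reusable defaults: derived types need not override these.
      function Normal_Text (This : Renderer_Callbacks; Contents : String) return String;
      function Paragraph (This : Renderer_Callbacks; Contents : String) return String;

      type HTML_Renderer is new Renderer_Callbacks with null record;
      overriding function Emphasis (This : HTML_Renderer; Contents : String) return String;
      overriding function Paragraph (This : HTML_Renderer; Contents : String) return String;

      type Plain_Renderer is new Renderer_Callbacks with null record;
      overriding function Emphasis (This : Plain_Renderer; Contents : String) return String;
   end Renderers;

   package body Renderers is
      function Normal_Text (This : Renderer_Callbacks; Contents : String) return String is
      begin
         return Contents;
      end Normal_Text;

      function Paragraph (This : Renderer_Callbacks; Contents : String) return String is
      begin
         return Contents;
      end Paragraph;

      overriding function Emphasis (This : HTML_Renderer; Contents : String) return String is
      begin
         return "<em>" & Contents & "</em>";
      end Emphasis;

      overriding function Paragraph (This : HTML_Renderer; Contents : String) return String is
      begin
         return "<p>" & Contents & "</p>";
      end Paragraph;

      overriding function Emphasis (This : Plain_Renderer; Contents : String) return String is
      begin
         return "_" & Contents & "_";
      end Emphasis;
   end Renderers;

   use Renderers;

   function Parser (R : Renderer_Callbacks'Class; Input : String) return String is
      Result      : Unbounded_String;
      Start       : Positive := Input'First;
      In_Emphasis : Boolean  := False;
   begin
      for I in Input'Range loop
         if Input (I) = '*' then
            if In_Emphasis then  --  every call on R dispatches on R's tag
               Append (Result, R.Emphasis (Input (Start .. I - 1)));
            else
               Append (Result, R.Normal_Text (Input (Start .. I - 1)));
            end if;
            In_Emphasis := not In_Emphasis;
            Start := I + 1;
         end if;
      end loop;
      Append (Result, R.Normal_Text (Input (Start .. Input'Last)));
      return R.Paragraph (To_String (Result));
   end Parser;

   HTML  : HTML_Renderer;
   Plain : Plain_Renderer;
begin
   Put_Line (Parser (HTML, "foo *bar* baz"));   --  <p>foo <em>bar</em> baz</p>
   Put_Line (Parser (Plain, "foo *bar* baz"));  --  foo _bar_ baz
end Tagged_Renderer_Demo;
```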






* Re: Parser interface design
  2011-04-06 10:11 Parser interface design Natasha Kerensikova
                   ` (3 preceding siblings ...)
  2011-04-07  0:36 ` Randy Brukardt
@ 2011-04-08 11:16 ` Brian Drummond
  2011-04-19  9:08 ` Natasha Kerensikova
  5 siblings, 0 replies; 37+ messages in thread
From: Brian Drummond @ 2011-04-08 11:16 UTC (permalink / raw)

On Wed, 06 Apr 2011 10:11:46 +0000, Natasha Kerensikova wrote:

> Hello,
> 
> before wasting too much time of anybody, I want to make it clear that
> I'm asking about interface design in Ada which might actually never turn
> up into real code. I'm still unsure about ever writing Ada code, and how
> bad my previous thread went playing no small part in that. However I'm
> still curious about how this particular problem might be solved in Ada.
> 
> I wrote a while back a C library to parse Markdown text. I call "parser"
> the code that takes a text assumed to be formatted in Markdown as input,
> and communicates in a still-to-be-defined way with another component,
> which I call "renderer", and which outputs the same text but using
> another formatting, for example HTML or PDF.
> 
> The problem on which I want your opinion is designing the interface
> between the parser and the renderer. The point is to be able to "plug"
> may renderer into the parser, and thereby obtain a different kind of
> output, with as little code rewrite as possible (hence the idea of
> refactoring the "parser" into a re-usable part).

One approach to the renderer : streaming I/O.

I don't know if it's the right approach for your problem, but Ada's 
stream I/O is flexible and capable of interesting things.

Ada.Streams allows you to overload your own Read and Write procedures for 
existing types, and define them for new types.

I have played with this a little, though not for your specific purpose.

It looks as if it ought to be possible to create PDF_Stream, HTML_Stream, 
RTF_Stream etc and output to any or all of them.

Sketch of one possible approach, where HTML_Stream, PDF_Stream are 
derived from the root stream class somehow... 
Not tested: this may not work for multiple types of stream.

-- expose only this to the parser
type style is (none, bold, italic);

type formatted_string is record
   format : style;
   content : bounded_string;
end record;
(Or use discriminants for arbitrary string length.
It is up to the parser to supply formatted_strings)

type element is (ruler, page_break);

-- the rest is internal to the renderer

type V_String_Array is array (style) of V_String;
-- see Barnes p.402 on v-strings and ragged arrays

html_tags : constant V_String_Array := (+"", +"<b>", ...);
html_end_tags : constant V_String_Array := (+"", +"</b>", ...);

procedure write_formatted_string (stream : HTML_Stream;
                                  f_string : formatted_string) is
begin
   write(html_tags(f_string.format));
   write(f_string.content);
   write(html_end_tags(f_string.format));
end write_formatted_string;

procedure write_element(stream : HTML_Stream) is ...

procedure write_formatted_string (stream : PDF_Stream; 
                                  f_string : formatted_string) is ...

for formatted_string'write use write_formatted_string;
for element'write use write_element; 

(then stream I/O as normal. Possibly open the correct stream type
according to the supplied filename extension?)

My previous experiment with streams is described at

http://groups.google.com/group/comp.lang.ada/browse_thread/thread/9642027a78f96963/8449fae50ece601f?q=group:comp.lang.ada+insubject:newbie+author:brian_drummond%40btconnect.com#8449fae50ece601f

Here, I was trying to use generic Write procedures to write many (two in 
the example!) different enumerations, with control over how each 
enumeration appeared over the outputs. I ran into difficulties, and had 
to use renames to overcome them.

You may have a similar difficulty overloading procedures to write
e.g. the formatted_string to different types of stream. It may be
possible to resolve this by instantiating generic procedures, and
renaming, as in my example above.
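A user-defined stream attribute needs a specific profile. The stream parameter is an access to Ada.Streams.Root_Stream_Type'Class rather than a specific stream type such as HTML_Stream, and the attribute definition clause must appear in the same declarative region as the type. The sketch below (package and subprogram names are hypothetical) writes HTML to standard output through Ada.Text_IO.Text_Streams. It does not attempt the selection between HTML, PDF and RTF streams.

```ada
with Ada.Text_IO;              use Ada.Text_IO;
with Ada.Text_IO.Text_Streams;
with Ada.Streams;
with Ada.Strings.Unbounded;    use Ada.Strings.Unbounded;

procedure Stream_Write_Demo is

   package Formatting is
      type Style is (None, Bold, Italic);

      type Formatted_String is record
         Format  : Style;
         Content : Unbounded_String;
      end record;

      function Open_Tag (S : Style) return String;
      function Close_Tag (S : Style) return String;

      --  Required profile for a user-defined 'Write attribute.
      procedure Write_HTML
        (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
         Item   : Formatted_String);
      for Formatted_String'Write use Write_HTML;
   end Formatting;

   package body Formatting is
      function Open_Tag (S : Style) return String is
      begin
         case S is
            when None   => return "";
            when Bold   => return "<b>";
            when Italic => return "<i>";
         end case;
      end Open_Tag;

      function Close_Tag (S : Style) return String is
      begin
         case S is
            when None   => return "";
            when Bold   => return "</b>";
            when Italic => return "</i>";
         end case;
      end Close_Tag;

      procedure Write_HTML
        (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
         Item   : Formatted_String) is
      begin
         String'Write (Stream, Open_Tag (Item.Format));
         String'Write (Stream, To_String (Item.Content));
         String'Write (Stream, Close_Tag (Item.Format));  --  the closing tag
      end Write_HTML;
   end Formatting;

   use Formatting;

   Output : constant Ada.Text_IO.Text_Streams.Stream_Access :=
     Ada.Text_IO.Text_Streams.Stream (Standard_Output);
begin
   Formatted_String'Write (Output, (None, To_Unbounded_String ("foo ")));
   Formatted_String'Write (Output, (Bold, To_Unbounded_String ("bar")));
   Formatted_String'Write (Output, (None, To_Unbounded_String (" baz")));
   New_Line;  --  the line reads: foo <b>bar</b> baz
end Stream_Write_Demo;
```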

- Brian


* Re: Parser interface design
  2011-04-07 18:56   ` Natasha Kerensikova
@ 2011-04-08 11:49     ` Stephen Leake
  0 siblings, 0 replies; 37+ messages in thread
From: Stephen Leake @ 2011-04-08 11:49 UTC (permalink / raw)


Natasha Kerensikova <lithiumcat@gmail.com> writes:

> Hello,
>
> On 2011-04-06, Georg Bauhaus <rm.dash-bauhaus@futureapps.de> wrote:
>>  * use a task that serves as an event queue. The task will collect
>> parsing events and forward them as necessary.  The parser is then
>> completely ignorant of any rendering.
>
> I have absolutely no idea of what it might even remotely look like. I
> guess my C past, where task means thread, which means complex and
> unreadable and extremely difficult to debug, doesn't help me conceive
> this possibility.

That is one of the best features of Ada; tasking is built into the
language, in a way that eliminates many of the problems that you
describe.

Still, it's wise to avoid tasking when you are just starting to learn
Ada. 

-- 
-- Stephe




* Re: Parser interface design
  2011-04-07 20:31     ` Dmitry A. Kazakov
@ 2011-04-08 13:51       ` Natasha Kerensikova
  2011-04-08 14:21         ` Dmitry A. Kazakov
  0 siblings, 1 reply; 37+ messages in thread
From: Natasha Kerensikova @ 2011-04-08 13:51 UTC (permalink / raw)

Hello,

On 2011-04-07, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote:
> On Thu, 7 Apr 2011 19:14:33 +0000 (UTC), Natasha Kerensikova wrote:
>> The null function fact seems very interesting though. Is it possible to
>> test externally whether a given dispatched function is null?
>
> With interfaces you can do:
>
>    procedure Bar (...) is null;
>
> but cannot
>
>    function Get_Line (...) return String is null;

Well, I guess it makes sense: you can have a procedure that just does
nothing, but for a function that's not an option, since a return value
has to come from somewhere.

> and furthermore you cannot provide any implementation for Get_Line.
> Interface is such a broken concept.

I can see how it can be a problem in some circumstances, but not in the
particular case I had in mind. I think.

>> I'm asking
>> because in my C implementation, a NULL callback was meant to
>> communicate to the parser that the associated active character should
>> no longer be considered active, thereby switching off that particular
>> feature (which is very different from having a no-op callback).
>
> Why should you test for anything? Just make the implementation do what is
> required to do in order to achieve the desired effect.

Well, that test thing is an implementation detail, on which I believe it
is too early to spend time. To make it sensible, here is the full history
of how I came up with testing for a null callback in my C implementation:

First, I want to allow the client to specify to the parser a list of
enabled features. For example, depending on the level of trust in the
input, one might want to allow or forbid inline HTML in Markdown. The
obvious and universal way of doing so would be to have a record of
booleans, one for each feature, and provide it to the parser along with
the renderer.

Then, it so happens that the set of features matches exactly the set of
callbacks. And disabling a feature means the associated callback will
never be used, so it can have any value (including an invalid one, at
least in C). Moreover in C, a function pointer can always be NULL, so
this case has to be somehow taken into account.

So it turns out that instead of using a dedicated boolean to specify
whether the client wants a feature to be enabled or not, I can use
whether the associated callback is NULL or not. It's a sort of coalescing
the record of callbacks and the record of booleans into a single record.

Now your mention of null procedure made me wonder whether the same trick
can be used in Ada. For that, I would need to make the parser change its
behavior depending on whether a particular procedure is null or not.

But again, it's clearly too early in the design of a Markdown parsing
library to decide anything about it. Just like it's too early to decide
whether the parser should involve functions returning a String or
procedures appending to an in out Unbounded_String. It's just that I
went on a tangent to ask about a particular feature that might exist or
not in the language.

>>>> The approach using tagged types implementing an interface seems heavier.
>>>
>>> You should not try to pack everything into one type hierarchy. The parser
>>> and renderer should likely be two different hierarchies glued together
>>> through a mix-in in a third object aggregating both implementations.
>> 
>> Actually I don't understand why there would be a second or even a third
>> object in there. Would you mind expanding?
>
> Parser should not know anything about rendering. Likewise it should know
> nothing about the source. It is better to decouple such things into
> independent hierarchies. Objects instantiating each hierarchy can be
> aggregated or mixed-in by the user.

I wholeheartedly agree to the decoupling, and that was an assumption I
had before even thinking about design. Well, at least for decoupling
between parser and renderer; depending on the particular design I can
live with the client being intricately coupled with either the renderer
or the parser, but not both.

However, I still don't see how that means having different hierarchies. Maybe
I just can't think object-oriented enough, but for now I can only come
up with one hierarchy (for the parser, the renderer being only one
subprogram), or none at all (using a variant record to hold the
semantics, the parser has subprograms handing back such variant record
instance, and the renderer just checking the discriminant and acting
accordingly, again only through static subprograms).

I just can't imagine any design that makes use of several hierarchies, for
now. (when writing the OP I could imagine only one design, and now I've
already reached two designs, sketched above, so I'm progressing)

>> In the example I proposed, the only object was the renderer, which is
>> tagged in order to provide dynamic dispatching. That's what would be in
>> the renderer package. The parser package would be only a procedure, to
>> which the client hands over the flow control, and which uses the
>> callbacks from the tagged instance provided by the client. I don't see
>> how to get from this to something where the parser and/or the client has
>> a tagged object.
>
> Parser is such an object encapsulating the parsing state. You certainly
> would like to have it reentrant, so you need to keep that state somewhere.
> The source object encapsulates the state of the source being parsed, e.g. a
> file keeps its current position. The renderer object holds the rendering
> context etc. The ability to keep hidden state is a key advantage of an OO
> design over naked procedures tossed here and there.

Well, I completely agree on the benefits of keeping state hidden. The
only "naked" procedure in my example is the parser, which does keep its
state hidden; it's just hidden in local variables and not in an object.

As long as the design is the client calling exactly once the parser,
there is no need for an object to keep a hidden state, the local stack
is enough, isn't it?

Moreover, even going all the way to OO design and keeping all hidden
states in objects, it doesn't mean such objects have to belong to any
hierarchy, right? The example I proposed was based on callbacks, which
necessarily involves some form of dynamic dispatching (here, in the
renderer) but as far as I can tell, parser or coordinator objects don't
need to be tagged or in any hierarchy, or am I missing something?

Thanks for your insights,
Natasha


* Re: Parser interface design
  2011-04-07 22:09     ` Simon Wright
@ 2011-04-08 14:03       ` Natasha Kerensikova
  2011-04-08 19:06         ` Jeffrey Carter
  2011-04-08 19:59         ` Simon Wright
  0 siblings, 2 replies; 37+ messages in thread
From: Natasha Kerensikova @ 2011-04-08 14:03 UTC (permalink / raw)

Hello,

On 2011-04-07, Simon Wright <simon@pushface.org> wrote:
> Natasha Kerensikova <lithiumcat@gmail.com> writes:
>
>> Now at least I understand the idea of using generics instead:
>>
>>    generic
>>       with function Emphasis (Contents: String) return String is <>;
>>       with function Normal_Text (Contents: String) return String is <>;
>>       with function Paragraph (Contents: String) return String is <>;
>>    function Parser (Renderer: Renderer_Callbacks; Input: String)
>>      return String;
>
> I think the Renderer argument shouldn't be there?

Yes, it's a mistake caused by a too hasty cut'n'paste too late in the
evening. My bad.

> Don't see what's wrong with providing a No_Op function which returns its
> input unchanged? unless you do something specific for a null?

Yes, as I said in my reply to Dmitry, I thought of disabling in the
parser the feature associated with a callback set to null. In the
"foo *bar* baz" example I provided, using a No_Op for Emphasis would
result in "<p>foo  baz</p>", while disabling the emphasis feature means
considering the star as an inactive character, resulting in
"<p>foo *bar* baz</p>".

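In Ada, the same trick can be expressed with a record of named access-to-function components, because such a component can be compared with null. Here is a minimal sketch under toy assumptions (a single paragraph, *...* emphasis only; Drop and HTML_Paragraph are hypothetical helpers). A null Emphasis turns the star into an inactive character, while a no-op Emphasis that returns the empty string still consumes the emphasised text, reproducing the two outputs above.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Disabled_Feature_Demo is

   type Text_Callback is access function (Contents : String) return String;

   --  A null component means "this feature is switched off".
   type Renderer_Callbacks is record
      Emphasis    : Text_Callback;
      Normal_Text : Text_Callback;
      Paragraph   : Text_Callback;
   end record;

   function Identity (Contents : String) return String is
   begin
      return Contents;
   end Identity;

   function Drop (Contents : String) return String is  --  no-op that discards its input
      pragma Unreferenced (Contents);
   begin
      return "";
   end Drop;

   function HTML_Paragraph (Contents : String) return String is
   begin
      return "<p>" & Contents & "</p>";
   end HTML_Paragraph;

   function Parser (Renderer : Renderer_Callbacks; Input : String) return String is
      --  '*' is an active character only while the emphasis feature is enabled.
      Star_Active : constant Boolean := Renderer.Emphasis /= null;
      Result      : Unbounded_String;
      Start       : Positive := Input'First;
      In_Emphasis : Boolean  := False;
   begin
      for I in Input'Range loop
         if Star_Active and then Input (I) = '*' then
            if In_Emphasis then
               Append (Result, Renderer.Emphasis (Input (Start .. I - 1)));
            else
               Append (Result, Renderer.Normal_Text (Input (Start .. I - 1)));
            end if;
            In_Emphasis := not In_Emphasis;
            Start := I + 1;
         end if;
      end loop;
      Append (Result, Renderer.Normal_Text (Input (Start .. Input'Last)));
      return Renderer.Paragraph (To_String (Result));
   end Parser;

   No_Op_Emphasis : constant Renderer_Callbacks :=
     (Emphasis  => Drop'Access, Normal_Text => Identity'Access,
      Paragraph => HTML_Paragraph'Access);
   No_Emphasis    : constant Renderer_Callbacks :=
     (Emphasis  => null, Normal_Text => Identity'Access,
      Paragraph => HTML_Paragraph'Access);
begin
   Put_Line (Parser (No_Op_Emphasis, "foo *bar* baz"));  --  <p>foo  baz</p>
   Put_Line (Parser (No_Emphasis, "foo *bar* baz"));     --  <p>foo *bar* baz</p>
end Disabled_Feature_Demo;
```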
>> However I am a bit skeptical about whether or not generics is actually a
>> better approach than the record of accesses. Generics is certainly a
>> higher-level feature than record and access, but I don't think that's
>> enough to prefer one over the other. On the other hand, I understand the
>> criticism of generics posted here (with my apologies for not remembering
>> who had which arguments), with all the problems caused by the function
>> not actually existing until it's instantiated.
>
> Not sure I remember that, generics aren't _that_ bad!

Well, maybe my lack of understanding of Ada made me draw conclusions
that are much more Manichean than warranted - I have seen it happen too
often with journalists to imagine I couldn't fall into the same trap.

>> I believe such generics to be a lesser evil than generic types, and
>> similarly I believe access to subprograms to be lesser evils than access
>> to data. However I wouldn't be surprised if parser instantiation means
>> duplicating the code for each parser. Even if it's the compiler that
>> actually performs the code duplication, I'm not very at ease with it.
>
> I'd let it get on with it. You're unlikely to have many instantiations
> in one program, I'd have thought?

Well, the example from my C implementation used 6 renderers, selectable
through command-line flags. That would mean 6 instantiations in an Ada
equivalent.

Granted, it's not that "many".

> Personally I tend to feel uneasy with access-to-subprogram.

Maybe my extensive use of C pointers, to subprograms or data alike, made me
more at ease with accesses than Ada alone warrants.

>> All in all, I can't find any argument in favor of replacing the record
>> of accesses with the generic approach.
>
> Nor can I.

Great :-) But now I find my arguments for replacing generics with a
record of accesses quite weak too. Could it be that none of them is
extremely better than the other?

> In your original post, you said
>
>> I also thought of implementing each callback once, as standalone
>> procedures, and then use tagged procedures to call them, but that
>> seems very messy too.
>
> Presumably you'll implement each callback once, as a standalone function,
> and pop them in the record/call the generic as appropriate?

Yes, that is fulfilled with both the record and the generic approaches.
However for example when the renderer is a tagged object, I can't see
how to avoid ending up with "methods" that are nothing but wrappers
around a real (shared) callback.

Thanks for your help,
Natasha


* Re: Parser interface design
  2011-04-08 13:51       ` Natasha Kerensikova
@ 2011-04-08 14:21         ` Dmitry A. Kazakov
  2011-04-12 15:58           ` Natasha Kerensikova
  0 siblings, 1 reply; 37+ messages in thread
From: Dmitry A. Kazakov @ 2011-04-08 14:21 UTC (permalink / raw)

On Fri, 8 Apr 2011 13:51:51 +0000 (UTC), Natasha Kerensikova wrote:

>>> I'm asking
>>> because in my C implementation, a NULL callback was meant to
>>> communicate to the parser that the associated active character should
>>> no longer be considered active, thereby switching off that particular
>>> feature (which is very different from having a no-op callback).
>>
>> Why should you test for anything? Just make the implementation do what is
>> required to do in order to achieve the desired effect.
> 
> Well, that test thing is an implementation detail, on which I believe it is
> too early to spend time.

No, it is not an implementation detail, because the check happens on the
caller's side. An implementation detail is always the callee's thing.

> Then, it so happens that the set of features matches exactly the set of
> callbacks.

If the property belongs to the callback, it is the callback's responsibility
to check it. Moving it to the parser is fragile design.

> I just can't imagine any design that makes use of several hierarchies, for
> now. (when writing the OP I could imagine only one design, and now I've
> already reached two designs, sketched above, so I'm progressing)

When your parser sums two numbers, it does not have + as a primitive
operation, does it? If numbers may have a hierarchy of their own, why
can't the renderer?

> As long as the design is the client calling exactly once the parser,
> there is no need for an object to keep a hidden state, the local stack
> is enough, isn't it?

Maybe. Note that the more states you glue together by maintaining a single
hierarchy of "everything", the more difficult it becomes for a single
stack. BTW, parsing infix expressions is much simpler with two stacks. You
might also wish not to use the standard stack for such things.

> but as far as I can tell, parser or coordinator objects don't
> need to be tagged or in any hierarchy, or am I missing something?

You never know it in advance. It depends on the parser design. A
table-driven parser might be extended rather by extending the tables it
uses. Other parsers might be extended directly.

In Ada you lose nothing by making it tagged. Unlike C++, where "virtual"
has a non-zero overhead, in Ada making a procedure primitive has zero
performance cost, because a call on an object of a specific (non-class-wide)
type is resolved statically.
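Here is a small sketch of that distinction (the names are hypothetical). A call dispatches only when its controlling operand is class-wide. With an operand of a specific type, the call is bound at compile time even though the operation is primitive, so nothing is looked up at run time.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Binding_Demo is

   package Parsers is
      type Parser is tagged null record;
      procedure Parse (P : Parser);                   --  primitive
      type Fast_Parser is new Parser with null record;
      overriding procedure Parse (P : Fast_Parser);
   end Parsers;

   package body Parsers is
      procedure Parse (P : Parser) is
         pragma Unreferenced (P);
      begin
         Put_Line ("Parser.Parse");
      end Parse;

      overriding procedure Parse (P : Fast_Parser) is
         pragma Unreferenced (P);
      begin
         Put_Line ("Fast_Parser.Parse");
      end Parse;
   end Parsers;

   use Parsers;

   P   : Parser;
   F   : Fast_Parser;
   Any : Parser'Class := F;
begin
   Parse (P);           --  specific type: statically bound to Parser's Parse
   Parse (F);           --  specific type: statically bound to Fast_Parser's Parse
   Parse (Any);         --  class-wide operand: dispatches at run time
   Parse (Parser (F));  --  view conversion to a specific type: static again
end Binding_Demo;
```

The program prints Parser.Parse, Fast_Parser.Parse, Fast_Parser.Parse and Parser.Parse, one per line. Only the third call needs the tag.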

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Parser interface design
  2011-04-07 22:13     ` Georg Bauhaus
@ 2011-04-08 15:30       ` Natasha Kerensikova
  0 siblings, 0 replies; 37+ messages in thread
From: Natasha Kerensikova @ 2011-04-08 15:30 UTC (permalink / raw)

Hello,

On 2011-04-07, Georg Bauhaus <rm-host.bauhaus@maps.futureapps.de> wrote:
> The part between "generic" and "function" indeed corresponds to function
> pointers, so that *is* the Renderer_Callbacks part, in Ada. Hence, the
> parameter Renderer is not needed.

Yes, I was aware of that; I just cut'n'pasted too hastily when I was too
tired to check that I wrote it right.

>       generic
>           with function Emphasis (Contents: String) return String is<>;
>           ...
>       function Parser (Input: String) return String;
>
> The body of Parser will call Emphasis as needed (which is very much like
> a function pointer), etc.
>
> I'm assuming that the string returned is the rendered text.

Yes, however it's a quick mock-up, and I honestly have no idea whether
it's a good way of returning the rendered text, or whether I should go
for something else like Unbounded_String or Streams or something else.
I still believe such a design decision lies downstream of this
discussion (even though I'm actually interested in reading arguments
towards an answer).

> In a C struct of function pointers, you'd make them point
> to any function that fits the bill.
> You do the same with generics.

Yes, that's how I picked up the generic approach so easily :-)

> OK.  There was one more actor.  I'll remove it in the next
> step.  But first, assume that Tokens stand for atoms
> or composites,
>
>
> +--------+               +---------+
> | Parser |  -> Token ->  | Printer | -> Rendered Token
> +--------+               +---------+
>                            |
>                            v
>               knows output format (polymorphic)

OK. Not knowing what that Token object could possibly look like was the
reason I couldn't see, in my OP, how to use a design based on offline
parsing or on renderer control. More precisely, what I was missing (which
is much easier to word now that I have caught it -- see below) was how I
could possibly encode the semantics other than through the choice of a
relevant callback function.

> I understand you wanted the parser to be in control of the
> printer, i.e., have Parser call Emphasis etc., instead.
> I'll come to that.

Actually, I didn't really *want* to let the parser control the code
flow. It was more that I couldn't imagine how to do it otherwise; but
the whole point of this thread was to discover how this could be done in
Ada, whether parser-controlled or not.

> procedure Parsing is
>      Output_Format: Format := Configuration.Choose;
> begin
>      loop
>         declare
>             X : Token'Class := Parser.Next_Token;
>         begin
>            Print (X, Output_Format);
>         end;
>      end loop;
> end Parsing;
>
> This control structure pulls tokens. Parser could be a "lazy"
> object, producing tokens as needed.  (At this point, can you
> see why I thought that, for example, the loop above could be
> a task? Or the parser?  But that is a different story and of
> no concern here.)

Indeed, so far I understand, and I do see how this can be turned into a
task, though I still don't know what benefit it could possibly have.
However, I still have trouble deciding in which box this loop would
live: parser, client or (unlikely) renderer.
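One possible answer can be sketched with the pull loop living in the client (the function Render_HTML below), while the parser is only a cursor over the input that hands out one token per call. Everything here is hypothetical: Parser, Next_Token and the Token layout are made up, only *emphasis* is recognized, and the input is assumed to start at index 1.

```ada
with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Pull_Demo is

   type Token_Kind is (Normal_Text, Emphasis, End_Of_Input);

   --  A token names the slice Input (First .. Last) of the source text.
   type Token is record
      Kind  : Token_Kind;
      First : Positive;
      Last  : Natural;
   end record;

   --  The lazy parser only remembers where it stopped.
   --  Assumes Input'First = 1.
   type Parser is record
      Position : Positive := 1;
   end record;

   --  Produces the next token; only *emphasis* is recognized.
   procedure Next_Token (P : in out Parser; Input : String; T : out Token) is
      Start : constant Positive := P.Position;
      Stop  : Natural := Input'Last;
   begin
      if Start > Input'Last then
         T := (End_Of_Input, Start, Start - 1);
         return;
      end if;
      if Input (Start) = '*' then
         for I in Start + 1 .. Input'Last loop
            if Input (I) = '*' then
               T := (Emphasis, Start + 1, I - 1);
               P.Position := I + 1;
               return;
            end if;
         end loop;
         --  Unmatched star: the rest of the input is plain text.
      else
         --  Plain text runs up to the next star (or the end).
         for I in Start + 1 .. Input'Last loop
            if Input (I) = '*' then
               Stop := I - 1;
               exit;
            end if;
         end loop;
      end if;
      T := (Normal_Text, Start, Stop);
      P.Position := Stop + 1;
   end Next_Token;

   --  The pull loop lives in the client: it asks for tokens until done.
   function Render_HTML (Input : String) return String is
      P      : Parser;
      T      : Token;
      Result : Unbounded_String;
   begin
      loop
         Next_Token (P, Input, T);
         exit when T.Kind = End_Of_Input;
         if T.Kind = Emphasis then
            Append (Result, "<em>" & Input (T.First .. T.Last) & "</em>");
         else
            Append (Result, Input (T.First .. T.Last));
         end if;
      end loop;
      return To_String (Result);
   end Render_HTML;

begin
   Ada.Text_IO.Put_Line (Render_HTML ("foo *bar* baz"));
end Pull_Demo;
```

Moving the same loop into the renderer would only change who calls Next_Token, not the parser itself.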

> Then, next in the control structure, is a call on Print.
> Print takes two arguments whose run-time type is not known.
> By design, Print renders any kind of token into any format.
> Print will therefore look at both X and Format, and find out what
> they are (inspect their run-time tags).
> Both X and Format could share O-O "interfaces", respectively.
> Or X may be a variant record, the variants reflecting the different
> kinds of token.

The last sentence above was exactly what opened my eyes to another type of
design, besides the record of callbacks I already knew. Huge thanks for
that.

Here is that second design, which I think is slightly different from
your proposition, but at least I feel I master it to the point of being
able to actually implement it.

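    with Ada.Strings.Fixed;     use Ada.Strings.Fixed;
    with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

    procedure Token_Demo is

       subtype Level_Range is Positive range 1 .. 6;

       type Token_Kind is (Normal_Text, Emphasis, Header, Paragraph, Rule);

       --  The default discriminant makes Token objects mutable.
       --  Component names must differ between variants, and a record
       --  component cannot be an unconstrained String, hence
       --  Unbounded_String.
       type Token (Kind : Token_Kind := Normal_Text) is record
          case Kind is
             when Normal_Text | Emphasis | Paragraph =>
                Contents : Unbounded_String;
             when Header =>
                Level : Level_Range;
                Title : Unbounded_String;
             when Rule =>
                null;
          end case;
       end record;

       --  Now in the renderer:

       function HTML_Tag (Tag : String; Contents : String) return String is
       begin
          return "<" & Tag & ">" & Contents & "</" & Tag & ">";
       end HTML_Tag;

       function HTML_Render (T : Token) return String is
       begin
          case T.Kind is
             when Normal_Text =>
                return To_String (T.Contents);
             when Emphasis =>
                return HTML_Tag ("em", To_String (T.Contents));
             when Header =>
                --  'Image puts a leading space before positive numbers.
                return HTML_Tag
                  ("h" & Trim (Level_Range'Image (T.Level), Ada.Strings.Left),
                   To_String (T.Title));
             when Paragraph =>
                return HTML_Tag ("p", To_String (T.Contents));
             when Rule =>
                return "<hr>";
          end case;
       end HTML_Render;

       function XHTML_Render (T : Token) return String is
       begin
          case T.Kind is
             when Rule =>
                return "<hr />";
             when others =>
                return HTML_Render (T);
          end case;
       end XHTML_Render;

    begin
       null;
    end Token_Demo;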

I'm not sure this actually works, there might be some syntax errors or
some other low-level Ada issue; I'm writing this more to convey the
idea than to have something compilable (I'd first learn a lot more about
the actual Ada syntax before bothering you with something that is
supposed to compile).

This is still conceived in a parser-controlled way, with the parser
procedure making Token objects (hence the mutable variant record, if I
managed to write it right), but handling Tokens this way opens up new
forms of parsers. For example, the loop you proposed above could live in
the client realm, with a parser that produces one such Token after
another. Or the loop could be put in the renderer, with (X)HTML_Render
being a renderer-internal helper function.

And using something more complex than a String for Contents, like some
kind of list of Tokens, this would allow offline parsing too.

And for more complex renderers, instead of a huge Case, I could use a
map linking Token_Kind to the correct function, or something like that.
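In Ada, such a map can be an array indexed by Token_Kind whose elements are access-to-function values. This is a sketch under assumptions, not the design above: Handler, Handler_Table, Make_XHTML and Render are made-up names, and each handler receives only the token text. XHTML is built by copying the HTML table and replacing the Rule entry, mirroring XHTML_Render.

```ada
with Ada.Text_IO;

procedure Table_Demo is

   type Token_Kind is (Normal_Text, Emphasis, Paragraph, Rule);

   --  One rendering function per token kind.
   type Handler is access function (Contents : String) return String;
   type Handler_Table is array (Token_Kind) of Handler;

   function Text (Contents : String) return String is (Contents);
   function Em (Contents : String) return String is ("<em>" & Contents & "</em>");
   function Para (Contents : String) return String is ("<p>" & Contents & "</p>");
   function HTML_HR (Contents : String) return String is ("<hr>");
   function XHTML_HR (Contents : String) return String is ("<hr />");

   HTML : constant Handler_Table :=
     (Normal_Text => Text'Access,
      Emphasis    => Em'Access,
      Paragraph   => Para'Access,
      Rule        => HTML_HR'Access);

   --  XHTML reuses every HTML entry except Rule.
   function Make_XHTML return Handler_Table is
      Table : Handler_Table := HTML;
   begin
      Table (Rule) := XHTML_HR'Access;
      return Table;
   end Make_XHTML;

   XHTML : constant Handler_Table := Make_XHTML;

   --  The "dispatch" is a table lookup followed by an indirect call.
   function Render
     (Table : Handler_Table; Kind : Token_Kind; Contents : String)
      return String is (Table (Kind).all (Contents));

begin
   Ada.Text_IO.Put_Line (Render (HTML, Emphasis, "bar"));
   Ada.Text_IO.Put_Line (Render (XHTML, Rule, ""));
   Ada.Text_IO.Put_Line (Render (XHTML, Paragraph, "text"));
end Table_Demo;
```

Compared with the case statement, a table lets a new output format override single entries at run time, at the cost of an indirect call.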

A whole new world of designs is opening up to me \o/

> In the C style O-O you mentioned, Print would do that by
> calling functions available through function pointers in
> both X's struct and Format's struct.  Or, to learn about X,
> Print could just look at a distinguishing field  of X.

Actually I don't understand that part yet.

Basically, as I see it, the whole problem of designing an interface
between a parser and a renderer can be reduced to how to encode the
semantics that the parser extracts from the input text, and hand them
over to the renderer, which encodes them into the output text.

So my very first design used the parser's choice of callback to encode
the semantics, and the callback arguments are the parameters associated
with the type of token being processed. There is obviously one set of
callbacks, and that set is conceptually indexed by Markdown feature.
So there is a need for only one dispatch table (per renderer), which
means an abstract tagged type or an interface, and one level of
inheritance. The parser or the client wouldn't have such tables.

Now with my recent enlightenment, I can see the semantics encoded in the
variant record, with parameters stored inside that object. But then I
don't feel the need for any dynamic dispatch -- unless you consider my
big Case as a kind of dynamic dispatch, but no matter what, at some
point the renderer will have to "decide" which actual printing function
to use.

However, I still don't see why I would need two sets of dynamic dispatch
tables...

> Your idea seems different in that the parser will call the rendering
> procedures whenever *it* thinks it has something ready to be rendered.
> That's fine, I think.

Indeed, I think so too. I was just worried by the fact that I wasn't able
to come up with anything I could implement with a flow controlled by the
client or the renderer. However, I can't think of any actual situation
where the parser must not control the flow, at least when processing
Markdown into whatever markup.

> Keeping your current design, just for the record,  you could establish
> an interface in the logical sense,
>
>     type Renderer_Callbacks is abstract tagged private;
>
>     function Emphasis
>        (This: Renderer_Callbacks; Contents: String) return String
>     is abstract;
>
>     function Normal_Text
>        (This : Renderer_Callbacks; Contents: String) return String
>     is abstract;
>
>     function Paragraph
>        (This : Renderer_Callbacks; Contents: String) return String
>     is abstract;
>
> Add derived types that will override behavior for
> different output formats. Then
>
>     function Parser
>       (R : Renderer_Callbacks'Class; Input : String) return String;
>
> The calls within Parser will be dispatching, depending on the
> type of the actual renderer object.

Yes, so far this is inside the realm of what I can understand.

> And yes, if you don't start the hierarchy from reusable
> functions that do not need to be overridden, there can be
> code bloat.

This is the tough spot: I haven't been able to find out how to avoid
code bloat in the situation I described (with three sets of extensions
and two markup targets).
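For reference, here is a minimal compilable sketch of the callback hierarchy quoted above, under stated assumptions: the root type is a visible null record rather than private, and Rule, HTML_Renderer, XHTML_Renderer and the one-line Parse are hypothetical additions. Deriving XHTML_Renderer from the concrete HTML_Renderer, rather than from the abstract root, means XHTML overrides only Rule and inherits the rest. That is one way to limit duplication; it does not by itself solve the case of several independent extension sets.

```ada
with Ada.Text_IO;

procedure Callbacks_Demo is

   package Renderers is
      type Renderer_Callbacks is abstract tagged null record;

      function Emphasis
        (This : Renderer_Callbacks; Contents : String) return String is abstract;
      function Paragraph
        (This : Renderer_Callbacks; Contents : String) return String is abstract;
      function Rule (This : Renderer_Callbacks) return String is abstract;

      type HTML_Renderer is new Renderer_Callbacks with null record;
      overriding function Emphasis
        (This : HTML_Renderer; Contents : String) return String;
      overriding function Paragraph
        (This : HTML_Renderer; Contents : String) return String;
      overriding function Rule (This : HTML_Renderer) return String;

      --  XHTML inherits Emphasis and Paragraph and overrides only Rule.
      type XHTML_Renderer is new HTML_Renderer with null record;
      overriding function Rule (This : XHTML_Renderer) return String;
   end Renderers;

   package body Renderers is
      function Emphasis
        (This : HTML_Renderer; Contents : String) return String is
        ("<em>" & Contents & "</em>");
      function Paragraph
        (This : HTML_Renderer; Contents : String) return String is
        ("<p>" & Contents & "</p>");
      function Rule (This : HTML_Renderer) return String is ("<hr>");
      function Rule (This : XHTML_Renderer) return String is ("<hr />");
   end Renderers;

   use Renderers;

   --  Toy parser for one line: every call on R dispatches on R's tag.
   function Parse (R : Renderer_Callbacks'Class; Line : String) return String is
   begin
      if Line = "---" then
         return Rule (R);
      elsif Line'Length >= 2
        and then Line (Line'First) = '*'
        and then Line (Line'Last) = '*'
      then
         return Paragraph (R, Emphasis (R, Line (Line'First + 1 .. Line'Last - 1)));
      else
         return Paragraph (R, Line);
      end if;
   end Parse;

   H : HTML_Renderer;
   X : XHTML_Renderer;

begin
   Ada.Text_IO.Put_Line (Parse (H, "---"));
   Ada.Text_IO.Put_Line (Parse (X, "---"));
   Ada.Text_IO.Put_Line (Parse (X, "*bar*"));
end Callbacks_Demo;
```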

> This would not be the case with generics, which you
> instantiate with whatever reusable functions you like!

Indeed, and that's why I'm still leaning towards generics, records of
access-to-subprogram, or variant record approaches :-)

Thanks a lot for your help,
Natasha

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Parser interface design
  2011-04-08 14:03       ` Natasha Kerensikova
@ 2011-04-08 19:06         ` Jeffrey Carter
  2011-04-08 19:59         ` Simon Wright
  1 sibling, 0 replies; 37+ messages in thread
From: Jeffrey Carter @ 2011-04-08 19:06 UTC (permalink / raw)

On 04/08/2011 07:03 AM, Natasha Kerensikova wrote:
>
> Yes, as I said in my reply to Dmitry, I thought of disabling in the
> parser the feature associated to a callback set to null. In the
> "foo *bar* baz" example I provided, using a No_Op for Emphasis would
> result in "<p>foo  baz</p>", while disabling the emphasis feature means
> considering the star as an inactive character, resulting in
> "<p>foo *bar* baz</p>".

Without knowing much about your domain, I'd think

type Parse_Result_Info is ...;

generic -- Parse
    with procedure Render (Parse_Result : in Parse_Result_Info);
procedure Parse (...);

would provide you with this kind of discrimination. A Render that does something 
for the '*' construct would recognize it and take the desired action, while a 
Render that doesn't would give the results you want. Another advantage is that 
you have no limit on the number of constructs that you can handle, rather than 
the fixed number required when using a separate subprogram for each construct.

The limitation here seems to be that Parse is going to parse according to a 
fixed grammar. Perhaps it could be given a grammar as a parameter to make it 
more flexible, or it may be possible to provide more generic formal parameters 
that define how it parses.
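A minimal sketch of this single-Render generic, under assumptions that are not in the post: Parse_Result_Info is filled in as a record holding the construct kind, its exact source text (delimiters included) and its inner text, and the toy Parse recognizes only *...*. Info, Find_Star, HTML_Render, Literal_Render, Run_HTML and Run_Literal are hypothetical names. A Render that ignores the construct and copies Source reproduces the stars.

```ada
with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Generic_Render_Demo is

   type Construct is (Text, Star_Emphasis);

   --  Hypothetical Parse_Result_Info: the construct found, its exact
   --  source text (delimiters included), and the text between delimiters.
   type Parse_Result_Info is record
      Kind   : Construct;
      Source : Unbounded_String;
      Inner  : Unbounded_String;
   end record;

   function Info (Kind : Construct; Source, Inner : String)
     return Parse_Result_Info is
     ((Kind, To_Unbounded_String (Source), To_Unbounded_String (Inner)));

   --  Index of the first '*' in Input (From .. Input'Last), or 0 if none.
   function Find_Star (Input : String; From : Positive) return Natural is
   begin
      for I in From .. Input'Last loop
         if Input (I) = '*' then
            return I;
         end if;
      end loop;
      return 0;
   end Find_Star;

   generic
      with procedure Render (Parse_Result : in Parse_Result_Info);
   procedure Parse (Input : String);

   procedure Parse (Input : String) is
      Pos         : Positive := Input'First;
      Open, Close : Natural;
   begin
      while Pos <= Input'Last loop
         Open  := Find_Star (Input, Pos);
         Close := (if Open = 0 then 0 else Find_Star (Input, Open + 1));
         if Close = 0 then
            --  No complete *...* pair left: the rest is plain text.
            Render (Info (Text, Input (Pos .. Input'Last), Input (Pos .. Input'Last)));
            return;
         end if;
         if Open > Pos then
            Render (Info (Text, Input (Pos .. Open - 1), Input (Pos .. Open - 1)));
         end if;
         Render (Info (Star_Emphasis, Input (Open .. Close), Input (Open + 1 .. Close - 1)));
         Pos := Close + 1;
      end loop;
   end Parse;

   Result : Unbounded_String;

   --  Handles the star construct.
   procedure HTML_Render (Parse_Result : in Parse_Result_Info) is
   begin
      case Parse_Result.Kind is
         when Text          => Append (Result, Parse_Result.Inner);
         when Star_Emphasis => Append (Result, "<em>" & Parse_Result.Inner & "</em>");
      end case;
   end HTML_Render;

   --  Handles no construct: reproduces the source text as-is.
   procedure Literal_Render (Parse_Result : in Parse_Result_Info) is
   begin
      Append (Result, Parse_Result.Source);
   end Literal_Render;

   procedure HTML_Parse    is new Parse (HTML_Render);
   procedure Literal_Parse is new Parse (Literal_Render);

   function Run_HTML (Input : String) return String is
   begin
      Result := Null_Unbounded_String;
      HTML_Parse (Input);
      return To_String (Result);
   end Run_HTML;

   function Run_Literal (Input : String) return String is
   begin
      Result := Null_Unbounded_String;
      Literal_Parse (Input);
      return To_String (Result);
   end Run_Literal;

begin
   Ada.Text_IO.Put_Line (Run_HTML ("foo *bar* baz"));
   Ada.Text_IO.Put_Line (Run_Literal ("foo *bar* baz"));
end Generic_Render_Demo;
```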

-- 
Jeff Carter
"Help! Help! I'm being repressed!"
Monty Python & the Holy Grail
67

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Parser interface design
  2011-04-08 14:03       ` Natasha Kerensikova
  2011-04-08 19:06         ` Jeffrey Carter
@ 2011-04-08 19:59         ` Simon Wright
  2011-04-12 16:13           ` Natasha Kerensikova
  1 sibling, 1 reply; 37+ messages in thread
From: Simon Wright @ 2011-04-08 19:59 UTC (permalink / raw)


Natasha Kerensikova <lithiumcat@gmail.com> writes:

>> Don't see what's wrong with providing a No_Op function which returns its
>> input unchanged? unless you do something specific for a null?
>
> Yes, as I said in my reply to Dmitry, I thought of disabling in the
> parser the feature associated to a callback set to null. In the
> "foo *bar* baz" example I provided, using a No_Op for Emphasis would
> result in "<p>foo  baz</p>", while disabling the emphasis feature means
> considering the star as an inactive character, resulting in
> "<p>foo *bar* baz</p>".

I had in mind more that using my No_Op (I think it was Georg who used
the better name Identity) would result in "<p>foo bar baz</p>".

In other words, HTML.Emphasize ("bar") would return "<b>bar</b>", but
No_Op ("bar") would just return "bar".



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Parser interface design
  2011-04-08 14:21         ` Dmitry A. Kazakov
@ 2011-04-12 15:58           ` Natasha Kerensikova
  2011-04-12 17:14             ` Dmitry A. Kazakov
  0 siblings, 1 reply; 37+ messages in thread
From: Natasha Kerensikova @ 2011-04-12 15:58 UTC (permalink / raw)

Hello,

On 2011-04-08, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote:
> On Fri, 8 Apr 2011 13:51:51 +0000 (UTC), Natasha Kerensikova wrote:
>>>> I'm asking
>>>> because in my C implementation, a NULL callback was a meant to
>>>> communicate to the parser that the associated active character should
>>>> no longer be considered active, thereby switching off that particular
>>>> feature (which is very different from having a no-op callback).
>>>
>>> Why should you test for anything? Just make the implementation do what is
>>> required to do in order to achieve the desired effect.
>> 
>> Well, that test thing is an implementation detail, on which I believe is
>> too early to spend time.
>
> No, it is not an implementation, because the check happens on the caller's
> side. Implementation detail is always a callee's thing .

I'm afraid we might not be talking about the same "implementation" here;
I might have misused the word.

Your "implementation" is the "callback implementation" (callee), which
indeed has nothing to do with the parser (caller); that's the whole point
of distinguishing *specification* and implementation.

My "implementation" is the "parser implementation", that is, whether to
use a NULL callback as an input parameter to disable the feature
associated with the callback. That would actually be reflected in the
parser specification, but there I used the word "implementation" as
opposed to "design", which is the current phase.

I believe it's premature to discuss how the client should specify which
Markdown features to enable or disable in the parser, when how the
client and the parser will communicate is not even decided yet. That's
what I meant by "that test thing is an implementation detail".

>> Then, it so happens that the set of features matches exactly the set of
>> callbacks.
>
> If the property is one of the callback, it is the callback's responsibility
> to check it. Moving it to the parser is fragile design.

I don't think that whether the access-to-subprogram value provided to the
parser is null is actually a property of the callback itself: when it is
null, there is no callback at all.

>> I just can't imagine any design that has a use of several hierarchy, for
>> now. (when writing the OP I could imagine only one design, and now I've
>> already reached two designs, sketched above, so I'm progressing)
>
> When your parser sums two numbers it does not have + as a primitive
> operation. Does it? If numbers may have a hierarchy of their own, why
> renderer cannot? 

I'm not sure what a "primitive operation" actually is. I've never heard
of an OOP-style class hierarchy among numbers in Ada, so I might be
missing something big.

Still, I can't imagine how to compartmentalize that whole Markdown
pipeline in such a way that I would need several class hierarchies.

>> As long as the design is the client calling exactly once the parser,
>> there is no need for an object to keep a hidden state, the local stack
>> is enough, isn't it?
>
> Maybe. Note that more states you glue together by maintaining single
> hierarchy of "everything", more difficult it becomes for single stack. BTW,
> parsing infix expressions is much simpler with two stacks. You might also
> wish not to use the standard stack for such things.

There I'm completely lost.

I didn't have the impression of gluing several states together in my
first design. I have never looked at infix expressions, and
fortunately Markdown doesn't contain any. And on top of that, I'm not
sure exactly what you are referring to with "standard stack": is it a
special Ada concept, or is it the standard x86 stack where function
return information and parameters are stored?

> In Ada you lose nothing making it tagged. Differently to C++ where
> "virtual" has non-zero overhead, in Ada making a procedure primitive has
> zero performance cost, because the target will be resolved statically.

Interesting to know. But it still makes sense to not tag types that
don't need it, right?

It seems to boil down to a matter of extensibility. In this particular
case, I can't see any way of designing the parser so that it can be
extended to support any of the few Markdown extensions I know. So I've
considered the parser to be basically non-extensible, meant only to be
reused as is. Is that intrinsically bad design, or can it be an
acceptable compromise?

Thanks for your comments,
Natasha

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Parser interface design
  2011-04-08 19:59         ` Simon Wright
@ 2011-04-12 16:13           ` Natasha Kerensikova
  2011-04-12 17:22             ` Dmitry A. Kazakov
  0 siblings, 1 reply; 37+ messages in thread
From: Natasha Kerensikova @ 2011-04-12 16:13 UTC (permalink / raw)


Hello,

On 2011-04-08, Simon Wright <simon@pushface.org> wrote:
> Natasha Kerensikova <lithiumcat@gmail.com> writes:
>
>>> Don't see what's wrong with providing a No_Op function which returns its
>>> input unchanged? unless you do something specific for a null?
>>
>> Yes, as I said in my reply to Dmitry, I thought of disabling in the
>> parser the feature associated to a callback set to null. In the
>> "foo *bar* baz" example I provided, using a No_Op for Emphasis would
>> result in "<p>foo  baz</p>", while disabling the emphasis feature means
>> considering the star as an inactive character, resulting in
>> "<p>foo *bar* baz</p>".
>
> I had in mind more that using my No_Op (I think it was Georg who used
> the better name Identity) would result in "<p>foo bar baz</p>".

Indeed, and this is *not* the result I want: if the star is supposed to
be a character like any other (and not the indicator of emphasis), then
it should be in the result too: "<p>foo *bar* baz</p>".

> In other words, HTML.Emphasize ("bar") would return "<b>bar</b>", but
> No_Op ("bar") would just return "bar".

And that's exactly why disabling emphasis should be done on the parser
level: the callback only has the semantic information ("render bar
emphasized") without any knowledge of how it was obtained (the source
could have been "foo *bar* baz" or "foo _bar_ baz"). Therefore no
callback can reconstruct the intact input.
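The difference can be checked with a small sketch of the C-style convention, transposed to Ada: a record of access-to-function callbacks where a null Emphasis tells the parser to treat '*' as an ordinary character. All names (Span_Callback, Callbacks, HTML_Em, Identity, Parse) are hypothetical, and only *...* is recognized. With Identity, the stars have already been consumed by the parser, so only the null case reproduces the input.

```ada
with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Disable_Demo is

   type Span_Callback is access function (Contents : String) return String;

   --  Hypothetical callback record; null means "feature disabled".
   type Callbacks is record
      Emphasis : Span_Callback;
   end record;

   function HTML_Em (Contents : String) return String is ("<em>" & Contents & "</em>");
   function Identity (Contents : String) return String is (Contents);

   --  When CB.Emphasis is null, the parser does not treat '*' as active
   --  and copies it to the output like any other character.
   function Parse (Input : String; CB : Callbacks) return String is
      Result : Unbounded_String;
      Pos    : Positive := Input'First;
      Close  : Natural;
   begin
      while Pos <= Input'Last loop
         Close := 0;
         if Input (Pos) = '*' and then CB.Emphasis /= null then
            for I in Pos + 1 .. Input'Last loop
               if Input (I) = '*' then
                  Close := I;
                  exit;
               end if;
            end loop;
         end if;
         if Close /= 0 then
            Append (Result, CB.Emphasis.all (Input (Pos + 1 .. Close - 1)));
            Pos := Close + 1;
         else
            Append (Result, Input (Pos));
            Pos := Pos + 1;
         end if;
      end loop;
      return To_String (Result);
   end Parse;

   Input : constant String := "foo *bar* baz";

begin
   Ada.Text_IO.Put_Line (Parse (Input, (Emphasis => HTML_Em'Access)));
   Ada.Text_IO.Put_Line (Parse (Input, (Emphasis => Identity'Access)));
   Ada.Text_IO.Put_Line (Parse (Input, (Emphasis => null)));
end Disable_Demo;
```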


Natasha



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Parser interface design
  2011-04-12 15:58           ` Natasha Kerensikova
@ 2011-04-12 17:14             ` Dmitry A. Kazakov
  0 siblings, 0 replies; 37+ messages in thread
From: Dmitry A. Kazakov @ 2011-04-12 17:14 UTC (permalink / raw)


On Tue, 12 Apr 2011 15:58:04 +0000 (UTC), Natasha Kerensikova wrote:

> On 2011-04-08, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote:
>> On Fri, 8 Apr 2011 13:51:51 +0000 (UTC), Natasha Kerensikova wrote:
>>> Then, it so happens that the set of features matches exactly the set of
>>> callbacks.
>>
>> If the property is one of the callback, it is the callback's responsibility
>> to check it. Moving it to the parser is fragile design.
> 
> I don't think whether the access-to-subprogram provided to the parser
> being null or not is actually a property of the callback itself.

No, it is not. This is what makes such design fragile.

>> When your parser sums two numbers it does not have + as a primitive
>> operation. Does it? If numbers may have a hierarchy of their own, why
>> renderer cannot? 
> 
> I'm actually not sure what a "primitive operation" actually is. I've
> actually never heard of OOP-style class hierarchy among numbers in Ada,
> so I might be missing something big.

Nevertheless, you are successfully using numbers with things which are not
numbers. Which is the point. Why can't a renderer be used independently of
the parser? At least for testing purposes?

> I didn't have the impression of gluing several states together in my
> first design.

Renderer, source/input, parser have logically independent states.
 
> I haven't never looked at infix expressions, and
> fortunately Markdown doesn't contain any. And on top of that, I'm not
> sure exactly what you are referring to with "standard stack"; is it a
> special Ada concept, or is it the standard x86 stack where function
> return information and parameters are stored?

Let's call it local stack.

>> In Ada you lose nothing making it tagged. Differently to C++ where
>> "virtual" has non-zero overhead, in Ada making a procedure primitive has
>> zero performance cost, because the target will be resolved statically.
> 
> Interesting to know. But it still makes sense to not tag types that
> don't need it, right?

No. IMO there are only three reasons why something is not tagged:

1. [premature] Optimization: tagged types are by-reference and have a
stored tag.

2. Language problem: in Ada some types cannot be tagged.

3. Class-wide behavior, i.e. T'Class.

> It seems to boil down to a matter of extensibility.

Rather to the least constraint. You should not introduce constraints
without a reason.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Parser interface design
  2011-04-12 16:13           ` Natasha Kerensikova
@ 2011-04-12 17:22             ` Dmitry A. Kazakov
  2011-04-12 19:02               ` Simon Wright
  2011-04-12 21:54               ` Randy Brukardt
  0 siblings, 2 replies; 37+ messages in thread
From: Dmitry A. Kazakov @ 2011-04-12 17:22 UTC (permalink / raw)


On Tue, 12 Apr 2011 16:13:30 +0000 (UTC), Natasha Kerensikova wrote:

> On 2011-04-08, Simon Wright <simon@pushface.org> wrote:

>> In other words, HTML.Emphasize ("bar") would return "<b>bar</b>", but
>> No_Op ("bar") would just return "bar".
> 
> And that's exactly why disabling emphasis should be done on the parser
> level: the callback only has the semantic information ("render bar
> emphasized") without any knowledge of how it was obtained (the source
> could have been "foo *bar* baz" or "foo _bar_ baz"). Therefore no
> callback can reconstruct the intact input.

That looks wrong to me. I think the obvious design is to move it to the
renderer:

   HTML.Emphasize ("bar")  -> "<b>bar</b>"
   ASCII.Emphasize ("bar")  -> "*bar*"
   Plain_Text.Emphasize ("bar")  -> "bar"
   Gtk_Text_Buffer.Emphasize ("bar")  -> sets tags around the text slice
   ...

The parser just calls On_Emphasize, whose implementation routes it to the
renderer's Emphasize, which figures out what to do.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Parser interface design
  2011-04-12 17:22             ` Dmitry A. Kazakov
@ 2011-04-12 19:02               ` Simon Wright
  2011-04-13  8:20                 ` Natasha Kerensikova
  2011-04-12 21:54               ` Randy Brukardt
  1 sibling, 1 reply; 37+ messages in thread
From: Simon Wright @ 2011-04-12 19:02 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> On Tue, 12 Apr 2011 16:13:30 +0000 (UTC), Natasha Kerensikova wrote:
>
>> On 2011-04-08, Simon Wright <simon@pushface.org> wrote:
>
>>> In other words, HTML.Emphasize ("bar") would return "<b>bar</b>", but
>>> No_Op ("bar") would just return "bar".
>> 
>> And that's exactly why disabling emphasis should be done on the parser
>> level: the callback only has the semantic information ("render bar
>> emphasized") without any knowledge of how it was obtained (the source
>> could have been "foo *bar* baz" or "foo _bar_ baz"). Therefore no
>> callback can reconstruct the intact input.
>
> That looks wrong to me. I think that obvious design is to move it to the
> renderer:
>
>    HTML.Emphasize ("bar")  -> "<b>bar</b>"
>    ASCII.Emphasize ("bar")  -> "*bar*"
>    Plain_Text.Emphasize ("bar")  -> "bar"
>    Gtk_Text_Buffer.Emphasize ("bar")  -> sets tags around the text slice
>    ...
>
> Parser just calls On_Emphasize, the implementation of routes it to the
> renderer's Emphasize, which figures out what to do.

Clearly you and I think similarly here. But Natasha is (as well as being
the potential implementer) the customer.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Parser interface design
  2011-04-12 17:22             ` Dmitry A. Kazakov
  2011-04-12 19:02               ` Simon Wright
@ 2011-04-12 21:54               ` Randy Brukardt
  1 sibling, 0 replies; 37+ messages in thread
From: Randy Brukardt @ 2011-04-12 21:54 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
news:1ovsbvdul64pw$.1q49g3o7n296m$.dlg@40tude.net...
> On Tue, 12 Apr 2011 16:13:30 +0000 (UTC), Natasha Kerensikova wrote:
>
>> On 2011-04-08, Simon Wright <simon@pushface.org> wrote:
>
>>> In other words, HTML.Emphasize ("bar") would return "<b>bar</b>", but
>>> No_Op ("bar") would just return "bar".
>>
>> And that's exactly why disabling emphasis should be done on the parser
>> level: the callback only has the semantic information ("render bar
>> emphasized") without any knowledge of how it was obtained (the source
>> could have been "foo *bar* baz" or "foo _bar_ baz"). Therefore no
>> callback can reconstruct the intact input.
>
> That looks wrong to me. I think that obvious design is to move it to the
> renderer:
>
>   HTML.Emphasize ("bar")  -> "<b>bar</b>"
>   ASCII.Emphasize ("bar")  -> "*bar*"
>   Plain_Text.Emphasize ("bar")  -> "bar"
>   Gtk_Text_Buffer.Emphasize ("bar")  -> sets tags around the text slice
>   ...
>
> Parser just calls On_Emphasize, the implementation of routes it to the
> renderer's Emphasize, which figures out what to do.

Right, that's how it works in the Ada standard formatter tool. The output 
class takes a high-level representation of items and then writes them 
appropriately for the desired output.

                           Randy.





^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Parser interface design
  2011-04-12 19:02               ` Simon Wright
@ 2011-04-13  8:20                 ` Natasha Kerensikova
  2011-04-13  8:37                   ` Dmitry A. Kazakov
  2011-04-13 22:33                   ` Randy Brukardt
  0 siblings, 2 replies; 37+ messages in thread
From: Natasha Kerensikova @ 2011-04-13  8:20 UTC (permalink / raw)

Hello,

On 2011-04-12, Simon Wright <simon@pushface.org> wrote:
> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>
>> On Tue, 12 Apr 2011 16:13:30 +0000 (UTC), Natasha Kerensikova wrote:
>>
>>> On 2011-04-08, Simon Wright <simon@pushface.org> wrote:
>>
>>>> In other words, HTML.Emphasize ("bar") would return "<b>bar</b>", but
>>>> No_Op ("bar") would just return "bar".
>>> 
>>> And that's exactly why disabling emphasis should be done on the parser
>>> level: the callback only has the semantic information ("render bar
>>> emphasized") without any knowledge of how it was obtained (the source
>>> could have been "foo *bar* baz" or "foo _bar_ baz"). Therefore no
>>> callback can reconstruct the intact input.
>>
>> That looks wrong to me. I think that obvious design is to move it to the
>> renderer:
>>
>>    HTML.Emphasize ("bar")  -> "<b>bar</b>"
>>    ASCII.Emphasize ("bar")  -> "*bar*"
>>    Plain_Text.Emphasize ("bar")  -> "bar"
>>    Gtk_Text_Buffer.Emphasize ("bar")  -> sets tags around the text slice
>>    ...
>>
>> Parser just calls On_Emphasize, the implementation of routes it to the
>> renderer's Emphasize, which figures out what to do.
>
> Clearly you and I think similarly here. But Natasha is (as well as being
> the potential implementer) the customer.

I'm afraid you're both missing the issue. Dmitry's example is clearly
about different kinds of emphasis, while I'm talking about *disabling*
emphasis.

I understand that emphasis is such a mild feature that you might miss
the point of disabling it at all, so let's consider another Markdown
feature: inline HTML tags. According to the Markdown specification, when
"<script>" is encountered in the input text, it is supposed to be output
as-is, i.e. it generates an HTML script tag. If Markdown is used to format
untrusted input, for example blog comments, it makes sense to prevent
the whole system from outputting arbitrary HTML tags, right?

Moreover, in a context where it's clear that HTML tags are not active,
for example here, there are legitimate uses of text that looks like a
tag. So the "right" thing to do is, in my opinion, to treat the tag as
regular text and escape whatever needs to be escaped depending on the
target. Obviously, the first part happens at the parser level (e.g.
calling the Normal_Text callback) while the second happens at the
renderer level. Therefore, to make the first part happen, the parser
needs to be somehow instructed that the inline HTML feature is disabled.
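The split between the two levels can be sketched as follows, assuming a hypothetical per-feature switch (Feature, Feature_Set) passed to the parser and an HTML Normal_Text callback that escapes markup characters. Parse_Span is a deliberately simplified stand-in for real tag detection; none of these names come from an existing library.

```ada
with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Escape_Demo is

   type Feature is (Emphasis, Inline_HTML, Headers);
   type Feature_Set is array (Feature) of Boolean;

   --  Renderer side (HTML): escape characters that would be read as markup.
   function Normal_Text (Contents : String) return String is
      Result : Unbounded_String;
   begin
      for C of Contents loop
         case C is
            when '<'    => Append (Result, "&lt;");
            when '>'    => Append (Result, "&gt;");
            when '&'    => Append (Result, "&amp;");
            when '"'    => Append (Result, "&quot;");
            when others => Append (Result, C);
         end case;
      end loop;
      return To_String (Result);
   end Normal_Text;

   --  Parser side: a span that looks like a tag is passed through verbatim
   --  only when Inline_HTML is enabled; otherwise it becomes Normal_Text.
   function Parse_Span (Span : String; Enabled : Feature_Set) return String is
   begin
      if Span'Length > 0
        and then Span (Span'First) = '<'
        and then Enabled (Inline_HTML)
      then
         return Span;
      else
         return Normal_Text (Span);
      end if;
   end Parse_Span;

   All_Features  : constant Feature_Set := (others => True);
   Comment_Rules : constant Feature_Set :=
     (Inline_HTML | Headers => False, others => True);

begin
   Ada.Text_IO.Put_Line (Parse_Span ("<script>", All_Features));
   Ada.Text_IO.Put_Line (Parse_Span ("<script>", Comment_Rules));
end Escape_Demo;
```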

Another feature that I disabled in blog comments is headers, for
aesthetic rather than security reasons. It's an interesting example
alongside inline HTML. For inline HTML, the parser technically loses no
information, so the renderer could in theory reconstruct the input to
make it look as if the feature were disabled (though it would be a
convoluted renderer: whether the tag is span-level or block-level is
difficult, but not impossible, to reconstruct without peeking into the
parser's internal state, which we agree is not an option). Headers, on
the other hand, can be written in two formats in Markdown, which are
indistinguishable in the renderer, so it cannot fake the removal of the
feature.

These dangerous features are what made me want to cripple the parser in
the first place, and I thought it makes no sense to allow only a few
features to be disabled when I can just as easily allow all of them to
be independently turned on or off -- hence my example of disabling
emphasis.

Are my motivations clearer now, or is it still just a whim of the
customer imposing a fragile design?

Thanks for your comments,
Natasha

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Parser interface design
  2011-04-13  8:20                 ` Natasha Kerensikova
@ 2011-04-13  8:37                   ` Dmitry A. Kazakov
  2011-04-13 11:06                     ` Georg Bauhaus
  2011-04-13 22:33                   ` Randy Brukardt
  1 sibling, 1 reply; 37+ messages in thread
From: Dmitry A. Kazakov @ 2011-04-13  8:37 UTC (permalink / raw)


On Wed, 13 Apr 2011 08:20:59 +0000 (UTC), Natasha Kerensikova wrote:

> On 2011-04-12, Simon Wright <simon@pushface.org> wrote:
>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>>
>>>    HTML.Emphasize ("bar")  -> "<b>bar</b>"
>>>    ASCII.Emphasize ("bar")  -> "*bar*"
>>>    Plain_Text.Emphasize ("bar")  -> "bar"
>>>    Gtk_Text_Buffer.Emphasize ("bar")  -> sets tags around the text slice
>>>    ...
>>>
>>> Parser just calls On_Emphasize, the implementation of which routes it to the
>>> renderer's Emphasize, which figures out what to do.
>>
>> Clearly you and I think similarly here. But Natasha is (as well as being
>> the potential implementer) the customer.
> 
> I'm afraid you're both missing the issue. Dmitry's example is clearly
> about different kinds of emphasis, while I'm talking about *disabling*
> emphasis.

Which is a feature of the renderer:

   type Emphasis_Mode is (Off, On);

   HTML.Set_Emphasis (Off);
   ASCII.Set_Emphasis (On);
   ...
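
A minimal sketch of that suggestion, assuming a hypothetical HTML_Renderer record whose only state is the switch (HTML_Renderer and Emphasize are illustrative names, and plain call syntax is used instead of the dotted form above). The parser still reports every emphasis span; with the switch Off the renderer simply emits the text:

```ada
with Ada.Text_IO;

procedure Emphasis_Switch_Demo is

   type Emphasis_Mode is (Off, On);

   --  Hypothetical HTML renderer: its only state is the emphasis switch.
   type HTML_Renderer is record
      Emphasis : Emphasis_Mode := On;
   end record;

   procedure Set_Emphasis (R : in out HTML_Renderer; Mode : Emphasis_Mode) is
   begin
      R.Emphasis := Mode;
   end Set_Emphasis;

   --  Called for each emphasis span the parser reports. The renderer, not
   --  the parser, decides whether any markup is produced.
   function Emphasize (R : HTML_Renderer; Text : String) return String is
   begin
      if R.Emphasis = On then
         return "<b>" & Text & "</b>";
      else
         return Text;
      end if;
   end Emphasize;

   HTML : HTML_Renderer;
begin
   Ada.Text_IO.Put_Line (Emphasize (HTML, "bar"));  --  <b>bar</b>
   Set_Emphasis (HTML, Off);
   Ada.Text_IO.Put_Line (Emphasize (HTML, "bar"));  --  bar
end Emphasis_Switch_Demo;
```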

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Parser interface design
  2011-04-13  8:37                   ` Dmitry A. Kazakov
@ 2011-04-13 11:06                     ` Georg Bauhaus
  2011-04-13 12:46                       ` Dmitry A. Kazakov
  0 siblings, 1 reply; 37+ messages in thread
From: Georg Bauhaus @ 2011-04-13 11:06 UTC


On 4/13/11 10:37 AM, Dmitry A. Kazakov wrote:
> On Wed, 13 Apr 2011 08:20:59 +0000 (UTC), Natasha Kerensikova wrote:
>
>> On 2011-04-12, Simon Wright<simon@pushface.org>  wrote:
>>> "Dmitry A. Kazakov"<mailbox@dmitry-kazakov.de>  writes:
>>>
>>>>     HTML.Emphasize ("bar")  ->  "<b>bar</b>"
>>>>     ASCII.Emphasize ("bar")  ->  "*bar*"
>>>>     Plain_Text.Emphasize ("bar")  ->  "bar"
>>>>     Gtk_Text_Buffer.Emphasize ("bar")  ->  sets tags around the text slice
>>>>     ...
>>>>
>>>> Parser just calls On_Emphasize, the implementation of which routes it to the
>>>> renderer's Emphasize, which figures out what to do.
>>>
>>> Clearly you and I think similarly here. But Natasha is (as well as being
>>> the potential implementer) the customer.
>>
>> I'm afraid you're both missing the issue. Dmitry's example is clearly
>> about different kinds of emphasis, while I'm talking about *disabling*
>> emphasis.
>
> Which is a feature of the renderer:
>
>     type Emphasis_Mode is (Off, On);
>
>     HTML.Set_Emphasis (Off);
>     ASCII.Set_Emphasis (On);
>     ...
>

Are you sure?  The design at issue might be that there are
many possible mappings from input symbols to, let's
say tokens, and then tokens are rendered into some format,
and in multiple ways per format.

Thus for example, with just HTML as output format, the software's
intended function would be configurable to handle all of these
cases (I might be making some up, though):

*x* -> T'(Bold, 'x') -> "<strong>x</strong>"
*x* -> T'(Bold, 'x') -> "*x*"
*x* -> T'(Normal, 'x') -> "x"
*x* -> T'(Empty, No_Character) -> ""
*x* -> T'(Bold, No_Character) -> "<strong></strong>"

So *x* is parsed in different ways, and T'(X, C)
is rendered in different ways, even though there is
just one input text and just one output format (in this
example; there would be another multiple of these for other
output formats).

So both mappings would be configurable: the parser's and
the renderer's.

The question then becomes, is it the best idea to configure
the parser by selecting rendering callbacks and testing
them for null?
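
For concreteness, a minimal sketch of the null-callback configuration being questioned here, assuming a hypothetical record of access-to-function callbacks (Callbacks, Render, Plain and Strong are invented names, and only the "*x*" form is recognised). A null Emphasis entry makes the parser treat the construct as absent:

```ada
with Ada.Text_IO;

procedure Null_Callback_Demo is

   --  Hypothetical callback set: one access-to-function per construct.
   type Text_Callback is access function (Text : String) return String;

   type Callbacks is record
      Normal_Text : Text_Callback;
      Emphasis    : Text_Callback;   --  null means "emphasis disabled"
   end record;

   --  Recognises "*...*" spans. If the Emphasis callback is null, the
   --  construct is treated as absent and everything becomes normal text.
   function Render (Input : String; CB : Callbacks) return String is
      Open, Close : Natural := 0;
   begin
      for I in Input'Range loop
         if Input (I) = '*' then
            if Open = 0 then
               Open := I;
            else
               Close := I;
               exit;
            end if;
         end if;
      end loop;

      if Close = 0 or else CB.Emphasis = null then
         return CB.Normal_Text (Input);
      end if;

      return CB.Normal_Text (Input (Input'First .. Open - 1))
        & CB.Emphasis (Input (Open + 1 .. Close - 1))
        & Render (Input (Close + 1 .. Input'Last), CB);
   end Render;

   function Plain (Text : String) return String is
   begin
      return Text;
   end Plain;

   function Strong (Text : String) return String is
   begin
      return "<strong>" & Text & "</strong>";
   end Strong;

   With_Emphasis    : constant Callbacks := (Plain'Access, Strong'Access);
   Without_Emphasis : constant Callbacks := (Plain'Access, null);
begin
   Ada.Text_IO.Put_Line (Render ("a *x* b", With_Emphasis));
   --  a <strong>x</strong> b
   Ada.Text_IO.Put_Line (Render ("a *x* b", Without_Emphasis));
   --  a *x* b
end Null_Callback_Demo;
```

The null test sits inside Render itself, so which constructs exist depends on which rendering callbacks were supplied.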





* Re: Parser interface design
  2011-04-13 11:06                     ` Georg Bauhaus
@ 2011-04-13 12:46                       ` Dmitry A. Kazakov
  0 siblings, 0 replies; 37+ messages in thread
From: Dmitry A. Kazakov @ 2011-04-13 12:46 UTC


On Wed, 13 Apr 2011 13:06:16 +0200, Georg Bauhaus wrote:

> On 4/13/11 10:37 AM, Dmitry A. Kazakov wrote:
>> On Wed, 13 Apr 2011 08:20:59 +0000 (UTC), Natasha Kerensikova wrote:
>>
>>> On 2011-04-12, Simon Wright<simon@pushface.org>  wrote:
>>>> "Dmitry A. Kazakov"<mailbox@dmitry-kazakov.de>  writes:
>>>>
>>>>>     HTML.Emphasize ("bar")  ->  "<b>bar</b>"
>>>>>     ASCII.Emphasize ("bar")  ->  "*bar*"
>>>>>     Plain_Text.Emphasize ("bar")  ->  "bar"
>>>>>     Gtk_Text_Buffer.Emphasize ("bar")  ->  sets tags around the text slice
>>>>>     ...
>>>>>
>>>>> Parser just calls On_Emphasize, the implementation of which routes it to the
>>>>> renderer's Emphasize, which figures out what to do.
>>>>
>>>> Clearly you and I think similarly here. But Natasha is (as well as being
>>>> the potential implementer) the customer.
>>>
>>> I'm afraid you're both missing the issue. Dmitry's example is clearly
>>> about different kinds of emphasis, while I'm talking about *disabling*
>>> emphasis.
>>
>> Which is a feature of the renderer:
>>
>>     type Emphasis_Mode is (Off, On);
>>
>>     HTML.Set_Emphasis (Off);
>>     ASCII.Set_Emphasis (On);
>>     ...
> 
> Are you sure?

Yes, if disabling emphasis was the issue.

> The design at issue might be that there are
> many possible mappings from input symbols to,

That should not change the design. The mapping belongs neither to the
parser nor to the renderer. The parser calls On_Emphasize, which may look
into the current mapping and finally call

    Renderer.Unformatted_Text (Text).

> The question then becomes, is it the best idea to configure
> the parser by selecting rendering callbacks and testing
> them for null?

The best idea is to decouple mapping from both. Parsers usually deal with
some kind of intermediate object (syntax tree, operand/argument stacks,
etc.), which in this case would be the mapping object:

   Source --> Parser --> Context
   |_ File               |_ Computation
   |_ Stream             |_ Compilation
   |_ String             |_ Markup --> Renderer
                                       |_ HTML
                                       |_ ...
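
A rough sketch of that decoupling, under assumptions: Renderers, Markup, Context and the two-value Emphasis_Mapping are hypothetical names, and the renderer writes into a buffer so the result is easy to check. The parser would only ever call Markup.On_Emphasize; the mapping object decides which renderer operation that becomes:

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Markup_Context_Demo is

   package Renderers is
      --  Output side: knows how to write things, not what they mean.
      type Renderer is abstract tagged null record;
      procedure Unformatted_Text (R : in out Renderer; Text : String) is abstract;
      procedure Emphasize (R : in out Renderer; Text : String) is abstract;

      type HTML_Renderer is new Renderer with record
         Buffer : Unbounded_String;   --  collected output
      end record;
      overriding procedure Unformatted_Text (R : in out HTML_Renderer; Text : String);
      overriding procedure Emphasize (R : in out HTML_Renderer; Text : String);
   end Renderers;

   package body Renderers is
      procedure Unformatted_Text (R : in out HTML_Renderer; Text : String) is
      begin
         Append (R.Buffer, Text);
      end Unformatted_Text;

      procedure Emphasize (R : in out HTML_Renderer; Text : String) is
      begin
         Append (R.Buffer, "<b>" & Text & "</b>");
      end Emphasize;
   end Renderers;

   package Markup is
      --  The mapping lives here, in neither the parser nor the renderer.
      type Emphasis_Mapping is (As_Emphasis, As_Plain_Text);

      type Context is record
         Emphasis : Emphasis_Mapping := As_Emphasis;
      end record;

      --  What the parser calls when it recognises an emphasis span.
      procedure On_Emphasize
        (C : Context; R : in out Renderers.Renderer'Class; Text : String);
   end Markup;

   package body Markup is
      procedure On_Emphasize
        (C : Context; R : in out Renderers.Renderer'Class; Text : String) is
      begin
         case C.Emphasis is
            when As_Emphasis   => R.Emphasize (Text);         --  dispatches
            when As_Plain_Text => R.Unformatted_Text (Text);  --  dispatches
         end case;
      end On_Emphasize;
   end Markup;

   HTML : Renderers.HTML_Renderer;
begin
   Markup.On_Emphasize ((Emphasis => Markup.As_Emphasis), HTML, "bar");
   Markup.On_Emphasize ((Emphasis => Markup.As_Plain_Text), HTML, "bar");
   Ada.Text_IO.Put_Line (To_String (HTML.Buffer));  --  <b>bar</b>bar
end Markup_Context_Demo;
```

Because the parser only talks to the context, changing how emphasis is mapped touches neither the parser nor the renderer.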

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Parser interface design
  2011-04-13  8:20                 ` Natasha Kerensikova
  2011-04-13  8:37                   ` Dmitry A. Kazakov
@ 2011-04-13 22:33                   ` Randy Brukardt
  2011-04-14  6:55                     ` Natasha Kerensikova
  1 sibling, 1 reply; 37+ messages in thread
From: Randy Brukardt @ 2011-04-13 22:33 UTC


"Natasha Kerensikova" <lithiumcat@gmail.com> wrote in message 
news:slrniqan7b.2fnq.lithiumcat@sigil.instinctive.eu...
...
> These dangerous features are what made me want to cripple the parser in
> the first place, and I thought it makes no sense to allow only a few
> features to be disabled when I can just as easily allow all of them to
> be independently turned on or off -- hence my example of disabling
> emphasis.
>
> Are my motivations clearer now, or is it still just a whim of the
> customer imposing a fragile design?

Your intentions are fine, but I still don't think you should be trying to 
modify the behavior of the parser; that's the job for the "interpretation" 
layer. Maybe that's because of my compiler background, but what you are 
trying to do is very similar to a compiler, or to the Ada Standard 
formatter, or many other batch-oriented tools.

In those sort of tools, the parser (input layer) simply organizes the 
information from the input into a common form. It's the layer that sits 
between the input and the output layer (the renderer in your case) that does the
operations that depend on things other than the input itself. It's highly 
unlikely that you could avoid having such a layer at all (something has to 
connect the input and the output), and this is the place to do stuff that 
does not clearly have to do with the input or the output (such as 
transformations).

In your specific case, I believe that preventing "execution" of embedded 
HTML and the like is the job of the output layer (renderer), because that 
way it is impossible to forget a case and allow something through. In the RM 
Formatter tool, that is accomplished by having all text that is intended to 
be visible in the output format go through a particular output interface: 
"Ordinary_Text". And that interface is responsible for quoting any 
characters that might be interpreted as commands ("<", ">", "&" for HTML, 
"\" for RTF, and so on.) You would have a separate interface for anything 
that you wanted to output directly (so that it could be executed), such as 
your script example.
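
A minimal sketch of such an interface for HTML, assuming plain string functions: only the name Ordinary_Text comes from the message, the body is a guess at the idea rather than the RM Formatter's code, and Raw_Text is a hypothetical name for the separate "output directly" interface:

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Ordinary_Text_Demo is

   --  All visible text goes through here, so nothing in it can be taken
   --  as HTML markup by the browser.
   function Ordinary_Text (Text : String) return String is
      Result : Unbounded_String;
   begin
      for I in Text'Range loop
         case Text (I) is
            when '<'    => Append (Result, "&lt;");
            when '>'    => Append (Result, "&gt;");
            when '&'    => Append (Result, "&amp;");
            when others => Append (Result, Text (I));
         end case;
      end loop;
      return To_String (Result);
   end Ordinary_Text;

   --  Hypothetical separate interface for text that is deliberately copied
   --  verbatim (and may therefore be "executed" by the browser).
   function Raw_Text (Text : String) return String is
   begin
      return Text;
   end Raw_Text;

begin
   Ada.Text_IO.Put_Line (Ordinary_Text ("<script>alert(1)</script> & more"));
   --  &lt;script&gt;alert(1)&lt;/script&gt; &amp; more
   Ada.Text_IO.Put_Line (Raw_Text ("<hr>"));
end Ordinary_Text_Demo;
```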

It's very important that you isolate all of the rendering in a single 
interface, so that if you have to track down a bug caused by allowing 
something bad into the output (and trust me, you will :-), you only need to 
look in a single place for the problem. You don't want to have to try to 
figure out whether the parser should have prevented the problem, or the 
output layer, or something else, because it's really easy to think that some 
other layer should handle something. (This is especially a problem in 
multi-person projects, where fixing something is always someone else's 
responsibility.) If the rule is that the renderer should always make
everything it outputs harmless unless it is explicitly instructed otherwise, 
you'll have a lot less trouble.

To take an example, an Ada compiler doesn't "modify the behavior of the 
parser" to deal with comments or strings in the source; these are treated as 
single elements and aren't parsed at all. If one of these needs to be 
output, it will just be output with the renderer making any transformations 
needed to keep the output safe. Thus, there is no need to look inside of 
these constructs to see what is in them.

Similarly, the handling of the command language for the RM formatter doesn't 
change. What option settings do is change the actual effect of the various 
commands, and choose particular input and output formats (such as the source 
files to use, and whether to output in HTML or RTF or something else).

As previously suggested, look at the design of the RM Formatter to see one 
way to do this.

                                           Randy.






* Re: Parser interface design
  2011-04-13 22:33                   ` Randy Brukardt
@ 2011-04-14  6:55                     ` Natasha Kerensikova
  2011-04-15  0:22                       ` Randy Brukardt
  0 siblings, 1 reply; 37+ messages in thread
From: Natasha Kerensikova @ 2011-04-14  6:55 UTC

Hello,

On 2011-04-13, Randy Brukardt <randy@rrsoftware.com> wrote:
> "Natasha Kerensikova" <lithiumcat@gmail.com> wrote in message 
> news:slrniqan7b.2fnq.lithiumcat@sigil.instinctive.eu...
> ...
>> These dangerous features are what made me want to cripple the parser in
>> the first place, and I thought it makes no sense to allow only a few
>> features to be disabled when I can just as easily allow all of them to
>> be independently turned on or off -- hence my example of disabling
>> emphasis.
>>
>> Are my motivations clearer now, or is it still just a whim of the
>> customer imposing a fragile design?
>
> Your intentions are fine, but I still don't think you should be trying to 
> modify the behavior of the parser; that's the job for the "interpretation" 
> layer. Maybe that's because of my compiler background, but what you are 
> trying to do is very similar to a compiler, or to the Ada Standard 
> formatter, or many other batch-oriented tools.

Well, I intended to do both, modify the parser behavior and put some
logic on the interpretation/output layer.

Isn't it the parser's role to tell whether the string "<script>" is normal
text or an HTML tag? That's the kind of modification I was thinking
about.

Isn't it the HTML renderer's role to escape angle brackets when the
string "<script>" is normal text? I believe it is, because the escaping
is HTML-specific. It wouldn't need the same escaping if the output was
PDF, for example.

Isn't it again the renderer's role to make whatever sense it can out of a
"<script>" tag depending on the output format? For HTML output it's a
simple copy, but it seems non-trivial for a PDF output, and impossible
for a plain-text output. But that's not something for the parser to
worry about.

> In your specific case, I believe that preventing "execution" of embedded 
> HTML and the like is the job of the output layer (renderer), because that 
> way it is impossible to forget a case and allow something through. In the RM 
> Formatter tool, that is accomplished by having all text that is intended to 
> be visible in the output format go through a particular output interface: 
> "Ordinary_Text". And that interface is responsible for quoting any 
> characters that might be interpreted as commands ("<", ">", "&" for HTML, 
> "\" for RTF, and so on.) You would have a separate interface for anything 
> that you wanted to output directly (so that it could be executed), such as 
> your script example.

In my case, escaping special characters like angle brackets, so that
they are treated as normal text when they are normal text, is indeed
the renderer's job. But this is different from enabling or disabling
language features.

> If the rule is that the renderer should always make everything it
> outputs harmless unless it is explicitly instructed otherwise, you'll
> have a lot less trouble.

I never intended not to follow that rule. But a script tag *is*
harmless, if the input can be trusted.

Now if it was a matter of forbidding specifically the script-tag, while
allowing others deemed "harmless", then I agree it should be done on the
renderer level. But changing the language grammar to wipe out the very
concept of inline HTML tag is definitely something to be handled in the
parser.

> To take an example, an Ada compiler doesn't "modify the behavior of the 
> parser" to deal with comments or strings in the source; these are treated as 
> single elements and aren't parsed at all. If one of these needs to be 
> output, it will just be output with the renderer making any transformations 
> needed to keep the output safe. Thus, there is no need to look inside of 
> these constructs to see what is in them.

Does an Ada compiler modify the behavior of the parser when selecting
Ada83 vs Ada95 vs Ada05? That's exactly what this is about here: it's
different feature sets, except that for convenience and coherence the
features are not enabled or disabled individually.

The standard Markdown grammar might look like this:

...
Span_Element ::= Normal_Text | Emphasis | Code_Span | ...
Emphasis ::= "*" Span_Element "*" | "_" Span_Element "_"
Code_Span ::= "`" Inner_Code_Span "`"
Inner_Code_Span ::= Code_Text | Code_Span
...

Now when I'm talking about "disabling emphasis", I mean parsing the
following grammar instead:

...
Span_Element ::= Normal_Text | Code_Span | ...
Code_Span ::= "`" Inner_Code_Span "`"
Inner_Code_Span ::= Code_Text | Code_Span
...

This is of course very different from "rendering emphasis spans like
normal text" or "apply no formatting to mark emphasis" or whatever. It's
just ensuring that the feature cannot cause any harm by preventing its
very existence. How can you make it any safer than that?
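
One way to read "parsing the following grammar instead" in code: a minimal sketch, assuming a hand-written recursive-descent parser and a hypothetical Features record, with only the "*" form of Emphasis modelled. When the switch is off, the Emphasis production does not exist and "*" falls through to Normal_Text:

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Grammar_Switch_Demo is

   --  Hypothetical feature set; only emphasis is modelled here.
   type Features is record
      Emphasis : Boolean := True;
   end record;

   --  Emphasis ::= "*" Span_Element "*"  exists only when F.Emphasis is True.
   --  Returns a trace of what was recognised: "[em:...]" for an Emphasis
   --  node, anything else copied through as Normal_Text.
   function Parse (Input : String; F : Features) return String is
      Result : Unbounded_String;
      I      : Positive := Input'First;
      Close  : Natural;
   begin
      while I <= Input'Last loop
         Close := 0;
         if F.Emphasis and then Input (I) = '*' then
            for J in I + 1 .. Input'Last loop
               if Input (J) = '*' then
                  Close := J;
                  exit;
               end if;
            end loop;
         end if;

         if Close > 0 then
            Append (Result, "[em:" & Parse (Input (I + 1 .. Close - 1), F) & "]");
            I := Close + 1;
         else
            --  No Emphasis rule applies: '*' is just Normal_Text.
            Append (Result, Input (I));
            I := I + 1;
         end if;
      end loop;
      return To_String (Result);
   end Parse;

   Standard_Markdown : constant Features := (Emphasis => True);
   No_Emphasis       : constant Features := (Emphasis => False);
begin
   Ada.Text_IO.Put_Line (Parse ("a *b* c", Standard_Markdown));  --  a [em:b] c
   Ada.Text_IO.Put_Line (Parse ("a *b* c", No_Emphasis));        --  a *b* c
end Grammar_Switch_Demo;
```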

Thanks for your insights,
Natasha


* Re: Parser interface design
  2011-04-14  6:55                     ` Natasha Kerensikova
@ 2011-04-15  0:22                       ` Randy Brukardt
  0 siblings, 0 replies; 37+ messages in thread
From: Randy Brukardt @ 2011-04-15  0:22 UTC

"Natasha Kerensikova" <lithiumcat@gmail.com> wrote in message 
news:slrniqd6if.2fnq.lithiumcat@sigil.instinctive.eu...
> Hello,
>
> On 2011-04-13, Randy Brukardt <randy@rrsoftware.com> wrote:
>> "Natasha Kerensikova" <lithiumcat@gmail.com> wrote in message
>> news:slrniqan7b.2fnq.lithiumcat@sigil.instinctive.eu...
>> ...
>>> These dangerous features are what made me want to cripple the parser in
>>> the first place, and I thought it makes no sense to allow only a few
>>> features to be disabled when I can just as easily allow all of them to
>>> be independently turned on or off -- hence my example of disabling
>>> emphasis.
>>>
>>> Are my motivations clearer now, or is it still just a whim of the
>>> customer imposing a fragile design?
>>
>> Your intentions are fine, but I still don't think you should be trying to
>> modify the behavior of the parser; that's the job for the "interpretation"
>> layer. Maybe that's because of my compiler background, but what you are
>> trying to do is very similar to a compiler, or to the Ada Standard
>> formatter, or many other batch-oriented tools.
>
> Well, I intended to do both, modify the parser behavior and put some
> logic on the interpretation/output layer.
>
> Isn't it the parser's role to tell whether the string "<script>" is normal
> text or an HTML tag? That's the kind of modification I was thinking
> about.

Not really, but I suspect that we are talking about different things. To me, 
"a parser" is a very specific piece of technology (yacc being one example, 
but of course they can be hand-coded as well). These days, people seem to be 
lumping a lot of non-parser stuff into the term "parser". To take a concrete 
example, an "XML parser" is some very complex piece of software. But hardly 
anything it does has anything to do with parsing! The syntax of XML is so 
simple that no parser is actually needed, just a smart scanner. (None of my
HTML tools [nor the RM tool] have a formal parser, because the input 
language is so simple that a couple of helpers in the scanner is 
sufficient.)

Anyway, it doesn't make sense to "modify" a parser, because that implies 
that you are taking a different input grammar. And doing that means that you
have a *different* parser. It might make sense in some circumstances to take 
multiple input languages, but I would consider that (and implement that) as 
a forest of different parsers (one per grammar) with a common output format. 
(That is, I would create an abstract Parser type, and then create a separate 
derived object to represent the specific parser. Again, look at the RM 
formatter code to see how I did it there.)

It might make sense to "modify" a scanner or some other phase, but there too 
the best organization probably is a forest of objects (choose the right one
for the job). If there was a minor difference, I'd probably control it with 
an "options" parameter when the object is set up.
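
A minimal sketch of such a forest, assuming hypothetical names throughout (Parsers, Full_Markdown, No_Inline_HTML, Create, Trim_Spaces); it is not taken from the RM Formatter. Each grammar is its own derived type, and a minor difference is an option recorded at creation:

```ada
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Parser_Forest_Demo is

   package Parsers is
      --  Common abstract interface; every grammar reports in the same format.
      type Parser is abstract tagged record
         Trim_Spaces : Boolean := False;   --  a minor option, set at creation
      end record;

      function Parse (P : Parser; Input : String) return String is abstract;

      --  One derived type per grammar.
      type Full_Markdown is new Parser with null record;
      overriding function Parse (P : Full_Markdown; Input : String) return String;

      type No_Inline_HTML is new Parser with null record;
      overriding function Parse (P : No_Inline_HTML; Input : String) return String;

      --  Set-up routine: chooses the grammar and records the options.
      function Create (Allow_HTML, Trim : Boolean) return Parser'Class;
   end Parsers;

   package body Parsers is
      function Prepare (P : Parser'Class; Input : String) return String is
      begin
         if P.Trim_Spaces then
            return Ada.Strings.Fixed.Trim (Input, Ada.Strings.Both);
         end if;
         return Input;
      end Prepare;

      function Parse (P : Full_Markdown; Input : String) return String is
         S : constant String := Prepare (P, Input);
      begin
         if S'Length > 0 and then S (S'First) = '<' then
            return "html:" & S;   --  this grammar has an inline-HTML rule
         end if;
         return "text:" & S;
      end Parse;

      function Parse (P : No_Inline_HTML; Input : String) return String is
      begin
         return "text:" & Prepare (P, Input);   --  no such rule in this grammar
      end Parse;

      function Create (Allow_HTML, Trim : Boolean) return Parser'Class is
      begin
         if Allow_HTML then
            return Full_Markdown'(Trim_Spaces => Trim);
         end if;
         return No_Inline_HTML'(Trim_Spaces => Trim);
      end Create;
   end Parsers;

   Comments : constant Parsers.Parser'Class :=
     Parsers.Create (Allow_HTML => False, Trim => True);
   Articles : constant Parsers.Parser'Class :=
     Parsers.Create (Allow_HTML => True, Trim => False);
begin
   Ada.Text_IO.Put_Line (Comments.Parse ("  <script>  "));  --  text:<script>
   Ada.Text_IO.Put_Line (Articles.Parse ("<script>"));      --  html:<script>
end Parser_Forest_Demo;
```

The choice between grammars is made once, in Create; after that, callers only see Parser'Class and dispatch to the right Parse.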

> Isn't it the HTML renderer's role to escape angle brackets when the
> string "<script>" is normal text? I believe it is, because the escaping
> is HTML-specific. It wouldn't need the same escaping if the output was
> PDF, for example.

Yes, see the rest of my message.

> Isn't it again the renderer's role to make whatever sense it can out of a
> "<script>" tag depending on the output format? For HTML output it's a
> simple copy, but it seems non-trivial for a PDF output, and impossible
> for a plain-text output. But that's not something for the parser to
> worry about.

Yes, this is exactly what I was suggesting.

>> In your specific case, I believe that preventing "execution" of embedded
>> HTML and the like is the job of the output layer (renderer), because that
>> way it is impossible to forget a case and allow something through. In the
>> RM Formatter tool, that is accomplished by having all text that is
>> intended to be visible in the output format go through a particular
>> output interface: "Ordinary_Text". And that interface is responsible for
>> quoting any characters that might be interpreted as commands ("<", ">",
>> "&" for HTML, "\" for RTF, and so on.) You would have a separate interface
>> for anything that you wanted to output directly (so that it could be
>> executed), such as your script example.
>
> In my case, escaping special characters like angle brackets, so that
> they are treated as normal text when they are normal text, is indeed
> the renderer's job. But this is different from enabling or disabling
> language features.

Right. I normally do that in the middle layer. That is, the parser returns 
the structures that it finds, and then the middle layer decides what to do 
with them (including ignoring them).

But I wouldn't even consider trying to allow "commands" or whatever you are 
trying to parse in the input. I've always required them to be escaped 
somehow. So perhaps we're solving different problems.

>> If the rule is that the renderer should always make everything it
>> outputs harmless unless it is explicitly instructed otherwise, you'll
>> have a lot less trouble.
>
> I never intended not to follow that rule. But a script tag *is*
> harmless, if the input can be trusted.

The number one rule of secure programming is that *no* input can be trusted. 
Yes, we all violate that from time-to-time, but it is a good rule to keep in 
mind.

> Now if it was a matter of forbidding specifically the script-tag, while
> allowing others deemed "harmless", then I agree it should be done on the
> renderer level. But changing the language grammar to wipe out the very
> concept of inline HTML tag is definitely something to be handled in the
> parser.

As I said, that's a *different* parser from one that supports HTML. I'd use 
a different object to represent each, rather than trying to share them.

>> To take an example, an Ada compiler doesn't "modify the behavior of the
>> parser" to deal with comments or strings in the source; these are treated
>> as single elements and aren't parsed at all. If one of these needs to be
>> output, it will just be output with the renderer making any
>> transformations needed to keep the output safe. Thus, there is no need to
>> look inside of
>> these constructs to see what is in them.
>
> Does an Ada compiler modify the behavior of the parser when selecting
> Ada83 vs Ada95 vs Ada05? That's exactly what this is about here: it's
> different feature sets, except that for convenience and coherence the
> features are not enabled or disabled individually.

No, absolutely not. There is only one grammar for the compiler (an extended 
Ada 2005); anything not supported is flagged by the middle layer (the 
semantic pass).

There is a practical reason for this; error handling by parsers tends to be 
somewhere between sorta OK and terrible. We can provide much more targeted 
error messages (like "Silly programmer, you used not null, an Ada 2005 
feature, in your Ada 83 program" :-) by putting them into the middle pass.

That's probably one reason that I tend to avoid parsing at all when possible 
(just keeping the scanning part).
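
Transposed to Markdown, the single-grammar approach might look like the following minimal sketch: the parser always recognises every construct, and a middle pass checks the result against the enabled feature set and reports targeted messages. Construct, Node, Check and the feature profile are hypothetical names:

```ada
with Ada.Text_IO;

procedure Semantic_Pass_Demo is

   --  The one, full grammar recognises every construct.
   type Construct is (Normal_Text, Emphasis, Inline_HTML, Header);
   type Feature_Set is array (Construct) of Boolean;

   --  Hypothetical node produced by the single parser.
   type Node is record
      Kind : Construct;
      Line : Positive;
   end record;
   type Node_List is array (Positive range <>) of Node;

   --  Middle layer: checks the parsed nodes against the selected feature
   --  set, prints a message per violation and returns how many there were.
   function Check (Nodes : Node_List; Allowed : Feature_Set) return Natural is
      Errors : Natural := 0;
   begin
      for I in Nodes'Range loop
         if not Allowed (Nodes (I).Kind) then
            Errors := Errors + 1;
            Ada.Text_IO.Put_Line
              ("line" & Positive'Image (Nodes (I).Line) & ": "
               & Construct'Image (Nodes (I).Kind)
               & " is not enabled for this input");
         end if;
      end loop;
      return Errors;
   end Check;

   --  E.g. a blog-comment profile: no inline HTML, no headers.
   Comment_Features : constant Feature_Set :=
     (Inline_HTML | Header => False, others => True);

   Parsed : constant Node_List :=
     ((Normal_Text, 1), (Emphasis, 1), (Inline_HTML, 2), (Header, 3));
begin
   --  Prints the two messages, then "errors: 2".
   Ada.Text_IO.Put_Line
     ("errors:" & Natural'Image (Check (Parsed, Comment_Features)));
end Semantic_Pass_Demo;
```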

> The standard Markdown grammar might look like this:
>
> ...
> Span_Element ::= Normal_Text | Emphasis | Code_Span | ...
> Emphasis ::= "*" Span_Element "*" | "_" Span_Element "_"
> Code_Span ::= "`" Inner_Code_Span "`"
> Inner_Code_Span ::= Code_Text | Code_Span
> ...
>
> Now when I'm talking about "disabling emphasis", I mean parsing the
> following grammar instead:
>
> ...
> Span_Element ::= Normal_Text | Code_Span | ...
> Code_Span ::= "`" Inner_Code_Span "`"
> Inner_Code_Span ::= Code_Text | Code_Span
> ...
>
> This is of course very different from "rendering emphasis spans like
> normal text" or "apply no formatting to mark emphasis" or whatever. It's
> just ensuring that the feature cannot cause any harm by preventing its
> very existence. How can you make it any safer than that?

As I said, these are thus different parsers. The table-driven parsers that I 
typically use can't be modified, so the issue never comes up. OTOH, if the 
grammar is simple enough that a hand-written parser would do, I probably
wouldn't write a parser at all and just use the scanner directly (that's what
the RM Formatter does).

And I'd control the scanner/parser by a combination of separate objects and 
a parameter to the create object routine to set whatever settings.

But then again, I hate call-back subprograms, and would only use them when 
there is no other solution. An OO solution would work well here, so I don't 
see any reason to use unstructured call-backs. Thus, using them as some sort 
of parameter control isn't an idea that I would ever intend to use (and it 
seems unnecessarily tricky on top of that). At best, it's premature 
optimization (you're saving one byte somewhere, and perhaps one compare 
instruction, although on a lot of architectures, it probably doesn't save
any instructions).

                                                                      Randy.


* Re: Parser interface design
  2011-04-06 10:11 Parser interface design Natasha Kerensikova
                   ` (4 preceding siblings ...)
  2011-04-08 11:16 ` Brian Drummond
@ 2011-04-19  9:08 ` Natasha Kerensikova
  2011-04-19 12:35   ` Ludovic Brenta
  2011-04-19 17:28   ` Jeffrey Carter
  5 siblings, 2 replies; 37+ messages in thread
From: Natasha Kerensikova @ 2011-04-19  9:08 UTC

Hello,

On 2011-04-06, Natasha Kerensikova <lithiumcat@gmail.com> wrote:
> before wasting too much time of anybody, I want to make it clear that
> I'm asking about interface design in Ada which might actually never turn
> up into real code. I'm still unsure about ever writing Ada code, and how
> bad my previous thread went playing no small part in that. However I'm
> still curious about how this particular problem might be solved in Ada.
>
> [...]
>
> So what would be the best approach to interface a parser and a renderer?

It turns out that I'm coming out of this discussion in a state that is
surprisingly similar to that which I had when coming out of my previous
thread here (about S-expressions).

From the replies I had, it seems I'm clearly, obviously and deeply wrong
according to everybody, and yet I can't even begin to understand what is
wrong and in which way it is wrong.

It looks like I'm too stupid to use Ada, so I guess I should rather keep
writing my crappy code in C and stop bothering people with my flawed
ideas.

Thanks a lot for all your comments, and I sincerely apologise for having
wasted so much of your time. I should have known I don't have what it
takes.

Natasha


* Re: Parser interface design
  2011-04-19  9:08 ` Natasha Kerensikova
@ 2011-04-19 12:35   ` Ludovic Brenta
  2011-04-20 10:44     ` Brian Drummond
  2011-04-19 17:28   ` Jeffrey Carter
  1 sibling, 1 reply; 37+ messages in thread
From: Ludovic Brenta @ 2011-04-19 12:35 UTC

Natasha Kerensikova wrote on comp.lang.ada:
> Hello,
>
> On 2011-04-06, Natasha Kerensikova <lithium...@gmail.com> wrote:
>
>> before wasting too much time of anybody, I want to make it clear that
>> I'm asking about interface design in Ada which might actually never turn
>> up into real code. I'm still unsure about ever writing Ada code, and how
>> bad my previous thread went playing no small part in that. However I'm
>> still curious about how this particular problem might be solved in Ada.
>
>> [...]
>
>> So what would be the best approach to interface a parser and a renderer?
>
> It turns out that I'm coming out of this discussion in a state that is
> surprisingly similar to that which I had when coming out of my previous
> thread here (about S-expressions).
>
> From the replies I had, it seems I'm clearly, obviously and deeply wrong
> according to everybody, and yet I can't even begin to understand what is
> wrong and in which way it is wrong.
>
> It looks like I'm too stupid to use Ada, so I guess I should rather keep
> writing my crappy code in C and stop bothering people with my flawed
> ideas.
>
> Thanks a lot for all your comments, and I sincerely apologise for having
> wasted so much of your time. I should have known I don't have what it
> takes.
>
> Natasha

I have not had the time to follow this thread (therefore, rest assured
you have not wasted any of my time -- other things have) but I must
say I am saddened by the outcome.  I think you do have what it takes
to be a good software engineer: the ability to think in abstract terms
and the will to create better designs (as opposed to haphazardly
cobbling code together).

Also, I do not think this is a language problem.  What you can write
in C, you can also write in Ada.  I suggest you do that, without too
much concern about the beauty of your design, to get a good practical
working knowledge of Ada.  In a second step, try to eliminate as many
pointers as you can.  That will certainly give you ideas for a better
design.

--
Ludovic Brenta.


* Re: Parser interface design
  2011-04-19  9:08 ` Natasha Kerensikova
  2011-04-19 12:35   ` Ludovic Brenta
@ 2011-04-19 17:28   ` Jeffrey Carter
  1 sibling, 0 replies; 37+ messages in thread
From: Jeffrey Carter @ 2011-04-19 17:28 UTC

On 04/19/2011 02:08 AM, Natasha Kerensikova wrote:
>
> It looks like I'm too stupid to use Ada, so I guess I should rather keep
> writing my crappy code in C and stop bothering people with my flawed
> ideas.

No, if you're too stupid to use Ada, you shouldn't be developing S/W at all.

I don't think from what I've read that you're too stupid to use Ada. You have a 
strong C background, which colors your approach to things. For example, C uses 
lots of visible pointers, while good Ada people try to avoid them, and never 
make them visible if that's at all possible. If you present a C-like approach, 
people here will probably find fault with it.

Then there are philosophical differences from people who post here. Kazakov 
wants everything to be implemented using programming by extension, and dislikes 
generics. I avoid programming by extension whenever possible, and make extensive 
use of generics. Others have other views. None of us are shy about expressing 
our opinions.

So nothing you present here will receive 100% approval. Just because some of us 
don't like something doesn't mean it's bad.

-- 
Jeff Carter
"We call your door-opening request a silly thing."
Monty Python & the Holy Grail
17


* Re: Parser interface design
  2011-04-19 12:35   ` Ludovic Brenta
@ 2011-04-20 10:44     ` Brian Drummond
  0 siblings, 0 replies; 37+ messages in thread
From: Brian Drummond @ 2011-04-20 10:44 UTC

On Tue, 19 Apr 2011 05:35:12 -0700, Ludovic Brenta wrote:

> Natasha Kerensikova wrote on comp.lang.ada:

>>> So what would be the best approach

>> From the replies I had, it seems I'm clearly, obviously and deeply
>> wrong according to everybody, and yet I can't even begin to understand
>> what is wrong and in which way it is wrong.

May I please echo the other cries of NO! and add:

perhaps the problem was in asking "the best" of something, anything, 
anywhere. That is usually doomed to failure by drowning in a sea of 
mutually contradictory opinions...

> Also, I do not think this is a language problem.  What you can write in
> C, you can also write in Ada.  
... and probably better, or at least more safely.

I believe the approach you first posited was a perfectly viable one, and 
has the great merit of being one you understand well. 

I am also learning Ada (though after C and C++ it feels more like healing 
to me) and typically I'll write mostly what I am familiar with, picking
and learning ONE new technique in the process, such as using generics
for your callbacks.
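
As a sketch of what using generics for the callbacks might look like, assuming a one-construct grammar and hypothetical names (Parse, Put_Text, Put_Bold, HTML_Parse): the renderer's operations become generic formal procedures, so the binding happens at instantiation and there is no null callback to test for.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Generic_Callback_Demo is

   --  The renderer's operations are generic formal subprograms, so each
   --  instantiation is bound to one renderer at compile time.
   generic
      with procedure Normal_Text (Text : String);
      with procedure Emphasis (Text : String);
   procedure Parse (Input : String);

   procedure Parse (Input : String) is
      Start : Positive := Input'First;
      Open  : Natural  := 0;   --  index of a pending opening '*', if any
   begin
      for I in Input'Range loop
         if Input (I) = '*' then
            if Open = 0 then
               Normal_Text (Input (Start .. I - 1));
               Open := I;
            else
               Emphasis (Input (Open + 1 .. I - 1));
               Open  := 0;
               Start := I + 1;
            end if;
         end if;
      end loop;
      --  Whatever remains, including an unmatched '*', is normal text.
      if Open /= 0 then
         Normal_Text (Input (Open .. Input'Last));
      else
         Normal_Text (Input (Start .. Input'Last));
      end if;
   end Parse;

   --  A hypothetical HTML renderer that collects its output in a buffer.
   Output : Unbounded_String;

   procedure Put_Text (Text : String) is
   begin
      Append (Output, Text);
   end Put_Text;

   procedure Put_Bold (Text : String) is
   begin
      Append (Output, "<b>" & Text & "</b>");
   end Put_Bold;

   procedure HTML_Parse is new Parse (Normal_Text => Put_Text, Emphasis => Put_Bold);
begin
   HTML_Parse ("a *b* c");
   Ada.Text_IO.Put_Line (To_String (Output));  --  a <b>b</b> c
end Generic_Callback_Demo;
```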

Best will come in time; meanwhile, I am satisfied with good enough, and 
learning one step at a time.

Hoping not to discourage,
- Brian



Thread overview: 37+ messages
2011-04-06 10:11 Parser interface design Natasha Kerensikova
2011-04-06 12:17 ` Georg Bauhaus
2011-04-07 18:56   ` Natasha Kerensikova
2011-04-08 11:49     ` Stephen Leake
2011-04-06 12:20 ` Dmitry A. Kazakov
2011-04-07 19:14   ` Natasha Kerensikova
2011-04-07 20:31     ` Dmitry A. Kazakov
2011-04-08 13:51       ` Natasha Kerensikova
2011-04-08 14:21         ` Dmitry A. Kazakov
2011-04-12 15:58           ` Natasha Kerensikova
2011-04-12 17:14             ` Dmitry A. Kazakov
2011-04-06 15:51 ` Georg Bauhaus
2011-04-07 19:44   ` Natasha Kerensikova
2011-04-07 20:52     ` Dmitry A. Kazakov
2011-04-07 22:09     ` Simon Wright
2011-04-08 14:03       ` Natasha Kerensikova
2011-04-08 19:06         ` Jeffrey Carter
2011-04-08 19:59         ` Simon Wright
2011-04-12 16:13           ` Natasha Kerensikova
2011-04-12 17:22             ` Dmitry A. Kazakov
2011-04-12 19:02               ` Simon Wright
2011-04-13  8:20                 ` Natasha Kerensikova
2011-04-13  8:37                   ` Dmitry A. Kazakov
2011-04-13 11:06                     ` Georg Bauhaus
2011-04-13 12:46                       ` Dmitry A. Kazakov
2011-04-13 22:33                   ` Randy Brukardt
2011-04-14  6:55                     ` Natasha Kerensikova
2011-04-15  0:22                       ` Randy Brukardt
2011-04-12 21:54               ` Randy Brukardt
2011-04-07 22:13     ` Georg Bauhaus
2011-04-08 15:30       ` Natasha Kerensikova
2011-04-07  0:36 ` Randy Brukardt
2011-04-08 11:16 ` Brian Drummond
2011-04-19  9:08 ` Natasha Kerensikova
2011-04-19 12:35   ` Ludovic Brenta
2011-04-20 10:44     ` Brian Drummond
2011-04-19 17:28   ` Jeffrey Carter
