Starter project: getopt

comp.lang.ada

* Starter project: getopt_long in Ada
@ 2011-11-25 10:47 Natasha Kerensikova
  2011-11-25 16:39 ` Georg Bauhaus
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Natasha Kerensikova @ 2011-11-25 10:47 UTC (permalink / raw)


Hello,

I have been tinkering with Ada for a while now, and I'm starting to feel
ready to write code that is (somewhat) useful, and to publish it.

I'm starting with a pure-Ada implementation of a getopt_long-like
command line argument processor. The motivation is that my next project
will likely have a command-line interface (before I go into other fancy
I/O), and I like having long option names, especially in scripts I'm
likely to re-read in the future. While I have found a few
implementations of getopt, for getopt_long I have only found bindings to
the C function, which involve C strings and C arrays.

So I wrote a Getopt_Long package, along with a test suite, that can be
found at:
http://fossil.instinctive.eu/natools/dir?ci=tip
(I went against Fossil's advice to require a CAPTCHA before reading
the files; that might change in the future if non-human hits strain
the server.)

Everything is under ISC licence, so anybody can re-use it, even though
it's currently still very young and under development.

I would gladly welcome any constructive comment about this code, whether
it's on the design, the implementation, the style, or anything else. The
only reason for publishing it at this point is to ask for feedback in
order to progress. I'm aware that a code review is a lot to ask, but I
would really welcome any help towards improving my mastery of Ada.


Thanks in advance for your comments and your help,
Natasha




* Re: Starter project: getopt_long in Ada
  2011-11-25 10:47 Starter project: getopt_long in Ada Natasha Kerensikova
@ 2011-11-25 16:39 ` Georg Bauhaus
  2011-11-26 18:13   ` Natasha Kerensikova
  2011-11-27  8:21   ` Yannick Duchêne (Hibou57)
  2011-11-27  8:05 ` Yannick Duchêne (Hibou57)
  2011-11-27  8:09 ` Yannick Duchêne (Hibou57)
  2 siblings, 2 replies; 15+ messages in thread
From: Georg Bauhaus @ 2011-11-25 16:39 UTC (permalink / raw)

On 25.11.11 11:47, Natasha Kerensikova wrote:

> So I wrote a Getopt_Long package, along with a test suite, that can be
> found at:
> http://fossil.instinctive.eu/natools/dir?ci=tip

Looking at Natools.Getopt_Long.Process, there is one design choice
that I think is less clear than it could be, seen from the use
case. If I were to write the event handling procedures, I'd
notice that the current design suggests writing a larger number
of "linguistically" unconnected subprograms to be passed
to Process. But, in fact, they will likely be logically connected
subprograms, all to do with parsing the same command line.

I'd prefer these handlers to be "linguistically" collected in
one place (and different sets of handlers collected in
different places, respectively). In order to achieve this,
an abstract type can state the list of handling procedures,
with null defaults for operations the programmer does not
wish to do anything special. Defining this type has the
advantages that it makes programmers write software organized
into typed structure, yielding

1) definite places to look for the handling procedures
2) handlers that can share common data for communicating
   parsing state among them
3) human readers that know about 1 and 2 because they can
   look for concrete types derived from the abstract type.

I think the advantages outweigh a minor loss in flexibility,
if you'd want to call flexibility the ability to pass a
collection of subprograms each declared just anywhere.

Influencing the shape of this type, Process's parameters
such as Posixly_Correct might mean that the handling
procedures may want to determine whether or not Posixly_Correct
is true.

So, then, my first idea will be along these lines:

package Natools.Handlers is

    type Id_And_Argument_Handler
       (Posixly_Correct :  Boolean;
        ...)
    is abstract tagged private;

    procedure Callback
        (Handler : in out Id_And_Argument_Handler;
         Id : Option_Id;
         Argument : String) is abstract;

    procedure Missing_Argument
        (Handler : in out Id_And_Argument_Handler;
         Id : Option_Id) is null;

    etc.

end Natools.Handlers;

Then, the parameter profile of Natools.Getopt_Long.Process
becomes shorter:

package Natools.Getopt_Long is

    etc.

    procedure Process
       (...
        Actions : Id_And_Argument_Handler'Class;
        ...);

    etc.

end Natools.Getopt_Long;
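
A minimal, self-contained sketch of how a client might use such a
type. It is only an illustration under assumptions: Option_Id is a
plain enumeration here rather than a generic formal, the discriminants
are left out, the names Counting_Handler and Application are
hypothetical, and Process is a stand-in that fires a fixed sequence of
events instead of parsing a real command line.

```ada
with Ada.Text_IO;

procedure Handler_Demo is

   --  Hypothetical option set (an assumption for this sketch).
   type Option_Id is (Help, Output, Verbose);

   package Handlers is
      type Argument_Handler is abstract tagged null record;

      --  Must be overridden by every concrete handler.
      procedure Callback
        (Handler  : in out Argument_Handler;
         Id       : Option_Id;
         Argument : String) is abstract;

      --  Null default: does nothing unless overridden.
      procedure Missing_Argument
        (Handler : in out Argument_Handler;
         Id      : Option_Id) is null;
   end Handlers;

   package Application is
      --  All handlers for one command line are collected in this type
      --  and share its components as parsing state.
      type Counting_Handler is new Handlers.Argument_Handler with record
         Seen    : Natural := 0;
         Missing : Natural := 0;
      end record;

      overriding procedure Callback
        (Handler  : in out Counting_Handler;
         Id       : Option_Id;
         Argument : String);

      overriding procedure Missing_Argument
        (Handler : in out Counting_Handler;
         Id      : Option_Id);
   end Application;

   package body Application is
      overriding procedure Callback
        (Handler  : in out Counting_Handler;
         Id       : Option_Id;
         Argument : String) is
      begin
         Handler.Seen := Handler.Seen + 1;
         Ada.Text_IO.Put_Line
           (Option_Id'Image (Id) & " => """ & Argument & """");
      end Callback;

      overriding procedure Missing_Argument
        (Handler : in out Counting_Handler;
         Id      : Option_Id) is
      begin
         Handler.Missing := Handler.Missing + 1;
         Ada.Text_IO.Put_Line
           ("missing argument for " & Option_Id'Image (Id));
      end Missing_Argument;
   end Application;

   --  Stand-in for Process: dispatches a fixed sequence of events.
   procedure Process (Actions : in out Handlers.Argument_Handler'Class) is
   begin
      Actions.Callback (Verbose, "");
      Actions.Callback (Output, "result.txt");
      Actions.Missing_Argument (Output);
   end Process;

   H : Application.Counting_Handler;
begin
   Process (H);
   pragma Assert (H.Seen = 2 and H.Missing = 1);
   Ada.Text_IO.Put_Line
     (Natural'Image (H.Seen) & " options," &
      Natural'Image (H.Missing) & " missing argument");
end Handler_Demo;
```

Note that the sketch passes Actions as "in out" so that the handlers
can update their shared state.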

I did not look at the implementation.

- Georg


* Re: Starter project: getopt_long in Ada
  2011-11-25 16:39 ` Georg Bauhaus
@ 2011-11-26 18:13   ` Natasha Kerensikova
  2011-11-26 20:47     ` Jeffrey Carter
  2011-11-28 15:49     ` Georg Bauhaus
  2011-11-27  8:21   ` Yannick Duchêne (Hibou57)
  1 sibling, 2 replies; 15+ messages in thread
From: Natasha Kerensikova @ 2011-11-26 18:13 UTC (permalink / raw)

Hello,

On 2011-11-25, Georg Bauhaus <rm.dash-bauhaus@futureapps.de> wrote:
> I'd prefer these handlers to be "linguistically" collected in
> one place (and different sets of handlers collected in
> different places, respectively).

I completely agree that the error handlers are indeed connected. I'm
still genuinely wondering whether they should be grouped with the
argument callback, as you suggest, or not.

> In order to achieve this,
> an abstract type can state the list of handling procedures,
> with null defaults for operations the programmer does not
> wish to do anything special.

You seem to be missing one important point about these handlers,
probably because it's not documented well enough: the error callbacks
are currently defaulting to null, but that null does not mean nothing is
performed when such an error is encountered (i.e. a null procedure), it
means that an Option_Error exception is raised.

In most use cases I can imagine, an application encountering an error in
argument parsing will just print a "usage" text and exit with an error
code, which can be done simply by catching the
Natools.Getopt_Long.Option_Error exception.

However some tools not only print the generic "usage" text, they also
print what was wrong in the given arguments. Some tools even do so in a
runtime-selected language (e.g. locale facilities), which rules out
blindly using the exception message. That's how I came up with the
callbacks, which make it possible to communicate what went wrong (and
then the ability to recover from such errors was a happy side-effect I
decided to keep; but I have never encountered any command-line tool that
does recover from command-line argument errors).

That's why I'm wondering (and implicitly asking your opinion) whether
the "normal" callback should be bundled with the error handling
callbacks.

All applications will have to provide a useful callback for argument
processing, even when they don't do any error handling beyond printing
"usage" text and aborting. So maybe it would make sense to keep the
access-to-procedure Callback argument, and create an
Option_Error_Handler class, whose default value in Process would be
something that just raises Option_Error (which would by the way make
Process body cleaner by not having to check null access).

On the other hand, I could also go for an Option_Handler abstract class,
with an abstract regular Callback and with overridable implementations
for error handlers that raise Option_Error.

> Defining this type has the
> advantages that it makes programmers write software organized
> into typed structure, yielding
>
> 1) definite places to look for the handling procedures
> 2) handlers that can share common data for communicating
>    parsing state among them
> 3) human readers that know about 1 and 2 because they can
>    look for concrete types derived from the abstract type.

While I certainly agree with this in general, in this particular case
I'm skeptical about (2), since I can't think of any example where error
handlers are not fatal, which leaves only one handler that only has to
communicate with itself.

The error handlers are really only a workaround to have in effect
exceptions with structured arguments.

> I think the advantages outweigh a minor loss in flexibility,
> if you'd want to call flexibility the ability to pass a
> collection of subprograms each declared just anywhere.

Yes, I'm completely sold on the idea of using a dispatching call instead
of error handler accesses. And still unsure about the normal handler.

> Influencing the shape of this type, Process's parameters
> such as Posixly_Correct might mean that the handling
> procedures may want to determine whether or not Posixly_Correct
> is true.

I think it depends on who is supposed to set these parameters. For
example Long_Only should not be changed while Process is running, so I
don't like it being a record component or the result of a function.
Having it as a type discriminant like you suggest feels like an abuse of
the type discriminant feature, but I cannot manage to tell why (and I
might be wrong, considering I'm still too new to have accurate
feelings).

Posixly_Correct could be changed on-the-fly (even though in the C code I
took inspiration from it was a compile-time option), although I don't
see any reason to do so (a meta-flag?).

I will give it some deeper thought.

> So, then, my first idea will be along these lines:
>
> package Natools.Handlers is
>
>     type Id_And_Argument_Handler
>        (Posixly_Correct :  Boolean;
>         ...)
>     is abstract tagged private;
>
>     procedure Callback
>         (Handler : in out Id_And_Argument_Handler;
>          Id : Option_Id;
>          Argument : String) is abstract;
>
>     procedure Missing_Argument
>         (Handler : in out Id_And_Argument_Handler;
>          Id : Option_Id) is null;
>
>     etc.
>
> end Natools.Handlers;

I'm afraid it will not work, since Option_Id is a formal type from
generic package Natools.Getopt_Long.

However, is there anything wrong with having something like the
following?

generic
   type Option_Id is (<>);
package Natools.Getopt_Long is
   Option_Error : exception;
   package Handlers is
      type Argument_Handler is abstract tagged null record;
      procedure Missing_Argument
        (Handler : in out Argument_Handler;
         Id      : Option_Id);   --  implementation raises Option_Error;
      --  others handlers, maybe including regular argument callback
   end Handlers;
   procedure Process ( -- arguments to be written later
end Natools.Getopt_Long;

Or is the inner package looking for trouble?
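
For illustration, here is a compilable sketch of that layout, reduced
to one abstract Callback and one error handler whose default body
raises Option_Error. The names Getopt_Sketch, Flag, Flag_Options and
Quiet_Handler are hypothetical, and everything is nested in one
procedure only to keep the example self-contained.

```ada
with Ada.Exceptions;
with Ada.Text_IO;

procedure Default_Raises_Demo is

   generic
      type Option_Id is (<>);
   package Getopt_Sketch is
      Option_Error : exception;

      package Handlers is
         type Argument_Handler is abstract tagged null record;

         procedure Callback
           (Handler  : in out Argument_Handler;
            Id       : Option_Id;
            Argument : String) is abstract;

         --  Neither abstract nor null: the default body raises.
         procedure Missing_Argument
           (Handler : in out Argument_Handler;
            Id      : Option_Id);
      end Handlers;
   end Getopt_Sketch;

   package body Getopt_Sketch is
      package body Handlers is
         procedure Missing_Argument
           (Handler : in out Argument_Handler;
            Id      : Option_Id) is
         begin
            raise Option_Error
              with "missing argument to option " & Option_Id'Image (Id);
         end Missing_Argument;
      end Handlers;
   end Getopt_Sketch;

   type Flag is (Input, Output);

   package Flag_Options is new Getopt_Sketch (Flag);

   package Application is
      --  Overrides only the mandatory Callback and inherits the
      --  raising Missing_Argument.
      type Quiet_Handler is new Flag_Options.Handlers.Argument_Handler
        with null record;

      overriding procedure Callback
        (Handler  : in out Quiet_Handler;
         Id       : Flag;
         Argument : String) is null;
   end Application;

   H      : Application.Quiet_Handler;
   Raised : Boolean := False;
begin
   begin
      H.Missing_Argument (Output);   --  inherited default
   exception
      when E : Flag_Options.Option_Error =>
         Raised := True;
         Ada.Text_IO.Put_Line
           ("usage error: " & Ada.Exceptions.Exception_Message (E));
   end;
   pragma Assert (Raised);
end Default_Raises_Demo;
```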

> I did not look at the implementation.

I'm sure there are very useful lessons for me to draw from the
implementation. Still I'm very grateful for your comments on the
specification, I think that's the best way for me to make progress at
this point.

Thanks a lot for your help,
Natasha


* Re: Starter project: getopt_long in Ada
  2011-11-26 18:13   ` Natasha Kerensikova
@ 2011-11-26 20:47     ` Jeffrey Carter
  2011-11-28 15:49     ` Georg Bauhaus
  1 sibling, 0 replies; 15+ messages in thread
From: Jeffrey Carter @ 2011-11-26 20:47 UTC (permalink / raw)

On 11/26/2011 11:13 AM, Natasha Kerensikova wrote:
>
> In most use cases I can imagine, an application encountering an error in
> argument parsing will just print a "usage" text and exit with an error
> code, which can be done simply by catching the
> Natools.Getopt_Long.Option_Error exception.

If one is willing to forgo the nuanced error handling one gets with the 
exceptional-case handlers in your design and instead have something like this, 
I'd probably go with a function that returns a list of arguments and raises an 
exception if that's not possible:

Invalid_Argument : exception;

type Argument_Info (Has_Option : Boolean := False) is record
    Number : Positive;
    -- The argument number for the option, if there is one, or the argument, if
    -- there is no option.
    Value  : Ada.Strings.Unbounded.Unbounded_String;
    -- The value of the argument (associated with Option if applicable).

    case Has_Option is
    when False =>
       null;
    when True =>
       Option : Option_ID; -- The option Value is for, if any.
    end case;
end record;

type Argument_List is array (Positive range <>) of Argument_Info;

function Arguments (Options : in Option_Definition) return Argument_List;
-- Processes the arguments on the command line based on Options and returns
-- them in Argument_List.
-- Raises Invalid_Argument if the arguments cannot be processed according
-- to Options (missing the argument to an option that requires one, or an
-- unknown option).

The exception message can indicate the reason the arguments couldn't be 
processed ("Missing argument" or "Unknown option") and the offending argument 
number. Or one could use more specific exceptions, Missing_Argument and 
Unknown_Option, with the argument number in the exception message. That's pretty 
much a matter of taste. The ARM seems to prefer blanket exceptions 
(Argument_Error); I usually like more precision.

Including the argument number allows the client to look at the raw arguments if 
so desired.
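
As a rough sketch of the client side of such a design: the Arguments
function itself is not implemented here. Instead, a hand-built
Argument_List stands in for what it might return for a command line
like "--output result.txt input.txt". The option names and values are
assumptions made up for the example.

```ada
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Argument_List_Demo is

   package US renames Ada.Strings.Unbounded;

   type Option_ID is (Output, Verbose);   --  hypothetical options

   type Argument_Info (Has_Option : Boolean := False) is record
      Number : Positive;
      Value  : US.Unbounded_String;
      case Has_Option is
         when False =>
            null;
         when True =>
            Option : Option_ID;
      end case;
   end record;

   type Argument_List is array (Positive range <>) of Argument_Info;

   --  Hand-built stand-in for the result of Arguments.
   Parsed : constant Argument_List :=
     (1 => (Has_Option => True,
            Number     => 1,
            Value      => US.To_Unbounded_String ("result.txt"),
            Option     => Output),
      2 => (Has_Option => False,
            Number     => 3,
            Value      => US.To_Unbounded_String ("input.txt")));

   Option_Count : Natural := 0;
begin
   for I in Parsed'Range loop
      if Parsed (I).Has_Option then
         --  Option may only be read when the discriminant is True.
         Option_Count := Option_Count + 1;
         Ada.Text_IO.Put_Line
           ("option " & Option_ID'Image (Parsed (I).Option)
            & " = " & US.To_String (Parsed (I).Value));
      else
         Ada.Text_IO.Put_Line
           ("argument" & Positive'Image (Parsed (I).Number)
            & ": " & US.To_String (Parsed (I).Value));
      end if;
   end loop;
   pragma Assert (Option_Count = 1);
end Argument_List_Demo;
```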

Yet another design might be to pass unknown options back as non-option arguments 
and let the client decide what to do with them, in which case a missing argument 
would be the only exceptional case.

Exceptions indicate exceptional situations that the package is not prepared to 
handle; they are not necessarily errors. That's usually up to the client to 
decide. So I generally don't like to name exceptions with _Error. A good example 
is attempting to read past the EOF with Ada.Text_IO. End_Of_File returns True if 
there's a blank line followed by EOF remaining in the file; the only way to 
process a final blank line is to read until End_Error is raised. That's not an 
error; it's a deliberate act by the client in order to properly process the 
entire file. I'd call it EOF_Encountered.
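
A small sketch of that reading pattern, assuming a temporary file that
holds one text line followed by one blank line. The loop does not
consult End_Of_File at all and simply reads until End_Error is raised,
so the final blank line is counted as well.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Final_Blank_Line is
   File  : File_Type;
   Lines : Natural := 0;
begin
   Create (File);               --  temporary file, opened as Out_File
   Put_Line (File, "first");
   New_Line (File);             --  the final blank line
   Reset (File, In_File);       --  re-read the same file from the start

   begin
      loop
         declare
            --  Raises End_Error once nothing is left to read.
            Line : constant String := Get_Line (File);
         begin
            Lines := Lines + 1;
            Put_Line ("read """ & Line & """");
         end;
      end loop;
   exception
      when End_Error =>
         null;   --  deliberate: this is how the loop terminates
   end;

   Close (File);
   pragma Assert (Lines = 2);
   Put_Line (Natural'Image (Lines) & " lines read");
end Final_Blank_Line;
```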

While it seems unlikely, a client might want to allow a user to use one of
multiple, mutually exclusive option definitions, in which case the inability to
process the arguments is not an error, but an indication that the client should
try a different option definition.

A couple of other minor comments:

The declaration of Option_Error is far from where it is referenced in the spec. 
I like things to be as close to where they're referenced as possible.

The "default" constants are not referenced in the spec. They might be useful as 
default parameter values, and in the comments that currently refer to their values.

HTH.

-- 
Jeff Carter
"He didn't get that nose from playing ping-pong."
Never Give a Sucker an Even Break
110


* Re: Starter project: getopt_long in Ada
  2011-11-25 10:47 Starter project: getopt_long in Ada Natasha Kerensikova
  2011-11-25 16:39 ` Georg Bauhaus
@ 2011-11-27  8:05 ` Yannick Duchêne (Hibou57)
  2011-11-27 12:39   ` Natasha Kerensikova
  2011-11-27  8:09 ` Yannick Duchêne (Hibou57)
  2 siblings, 1 reply; 15+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-11-27  8:05 UTC (permalink / raw)


Le Fri, 25 Nov 2011 11:47:17 +0100, Natasha Kerensikova  
<lithiumcat@gmail.com> a écrit:
> So I wrote a Getopt_Long package, along with a test suite, that can be
> found at:
> http://fossil.instinctive.eu/natools/dir?ci=tip
Still reading getopt_long.ads, and a two-cent comment:
“Option_Definitions” could be renamed to “Options_DB”, and parameters
named “Options” to “DB”, for more clarity. Just a personal feeling, I
don't assert anything.

-- 
“Syntactic sugar causes cancer of the semi-colons.” [1]
“Structured Programming supports the law of the excluded muddle.” [1]
[1]: [Epigrams on Programming, Alan J. Perlis, Yale University]




* Re: Starter project: getopt_long in Ada
  2011-11-25 10:47 Starter project: getopt_long in Ada Natasha Kerensikova
  2011-11-25 16:39 ` Georg Bauhaus
  2011-11-27  8:05 ` Yannick Duchêne (Hibou57)
@ 2011-11-27  8:09 ` Yannick Duchêne (Hibou57)
  2 siblings, 0 replies; 15+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-11-27  8:09 UTC (permalink / raw)


Le Fri, 25 Nov 2011 11:47:17 +0100, Natasha Kerensikova  
<lithiumcat@gmail.com> a écrit:
> So I wrote a Getopt_Long package, along with a test suite, that can be
> found at:
> http://fossil.instinctive.eu/natools/dir?ci=tip
Oops, suspected erroneous wording in “getopt_long.ads”:
> --    definitions. Callback is called for each identified option with its
> --    idea and the option argument if any, or the empty string otherwise.
You meant “with its ID”, didn't you?


* Re: Starter project: getopt_long in Ada
  2011-11-25 16:39 ` Georg Bauhaus
  2011-11-26 18:13   ` Natasha Kerensikova
@ 2011-11-27  8:21   ` Yannick Duchêne (Hibou57)
  2011-11-27 12:30     ` Natasha Kerensikova
  1 sibling, 1 reply; 15+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-11-27  8:21 UTC (permalink / raw)


Le Fri, 25 Nov 2011 17:39:31 +0100, Georg Bauhaus  
<rm.dash-bauhaus@futureapps.de> a écrit:
> package Natools.Handlers is
>
>     type Id_And_Argument_Handler
>        (Posixly_Correct :  Boolean;
>         ...)
>     is abstract tagged private;
>
>     procedure Callback
>         (Handler : in out Id_And_Argument_Handler;
>          Id : Option_Id;
>          Argument : String) is abstract;
>
>     procedure Missing_Argument
>         (Handler : in out Id_And_Argument_Handler;
>          Id : Option_Id) is null;
>
>     etc.
>
> end Natools.Handlers;
I had the same question in my mind too. I am in favor of tagged types over
access to subprograms, but was thinking this was perhaps just a matter of
personal preference. Does anyone know of any online papers discussing
tagged types vs access to subprograms in depth?


* Re: Starter project: getopt_long in Ada
  2011-11-27  8:21   ` Yannick Duchêne (Hibou57)
@ 2011-11-27 12:30     ` Natasha Kerensikova
  2011-11-27 15:11       ` Yannick Duchêne (Hibou57)
  0 siblings, 1 reply; 15+ messages in thread
From: Natasha Kerensikova @ 2011-11-27 12:30 UTC (permalink / raw)

Hello,

On 2011-11-27, Yannick Duchêne <yannick_duchene@yahoo.fr> wrote:
> I had the same question in my mind too. I am in favor of tagged types over
> access to subprograms, but was thinking this was perhaps just a matter of
> personal preference.

That does not answer your question below, but just for clarification, I
don't really have a strong preference for access to subprograms over
tagged types, but I do admit that my heavy C background means I have
absolutely nothing against access types, especially access to
subprograms (which are immune to most of access types problems like
dangling stuff or leaks).

It's just that I don't like at all having a tagged type for only one
dispatching operation (and no obvious need for internal state). I have
mixed feelings for two operations, but starting from three I do prefer
grouping them in a tagged type. I didn't do that there for various human
reasons, and I do regret it.

Of course that's only when considering related operations. In another
project that I will publish soon (still needs a bit of polishing), I use
two access-to-subprograms, but they are really meant to be completely
different sources (one creates tokens from input while the other outputs
the token, and the whole point of the separation is to have different
input-analysis and output-generation that can be plugged together). So in
that case, I count them as two independent single-operation cases.

> Does anyone know of any online papers discussing
> tagged types vs access to subprograms in depth?

I don't know anything like that, and I would also be really interested
in reading one (assuming it exists).

My wild guess is that tagged types would need two dereferences while
access to subprogram only one, but that might end up being optimized
into the same thing, and it's probably too little a performance
difference to matter in most situations. I guess tagged types are more
readable than not null anonymous access to subprogram, due to the lower
amount of text. For named access to subprogram, I don't expect much of a
difference.

Natasha


* Re: Starter project: getopt_long in Ada
  2011-11-27  8:05 ` Yannick Duchêne (Hibou57)
@ 2011-11-27 12:39   ` Natasha Kerensikova
  2011-11-27 14:52     ` Yannick Duchêne (Hibou57)
  0 siblings, 1 reply; 15+ messages in thread
From: Natasha Kerensikova @ 2011-11-27 12:39 UTC (permalink / raw)

Hello,

On 2011-11-27, Yannick Duchêne <yannick_duchene@yahoo.fr> wrote:
> “Option_Definitions” could be renamed to “Options_DB”, and parameters
> named “Options” to “DB”, for more clarity. Just a personal feeling, I
> don't assert anything.

Thanks for the suggestion. I was uncomfortable with Option_Definitions
because I usually use singular nouns for type names. I tried things like
Option_List, but it's not a list in a CS sense.

On the other hand, Options_DB reminds me of the wild abbreviations in my C
life. Option_Database maybe?

Or Option_Container? But isn't that a bit too implementation-oriented,
the client probably cares more about the function of the object than
about the kind of stuff that exists inside, so shouldn't the type name
reflect client's point of view?

Thanks for pointing it out,
Natasha


* Re: Starter project: getopt_long in Ada
  2011-11-27 12:39   ` Natasha Kerensikova
@ 2011-11-27 14:52     ` Yannick Duchêne (Hibou57)
  0 siblings, 0 replies; 15+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-11-27 14:52 UTC (permalink / raw)


Le Sun, 27 Nov 2011 13:39:03 +0100, Natasha Kerensikova  
<lithiumcat@gmail.com> a écrit:
> On the other hand, Options_DB reminds me of the wild abbreviations in my C
> life. Option_Database maybe?
There is a difference between an abbreviation whose meaning only the
writer knows and an abbreviation whose meaning everyone knows. If
abbreviations are uncommon in Ada prose, this is not because Ada users
hate abbreviations, but because they don't like things others can't read.

DB is a well-known word, and everyone will read it as Data_Base, except in
an audio processing application.

Similarly, there is nothing to fear in writing HTML rather than
Hyper_Text_Markup_Language, which, on the contrary, would hurt
readability. When possible, shorter is better (although the choice of a
good name also varies depending on the scope size).

> Or Option_Container? But isn't that a bit too implementation-oriented,
It's up to you, you are the author :) , I just feel you can get rid of
the word Option in this name, as this is already implied by the package
name. It is not the same for an Option parameter, which stands for a
single option. What I like about the word DB here is that it underlines
the specific role while being short.

But please, keep in mind this is just my feeling.


* Re: Starter project: getopt_long in Ada
  2011-11-27 12:30     ` Natasha Kerensikova
@ 2011-11-27 15:11       ` Yannick Duchêne (Hibou57)
  2011-11-28  8:21         ` Natasha Kerensikova
  0 siblings, 1 reply; 15+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-11-27 15:11 UTC (permalink / raw)


Le Sun, 27 Nov 2011 13:30:28 +0100, Natasha Kerensikova  
<lithiumcat@gmail.com> a écrit:
> especially access to
> subprograms (which are immune to most of access types problems like
> dangling stuff or leaks).
Yes, indeed (cheese)

> It's just that I don't like at all having a tagged type for only one
> dispatching operation (and no obvious need for internal state).
If you think Specification, you should not say “I” here.

> Of course that's only when considering related operations. In another
> project that I will publish soon (still needs a bit of polishing), I use
> two access-to-subprograms, but they are really meant to be completely
> different sources (one creates tokens from input while the other outputs
> the token, and the whole point of the separation is to have different
> input-analysis and output-generation that can be plugged together). So in
> that case, I count them as two independent single-operation cases.
Your Markdown processor ?

> My wild guess is that tagged types would need two dereferences while
> access to subprogram only one,
With an access to a subprogram, there is a reference to an address and an
address dereference; with a tagged type, there is a reference to an
instance and a selector (is that the right word? I'm not sure). I see two
for both.

> but that might end up being optimized
> into the same thing, and it's probably too little a performance
> difference to matter in most situations.
With batched operations (an iteration like the one your interface provides
falls into this case), the difference can become really tiny, I believe.
You can imagine some way for a compiler to optimize it.

> Natasha



* Re: Starter project: getopt_long in Ada
  2011-11-27 15:11       ` Yannick Duchêne (Hibou57)
@ 2011-11-28  8:21         ` Natasha Kerensikova
  2011-11-28 13:02           ` Yannick Duchêne (Hibou57)
  0 siblings, 1 reply; 15+ messages in thread
From: Natasha Kerensikova @ 2011-11-28  8:21 UTC (permalink / raw)

On 2011-11-27, Yannick Duchêne <yannick_duchene@yahoo.fr> wrote:
> Le Sun, 27 Nov 2011 13:30:28 +0100, Natasha Kerensikova  
><lithiumcat@gmail.com> a écrit:
>> It's just that I don't like at all having a tagged type for only one
>> dispatching operation (and no obvious need for internal state).
> If you think Specification, you should not say “I” here.

I'm sorry, but I don't understand what you mean there.

>> Of course that's only when considering related operations. In another
>> project that I will publish soon (still needs a bit of polishing), I use
>> two access-to-subprograms, but they are really meant to be completely
>> different sources (one creates tokens from input while the other outputs
>> the token, and the whole point of the separation is to have different
>> input-analysis and output-generation that can be plugged together). So in
>> that case, I count them as two independent single-operation cases.
> Your Markdown processor ?

More or less yes. It's actually a generic (lightweight) markup
processor, and Markdown is only one of the input-to-token callback sets.
I'm not sure exactly what kind of expressive power it has, but it's at
least suitable for Creole and Textile too.

The rough design is based around an array created by the client for the
library engine, whose elements are a record containing a lexer callback
(the input-to-token part), a renderer callback (token-to-output), an
indication for the engine to not keep calling every entry all the time,
and a priority that breaks ties.

The engine calls the input-to-token part when adequate, which updates
the current position and returns a Token'Class object, which is then fed
to the corresponding token-to-output callback.

One thing I don't like much in this design is that the token type must
match: for example when parsing a link, the input-to-token will create a
Link_Token that contains the linked URI, a title and the link text. The
token-to-output will need all that information, so I'm casting the
Token'Class back into a Link_Token. But what if the client mismatched
the callbacks and it got a String_Token instead? That raises a run-time
exception, while the information is already there at compile-time.
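
A reduced sketch of that mismatch, with hypothetical token types that
carry only a dummy component each: the conversion from Token'Class back
to Link_Token compiles for any token, and the tag is only checked when
the callback runs. A membership test lets a callback check first.

```ada
with Ada.Text_IO;

procedure Token_Mismatch is

   package Tokens is
      type Token is tagged null record;

      type String_Token is new Token with record
         Length : Natural := 0;
      end record;

      type Link_Token is new Token with record
         Has_Title : Boolean := False;
      end record;
   end Tokens;

   use Tokens;

   --  Run-time test a callback could use before converting.
   function Is_Link (Item : Token'Class) return Boolean is
   begin
      return Item in Link_Token'Class;
   end Is_Link;

   --  Hypothetical token-to-output callback that only handles links.
   procedure Render_Link (Item : Token'Class) is
      --  The conversion performs a tag check at run time and raises
      --  Constraint_Error if Item is not a Link_Token.
      Link : constant Link_Token := Link_Token (Item);
   begin
      Ada.Text_IO.Put_Line
        ("link, titled: " & Boolean'Image (Link.Has_Title));
   end Render_Link;

   Good : constant Link_Token   := (Token with Has_Title => True);
   Bad  : constant String_Token := (Token with Length => 5);

   Mismatch_Detected : Boolean := False;
begin
   Render_Link (Good);

   begin
      Render_Link (Bad);   --  compiles, but fails the tag check
   exception
      when Constraint_Error =>
         Mismatch_Detected := True;
         Ada.Text_IO.Put_Line ("mismatched callback detected at run time");
   end;

   pragma Assert (Mismatch_Detected);
   pragma Assert (Is_Link (Good) and not Is_Link (Bad));
end Token_Mismatch;
```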

Well in theory an input-to-token callback could create different token
types (e.g. when the link title is optional, instead of putting an
empty string in a Link_Token, have a Titled_Link_Token and a
titleless Link_Token). But I would gladly force each input-to-token
callback to create a single token type if that can be statically checked
against the expected token-to-output type.

Anyway, back to the original point, the library will also provide as
examples standard Markdown, various Markdown extensions, Creole and
Textile input-to-token callback sets, and some token-to-output callback
sets as well (I can't think of anything other than HTML and XHTML right
now). The whole token layer is meant to make them independent, so it
would be counterproductive to make one class for each combination only
to remove two accesses-to-subprogram.

>> My wild guess is that tagged types would need two dereferences while
>> access to subprogram only one,
> With an access to a subprogram, there is a reference to an address and an
> address dereference; with a tagged type, there is a reference to an
> instance and a selector (is that the right word? I'm not sure). I see two
> for both.

Maybe I'm too tainted with C++ implementation details, but what I
imagined was the access-to-subprogram containing the address of the
address of the subprogram, so that's one dereference, while a tagged
type instance would contain a reference to a dispatch table (shared by
all instances) which itself contains a reference to the actual code.

It feels like a waste of resources to have a one-entry dispatch table,
but I'm not sure dealing with it optimally is really worth the extra
complexity in compilers.
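
To make the comparison concrete at the source level, here is a sketch
of the same single-operation callback written both ways. The names are
hypothetical, and the sketch says nothing about how many dereferences a
given compiler actually generates for either call.

```ada
with Ada.Text_IO;

procedure Callback_Styles is

   --  Style 1: the callback is an access-to-subprogram parameter.
   procedure Run_With_Access
     (Callback : not null access procedure (Argument : String)) is
   begin
      Callback ("via access");   --  implicit dereference (Callback.all)
   end Run_With_Access;

   --  Style 2: the callback is a primitive operation of a tagged type.
   package Visitors is
      type Visitor is abstract tagged null record;
      procedure Visit
        (Object : in out Visitor; Argument : String) is abstract;
   end Visitors;

   package Printers is
      type Printer is new Visitors.Visitor with record
         Calls : Natural := 0;   --  state comes for free with the type
      end record;
      overriding procedure Visit
        (Object : in out Printer; Argument : String);
   end Printers;

   package body Printers is
      overriding procedure Visit
        (Object : in out Printer; Argument : String) is
      begin
         Object.Calls := Object.Calls + 1;
         Ada.Text_IO.Put_Line (Argument);
      end Visit;
   end Printers;

   procedure Run_With_Tagged (Object : in out Visitors.Visitor'Class) is
   begin
      Object.Visit ("via dispatching");   --  dispatches on Object's tag
   end Run_With_Tagged;

   Access_Calls : Natural := 0;

   --  With the access style, state has to live outside the callback.
   procedure Print (Argument : String) is
   begin
      Access_Calls := Access_Calls + 1;
      Ada.Text_IO.Put_Line (Argument);
   end Print;

   P : Printers.Printer;
begin
   Run_With_Access (Print'Access);
   Run_With_Tagged (P);
   pragma Assert (Access_Calls = 1 and P.Calls = 1);
end Callback_Styles;
```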

Natasha


* Re: Starter project: getopt_long in Ada
  2011-11-28  8:21         ` Natasha Kerensikova
@ 2011-11-28 13:02           ` Yannick Duchêne (Hibou57)
  0 siblings, 0 replies; 15+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-11-28 13:02 UTC (permalink / raw)


Le Mon, 28 Nov 2011 09:21:51 +0100, Natasha Kerensikova  
<lithiumcat@gmail.com> a écrit:
>>> It's just that I don't like at all having a tagged type for only one
>>> dispatching operation (and no obvious need for internal state).
>> If you think Specification, you should not say “I” here.
>
> I'm sorry, but I don't understand what you mean there.
Oops, I was unclear, indeed.

Rewording: a specification is probably not intended just for you, and even
you may behave like someone else some day in the future. If you (the “I”
you used) don't see a need for something right now, and drop an
abstraction for this sole reason, this may not be a good thing for your
interface. One should think reusable and abstract when setting up an
interface, not just think about the actual needs of the moment. Questions
like “and what if?” may be a good track. As a tagged type can do more
while still being capable of less, compared to access to subprogram, and
nothing prevents the use of tagged types, the idea of tagged types
should not be dropped with the argument “I don't need it at the moment”.
Otherwise, you will have to change the interface later, and it's always
better to update an implementation than an interface.

Sorry for the smalltalk… I guess you already know all of that (red-face).

> [important snip dropped to be short]
>The rough design is based around an array created by the client for the
> library engine, whose elements are a record containing a lexer callback
> (the input-to-token part), a renderer callback (token-to-output), an
> indication for engine to not keep calling every entry all the time, and
> a priority that breaks ties.
>
> The engine calls the input-to-token part when adequate, which updates
> the current position and returns a Token'Class object, which is then fed
> to the corresponding token-to-output callback.
>
> [important snip dropped to be short]

I won't comment, as I am not sure I have a clear mind about it.

> Maybe I'm too tainted with C++ implementation details, but what I
> imagined was the access-to-subprogram containing the address of the
> address of the subprogram, so that's one dereference, while a tagged
> type instance would contain a reference to a dispatch table (shared by
> all instances) which itself contains a reference to the actual code.
>
> It feels like a waste of resources to have a one-entry dispatch table,
> but I'm not sure dealing with it optimally is really worth the extra
> complexity in compilers.
Both are implementation assumptions. An access value may not be
dereferenced straight away, at least in Ada, which has accessibility level
checks on access types, and an access value may just as well be an index
into some array, among other things. Many times you have to use “.all” with
access types, and when it is not written, it is there implicitly: nothing
seems to me to guarantee that this “.all” involves no operation. On the
other side, the same goes for tagged types. If a compiler knows that an
XXX_Type'Class only has instances of a single known type, then it can avoid
dispatch tables altogether, which are in any case just one particular
implementation (as an example, SmallEiffel used conditionals in some cases
instead of a VMT, and could even do direct calls in some rarer cases).
There is no way to assume an access to procedure is more or less efficient
than what is semantically a dispatching call.
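
To make the two alternatives concrete, here is a minimal sketch that puts
an access-to-procedure value and a tagged type with a single dispatching
operation side by side. All names (Tokens, Render_Access, Renderer,
Plain_Renderer, Plain) are hypothetical and not taken from Natools; the
sketch shows only the source-level difference and says nothing about the
code a given compiler generates.

```ada
with Ada.Text_IO;

procedure Two_Callbacks is

   package Tokens is
      --  Variant 1: a plain access-to-subprogram type (hypothetical name).
      type Render_Access is access procedure (Text : String);

      procedure Plain (Text : String);

      --  Variant 2: a tagged type with one dispatching operation.
      type Renderer is abstract tagged null record;
      procedure Render (Self : Renderer; Text : String) is abstract;

      type Plain_Renderer is new Renderer with null record;
      overriding procedure Render (Self : Plain_Renderer; Text : String);
   end Tokens;

   package body Tokens is
      procedure Plain (Text : String) is
      begin
         Ada.Text_IO.Put_Line ("access: " & Text);
      end Plain;

      procedure Render (Self : Plain_Renderer; Text : String) is
      begin
         Ada.Text_IO.Put_Line ("tagged: " & Text);
      end Render;
   end Tokens;

   By_Access : constant Tokens.Render_Access := Tokens.Plain'Access;
   Obj       : Tokens.Plain_Renderer;

begin
   By_Access.all ("hello");   --  explicit dereference
   By_Access ("hello");       --  same call, the ".all" is implicit

   --  The view conversion to the class-wide type makes this
   --  semantically a dispatching call.
   Tokens.Render (Tokens.Renderer'Class (Obj), "hello");
end Two_Callbacks;
```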


-- 
“Syntactic sugar causes cancer of the semi-colons.” [1]
“Structured Programming supports the law of the excluded muddle.” [1]
[1]: [Epigrams on Programming — Alan J. Perlis — Yale University]




* Re: Starter project: getopt_long in Ada
  2011-11-26 18:13   ` Natasha Kerensikova
  2011-11-26 20:47     ` Jeffrey Carter
@ 2011-11-28 15:49     ` Georg Bauhaus
  2011-11-28 17:18       ` Natasha Kerensikova
  1 sibling, 1 reply; 15+ messages in thread
From: Georg Bauhaus @ 2011-11-28 15:49 UTC (permalink / raw)

On 26.11.11 19:13, Natasha Kerensikova wrote:

>  So maybe it would make sense to keep the
> access-to-procedure Callback argument, and create an
> Option_Error_Handler class, whose default value in Process would be
> something that just raises Option_Error (which would by the way make
> Process body cleaner by not having to check null access).
> 
...
> While I certainly agree with this in general, in this particular case
> I'm skeptical about (2), since I can't think of any example where error
> handlers are not fatal, which leaves only one handler that only has to
> communicate with itself.

Two use cases for stateful handlers that I can think of are

1) -v -v ... increasing verbosity, i.e., handling means counting
   the number of times an option has occurred

2) Two options -A and -B must not occur together, in any order or
   frequency (This is "inspired" by really complicated setups, such
   as the nightmarish command line interface of ffmpeg.)

In either case, the handlers would have to have a way to be
informed about prior occurrences.
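
A minimal sketch of what such a stateful handler could look like, assuming
a tagged handler object that remembers prior occurrences. The names
(Handlers, Handler, Option, Opt_A, Opt_B) are hypothetical and not part of
Natools.Getopt_Long, and the parser itself is replaced by direct calls.

```ada
with Ada.Text_IO;

procedure Stateful_Handler_Demo is

   package Handlers is
      --  Hypothetical option identifiers for -v, -A and -B.
      type Option_Id is (Verbose, Opt_A, Opt_B);

      Option_Error : exception;

      --  The handler object carries what it has seen so far.
      type Handler is tagged record
         Verbosity : Natural := 0;
         Seen_A    : Boolean := False;
         Seen_B    : Boolean := False;
      end record;

      procedure Option (Self : in out Handler; Id : Option_Id);
   end Handlers;

   package body Handlers is
      procedure Option (Self : in out Handler; Id : Option_Id) is
      begin
         case Id is
            when Verbose => Self.Verbosity := Self.Verbosity + 1;
            when Opt_A   => Self.Seen_A := True;
            when Opt_B   => Self.Seen_B := True;
         end case;

         --  Use case 2: reject the combination, whatever the order.
         if Self.Seen_A and Self.Seen_B then
            raise Option_Error with "-A and -B must not occur together";
         end if;
      end Option;
   end Handlers;

   use Handlers;

   H : Handler;

begin
   --  Stands in for parsing "-v -v -A".
   Option (H, Verbose);
   Option (H, Verbose);
   Option (H, Opt_A);
   Ada.Text_IO.Put_Line ("verbosity:" & Natural'Image (H.Verbosity));

   --  A later "-B" conflicts with the "-A" seen earlier.
   begin
      Option (H, Opt_B);
   exception
      when Option_Error =>
         Ada.Text_IO.Put_Line ("-A and -B conflict");
   end;
end Stateful_Handler_Demo;
```

The same bookkeeping could live in global variables instead; the sketch
only shows that a handler object gives it an obvious home.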

> Yes, I'm completely sold on the idea of using a dispatching call instead
> of error handler accesses. And still unsure about the normal handler.

Both mechanisms should work. Though, for reasons of reduced complexity
and a streamlined interface, I'd prefer just one mechanism used for both
regular handlers and error handlers. Doesn't hurt, and if later
the regular callback mechanism needs extension, the interface of
Process needs little change.

Do not worry about efficiency of indirect calls. It seems that there
is very little difference at the assembly level, in particular
when the compiler "dispatches" to the right procedure at
compile time, as is the default with Ada unless the programmer
has explicitly requested run-time dispatching.

For illustration, in traditional Ada, one may pass a callback as a
generic formal subprogram. There is then no pointer. The following example
is from a translation with optimization turned on, but without any inlining,
and with checks turned off:

using a procedure pointer:

   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	83 ec 08             	sub    $0x8,%esp
   6:	c7 04 24 01 00 00 00 	movl   $0x1,(%esp)
   d:	ff d0                	call   *%eax
   f:	c9                   	leave
  10:	c3                   	ret

using a formal generic procedure:

  30:	55                   	push   %ebp
  31:	89 e5                	mov    %esp,%ebp
  33:	83 ec 04             	sub    $0x4,%esp
  36:	c7 04 24 02 00 00 00 	movl   $0x2,(%esp)
  3d:	e8 de ff ff ff       	call   20 <see_if_dispatching__assign.158>
  42:	c9                   	leave
  43:	c3                   	ret

Don't know if there is a difference at all in terms of
efficiency.

However, with checks turned on, the pointer version shows a longer
list of instructions than that of the generic version.
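
The Ada source behind these listings is not shown in the message. As a
rough sketch of the two calling styles being compared, with hypothetical
names throughout (Callback, Via_Pointer, Via_Generic, Call_Assign), it
could look something like this:

```ada
with Ada.Text_IO;

procedure Callback_Styles is

   procedure Assign (Value : Integer);

   --  Style 1: the callback is a run-time value (a pointer).
   type Callback is access procedure (Value : Integer);

   procedure Via_Pointer (Action : Callback);

   --  Style 2: the callback is a generic formal subprogram,
   --  bound when the generic is instantiated.
   generic
      with procedure Action (Value : Integer);
   procedure Via_Generic;

   procedure Assign (Value : Integer) is
   begin
      Ada.Text_IO.Put_Line ("Assign" & Integer'Image (Value));
   end Assign;

   procedure Via_Pointer (Action : Callback) is
   begin
      Action (1);   --  indirect call through the access value
   end Via_Pointer;

   procedure Via_Generic is
   begin
      Action (2);   --  the target is known at instantiation
   end Via_Generic;

   procedure Call_Assign is new Via_Generic (Action => Assign);

begin
   Via_Pointer (Assign'Access);
   Call_Assign;
end Callback_Styles;
```

Whether a particular compiler emits code resembling the listings above for
this sketch depends on the compiler and its options.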

> Having it a type discriminant like you suggest feels like an abuse of
> the type discriminant feature, but I cannot manage to tell why (and I
> might be wrong, considering I'm still too new to have accurate
> feelings).

If you remember that a discriminant acts as a subtype
constraint, your worries might be fewer: Once the constraint
has a value, the value will suitably restrict the set of
possible values of the type.  This might also mean
that operations can now select specific behavior depending
on the discriminant constraint. This constrains the set
of possible behaviors.
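
A minimal sketch of that idea, assuming a hypothetical Error_Policy
discriminant (none of these names come from the package under discussion):
the subtypes fix the discriminant, and the operation selects its behavior
from it.

```ada
with Ada.Text_IO;

procedure Discriminant_Demo is

   --  Hypothetical policy names, for illustration only.
   type Error_Policy is (Fatal, Report_Only);

   type Error_Handler (Policy : Error_Policy) is record
      Error_Count : Natural := 0;
   end record;

   --  Each subtype fixes the discriminant and so restricts
   --  the set of possible values.
   subtype Fatal_Handler is Error_Handler (Policy => Fatal);
   subtype Lenient_Handler is Error_Handler (Policy => Report_Only);

   Option_Error : exception;

   procedure Missing_Argument (H : in out Error_Handler) is
   begin
      H.Error_Count := H.Error_Count + 1;

      --  Behavior is selected by the discriminant constraint.
      case H.Policy is
         when Fatal =>
            raise Option_Error;
         when Report_Only =>
            Ada.Text_IO.Put_Line ("missing argument (continuing)");
      end case;
   end Missing_Argument;

   Strict  : Fatal_Handler;
   Lenient : Lenient_Handler;

begin
   Missing_Argument (Lenient);

   begin
      Missing_Argument (Strict);
   exception
      when Option_Error =>
         Ada.Text_IO.Put_Line ("fatal handler raised Option_Error");
   end;
end Discriminant_Demo;
```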

> However, is there anything wrong with having something like the
> following?
> 
> generic
>    type Option_Id is (<>);
> package Natools.Getopt_Long is
>    Option_Error : exception;
>    package Handlers is
>       type Argument_Handler is abstract tagged null record;
>       procedure Missing_Argument
>         (Handler : in out Argument_Handler;
>          Id      : Option_Id);   --  implementation raises Option_Error;
>       --  others handlers, maybe including regular argument callback
>    end Handlers;
>    procedure Process ( -- arguments to be written later
> end Natools.Getopt_Long;
> 
> Or is the inner package looking for trouble?

No, seems fine.
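
For what it's worth, here is a compilable variant of the quoted outline,
nested in a procedure so that it fits in one file. It is only a sketch:
the profile of Process is left open in the quote, so the one used here is
an assumption made purely for the demonstration, and Client, Lenient and
Options are hypothetical names.

```ada
with Ada.Text_IO;

procedure Getopt_Sketch is

   generic
      type Option_Id is (<>);
   package Getopt_Long is
      Option_Error : exception;

      package Handlers is
         type Argument_Handler is abstract tagged null record;

         --  Default implementation raises Option_Error.
         procedure Missing_Argument
           (Handler : in out Argument_Handler;
            Id      : Option_Id);
      end Handlers;

      --  Assumed stand-in for the real Process: it only reports
      --  one missing argument to the handler.
      procedure Process
        (Handler : in out Handlers.Argument_Handler'Class;
         Missing : Option_Id);
   end Getopt_Long;

   package body Getopt_Long is
      package body Handlers is
         procedure Missing_Argument
           (Handler : in out Argument_Handler;
            Id      : Option_Id) is
         begin
            raise Option_Error
              with "missing argument to " & Option_Id'Image (Id);
         end Missing_Argument;
      end Handlers;

      procedure Process
        (Handler : in out Handlers.Argument_Handler'Class;
         Missing : Option_Id) is
      begin
         --  Dispatching call on the class-wide parameter.
         Handlers.Missing_Argument (Handler, Missing);
      end Process;
   end Getopt_Long;

   type Options is (Opt_Output, Opt_Input);

   package Getopt is new Getopt_Long (Options);

   package Client is
      type Lenient is new Getopt.Handlers.Argument_Handler with record
         Errors : Natural := 0;
      end record;

      overriding procedure Missing_Argument
        (Handler : in out Lenient;
         Id      : Getopt.Option_Id);
   end Client;

   package body Client is
      procedure Missing_Argument
        (Handler : in out Lenient;
         Id      : Getopt.Option_Id) is
      begin
         Handler.Errors := Handler.Errors + 1;
         Ada.Text_IO.Put_Line ("ignored: " & Options'Image (Id));
      end Missing_Argument;
   end Client;

   H : Client.Lenient;

begin
   Getopt.Process (H, Opt_Output);
   Ada.Text_IO.Put_Line ("errors:" & Natural'Image (H.Errors));
end Getopt_Sketch;
```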


* Re: Starter project: getopt_long in Ada
  2011-11-28 15:49     ` Georg Bauhaus
@ 2011-11-28 17:18       ` Natasha Kerensikova
  0 siblings, 0 replies; 15+ messages in thread
From: Natasha Kerensikova @ 2011-11-28 17:18 UTC (permalink / raw)

On 2011-11-28, Georg Bauhaus <rm.dash-bauhaus@futureapps.de> wrote:
> On 26.11.11 19:13, Natasha Kerensikova wrote:
>> While I certainly agree with this in general, in this particular case
>> I'm skeptical about (2), since I can't think of any example where error
>> handlers are not fatal, which leaves only one handler that only has to
>> communicate with itself.
>
> Two use cases for stateful handlers that I can think of are
>
> 1) -v -v ... increasing verbosity, i.e., handling means counting
>    the number of times an option has occurred
>
> 2) Two options -A and -B must not occur together, in any order or
>    frequency (This is "inspired" by really complicated setups, such
>    as the nightmarish command line interface of ffmpeg.)
>
> In either case, the handlers would have to have a way to be
> informed about prior occurrences.

I think we had a bit of a misunderstanding about the word "state".

My point was that I couldn't (and still can't) think of any example
where an *internal* state for handlers is needed. Of course,
command-line flags have to change something, otherwise it's useless to
parse them, but they change a *global* state.
Or at least an externally-visible state, if you prefer to query a
global object rather than have a collection of global variables
(I prefer releasing everything related to argument parsing before
starting the actual processing, but that's a personal taste.)

The verbosity level is most likely an integer, conceptually in a global
state (even if it's hidden behind a function rather than in a global
variable), and -v handler only communicates with the global state.

For case 2 it's a bit more difficult to answer, it depends on the rest
of the design, whether the handler launches some actual processing or
only records the arguments in a friendlier format to be executed at some
later point.

Anyway, we actually agree on the necessity to have a state, and I admit
I didn't think of the possibility of having the parsing object (whose
dispatching subprograms replace the current access-to-subprograms) be
also a holder for the global state.

> Do not worry about efficiency of indirect calls.

To be honest I really don't. Especially since we were talking about
command-line argument parsing, which is seldom a performance-critical
piece of code.

And the difference cannot really be worse than an extra
dereference/memory access, which is also negligible in most situations
even when performance matters. It might be significant for
memory-constrained devices, but I've never come close to any such
device (even though I would love to).

> For illustration, in traditional Ada, one may pass a callback as a
> generic formal subprogram. There is then no pointer. The following example
> is from a translation with optimization turned on, but without any inlining,
> and with checks turned off:
>
> using a procedure pointer:
>
>    0:	55                   	push   %ebp
>    1:	89 e5                	mov    %esp,%ebp
>    3:	83 ec 08             	sub    $0x8,%esp
>    6:	c7 04 24 01 00 00 00 	movl   $0x1,(%esp)
>    d:	ff d0                	call   *%eax
>    f:	c9                   	leave
>   10:	c3                   	ret
>
> using a formal generic procedure:
>
>   30:	55                   	push   %ebp
>   31:	89 e5                	mov    %esp,%ebp
>   33:	83 ec 04             	sub    $0x4,%esp
>   36:	c7 04 24 02 00 00 00 	movl   $0x2,(%esp)
>   3d:	e8 de ff ff ff       	call   20 <see_if_dispatching__assign.158>
>   42:	c9                   	leave
>   43:	c3                   	ret
>
> Don't know if there is a difference at all in terms of
> efficiency.

I would be quite confident there is no difference between these snippets
on 21st century i386 processors.

What does make a difference, although probably small, is that in the
first case an address has to somehow end up in eax, while the second has
a ready-to-use constant. I would guess that the address in eax is loaded
from memory (hence the extra dereference/memory access I keep
mentioning), or maybe somehow computed. And that's the efficiency
difference between both situations.

> If you remember that a discriminant acts as a subtype
> constraint, your worries might be fewer: Once the constraint
> has a value, the value will suitably restrict the set of
> possible values of the type.  This might also mean
> that operations can now select specific behavior depending
> on the discriminant constraint. This constrains the set
> of possible behaviors.

I will have to think deeper about it. Right now I'm mostly realizing
that I have no idea how discriminants and type extension interact:
can I extend a constrained root? Or do I have to constrain the instance
of the derived type? And a few other considerations like that. It looks
like it's time to stop writing and go reading for a while.
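
As a starting point for that reading, here is a small sketch of the forms
that the derived-type rules (RM 3.4 and 3.7) appear to allow, as far as I
understand them. The type names are hypothetical and the example is
deliberately minimal.

```ada
with Ada.Text_IO;

procedure Discriminated_Extension is

   package Handlers is
      --  Hypothetical root type with one discriminant.
      type Root (Fatal : Boolean) is tagged null record;

      --  1. Extend a constrained root: the discriminant is fixed
      --     for every object of the derived type.
      type Always_Fatal is new Root (Fatal => True) with null record;

      --  2. Inherit the discriminant: each object of the derived
      --     type has to constrain it.
      type Inherited is new Root with record
         Count : Natural := 0;
      end record;

      --  3. Declare a new discriminant and use it to constrain
      --     the parent.
      type Renamed (Strict : Boolean) is
        new Root (Fatal => Strict) with null record;
   end Handlers;

   A : Handlers.Always_Fatal;
   B : Handlers.Inherited (Fatal => False);
   C : Handlers.Renamed (Strict => True);

begin
   Ada.Text_IO.Put_Line ("A.Fatal  = " & Boolean'Image (A.Fatal));
   Ada.Text_IO.Put_Line ("B.Fatal  = " & Boolean'Image (B.Fatal));
   Ada.Text_IO.Put_Line ("C.Strict = " & Boolean'Image (C.Strict));
end Discriminated_Extension;
```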

Thanks a lot for the pointers,
Natasha


end of thread, other threads:[~2011-11-28 17:19 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-25 10:47 Starter project: getopt_long in Ada Natasha Kerensikova
2011-11-25 16:39 ` Georg Bauhaus
2011-11-26 18:13   ` Natasha Kerensikova
2011-11-26 20:47     ` Jeffrey Carter
2011-11-28 15:49     ` Georg Bauhaus
2011-11-28 17:18       ` Natasha Kerensikova
2011-11-27  8:21   ` Yannick Duchêne (Hibou57)
2011-11-27 12:30     ` Natasha Kerensikova
2011-11-27 15:11       ` Yannick Duchêne (Hibou57)
2011-11-28  8:21         ` Natasha Kerensikova
2011-11-28 13:02           ` Yannick Duchêne (Hibou57)
2011-11-27  8:05 ` Yannick Duchêne (Hibou57)
2011-11-27 12:39   ` Natasha Kerensikova
2011-11-27 14:52     ` Yannick Duchêne (Hibou57)
2011-11-27  8:09 ` Yannick Duchêne (Hibou57)
