comp.lang.ada
* Data table text I/O package?
@ 2005-06-15  9:57 Jacob Sparre Andersen
  2005-06-15 11:43 ` Preben Randhol
                   ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Jacob Sparre Andersen @ 2005-06-15  9:57 UTC (permalink / raw)


I do quite a lot of work, where I manipulate data stored in (tabulator
separated) text files [1].  Does anybody know of a package which
handles the inclusion of a header line with the column names in an
elegant way?  It should preferably include automated testing that the
header is correct, when a file is opened, and automated creation of
the header when a file is created.
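
To give an idea of the interface I am after (only a sketch, with
invented field names; one such package, generated or instantiated, per
table layout):

with Ada.Strings.Unbounded;
with Ada.Text_IO;

package Data_Table_IO is

   type Row is record
      Sample_ID : Positive;
      Weight    : Float;
      Comment   : Ada.Strings.Unbounded.Unbounded_String;
   end record;

   Header_Error : exception;

   --  Opens the file and checks that the first line holds exactly the
   --  expected column names; raises Header_Error otherwise.
   procedure Open (File : in out Ada.Text_IO.File_Type;
                   Name : in     String);

   --  Creates the file and writes the header line.
   procedure Create (File : in out Ada.Text_IO.File_Type;
                     Name : in     String);

   procedure Get (File : in Ada.Text_IO.File_Type; Item : out Row);
   procedure Put (File : in Ada.Text_IO.File_Type; Item : in  Row);

end Data_Table_IO;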

TIA,

Jacob

[1] Yes, I know that binary files are faster to read and write, but
    they complicate file transfer between different platforms and
    "visual inspection" of the data.

-- 
City X'ers mail van (building instructions):
               http://lego.jacob-sparre.dk/CityXers/Postbil/




* Re: Data table text I/O package?
  2005-06-15  9:57 Data table text I/O package? Jacob Sparre Andersen
@ 2005-06-15 11:43 ` Preben Randhol
  2005-06-15 13:35   ` Jacob Sparre Andersen
  2005-06-15 19:30 ` Simon Wright
  2005-06-15 22:40 ` Lionel Draghi
  2 siblings, 1 reply; 68+ messages in thread
From: Preben Randhol @ 2005-06-15 11:43 UTC (permalink / raw)
  To: Jacob Sparre Andersen; +Cc: comp.lang.ada

Jacob Sparre Andersen <sparre@nbi.dk> wrote on 15/06/2005 (10:01) :
> I do quite a lot of work, where I manipulate data stored in (tabulator
> separated) text files [1].  Does anybody know of a package which
> handles the inclusion of a header line with the column names in an
> elegant way?  It should preferably include automated testing that the
> header is correct, when a file is opened, and automated creation of
> the header when a file is created.

Not sure what you are asking. Do you want to load the data into lists
conforming to a header name? You can use Charles with maps and lists.

Or do you want something that splits the line?

Preben




* Re: Data table text I/O package?
  2005-06-15 11:43 ` Preben Randhol
@ 2005-06-15 13:35   ` Jacob Sparre Andersen
  2005-06-15 14:12     ` Preben Randhol
       [not found]     ` <20050615141236.GA90053@pvv.org>
  0 siblings, 2 replies; 68+ messages in thread
From: Jacob Sparre Andersen @ 2005-06-15 13:35 UTC (permalink / raw)


Preben Randhol wrote:
> Jacob Sparre Andersen <sparre@nbi.dk> wrote on 15/06/2005 (10:01) :

> > I do quite a lot of work, where I manipulate data stored in
> > (tabulator separated) text files [1].  Does anybody know of a
> > package which handles the inclusion of a header line with the
> > column names in an elegant way?  It should preferably include
> > automated testing that the header is correct, when a file is
> > opened, and automated creation of the header when a file is
> > created.
> 
> Not sure what you are asking. Do you want to load the data into
> lists comforming to a header name?

That was not what I was trying to ask for.  Generally I just run my
data analysis tools as "filters", where I can manage with processing
one line (or a few lines) at a time.

The important part is to have the checking of the headers and the
generation of Put_Line and Get_Line procedures automated based on a
record type (and not too much more).  Since I need records (for type
checking) and not just simple arrays, I can't manage with a generic
package, but have to put some code generation into the system (or can
I play some tricks with streams?).
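
To illustrate what I would like to have generated, this is roughly the
kind of code I write by hand today (an untested sketch with invented
field names, using Find_Token to split on tabulators):

with Ada.Characters.Latin_1;
with Ada.Strings.Fixed;
with Ada.Strings.Maps;
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Parse_Row_Demo is

   use Ada.Strings.Unbounded;

   type Row is record
      Sample_ID : Positive;
      Weight    : Float;
      Comment   : Unbounded_String;
   end record;

   Tab : constant Ada.Strings.Maps.Character_Set :=
           Ada.Strings.Maps.To_Set (Ada.Characters.Latin_1.HT);

   --  Parses one tabulator separated line into a Row.  This is the
   --  procedure I would like to have generated from the record
   --  declaration above.
   function Parse (Line : String) return Row is
      From   : Positive := Line'First;
      First  : Positive;
      Last   : Natural;
      Result : Row;

      --  Returns the next tabulator separated field and advances From
      --  past it.
      function Next_Field return String is
      begin
         Ada.Strings.Fixed.Find_Token
           (Source => Line (From .. Line'Last),
            Set    => Tab,
            Test   => Ada.Strings.Outside,
            First  => First,
            Last   => Last);
         From := Last + 2;  --  skip the field and the tabulator after it
         return Line (First .. Last);
      end Next_Field;

   begin
      Result.Sample_ID := Positive'Value (Next_Field);
      Result.Weight    := Float'Value (Next_Field);
      Result.Comment   := To_Unbounded_String (Next_Field);
      return Result;
   end Parse;

   Example : constant Row :=
               Parse ("42" & Ada.Characters.Latin_1.HT & "0.75" &
                      Ada.Characters.Latin_1.HT & "a comment");

begin
   Ada.Text_IO.Put_Line (Float'Image (Example.Weight));
end Parse_Row_Demo;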

> You can use charles with maps and lists.

I'll see if I can find the package in Charles that does this.

> Or you want something that splits the line?

I have that already.

Jacob
-- 
Brakzand II:
    http://lego.jacob-sparre.dk/Transport/Skibe/Brakzand_II/




* Re: Data table text I/O package?
  2005-06-15 13:35   ` Jacob Sparre Andersen
@ 2005-06-15 14:12     ` Preben Randhol
  2005-06-15 15:02       ` Jacob Sparre Andersen
       [not found]     ` <20050615141236.GA90053@pvv.org>
  1 sibling, 1 reply; 68+ messages in thread
From: Preben Randhol @ 2005-06-15 14:12 UTC (permalink / raw)
  To: Jacob Sparre Andersen; +Cc: comp.lang.ada

Jacob Sparre Andersen <sparre@nbi.dk> wrote on 15/06/2005 (13:38) :
> The important part is to have the checking of the headers and the
> generation of Put_Line and Get_Line procedures automated based on a
> record type (and not too much more).  Since I need records (for type
> checking) and not just simple arrays, I can't manage with a generic
> package, but have to put some code generation into the system (or can
> I play some tricks with streams?).

So the header might be:

Integer   Float   Text

and the data could be:

1         0.9     Start point
2         0.3     Minimum
3         6.0     End point

and then you want to check the data and validate that they are of the
correct type as indicated by the header? To generate the header,
you want the package to find out which type a certain column has
and to output this type in the header?

Preben




* Re: Data table text I/O package?
  2005-06-15 14:12     ` Preben Randhol
@ 2005-06-15 15:02       ` Jacob Sparre Andersen
  2005-06-15 16:17         ` Preben Randhol
  2005-06-15 18:58         ` Randy Brukardt
  0 siblings, 2 replies; 68+ messages in thread
From: Jacob Sparre Andersen @ 2005-06-15 15:02 UTC (permalink / raw)


Preben Randhol wrote:
> Jacob Sparre Andersen <sparre@nbi.dk> wrote on 15/06/2005 (13:38) :

> > The important part is to have the checking of the headers and the
> > generation of Put_Line and Get_Line procedures automated based on
> > a record type (and not too much more).  Since I need records (for
> > type checking) and not just simple arrays, I can't manage with a
> > generic package, but have to put some code generation into the
> > system (or can I play some tricks with streams?).
> 
> So the header might be:
> 
> Integer   Float   Text

Not quite.  The headers would be field names, not just types.  I.e.:

Gene ID p-value Expression-level        Description     Human cromosome
GE29031 0.04539 245.45  Cyclin-B1       17

> and the data could be:
> 
> 1         0.9     Start point
> 2         0.3     Minimum
> 3         6.0     End point
> 
> and then you want to check the data and validate that they are of
> the correct type as indicated by the header? To generate the header
> you want that the package finds out which type a certain data type
> is and output this type in the header?

Sort of.  Except that I would use the names of the fields in the
record and not just the types of the fields.

One of my problems is that I have different kinds of files (in terms
of meaning of the numbers) where the types for all practical purposes
are the same.

But it seems like it might be more efficient to code a library like
that by hand for each case, even though it means that I miss the
automated checking (my main reason for using Ada).

Jacob (who should remember not to want the impossible every day)
-- 
“If you're going to have crime,
 it might as well be organized crime.”      -- Lord Vetinari





* Re: Data table text I/O package?
       [not found]     ` <20050615141236.GA90053@pvv.org>
@ 2005-06-15 15:40       ` Marius Amado Alves
  2005-06-15 19:18         ` Oliver Kellogg
       [not found]       ` <7adf1648bb99ca2bb4055ed8e6e381f4@netcabo.pt>
  1 sibling, 1 reply; 68+ messages in thread
From: Marius Amado Alves @ 2005-06-15 15:40 UTC (permalink / raw)
  To: comp.lang.ada


On 15 Jun 2005, at 15:12, Preben Randhol wrote:

> Jacob Sparre Andersen <sparre@nbi.dk> wrote on 15/06/2005 (13:38) :
>> The important part is to have the checking of the headers and the
>> generation of Put_Line and Get_Line procedures automated based on a
>> record type (and not too much more).  Since I need records (for type
>> checking) and not just simple arrays, I can't manage with a generic
>> package, but have to put some code generation into the system (or can
>> I play some tricks with streams?).

(Didn't get this message from Jacob.)

You have to generate code. I did that in the past. Ada records or types 
cannot be created dynamically. Ada is not reflexive. Open Ada is, but I 
haven't tried it yet.





* Re: Data table text I/O package?
       [not found]       ` <7adf1648bb99ca2bb4055ed8e6e381f4@netcabo.pt>
@ 2005-06-15 15:46         ` Preben Randhol
       [not found]         ` <20050615154640.GA1921@pvv.org>
  1 sibling, 0 replies; 68+ messages in thread
From: Preben Randhol @ 2005-06-15 15:46 UTC (permalink / raw)
  To: comp.lang.ada

On Wed, Jun 15, 2005 at 04:40:53PM +0100, Marius Amado Alves wrote:
> You have to generate code. I did that in the past. Ada records or types 
> cannot be created dynamically. Ada is not reflexive. Open Ada is, but I 
> haven't tried it yet.

Open Ada?

-- 
Preben Randhol -------------- http://www.pvv.org/~randhol/Ada95 --
                 "For me, Ada95 puts back the joy in programming."




* Re: Data table text I/O package?
       [not found]         ` <20050615154640.GA1921@pvv.org>
@ 2005-06-15 16:14           ` Marius Amado Alves
       [not found]           ` <f04ccd7efd67fe197cc14cda89340779@netcabo.pt>
  1 sibling, 0 replies; 68+ messages in thread
From: Marius Amado Alves @ 2005-06-15 16:14 UTC (permalink / raw)
  To: comp.lang.ada

> Open Ada?

Sorry, OpenAda. Originally from the USAF, now from Rational I think.





* Re: Data table text I/O package?
  2005-06-15 15:02       ` Jacob Sparre Andersen
@ 2005-06-15 16:17         ` Preben Randhol
  2005-06-15 16:58           ` Dmitry A. Kazakov
  2005-06-15 18:58         ` Randy Brukardt
  1 sibling, 1 reply; 68+ messages in thread
From: Preben Randhol @ 2005-06-15 16:17 UTC (permalink / raw)
  To: Jacob Sparre Andersen; +Cc: comp.lang.ada

On Wed, Jun 15, 2005 at 05:02:37PM +0200, Jacob Sparre Andersen wrote:
> Preben Randhol wrote:
> > Jacob Sparre Andersen <sparre@nbi.dk> wrote on 15/06/2005 (13:38) :
> 
> > > The important part is to have the checking of the headers and the
> > > generation of Put_Line and Get_Line procedures automated based on
> > > a record type (and not too much more).  Since I need records (for
> > > type checking) and not just simple arrays, I can't manage with a
> > > generic package, but have to put some code generation into the
> > > system (or can I play some tricks with streams?).
> > 
> > So the header might be:
> > 
> > Integer   Float   Text
> 
> Not quite.  The headers would be field names, not just types.  I.e.:
> 
> Gene ID p-value Expression-level        Description     Human cromosome
> GE29031 0.04539 245.45  Cyclin-B1       17
> 
> > and the data could be:
> > 
> > 1         0.9     Start point
> > 2         0.3     Minimum
> > 3         6.0     End point
> > 
> > and then you want to check the data and validate that they are of
> > the correct type as indicated by the header? To generate the header
> > you want that the package finds out which type a certain data type
> > is and output this type in the header?
> 
> Sort of.  Except that I would use the names of the fields in the
> record and not just the types of the fields.
> 
> One of my problems is that I have different kinds of files (in terms
> of meaning of the numbers) where the types for all practical purposes
> are the same.

So you have different files with, for example, a p-value, and in all of
them the p-value is a float?

If so, then I would have made a map something like:

   Gene              => "Text"
   ID                => "Float"
   p-value           => "My_Float"  (In case you have a special type) 
   Expression-level  => "Text"
   Description       => "Text"
   Human cromosome   => "Text"
   ...
   

   and when you read in the values you can do a check like:

   function Is_Valid_Type (Column_Type : String;
                           Value       : String) return Boolean
   is
   begin
      if Column_Type = "Float" then
         declare
            F : Float;
         begin
            F := Float'Value (Value);  -- raises Constraint_Error on bad input
            return True;
         exception
            when others =>
               return False;
         end;
      elsif Column_Type = "Integer" then
      ...

-- 
Preben Randhol -------------- http://www.pvv.org/~randhol/Ada95 --
"Have another drink, not-Corporal Nobby?" said Sergeant Colon unsteadily.
"I do not mind if I do, not-Sgt Colon," said Nobby.
        -- The joys of working undercover
                   (Terry Pratchett, Guards! Guards!)




* Re: Data table text I/O package?
       [not found]           ` <f04ccd7efd67fe197cc14cda89340779@netcabo.pt>
@ 2005-06-15 16:20             ` Preben Randhol
  0 siblings, 0 replies; 68+ messages in thread
From: Preben Randhol @ 2005-06-15 16:20 UTC (permalink / raw)
  To: Marius Amado Alves; +Cc: comp.lang.ada

On Wed, Jun 15, 2005 at 05:14:39PM +0100, Marius Amado Alves wrote:
> >Open Ada?
> 
> Sorry, OpenAda. Originally from the USAF, now from Rational I think.

I see. More info here:

   http://www.cs.york.ac.uk/ftpdir/reports/YCS-2000-331.pdf

-- 
Preben Randhol -------------- http://www.pvv.org/~randhol/Ada95 --
                 "For me, Ada95 puts back the joy in programming."




* Re: Data table text I/O package?
  2005-06-15 16:17         ` Preben Randhol
@ 2005-06-15 16:58           ` Dmitry A. Kazakov
  2005-06-15 17:30             ` Marius Amado Alves
  0 siblings, 1 reply; 68+ messages in thread
From: Dmitry A. Kazakov @ 2005-06-15 16:58 UTC (permalink / raw)


On Wed, 15 Jun 2005 18:17:06 +0200, Preben Randhol wrote:

> On Wed, Jun 15, 2005 at 05:02:37PM +0200, Jacob Sparre Andersen wrote:
>> Preben Randhol wrote:
>>> Jacob Sparre Andersen <sparre@nbi.dk> wrote on 15/06/2005 (13:38) :
>> 
>>> > The important part is to have the checking of the headers and the
>>> > generation of Put_Line and Get_Line procedures automated based on
>>> > a record type (and not too much more).  Since I need records (for
>>> > type checking) and not just simple arrays, I can't manage with a
>>> > generic package, but have to put some code generation into the
>>> > system (or can I play some tricks with streams?).
>>> 
>>> So the header might be:
>>> 
>>> Integer   Float   Text
>> 
>> Not quite.  The headers would be field names, not just types.  I.e.:
>> 
>> Gene ID p-value Expression-level        Description     Human cromosome
>> GE29031 0.04539 245.45  Cyclin-B1       17
>> 
>>> and the data could be:
>>> 
>>> 1         0.9     Start point
>>> 2         0.3     Minimum
>>> 3         6.0     End point
>>> 
>>> and then you want to check the data and validate that they are of
>>> the correct type as indicated by the header? To generate the header
>>> you want that the package finds out which type a certain data type
>>> is and output this type in the header?
>> 
>> Sort of.  Except that I would use the names of the fields in the
>> record and not just the types of the fields.
>> 
>> One of my problems is that I have different kinds of files (in terms
>> of meaning of the numbers) where the types for all practical purposes
>> are the same.
> 
> So you have different files with for example p-value, and in all the
> p-value is a float?
> 
> If so, then I would have made a map something like:
> 
>    Gene              => "Text"
>    ID                => "Float"
>    p-value           => "My_Float"  (In case you have a special type) 
>    Expression-level  => "Text"
>    Description       => "Text"
>    Human cromosome   => "Text"
>    ...
>    
> 
>    and when you read in the values you can do a 
> 
>    Is_Valid_Type (Column_Type : String; Value : String) return Boolean
>    is
>    begin
>       if Column_Type = "Float" then
>          declare
>             F : Float := Float'Value (Value);
>          begin
>             return true;
>          exception
>             when => others
>                return false;
>          end;
>       elsif Column_Type = "Integer" then
>       ...

One could have tagged objects and handles to them. Then the header string
could form a list of handles to the objects. The read loop could then look
like:

loop
   Get_Line (Buffer, Length);
   declare
      Line : constant String := Buffer (Buffer'First .. Length);
   begin
      Pointer := Line'First;
      for Field in Handles_List'Range loop
         Get (Line, Pointer); -- Skip blanks
         Get (Line, Pointer, Ptr (Handles_List (Field)).all); -- Dispatches
      end loop;
      Get (Line, Pointer); -- Skip blanks
      if Pointer <= Line'Last then
         null; -- Unrecognized rest
      end if;
   end;
end loop;
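
(For completeness, a rough and untested sketch of the declarations this
assumes, using plain access types instead of proper handles, so that
Ptr (Handles_List (Field)).all above would simply be
Handles_List (Field).all:)

package Table_Columns is

   type Column is abstract tagged null record;

   --  Reads the field starting at Line (Pointer) and advances Pointer
   --  past it; each concrete column type overrides this.
   procedure Get (Line    : in     String;
                  Pointer : in out Integer;
                  Item    : in out Column) is abstract;

   type Column_Handle is access all Column'Class;
   type Handle_List is array (Positive range <>) of Column_Handle;

   --  A concrete column type then looks like:
   --
   --     type Float_Column is new Column with record
   --        Value : Float;
   --     end record;
   --     procedure Get (Line    : in     String;
   --                    Pointer : in out Integer;
   --                    Item    : in out Float_Column);

end Table_Columns;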

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Data table text I/O package?
  2005-06-15 16:58           ` Dmitry A. Kazakov
@ 2005-06-15 17:30             ` Marius Amado Alves
  2005-06-15 18:41               ` Dmitry A. Kazakov
  0 siblings, 1 reply; 68+ messages in thread
From: Marius Amado Alves @ 2005-06-15 17:30 UTC (permalink / raw)
  To: comp.lang.ada

> One could have tagged objects and handles to them. Then the header 
> string
> could form a list of handles to the objects...

Or that, yes, in the absence of reflexivity. A list of polymorphs 
instead of a (dynamically created) record type. Standard. The Ada way. 
The way of any current mainstream OO language, really.





* Re: Data table text I/O package?
  2005-06-15 17:30             ` Marius Amado Alves
@ 2005-06-15 18:41               ` Dmitry A. Kazakov
  2005-06-15 19:09                 ` Marius Amado Alves
  0 siblings, 1 reply; 68+ messages in thread
From: Dmitry A. Kazakov @ 2005-06-15 18:41 UTC (permalink / raw)


On Wed, 15 Jun 2005 18:30:52 +0100, Marius Amado Alves wrote:

>> One could have tagged objects and handles to them. Then the header 
>> string
>> could form a list of handles to the objects...
> 
> Or that, yes, in the absence of reflexivity. A list of polymorphs 
> instead of a (dynamically created) record type. Standard. The Ada way. 
> The way of any current mainstream OO language, really.

BTW, completely unrealistic, but.

I'm unsure if this will be legal in Ada 2006:

declare
   type Record is tagged null record;
begin
   case Filed (1).Type is
      when Float_Type =>
         declare
            type R1 is new Record with record
               Float_Field_1 : Float;
            end record;
         begin
            case Filed (2).Type is
                when Float_Type =>
                    declare
                        type R2 is new R1 with record
                            Float_Field_2 : Float;
                        end record;
                    begin
                        ...
                              -- somewhere dee-e-e-ply nested:
                              type RN is new RN-1 with record ...
                             -- has all fields of all types!
                    end;
         end;
      when Int_Type =>
         ...
(:-))

Provided that this is legal, one could then try to factor it out using
generics... Though the number of fields has to be fixed.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Data table text I/O package?
  2005-06-15 15:02       ` Jacob Sparre Andersen
  2005-06-15 16:17         ` Preben Randhol
@ 2005-06-15 18:58         ` Randy Brukardt
  2005-06-16  9:55           ` Jacob Sparre Andersen
  1 sibling, 1 reply; 68+ messages in thread
From: Randy Brukardt @ 2005-06-15 18:58 UTC (permalink / raw)


"Jacob Sparre Andersen" <sparre@nbi.dk> wrote in message
news:m2hdfzek8i.fsf@hugin.crs4.it...
> Preben Randhol wrote:
> > Jacob Sparre Andersen <sparre@nbi.dk> wrote on 15/06/2005 (13:38) :
>
> > > The important part is to have the checking of the headers and the
> > > generation of Put_Line and Get_Line procedures automated based on
> > > a record type (and not too much more).  Since I need records (for
> > > type checking) and not just simple arrays, I can't manage with a
> > > generic package, but have to put some code generation into the
> > > system (or can I play some tricks with streams?).
> >
> > So the header might be:
> >
> > Integer   Float   Text
>
> Not quite.  The headers would be field names, not just types.  I.e.:
>
> Gene ID p-value Expression-level        Description     Human cromosome
> GE29031 0.04539 245.45  Cyclin-B1       17

I may be dense, but isn't this the purpose of XML? If so, why reinvent the
wheel?

(I personally think XML is way overused, more because it *can* be used than
that it is worthwhile for the application. But this seems to be exactly the
application that it was designed for. You'll end up with something like XML
eventually anyway, why not start with it?)

                           Randy.







* Re: Data table text I/O package?
  2005-06-15 18:41               ` Dmitry A. Kazakov
@ 2005-06-15 19:09                 ` Marius Amado Alves
  0 siblings, 0 replies; 68+ messages in thread
From: Marius Amado Alves @ 2005-06-15 19:09 UTC (permalink / raw)
  To: comp.lang.ada

I think this is legal even in Ada 95 (with Record renamed to Record_Type).

> I'm unsure if this will be legal in Ada 2006:
>
> declare
>    type Record_Type is tagged null record;
> begin
>    case Filed (1).Type is
>       when Float_Type =>
>          declare
>             type R1 is new Record_Type with record
>                Float_Field_1 : Float;
>             end record;
>          begin
>             case Filed (2).Type is
>                 when Float_Type =>
>                     declare
>                         type R2 is new R1 with record
>                             Float_Field_2 : Float;
>                         end record;
>                     begin
>                         ...
>                               -- somewhere dee-e-e-ply nested:
>                               type RN is new RN-1 with record ...
>                              -- has all fields of all types!
>                     end;
>          end;
>       when Int_Type =>
>          ...





* Re: Data table text I/O package?
  2005-06-15 15:40       ` Marius Amado Alves
@ 2005-06-15 19:18         ` Oliver Kellogg
  2005-06-17  9:02           ` Jacob Sparre Andersen
  0 siblings, 1 reply; 68+ messages in thread
From: Oliver Kellogg @ 2005-06-15 19:18 UTC (permalink / raw)


Marius Amado Alves <amado.alves@netcabo.pt> wrote:
>
> On 15 Jun 2005, at 15:12, Preben Randhol wrote:
>
>> Jacob Sparre Andersen <sparre@nbi.dk> wrote on 15/06/2005 (13:38) :
>>> The important part is to have the checking of the headers and the
>>> generation of Put_Line and Get_Line procedures automated based on a
>>> record type (and not too much more).  Since I need records (for type
>>> checking) and not just simple arrays, I can't manage with a generic
>>> package, but have to put some code generation into the system (or can
>>> I play some tricks with streams?).
>
> (Didn't get this message from Jacob.)
>
> You have to generate code. I did that in the past. Ada records or types 
> cannot be created dynamically. Ada is not reflexive. Open Ada is, but I 
> haven't tried it yet.
>

Auto_Text_IO ?

http://www.toadmail.com/~ada_wizard/ada/auto_text_io.html

HTH






* Re: Data table text I/O package?
  2005-06-15  9:57 Data table text I/O package? Jacob Sparre Andersen
  2005-06-15 11:43 ` Preben Randhol
@ 2005-06-15 19:30 ` Simon Wright
  2005-06-15 22:40 ` Lionel Draghi
  2 siblings, 0 replies; 68+ messages in thread
From: Simon Wright @ 2005-06-15 19:30 UTC (permalink / raw)


Jacob Sparre Andersen <sparre@nbi.dk> writes:

> I do quite a lot of work, where I manipulate data stored in
> (tabulator separated) text files [1].  Does anybody know of a
> package which handles the inclusion of a header line with the column
> names in an elegant way?  It should preferably include automated
> testing that the header is correct, when a file is opened, and
> automated creation of the header when a file is created.

I think this sounds like an ASIS application. You might look at Stephe
Leake's Auto_Text_IO .. Google finds it easily enough.




* Re: Data table text I/O package?
  2005-06-15  9:57 Data table text I/O package? Jacob Sparre Andersen
  2005-06-15 11:43 ` Preben Randhol
  2005-06-15 19:30 ` Simon Wright
@ 2005-06-15 22:40 ` Lionel Draghi
  2 siblings, 0 replies; 68+ messages in thread
From: Lionel Draghi @ 2005-06-15 22:40 UTC (permalink / raw)


Jacob Sparre Andersen wrote:
> I do quite a lot of work, where I manipulate data stored in (tabulator
> separated) text files [1].  Does anybody know of a package which
> handles the inclusion of a header line with the column names in an
> elegant way?

Not an answer, but you may grab some ideas from ploticus input formats:
http://ploticus.sourceforge.net/doc/dataformat.html
And maybe some ideas from the C code...

-- 
Lionel Draghi




* Re: Data table text I/O package?
  2005-06-15 18:58         ` Randy Brukardt
@ 2005-06-16  9:55           ` Jacob Sparre Andersen
  2005-06-16 10:53             ` Marius Amado Alves
  2005-06-30  3:02             ` Randy Brukardt
  0 siblings, 2 replies; 68+ messages in thread
From: Jacob Sparre Andersen @ 2005-06-16  9:55 UTC (permalink / raw)


Randy Brukardt wrote:

> I may be dense, but isn't this the purpose of XML? If so, why
> reinvent the wheel?

The purpose of XML is to be _the_ universal file format.

 a) I don't want a universal file format.

 b) I don't believe in a universal file format.

 c) XML is (almost) less readable than a binary file for my purposes.

 d) I'm _not_ going to switch away from tabulator separated tables for
    purposes where tabulator separated tables are a sensible
    representation of the data in textual form.

> (I personally think XML is way overused, more because it *can* be
> used than that it is worthwhile for the application. But this seems
> to be exactly the application that it was designed for. You'll end
> up with something like XML eventually anyway, why not start with
> it?)

I'm afraid you completely misunderstood my problem.  It is not a
matter of selecting a file format.  It is a matter of
automagically generating code for reading and writing that file
format.

Jacob
-- 
"I am an old man now, and when I die and go to Heaven there are two matters
 on which I hope enlightenment. One is quantum electro-dynamics and the
 other is turbulence of fluids. About the former, I am rather optimistic."
 Sir Horace Lamb.





* Re: Data table text I/O package?
  2005-06-16  9:55           ` Jacob Sparre Andersen
@ 2005-06-16 10:53             ` Marius Amado Alves
  2005-06-16 12:24               ` Robert A Duff
  2005-06-16 14:01               ` Georg Bauhaus
  2005-06-30  3:02             ` Randy Brukardt
  1 sibling, 2 replies; 68+ messages in thread
From: Marius Amado Alves @ 2005-06-16 10:53 UTC (permalink / raw)
  To: comp.lang.ada


On 16 Jun 2005, at 10:55, Jacob Sparre Andersen wrote:

> Randy Brukardt wrote:
>
>> I may be dense, but isn't this the purpose of XML? If so, why
>> reinvent the wheel?
>
>  d) I'm _not_ going to switch away from tabulator separated tables for
>     purposes, where tabulator separated tables are a sensible
>     representation of the data in textual form.

Indeed. XML is for semi-structured data and/or text data with Unicode 
etc. For tables of atomic data tab separated is better. More readable, 
efficient, sensible, not requiring a monster XML library.

> It is the matter of
> automagically generating code for reading and writing that file
> format.

Yes. This is interesting, useful, and easy. From the header you get the 
field names, from the first data line you deduce the data types. With
these elements you can generate the record type and procedures to read 
the file. A trick I often use to deduce data types is based on 'Value:

function Get_Type (Value : String) return Data_Type is
    F : Float;
    I : Integer;
begin
    F := Float'Value (Value);
    return Type_Float;
exception
    when Constraint_Error =>
       begin
          I := Integer'Value (Value);
          return Type_Integer;
       exception
          when Constraint_Error =>
             return Type_String;
       end;
end;
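
(Data_Type here is assumed to be an enumeration declared elsewhere,
along the lines of:

   type Data_Type is (Type_Float, Type_Integer, Type_String);
)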





* Re: Data table text I/O package?
  2005-06-16 10:53             ` Marius Amado Alves
@ 2005-06-16 12:24               ` Robert A Duff
  2005-06-16 14:01               ` Georg Bauhaus
  1 sibling, 0 replies; 68+ messages in thread
From: Robert A Duff @ 2005-06-16 12:24 UTC (permalink / raw)


Marius Amado Alves <amado.alves@netcabo.pt> writes:

> Yes. This is interesting, useful, and easy. From the header you get the
> field names, from the first data line with deduce the data types. With
> these elements you can generate the record type and procedures to read
> the file.

Hmm.  Interesting idea.  But you will lose the full power of Ada's type
system.  You cannot, in general, deduce the type from the data, in Ada.
I mean, 123 could be any integer type, and a typical Ada program has
many integer types.

For that matter, how do you know 123 is not intended to be Type_String,
in your example below?

>... A trick I often use to deduce data types is based on 'Value:

I believe this trick will run afoul of RM-11.6.  It probably works in
practice, but I think that an implementation is allowed to return
Type_Float, no matter what string you pass to Value!

Did I mention that I don't like 11.6?  ;-)

> function Get_Type (Value : String) return Data_Type is
>     F : Float;
>     I : Integer;
> begin
>     F := Float'Value (Value);
>     return Type_Float;
> exception
>     when Constraint_Error =>
>        begin
>           I := Integer'Value (Value);
>           return Type_Integer;
>        exception
>           when Constraint_Error =>
>              return Type_String;
>        end;
> end;

- Bob




* Re: Data table text I/O package?
  2005-06-16 14:01               ` Georg Bauhaus
@ 2005-06-16 12:27                 ` Dmitry A. Kazakov
  2005-06-16 14:46                   ` Georg Bauhaus
  2005-06-16 13:26                 ` Marius Amado Alves
  1 sibling, 1 reply; 68+ messages in thread
From: Dmitry A. Kazakov @ 2005-06-16 12:27 UTC (permalink / raw)


On Thu, 16 Jun 2005 16:01:57 +0200, Georg Bauhaus wrote:

> Marius Amado Alves wrote:
> 
>> For tables of atomic data tab separated is better.
> 
> Note the crucial bits in this general statement.
> 
> 1) You had really better have *atomic* data.
> 
> 2) You had better have the format as your own format and
>  no data exchange with any system requiring "just
>  your table files, please".
> 
> Tab separated atomic data can be "semi-structured"
> too. Consider 04/06/05 and tell me wich calender date that
> is, in [choose country here].
> 
> It makes litte sense to say XML = semi, TAB = atomic without
> specifying what exactly you mean by semi-structure data.
> Consider
> 
>  <Date y="2005" m="June" d="04"/>
> 
> If a program maintains a table of calender dates
> for internal use, then 2005-06-04, or 2005 TAB 06 TAB 04
> save space and is easy to use. But it also restricts
> the table to an internal data format.

Not necessarily.

There is a better technique to parse strings than to tokenize them first.
Get rid of scanner. Just take the date from the current position of the
string and advance the position to the first character following the date.
Because the procedure that gets the date knows the format it also knows
where the date ends. It can also support various concurrent formats,
provided that they are distinguishable. This way you can parse a string
virtually knowing nothing about the formats of its fields. An additional
advantage is that error messages (if it comes to a more advanced system)
will be pretty easy to generate.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Data table text I/O package?
  2005-06-16 14:01               ` Georg Bauhaus
  2005-06-16 12:27                 ` Dmitry A. Kazakov
@ 2005-06-16 13:26                 ` Marius Amado Alves
  2005-06-16 18:10                   ` Georg Bauhaus
  1 sibling, 1 reply; 68+ messages in thread
From: Marius Amado Alves @ 2005-06-16 13:26 UTC (permalink / raw)
  To: comp.lang.ada


On 16 Jun 2005, at 15:01, Georg Bauhaus wrote:

[a lot on data formats]

Georg, there was an example earlier (tabs simulated by 3 spaces here):

Gene ID   p-value   Expression-level   Description   Human cromosome
GE29031   0.04539   245.45             Cyclin-B1     17

So it's "really" atomic. Your arguments are valid, but do not apply to 
this case.

Incidentally, this would generate

type Record_Type is
    record
       Gene_ID : String_Ptr;
       P_Value : Float;
       Expression_Level : Float;
       Description : String_Ptr;
       Human_Cromosome : Integer;
    end record;
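
(With String_Ptr standing for something like "type String_Ptr is
access String;".)

And the header check Jacob asked for could then, in the generated
code, be as simple as comparing the first line of the file against a
constant -- an untested sketch:

   Expected_Header : constant String :=
     "Gene ID"          & ASCII.HT &
     "p-value"          & ASCII.HT &
     "Expression-level" & ASCII.HT &
     "Description"      & ASCII.HT &
     "Human cromosome";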





* Re: Data table text I/O package?
  2005-06-16 10:53             ` Marius Amado Alves
  2005-06-16 12:24               ` Robert A Duff
@ 2005-06-16 14:01               ` Georg Bauhaus
  2005-06-16 12:27                 ` Dmitry A. Kazakov
  2005-06-16 13:26                 ` Marius Amado Alves
  1 sibling, 2 replies; 68+ messages in thread
From: Georg Bauhaus @ 2005-06-16 14:01 UTC (permalink / raw)


Marius Amado Alves wrote:

> For tables of atomic data tab separated is better.

Note the crucial bits in this general statement.

1) You had really better have *atomic* data.

2) You had better have the format as your own format and
 no data exchange with any system requiring "just
 your table files, please".

Tab separated atomic data can be "semi-structured"
too. Consider 04/06/05 and tell me which calendar date that
is, in [choose country here].

It makes little sense to say XML = semi, TAB = atomic without
specifying what exactly you mean by semi-structured data.
Consider

 <Date y="2005" m="June" d="04"/>

If a program maintains a table of calendar dates
for internal use, then 2005-06-04, or 2005 TAB 06 TAB 04,
saves space and is easy to use. But it also restricts
the table to an internal data format.

Choice of TabSV depends on the requirements, doesn't it?
In particular on how many different programs will use the
data, who is going to "read" them in which ways, special
purpose or not, whether there are industry standards, etc.

I wonder whether Ada programmers will like a data format like
 
 Date'(y => 2005, m => -"June", d => 04)

and still keep saying that it must have scientifically proven
readability advantages, and that XML is verbose. And before
you answer, think of the word "habit".


Georg 




* Re: Data table text I/O package?
  2005-06-16 12:27                 ` Dmitry A. Kazakov
@ 2005-06-16 14:46                   ` Georg Bauhaus
  2005-06-16 14:51                     ` Dmitry A. Kazakov
  0 siblings, 1 reply; 68+ messages in thread
From: Georg Bauhaus @ 2005-06-16 14:46 UTC (permalink / raw)


Dmitry A. Kazakov wrote:

> There is a better technique to parse strings than to tokenize them first.
> Get rid of scanner. Just take the date from the current position of the
> string and advance the position to the first character following the date.
> Because the procedure that gets the date knows the format it also knows
> where the date ends. It can also support various concurrent formats,
> provided that they are distinguishable. This way you can parse a string
> virtually knowing nothing about the formats of its fields. An additional
> advantage is that error messages (if it comes to a more advanced system)
> will be pretty easy to generate.

IIUC, what you describe is a (more binary) DTD, either language-standardised
or proprietary.

And also, what does the sentence "don't scan a string, and don't
produce tokens, but advance [something] to the first character
following the date that was taken[?] from the string" mean,
other than a contradiction in terms?




* Re: Data table text I/O package?
  2005-06-16 14:46                   ` Georg Bauhaus
@ 2005-06-16 14:51                     ` Dmitry A. Kazakov
  2005-06-20 11:19                       ` Georg Bauhaus
  0 siblings, 1 reply; 68+ messages in thread
From: Dmitry A. Kazakov @ 2005-06-16 14:51 UTC (permalink / raw)


On Thu, 16 Jun 2005 16:46:39 +0200, Georg Bauhaus wrote:

> Dmitry A. Kazakov wrote:
> 
>> There is a better technique to parse strings than to tokenize them first.
>> Get rid of scanner. Just take the date from the current position of the
>> string and advance the position to the first character following the date.
>> Because the procedure that gets the date knows the format it also knows
>> where the date ends. It can also support various concurrent formats,
>> provided that they are distinguishable. This way you can parse a string
>> virtually knowing nothing about the formats of its fields. An additional
>> advantage is that error messages (if it comes to a more advanced system)
>> will be pretty easy to generate.
> 
> IIUC, what you describe is a (more binary) DTD, either language-standardised
> or proprietary.
> 
> And also, what does the sentence "don't scan a string, and don't
> produce tokens, but advance [something] to the first character
> following the date that was taken[?] form the string" mean,
> other than a contradiction in terms?

   Field_1 : Float;
   Field_2 : Integer;
   ...
   Line  : String := ...; -- The current line
   Pointer : Integer;  -- The current position in Line
   
   Pointer := Line'First;
   Get (Line, Pointer, Delimiters); -- Skip blanks
   Get (Line, Pointer, Field_1); -- Get field and move Pointer
   Get (Line, Pointer, Delimiters); -- Skip blanks
   Get (Line, Pointer, Field_2); -- Get field and move Pointer
   ...
   etc

Quite trivial.
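
For illustration, a minimal (and untested) version of such a Get for
Float could be as simple as the following -- a real implementation
would let the number's own syntax decide where the field ends:

procedure Get (Line    : in     String;
               Pointer : in out Integer;
               Value   :    out Float)
is
   Start : constant Integer := Pointer;
begin
   --  Scan up to the next blank or tabulator ...
   while Pointer <= Line'Last
     and then Line (Pointer) /= ' '
     and then Line (Pointer) /= ASCII.HT
   loop
      Pointer := Pointer + 1;
   end loop;
   --  ... and let 'Value do the conversion (raising Constraint_Error
   --  on malformed input).
   Value := Float'Value (Line (Start .. Pointer - 1));
end Get;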

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Data table text I/O package?
  2005-06-16 13:26                 ` Marius Amado Alves
@ 2005-06-16 18:10                   ` Georg Bauhaus
  0 siblings, 0 replies; 68+ messages in thread
From: Georg Bauhaus @ 2005-06-16 18:10 UTC (permalink / raw)


Marius Amado Alves wrote:
> 
> On 16 Jun 2005, at 15:01, Georg Bauhaus wrote:
> 
> [a lot on data formats]
> 
> Georg, there was an example earlier (tabs simulated by 3 spaces here):
> 
> Gene ID   p-value   Expression-level   Description   Human cromosome
> GE29031   0.04539   245.45             Cyclin-B1     17

I did notice this example.

> So it's "really" atomic. Your arguments are valid, but do not apply to 
> this case.

That's hard to tell from this example. There is no TAB
inside the values, OK, but that doesn't make data atomic in an
application sense -- only the application knows.

And this is precisely a point of a well designed XML format: you have
a chance of naming the beginning and end of a value. It is up to the
designer of the document type to choose a suitable level of detail for
marking up the structure (and type) of both values and collections of
values. (Elements with attributes, subtrees of the document, using
domain specific notation in text.)

And let us hope that the text pipeline will leave the Tab
characters alone :-)

Georg 





* Re: Data table text I/O package?
  2005-06-15 19:18         ` Oliver Kellogg
@ 2005-06-17  9:02           ` Jacob Sparre Andersen
  0 siblings, 0 replies; 68+ messages in thread
From: Jacob Sparre Andersen @ 2005-06-17  9:02 UTC (permalink / raw)


Oliver Kellogg wrote:
> Marius Amado Alves wrote:
> >> Jacob Sparre Andersen <sparre@nbi.dk> wrote on 15/06/2005 (13:38) :

> >>> The important part is to have the checking of the headers and
> >>> the generation of Put_Line and Get_Line procedures automated
> >>> based on a record type (and not too much more).  Since I need
> >>> records (for type checking) and not just simple arrays, I can't
> >>> manage with a generic package, but have to put some code
> >>> generation into the system (or can I play some tricks with
> >>> streams?).

> > You have to generate code.

Yes.

> Auto_Text_IO ?
> 
> http://www.toadmail.com/~ada_wizard/ada/auto_text_io.html

I will have to hack it a bit for my purpose, but it looks like the
tool I need.

Now I just have to work around the lack of ASIS on Debian/PPC, but
that's relatively trivial.  Still, it would be nice if somebody could
explain (and solve) this bug report:

   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=117788

Jacob
-- 
“What fun is it being "cool" if you can't wear a sombrero?”




* Re: Data table text I/O package?
  2005-06-16 14:51                     ` Dmitry A. Kazakov
@ 2005-06-20 11:19                       ` Georg Bauhaus
  2005-06-20 11:39                         ` Dmitry A. Kazakov
  0 siblings, 1 reply; 68+ messages in thread
From: Georg Bauhaus @ 2005-06-20 11:19 UTC (permalink / raw)


Dmitry A. Kazakov wrote:

>    Get (Line, Pointer, Field_2); -- Get field and move Pointer
>    ...
>    etc
> 
> Quite trivial.

And quite adventurous in any but an internal context.






* Re: Data table text I/O package?
  2005-06-20 11:19                       ` Georg Bauhaus
@ 2005-06-20 11:39                         ` Dmitry A. Kazakov
  2005-06-20 18:25                           ` Georg Bauhaus
  0 siblings, 1 reply; 68+ messages in thread
From: Dmitry A. Kazakov @ 2005-06-20 11:39 UTC (permalink / raw)


On Mon, 20 Jun 2005 13:19:44 +0200, Georg Bauhaus wrote:

> Dmitry A. Kazakov wrote:
> 
>>    Get (Line, Pointer, Field_2); -- Get field and move Pointer
>>    ...
>>    etc
>> 
>> Quite trivial.
> 
> And quite adventurous in any but an internal context.

Why?

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Data table text I/O package?
  2005-06-20 11:39                         ` Dmitry A. Kazakov
@ 2005-06-20 18:25                           ` Georg Bauhaus
  2005-06-20 18:45                             ` Preben Randhol
  2005-06-20 18:54                             ` Dmitry A. Kazakov
  0 siblings, 2 replies; 68+ messages in thread
From: Georg Bauhaus @ 2005-06-20 18:25 UTC (permalink / raw)


Dmitry A. Kazakov wrote:
> On Mon, 20 Jun 2005 13:19:44 +0200, Georg Bauhaus wrote:
> 
> 
>>Dmitry A. Kazakov wrote:
>>
>>
>>>   Get (Line, Pointer, Field_2); -- Get field and move Pointer
>>>   ...
>>>   etc
>>>
>>>Quite trivial.
>>
>>And quite adventurous in any but an internal context.

If you are parsing data from outside, you have to know
the quality and structure of data (plus the pitfalls mentioned
by Robert Duff.) As to quality, just one inadvertently typed
space might be hazardous when it splits an atom in two... :)

(Think of a medium quality CSV file, and a number typed 3.1 5.
Oops!)

XML can help with this for example by identifying the bounds
of a data item, even if mistyped:
 <Distance km='3.1 5'/>
This will be noticed by the XML parser if it knows about km's
type (NMTOKEN). You could as well squeeze the space out using
either Ada.Strings or XML related technology. But in any case
there can be no doubt that the string "3.1 5" is a mistyped
number.


Georg 




* Re: Data table text I/O package?
  2005-06-20 18:25                           ` Georg Bauhaus
@ 2005-06-20 18:45                             ` Preben Randhol
  2005-06-20 18:54                             ` Dmitry A. Kazakov
  1 sibling, 0 replies; 68+ messages in thread
From: Preben Randhol @ 2005-06-20 18:45 UTC (permalink / raw)
  To: comp.lang.ada

On Mon, Jun 20, 2005 at 08:25:13PM +0200, Georg Bauhaus wrote:
> XML can help with this for example by identifying the bounds
> of a data item, even if mistyped:
> <Distance km='3.1 5'/>

However only if it is computer generated...

> This will be noticed by the XML parser if it knows about km's
> type (NMTOKEN). You could as well squeeze the space out using
> either Ada.Strings or XML related technology. But in any case
> there can be no doubt that the string "3.1 5" is a mistyped
> number.

This depends on whether the parser is validating or not. Many parsers
are not validating, especially if one uses SAX.

-- 
Preben Randhol -------------- http://www.pvv.org/~randhol/Ada95 --
                 "For me, Ada95 puts back the joy in programming."




* Re: Data table text I/O package?
  2005-06-20 18:25                           ` Georg Bauhaus
  2005-06-20 18:45                             ` Preben Randhol
@ 2005-06-20 18:54                             ` Dmitry A. Kazakov
  2005-06-21  9:24                               ` Georg Bauhaus
  2005-06-25 16:38                               ` Simon Wright
  1 sibling, 2 replies; 68+ messages in thread
From: Dmitry A. Kazakov @ 2005-06-20 18:54 UTC (permalink / raw)


On Mon, 20 Jun 2005 20:25:13 +0200, Georg Bauhaus wrote:

> Dmitry A. Kazakov wrote:
>> On Mon, 20 Jun 2005 13:19:44 +0200, Georg Bauhaus wrote:
>> 
>> 
>>>Dmitry A. Kazakov wrote:
>>>
>>>
>>>>   Get (Line, Pointer, Field_2); -- Get field and move Pointer
>>>>   ...
>>>>   etc
>>>>
>>>>Quite trivial.
>>>
>>>And quite adventurous in any but an internal context.
> 
> If you are parsing data from outside, you have to know
> the quality and structure of data (plus the pitfalls mentioned
> by Robert Duff.) As to quality, just one inadvertently typed
> space might be hazardous when it splits an atom in two... :)
>
> (Think of a medium quality CSV file, and a number typed 3.1 5.
> Oops!)

No, you just have to use different delimiters between and within the
fields. This is why in Ada parameters of a procedure call are separated by
commas rather than spaces.

Though is it about what syntax would be the best? Or is it about how to
parse something in a defined syntax?

> XML can help with this for example by identifying the bounds
> of a data item, even if mistyped:
>  <Distance km='3.1 5'/>
> This will be noticed by the XML parser if it knows about km's
> type (NMTOKEN).

Now consider a space between / and >:

<Distance km='3.15'/ >

XML adds nothing here but a huge readability loss.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Data table text I/O package?
  2005-06-20 18:54                             ` Dmitry A. Kazakov
@ 2005-06-21  9:24                               ` Georg Bauhaus
  2005-06-21  9:52                                 ` Jacob Sparre Andersen
  2005-06-21 10:42                                 ` Dmitry A. Kazakov
  2005-06-25 16:38                               ` Simon Wright
  1 sibling, 2 replies; 68+ messages in thread
From: Georg Bauhaus @ 2005-06-21  9:24 UTC (permalink / raw)


Dmitry A. Kazakov wrote:

> No, you just have to use different delimiters between and within the
> fields.

"You just have to... ". No, gosh, the space was _mistyped_,
it wasn't intended. This goes for any typo irrespective of what
delimiter you choose. Now any reasonable CSV has far fewer error
correction facilities for typos like these than any reasonable
XML. By definition. (And, yes, I know you can construct syntax errors
in XML, too, if you think this is an argument ...)

Is it the typical Ada programmer's attitude to promote self-documenting
bracketing constructs only for program text, but never for data text?


> This is why in Ada parameters of a procedure call are separated by
> commas rather than spaces.
> 
> Though is it about what syntax would be the best? Or is it about how to
> parse something in a defined syntax?

Having a "best syntax" requires a measure for syntax quality.
If you measure what a syntax can do in a heterogeneous project
by applying your personal aesthetic preferences,
or your reading habits, or your programming skills, I have nothing to say.

If you care about robust data interchange in a "sloppy
field", you employ standard tools to help you get the correct
data.


> Now consider a space between / and >:
> 
> <Distance km='3.15'/ >
> 
> XML adds here nothing, but a huge readability loss.

Oh well... You mean

  Distance'(km => 3.15)

can be read well, whereas

  Distance'( km => 3.15 )

is a huger readability loss? Come on.






* Re: Data table text I/O package?
  2005-06-21  9:24                               ` Georg Bauhaus
@ 2005-06-21  9:52                                 ` Jacob Sparre Andersen
  2005-06-21 11:10                                   ` Georg Bauhaus
  2005-06-21 10:42                                 ` Dmitry A. Kazakov
  1 sibling, 1 reply; 68+ messages in thread
From: Jacob Sparre Andersen @ 2005-06-21  9:52 UTC (permalink / raw)


Georg Bauhaus wrote:

> "You just have to... ". No, gosh, the space was _mistyped_, it
> wasn't intended. This goes for any typo irrespective of what
> delimiter you choose. Now any reasonable CSV has far less offerings
> for error correction facilities for typos like these than any
> reasonable XML. By definition. (And, yes, I know you can construct
> syntax errors in XML, too, if you think this is an argument ...)
> 
> Is it the typical Ada programmer's attitude to promote
> self-documenting bracketing constructs only for program text, but
> never for data text?

Unlike Ada, XML is _not_ human-readable.

And if I want an error-correcting file format which isn't
human-readable, there are plenty to choose from, which are faster than
XML.

Jacob
-- 
"Sleep is just a cheap substitute for coffee"




* Re: Data table text I/O package?
  2005-06-21  9:24                               ` Georg Bauhaus
  2005-06-21  9:52                                 ` Jacob Sparre Andersen
@ 2005-06-21 10:42                                 ` Dmitry A. Kazakov
  2005-06-21 11:41                                   ` Georg Bauhaus
  1 sibling, 1 reply; 68+ messages in thread
From: Dmitry A. Kazakov @ 2005-06-21 10:42 UTC (permalink / raw)


On Tue, 21 Jun 2005 11:24:34 +0200, Georg Bauhaus wrote:

> Dmitry A. Kazakov wrote:
> 
>> No, you just have to use different delimiters between and within the
>> fields.
> 
> "You just have to... ". No, gosh, the space was _mistyped_,
> it wasn't intended. This goes for any typo irrespective of what
> delimiter you choose. Now any reasonable CSV has far less offerings for
> error correction facilities for typos like these than any reasonable
> XML. By definition. (And, yes, I know you can construct syntax errors
> in XML, too, if you think this is an argument ...)
> 
> Is it the typical Ada programmer's attitude to promote self-documenting
> bracketing constructs only for program text, but never for data text?

See below. It is a table. It has bracketing: rows and columns. This form
existed for centuries before XML. Who would print tables of logarithms in
XML?

>> This is why in Ada parameters of a procedure call are separated by
>> commas rather than spaces.
>> 
>> Though is it about what syntax would be the best? Or is it about how to
>> parse something in a defined syntax?
> 
> HAving a "best syntax" requires a measure for syntax quality.
> If you measure what a syntax can do in a heterogenous project
> by applying your personal aesthetic preferences,
> or your reading habits, or your programming skills, I have nothing to say.
> 
> If you care about robust data interchange in a "sloppy
> field", you employ standard tools to help you get the correct
> data.

That is a different problem for which I would use a well-defined binary
format instead of fancy 3.15. What is the *accuracy* of this value, huh?

>> Now consider a space between / and >:
>> 
>> <Distance km='3.15'/ >
>> 
>> XML adds here nothing, but a huge readability loss.
> 
> Oh well... You mean
> 
>   Distance'(km => 3.15)
> 
> can be read well, whereas
> 
>   Distance'( km => 3.15 )
> 
> is a huger readability loss? Come on.

Distance isn't a record. At least it should not be visible as such. Nor is
distance a type. The closest Ada equivalent would be

   Distance => 3.15 km,

or

   Distance := 3.15 km;

But the lack of readability is not in the ugly </> brackets. Tabulated data
are readable because they are tabulated. That is: the names, the types and
units are *factored* out to the table header, which allows the reader to
concentrate on the *values*. Thus a table looks like:

Distance [km]   Temperature  [°C]  ...
3.15                29.0     ...
2.10                14.4     ...

This is readable.

To make the difference more visible, consider bitmaps stored in XML format. Would
you be able to recognize a person's face in it?

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Data table text I/O package?
  2005-06-21  9:52                                 ` Jacob Sparre Andersen
@ 2005-06-21 11:10                                   ` Georg Bauhaus
  2005-06-21 12:35                                     ` Jacob Sparre Andersen
  0 siblings, 1 reply; 68+ messages in thread
From: Georg Bauhaus @ 2005-06-21 11:10 UTC (permalink / raw)


Jacob Sparre Andersen wrote:

> Unlike Ada, XML is _not_ human-readable.

First, this has been claimed many times without even an indication
of why this might be so. Again, compare

  <Date year = "2006" month = "December" day = "24"/>

and

  Date'(year => 2006, month => -"December", day => 24);


Could it be that you just don't like reading angle brackets?
Do the <...> smell like C++'s template parameter brackets?
Again, habits? I won't say that XML *looks* nice, but its purpose
is not to look nice; this is not a Miss Dataformat Competition
where you cannot win without rounded curves OK by the latest fashion.
XML is supposed to support identifying data in text form.

Second, XML is meant to be easily accessible using text tools,
not to be printed as novels. As such, it is not a language for
writing prose, formal or not.
This is why the relevant standards define a notion of rendition.

> And if I want an error-correcting file format which isn't
> human-readable, there are plenty to choose from, which are faster than
> XML.

Such as...?
ASN.1 perhaps?





* Re: Data table text I/O package?
  2005-06-21 10:42                                 ` Dmitry A. Kazakov
@ 2005-06-21 11:41                                   ` Georg Bauhaus
  2005-06-21 12:44                                     ` Dmitry A. Kazakov
  0 siblings, 1 reply; 68+ messages in thread
From: Georg Bauhaus @ 2005-06-21 11:41 UTC (permalink / raw)


Dmitry A. Kazakov wrote:
 
> See below. It is a table. It has bracketing: rows and columns.

Back to step one: brackets in computer tables are not named, and
a computer doesn't have accountants' abilities in pattern matching
when looking at rows and columns in a table. Again, I said XML is good
for parsing of data if you cannot tell in advance that the data stream
is totally free of errors. XML provides means to build robust data
streams in the absence of tight definitions and reliable procedures.

As for whitespace, read Stroustrup's article on defining operator
whitespace.


> This form
> existed for centuries before XML. Who would print tables of logarithms in
> XML?

You're missing the point: XML is *not* about rendering data.
Logarithms are logarithms, not printed logarithms; printing is a second
step.  Data formats for exchange or storage on the one hand and
a print-out of some data on the other hand are two very different beasts,
with different purposes. Consider the MVC paradigm.


>>If you care about robust data interchange in a "sloppy
>>field", you employ standard tools to help you get the correct
>>data.
> 
> 
> That is a different problem for which I would use a well-defined binary
> format instead of fancy 3.15. What is the *accuracy* of this value, huh?

It is totally unimportant what you or I would want, sorry.
For a robust data interchange, absent comprehensive definitions
and guarantees about data production, you need redundancy, period.

The accuracy is well defined and most importantly,
it is up to the application, yours and mine respectively.
We both use the accuracy that is most appropriate, and I won't
tell you not to use an internal type when it suits your application.
I expect the same of you. If all I have to do is to store kilometers
measuring straight lines inside the Netherlands in a relational database,
I know the datatype I can use, no matter what you think is best
in your application.
 This has been discussed for years during the development of
XML Schema. What do you care about my accuracy as long as
I compute values from your data that are within application
bounds? 3.15 is as accurate as can be, and independent of
bits.


 > Distance isn't a record.

Huh? In data exchange it isn't your job to tell others how they
should represent one particular distance.
Likewise, it's not my job to tell you not to think of print, so
to speak. But we both have to exchange all relevant data, and we
have to agree on element types and their attributes to represent
data we both need. This is about DTDs and the like, not about
using XML or not. Going from XML to ASN.1 or some format based
on Lisp lists doesn't make much difference. We still both have to
know what an item means. Tags are good for helping with this because
they add information about items. Qualified notation so to speak.


> But, lack of readability is not in the ugly </> brackets. Tabulated data
> are readable because they are tabulated.

This is the *View* in MVC, XML is about *data*. So there is no point in
talking about final looks, it is important to know how data will have
to be seen. For example, can you debug datastreams using the simplest
tools? Think of a log file of a concurrent application, processing data
from several heterogeneous input sources on the net.


> That is: the names, the types and
> units are *factored* out to the table header, which allows the reader to
> concentrate on the *values*. Thus a table looks as:
> 
> Distance [km]   Temperature  [°C]  ...
> 3.15                29.0     ...
> 2.10                14.4     ...
> 
> This is readable.

This is irrelevant in data exchange. This is print.


> To make the difference more visible, consider bitmaps stored in XML format. Would
> you be able to recognize a person's face in it?

You do know about NOTATION?
I think it is very hard to find someone suggesting that we should recode
bitmap graphics formats as pixel tags.





^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-21 11:10                                   ` Georg Bauhaus
@ 2005-06-21 12:35                                     ` Jacob Sparre Andersen
  0 siblings, 0 replies; 68+ messages in thread
From: Jacob Sparre Andersen @ 2005-06-21 12:35 UTC (permalink / raw)


Georg Bauhaus wrote:
> Jacob Sparre Andersen wrote:

> > Unlike Ada, XML is _not_ human-readable.
> 
> First, this has been claimed many times without even an indication
> of why this might be so. Again, compare
> 
>   <Date year = "2006" month = "December" day = "24"/>
> 
> and
> 
>   Date'(year => 2006, month => -"December", day => 24);
> 
> Could it be that you just don't like reading angle brackets?

I definitely don't like reading _any_ brackets, when I'm looking at
data.

> Do the <...> smell like C++'s template parameter brackets?

They may.  But neither of the above two notations is sensible when I
am playing with 30 × 55k matrices.

> Again, habits? I won't say that XML *looks* nice, but its purpose
> is not to look nice, this is not a Miss Dataformat Competition where
> you cannot win without rounded curves OK by the latest fashion.
> XML is supposed to support identifying data in text form.

Yes.  But for tabular data XML has much too much overhead and is thus
too difficult to read.

> > And if I want an error-correcting file format which isn't
> > human-readable, there are plenty to choose from, which are faster than
> > XML.
> 
> Such as...?
> ASN.1 perhaps?

I am not sure if ASN.1 includes error-correction, but it was one of
the options I had in mind when I wrote the sentence.  A much more
effective format would be based on an instantiation of Ada.Direct_IO
with some kind of checksum included in Element_Type.  My astrophysics
colleagues also have a nice format for multidimensional tables, but I
can't remember the name at the moment.
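
A minimal sketch of the kind of instantiation meant here, with an invented
record layout and a toy checksum (illustrative only, not a worked-out
format):

   with Ada.Direct_IO;

   procedure Checksummed_Table_Demo is

      type Row_Data is record
         Distance    : Float;   --  [km]
         Temperature : Float;   --  [deg C]
      end record;

      type Checksummed_Row is record
         Data     : Row_Data;
         Checksum : Integer;    --  toy checksum; a real CRC would be more robust
      end record;

      package Row_IO is new Ada.Direct_IO (Checksummed_Row);

      --  Toy checksum: detects (some) corruption, corrects nothing.
      function Checksum_Of (Item : Row_Data) return Integer is
      begin
         return Integer (Item.Distance * 100.0)
              + Integer (Item.Temperature * 100.0);
      end Checksum_Of;

      File : Row_IO.File_Type;
      Row  : Checksummed_Row := (Data => (3.15, 29.0), Checksum => 0);

   begin
      Row.Checksum := Checksum_Of (Row.Data);
      Row_IO.Create (File, Row_IO.Out_File, "rows.dat");
      Row_IO.Write (File, Row);
      Row_IO.Close (File);
   end Checksummed_Table_Demo;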

Jacob
-- 
                      CAUTION
               BLADE EXTREMELY SHARP
                KEEP OUT OF CHILDREN



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-21 11:41                                   ` Georg Bauhaus
@ 2005-06-21 12:44                                     ` Dmitry A. Kazakov
  2005-06-21 21:01                                       ` Georg Bauhaus
  0 siblings, 1 reply; 68+ messages in thread
From: Dmitry A. Kazakov @ 2005-06-21 12:44 UTC (permalink / raw)


On Tue, 21 Jun 2005 13:41:25 +0200, Georg Bauhaus wrote:

> Dmitry A. Kazakov wrote:
>  
>> See below. It is a table. It has bracketing: rows and columns.
> 
> Back to step one: brackets in computer tables are not named,
> a computer doesn't have accountants' abilities in pattern matching
> when looking at rows and colums in a table. Again, I said XML is good
> for parsing of data if you cannot tell in advance that the data stream
> is totally free of errors.

No, it is bad, because missing one bracket may lead to loss of the whole
data set. As a medium, XML is as awful as it is readable.

> XML provides means to build robust data
> streams in the absence of tight definitions and reliable procedures.

> As for whitespace, read Stroustrup's article on defining operator
> whitespace.

Delimiter /= whitespace.

>> This form
>> existed for centuries before XML. Who would print tables of logarithms in
>> XML?
> 
> You're missing the point: XML is *not* about rendering data.

Sorry, but the thread's subject reads "Data table text I/O package". Text =
rendered data.

> Logarithms are logarithms, not printed logarithms, this is a second
> step.  Data formats for exchange or storage on the one hand and
> a print-out of some data on the other hand are two very different beasts,
> with different purposes. Consider the MVC paradigm.

This is obviously wrong, clearly print-outs serve both data exchange and
data storage when humans are involved.

>>>If you care about robust data interchange in a "sloppy
>>>field", you employ standard tools to help you get the correct
>>>data.
>> 
>> That is a different problem for which I would use a well-defined binary
>> format instead of fancy 3.15. What is the *accuracy* of this value, huh?
> 
> It is totally unimportant what you or I would want, sorry.
> For a robust data interchange, absent comprehensive definitions
> and guarantees about data production, you need redundancy, period.
> 
> The accuracy is well defined and most importantly,
> it is up to the application, yours and mine respectively.
> We both use the accuracy that is most appropriate, and I won't
> tell you not to use an internal type when it suits your application.
> I expect the same of you. If all I have to do is to store kilometers
> measuring straight lines inside the Netherlands in a relational database,
> I know the datatype I can use, no matter what you think is best
> in your application.

This is a wrong approach, of course, because the accuracy of the data is
*not* defined by the internal type used. And in any case the internal type
is irrelevant to the data format used. Note that binary format has nothing
to do with any internal format.

> This has been discussed for years during the development of
> XML Schema. What do you care about my accuracy as long as
> I compute values from your data that are within application
> bounds? 3.15 is as accurate as can be, and independent of
> bits.

Is it 3.14998751 or 3.150000? Floating-point numbers are intervals.
Transporting them you should either use explicit bounds: [3.1499, 3.1600]
or accuracy: 3.15 +/-0.0001. "As accurate as can be" is nice, but what if
the application is a gateway, which reads 3.15 as accurate as 4 bytes float
is and then sends it away? Two other applications communicating through it
and using long long float will be quite perplexed...
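
A rough Ada sketch of the value-plus-accuracy idea (type and names invented
for illustration):

   --  Carry the accuracy along with the value, e.g. 3.15 +/- 0.0001,
   --  instead of hoping the reader guesses it from the literal.
   type Measured_Value is record
      Value    : Long_Float;
      Accuracy : Long_Float;   --  half-width of the enclosing interval
   end record;

   Distance : constant Measured_Value := (Value => 3.15, Accuracy => 1.0E-4);

   --  A receiver can then derive explicit bounds:
   function Lower_Bound (M : Measured_Value) return Long_Float is
   begin
      return M.Value - M.Accuracy;
   end Lower_Bound;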

>> But, lack of readability is not in the ugly </> brackets. Tabulated data
>> are readable because they are tabulated.
> 
> This is the *View* in MVC, XML is about *data*. So there is no point in
> talking about final looks, it is important to know how data will have
> to be seen. For example, can you debug datastreams using the simplest
> tools? Think of a log file of a concurrent application, processing data
> from several heterogenous input sources on the net.

Really? A normal log file of our data acquisition and control system (3-4
nodes, 500-1000 channels each) is about 10-100 MB. A trace file of the same
system is typically about 10-100GB. The first is a highly dense binary
format. The second is dense ASCII. Do you know any editor capable of loading
10 GB? In UltraEdit you need to wait about 10 minutes before it becomes
ready to do anything. Now you propose that I convert all that into XML? How
much does a SCSI terabyte cost now? But more importantly, each extra byte of rubbish
you write is multiplied by the number of channels and their frequencies,
that costs system performance.

>> That is: the names, the types and
>> units are *factored* out to the table header, which allows the reader to
>> concentrate on the *values*. Thus a table looks as:
>> 
>> Distance [km]   Temperature  [°C]  ...
>> 3.15                29.0     ...
>> 2.10                14.4     ...
>> 
>> This is readable.
> 
> This is irrelevant in data exchange. This is print.
> 
>> To make the difference more visible, consider bitmaps stored in XML format. Would
>> you be able to recognize a person's face in it?
> 
> You do know about NOTATION?
> I think it is very hard to find someone suggesting that we should recode
> bitmap graphics formats as pixel tags.

So an image is not print whereas a table is?

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-21 12:44                                     ` Dmitry A. Kazakov
@ 2005-06-21 21:01                                       ` Georg Bauhaus
  2005-06-22 12:15                                         ` Dmitry A. Kazakov
  0 siblings, 1 reply; 68+ messages in thread
From: Georg Bauhaus @ 2005-06-21 21:01 UTC (permalink / raw)


Dmitry A. Kazakov wrote:

Let me first guess that many here have their largely
regular and homogeneous data in mind. I'm not talking
about this. We went off from what to do if you
don't have atomic, homogeneous, unambiguous data sent
around.

1) If you have a nice arrangement of exactly one set of
   array-like data of guaranteed quality, there is little
   to win by using XML.

2) Given a data format much like in (1),
   if you can pick up the phone and ring the other end
   of the data-sending connection, and say, 'Uhm, we have
   seen a slight change in the data text table, could you
   explain ...' or similar, you are privileged.

3) If you think that every bunch of data is sent in an agreeable
   format, I could be telling you a few stories, though
   not in public.



>>Back to step one: brackets in computer tables are not named,
>>a computer doesn't have accountants' abilities in pattern matching
>>when looking at rows and columns in a table. Again, I said XML is good
>>for parsing of data if you cannot tell in advance that the data stream
>>is totally free of errors.
> 
> 
> No, it is bad, because missing one bracket may lead to loss of the whole
> data set. As a medium, XML is as awful as it is readable.

If you mean losing a closing tag, the parser can correct,
though not always, and to different extents. If you mean
somehow a '>' of a start tag is lost, then this is better or
worse than in typical CSV or similar; a line end is a bracket,
too. A separator is a two-way bracket, adding one more
possibility for error and ambiguity.

Imagine a CSV stream with _no_ record separators.
(This is not fiction.) It is kind of efficient: you count
fields.
However, if some data item contains a separator due to
an error, you lose the whole stream, or use the wrong
data without noticing this, in the worst case.


 >>As for whitespace, read Stroustrup's article on defining operator
>>whitespace.
> 
> 
> Delimiter /= whitespace.

True, still Stroustrup demonstrates some effects we are
discussing.

>>You're missing the point: XML is *not* about rendering data.
> 
> 
> Sorry, but the thread's subject reads "Data table text I/O package". Text =
> rendered data.

Notice that the thread title has I/O. I/O can mean pretty printing,
and it can mean a reliable and robust data input-output facility,
working well in the face of erroneous input.
I was under the impression that we were discussing the latter,
in particular I added: you had better have such-and-such data if you
want to reliably handle data in a sloppy setting, answering
Marius Amado Alves IIRC.


>>Logarithms are logarithms, not printed logarithms, this is a second
>>step.  Data formats for exchange or storage on the one hand and
>>a print-out of some data on the other hand are two very different beasts,
>>with different purposes. Consider the MVC paradigm.
> 
> 
> This is obviously wrong, clearly print-outs serve both data exchange and
> data storage when humans are involved.

The point is whether print-outs serve *well* as a data exchange
format, IN THE SITUATION described above, that is you do not know
in advance that you will get the finest data.
 I doubt that this is the case in any but a few well
defined situations. (I.e., you might meet it more often in contexts
where Ada is used, or so I hope.)


>>The accuracy is well defined and most importantly,
>>>it is up to the application, yours and mine respectively.
 
> This is a wrong approach of course.

There is no more accurate representation of 3.15 than the text "3.15",
right under our noses. In a text data stream, tabular, XML, whatever.

I appreciate that you care how I should read a "3.15" and store it.
Though, if my application uses decimal fixed point to represent money
with 4 digits after the point, then you can add as many zeros
as you like after .15; it's none of your business, it's the other
application's business. I may not have your hardware, I may
not have your rounding policies. I still find your data useful.
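
For illustration only, the money case might read the value like this:

   --  The reader of "3.15" picks its own representation; here a decimal
   --  fixed-point type with 4 digits after the point, as in the money
   --  example above.
   type Money is delta 0.000_1 digits 14;

   Price : constant Money := Money'Value ("3.15");   --  stored exactly as 3.1500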


> Because the accuracy of the data is
> *not* defined by the internal type used.

The accuracy of the data may not be defined at all, IN THE DATA
STREAM. (Then again, some people may try, adding a schema.)

>> 3.15 is as accurate as can be, and independent of
>>bits.

> Is it 3.14998751 or 3.150000?

It is 3.15. This is data, text data. Not a computer floating point
value, just data in textual external format, very flexible,
and with the number of digits that you see. Do with it what
_your_ application wants to do with it. This is what you get.
 "Is the light On, or Off?" -- "It is On." Data, "Off" or "On".
No matter how any program represents On or Off, all that can be said
about On or Off AS PARTS OF THE DATA IN THE TEXT STREAM is in
the stream. Use a Boolean, or use an enumeration type, your choice.

 A data stream does not in general define semantics. (On the contrary,
the standards talk about applications defining meaning, in the end.)


> Floating-point numbers are intervals.
> Transporting them you should either use explicit bounds:

Who said floating point? I said "3.15", ('3', '.', '1', '5').
You do not have a solution to the problem of exactly
representing R-eal values in a data transport context, do you?
(And note that not every important number originates inside
a computer's FPU.)

> [3.1499, 3.1600]

Well, someone will ask you, 'and what exactly is 3.1499?' on
*our* machine?

> Really? A normal log file [...]

You argue from your log files; let me argue from a heterogeneity point
of view. (BTW, I use text pipes and stream analysis to look at files
of about this size.)

A server is running, you can look at the trace log, some parser fails,
you want to know why. Say there are three lines of ';'-separated data,
each at most 400 characters long. Ideally one appears right after the other.
These lines are what they send you, no way to change that.
Each field has varying length. Your job will be to associate matching
fields.
 Because 370 characters don't fit in a single display line,
you end up counting ';'s in each line and taking notes, or copying and
pasting, to find the matching fields.

Now consider separated key=value lines. They will be longer,
but you can scan the line looking for the key strings. A big
step up. XML isn't worse in my view.
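
A small Ada sketch of that kind of key scan (the helper and its conventions
are invented for illustration):

   with Ada.Strings.Fixed;

   --  Return the value following "Key=" in a ';'-separated log line,
   --  or "" when the key is not present.  Hypothetical helper for
   --  eyeballing log lines only; it does not handle quoting or keys
   --  that are suffixes of other keys.
   function Value_Of (Line : String; Key : String) return String is
      use Ada.Strings.Fixed;
      Start : constant Natural := Index (Line, Key & "=");
   begin
      if Start = 0 then
         return "";
      end if;
      declare
         From : constant Positive := Start + Key'Length + 1;
         Stop : constant Natural  := Index (Line (From .. Line'Last), ";");
      begin
         if Stop = 0 then
            return Line (From .. Line'Last);
         else
            return Line (From .. Stop - 1);
         end if;
      end;
   end Value_Of;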


>>>That is: the names, the types and
>>>units are *factored* out to the table header, which allows the reader to
>>>concentrate on the *values*. Thus a table looks as:
>>>
>>>Distance [km]   Temperature  [°C]  ...
>>>3.15                29.0     ...
>>>2.10                14.4     ...
>>>
>>>This is readable.

Sure, I'm of course not saying a table isn't readable.
I can even use XSL-FO or TeX to produce a table from XML,
no problem. In fact, I have done this many times.

A formatted table just isn't that robust.
Consider the case where the headline
gets lost. The missing redundancy will leave you with a
puzzle, not a robust set of self-describing text.

>>This is irrelevant in data exchange. This is print.
>>
>>
>>>To make the difference more visible, consider bitmaps stored in XML format. Would
>>>you be able to recognize a person's face in it?

I'm sure you know one can make text images, but I won't argue about
this for the same reasons that have been explained for years when
discussing SGML and data for which no parsing is desired.

> So an image is not print whereas a table is?

Now we are entering the realm of robust image encoding...
No.


Georg Bauhaus 





^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-21 21:01                                       ` Georg Bauhaus
@ 2005-06-22 12:15                                         ` Dmitry A. Kazakov
  2005-06-22 22:24                                           ` Georg Bauhaus
  0 siblings, 1 reply; 68+ messages in thread
From: Dmitry A. Kazakov @ 2005-06-22 12:15 UTC (permalink / raw)


On Tue, 21 Jun 2005 23:01:59 +0200, Georg Bauhaus wrote:

> Dmitry A. Kazakov wrote:
> 
> Let me first guess that many here have their largely
> regular and homogeneous data in mind. I'm not talking
> about this. We went off from what to do if you
> don't have atomic, homogeneous, unambiguous data sent
> around.
> 
> 1) If you have a nice arrangement of exactly one set of
>    array-like data of guaranteed quality, there is little
>    to win by using XML.

OK, that is a big difference. Tables representing tree-like structures are
awful.

>> Sorry, but the thread's subject reads "Data table text I/O package". Text =
>> rendered data.
> 
> Notice that the thread title has I/O. I/O can mean pretty printing,
> and it can mean a reliable and robust data input-output facility,
> working well in the face of erroneous input.

But for data exchange there are better techniques than XML. Even if you
mean [far-fetched] object brokering and active agents performed over a
stream of printable characters, even then I wouldn't take XML.

>>>The accuracy is well defined and most importantly,
>>>it is up to the application, yours and mine respectively.
>  
>> This is a wrong approach of course.
> 
> There is no more accurate representation of 3.15 than the text "3.15",
> right under our noses. In a text data stream, tabular, XML, whatever.

The text "3.15" represents what? Everything of course depends on the OSI
layer we are talking about. (:-))

[...]
> The accuracy of the data may not be defined at all, IN THE DATA
> STREAM. (Then again, some people may try, adding a schema.)

Then you cannot talk about numbers transferred. You said "3.15" is a text.
So let it be a text. "3.1 5" is also a text, as valid as "3.15" [at this
level of abstraction.]

BTW, again there are better ways to send texts than XML offers.

>> [3.1499, 3.1600]
> 
> Well, someone will ask you, 'and what exactly is 3.1499?' on
> *our* machine?

3.1499 is the lower bound. So on your machine you can represent it by any
number less than or equal to 3.1499. You lose precision, but retain
correctness. The true value is always within the bounds. There is still a
problem, but a much lesser one.

> Now consider separated key=value lines. They will be longer,
> but you can scan the line looking for the key strings. A big
> step up. XML isn't worse in my view.

Unfortunately in our case it is not that simple. key=value does not help.
The problem is that data need to be sorted and filtered using various
criteria. In other words a value has more than one key. A relational DB
would probably help, but to load that amount of data would take too long.
So it ends up with a specialized tool chain, integrated diagnostic etc.

BTW, 80% of that would probably be unnecessary if Ada were used! (:-)) But
the customer wished otherwise...

> A formatted table just isn't that robust.
> Consider the case where the headline
> gets lost. The missing redundancy will leave you with a
> puzzle, not a robust set of self describing text.

It is a bad idea to correct I/O errors using syntax anyway. The only relevant
errors are ones made by humans. It is very unlikely that somebody would
forget to read a table header [I don't talk about writing, because to write
in XML is beyond anybody's capability anyway.]

Humans are unbeatable in pattern recognition. This is the whole idea behind
tables. Tab stops and lines are very easy patterns to detect and any error
becomes immediately visible long before inspecting the table contents.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-22 12:15                                         ` Dmitry A. Kazakov
@ 2005-06-22 22:24                                           ` Georg Bauhaus
  2005-06-23  9:03                                             ` Dmitry A. Kazakov
  0 siblings, 1 reply; 68+ messages in thread
From: Georg Bauhaus @ 2005-06-22 22:24 UTC (permalink / raw)


Dmitry A. Kazakov wrote:


> But for data exchange there are better techniques than XML.

Such as ...?

> Then you cannot talk about numbers transferred. You said "3.15" is a text.
> So let it be a text. "3.1 5" is also a text, as valid as "3.15" [at this
> level of abstraction.]

How is a number transferred from one human to another?
How do you explain the number three to a person who
cannot see?


> [sending two literals instead of one]
> The true value is always within the bounds. There is still a
> problem, but a much lesser one.

I don't agree because you are actually introducing two intervals.
And mine might be different from yours anyway. So why not use "3.15"
as per the needs of the application?

> The only relevant errors are ones made by humans.

Ahh, no. Think of the last time you have been watching satellite
TV with a strong cloud in the way. Where is your nice data stream...
(No I'm not suggesting XML here, of course, but satellites aren't
just used for MPEG streams. They can transmit XML data too.)

> [I don't talk about writing, because to write
> in XML is beyond anybody's capability anyway.]

I suggest you have a look at oXygen or nXML mode for Emacs,
or PSGML mode for Emacs. (Serna is also nice,
though it is, uhm, stabilizing.)
They all provide functions similar to a good programmer's IDE,
analysing source text to help you with typing, inserting
completions automatically, running the validator in the
background etc.

(In fact, XML lends itself well to syntax directed editing,
whether you see the tags or not. :-))


>> Humans are unbeatable in pattern recognition. This is the whole idea behind
> tables. Tab stops and lines are very easy patterns to detect and any error
> becomes immediately visible long before inspecting the table contents.

Right. So next time someone sends you an HTML table full of data,
use this for a start, to get a nice plain text table. (It's verbose,
I know :)

<?xml version='1.0' encoding='UTF-8'?>
<transform
   xmlns:html="http://www.w3.org/1999/xhtml"
   xmlns="http://www.w3.org/1999/XSL/Transform"
   version="1.0">

  <!-- Transforms an XHTML table into a plain text table.

Input: An XHTML document containing tables.
Output: A text document containing tables.

The tables should contain small portions of text in their cells, for
example matrix data or some tabular array of small strings.

The default width of columns in this transformation is 8, see "pad". -->

  <output method="text"/>

  <param name="my-line-terminator">
    <!--
      do not use system defaults for terminating lines.
      Use these characters instead. See new-line.
      -->
    <text>&#13;&#10;</text>
  </param>



  <template match="/">
    <!-- insert some empty lines and then start the plain text table -->
    <for-each select="descendant::table">
      <call-template name="new-line">
        <with-param name="count">2</with-param>
      </call-template>
      <apply-templates select="tbody"/>
    </for-each>
  </template>


  <template match='tbody'>
    <!-- print the head, then a separating line, then the rows -->
    <apply-templates select="tr/th"/>
    <call-template name="new-line"/>
    <text>================================================</text>
    <for-each select="tr">
      <apply-templates select="td"/>
      <call-template name="new-line"/>
    </for-each>
  </template>


  <template match="td | th">
    <!-- place the text content inside a cell padded with blanks -->
    <call-template name="pad">
      <with-param name="characters">
        <apply-templates/>
      </with-param>
    </call-template>
  </template>


  <template match="td//*">
    <!-- inside a table cell, discard everything but text -->
    <value-of select="text()"/>
  </template>


  <template name='pad'>
    <param name="characters"/>
    <!-- the text to which padding blanks might be added -->

    <param name="default-width">8</param>
    <!-- default column width measured in number of characters -->

    <variable name="fill"
              select="$default-width - string-length($characters)"/>
    <choose>
      <when test="$fill &lt; 0">
        <message>Please choose a wider display</message>
      </when>
      <otherwise>
        <value-of select="$characters"/>
        <value-of select=" substring('        ',  1, $fill)"/>
      </otherwise>
    </choose>
  </template>


  <template name="new-line">
    <!-- 1 or more lines will be terminated -->
    <param name="count">1</param>

    <if test="$count &gt; 0">
      <value-of select="$my-line-terminator"/>
      <call-template name="new-line">
        <with-param name="count">
          <value-of select="$count - 1"/>
        </with-param>
      </call-template>
    </if>
  </template>

</transform>




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-22 22:24                                           ` Georg Bauhaus
@ 2005-06-23  9:03                                             ` Dmitry A. Kazakov
  2005-06-23  9:47                                               ` Georg Bauhaus
  2005-06-23 14:16                                               ` Marc A. Criley
  0 siblings, 2 replies; 68+ messages in thread
From: Dmitry A. Kazakov @ 2005-06-23  9:03 UTC (permalink / raw)


On Thu, 23 Jun 2005 00:24:30 +0200, Georg Bauhaus wrote:

> Dmitry A. Kazakov wrote:
> 
>> But for data exchange there are better techniques than XML.
> 
> Such as ...?

Take any middleware available.

>> Then you cannot talk about numbers transferred. You said "3.15" is a text.
>> So let it be a text. "3.1 5" is also a text, as valid as "3.15" [at this
>> level of abstraction.]
> 
> How is a number transferred from one human to another?

As a description of some [usually trivial] problem. The solution of that 
problem conveys the number. Most people are very bad at memorizing raw
numbers or even at recognizing them from an acoustic stream.

> How do you explain the number three to a person who
> cannot see?

The number second to two. (:-))

>> [sending two literals instead of one]
>> The true value is always within the bounds. There is still a
>> problem, but a much lesser one.
> 
> I don't agree because you are actually introducing two intervals.

No, it is still one interval that contains the true number. This is the way 
floating-point arithmetic functions. The result of a+b is c, such that 
[c'Pred, c'Succ] contains the exact result. [*] The problem is that 'Pred 
and 'Succ are of course machine dependent. So when you send c you should 
also convey the range. Depending on that the receiver should choose an 
appropriate internal representation for c, which might require a "true" 
interval.
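
Written down in Ada, the enclosure might look like this (a sketch only,
assuming round-to-nearest):

   type Interval is record
      Lower, Upper : Long_Float;
   end record;

   --  Enclose the exact value of A + B: with round-to-nearest the machine
   --  result C is within one unit in the last place of the true sum, so
   --  [Long_Float'Pred (C), Long_Float'Succ (C)] certainly contains it.
   function Sum_Enclosure (A, B : Long_Float) return Interval is
      C : constant Long_Float := A + B;
   begin
      return (Lower => Long_Float'Pred (C), Upper => Long_Float'Succ (C));
   end Sum_Enclosure;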

> And mine might be different from yours anyway. So why not use "3.15"
> as per the needs of the application?

It is no problem. But then in your XML format it should rather be:

<model="float", dimension="km", digits="4", value="3.15">

This might look close to Ada's ideology, but I would rather say it does 
not. It smells much of structural type equivalence, which I don't like. What 
if the application expects a fixed point number? Would you convert? It is 
too slippery...

BTW, I'm not arguing against the idea of using type descriptions in 
protocols. It is a great idea. I think Ada will definitely confront this 
issue some day, because presently Ada is completely unable to handle it. 
But XML isn't the right answer here.

>> Relevant errors are only ones made by humans.
> 
> Ahh, no. Think of the last time you have been watching satellite
> TV with a strong cloud in the way. Where is your nice data stream...
> (No I'm not suggesting XML here, of course, but satellites aren't
> just used for MPEG streams. They can transmit XML data too.)

Never use UDP, and you'll have no problems with that! (:-)) But seriously, 
do you really want to collapse all OSI levels into one big mess and make an 
application responsible for error correction?

>> [I don't talk about writing, because to write
>> in XML is beyond anybody's capability anyway.]
> 
> I suggest you have a look at oXygen or nXML mode for Emacs,
> or PSGML mode for Emacs. (Serna is also nice,
> though it is, uhm, stabilizing.)
> They all provide functions similar to a good programmer's IDE,
> analysing source text to help you with typing, inserting
> completions automatically, running the validator in the
> background etc.

That's the point: to write something as a table, you need nothing more
elaborate than a notepad editor...

Actually I enjoy XML in postings. It is an excellent spam flag. Any post 
which isn't plain text immediately goes into the recycle bin. (:-))

>> Humans are unbeatable in pattern recognition. This is the whole idea behind
>> tables. Tab stops and lines are very easy patterns to detect and any error
>> becomes immediately visible long before inspecting the table contents.
> 
> Right. So next time someone sends you an HTML table full of data,
> use this for a start, to get a nice plain text table. (It's verbose,
> I know :)
[...]

And why this nightmare cannot be written in Ada?

----------
* Depending on how the machine rounds, a narrower interval can be used.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-23  9:03                                             ` Dmitry A. Kazakov
@ 2005-06-23  9:47                                               ` Georg Bauhaus
  2005-06-23 10:34                                                 ` Dmitry A. Kazakov
  2005-06-23 14:16                                               ` Marc A. Criley
  1 sibling, 1 reply; 68+ messages in thread
From: Georg Bauhaus @ 2005-06-23  9:47 UTC (permalink / raw)


Dmitry A. Kazakov wrote:
> On Thu, 23 Jun 2005 00:24:30 +0200, Georg Bauhaus wrote:
> 
> 
>>Dmitry A. Kazakov wrote:
>>
>>
>>>But for data exchange there are better techniques than XML.
>>
>>Such as ...?
> 
> 
> Take any middleware available.

Uhm, yes, such as ...?

> No, it is still one interval that contains the true number. This is the way 
> floating-point arithmetic functions. The result of a+b is c, such that 
> [c'Pred, c'Succ] contains the exact result. [*] The problem is that 'Pred 
> and 'Succ are of course machine dependent. So when you send c you should 
> also convey the range. Depending on that the receiver should choose an 
> appropriate internal representation for c, which might require a "true" 
> interval.

This amounts to specifying the precise details of a floating
point computation in a data stream; a rather special case I think.
Take for example prices, guesstimates of future price changes,
insurance rates, direction of tomorrow's winds, day temperature,
and the like. It seems quite enough to transmit one fpt
number literal in these cases.

> Never use UDP, and you'll have no problems with that! (:-)) But seriously, 
> do you really want to collapse all OSI levels into one big mess

No.

> and make an 
> application responsible for error correction?

How can hard/software at OSI levels guarantee correct data?
As soon as there is something real in there (i.e., real software,
real hardware, real interference, humans, ...), degradation
is possible.


> [...]
[XSL transformation]

> And why this nightmare cannot be written in Ada?

Because it would explode even more.

Reading hint: if you see

  <apply-templates select="descendant::tbody"/>

forget about the mosaic impression conveyed by <-="::"/> for
a moment and read it as

  Apply templates to the selection of descendant tbodies.

Or read the above aloud. I believe that XSL might look nightmarish
to those who expect few characters on a PL text page, but
if you actually read the text, it is quite natural :-)



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-23  9:47                                               ` Georg Bauhaus
@ 2005-06-23 10:34                                                 ` Dmitry A. Kazakov
  2005-06-23 11:37                                                   ` Georg Bauhaus
  0 siblings, 1 reply; 68+ messages in thread
From: Dmitry A. Kazakov @ 2005-06-23 10:34 UTC (permalink / raw)


On Thu, 23 Jun 2005 11:47:10 +0200, Georg Bauhaus wrote:

> Dmitry A. Kazakov wrote:
>> On Thu, 23 Jun 2005 00:24:30 +0200, Georg Bauhaus wrote:
>> 
>>>Dmitry A. Kazakov wrote:
>>>
>>>>But for data exchange there are better techniques than XML.
>>>
>>>Such as ...?
>> 
>> Take any middleware available.
> 
> Uhm, yes, such as ...?

CORBA, OPC (hmm), RPC etc, Ada.Streams after all.

>> No, it is still one interval that contains the true number. This is the way 
>> floating-point arithmetic functions. The result of a+b is c, such that 
>> [c'Pred, c'Succ] contains the exact result. [*] The problem is that 'Pred 
>> and 'Succ are of course machine dependent. So when you send c you should 
>> also convey the range. Depending on that the receiver should choose an 
>> appropriate internal representation for c, which might require a "true" 
>> interval.
> 
> This amounts to specifying the precise details of a floating
> point computation in a data stream; a rather special case I think.
> Take for example prices, guesstimates of future price changes,

Those are fixed point with problems of their own. You are bound to a
definite radix, because all values need to be exact.

> insurance rates, direction of tomorrow's winds, day temperature,
> and the like.

These are fuzzy numbers. They are characterized by a distribution of
possible values. You need more than one value here. In natural languages we
use "approximately 3.15", "between 3 and 4", "close to 5" etc.

> It seems quite enough to transmit one fpt
> number literal in these cases.

You mean a decimal literal for the case where a fixed-point decimal number
is expected. (:-))

>> Never use UDP, and you'll have no problems with that! (:-)) But seriously, 
>> do you really want to collapse all OSI levels into one big mess
> 
> No.
> 
>> and make an 
>> application responsible for error correction?
> 
> How can hard/software at OSI levels guarantee correct data?
> As soon as there is something real in there (i.e., real software,
> real hardware, real interference, humans, ...), degradation
> is possible.

That is true, but error correction codes (Hamming etc) are *known* to be
optimal. This is a hard mathematical fact. So any bandwidth available
should be invested there rather than at the application level in fancy
things like </> brackets. Further we should never mix this class of errors
with ones made by humans while writing and reading texts. These errors have
a completely different nature. The first ones should be eliminated on the
transport level. The application level should consider all data free of any
errors of this kind.
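
As a toy illustration of transport-level redundancy, a Hamming(7,4) encoder
takes only a few lines of Ada (illustrative only; real links use much
stronger codes):

   --  Hamming(7,4): 4 data bits protected by 3 parity bits, so any
   --  single-bit error in the 7-bit code word can be corrected by the
   --  receiver.
   type Bit is mod 2;
   type Data_Word is array (1 .. 4) of Bit;
   type Code_Word is array (1 .. 7) of Bit;

   function Encode (D : Data_Word) return Code_Word is
      P1 : constant Bit := D (1) + D (2) + D (4);   --  covers code positions 1,3,5,7
      P2 : constant Bit := D (1) + D (3) + D (4);   --  covers code positions 2,3,6,7
      P3 : constant Bit := D (2) + D (3) + D (4);   --  covers code positions 4,5,6,7
   begin
      return (P1, P2, D (1), P3, D (2), D (3), D (4));
   end Encode;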

>> [...]
> [XSL transformation]
> 
>> And why this nightmare cannot be written in Ada?
> 
> Because it would explode even more.

I don't believe it! (:-))

> Reading hint: if you see
> 
>   <apply-templates select="descendant::tbody"/>
> 
> forget about the mosaic impression conveyed by <-="::"/> for
> a moment and read it as
> 
>   Apply templates to the selection of descendant tbodies.
> 
> Or read the above aloud. I believe that XSL might look nightmarish
> to those who expect few characters on a PL text page, but
> if you actually read the text, it is quite natural :-)

I think this is the essence of the misunderstanding about the readability issue.
You cannot control your perception. It isn't programmable. You can only
train yourself to ignore the loathing your "hardware" generates while
seeing XML! No matter how good you could be in that, it will cost you many
extra "CPU cycles" in any case. I prefer to spare my cycles. So I vote for
Ada and plain tables. (:-))

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-23 10:34                                                 ` Dmitry A. Kazakov
@ 2005-06-23 11:37                                                   ` Georg Bauhaus
  2005-06-23 12:59                                                     ` Dmitry A. Kazakov
  0 siblings, 1 reply; 68+ messages in thread
From: Georg Bauhaus @ 2005-06-23 11:37 UTC (permalink / raw)


Dmitry A. Kazakov wrote:
> On Thu, 23 Jun 2005 11:47:10 +0200, Georg Bauhaus wrote:

>>>>Dmitry A. Kazakov wrote:
>>>>>But for data exchange there are better techniques than XML.
>>>>
>>>>Such as ...?
>>>
>>>Take any middleware available.
>>
>>Uhm, yes, such as ...?
> 
> 
> CORBA, OPC (hmm), RPC etc, Ada.Streams after all.

How does one debug data passed using RPC etc, for example
when the P could not be called due to some data error?



> That is true, but error correction codes (Hamming etc) are *known* to be
> optimal. This is a hard mathematical fact. So any bandwidth available
> should be invested there rather than at the application level in fancy
> things like </> brackets.

Here is an interesting point. SGML comes with SDIF (not the digital
sound thing, but SGML Document Interchange Format). SDIF by default
is kind of defined in ASN.1. So there are actually two layers of
data...


>>[XSL transformation]
>>
>>
>>>And why this nightmare cannot be written in Ada?
>>
>>Because it would explode even more.
> 
> 
> I don't believe it! (:-))

Some XSL "primitives" are quite powerful, when compared
to Ada "primitives".


> So I vote for
> Ada and plain tables. (:-))

For your eyes' pleasure yes, for the robust transmission of non-regular
data in a heterogeneous uncontrolled setting, no. ;)



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-23 11:37                                                   ` Georg Bauhaus
@ 2005-06-23 12:59                                                     ` Dmitry A. Kazakov
  0 siblings, 0 replies; 68+ messages in thread
From: Dmitry A. Kazakov @ 2005-06-23 12:59 UTC (permalink / raw)


On Thu, 23 Jun 2005 13:37:20 +0200, Georg Bauhaus wrote:

> Dmitry A. Kazakov wrote:

>> CORBA, OPC (hmm), RPC etc, Ada.Streams after all.
> 
> How does one debug data passed using RPC etc, for example
> when the P could not be called due to some data error?

There should be no difference from debugging conventional calls or objects
(depending on the paradigm). I would readily agree that available
middlewares aren't that good. We have our own, just because CORBA and OPC
don't fulfill our requirements.

Returning to the point. In our middleware you can monitor each byte sent or
received for any hardware interface. It is an integrated functionality. You
can also see a summary of how pieces of these raw data were interpreted as
application-level values: velocity, temperature etc. Viewing some complex
hierarchical structures was never requested. Maybe because there aren't any
(:-)), but largely because when it comes to debugging, timings and
relationships *between* values are of much greater importance. Typically
there is some periodic activity that involves values x1, x2, ..., xN, say,
each 10ms. The range checking subsystem reports that x34 violates its
bounds. That happens once per hour [rather an easy case, in one real case
it was once per 3 months.] So, you turn logging on, and try to analyze the
system state around these points. That's nasty! Protocol errors are trivial
compared to that. Honestly, I cannot remember any difficult case, though we
are supporting many quite strange devices. It might sound like an anecdote,
but in one case we indeed used print-outs read from a serial port! There
was no other way to access the device data, than through its serial
printer. We were lucky that the printer wasn't used in the graphic mode...
(:-))

The situation would change if more complex (OO) structures were involved.
But I don't think that XML would be an answer. I would prefer an OO-ish
approach where each object knows how to construct itself out of a segment
of raw data.
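
Sketched in Ada, such an interface might look roughly like this (all names
invented for illustration):

   with Ada.Streams;  use Ada.Streams;

   package Wire_Objects is

      --  Each concrete object type provides its own way to build itself
      --  from a segment of raw octets; error handling is omitted here.
      type Wire_Object is abstract tagged null record;

      procedure Construct
        (Item : out Wire_Object;
         Raw  : in  Stream_Element_Array;
         Last : out Stream_Element_Offset) is abstract;
      --  Fill Item from Raw; Last reports how much of Raw was consumed.

   end Wire_Objects;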

> Some XSL "primitives" are quite powerful, when compared
> to Ada "primitives".

There is nothing more powerful than a call to the procedure Do_It! (:-))

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-23  9:03                                             ` Dmitry A. Kazakov
  2005-06-23  9:47                                               ` Georg Bauhaus
@ 2005-06-23 14:16                                               ` Marc A. Criley
  1 sibling, 0 replies; 68+ messages in thread
From: Marc A. Criley @ 2005-06-23 14:16 UTC (permalink / raw)


Dmitry A. Kazakov wrote:
> On Thu, 23 Jun 2005 00:24:30 +0200, Georg Bauhaus wrote:
> 
>>Dmitry A. Kazakov wrote:
>>
>>>But for data exchange there are better techniques than XML.
>>
>>Such as ...?
> 
> Take any middleware available.

This exchange seems analogous to:

"For information exchange there are better techniques than conversing in 
English."

"Such as ...?"

"A Telephone."

:-)

At some point, somewhere, the data has to be put on the wire using some 
defined format--XML, application specific raw binary, eXternal Data 
Representation (XDR), etc.  Don't conflate the mechanism of data 
transfer, be it CORBA, RPC, Ada Streams, whatever, with the 
representation of the data.

Marc A. Criley
www.mckae.com



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-20 18:54                             ` Dmitry A. Kazakov
  2005-06-21  9:24                               ` Georg Bauhaus
@ 2005-06-25 16:38                               ` Simon Wright
  1 sibling, 0 replies; 68+ messages in thread
From: Simon Wright @ 2005-06-25 16:38 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> <Distance km='3.15'/ >
>
> XML adds here nothing, but a huge readability loss.

More likely to write <Distance unit="km">3.15</Distance>

Anyway, XML is a means for programs to communicate, not people ..



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-16  9:55           ` Jacob Sparre Andersen
  2005-06-16 10:53             ` Marius Amado Alves
@ 2005-06-30  3:02             ` Randy Brukardt
  2005-06-30 18:43               ` Jacob Sparre Andersen
  2005-06-30 19:24               ` Björn Persson
  1 sibling, 2 replies; 68+ messages in thread
From: Randy Brukardt @ 2005-06-30  3:02 UTC (permalink / raw)


"Jacob Sparre Andersen" <sparre@nbi.dk> wrote in message
news:m2k6ku8w2s.fsf@hugin.crs4.it...
> Randy Brukardt wrote:
>
> > I may be dense, but isn't this the purpose of XML? If so, why
> > reinvent the wheel?
>
> The purpose of XML is to be _the_ universal file format.
>
>  a) I don't want a universal file format.
>
>  b) I don't believe in a universal file format.
>
> >  c) XML is (almost) less readable than a binary file for my purposes.
>
>  d) I'm _not_ going to switch away from tabulator separated tables for
> >     purposes where tabulator separated tables are a sensible
>     representation of the data in textual form.
>
> > (I personally think XML is way overused, more because it *can* be
> > used than that it is worthwhile for the application. But this seems
> > to be exactly the application that it was designed for. You'll end
> > up with something like XML eventually anyway, why not start with
> > it?)
>
> I'm afraid you completely misunderstood my problem.  It is not a
> matter of a selecting a file format.  It is the matter of
> automagically generating code for reading and writing that file
> format.

Not at all. We like to say around here that you need to describe what your
needs are, because often the program you are trying to write isn't
appropriate for Ada. We usually use that for people trying to write C in
Ada, but it should apply to everyone. :-)

For program-to-program communication, there really are only two sensible
options. If both ends are under your control, then using a binary format
(with versioning and error detection if needed) is preferable, because it
has the least overhead and there is no need for data conversion. This
certainly is the only option with reasonable performance. And this is
usually the appropriate choice.
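
A bare-bones sketch of the versioning part in Ada (magic number, version
limit and file name are made up for illustration):

   with Ada.Streams.Stream_IO;

   procedure Check_File_Version is
      use Ada.Streams.Stream_IO;

      type File_Header is record
         Magic   : Integer;
         Version : Integer;
      end record;

      File   : File_Type;
      Header : File_Header;
   begin
      Open (File, In_File, "data.bin");
      File_Header'Read (Stream (File), Header);
      if Header.Magic /= 16#ADA0# or else Header.Version > 2 then
         raise Program_Error;   --  unknown or newer format than this reader
      end if;
      --  ... read the versioned payload from Stream (File) here ...
      Close (File);
   end Check_File_Version;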

OTOH, if the performance of the connection isn't critical, then using a
well-known standard format that already has needed tools for it seems like
the best option. Even if you don't currently need to allow access by other
systems, you're leaving the door open for future programs outside your
system to use the data.

The cases that are neither of these and thus would make sense to use some
internal, non-portable text format are essentially non-existent.

Note that human readability of program-to-program data is a non-issue.
Indeed, it is a mistake to try to bring that into the equation, as it adds a
huge amount of overhead to the task. I've always used agile methods for
debugging such data: if, in fact, I need to examine such a data stream, I
write a program to display it. But I don't worry about that until/unless the
need arises. It often does not arise, and even when it does, it's often not
necessary to be able to display everything -- and it's often better to write
a monitor for an interesting condition than filling a disk with 10 GB of
text!

So, all in all, I think you're trying to solve the wrong problem (finding a
way to write a specific file format), rather than using an appropriate file
format for Ada programs (usually binary).

But, as a friend of mine likes to say, "do what you want, because you will
anyway!". :-)

                  Randy.






^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-30  3:02             ` Randy Brukardt
@ 2005-06-30 18:43               ` Jacob Sparre Andersen
  2005-07-01  1:22                 ` Randy Brukardt
  2005-06-30 19:24               ` Björn Persson
  1 sibling, 1 reply; 68+ messages in thread
From: Jacob Sparre Andersen @ 2005-06-30 18:43 UTC (permalink / raw)


Randy Brukardt wrote:
> "Jacob Sparre Andersen" <sparre@nbi.dk> wrote in message
> news:m2k6ku8w2s.fsf@hugin.crs4.it...
> > Randy Brukardt wrote:
> >
> > > I may be dense, but isn't this the purpose of XML? If so, why
> > > reinvent the wheel?
> >
> > The purpose of XML is to be _the_ universal file format.
> >
> >  a) I don't want a universal file format.
> >
> >  b) I don't believe in a universal file format.
> >
> > >  c) XML is (almost) less readable than a binary file for my purposes.
> >
> >  d) I'm _not_ going to switch away from tabulator separated tables
> >     for purposes, where tabulator separated tables are a sensible
> >     representation of the data in textual form.
> >
> > > (I personally think XML is way overused, more because it *can*
> > > be used than that it is worthwhile for the application. But this
> > > seems to be exactly the application that it was designed
> > > for. You'll end up with something like XML eventually anyway,
> > > why not start with it?)
> >
> > I'm afraid you completely misunderstood my problem.  It is not a
> > matter of a selecting a file format.  It is the matter of
> > automagically generating code for reading and writing that file
> > format.
> 
> Not at all. We like to say around here that you need to describe
> what your needs are, because often the program you are trying to
> write isn't appropriate for Ada. We usually use that for people
> trying to write C in Ada, but it should apply to everyone. :-)

I thought I had specified my needs.  But in case I forgot:

 a) A format for storing experimental data in tabular form.

 b) A format I easily can manipulate with my standard Unix toolbox.

 c) A format I easily can read and get an overview of (sections of)
    the data.

 d) A format that easily can be imported into programs I'm not in
    control of.  (concrete examples are Gnuplot, R, OOo Calc and
    Excel)

 e) A format I easily can read and write from my own programs.

Tabulator separated text files handle this quite fine (although OOo
and Excel users have to be careful about their number format settings
when they import the files).
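
As a tiny illustration of point (e), reading such a file in Ada takes little
more than this (no header checking, columns hard-wired; a sketch only):

   with Ada.Text_IO;   use Ada.Text_IO;

   --  Read a tabulator separated file from standard input and print the
   --  first two columns of every line.
   procedure TSV_Demo is
      Tab : constant Character := ASCII.HT;

      --  Return field number N (1-based) of a tab-separated line,
      --  or "" if the line has fewer than N fields.
      function Field (Line : String; N : Positive) return String is
         First : Positive := Line'First;
         Count : Positive := 1;
      begin
         for I in Line'Range loop
            if Line (I) = Tab then
               if Count = N then
                  return Line (First .. I - 1);
               end if;
               Count := Count + 1;
               First := I + 1;
            end if;
         end loop;
         if Count = N then
            return Line (First .. Line'Last);
         else
            return "";
         end if;
      end Field;
   begin
      while not End_Of_File loop
         declare
            Line : constant String := Get_Line;
         begin
            Put_Line (Field (Line, 1) & "  " & Field (Line, 2));
         end;
      end loop;
   end TSV_Demo;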

> For program-to-program communication, there really are only two
> sensible options. If both ends are under your control, then using a
> binary format (with versioning and error detection if needed) is
> preferable, because it has the least overhead and there is no need
> for data conversion.

Yes.  But this doesn't handle b), c) and d).

> OTOH, if the performance of the connection isn't critical, then
> using a well-known standard format that already has needed tools for
> it seems like the best option. Even if you don't currently need to
> allow access by other systems, you're leaving the door open for
> future programs outside your system to use the data.

And which formats, besides tabulator separated text files, handle the
requirements?  XML doesn't handle b), c), d) and e).

> The cases that are neither of these and thus would make sense to use
> some internal, non-portable text format are essentially
> non-existent.

I think I have one of these "essentially non-existent" cases.  And
almost everything I do seems to be one of those cases.

> Note that human readability of program-to-program data is a
> non-issue.

You're apparently working in a very different area than I am.  Almost
all data going from one program to another should also be available in
a human-readable format.  My work is to look at data, not to program.
The programs are just written to process the data from one form into
another form - which hopefully can teach us something new and
interesting.

> Indeed, it is a mistake to try to bring that into the equation, as
> it adds a huge amount of overhead to the task. I've always used
> agile methods for debugging such data: if, in fact, I need to
> examine such a data stream, I write a program to display it. But I
> don't worry about that until/unless the need arises.

It seems that you're a programmer and not a researcher.  I am (almost)
always interested in the data.  I have yet to run into a case where I
wasn't interested in seeing the output of a program.

> It often does not arise, and even when it does, it's often not
> necessary to be able to display everything -- and it's often better
> to write a monitor for an interesting condition than to fill a disk
> with 10 GB of text!

I would spend all my time writing monitors that way.

> So, all in all, I think you're trying to solve the wrong problem
> (finding a way to write a specific file format), rather than using
> an appropriate file format for Ada programs (usually binary).

It may be a long-time bad habit to use tabulator separated text files
for (intermediate) analysis results from experiments, but I haven't
found a convincing argument yet. -- If I could auto-generate the
monitor and the conversion programs to the programs I interact with,
then I might be convinced, but I would still have to hack some type
checking on top of Ada.Sequential_IO.  And the program for
auto-generating the export to Gnuplot would practically be identical
to the one I asked for initially anyway.

> But, as a friend of mine likes to say, "do what you want, because
> you will anyway!". :-)

A clever friend. :-)

Jacob
-- 
"Hungh. You see! More bear. Yellow snow is always dead give-away."



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-30  3:02             ` Randy Brukardt
  2005-06-30 18:43               ` Jacob Sparre Andersen
@ 2005-06-30 19:24               ` Björn Persson
  2005-07-01  0:54                 ` Randy Brukardt
  1 sibling, 1 reply; 68+ messages in thread
From: Björn Persson @ 2005-06-30 19:24 UTC (permalink / raw)


Randy Brukardt wrote:
> OTOH, if the performance of the connection isn't critical, then using a
> well-known standard format that already has needed tools for it seems like
> the best option.

I consider text/tab-separated-values a standard format. Whether it's 
well-known is debatable. The definition is here:
http://www.iana.org/assignments/media-types/text/tab-separated-values

I'm not going to try to decide whether it's the right choice for Jacob.

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-30 19:24               ` Björn Persson
@ 2005-07-01  0:54                 ` Randy Brukardt
  2005-07-01 21:36                   ` TSV and CSV Björn Persson
  2005-07-02  0:07                   ` Data table text I/O package? Georg Bauhaus
  0 siblings, 2 replies; 68+ messages in thread
From: Randy Brukardt @ 2005-07-01  0:54 UTC (permalink / raw)



"Bj�rn Persson" <spam-away@nowhere.nil> wrote in message
news:bGXwe.141215$dP1.494536@newsc.telia.net...
> Randy Brukardt wrote:
> > OTOH, if the performance of the connection isn't critical, then using a
> > well-known standard format that already has needed tools for it seems
like
> > the best option.
>
> I consider text/tab-separated-values a standard format. Whether it's
> well-known is debatable. The definition is here:
> http://www.iana.org/assignments/media-types/text/tab-separated-values

Never heard of this one. It seems like the world's worst choice for a file
format, since the first thing any decent text tool will do is discard any
tabs. I'm amazed that anyone would actually standardize such junk. I'm much
more familiar with CSV files (which also seemed pretty silly to me, but I
kinda think the entire data-in-a-text file thing is pretty silly).

                    Randy.








^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-06-30 18:43               ` Jacob Sparre Andersen
@ 2005-07-01  1:22                 ` Randy Brukardt
  2005-07-01  3:01                   ` Alexander E. Kopilovich
  0 siblings, 1 reply; 68+ messages in thread
From: Randy Brukardt @ 2005-07-01  1:22 UTC (permalink / raw)


"Jacob Sparre Andersen" <sparre@nbi.dk> wrote in message
news:m2br5nd6sk.fsf@hugin.crs4.it...
replying to me:

...
> I thought I had specified my needs.  But in case I forgot:
>
>  a) A format for storing experimental data in tabular form.
>
>  b) A format I easily can manipulate with my standard Unix toolbox.
>
>  c) A format I easily can read and get an overview of (sections of)
>     the data.
>
>  d) A format that easily can be imported into programs I'm not in
>     control of.  (concrete examples are Gnuplot, R, OOo Calc and
>     Excel)
>
>  e) A format I easily can read and write from my own programs.
>
> Tabulator separated text files handle this quite fine (although OOo
> and Excel users have to be careful about their number format settings
> when they import the files).

Perhaps. But it's your "needs" that I question. (b) for instance doesn't
really buy anything, as you can't do any *real* data transformations that
way. Sure, you can add or delete a column, but that's trivial to code in the
unusual case that you need it. And in about the same time that a text
processing tool could do that job.

As far as (c) goes, I don't believe that mixing human output with data
storage/transmission is a good idea. Period.

So that leaves us with (a), (d), and (e). [Certainly real requirements.]

> > For program-to-program communication, there really are only two
> > sensible options. If both ends are under your control, then using a
> > binary format (with versioning and error detection if needed) is
> > preferable, because it has the least overhead and there is no need
> > for data conversion.
>
> Yes.  But this doesn't handle b), c) and d).

Of course it doesn't handle (d) [because (d) violates the premise]. And as
mentioned above, I don't think (b) and (c) should even be goals.

> > OTOH, if the performance of the connection isn't critical, then
> > using a well-known standard format that already has needed tools for
> > it seems like the best option. Even if you don't currently need to
> > allow access by other systems, you're leaving the door open for
> > future programs outside your system to use the data.
>
> And which formats, besides tabulator separated text files, handle the
> requirements?  XML doesn't handle b), c), d) and e).

Certainly (e) is handled by using tools like XMLOUT. (It can't be much
harder to write than HTML, which is trivial.) I'd be surprised if most
modern tools that can handle CSV couldn't handle a similar XML file.
(Certainly Excel can read XML files.) And I don't want to sound like a
broken record about (b) and (c).

> > The cases that are neither of these and thus would make sense to use
> > some internal, non-portable text format are essentially non-existent.
>
> I think I have one of these "essentially non-existent" cases.  And
> almost everything I do seems to be one of those cases.

Could be, but I think it is because you have a bogus set of requirements.

> > Note that human readability of program-to-program data is a
> > non-issue.
>
> You're apparently working in a very different area than I am.  Almost
> all data going from one program to another should also be available in
> a human-readable format.  My work is to look at data, not to program.
> The programs are just written to process the data from one form into
> another form - which hopefully can teach us something new and
> interesting.

I hate to split hairs, but I think your job is to analyze data, not to "look
at data". If there is enough data to make sense processing it with a
program, there is little point in looking at it manually. You had mentioned
a large data set (50 MB?) earlier; I hope you're looking at the analysis,
not at the data. I hardly ever look at raw web logs (the closest analog I
have); I use a program and look at the results of its analysis.

Truthfully, if what you described above is true, you probably ought to be
programming in Perl (ugh) or Python. Because Ada's text processing is its
weak link, and it makes little sense to write any significant amount of text
processing code in Ada. (I say that, despite the fact that I do exactly
that -- but that's because I use Ada for everything that I can't do with a
simple batch file.)

> > Indeed, it is a mistake to try to bring that into the equation, as
> > it adds a huge amount of overhead to the task. I've always used
> > agile methods for debugging such data: if, in fact, I need to
> examine such a data stream, I write a program to display it. But I
> > don't worry about that until/unless the need arises.
>
> It seems that you're a programmer and not a researcher.  I am (almost)
> always interested in the data.  I have yet to run into a case where I
> wasn't interested in seeing the output of a program.

Sure, but the output of the program is an analysis of the data, not some raw
(and huge) data stream.

> > It often does not arise, and even when it does, it's often not
> > necessary to be able to display everything -- and it's often better
> > to write a monitor for an interesting condition than filling a disk
> > with 10 GB of text!
>
> I would spend all my time writing monitors that way.

Yes, formatting input/results usefully for humans is the hard part of
programming. Documentation, GUI input/output, and log files (that is, the
stuff for humans) take approximately 4 times as much time to create as
the actual filtering code in our spam filter. For our compiler (which needs
little documentation or specialized I/O), it was always much less, but it
still is a significant part (perhaps as much as half) of the effort. The
other tools (CLAW, the web log analyzer, the web server, etc.) have all
fallen somewhere in between those extremes -- but that's the real job that
we get paid for (because it's not fun and not interesting -- someone will do
the fun and interesting stuff for free, but not the hard work, most of the
time anyway).

Like I said before, your mileage may differ. If you're stuck with lame tools
that can't process a sane data format, it might make sense to use some junk
text format to match it. (I'd rather get better tools, but I realize that
isn't always possible.) But I'd hardly expect any help in creating such
stuff.

                       Randy.






^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-07-01  1:22                 ` Randy Brukardt
@ 2005-07-01  3:01                   ` Alexander E. Kopilovich
  2005-07-01  5:59                     ` Jeffrey Carter
  2005-07-02  1:54                     ` Randy Brukardt
  0 siblings, 2 replies; 68+ messages in thread
From: Alexander E. Kopilovich @ 2005-07-01  3:01 UTC (permalink / raw)
  To: comp.lang.ada

Randy Brukardt wrote:

> If there is enough data to make sense processing it with a
> program, there is little point in looking at it manually.

It would be fine if you said "then I see" instead of "there is".
How do you know what is there, as you aren't a scientist but a software
engineer, regardless of your professional skills in your domain?

You obviously don't like data very much, but for a scientist that scientific
data (often including raw experimental data) is one of the most valuable
things. It certainly deserves an attentive look (at least, from time to
time), not just a bureaucratic "analysis".

> Truthfully, if what you described above is true, you probably ought to be
> programming in Perl (ugh) or Python. Because Ada's text processing is its
> weak link, and it makes little sense to write any significant amount of text
> processing code in Ada.

It would be interesting to hear a reply from Robert Dewar to this opinion
about text processing capabilities of Ada -:). Actually, serious text
processing is perfectly possible with Ada, and in fact Ada is more suitable
for it than Perl. Ada is unsuitable for quick scripting (especially by a
novice), but that is true for all application domains: it holds for
numerical computations as well as for text processing.




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-07-01  3:01                   ` Alexander E. Kopilovich
@ 2005-07-01  5:59                     ` Jeffrey Carter
  2005-07-02  1:54                     ` Randy Brukardt
  1 sibling, 0 replies; 68+ messages in thread
From: Jeffrey Carter @ 2005-07-01  5:59 UTC (permalink / raw)


Alexander E. Kopilovich wrote:
> 
> You obviously don't like data very much, but for a scientist that scientific
> data (often including raw experimental data) is one of the most valuable
> things. It certainly deserves an attentive look (at least, from time to time),
> not just a bureaucratic "analysis".

I developed SW for a researcher for a while once. What I found 
interesting was that what I, as a SW engineer, wanted to hide was 
usually what he wanted to see.

-- 
Jeff Carter
"We call your door-opening request a silly thing."
Monty Python & the Holy Grail
17



^ permalink raw reply	[flat|nested] 68+ messages in thread

* TSV and CSV
  2005-07-01  0:54                 ` Randy Brukardt
@ 2005-07-01 21:36                   ` Björn Persson
  2005-07-01 22:08                     ` Martin Dowie
  2005-07-02  0:07                   ` Data table text I/O package? Georg Bauhaus
  1 sibling, 1 reply; 68+ messages in thread
From: Björn Persson @ 2005-07-01 21:36 UTC (permalink / raw)


Randy Brukardt wrote:
> "Bj�rn Persson" <spam-away@nowhere.nil> wrote in message
> news:bGXwe.141215$dP1.494536@newsc.telia.net...
>>I consider text/tab-separated-values a standard format. Whether it's
>>well-known is debatable. The definition is here:
>>http://www.iana.org/assignments/media-types/text/tab-separated-values
> 
> Never heard of this one. It seems like the world's worst choice for a file
> format, since the first thing any decent text tool will do is discard any
> tabs.

Well, such a tool isn't the right tool for manipulating TSV files. As 
always, use the right tool for the job.

The only tool I can think of right now that discards tabs is a web 
browser, and that's when it thinks the content type is text/html.

> I'm much more familiar with CSV files

CSV works as long as there are no commas in the data fields, but commas 
can occur in text fields, and comma is also the decimal sign in large 
parts of the world (and preferred in ISO documents, I hear). TSV works 
in these cases, as there's usually no need to allow tabs in the fields. 
(If you find that you want tabs inside the data fields, then it's 
probably time to look for a more sophisticated file format -- perhaps XML
based.)
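
Splitting a TSV line is also trivial; a minimal, illustrative sketch in
Ada (one possible approach, not code from any of the programs discussed
here), handing each field to a callback:

   --  Illustrative sketch only.
   generic
      with procedure Handle_Field (Field : String);
   procedure For_Each_TSV_Field (Line : String);

   with Ada.Strings.Fixed;
   with Ada.Characters.Latin_1;
   procedure For_Each_TSV_Field (Line : String) is
      Tab   : constant String := (1 => Ada.Characters.Latin_1.HT);
      First : Positive := Line'First;
      Cut   : Natural;
   begin
      loop
         Cut := Ada.Strings.Fixed.Index (Line (First .. Line'Last), Tab);
         exit when Cut = 0;
         Handle_Field (Line (First .. Cut - 1));
         First := Cut + 1;
      end loop;
      Handle_Field (Line (First .. Line'Last));  --  last (or only) field
   end For_Each_TSV_Field;

For example, "procedure Dump is new For_Each_TSV_Field
(Ada.Text_IO.Put_Line);" prints one field per line.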

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: TSV and CSV
  2005-07-01 21:36                   ` TSV and CSV Björn Persson
@ 2005-07-01 22:08                     ` Martin Dowie
  2005-07-02  0:05                       ` Georg Bauhaus
  0 siblings, 1 reply; 68+ messages in thread
From: Martin Dowie @ 2005-07-01 22:08 UTC (permalink / raw)


Björn Persson wrote:
> CSV works as long as there are no commas in the data fields, but commas 
> can occur in text fields, and comma is also the decimal sign in large 
> parts of the world (and preferred in ISO documents, I hear). TSV works 
> in these cases, as there's usually no need to allow tabs in the fields. 
> (If you find that you want tabs inside the data fields, then it's 
> probably time to look for a more sophisticated file format -- perhaps XML
> based.)

If you want commas in the data fields, simply wrap the data fields in 
quotes, e.g.

"1","alpha, beta, gamma","foo"

-- Martin



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: TSV and CSV
  2005-07-01 22:08                     ` Martin Dowie
@ 2005-07-02  0:05                       ` Georg Bauhaus
  2005-07-02  1:10                         ` Randy Brukardt
  0 siblings, 1 reply; 68+ messages in thread
From: Georg Bauhaus @ 2005-07-02  0:05 UTC (permalink / raw)


Martin Dowie wrote:

> If you want commas in the data fields, simply wrap the data fields in 
> quotes, e.g.
> 
> "1","alpha, beta, gamma","foo"

You can't be seriously suggesting this?
"If you want quotes in the fields..."


-- Georg



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-07-01  0:54                 ` Randy Brukardt
  2005-07-01 21:36                   ` TSV and CSV Björn Persson
@ 2005-07-02  0:07                   ` Georg Bauhaus
  2005-07-02  1:21                     ` Randy Brukardt
  1 sibling, 1 reply; 68+ messages in thread
From: Georg Bauhaus @ 2005-07-02  0:07 UTC (permalink / raw)


Randy Brukardt wrote:

> I
> kinda think the entire data-in-a-text file thing is pretty silly).

It's not that silly when your data is actually text, though.


-- Georg



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: TSV and CSV
  2005-07-02  0:05                       ` Georg Bauhaus
@ 2005-07-02  1:10                         ` Randy Brukardt
  2005-07-02  1:20                           ` Ed
  2005-07-03  9:08                           ` Georg Bauhaus
  0 siblings, 2 replies; 68+ messages in thread
From: Randy Brukardt @ 2005-07-02  1:10 UTC (permalink / raw)


"Georg Bauhaus" <bauhaus@futureapps.de> wrote in message
news:42c5e46e$0$10818$9b4e6d93@newsread4.arcor-online.net...
> Martin Dowie wrote:
>
> > If you want commas in the data fields, simply wrap the data fields in
> > quotes, e.g.
> >
> > "1","alpha, beta, gamma","foo"
>
> You can't be seriously sugggesting this?

Of course he's seriously suggesting this, it's how these files work.

> "If you want quotes in the fields..."

You escape them, I forget how. (There is a standard for CSV files.) Same as
you would do in Ada or any other language. There's no place that you can put
quotes without some sort of escape (I usually use the Ada syntax if I'm
inventing my own) if you plan to read them afterwards.

But in any case, if your data is at all complex, you are going to need
complex reading/writing to handle it. The original query was about tables of
numbers, not random sequences of characters. Pretty much any format can be
made to work for that.
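
The usual convention, for the record, is to double any embedded quote
inside a quoted field. A minimal, illustrative sketch in Ada (not
production code):

   --  Illustrative only: quote one CSV field, doubling embedded quotes.
   function Quoted (Field : String) return String is
      Result : String (1 .. 2 + 2 * Field'Length);
      Last   : Natural := 0;
   begin
      Last := Last + 1;  Result (Last) := '"';
      for I in Field'Range loop
         if Field (I) = '"' then
            Last := Last + 1;  Result (Last) := '"';
         end if;
         Last := Last + 1;  Result (Last) := Field (I);
      end loop;
      Last := Last + 1;  Result (Last) := '"';
      return Result (1 .. Last);
   end Quoted;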

                   Randy.







^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: TSV and CSV
  2005-07-02  1:10                         ` Randy Brukardt
@ 2005-07-02  1:20                           ` Ed
  2005-07-03  9:08                           ` Georg Bauhaus
  1 sibling, 0 replies; 68+ messages in thread
From: Ed @ 2005-07-02  1:20 UTC (permalink / raw)


On 01/07/2005 6:10 PM, Randy Brukardt wrote:
> You escape them, I forget how. (There is a standard for CSV files.) Same as

http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm has a good 
description of the file format.

Ed.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-07-02  0:07                   ` Data table text I/O package? Georg Bauhaus
@ 2005-07-02  1:21                     ` Randy Brukardt
  0 siblings, 0 replies; 68+ messages in thread
From: Randy Brukardt @ 2005-07-02  1:21 UTC (permalink / raw)


"Georg Bauhaus" <bauhaus@futureapps.de> wrote in message
news:42c5e4de$0$10818$9b4e6d93@newsread4.arcor-online.net...
> Randy Brukardt wrote:
>
> > I
> > kinda think the entire data-in-a-text file thing is pretty silly).
>
> It's not that silly when your data is actually text, though.

That would be one of the weird special cases. It's not that unusual to have
some text components in your data, but for *all* of the data to be text is
pretty rare. For instance, in a web log, the URL is text, but most of the
other components (access time, result code, IP address) are really values of
one sort or another. They can be represented as text, but any significant
manipulation ought to be done on the value (strongly typed, if you're using
Ada).
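
For instance (illustrative only; these type names are invented, not
taken from any real code), a strongly typed log entry might be declared
as:

   with Ada.Calendar;
   with Ada.Strings.Unbounded;
   package Web_Logs is
      type Result_Code is range 100 .. 599;
      type Octet       is range 0 .. 255;
      type IP_Address  is array (1 .. 4) of Octet;
      type Log_Entry is record
         Access_Time : Ada.Calendar.Time;
         Result      : Result_Code;
         Peer        : IP_Address;
         URL         : Ada.Strings.Unbounded.Unbounded_String;
      end record;
   end Web_Logs;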

And I generally don't think of raw text (like an e-mail message) as data.
Text and data are usually different things. You're welcome to view text as
data if you want, but that's not at all the sorts of applications that I've
been thinking about here.

                              Randy.






^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-07-01  3:01                   ` Alexander E. Kopilovich
  2005-07-01  5:59                     ` Jeffrey Carter
@ 2005-07-02  1:54                     ` Randy Brukardt
  2005-07-02 10:24                       ` Dmitry A. Kazakov
  1 sibling, 1 reply; 68+ messages in thread
From: Randy Brukardt @ 2005-07-02  1:54 UTC (permalink / raw)


"Alexander E. Kopilovich" <aek@VB1162.spb.edu> wrote in message
news:mailman.122.1120188122.17633.comp.lang.ada@ada-france.org...
> Randy Brukardt wrote:
...
> You obviously don't like data very much, but for a scientist that scientific
> data (often including raw experimental data) is one of the most valuable
> things. It certainly deserves an attentive look (at least, from time to time),
> not just a bureaucratic "analysis".

I'm not speaking about data in general (that would be silly), but about it
in the context of Ada programming. (Or have you forgotten the purpose of
this newsgroup?)

It makes perfect sense to look at raw data if you don't know what to analyze
for and you need to find some patterns to give some insight. I suppose there
is an amount of idle curiosity, too (certainly that happens to me in
these sorts of circumstances -- that's why I might look at web logs or the
results of a game analysis). But I hardly think it makes sense to design
software based on idle curiosity.

And if you don't know what you are analyzing for, Ada is hardly the
programming language to be using. (Unless you're a hard-core Ada nut [a
category that I qualify in]; but then you hardly need advice from this
group.) You need a much more dynamic language, perhaps even those Unix
filters. It's quite possible that Jacob shouldn't be using Ada at all for his
tasks, and thus he's trying to fit a square peg into a round hole.

> > Truthfully, if what you described above is true, you probably ought to be
> > programming in Perl (ugh) or Python. Because Ada's text processing is its
> > weak link, and it makes little sense to write any significant amount of
> > text processing code in Ada.
>
> It would be interesting to hear a reply from Robert Dewar to this opinion
> about text processing capabilities of Ada -:). Actually, serious text
> processing is perfectly possible with Ada, and in fact Ada is more suitable
> for it than Perl. Ada is unsuitable for quick scripting (especially by a
> novice), but that is true for all application domains: it holds for
> numerical computations as well as for text processing.

Certainly, serious text processing is *possible* in Ada. (My Trash Finder
spam filter certainly is an extensive text processing application!!) And of
course, the benefits of Ada do apply (particularly type checking and good
runtime checks). But, Ada text processing code is just painful to write, and
it's quite hard to read. That's true no matter whether you use plain strings
or unbounded strings.

One of my original intents with TF was to show a good example of Ada code to
non-Ada programmers. But the code got so long-winded that I gave up on that
idea fairly early on. Moreover, the standard routines in
Ada.Strings.Unbounded were just not fast enough in some cases, and I had to
write special routines that understand the internal representation of an
unbounded string. Yuck. (Ada 200Y will help this a bit, at least the
searching has been improved.)

For instance, there isn't a way to search for an unbounded string in another
unbounded string. [TF puts pretty much everything into lists of unbounded
strings, because it's impossible to predict what sort of string lengths
items will have.] You have to use To_String to convert to a regular string,
which is ugly (especially without use clauses):

     if Ada.Strings.Unbounded.Index
          (Ada.Strings.Unbounded.Translate
             (Current.Line, Ada.Strings.Maps.Constants.Lower_Case_Map),
           Ada.Strings.Unbounded.To_String (Pattern.Line)) /= 0
     then

Even with a use clause for Ada.Strings.Unbounded (in which case you can't
have one for Ada.Strings.Fixed, else things get very ambiguous):
     if Index (Translate (Current.Line,
                          Ada.Strings.Maps.Constants.Lower_Case_Map),
               To_String (Pattern.Line)) /= 0
     then

So, it's possible to write this sort of code in Ada, and get decent
performance, too, but the result isn't particularly readable,
understandable, or maintainable. It's a lot easier to write this in Perl,
although the result would probably be a bit harder to maintain. Not having
used Python, I can't say for sure, but I'd certainly hope that it would be
easier than this to write (and read!) something simple like a
case-insensitive search for a pattern.
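
One mitigation (a sketch, not code taken from TF) is to hide the noise
in a small helper, so at least the call sites stay readable; as in the
fragment above, the pattern is assumed to be stored in lower case
already:

   function Contains_Caseless
     (Source, Pattern : Ada.Strings.Unbounded.Unbounded_String)
      return Boolean is
   begin
      --  Sketch only; Pattern is assumed to already be lower case.
      return Ada.Strings.Unbounded.Index
               (Ada.Strings.Unbounded.Translate
                  (Source, Ada.Strings.Maps.Constants.Lower_Case_Map),
                Ada.Strings.Unbounded.To_String (Pattern)) /= 0;
   end Contains_Caseless;

   --  ... if Contains_Caseless (Current.Line, Pattern.Line) then ...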

If I had used regular strings, the complexity would have been about the
same, just in different places. (In hindsight, I probably wouldn't have used
unbounded strings at all, they just didn't buy enough simplification.)

So, I stand by my statements. There are more than 8,000 lines of text
processing code in TF, all of which looks like this. And all I can say is
that I certainly hope that there is a better way somewhere, even though such
a way isn't really possible for Ada.

                     Randy.






^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-07-02  1:54                     ` Randy Brukardt
@ 2005-07-02 10:24                       ` Dmitry A. Kazakov
  2005-07-06 22:04                         ` Randy Brukardt
  0 siblings, 1 reply; 68+ messages in thread
From: Dmitry A. Kazakov @ 2005-07-02 10:24 UTC (permalink / raw)


On Fri, 1 Jul 2005 20:54:18 -0500, Randy Brukardt wrote:

> And if you don't know what you are analyzing for, Ada is hardly the
> programming language to be using. (Unless you're a hard-core Ada nut [a
> category that I qualify in]; but then you hardly need advice from this
> group.) You need a much more dynamic language, perhaps even those Unix
> filters.

I think it depends. I have quite the opposite experience. I'm lazy and always
start by writing a UNIX script. After a couple of hours fighting with that
mess I notice (usually too late) that writing it in Ada (or even in ANSI C)
would have taken half the time.

> For instance, there isn't a way to search for an unbounded string in another
> unbounded string. [TF puts pretty much everything into lists of unbounded
> strings, because it's impossible to predict what sort of string lengths
> items will have.] You have to use To_String to convert to a regular string,
> which is ugly (especially without use clauses):

Yes.

All built-in string types should have a common ancestor.
Ada.Strings.Unbounded was and remains an ugly hack.
 
>      if Ada.Strings.Unbounded.Index (Ada.Strings.Unbounded.Translate
> (Current.Line, Ada.Strings.Maps.Constants.Lower_Case_Map),
> Ada.Strings.Unbounded.To_String (Pattern.Line)) /= 0 then

I'm using a table of tokens instead. The string is matched against the
table for the longest token that matches. And I always use an anchored
search. I tend to do everything in one pass, and Ada fits well here.
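
In sketch form (simplified, not the production code): an anchored,
longest-match lookup against a fixed token table at a given position in
the input:

   type String_Access is access constant String;
   type Token_Table   is array (Positive range <>) of String_Access;

   --  Returns the index in Tokens of the longest token that matches
   --  Source starting exactly at From, or 0 if none matches.
   function Longest_Match (Source : String;
                           From   : Positive;
                           Tokens : Token_Table) return Natural is
      Best : Natural := 0;
      Len  : Natural := 0;
   begin
      for T in Tokens'Range loop
         declare
            Tok : String renames Tokens (T).all;
         begin
            if Tok'Length > Len
              and then From + Tok'Length - 1 <= Source'Last
              and then Source (From .. From + Tok'Length - 1) = Tok
            then
               Best := T;
               Len  := Tok'Length;
            end if;
         end;
      end loop;
      return Best;
   end Longest_Match;

   --  Example use:
   --     If_Tok  : aliased constant String := "if";
   --     End_Tok : aliased constant String := "end";
   --     Table   : constant Token_Table := (If_Tok'Access, End_Tok'Access);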

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: TSV and CSV
  2005-07-02  1:10                         ` Randy Brukardt
  2005-07-02  1:20                           ` Ed
@ 2005-07-03  9:08                           ` Georg Bauhaus
  1 sibling, 0 replies; 68+ messages in thread
From: Georg Bauhaus @ 2005-07-03  9:08 UTC (permalink / raw)


Randy Brukardt wrote:
> "Georg Bauhaus" <bauhaus@futureapps.de> wrote in message
> news:42c5e46e$0$10818$9b4e6d93@newsread4.arcor-online.net...
> 
>>Martin Dowie wrote:
>>
>>
>>>If you want commas in the data fields, simply wrap the data fields in
>>>quotes, e.g.
>>>
>>>"1","alpha, beta, gamma","foo"
>>
>>You can't be seriously suggesting this?

I was addressing the "simply" in the sentence above about wrapping
the data fields, because it only shifts the problem to the next
escaping level, which you then mentioned.
  It's there that the problems usually start:
"simply do this, and, uhm, that, and, oh, I forgot you should...".
Bottom line: we don't have standardised CSV document types.

Even considering the CSV description Ed has mentioned,
with all its buts and don'ts, which speak for themselves...
In fact, they repeat some of the input to the XML design
discussion, which led to a standard.

To be sure, it is easy to think of a (one) set of rules
for producing good CSV data. However, as with Ada programs,
producing them is far less important than using them later,
from a consumption point of view -- at least if you care about
the recipients at all. When reading CSV data, you can think of
more than one set of rules, in sharp contrast to just one when
producing CSV data.

One average CSV stream we read contains no line breaks,
probably for reasons of transmission speed.
As if this weren't enough (excuse: "simply" count fields),
some fields can *contain* non-escaped separators (excuse:
"simply" inspect the context to find out whether the comma is
actually a separator...).

It is rare that I have been given a CSV file/stream to process
together with a clear description. (So maybe I'm biased.)
The streams have almost always had some hack or some
"cleverness" in them. I believe that a standardised data
format helps, in practice, to reduce undocumented hacks and
cleverness. One such format type can be based on XML.


> Of course he's seriously suggesting this, it's how these files work.

This is how these files *should* work, ideally. As you can see
on http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm#FileFormat,
you still have to climb up a decision tree and visit this or that
branch in order to parse CSV data in a reliable fashion,
unless you know exactly how they are produced.

All in all you end up with:

>  Pretty much any format can be
> made to work for that.

...provided you sort of reinvent the markup rules and wheels.
And disregard your own advice to use a really standardised
format (in applications not all under your control.) ;-)


-- Georg



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: Data table text I/O package?
  2005-07-02 10:24                       ` Dmitry A. Kazakov
@ 2005-07-06 22:04                         ` Randy Brukardt
  0 siblings, 0 replies; 68+ messages in thread
From: Randy Brukardt @ 2005-07-06 22:04 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message
news:1vlfc01w9jkzj$.k4rp7yhtuoj3$.dlg@40tude.net...
...
> >      if Ada.Strings.Unbounded.Index (Ada.Strings.Unbounded.Translate
> > (Current.Line, Ada.Strings.Maps.Constants.Lower_Case_Map),
> > Ada.Strings.Unbounded.To_String (Pattern.Line)) /= 0 then
>
> I'm using a table of tokens instead. The string is matched against the
> table for a longest token that matches. And I always use anchored search.
I
> tend to do everything in one pass and Ada fits here well.

That's actually what the above is doing: a single match in a list of
patterns. It usually is inside of a loop.

Some of the matching uses a special Match_Start routine which is cheaper
than Index; but of course that only works because I know how an unbounded
string works, and Ada lets me create a child to use that information.

I'm not certain what you mean by an "anchored search", but I don't expect
that to work too well on e-mail (which is just a mass of text). I do think
it wouldn't have been any harder to have used type String items here instead
of Unbounded_String. The only reason I used Unbounded_String was to see how
easy or hard it really was to use that package - I won't make that mistake
again.

                        Randy.







^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2005-07-06 22:04 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-06-15  9:57 Data table text I/O package? Jacob Sparre Andersen
2005-06-15 11:43 ` Preben Randhol
2005-06-15 13:35   ` Jacob Sparre Andersen
2005-06-15 14:12     ` Preben Randhol
2005-06-15 15:02       ` Jacob Sparre Andersen
2005-06-15 16:17         ` Preben Randhol
2005-06-15 16:58           ` Dmitry A. Kazakov
2005-06-15 17:30             ` Marius Amado Alves
2005-06-15 18:41               ` Dmitry A. Kazakov
2005-06-15 19:09                 ` Marius Amado Alves
2005-06-15 18:58         ` Randy Brukardt
2005-06-16  9:55           ` Jacob Sparre Andersen
2005-06-16 10:53             ` Marius Amado Alves
2005-06-16 12:24               ` Robert A Duff
2005-06-16 14:01               ` Georg Bauhaus
2005-06-16 12:27                 ` Dmitry A. Kazakov
2005-06-16 14:46                   ` Georg Bauhaus
2005-06-16 14:51                     ` Dmitry A. Kazakov
2005-06-20 11:19                       ` Georg Bauhaus
2005-06-20 11:39                         ` Dmitry A. Kazakov
2005-06-20 18:25                           ` Georg Bauhaus
2005-06-20 18:45                             ` Preben Randhol
2005-06-20 18:54                             ` Dmitry A. Kazakov
2005-06-21  9:24                               ` Georg Bauhaus
2005-06-21  9:52                                 ` Jacob Sparre Andersen
2005-06-21 11:10                                   ` Georg Bauhaus
2005-06-21 12:35                                     ` Jacob Sparre Andersen
2005-06-21 10:42                                 ` Dmitry A. Kazakov
2005-06-21 11:41                                   ` Georg Bauhaus
2005-06-21 12:44                                     ` Dmitry A. Kazakov
2005-06-21 21:01                                       ` Georg Bauhaus
2005-06-22 12:15                                         ` Dmitry A. Kazakov
2005-06-22 22:24                                           ` Georg Bauhaus
2005-06-23  9:03                                             ` Dmitry A. Kazakov
2005-06-23  9:47                                               ` Georg Bauhaus
2005-06-23 10:34                                                 ` Dmitry A. Kazakov
2005-06-23 11:37                                                   ` Georg Bauhaus
2005-06-23 12:59                                                     ` Dmitry A. Kazakov
2005-06-23 14:16                                               ` Marc A. Criley
2005-06-25 16:38                               ` Simon Wright
2005-06-16 13:26                 ` Marius Amado Alves
2005-06-16 18:10                   ` Georg Bauhaus
2005-06-30  3:02             ` Randy Brukardt
2005-06-30 18:43               ` Jacob Sparre Andersen
2005-07-01  1:22                 ` Randy Brukardt
2005-07-01  3:01                   ` Alexander E. Kopilovich
2005-07-01  5:59                     ` Jeffrey Carter
2005-07-02  1:54                     ` Randy Brukardt
2005-07-02 10:24                       ` Dmitry A. Kazakov
2005-07-06 22:04                         ` Randy Brukardt
2005-06-30 19:24               ` Björn Persson
2005-07-01  0:54                 ` Randy Brukardt
2005-07-01 21:36                   ` TSV and CSV Björn Persson
2005-07-01 22:08                     ` Martin Dowie
2005-07-02  0:05                       ` Georg Bauhaus
2005-07-02  1:10                         ` Randy Brukardt
2005-07-02  1:20                           ` Ed
2005-07-03  9:08                           ` Georg Bauhaus
2005-07-02  0:07                   ` Data table text I/O package? Georg Bauhaus
2005-07-02  1:21                     ` Randy Brukardt
     [not found]     ` <20050615141236.GA90053@pvv.org>
2005-06-15 15:40       ` Marius Amado Alves
2005-06-15 19:18         ` Oliver Kellogg
2005-06-17  9:02           ` Jacob Sparre Andersen
     [not found]       ` <7adf1648bb99ca2bb4055ed8e6e381f4@netcabo.pt>
2005-06-15 15:46         ` Preben Randhol
     [not found]         ` <20050615154640.GA1921@pvv.org>
2005-06-15 16:14           ` Marius Amado Alves
     [not found]           ` <f04ccd7efd67fe197cc14cda89340779@netcabo.pt>
2005-06-15 16:20             ` Preben Randhol
2005-06-15 19:30 ` Simon Wright
2005-06-15 22:40 ` Lionel Draghi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox