comp.lang.ada
From: "Randy Brukardt" <randy@rrsoftware.com>
Subject: Re: Data table text I/O package?
Date: Thu, 30 Jun 2005 20:22:10 -0500
Message-ID: <JvGdnW2UVu2VB1nfRVn-sQ@megapath.net>
In-Reply-To: m2br5nd6sk.fsf@hugin.crs4.it

"Jacob Sparre Andersen" <sparre@nbi.dk> wrote in message
news:m2br5nd6sk.fsf@hugin.crs4.it...
replying to me:

...
> I thought I had specified my needs.  But in case I forgot:
>
>  a) A format for storing experimental data in tabular form.
>
>  b) A format I easily can manipulate with my standard Unix toolbox.
>
>  c) A format I easily can read and get an overview of (sections of)
>     the data.
>
>  d) A format that easily can be imported into programs I'm not in
>     control of.  (concrete examples are Gnuplot, R, OOo Calc and
>     Excel)
>
>  e) A format I easily can read and write from my own programs.
>
> Tabulator separated text files handle this quite fine (although OOo
> and Excel users have to be careful about their number format settings
> when they import the files).

Perhaps. But it's your "needs" that I question. (b) for instance doesn't
really buy anything, as you can't do any *real* data transformations that
way. Sure, you can add or delete a column, but that's trivial to code in the
unusual case that you need it, and writing that code takes about as long as
getting a text processing tool to do the same job.
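
To make the "trivial to code" claim concrete, here is a minimal sketch in Python (chosen only for illustration; the function name `drop_column` and the sample data are hypothetical, not from the thread). It assumes plain tab-separated text with no quoting or embedded tabs.

```python
import csv
import io


def drop_column(tsv_text, index):
    """Return tsv_text with the column at position `index` removed from every row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    # Force "\n" line endings so the output matches typical Unix text files.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row[:index] + row[index + 1:])
    return out.getvalue()


# Hypothetical three-column table: time, x, y.
sample = "t\tx\ty\n0.0\t1.5\t2.5\n0.1\t1.6\t2.4\n"
trimmed = drop_column(sample, 1)  # remove the "x" column
```

The same job can be done with a standard text processing tool; the sketch only shows that a custom program for it is a few lines.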

As far as (c) goes, I don't believe that mixing human output with data
storage/transmission is a good idea. Period.

So that leaves us with (a), (d), and (e). [Certainly real requirements.]

> > For program-to-program communication, there really are only two
> > sensible options. If both ends are under your control, then using a
> > binary format (with versioning and error detection if needed) is
> > preferable, because it has the least overhead and there is no need
> > for data conversion.
>
> Yes.  But this doesn't handle b), c) and d).

Of course it doesn't handle (d) [because (d) violates the premise]. And as
mentioned above, I don't think (b) and (c) should even be goals.
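
As a rough illustration of "a binary format (with versioning and error detection)", the following Python sketch packs a table of three floating-point columns behind a small header and appends a CRC-32. The layout, the `DTBL` signature, and all names are assumptions made up for this example; no particular tool or existing format is implied.

```python
import struct
import zlib

MAGIC = b"DTBL"                   # hypothetical 4-byte file signature
VERSION = 1                       # bump whenever the record layout changes
HEADER = struct.Struct("<4sHI")   # signature, version, row count (little-endian)
ROW = struct.Struct("<3d")        # three float64 columns per row


def pack_table(rows):
    """Serialize rows as header + fixed-size records + trailing CRC-32."""
    body = HEADER.pack(MAGIC, VERSION, len(rows))
    body += b"".join(ROW.pack(*r) for r in rows)
    return body + struct.pack("<I", zlib.crc32(body))


def unpack_table(blob):
    """Check the CRC and version, then return the rows as a list of tuples."""
    body, (crc,) = blob[:-4], struct.unpack("<I", blob[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("checksum mismatch")
    magic, version, count = HEADER.unpack_from(body, 0)
    if magic != MAGIC or version != VERSION:
        raise ValueError("unknown format or version")
    return [ROW.unpack_from(body, HEADER.size + i * ROW.size) for i in range(count)]
```

The values are stored as raw doubles, so reading them back involves no text-to-number conversion, which is the "no need for data conversion" point in the quoted paragraph.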

> > OTOH, if the performance of the connection isn't critical, then
> > using a well-known standard format that already has needed tools for
> > it seems like the best option. Even if you don't currently need to
> > allow access by other systems, you're leaving the door open for
> > future programs outside your system to use the data.
>
> And which formats, besides tabulator separated text files, handle the
> requirements?  XML doesn't handle b), c), d) and e).

Certainly (e) is handled by using tools like XMLOUT. (It can't be much
harder to write than HTML, which is trivial.) I'd be surprised if most
modern tools that can handle CSV couldn't handle a similar XML file.
(Certainly Excel can read XML files.) And I don't want to sound like a
broken record about (b) and (c).
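
For a sense of how little code requirement (e) needs with XML, here is a sketch using Python's standard `xml.etree.ElementTree`. It does not use or imitate XMLOUT; the `table`/`row` element names and both function names are hypothetical, and column names are assumed to be valid XML element names.

```python
import xml.etree.ElementTree as ET


def table_to_xml(header, rows):
    """Write each row as a <row> element with one child element per column."""
    root = ET.Element("table")
    for values in rows:
        row = ET.SubElement(root, "row")
        for name, value in zip(header, values):
            ET.SubElement(row, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(text, header):
    """Read the table back; cell values are returned as strings."""
    root = ET.fromstring(text)
    return [[row.findtext(name) for name in header] for row in root.iter("row")]


doc = table_to_xml(["t", "x"], [[0.0, 1.5], [0.1, 1.6]])
back = xml_to_rows(doc, ["t", "x"])
```

Whether a given plotting or spreadsheet tool accepts this particular layout is a separate question that the sketch does not answer.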

> > The cases that are neither of these and thus would make sense to use
> > some internal, non-portable text format are essentially non-existent.
>
> I think I have one of these "essentially non-existent" cases.  And
> almost everything I do seems to be one of those cases.

Could be, but I think it is because you have a bogus set of requirements.

> > Note that human readability of program-to-program data is a
> > non-issue.
>
> You're apparently working in a very different area than I am.  Almost
> all data going from one program to another should also be available in
> a human-readable format.  My work is to look at data, not to program.
> The programs are just written to process the data from one form into
> another form - which hopefully can teach us something new and
> interesting.

I hate to split hairs, but I think your job is to analyze data, not to "look
at data". If there is enough data to make sense processing it with a
program, there is little point in looking at it manually. You had mentioned
a large data set (50 MB?) earlier; I hope you're looking at the analysis,
not at the data. I hardly ever look at raw web logs (the closest analog I
have); I use a program and look at the results of its analysis.

Truthfully, if what you described above is true, you probably ought to be
programming in Perl (ugh) or Python. Ada's text processing is its weak
link, and it makes little sense to write any significant amount of text
processing code in Ada. (I say that, despite the fact that I do exactly
that -- but that's because I use Ada for everything that I can't do with a
simple batch file.)

> > Indeed, it is a mistake to try to bring that into the equation, as
> > it adds a huge amount of overhead to the task. I've always used
> > agile methods for debugging such data: if, in fact, I need to
> > examine such a data stream, I'll write a program to display it. But I
> > don't worry about that until/unless the need arises.
>
> It seems that you're a programmer and not a researcher.  I am (almost)
> always interested in the data.  I have yet to run into a case where I
> wasn't interested in seeing the output of a program.

Sure, but the output of the program is an analysis of the data, not some raw
(and huge) data stream.

> > It often does not arise, and even when it does, it's often not
> > necessary to be able to display everything -- and it's often better
> > to write a monitor for an interesting condition than filling a disk
> > with 10 GB of text!
>
> I would spend all my time writing monitors that way.

Yes, formatting input/results usefully for humans is the hard part of
programming. Documentation, GUI input/output, and log files (that is, the
stuff for humans) take up approximately 4 times as much time to create as
the actual filter for our spam filter. For our compiler (which needs little
documentation or specialized I/O), it was always much less, but it still is
a significant part (perhaps as much as half) of the effort. The other tools
(CLAW, the web log analyzer, the web server, etc.) all have fallen somewhere
in between those extremes -- but that's the real job that we get paid for
(because it's not fun and not interesting -- someone will do the fun and
interesting stuff for free, but not the hard work, most of the time anyway).

Like I said before, your mileage may differ. If you're stuck with lame tools
that can't process a sane data format, it might make sense to use some junk
text format to match it. (I'd rather get better tools, but I realize that
isn't always possible.) But I'd hardly expect any help in creating such
stuff.

                       Randy.





