From: "Randy Brukardt"
Newsgroups: comp.lang.ada
References: <-pGdnVJqme2I_V7fRVn-qA@megapath.net>
Subject: Re: Data table text I/O package?
Date: Thu, 30 Jun 2005 20:22:10 -0500

"Jacob Sparre Andersen" wrote in message
news:m2br5nd6sk.fsf@hugin.crs4.it... replying to me:
...
> I thought I had specified my needs. But in case I forgot:
>
> a) A format for storing experimental data in tabular form.
>
> b) A format I easily can manipulate with my standard Unix toolbox.
>
> c) A format I easily can read and get an overview of (sections of)
> the data.
>
> d) A format that easily can be imported into programs I'm not in
> control of. (concrete examples are Gnuplot, R, OOo Calc and
> Excel)
>
> e) A format I easily can read and write from my own programs.
>
> Tabulator separated text files handle this quite fine (although OOo
> and Excel users have to be careful about their number format settings
> when they import the files).

Perhaps. But it's your "needs" that I question. (b) for instance doesn't
really buy anything, as you can't do any *real* data transformations that
way. Sure, you can add or delete a column, but that's trivial to code in the
unusual case that you need it, and coding it takes about as long as doing
the job with a text processing tool.

As far as (c) goes, I don't believe that mixing human output with data
storage/transmission is a good idea. Period.

So that leaves us with (a), (d), and (e). [Certainly real requirements.]

> > For program-to-program communication, there really are only two
> > sensible options. If both ends are under your control, then using a
> > binary format (with versioning and error detection if needed) is
> > preferable, because it has the least overhead and there is no need
> > for data conversion.
>
> Yes. But this doesn't handle b), c) and d).

Of course it doesn't handle (d) [because (d) violates the premise]. And as
mentioned above, I don't think (b) and (c) should even be goals.

> > OTOH, if the performance of the connection isn't critical, then
> > using a well-known standard format that already has needed tools for
> > it seems like the best option. Even if you don't currently need to
> > allow access by other systems, you're leaving the door open for
> > future programs outside your system to use the data.
>
> And which formats, besides tabulator separated text files, handle the
> requirements? XML doesn't handle b), c), d) and e).

Certainly (e) is handled by using tools like XMLOUT. (It can't be much
harder to write than HTML, which is trivial.)
I'd be surprised if most modern tools that can handle CSV couldn't handle a
similar XML file. (Certainly Excel can read XML files.) And I don't want to
sound like a broken record about (b) and (c).

> > The cases that are neither of these and thus would make sense to use
> > some internal, non-portable text format are essentially non-existent.
>
> I think I have one of these "essentially non-existent" cases. And
> almost everything I do seems to be one of those cases.

Could be, but I think it is because you have a bogus set of requirements.

> > Note that human readability of program-to-program data is a
> > non-issue.
>
> You're apparently working in a very different area than I am. Almost
> all data going from one program to another should also be available in
> a human-readable format. My work is to look at data, not to program.
> The programs are just written to process the data from one form into
> another form - which hopefully can teach us something new and
> interesting.

I hate to split hairs, but I think your job is to analyze data, not to "look
at data". If there is enough data that it makes sense to process it with a
program, there is little point in looking at it manually. You had mentioned a
large data set (50 MB?) earlier; I hope you're looking at the analysis, not
at the data. I hardly ever look at raw web logs (the closest analog I have);
I use a program and look at the results of its analysis.

Truthfully, if what you described above is true, you probably ought to be
programming in Perl (ugh) or Python, because Ada's text processing is its
weak link, and it makes little sense to write any significant amount of text
processing code in Ada. (I say that, despite the fact that I do exactly
that -- but that's because I use Ada for everything that I can't do with a
simple batch file.)

> > Indeed, it is a mistake to try to bring that into the equation, as
> > it adds a huge amount of overhead to the task.
> > I've always used
> > agile methods for debugging such data: if, in fact, I need to
> > examine such a data stream, I'll write a program to display it. But I
> > don't worry about that until/unless the need arises.
>
> It seems that you're a programmer and not a researcher. I am (almost)
> always interested in the data. I have yet to run into a case where I
> wasn't interested in seeing the output of a program.

Sure, but the output of the program is an analysis of the data, not some raw
(and huge) data stream.

> > It often does not arise, and even when it does, it's often not
> > necessary to be able to display everything -- and it's often better
> > to write a monitor for an interesting condition than filling a disk
> > with 10 GB of text!
>
> I would spend all my time writing monitors that way.

Yes, formatting input/results usefully for humans is the hard part of
programming. Documentation, GUI input/output, and log files (that is, the
stuff for humans) take approximately four times as long to create as the
actual filter in our spam filter. For our compiler (which needs little
documentation or specialized I/O), it was always much less, but it is still
a significant part (perhaps as much as half) of the effort. The other tools
(CLAW, the web log analyzer, the web server, etc.) have all fallen somewhere
in between those extremes -- but that's the real job that we get paid for
(because it's not fun and not interesting -- someone will do the fun and
interesting stuff for free, but not the hard work, most of the time anyway).

Like I said before, your mileage may differ. If you're stuck with lame tools
that can't process a sane data format, it might make sense to use some junk
text format to match them. (I'd rather get better tools, but I realize that
isn't always possible.) But I'd hardly expect any help in creating such
stuff.

                    Randy.
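
For illustration only: a minimal sketch of what a "binary format (with
versioning and error detection if needed)", as discussed above, might look
like. It is written in Python rather than Ada purely for brevity. The file
signature `TBL1`, the fixed three-column row layout, and the function names
are hypothetical choices made for this sketch, not anything taken from the
thread.

```python
import struct
import zlib

MAGIC = b"TBL1"                    # hypothetical file signature
VERSION = 1                        # bump whenever the row layout changes
HEADER = struct.Struct("<4sHI")    # signature, version, row count
ROW = struct.Struct("<3d")         # assumed: three float64 columns per row


def pack_table(rows):
    """Serialize the rows, then append a CRC-32 of everything before it."""
    body = HEADER.pack(MAGIC, VERSION, len(rows))
    body += b"".join(ROW.pack(*row) for row in rows)
    return body + struct.pack("<I", zlib.crc32(body))


def unpack_table(blob):
    """Check the CRC-32 and the version, then return the rows as tuples."""
    body = blob[:-4]
    (stored_crc,) = struct.unpack("<I", blob[-4:])
    if zlib.crc32(body) != stored_crc:
        raise ValueError("checksum mismatch: data is corrupt")
    magic, version, count = HEADER.unpack_from(body, 0)
    if magic != MAGIC or version != VERSION:
        raise ValueError("unknown signature or unsupported version")
    return [ROW.unpack_from(body, HEADER.size + i * ROW.size)
            for i in range(count)]
```

The version field lets a reader reject (or convert) files written with an
older layout, and the trailing checksum catches corruption before any row
is interpreted. Neither costs more than a few bytes per file.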