From: "Dmitry A. Kazakov"
Reply-To: mailbox@dmitry-kazakov.de
Subject: Re: Data table text I/O package?
Newsgroups: comp.lang.ada
Organization: cbb software GmbH
Date: Wed, 22 Jun 2005 14:15:14 +0200

On Tue, 21 Jun 2005 23:01:59 +0200, Georg Bauhaus wrote:

> Dmitry A. Kazakov wrote:
>
> Let me first guess that many here have their largely
> regular and homogeneous data in mind. I'm not talking
> about this. We went off from what to do if you
> don't have atomic, homogeneous, unambiguous data, sent
> around.
>
> 1) If you have a nice arrangement of exactly one set of
> array-like data of guaranteed quality, there is little
> to win by using XML.

OK, that is a big difference. Tables representing tree-like structures are
awful.

>> Sorry, but the thread's subject reads "Data table text I/O package".
>> Text = rendered data.
>
> Notice that the thread title has I/O. I/O can mean pretty printing,
> and it can mean a reliable and robust data input-output facility,
> working well in the face of erroneous input.

But for data exchange there are better techniques than XML. Even if you
mean [far-stretched] object brokering and active agents performed over a
stream of printable characters, even then I wouldn't take XML.

>>> The accuracy is well defined and most importantly,
>>> it is up to the application, yours and mine respectively.
>
>> This is a wrong approach, of course.
>
> There is no more accurate representation of 3.15 than the text "3.15",
> right under our noses. In a text data stream, tabular, XML, whatever.

The text "3.15" represents what? Everything, of course, depends on the OSI
layer we are talking about. (:-))

[...]

> The accuracy of the data may not be defined at all, IN THE DATA
> STREAM. (Then again, some people may try, adding a schema.)

Then you cannot talk about numbers being transferred. You said "3.15" is a
text. So let it be a text. "3.1 5" is also a text, as valid as "3.15" [at
this level of abstraction].
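To make the point concrete, here is a minimal sketch (the type Grade and
the procedure name are only illustrative, not code from any real program):
in Ada the accuracy that the text "3.15" ends up with is decided by the
type the reading application chooses, not by anything in the stream.

   with Ada.Text_IO;

   procedure Read_Value is
      --  The application, not the stream, decides that 0.01 is the
      --  accuracy of the value (an ordinary fixed-point type).
      type Grade is delta 0.01 range -1000.0 .. 1000.0;
      package Grade_IO is new Ada.Text_IO.Fixed_IO (Grade);

      Text  : constant String := "3.15";   --  what the stream carries
      Value : Grade;
      Last  : Positive;
   begin
      Grade_IO.Get (Text, Value, Last);    --  accuracy comes from Grade
      Ada.Text_IO.Put_Line (Grade'Image (Value));
   end Read_Value;

Read the very same four characters into a Float instead of Grade and you
get a different accuracy.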
BTW, again there are better ways to send texts than XML offers.

>> [3.1499, 3.1600]
>
> Well, someone will ask you, 'and what exactly is 3.1499?' on
> *our* machine?

3.1499 is the lower bound. So on your machine you can represent it by any
number less than or equal to 3.1499. You lose precision, but retain
correctness. The true value is always within the bounds. There is still a
problem, but a much lesser one. (A small sketch of this is in the P.S.
below.)

> Now consider separated key=value lines. They will be longer,
> but you can scan the line looking for the key strings. A big
> step up. XML isn't worse in my view.

Unfortunately, in our case it is not that simple. key=value does not help.
The problem is that the data need to be sorted and filtered using various
criteria. In other words, a value has more than one key. A relational DB
would probably help, but loading that amount of data would take too long.
So it ends up with a specialized tool chain, integrated diagnostics, etc.
BTW, 80% of that would probably be unnecessary if Ada were used! (:-)) But
the customer wished otherwise...

> A formatted table just isn't that robust.
> Consider the case where the headline
> gets lost. The missing redundancy will leave you with a
> puzzle, not a robust set of self-describing text.

It is a bad idea to try to correct I/O errors using syntax anyway. The
only relevant errors are the ones made by humans, and it is very unlikely
that somebody would forget to read a table header [I don't talk about
writing, because writing XML is beyond anybody's capability anyway].
Humans are unbeatable at pattern recognition; that is the whole idea
behind tables. Tab stops and lines are very easy patterns to detect, and
any error becomes immediately visible long before one inspects the table
contents.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
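P.S. A minimal sketch of the bounds idea above, assuming plain Float on
the receiving machine (illustrative only, the names are made up, and a
careful implementation would round the bounds outwards when converting
them):

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Interval_Demo is
      --  The sender transmits bounds instead of a single number:
      --  "3.15" becomes [3.1499, 3.1600] as in the example above.
      type Interval is record
         Lower : Float;
         Upper : Float;
      end record;

      Received : constant Interval := (Lower => 3.1499, Upper => 3.1600);

      --  The receiver may take any representable value within the
      --  bounds: precision is lost, correctness is kept, because the
      --  true value still lies inside [Lower, Upper].
      Local : constant Float := Received.Lower;
   begin
      Put_Line ("Local value:" & Float'Image (Local));
      Put_Line ("Bounds:     " & Float'Image (Received.Lower)
                & " .." & Float'Image (Received.Upper));
   end Interval_Demo;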