From: "Dmitry A. Kazakov"
Reply-To: mailbox@dmitry-kazakov.de
Subject: Re: Data table text I/O package?
Newsgroups: comp.lang.ada
Organization: cbb software GmbH
Date: Wed, 22 Jun 2005 14:15:14 +0200

On Tue, 21 Jun 2005 23:01:59 +0200, Georg Bauhaus wrote:

> Dmitry A. Kazakov wrote:
>
> Let me first guess that many here have their largely
> regular and homogeneous data in mind. I'm not talking
> about this. We went off from what to do if you
> don't have atomic, homogeneous, unambiguous data, sent
> around.
>
> 1) If you have a nice arrangement of exactly one set of
> array-like data of guaranteed quality, there is little
> to win by using XML.

OK, that is a big difference. Tables representing tree-like structures are
awful.

>> Sorry, but the thread's subject reads "Data table text I/O package".
>> Text = rendered data.
>
> Notice that the thread title has I/O. I/O can mean pretty printing,
> and it can mean a reliable and robust data input-output facility,
> working well in the face of erroneous input.

But for data exchange there are better techniques than XML. Even if you
mean [far-stretched] object brokering and active agents performed over a
stream of printable characters, even then I wouldn't take XML.

>>> The accuracy is well defined and most importantly,
>>> it is up to the application, yours and mine respectively.
>
>> This is a wrong approach, of course.
>
> There is no more accurate representation of 3.15 than the text "3.15",
> right under our noses. In a text data stream, tabular, XML, whatever.

The text "3.15" represents what? Everything, of course, depends on the OSI
layer we are talking about. (:-))

[...]

> The accuracy of the data may not be defined at all, IN THE DATA
> STREAM. (Then again, some people may try, adding a schema.)

Then you cannot talk about numbers being transferred. You said "3.15" is a
text. So let it be a text. "3.1 5" is also a text, as valid as "3.15" [at
this level of abstraction].
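To make the point concrete, here is a minimal sketch (the type Grade and
the procedure name are only illustrative, not code from any real program):
in Ada the accuracy that the text "3.15" ends up with is decided by the
type the reading application chooses, not by anything in the stream.

   with Ada.Text_IO;

   procedure Read_Value is
      --  The application, not the stream, decides that 0.01 is the
      --  accuracy of the value (an ordinary fixed-point type).
      type Grade is delta 0.01 range -1000.0 .. 1000.0;
      package Grade_IO is new Ada.Text_IO.Fixed_IO (Grade);

      Text  : constant String := "3.15";   --  what the stream carries
      Value : Grade;
      Last  : Positive;
   begin
      Grade_IO.Get (Text, Value, Last);    --  accuracy comes from Grade
      Ada.Text_IO.Put_Line (Grade'Image (Value));
   end Read_Value;

Read the very same four characters into a Float instead of Grade and you
get a different accuracy.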
BTW, again there are better ways to send texts than XML offers.

>> [3.1499, 3.1600]
>
> Well, someone will ask you, 'and what exactly is 3.1499?' on
> *our* machine?

3.1499 is the lower bound. So on your machine you can represent it by any
number less than or equal to 3.1499. You lose precision, but retain
correctness. The true value is always within the bounds. There is still a
problem, but a much lesser one. (A small sketch of this is in the P.S.
below.)

> Now consider separated key=value lines. They will be longer,
> but you can scan the line looking for the key strings. A big
> step up. XML isn't worse in my view.

Unfortunately, in our case it is not that simple. key=value does not help.
The problem is that the data need to be sorted and filtered using various
criteria. In other words, a value has more than one key. A relational DB
would probably help, but loading that amount of data would take too long.
So it ends up with a specialized tool chain, integrated diagnostics, etc.
BTW, 80% of that would probably be unnecessary if Ada were used! (:-)) But
the customer wished otherwise...

> A formatted table just isn't that robust.
> Consider the case where the headline
> gets lost. The missing redundancy will leave you with a
> puzzle, not a robust set of self-describing text.

It is a bad idea to try to correct I/O errors using syntax anyway. The
only relevant errors are the ones made by humans, and it is very unlikely
that somebody would forget to read a table header [I don't talk about
writing, because writing XML is beyond anybody's capability anyway].
Humans are unbeatable at pattern recognition; that is the whole idea
behind tables. Tab stops and lines are very easy patterns to detect, and
any error becomes immediately visible long before one inspects the table
contents.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
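P.S. A minimal sketch of the bounds idea above, assuming plain Float on
the receiving machine (illustrative only, the names are made up, and a
careful implementation would round the bounds outwards when converting
them):

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Interval_Demo is
      --  The sender transmits bounds instead of a single number:
      --  "3.15" becomes [3.1499, 3.1600] as in the example above.
      type Interval is record
         Lower : Float;
         Upper : Float;
      end record;

      Received : constant Interval := (Lower => 3.1499, Upper => 3.1600);

      --  The receiver may take any representable value within the
      --  bounds: precision is lost, correctness is kept, because the
      --  true value still lies inside [Lower, Upper].
      Local : constant Float := Received.Lower;
   begin
      Put_Line ("Local value:" & Float'Image (Local));
      Put_Line ("Bounds:     " & Float'Image (Received.Lower)
                & " .." & Float'Image (Received.Upper));
   end Interval_Demo;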