From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Thread: 103376,602331146257f418
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII-7-bit
Path: 
 g2news1.google.com!news1.google.com!proxad.net!proxad.net!newsfeed.arcor.de!news.arcor.de!not-for-mail
Date: Sun, 03 Jul 2005 11:08:54 +0200
From: Georg Bauhaus <bauhaus@futureapps.de>
User-Agent: Debian Thunderbird 1.0.2 (X11/20050331)
X-Accept-Language: en-us, en
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: TSV and CSV
References: <m2psuodjsz.fsf@hugin.crs4.it>
 <mailman.35.1118835822.17633.comp.lang.ada@ada-france.org>
 <m2ll5beoaj.fsf@hugin.crs4.it>
 <mailman.36.1118844777.17633.comp.lang.ada@ada-france.org>
 <m2hdfzek8i.fsf@hugin.crs4.it> <bcqdncc6ba5A5C3fRVn-sQ@megapath.net>
 <m2k6ku8w2s.fsf@hugin.crs4.it> <-pGdnVJqme2I_V7fRVn-qA@megapath.net>
 <bGXwe.141215$dP1.494536@newsc.telia.net>
 <9tOdnboZwYMIDlnfRVn-iw@megapath.net> <hHixe.28376$d5.181659@newsb.telia.net>
 <da4esd$6rb$1@nwrdmz02.dmz.ncs.ea.ibs-infra.bt.com>
 <42c5e46e$0$10818$9b4e6d93@newsread4.arcor-online.net>
 <z8SdnQAFJ8pRdVjfRVn-gw@megapath.net>
In-Reply-To: <z8SdnQAFJ8pRdVjfRVn-gw@megapath.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <42c7b52b$0$10804$9b4e6d93@newsread4.arcor-online.net>
Organization: Arcor
NNTP-Posting-Date: 03 Jul 2005 11:51:43 MEST
NNTP-Posting-Host: d9c36a73.newsread4.arcor-online.net
X-Trace: 
 DXC=Yb;8J1G[2Xea0B5i45NL;d:ejgIfPPlddjW\KbG]kaMhliQbn6H@_EiMOG>a8EXB;fhP3YJKgE\jlT9@o2<2gP;b
X-Complaints-To: abuse@arcor.de
Xref: g2news1.google.com comp.lang.ada:11838
Date: 2005-07-03T11:51:43+02:00
List-Id: <comp.lang.ada>

Randy Brukardt wrote:
> "Georg Bauhaus" <bauhaus@futureapps.de> wrote in message
> news:42c5e46e$0$10818$9b4e6d93@newsread4.arcor-online.net...
> 
>>Martin Dowie wrote:
>>
>>
>>>If you want commas in the data fields, simply wrap the data fields in
>>>quotes, e.g.
>>>
>>>"1","alpha, beta, gamma","foo"
>>
>>You can't be seriously suggesting this?

I was addressing the "simply" in the sentence above about wrapping
the data fields, because quoting only shifts the problem to the
next escaping level, as you then went on to mention.
That is where the problems usually start:
"simply do this, and, uhm, that, and, oh, I forgot you should...".
Bottom line: We don't have standardised CSV document types.
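
To illustrate the "next escaping level", here is a minimal sketch
in Python (not Ada, purely for brevity). It assumes the
doubled-quote convention that Python's csv module uses by default;
other producers may escape differently. The helper names write_row
and naive_split are made up for this example.

```python
import csv
import io

def write_row(fields):
    """Serialise one record with every field wrapped in quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()

def naive_split(line):
    """The 'simple' reader: drop the outer quotes, split on '","'."""
    return line.rstrip("\n").strip('"').split('","')

easy = ["1", "alpha, beta, gamma", "foo"]
hard = ["1", 'say "hi", then leave', "foo"]

# Commas alone are handled by the "simply wrap it in quotes" rule.
easy_ok = naive_split(write_row(easy)) == easy

# A quote inside a field needs a second rule (here: doubling), and
# the simple reader hands back the still-escaped text.
hard_naive = naive_split(write_row(hard))

# A reader that knows the same convention does round-trip the data.
hard_full = next(csv.reader([write_row(hard)]))
```

The simple reader works only as long as no field contains a quote.
After that, producer and consumer have to agree on one more rule,
and that rule is exactly what tends to be undocumented.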

That holds even for the CSV description Ed has mentioned,
with all its buts and don'ts, which speak for themselves.
In fact, they repeat some of the input to the XML design
discussion, which led to a standard.

Just to be clear: it is easy to think of a (one)
set of rules for producing good CSV data. However, as with
Ada programs, producing them is far less important than
using them later, at least if you care about the
recipients at all.
When producing CSV data you follow just one set of rules;
when reading it, you have to be prepared for more than one.
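
A small sketch of what "more than one set of rules" means for the
reader, again in Python. The sample line is invented; the two rule
sets are simply two settings of the standard csv module, not
anything taken from a particular producer.

```python
import csv

line = '1,"alpha, beta",3'

# Rule set A: double quotes delimit a field, so the comma inside
# them is data.
with_quotes = next(csv.reader([line]))

# Rule set B: quote characters are ordinary data, so every comma
# is a separator.
without_quotes = next(csv.reader([line], quoting=csv.QUOTE_NONE))

print(with_quotes)     # 3 fields
print(without_quotes)  # 4 fields, quotes left in the data
```

Both results are "correct" under their own rules. Without a
description from the producer, the consumer cannot tell which one
was meant.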

One average CSV stream we read contains no line breaks,
probably for reasons of transmission speed.
As if this weren't enough (excuse: "simply" count fields),
some fields can *contain* non-escaped separators (excuse:
"simply" inspect context to find out whether the comma is
actually a separator...).
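
A sketch of the "simply count fields" excuse, in Python. The
streams and the three-fields-per-record layout are invented for
illustration; records_by_count is a hypothetical helper, not code
from any real consumer.

```python
def records_by_count(stream, fields_per_record, sep=","):
    """Regroup a stream without line breaks into fixed-size records."""
    fields = stream.split(sep)
    if len(fields) % fields_per_record != 0:
        raise ValueError("field count does not fit the record size")
    return [
        fields[i:i + fields_per_record]
        for i in range(0, len(fields), fields_per_record)
    ]

# Clean data: two records of three fields each.
clean = records_by_count("1,alpha,foo,2,beta,bar", 3)

# One unescaped comma inside a field: at least this is detected.
try:
    records_by_count("1,alpha, beta,foo,2,gamma,bar", 3)
    detected = False
except ValueError:
    detected = True

# Three unescaped commas inside one field: the count fits again,
# and two records silently become three misaligned ones.
shifted = records_by_count(
    "1,alpha, beta, gamma, delta,foo,2,eps,bar", 3
)
```

The last case is the unpleasant one: nothing fails, the data is
just wrong.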

I have rarely been given a CSV file/stream to process
together with a clear description. (So maybe I'm biased.)
The streams have almost always had some hack or some
"cleverness" in them. I believe that a standardised data
format helps, in practice, to reduce undocumented hacks and
cleverness. One such format can be based on XML.
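
For comparison, a minimal sketch of the same kind of record in an
XML-based format, using Python's xml.etree.ElementTree. The element
names (records, record, field) are assumptions made up for this
example, not an existing document type.

```python
import xml.etree.ElementTree as ET

def to_xml(rows):
    """Write rows as <records><record><field>...</field></record></records>."""
    root = ET.Element("records")
    for row in rows:
        rec = ET.SubElement(root, "record")
        for value in row:
            ET.SubElement(rec, "field").text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Read the rows back; an empty element yields an empty string."""
    root = ET.fromstring(text)
    return [
        [field.text or "" for field in rec.findall("field")]
        for rec in root.findall("record")
    ]

rows = [
    ["1", 'say "hi", <then> leave & go', "foo"],
    ["2", "alpha, beta, gamma", ""],
]
round_trip = from_xml(to_xml(rows))
```

Commas and quotes need no special treatment here, and the
characters that do need escaping are handled by rules every XML
library already implements, so nobody has to reinvent them.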


> Of course he's seriously suggesting this, it's how these files work.

This is how these files *should* work, ideally. As you can see
on http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm#FileFormat,
you still have to climb up a decision tree and visit this or that
branch in order to parse CSV data in a reliable fashion,
unless you know exactly how they are produced.

All in all, you end up with:

>  Pretty much any format can be
> made to work for that.

...provided you more or less reinvent the markup rules and wheels,
and disregard your own advice to use a really standardised
format (in applications that are not entirely under your
control). ;-)


-- Georg