From mboxrd@z Thu Jan  1 00:00:00 1970
From: Natasha Kerensikova <lithiumcat@gmail.com>
Newsgroups: comp.lang.ada
Subject: Re: S-expression I/O in Ada
Date: Thu, 19 Aug 2010 07:42:02 +0000 (UTC)
Organization: A noiseless patient Spider
Message-ID: <slrni6po2a.dki.lithiumcat@sigil.instinctive.eu>
References: <547afa6b-731e-475f-a7f2-eaefefb25861@k8g2000prh.googlegroups.com>
 <slrni6lg3e.1efq.lithiumcat@sigil.instinctive.eu>
 <i4em7q$1pcu$1@adenine.netfront.net>
 <slrni6nem3.1efq.lithiumcat@sigil.instinctive.eu>
 <ebc7b61e-f12d-4d44-9463-0d6a4947fd19@l6g2000yqb.googlegroups.com>
 <slrni6nioh.1efq.lithiumcat@sigil.instinctive.eu>
 <5f5303d4-075f-48ec-bd9b-17c9052cadd6@k10g2000yqa.googlegroups.com>
 <slrni6npj7.1efq.lithiumcat@sigil.instinctive.eu>
 <7a94f8fe-8323-4df4-ad72-b8114fd95800@w30g2000yqw.googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

On 2010-08-18, Ludovic Brenta <ludovic@ludovic-brenta.org> wrote:
> Space efficiency and human readability are antagonistic goals. S-
> Expressions are not meant to be space-efficient. If you want space
> efficiency, pipe the S-Expressions through zlib :) OTOH, a raw memory
> dump is not human-readable, hence the optional hex-encoding.

Actually, pretty good space efficiency can be attained with Rivest's
S-expressions, in what he defined as the canonical form: the overhead of
each list is two bytes (the opening and closing parentheses), there is
no per-element overhead in lists, and the overhead of atom encoding is
logarithmic in the atom size (the raw binary data is inserted into the
stream as-is, preceded only by a header containing its size in
ASCII-encoded decimal, followed by a colon). Of course that's assuming
the atom content is already space-efficient.
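
To make the overhead concrete, here is a minimal sketch of a canonical-form
writer. It is in Python rather than Ada purely for brevity, and it assumes
atoms are modelled as bytes and lists as Python lists; the function name
is hypothetical.

```python
def canonical(node):
    """Encode an atom (bytes) or a list (Python list) in canonical form."""
    if isinstance(node, bytes):
        # Verbatim encoding: decimal length, a colon, then the raw octets.
        return str(len(node)).encode("ascii") + b":" + node
    # A list costs exactly two octets: the enclosing parentheses.
    return b"(" + b"".join(canonical(child) for child in node) + b")"


print(canonical([b"abc", [b"de"]]))  # b'(3:abc(2:de))'
```

A 100-octet atom gets a 4-octet header ("100:"), which is where the
logarithmic overhead comes from.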

I think space-efficiency vs human readability should not be a choice at
the S_Expressions package level. It should rather be a choice at the
object-level, or maybe better at the application level, choosing between
different representations provided by the objects.

>> Yes, and that's exactly what I would do if I was forced to hide Atom
>> type definition; except I wouldn't use String as an intermediate value,
>> rather Storage_Element_Array or Stream_Element_Array, because I allow both
>> string and binary contents in atoms.
>>
>> Which then raises the question, why not use array-of-octet directly, to
>> avoid issues when Storage_Element or Stream_Element are not octets.
>
> Sure, that's an option. I used Storage_Element but I might as well
> have used Interfaces.Unsigned_8.

So are Storage_Element or Interfaces.Unsigned_8 better in some way than
the user-defined Octet in my S_Expression package?

>> But then I could make a public array-of-octets type, and provide
>> functions to "convert" back and forth between the private Atom type and
>> public array-of-octets type. Would that be acceptable?
>
> Sure. The only problem would be that you probably want the external
> representation to be human-readable, so you'd still need some sort of
> conversion.

As I said, in my opinion it should be an application-level choice.

However, Rivest's standard does provide ways of encoding atoms in
different, more or less human-readable formats.

I described above the canonical form, which uses the so-called verbatim
encoding (raw data prepended by the size).

When the atom content is limited to a certain subset of octets, which
is basically ASCII letters, digits and a few punctuation marks, there is
the "token encoding", which writes the contents directly, separated
by whitespace. That's the encoding of all atoms in my example (even
though technically Rivest's standard forbids tokens beginning with a
digit, all my parsers handled them well).
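
A rough sketch of the token test, in Python for brevity. The exact
punctuation set below is an assumption about what the draft allows, and
the allow_leading_digit flag is a hypothetical knob mirroring the lenient
behaviour described above.

```python
import string

# Assumed token alphabet: ASCII letters, digits and eight punctuation marks.
TOKEN_CHARS = frozenset(
    (string.ascii_letters + string.digits + "-./_:*+=").encode("ascii")
)


def is_token(atom, allow_leading_digit=False):
    """Tell whether an atom (bytes) can be written as a bare token."""
    if not atom or any(octet not in TOKEN_CHARS for octet in atom):
        return False
    # The strict rule rejects a leading digit; the lenient one accepts it.
    return allow_leading_digit or not atom[:1].isdigit()


print(is_token(b"hello"), is_token(b"two words"), is_token(b"42abc"))
# True False False
```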

When the atom content is more complicated text, there is the
quoted-string encoding, which allows almost anything, but backslash and
double-quote have to be escaped.
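
As an illustration only, the two mandatory escapes could be applied like
this (Python for brevity; the function name is hypothetical, and control
characters or other escapes the standard may define are deliberately left
out of this sketch):

```python
def quoted(atom):
    """Wrap an atom (bytes) in double quotes, escaping backslash and quote."""
    # Backslashes go first, so that the backslashes added for the quotes
    # are not escaped a second time.
    body = atom.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
    return b'"' + body + b'"'


print(quoted(b'say "hi"'))
```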

When the atom content is binary, but the S-expression has to be
text-based for some reason (be it readability, or transport over a
text-based protocol (e.g. SMTP)), there are hexadecimal and base-64
encodings, which should be self-describing enough.
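
For reference, a small sketch of both (Python for brevity, function names
hypothetical). It uses the delimiters of Rivest's draft, '#' around
hexadecimal and '|' around base-64, and ignores the optional whitespace a
reader has to tolerate inside them.

```python
import base64


def hex_atom(atom):
    """Hexadecimal encoding: two hex digits per octet, between '#' marks."""
    return b"#" + atom.hex().encode("ascii") + b"#"


def base64_atom(atom):
    """Base-64 encoding of the atom, between '|' marks."""
    return b"|" + base64.b64encode(atom) + b"|"


print(hex_atom(b"abc"), base64_atom(b"abc"))  # b'#616263#' b'|YWJj|'
```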

And when you want the space efficiency, uniqueness or ease of parsing of
the canonical form but have to send it over a text-based channel, there
is the brace-encoding, which is a full S-expression (not a single atom)
encoded in base-64 and surrounded by braces.
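
A minimal sketch of the brace-encoding, in Python for brevity: the whole
expression is first written in canonical form, then base-64 encoded and
wrapped in braces. Atoms are assumed to be bytes and lists Python lists;
both function names are hypothetical.

```python
import base64


def canonical(node):
    """Canonical form: verbatim atoms, parenthesised lists, no whitespace."""
    if isinstance(node, bytes):
        return str(len(node)).encode("ascii") + b":" + node
    return b"(" + b"".join(canonical(child) for child in node) + b")"


def brace_encode(node):
    """Base-64 of the canonical form of a whole expression, inside braces."""
    return b"{" + base64.b64encode(canonical(node)) + b"}"


encoded = brace_encode([b"abc", [b"de"]])
# Stripping the braces and decoding gives the canonical form back.
print(base64.b64decode(encoded[1:-1]))  # b'(3:abc(2:de))'
```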

However, all these encodings allowed by Rivest's standard represent the
same atom contents. So design-wise, it seems best for the package to
expose an octet-sequence interface for atoms, with the encoding and
decoding handled internally along with I/O. There would only be
options set by the application to guide the package towards the preferred
choice of encoding (e.g. Use_Token_When_Possible, Canonical_Form,
Brace_Encode, etc.).
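
To illustrate the idea, not to propose an actual interface, here is a
sketch in Python (for brevity) of a writer whose caller only hands over
octets and options. The option names echo the examples above; the
function, its defaults and the assumed token alphabet are hypothetical.

```python
import string

# Assumed token alphabet: ASCII letters, digits and eight punctuation marks.
TOKEN_CHARS = frozenset(
    (string.ascii_letters + string.digits + "-./_:*+=").encode("ascii")
)


def write_atom(atom, canonical_form=False, use_token_when_possible=True):
    """Pick an encoding for an atom (bytes) from application-level options."""
    token_ok = (
        len(atom) > 0
        and all(octet in TOKEN_CHARS for octet in atom)
        and not atom[:1].isdigit()
    )
    if not canonical_form and use_token_when_possible and token_ok:
        return atom  # token encoding: the contents as they are
    # Fallback, and the only choice in canonical form: verbatim encoding.
    return str(len(atom)).encode("ascii") + b":" + atom


print(write_atom(b"hello"))                       # b'hello'
print(write_atom(b"hello", canonical_form=True))  # b'5:hello'
print(write_atom(b"two words"))                   # b'9:two words'
```

The caller never sees which encoding was picked; a reader on the other
side gets the same octets back in every case.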

While the S-expression decoding part is tedious but quite easy, I found
it quite fun to write the S-expression "pretty printer" (and it was no
small part of the fun to define what is "pretty").


Thanks for your comments,
Natacha