From: Natasha Kerensikova
Newsgroups: comp.lang.ada
Subject: Re: S-expression I/O in Ada
Date: Thu, 19 Aug 2010 07:42:02 +0000 (UTC)

On 2010-08-18, Ludovic Brenta wrote:
> Space efficiency and human readability are antagonistic goals.
> S-Expressions are not meant to be space-efficient. If you want space
> efficiency, pipe the S-Expressions through zlib :) OTOH, a raw memory
> dump is not human-readable, hence the optional hex-encoding.

Actually, fairly good space efficiency can be attained with Rivest's S-expressions, in what he defined as the canonical form: the overhead of each list is two bytes (the opening and closing parentheses), there is no per-element overhead in lists, and the overhead in atom encoding is logarithmic (the raw binary data is inserted into the stream as-is, with only a header containing its size in ASCII-encoded decimal, followed by a colon). Of course, that assumes the atom content is already space-efficient.

I think space efficiency vs human readability should not be a choice at the S_Expressions package level. It should rather be a choice at the object level, or maybe better at the application level, choosing between different representations provided by the objects.

>> Yes, and that's exactly what I would do if I was forced to hide Atom
>> type definition; except I wouldn't use String as an intermediate value,
>> rather Storage_Element_Array or Stream_Element_Array, because I allow both
>> string and binary contents in atoms.
>>
>> Which then raises the question, why not use array-of-octet directly, to
>> avoid issues when Storage_Element or Stream_Element are not octets.
>
> Sure, that's an option. I used Storage_Element but I might as well
> have used Interfaces.Unsigned_8.

So are Storage_Element or Interfaces.Unsigned_8 better in some way than the user-defined Octet in my S_Expression package?

>> But then I could make a public array-of-octets type, and provide
>> functions to "convert" back and forth between the private Atom type and
>> public array-of-octets type. Would that be acceptable?
>
> Sure. The only problem would be that you probably want the external
> representation to be human-readable, so you'd still need some sort of
> conversion.

As I said, in my opinion it should be an application-level choice. However, Rivest's standard does provide ways of encoding atoms in different, more or less human-readable formats.
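
To make the overhead of the canonical form described earlier concrete, here is a minimal sketch of a canonical-form encoder. It is written in Python for brevity and is not the S_Expressions package; the function name `encode_canonical`, the use of `bytes` for atoms and Python lists for S-expression lists, and the sample data are all assumptions made for illustration.

```python
def encode_canonical(node):
    """Encode a bytes atom, or a (possibly nested) list of nodes,
    in canonical form. Hypothetical helper, for illustration only."""
    if isinstance(node, bytes):
        # Verbatim encoding: size in ASCII decimal, a colon, then the
        # raw octets as-is. The header grows with log10 of the size.
        return str(len(node)).encode("ascii") + b":" + node
    # A list costs exactly two octets, "(" and ")", whatever its length;
    # elements are simply concatenated with no separator.
    return b"(" + b"".join(encode_canonical(child) for child in node) + b")"


# Made-up sample data, not taken from any real configuration.
example = [b"tcp-connect", [b"host", b"foo.example"], [b"port", b"80"]]
encoded = encode_canonical(example)
print(encoded)  # b'(11:tcp-connect(4:host11:foo.example)(4:port2:80))'
```

In this sample, 32 octets of atom content become 50 octets on the wire: 6 for the three pairs of parentheses and 12 for the five size headers.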

I described above the canonical form, which uses the so-called verbatim encoding (raw data preceded by its size).

When the atom content is limited to a certain subset of octets, which is basically ASCII letters, digits and a few punctuation marks, there is the "token encoding", which puts the contents directly in the stream, separated by whitespace. That's the encoding of all atoms in my example (even though technically Rivest's standard forbids tokens beginning with a digit, all my parsers handled them well).

When the atom content is more complicated text, there is the quoted-string encoding, which allows almost anything, but backslash and double-quote have to be escaped.

When the atom content is binary, but the S-expression has to be text-based for some reason (be it readability, or transport over a text-based protocol, e.g. SMTP), there are hexadecimal and base-64 encodings, which should be self-describing enough.

And when you want the space efficiency, uniqueness or ease of parsing of the canonical form but have to send it over a text-based channel, there is the brace-encoding, which is a full S-expression (not a single atom) encoded in base-64 and surrounded by braces.

However, all these encodings allowed by Rivest's standard represent the same atom contents. So design-wise, it seems best for the package to expose an octet-sequence interface for atoms, with the encoding and decoding handled internally along with I/O. There would only be options set by the application to guide the package towards the preferred choice of encoding (e.g. Use_Token_When_Possible, Canonical_Form, Brace_Encode, etc).

While the S-expression decoding part is tedious but quite easy, I found it quite fun to write the S-expression "pretty printer" (and it was no small part of the fun to define what is "pretty").


Thanks for your comments,
Natacha