From: Ludovic Brenta
Newsgroups: comp.lang.ada
Subject: Re: S-expression I/O in Ada
Date: Wed, 18 Aug 2010 08:07:09 -0700 (PDT)
Message-ID: <7a94f8fe-8323-4df4-ad72-b8114fd95800@w30g2000yqw.googlegroups.com>
References: <547afa6b-731e-475f-a7f2-eaefefb25861@k8g2000prh.googlegroups.com> <5f5303d4-075f-48ec-bd9b-17c9052cadd6@k10g2000yqa.googlegroups.com>

Natasha Kerensikova wrote on comp.lang.ada:
>>> Obviously, discrete type serialization alone is not enough, as I will
>>> have non-discrete atomic object to somehow turn into atoms. And your
>>> _Blob functions suffer from the same issue as Jeffrey Carter's
>>> implementation in that it's just a memory dump.
>
>> Why is that a problem?
>> The only reason I can think of is that your
>> type contains access values, in which case it's simply not an atom but
>> rather a cons pair (or a list).
>
> Serialization isn't only about access values. Or maybe I'm misusing the
> word.
>
> Anyway, my (and Rivest's) S-expressions being meant for storage and
> transport, there are other issues with memory dumps than access values.
> For example, endianness, space-efficiency, or human-readability.

Endianness only affects discrete types, which the generic To_Atom and
From_Atom for discrete types convert to textual representation. And
my implementation theoretically also works with multi-byte characters
since it simply calls 'Image and 'Value.

Space efficiency and human readability are antagonistic goals.
S-Expressions are not meant to be space-efficient. If you want space
efficiency, pipe the S-Expressions through zlib :)

OTOH, a raw memory dump is not human-readable, hence the optional
hex-encoding.

> Yes, and that's exactly what I would do if I was forced to hide Atom
> type definition; except I wouldn't use String as an intermediate value,
> rather Storage_Element_Array or Stream_Element_Array, because I allow both
> string and binary contents in atoms.
>
> Which then raises the question, why not use array-of-octet directly, to
> avoid issues when Storage_Element or Stream_Element are not octets.

Sure, that's an option. I used Storage_Element but I might as well
have used Interfaces.Unsigned_8.

> But then I could make a public array-of-octets type, and provide
> functions to "convert" back and forth between the private Atom type and
> public array-of-octets type. Would that be acceptable?

Sure. The only problem would be that you probably want the external
representation to be human-readable, so you'd still need some sort of
conversion.

>> Therefore, as String is not a blob. The blob needs to be encoded into
>> ASCII characters, the String does not because it already consists of
>> characters.
>> Therefore From_Blob hex-encodes the blob into a String.
>
> My (and Rivest's) atoms can contain any octet sequence, which makes the
> hex encoding irrelevant. So I do treat strings as blobs, they are both
> data only a type away from being an atom.
>
> You could somehow say that strings and blobs need only to be "cast" into
> an atom, while other objects need to be serialized first (usually into
> strings or machine-independant blobs).

OK. I was working under the assumption that S-Expression files must
be human-readable and only contain ASCII or UTF-8 in order to work
with any text editor. But if you lift that assumption, then your
solution works.

>>> Let's take for example a Wide_Wide_String (e.g. because that's how my
>>> application handles strings internally), which is not so good as an
>>> example because a memory dump wouldn't be so much of an issue (except
>>> maybe for endianness). Let's further assume I want it serialized in
>>> UTF-8. How do I do that?
>>
>> By modifying my implementation a little:
>
> I think you're missing the point. I don't want to modify any
> s-expression package implementation whenever I use a new type in an
> application.

Ah but the fact that you changed the application-level type is
irrelevant. The relevant fact was that you changed the external
encoding, which is hidden in the body of the S_Expression package, to
UTF-8 (and which simply does not exist in your implementation). But I
see your point.

> Let's say for example I have a thousand of applications using Strings,
> and one using Wide_Wide_Strings, I can't see how it can be justified to
> change (or fork) the S-expression package to take it into account.

No, but you could hex-encode the Wide_Wide_String instead of using
UTF-8 in that one package. Or, for an alternative, see below...

> It seems much more natural, at least to my C-used eyes, to ask that one
> application to provide whatever is needed to serialize Wide_Wide_Strings
> into atoms.

i.e.

   subtype UTF8_String is String;
   function To_UTF8 (S : Wide_Wide_String) return UTF8_String;
   function To_Atom (S : Wide_Wide_String) return S_Expression.T is
   begin
      return To_Atom (To_UTF8 (S));
   end To_Atom;

? which is a specialization of the example I used to demonstrate that
you can move the serialization of any arbitrary type into the client
if that's what you need.

--
Ludovic Brenta.
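
To_UTF8 above is only declared, not implemented. Here is a minimal sketch
of one way it could be filled in on the client side. It assumes an Ada 2012
compiler, because it relies on the standard package
Ada.Strings.UTF_Encoding.Wide_Wide_Strings, which was not part of the
standard when this thread was written. The nested S_Expression package, its
type T and the String overload of To_Atom are hypothetical stand-ins for the
real package discussed in the thread, where an atom is treated as a plain
octet sequence.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure UTF8_Atom_Demo is

   --  Hypothetical stand-in for the S_Expression package of the thread:
   --  an atom is just a string of octets here (assumption).
   package S_Expression is
      type T is new String;
   end S_Expression;

   subtype UTF8_String is String;

   --  Encode a Wide_Wide_String as UTF-8 using the Ada 2012 library.
   function To_UTF8 (S : Wide_Wide_String) return UTF8_String is
   begin
      return Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode (S);
   end To_UTF8;

   --  Hypothetical "cast" of an already-encoded String into an atom.
   function To_Atom (S : String) return S_Expression.T is
   begin
      return S_Expression.T (S);
   end To_Atom;

   --  Client-side serialization, as in the post: encode, then cast.
   function To_Atom (S : Wide_Wide_String) return S_Expression.T is
   begin
      return To_Atom (To_UTF8 (S));
   end To_Atom;

   --  "caf" followed by U+00E9, which needs two octets in UTF-8.
   Text : constant Wide_Wide_String :=
     "caf" & Wide_Wide_Character'Val (16#E9#);
   Atom : constant S_Expression.T := To_Atom (Text);

begin
   --  Three ASCII octets plus two for U+00E9.
   pragma Assert (Atom'Length = 5);
   Ada.Text_IO.Put_Line (Natural'Image (Atom'Length));
end UTF8_Atom_Demo;
```

The S_Expression package itself stays untouched: only the one application
that uses Wide_Wide_String carries the encoding step, which is the trade-off
discussed above.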