From: Ludovic Brenta
Newsgroups: comp.lang.ada
Subject: Re: S-expression I/O in Ada
Date: Thu, 12 Aug 2010 05:46:52 -0700 (PDT)
Message-ID: <0cff56bc-32fc-44fe-9e29-9387a4eb4588@l14g2000yql.googlegroups.com>

Natacha Kerensikova wrote on comp.lang.ada:
> On Aug 12, 12:55 pm, Ludovic Brenta wrote:
>
>> Natacha
>> Kerensikova wrote on comp.lang.ada:
>> [...]
>
>>> Sexp_Stream is supposed to perform S-expression I/O over streams,
>>> without ever constructing an S-expression in memory. It is supposed
>>> to do only the encoding and decoding of S-expression format, to expose
>>> to its clients only atoms and relations.
>
>> But how can it expose atoms and relations without an in-memory tree
>> representation? Honestly, I do not think that such a concept is
>> viable.
[...]
> The reading part is a bit more tricky, and I admitted when I proposed
> Sexp_Stream I didn't know how to make it. Having thought (maybe too
> much) since then, here is what the interface might look like:
>
> type Node_Type is (S_None, S_List, S_Atom);
>
> function Current_Node_Type(sstream: in Sexp_Stream) return Node_Type;
>
> procedure Get_Atom(sstream: in Sexp_Stream; contents: out octet_array);
>   -- raises an exception when Current_Node_Type is not S_Atom
>   -- not sure "out octet_array" works, but that's the idea
>   --   maybe turn it into a function for easier atom-to-object
>   --   conversion
>
> procedure Move_Next(sstream: in out Sexp_Stream);
>   -- when the client is finished with the current node, it calls this
>   -- procedure to update stream internal state to reflect the next
>   -- node in list
>
> procedure Move_Lower(sstream: in out Sexp_Stream);
>   -- raises an exception when Current_Node_Type is not S_List
>   -- update the internal state to reflect the first child of the
>   -- current list
>
> procedure Move_Upper(sstream: in out Sexp_Stream);
>   -- update the internal state to reflect the node following the list
>   -- containing the current node. sort of "uncle" node
>   -- the implementation is roughly skipping whatever nodes exist in
>   -- the current list until reading its ')' and reading the following
>   -- node
>
> Such an interface supports data streams, i.e. reading new data without
> seek or pushback.
> The data would probably be read octet-by-octet
> (unless reading a verbatim encoded atom), hoping for an efficient
> buffering in the underlying stream. If that's a problem I guess some
> buffering can be transparently implemented in the private part of
> Sexp_Stream.
>
> Of course atoms are to be kept in memory, which means Sexp_Stream will
> have to contain a dynamic array (Vector in Ada terminology, right?)
> populated from the underlying stream until the atom is completely
> read. Get_Atom would hand over a (static) array built from the private
> dynamic array.
>
> The implementation would for example rely on a procedure to advance
> the underlying stream to the next node (i.e. skipping white space), and
> from the first character know whether it's the beginning of a new list
> (when '(' is encountered) and update the state to S_List, and it's
> finished; whether it's the end of the list (when ')' is encountered)
> and update the state to S_None; or whether it's the beginning of an
> atom, so read it completely, updating the internal dynamic array and
> setting the state to S_Atom.
>
> Well actually, it's probably not the best idea, I'm not yet clear
> enough on the specifications and on stream I/O to clearly think about
> implementation, but that should be enough to make you understand the
> idea of S-expression reading without S-expression objects.
>
> Now regarding the actual use of this interface, I think my previous
> writing example is enough, so here is the reading part.
>
> My usual way of reading an S-expression configuration file is to read
> sequentially, one at a time, a list whose first element is an atom.
> Atoms, empty lists and lists beginning with a list are considered as
> comments and silently dropped. "(tcp-connect …)" is meant to be one of
> these, processed by TCP_Info's client, which will hand over only the
> "…" part to TCP_Info.Read (or other initialization subprogram).
> So just like TCP_Info's client processes something like
> "(what-ever-config …) (tcp-connect …) (other-config …)", TCP_Info.Read
> will process only "(host foo.example) (port 80)".
>
> So TCP_Info's client, after having read "tcp-connect" atom, will call
> Move_Next on the Sexp_Stream and pass it to TCP_Info.Read. Then
> TCP_Info.Read proceeds with:
>
> loop
>   case Current_Node_Type(sstream) is
>      when S_None => return;  -- TCP_Info configuration is over
>      when S_Atom => null;    -- silent atom drop
>      when S_List =>
>        Move_Lower(sstream);
>        Get_Atom(sstream, atom);
>        -- make command of type String from atom
>        -- if Get_Atom was successful
>        Move_Next(sstream);
>        if command = "host" then
>          -- Get_Atom and turn it into host string
>        elsif command = "port" then
>          -- Get_Atom and turn it into port number
>        else
>          -- complain about unrecognized command
>        end if;
>        Move_Upper(sstream);
>   end case;
> end loop;
>
> TCP_Info's client S-expression parsing would be very similar, except
> if command = … would be followed by a call to TCP_Info.Read rather
> than a Get_Atom.
>
> So where are the problems with my Sexp_Stream without in memory
> object? What am I missing?

The "problem" is that, without admitting it, you have reintroduced a
full S-Expression parser. Most of it is hidden in the Sexp_Stream
implementation, but it has to be there. Otherwise, how can the
Move_Lower, Move_Next, and Move_Upper operations work, keeping track of
how many levels deep you are at all times?

Note also that your TCP_Info.Read looks quite similar to mine, except
that mine takes an S-Expression as the input, rather than a stream.
Afterwards, it traverses the S-Expression using pretty much the same
algorithm as yours.
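
To make the depth-tracking question concrete, here is a minimal sketch
of the bookkeeping that any Move_Upper has to do, even when no tree is
built. It is not taken from either implementation discussed in this
thread. It assumes the input sits in a String rather than a stream and
that atoms are plain tokens delimited by blanks and parentheses; the
names Depth_Demo, Next_Atom and Skip_To_End_Of_List are hypothetical.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Depth_Demo is
   --  Hypothetical input; a real Sexp_Stream would read octets instead.
   Input : constant String := "(host foo.example) (port 80)";
   Pos   : Positive := Input'First;   --  next unread character
   Depth : Natural  := 0;             --  number of currently open lists

   --  Read a token up to the next blank or parenthesis (assumed syntax).
   function Next_Atom return String is
      First : constant Positive := Pos;
   begin
      while Pos <= Input'Last
        and then Input (Pos) /= ' '
        and then Input (Pos) /= '('
        and then Input (Pos) /= ')'
      loop
         Pos := Pos + 1;
      end loop;
      return Input (First .. Pos - 1);
   end Next_Atom;

   --  Skip the rest of the innermost open list, including its ')'.
   --  Must only be called while at least one list is open (Depth > 0).
   procedure Skip_To_End_Of_List is
      Target : constant Natural := Depth - 1;
   begin
      while Pos <= Input'Last and then Depth > Target loop
         case Input (Pos) is
            when '('    => Depth := Depth + 1;
            when ')'    => Depth := Depth - 1;
            when others => null;
         end case;
         Pos := Pos + 1;
      end loop;
   end Skip_To_End_Of_List;
begin
   while Pos <= Input'Last loop
      if Input (Pos) = '(' then
         Depth := Depth + 1;                   --  what Move_Lower does
         Pos   := Pos + 1;
         Put_Line ("command: " & Next_Atom);
         Skip_To_End_Of_List;                  --  what Move_Upper does
      else
         Pos := Pos + 1;                       --  blanks between lists
      end if;
   end loop;
   Put_Line ("final depth:" & Natural'Image (Depth));
end Depth_Demo;
```

With the input above, this should report the commands "host" and
"port" and a final depth of 0. The counter Depth is exactly the parser
state that stays hidden inside the stream object.
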
The S-Expression itself comes from the stream. So, the only
difference between your concept and my implementation is that I expose
the S-Expression memory tree and you don't.

The reason why I prefer to expose the S-Expression is that, in the
general (arbitrarily complex) case, you cannot traverse an
S-Expression linearly; you need to traverse it as what it really is, a
tree. A stream suggests linear traversal only.

[...]
>> TCP_Info : constant String := "(tcp-connect (host foo.bar) (port 80))";
>> TCP_Info_Structured := constant To_TCP_Info (To_Sexp (TCP_Info));
>
> That's an interesting idea, which conceptually boils down to serialize
> by hand the S-expression into a String, in order to unserialize it as
> an in-memory object, in order to serialize back into a Stream.
>
> Proofreading my post, the above might sound sarcastic though actually
> it is not. It's a twist I haven't thought of, but it might indeed turn
> out to be the simplest practical way of doing it.

Right. I was not being sarcastic either. The Cons (), To_Atom () and
Append () operations are needed only when creating arbitrary and
dynamic S-Expressions. For simple cases where most of the expression
is hardcoded, the textual representation of the S-Expression is much
more compact, readable and maintainable than the Ada procedural
representation.

In fact, you could also conceivably write something like:

TCP_Info_Sexp : S_Expression := To_Sexp ("(tcp-info (host *) (port *))");

and programmatically change the values of the atoms containing the
actual data. Once you have the in-memory S-Expression, there is no
limit to what you can do with it. You can *change* the S-Expression as
you traverse it, deleting, replacing or adding nodes as you wish. You
cannot do that with a Sexp_Stream.

> Actually for a very long time I used to write S-expressions to file
> using only string facilities and a special sx_print_atom() function to
> handle escaping of unsafe data.
> By then I would have handled
> TCP_Info.Write with the following C fragment (sorry I don't know yet
> how to turn it into Ada, but I'm sure an equivalent as simple exists):
>
> fprintf(outfile, "(tcp-connect\n\t(host ");
> sx_print_atom(outfile, host);
> fprintf(outfile, ")\n\t(port %d)\n)\n", port);

Sure, that can also be done just as easily in Ada. You can write
S-Expressions as easily as any blob or string; it is reading them back
and understanding their structure that is tricky.

--
Ludovic Brenta.
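
For reference, a rough Ada counterpart of the quoted C fragment could
look like the sketch below. It is only an illustration under stated
assumptions: it writes to standard output rather than to a file, and
Quoted is a hypothetical stand-in for sx_print_atom whose escaping
rule (a quoted string with '"' and '\' preceded by a backslash) is
assumed here, not taken from this thread.

```ada
with Ada.Text_IO;         use Ada.Text_IO;
with Ada.Integer_Text_IO;

procedure Write_TCP_Info is
   Host : constant String  := "foo.example";
   Port : constant Natural := 80;

   --  Hypothetical replacement for sx_print_atom: returns the atom as a
   --  quoted string, with '"' and '\' escaped by a backslash.
   function Quoted (Atom : String) return String is
      Result : String (1 .. 2 * Atom'Length + 2);  --  worst-case length
      Last   : Natural := 0;

      procedure Append (C : Character) is
      begin
         Last := Last + 1;
         Result (Last) := C;
      end Append;
   begin
      Append ('"');
      for I in Atom'Range loop
         if Atom (I) = '"' or else Atom (I) = '\' then
            Append ('\');
         end if;
         Append (Atom (I));
      end loop;
      Append ('"');
      return Result (1 .. Last);
   end Quoted;
begin
   --  Same layout as the C fragment: one tab-indented line per sublist.
   Put_Line ("(tcp-connect");
   Put_Line (ASCII.HT & "(host " & Quoted (Host) & ")");
   Put (ASCII.HT & "(port ");
   Ada.Integer_Text_IO.Put (Port, Width => 0);  --  no leading blanks
   Put_Line (")");
   Put_Line (")");
end Write_TCP_Info;
```

Returning the escaped atom as a String, instead of printing it
directly, keeps the escaping rule in one place, so it can be checked
separately from the output code.
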