From: "Robert C. Leif"
Newsgroups: comp.lang.ada
Subject: Experiences of XML parser generators for Ada?
Date: Sat, 4 Dec 2004 12:37:07 -0800
Organization: Newport Instruments
Reply-To: rleif@rleif.com

The "HUGE overhead. e.g.: 84" is being solved by the creation of "XML Binary Characterization Properties" http://www.w3.org/TR/xbc-properties/.

From Section 4.3.2 Description:

"Furthermore, a schema-based encoding of an XML document can achieve a degree of compactness by using prior knowledge about the structure and content of a document.
A serialization is schema-based if it uses information from the document's schema to achieve a better degree of compactness. This information could be used later as the document is processed or reconstituted. It is worth pointing out that although not self contained, a schema-based encoding is not inherently lossy given that, in principle, a decoder can reproduce the data model using both the encoding and the schema. Thus, as with other techniques, a schema-based encoding can be lossy or loss-less."

If the schema data-types are the same as the Ada data-types, the space required should be approximately the same. The real problem is that the Ada community has not been involved with setting W3C standards. Ada needs a complete set of XML_IO packages including being able to create XHTML Strict.

Bob Leif

-------
Adrien Plisson wrote:

Message: 2
Date: Sat, 04 Dec 2004 00:33:22 +0100
From: Adrien Plisson
Subject: Re: Experiences of XML parser generators for Ada?
To: comp.lang.ada@ada-france.org
Message-ID: <41b0f749$0$25068$ba620e4c@news.skynet.be>
Content-Type: text/plain; charset=us-ascii; format=flowed

Daniel W wrote:
> Thank you for your succinct clarification. More specifically I'm asking for
> persons with experience of the parser generator. I actually have XMLBooster
> downloaded, but as I said, I'm sort of short on experience.... :-)

well, i don't have any experience with parser generators (except with lex), but i would like to share my experience:

i designed a piece of software composed of 2 parts. each part was written in a different language, and each part was executing in its own context (think of 2 different computers). i chose XML as the format for marshaled data across the communication medium.

i first downloaded a standard XML parser (Xerces) and tried it. it was so slow that i could not continue with it. since i was only using a subset of XML (no dtd, no validation, no entity reference, only one encoding), i decided to write my own XML parser and XML generator.
i got a 70x performance boost.

now if i look back, i think it would have been better if i had defined my own protocol and not used XML:
- the xml fragments were all generated then parsed by software under my control, with no user intervention. so there was no need for something human readable.
- i was mostly transmitting numeric values. since xml is a text format, performance was dragged down by all the conversions from binary to string and back to binary.
- since i was mostly transmitting numeric values, all my text nodes were shorter than the xml element type enclosing those values. this leads to HUGE overhead. e.g.: 84 encoded in Unicode is 84 bytes long, but the value expressed here is only 1 byte long.
- the only thing xml allowed me was extensibility at no cost, in a case where i was not really needing it.

so here comes my advice: think twice before using xml. xml is a very powerful tool for DYNAMICALLY STRUCTURED HUMAN READABLE TEXT. for everything else, a basic binary protocol with some well defined rules to follow (endianness, size of data) will really be more efficient. plus, a basic binary protocol does not need complicated parsers...

here was my experience, i hope you find it useful.
--
rien
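
For readers who want to see the size argument concretely, here is a rough sketch in Python (not taken from either post). The element names `reading`, `id` and `value`, the record layout (little-endian, 16-bit unsigned id followed by a 32-bit signed value) and all function names are assumptions made up for illustration; a real protocol would fix its own layout and byte order.

```python
import struct

# Hypothetical fixed layout: "<" = little-endian with no padding,
# "H" = unsigned 16-bit sensor id, "i" = signed 32-bit value.
RECORD = struct.Struct("<Hi")


def encode_binary(sensor_id: int, value: int) -> bytes:
    """Pack one reading into the fixed-size binary record."""
    return RECORD.pack(sensor_id, value)


def decode_binary(data: bytes) -> tuple:
    """Unpack a binary record back into (sensor_id, value)."""
    return RECORD.unpack(data)


def encode_xml(sensor_id: int, value: int) -> bytes:
    """Encode the same reading as a small XML fragment in UTF-8."""
    text = f"<reading><id>{sensor_id}</id><value>{value}</value></reading>"
    return text.encode("utf-8")


if __name__ == "__main__":
    binary = encode_binary(7, 84)
    xml = encode_xml(7, 84)
    # The markup dominates the XML size; the binary size is fixed by the layout.
    print("binary bytes:", len(binary))
    print("xml bytes:   ", len(xml))
    print("round trip:  ", decode_binary(binary))
```

The binary side only works because both ends agree in advance on byte order and field sizes, which is exactly the "well defined rules" the post mentions. The XML side carries its structure with every message, which is what buys extensibility and costs the overhead.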