From: "Randy Brukardt"
Newsgroups: comp.lang.ada
Subject: Re: Ann: Natools.Chunked_Strings, beta 1
Date: Wed, 30 Nov 2011 18:11:10 -0600
References: <4ed4fc37$0$2537$ba4acef3@reader.news.orange.fr>

"Dmitry A. Kazakov" wrote in message
news:ouubrb3trn06$.1jl5q3ausoy2v.dlg@40tude.net...
> On Wed, 30 Nov 2011 11:39:30 +0100, Yannick Duchêne (Hibou57) wrote:
>
>> By the way, I feel the original message is based on erroneous assumptions
>> about implementations of Ada.Strings.Unbounded.
>> Nothing in the RM requires an implementation to use a single array for
>> unbounded strings, and on the opposite, it says
>> `type Unbounded_String is private;`.
>
> A false assumption is IMO that strings are large and need to be
> manipulated as a whole, e.g. pieces substituted etc.
>
> In each such case the user should consider:
>
> 1. maybe the pattern he uses is wrong (e.g. poor parsing techniques
> splitting and merging strings physically)
>
> 2. maybe string is an inappropriate data structure and something like text
> buffer should be used instead.
>
> It is C where everything is char*, not Ada.

I don't agree, for a number of reasons:

(1) The Trash-Finder spam filter uses an "append-all" pattern for handling
text and HTML filtering (along with a few replacements). That's mainly
because it is best to ignore line breaks in such matching. I could have
invented a different data structure for that use, but it would have just
meant more work (especially to recreate the string pattern-matching
operations, which are used extensively).

(2) I worried about the performance of this code, especially in the
"service" version of TF -- but there is no evidence that using unbounded
strings and all of that unstructured heap use makes any real difference at
all, even when TF runs for months between restarts.

(3) The only thing that ever took significant time is the actual pattern
matching, and a few optimizations to that code eliminated the problem.
(Obviously, it helps that I have my own compiler and run-time to tweak, but
the optimizations make sense in general: they don't add much code, and they
reduce the run time a lot in some common circumstances.)

To reiterate, premature optimization is the root of all (well, really most)
evil. That includes making generalizations about the use of data
structures! Spending a lot of time writing some other data structure when a
predefined one will do is just plain silly.

Randy.