From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.1 required=5.0 tests=BAYES_00,
	PP_MIME_FAKE_ASCII_TEXT autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,a65bb7bde679ed1d
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII
Received: by 10.68.59.229 with SMTP id c5mr318481pbr.6.1322698332419;
        Wed, 30 Nov 2011 16:12:12 -0800 (PST)
MIME-Version: 1.0
Path: 
 lh20ni47590pbb.0!nntp.google.com!news2.google.com!news3.google.com!proxad.net!feeder1-2.proxad.net!weretis.net!feeder4.news.weretis.net!news.tornevall.net!news.jacob-sparre.dk!pnx.dk!jacob-sparre.dk!ada-dk.org!.POSTED!not-for-mail
From: "Randy Brukardt" <randy@rrsoftware.com>
Newsgroups: comp.lang.ada
Subject: Re: Ann: Natools.Chunked_Strings, beta 1
Date: Wed, 30 Nov 2011 18:11:10 -0600
Organization: Jacob Sparre Andersen Research & Innovation
Message-ID: <jb6gn0$47g$1@munin.nbi.dk>
References: <slrnjd9tpk.1lme.lithiumcat@sigil.instinctive.eu>
 <4ed4fc37$0$2537$ba4acef3@reader.news.orange.fr>
 <op.v5q874xcule2fv@douda-yannick>
 <ouubrb3trn06$.1jl5q3ausoy2v.dlg@40tude.net>
NNTP-Posting-Host: static-69-95-181-76.mad.choiceone.net
X-Trace: munin.nbi.dk 1322698272 4336 69.95.181.76 (1 Dec 2011 00:11:12 GMT)
X-Complaints-To: news@jacob-sparre.dk
NNTP-Posting-Date: Thu, 1 Dec 2011 00:11:12 +0000 (UTC)
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2900.5931
X-RFC2646: Format=Flowed; Original
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
Xref: news2.google.com comp.lang.ada:14758
Date: 2011-11-30T18:11:10-06:00
List-Id: <comp.lang.ada>

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
news:ouubrb3trn06$.1jl5q3ausoy2v.dlg@40tude.net...
> On Wed, 30 Nov 2011 11:39:30 +0100, Yannick Duchêne (Hibou57) wrote:
>
>> By the way, I feel the original message is based on erroneous assumptions
>> about implementations of Ada.Strings.Unbounded. Nothing in the RM requires
>> an implementation to use a single array for unbounded strings; on the
>> contrary, it says `type Unbounded_String is private;`.
>
> A false assumption is IMO that strings are large and need to be manipulated
> as a whole, e.g. pieces substituted etc.
>
> In each such case the user should consider:
>
> 1. maybe the pattern he uses is wrong (e.g. poor parsing techniques
> splitting and merging strings physically)
>
> 2. maybe string is an inappropriate data structure and something like text
> buffer should be used instead.
>
> It is C where everything is char*, not Ada.

I don't agree, for a number of reasons:

(1) The Trash-Finder spam filter uses an "append-all" pattern for handling 
text and HTML filtering (along with a few replacements). That's mainly 
because it is best to ignore line-breaks in such matching. I could have 
invented a different data-structure for that use, but it would have just 
meant more work (especially to recreate the string pattern-matching 
operations, which are used extensively).
(2) I worried about the performance of this code, especially in the 
"service" version of TF -- but there is no evidence that using unbounded 
strings and all of that unstructured heap makes any real difference at all, 
even when TF runs for months between restarts.
(3) The only thing that ever took significant time is the actual pattern 
matching, and a few optimizations to that code eliminated the problem. 
(Obviously, it helps that I have my own compiler and run-time to tweak, but 
the optimizations make sense in general: they don't add much code, and they 
reduce the runtime a lot in some common circumstances.)
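
The "append-all" pattern in (1) can be sketched roughly as follows. This is 
only an illustration using the predefined Ada.Strings.Unbounded operations, 
not Trash-Finder's actual code; the procedure name, the sample text, and the 
pattern are all made up for the example.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;           use Ada.Text_IO;

--  Hypothetical demo, not taken from Trash-Finder.
procedure Append_All_Demo is
   Text : Unbounded_String;
begin
   --  Append each line without its line break, so that a phrase
   --  split across two lines can still be matched.
   Append (Text, "Buy cheap wat");
   Append (Text, "ches today");

   --  The predefined Index returns 0 when the pattern is absent.
   if Index (Text, "watches") > 0 then
      Put_Line ("pattern found");
   else
      Put_Line ("pattern not found");
   end if;
end Append_All_Demo;
```

The point of the sketch is that the predefined Append and Index already cover 
this use, so no custom data structure or matching code has to be written.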

To reiterate, premature optimization is the root of all (well, really most) 
evil. That includes making generalizations about the use of data structures! 
Spending a lot of time writing some other data structure when a predefined 
one will do is just plain silly.

                         Randy.