From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: Ann: Natools.Chunked_Strings, beta 1
Date: Thu, 1 Dec 2011 09:30:37 +0100
Organization: cbb software GmbH
References: <4ed4fc37$0$2537$ba4acef3@reader.news.orange.fr>
Reply-To: mailbox@dmitry-kazakov.de

On Wed, 30 Nov 2011 18:11:10 -0600, Randy Brukardt wrote:

> "Dmitry A. Kazakov" wrote in message
> news:ouubrb3trn06$.1jl5q3ausoy2v.dlg@40tude.net...
>> On Wed, 30 Nov 2011 11:39:30 +0100, Yannick Duchêne (Hibou57) wrote:
>>
>>> By the way, I feel the original message is based on erroneous assumptions
>>> about implementations of Ada.Strings.Unbounded. Nothing in the RM requires
>>> an implementation to use a single array for unbounded strings; on the
>>> contrary, it says `type Unbounded_String is private;`.
>>
>> A false assumption, IMO, is that strings are large and need to be
>> manipulated as a whole, e.g. with pieces substituted, etc.
>>
>> In each such case the user should consider:
>>
>> 1. maybe the pattern he uses is wrong (e.g. poor parsing techniques that
>> split and merge strings physically);
>>
>> 2. maybe a string is an inappropriate data structure, and something like a
>> text buffer should be used instead.
>>
>> It is C where everything is char*, not Ada.
>
> I don't agree, for a number of reasons:
>
> (1) The Trash-Finder spam filter uses an "append-all" pattern for handling
> text and HTML filtering (along with a few replacements). That's mainly
> because it is best to ignore line breaks in such matching. I could have
> invented a different data structure for that use, but it would have just
> meant more work (especially to recreate the string pattern-matching
> operations, which are used extensively).

See, the pattern matcher should have a "line end" atom. My pattern matcher
has one.

> (2) I worried about the performance of this code, especially in the
> "service" version of TF -- but there is no evidence that using unbounded
> strings and all of that unstructured heap makes any real difference at all,
> even when TF runs for months between restarts.

This is very likely. But my concern was not performance; it was rather the
idea of having long strings. Since long text strings do not exist in
"nature" (:-)), nobody should want to have them.

> To reiterate, premature optimization is the root of all (well, really most)
> evil.

Yes.

> That includes making generalizations about the use of data structures!
> Spending a lot of time writing some other data structure when a predefined
> one will do is just plain silly.

No. Using unspecialized data structures for special purposes is not a means
of preventing premature optimization; it is a dirty, in some sense weakly
typed, design. A more careful design might also bring better performance,
but that is not the primary concern.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de