From: Natasha Kerensikova
Newsgroups: comp.lang.ada
Subject: Re: Ann: Natools.Chunked_Strings, beta 1
Date: Tue, 29 Nov 2011 16:34:32 +0000 (UTC)

Hello,

On 2011-11-29, Pascal Obry wrote:
> Do you have some speed/memory comparison between your Chunked_String and
> GNAT Unbounded_String?

Not yet, but I would love to eventually have it. I'm mostly missing a
proper benchmark protocol, and any input on that point would be
appreciated.
I was thinking of something like a reference text read line by line into
one of these, checking time and memory usage after each append, hoping
the overhead won't drown the measurements (or measuring once every N
lines instead). And checking dispersion over a million or so trials.
Something like that.

Going for theory instead of experimentation, GNAT Unbounded_String
performs a complete copy whenever it has to grow the storage, but it
grows exponentially (adding an extra 1/32nd of the length before the
triggering append).

For the sake of simplicity, let's consider Chunked_String with both
Allocation_Unit and Chunk_Size set to 4096 characters (so that I don't
have to deal with a smaller last chunk). Then Chunked_String is
obviously faster below 128KB, since it performs fewer reallocations,
each of them cheaper, at the expense of up to 4KB of memory.

Beyond 128KB strings, Unbounded_String starts reallocating less often
than Chunked_String. However each reallocation still copies the whole
string contents, while with these parameters Chunked_String only copies
the chunk reference array, which is 512 times smaller on machines where
an access uses 8 bytes. I cannot find a way to compare that by hand, so
it seems we will need measurements.

Using a smaller Allocation_Unit increases the number of reallocations,
while never saving more than one chunk (4KB here), so it probably does
not make sense to use a value different from Chunk_Size for a few large
strings.

Lastly, when dealing with increasingly large strings, I expect
Unbounded_String to break much sooner than Chunked_String, because of
its need for contiguous memory. That is probably difficult to show in a
benchmark, unless it can somehow set up memory fragmentation. I expect
it to make a difference in real-life long-running processes though.

Thanks for your interest,
Natasha
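
As a rough illustration of the theoretical comparison above, the
reallocation counts and copy volumes can be tallied with a small
counting model. This is only a sketch in Python, not a benchmark, and it
does not use either Ada package. The function names (unbounded_cost,
chunked_cost) are hypothetical; the growth rule for the contiguous
buffer is the simplified one described in the post (new length plus
1/32nd of the old length), and the chunked model assumes the chunk
reference array is reallocated each time one 4096-character chunk is
added, with fixed-size appends.

```python
ACCESS_SIZE = 8      # assumed bytes per chunk reference (64-bit access)
CHUNK_SIZE = 4096    # also stands in for Allocation_Unit, as in the post
GROWTH_DIVISOR = 32  # contiguous buffer grows by 1/32 of the old length


def unbounded_cost(total, step):
    """Return (reallocations, bytes_copied) for a contiguous buffer."""
    length = capacity = reallocs = copied = 0
    while length < total:
        new_length = length + step
        if new_length > capacity:
            # The whole current contents move to the new buffer.
            copied += length
            reallocs += 1
            capacity = new_length + length // GROWTH_DIVISOR
        length = new_length
    return reallocs, copied


def chunked_cost(total, step):
    """Return (reallocations, bytes_copied) for a chunk reference array."""
    length = chunks = reallocs = copied = 0
    while length < total:
        length += step
        needed = -(-length // CHUNK_SIZE)  # ceiling division
        while chunks < needed:
            # Only the existing chunk references are copied.
            copied += chunks * ACCESS_SIZE
            reallocs += 1
            chunks += 1
    return reallocs, copied


if __name__ == "__main__":
    # Appending 80-character "lines" up to a few target sizes.
    for size in (64 * 1024, 128 * 1024, 1024 * 1024):
        print(size, unbounded_cost(size, 80), chunked_cost(size, 80))
```

The model ignores allocator behaviour, cache effects and fragmentation,
so it can only suggest where the two strategies trade places; actual
timings would still have to come from measurements.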