comp.lang.ada
 help / color / mirror / Atom feed
From: Natasha Kerensikova <lithiumcat@gmail.com>
Subject: Re: Ann: Natools.Chunked_Strings, beta 1
Date: Tue, 29 Nov 2011 16:34:32 +0000 (UTC)
Date: 2011-11-29T16:34:32+00:00	[thread overview]
Message-ID: <slrnjda2cc.1lme.lithiumcat@sigil.instinctive.eu> (raw)
In-Reply-To: 4ed4fc37$0$2537$ba4acef3@reader.news.orange.fr

Hello,

On 2011-11-29, Pascal Obry <pascal@obry.net> wrote:
> Do you have some speed/memory comparison between your Chunked_String and
> GNAT Unbounded_String?

Not yet, but I would love to eventually have it. I'm mostly missing a
proper benchmark protocol, and any input on that point would be
appreciated.

I was thinking of something like a reference text read line-by-line into
one of these, checking time and memory usage after each append, hoping
the overhead won't drown the measures (or measure once every N line
instead). And check dispersion over a million or so trials. Something
like that.



Going for theory instead of experimentation, GNAT Unbounded_String
performs a complete copy whenever it has to grow the storage, but it
grows exponentially (adding an extra 1/32th of the length before the
triggering append).

For the sake of simplicity, let's consider Chunked_String with both
Allocation_Unit and Chunk_Size to 4096 characters (so that I don't have
to deal with a smaller last chunk). Then Chunked_String is obviously
faster below 128KB, since it performs less reallocations that are
cheaper, at the expense of up to 4KB of memory.

Beyond 128KB strings, Unbounded_String starts reallocating less often
than Chunked_String. However each reallocation still copies the whole
string contents, while with these parameters Chunked_String only copies
the chunk reference array, which is 512 times smaller on machines where
an access uses 8 bytes. I cannot find how to compare that "with the
hands", so it seems we will need measures.

Using a smaller Allocation_Unit increases the number of reallocations,
while never saving less than one chunk (4KB here), so it probably does
not make sense to use a value different than Chunk_Size for few large
strings.


Lastly, when dealing with increasingly large strings, I expect
Unbounded_String to break much sooner than Chunked_String, because of
its need for contiguous memory. That is probably difficult to show in a
benchmark, unless it can somehow setup memory fragmentation. I expect it
to make a difference in real-life long-running processes though.



Thanks for your interest,
Natasha



  reply	other threads:[~2011-11-29 16:35 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-29 15:16 Ann: Natools.Chunked_Strings, beta 1 Natasha Kerensikova
2011-11-29 15:37 ` Pascal Obry
2011-11-29 16:34   ` Natasha Kerensikova [this message]
2011-11-29 17:08     ` Georg Bauhaus
2011-11-30  9:51       ` Natasha Kerensikova
2011-11-29 20:25     ` Randy Brukardt
2011-11-30 10:44     ` Yannick Duchêne (Hibou57)
2011-11-30 10:39   ` Yannick Duchêne (Hibou57)
2011-11-30 10:57     ` Dmitry A. Kazakov
2011-12-01  0:11       ` Randy Brukardt
2011-12-01  8:30         ` Dmitry A. Kazakov
2011-12-01 23:26           ` Vinzent Hoefler
2011-12-02  8:27             ` Dmitry A. Kazakov
2011-12-02  9:30               ` Georg Bauhaus
2011-12-02 13:11                 ` Dmitry A. Kazakov
2011-12-02  0:39           ` Randy Brukardt
2011-12-01  9:02         ` Yannick Duchêne (Hibou57)
2011-11-30 13:08     ` Natasha Kerensikova
2011-11-30 19:39       ` Jeffrey Carter
2011-12-01 10:57         ` Natasha Kerensikova
2011-12-01 19:07           ` Jeffrey Carter
2011-12-01 21:19             ` Yannick Duchêne (Hibou57)
2011-12-01 22:49               ` Natasha Kerensikova
2011-12-02 16:16         ` Tero Koskinen
2011-12-02 17:36           ` Adam Beneschan
2011-12-02 18:52             ` Tero Koskinen
2011-12-02 18:14           ` Yannick Duchêne (Hibou57)
2011-12-02 19:07             ` Adam Beneschan
2011-11-30 10:33 ` Yannick Duchêne (Hibou57)
2011-11-30 11:04   ` Natasha Kerensikova
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox