From: Natasha Kerensikova <lithiumcat@gmail.com>
Subject: Re: Ann: Natools.Chunked_Strings, beta 1
Date: Tue, 29 Nov 2011 16:34:32 +0000 (UTC)
Date: 2011-11-29T16:34:32+00:00 [thread overview]
Message-ID: <slrnjda2cc.1lme.lithiumcat@sigil.instinctive.eu> (raw)
In-Reply-To: 4ed4fc37$0$2537$ba4acef3@reader.news.orange.fr
Hello,
On 2011-11-29, Pascal Obry <pascal@obry.net> wrote:
> Do you have some speed/memory comparison between your Chunked_String and
> GNAT Unbounded_String?
Not yet, but I would love to eventually have it. I'm mostly missing a
proper benchmark protocol, and any input on that point would be
appreciated.
I was thinking of something like a reference text read line-by-line into
one of these, checking time and memory usage after each append, hoping
the overhead won't drown the measures (or measure once every N line
instead). And check dispersion over a million or so trials. Something
like that.
Going for theory instead of experimentation, GNAT Unbounded_String
performs a complete copy whenever it has to grow the storage, but it
grows exponentially (adding an extra 1/32th of the length before the
triggering append).
For the sake of simplicity, let's consider Chunked_String with both
Allocation_Unit and Chunk_Size to 4096 characters (so that I don't have
to deal with a smaller last chunk). Then Chunked_String is obviously
faster below 128KB, since it performs less reallocations that are
cheaper, at the expense of up to 4KB of memory.
Beyond 128KB strings, Unbounded_String starts reallocating less often
than Chunked_String. However each reallocation still copies the whole
string contents, while with these parameters Chunked_String only copies
the chunk reference array, which is 512 times smaller on machines where
an access uses 8 bytes. I cannot find how to compare that "with the
hands", so it seems we will need measures.
Using a smaller Allocation_Unit increases the number of reallocations,
while never saving less than one chunk (4KB here), so it probably does
not make sense to use a value different than Chunk_Size for few large
strings.
Lastly, when dealing with increasingly large strings, I expect
Unbounded_String to break much sooner than Chunked_String, because of
its need for contiguous memory. That is probably difficult to show in a
benchmark, unless it can somehow setup memory fragmentation. I expect it
to make a difference in real-life long-running processes though.
Thanks for your interest,
Natasha
next prev parent reply other threads:[~2011-11-29 16:35 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-11-29 15:16 Ann: Natools.Chunked_Strings, beta 1 Natasha Kerensikova
2011-11-29 15:37 ` Pascal Obry
2011-11-29 16:34 ` Natasha Kerensikova [this message]
2011-11-29 17:08 ` Georg Bauhaus
2011-11-30 9:51 ` Natasha Kerensikova
2011-11-29 20:25 ` Randy Brukardt
2011-11-30 10:44 ` Yannick Duchêne (Hibou57)
2011-11-30 10:39 ` Yannick Duchêne (Hibou57)
2011-11-30 10:57 ` Dmitry A. Kazakov
2011-12-01 0:11 ` Randy Brukardt
2011-12-01 8:30 ` Dmitry A. Kazakov
2011-12-01 23:26 ` Vinzent Hoefler
2011-12-02 8:27 ` Dmitry A. Kazakov
2011-12-02 9:30 ` Georg Bauhaus
2011-12-02 13:11 ` Dmitry A. Kazakov
2011-12-02 0:39 ` Randy Brukardt
2011-12-01 9:02 ` Yannick Duchêne (Hibou57)
2011-11-30 13:08 ` Natasha Kerensikova
2011-11-30 19:39 ` Jeffrey Carter
2011-12-01 10:57 ` Natasha Kerensikova
2011-12-01 19:07 ` Jeffrey Carter
2011-12-01 21:19 ` Yannick Duchêne (Hibou57)
2011-12-01 22:49 ` Natasha Kerensikova
2011-12-02 16:16 ` Tero Koskinen
2011-12-02 17:36 ` Adam Beneschan
2011-12-02 18:52 ` Tero Koskinen
2011-12-02 18:14 ` Yannick Duchêne (Hibou57)
2011-12-02 19:07 ` Adam Beneschan
2011-11-30 10:33 ` Yannick Duchêne (Hibou57)
2011-11-30 11:04 ` Natasha Kerensikova
replies disabled
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox