comp.lang.ada
 help / color / mirror / Atom feed
* Ann: Natools.Chunked_Strings, beta 1
@ 2011-11-29 15:16 Natasha Kerensikova
  2011-11-29 15:37 ` Pascal Obry
  2011-11-30 10:33 ` Yannick Duchêne (Hibou57)
  0 siblings, 2 replies; 30+ messages in thread
From: Natasha Kerensikova @ 2011-11-29 15:16 UTC (permalink / raw)


Hello,

my main project is currently an improved rewrite of a library I wrote in
C, with a (hopefully) better design. In the C version, the speed
bottleneck was in the dynamic buffers where the resulting string is
accumulated. As I hoped to eventually reach the same point in Ada, I
used a String_Accumulator interface in the main project, so that I can
easily switch accumulator implementation, possibly to something more
efficient but more complex.

At first I imagined to way of implementing a fast large string
accumulation mechanism: ropes, and brutal preallocation of massive
amounts of memory. I started writing the latter, as part of my efforts
to teach myself Ada, because it seemed easier.

Chunked_String is a string container that exposes the same interface as
Unbounded_String, but whose implementation is using independant "chunks"
of memory. Thus large strings don't have to fit in a contiguous memory
space, and append operations are much more efficient by having to resize
only the last chunk.

So now I'm publishing this code under ISC licence, hoping it can be
useful to someone and/or trigger useful comments to help me get better
at Ada. The code is available in the Natools repository at:
http://fossil.instinctive.eu/natools/dir?ci=tip

Just like Natools.Getopt_Long, all constructive comments are welcome,
about design, implementation, style, or anything else. (And will
probably also not be acted upon immediately, I'm juggling with more
tasks than I can comfortably schedule, but they were listened to and
appreciated.)


Thanks in advance for your help,
Natasha



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-29 15:16 Ann: Natools.Chunked_Strings, beta 1 Natasha Kerensikova
@ 2011-11-29 15:37 ` Pascal Obry
  2011-11-29 16:34   ` Natasha Kerensikova
  2011-11-30 10:39   ` Yannick Duchêne (Hibou57)
  2011-11-30 10:33 ` Yannick Duchêne (Hibou57)
  1 sibling, 2 replies; 30+ messages in thread
From: Pascal Obry @ 2011-11-29 15:37 UTC (permalink / raw)



Natasha,

> Chunked_String is a string container that exposes the same interface as
> Unbounded_String, but whose implementation is using independant "chunks"
> of memory. Thus large strings don't have to fit in a contiguous memory
> space, and append operations are much more efficient by having to resize
> only the last chunk.

Do you have some speed/memory comparison between your Chunked_String and
GNAT Unbounded_String?

Pascal.

-- 

--|------------------------------------------------------
--| Pascal Obry                           Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--|    http://www.obry.net  -  http://v2p.fr.eu.org
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver keys.gnupg.net --recv-key F949BD3B




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-29 15:37 ` Pascal Obry
@ 2011-11-29 16:34   ` Natasha Kerensikova
  2011-11-29 17:08     ` Georg Bauhaus
                       ` (2 more replies)
  2011-11-30 10:39   ` Yannick Duchêne (Hibou57)
  1 sibling, 3 replies; 30+ messages in thread
From: Natasha Kerensikova @ 2011-11-29 16:34 UTC (permalink / raw)


Hello,

On 2011-11-29, Pascal Obry <pascal@obry.net> wrote:
> Do you have some speed/memory comparison between your Chunked_String and
> GNAT Unbounded_String?

Not yet, but I would love to eventually have it. I'm mostly missing a
proper benchmark protocol, and any input on that point would be
appreciated.

I was thinking of something like a reference text read line-by-line into
one of these, checking time and memory usage after each append, hoping
the overhead won't drown the measures (or measure once every N line
instead). And check dispersion over a million or so trials. Something
like that.



Going for theory instead of experimentation, GNAT Unbounded_String
performs a complete copy whenever it has to grow the storage, but it
grows exponentially (adding an extra 1/32th of the length before the
triggering append).

For the sake of simplicity, let's consider Chunked_String with both
Allocation_Unit and Chunk_Size to 4096 characters (so that I don't have
to deal with a smaller last chunk). Then Chunked_String is obviously
faster below 128KB, since it performs less reallocations that are
cheaper, at the expense of up to 4KB of memory.

Beyond 128KB strings, Unbounded_String starts reallocating less often
than Chunked_String. However each reallocation still copies the whole
string contents, while with these parameters Chunked_String only copies
the chunk reference array, which is 512 times smaller on machines where
an access uses 8 bytes. I cannot find how to compare that "with the
hands", so it seems we will need measures.

Using a smaller Allocation_Unit increases the number of reallocations,
while never saving less than one chunk (4KB here), so it probably does
not make sense to use a value different than Chunk_Size for few large
strings.


Lastly, when dealing with increasingly large strings, I expect
Unbounded_String to break much sooner than Chunked_String, because of
its need for contiguous memory. That is probably difficult to show in a
benchmark, unless it can somehow setup memory fragmentation. I expect it
to make a difference in real-life long-running processes though.



Thanks for your interest,
Natasha



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-29 16:34   ` Natasha Kerensikova
@ 2011-11-29 17:08     ` Georg Bauhaus
  2011-11-30  9:51       ` Natasha Kerensikova
  2011-11-29 20:25     ` Randy Brukardt
  2011-11-30 10:44     ` Yannick Duchêne (Hibou57)
  2 siblings, 1 reply; 30+ messages in thread
From: Georg Bauhaus @ 2011-11-29 17:08 UTC (permalink / raw)


On 29.11.11 17:34, Natasha Kerensikova wrote:
> Hello,
> 
> On 2011-11-29, Pascal Obry <pascal@obry.net> wrote:
>> Do you have some speed/memory comparison between your Chunked_String and
>> GNAT Unbounded_String?
> 
> Not yet, but I would love to eventually have it. I'm mostly missing a
> proper benchmark protocol, and any input on that point would be
> appreciated.

The package GNAT.Spitbol.Patterns uses Unbounded_String extensively.
Last time I checked it seemed possible to replace the dependence
on Ada.Strings.Unbounded with one on a compatible package.
The very first definition in GNAT.Spitbol is

   subtype VString is Ada.Strings.Unbounded.Unbounded_String;

It should be sufficient to change this definition and

   Nul : VString renames Ada.Strings.Unbounded.Null_Unbounded_String;

to use your strings, in local copies of the sources of the
GNAT.Spitbol hierarchy.

Of particular interest might be, I think, anything that can improve
replacement in objects of type Unbounded_String.

One program whose performance depends a lot on replacement
in Unbounded_String is
http://shootout.alioth.debian.org/u32q/program.php?test=regexdna&lang=gnat&id=1

The bencher software they use to measure relative performance of the
programs is fairly easy to install on GNU/Linux. The "read-only"
part of the above program is really fast, the replacements, however,
aren't. The critical part is in the function Match:


      -- Perform the regex substitution.  Likely facing
      --
      -- (1a) the GREAT BIG subtstitution problem
      --      (cf. D.W.E. Blatt, 1980)
      -- (1b) replacements in Unbounded_String which
      --      the pattern matching implementation is using
      while I <= Limit loop
         for C in Codes_Index loop
            while
               Match(Subject => Sequence_Lines(I),
                     Pat => Codes(C).Code,
                     Replace => Codes(C).Alternatives)
            loop
               null;
            end loop;
         end loop;
         I := I + Step;
      end loop;

Maybe you find some idea in there to be useful for testing.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-29 16:34   ` Natasha Kerensikova
  2011-11-29 17:08     ` Georg Bauhaus
@ 2011-11-29 20:25     ` Randy Brukardt
  2011-11-30 10:44     ` Yannick Duchêne (Hibou57)
  2 siblings, 0 replies; 30+ messages in thread
From: Randy Brukardt @ 2011-11-29 20:25 UTC (permalink / raw)


"Natasha Kerensikova" <lithiumcat@gmail.com> wrote in message 
news:slrnjda2cc.1lme.lithiumcat@sigil.instinctive.eu...
> Hello,
>
> On 2011-11-29, Pascal Obry <pascal@obry.net> wrote:
>> Do you have some speed/memory comparison between your Chunked_String and
>> GNAT Unbounded_String?
>
> Not yet, but I would love to eventually have it. I'm mostly missing a
> proper benchmark protocol, and any input on that point would be
> appreciated.
>
> I was thinking of something like a reference text read line-by-line into
> one of these, checking time and memory usage after each append, hoping
> the overhead won't drown the measures (or measure once every N line
> instead). And check dispersion over a million or so trials. Something
> like that.

That sounds a lot like what the Trash-Finder spam filter 
(http://www.rrsoftware.com/html/prodinf/tf/tf-main.html) does (other than of 
course it reads a million copies of the same message as opposed to reading 
one copy a million times). It was written using Unbounded_Strings as an 
experiment (which was a failure, IMHO, it would have been better to use a 
discriminated record for this storage, since the unbounded strings are 
components in a record which has to be manually deallocated anyway).

One could imagine using a version of it fed with a static set of e-mail as a 
benchmark. The only problem is that I never published the entire source code 
(and it wasn't tested with GNAT, either).

Tell me if you want to try something with this.

                                Randy.





^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-29 17:08     ` Georg Bauhaus
@ 2011-11-30  9:51       ` Natasha Kerensikova
  0 siblings, 0 replies; 30+ messages in thread
From: Natasha Kerensikova @ 2011-11-30  9:51 UTC (permalink / raw)


On 2011-11-29, Georg Bauhaus <rm.dash-bauhaus@futureapps.de> wrote:
> The package GNAT.Spitbol.Patterns uses Unbounded_String extensively.
> Last time I checked it seemed possible to replace the dependence
> on Ada.Strings.Unbounded with one on a compatible package.
> The very first definition in GNAT.Spitbol is
>
>    subtype VString is Ada.Strings.Unbounded.Unbounded_String;
>
> It should be sufficient to change this definition and
>
>    Nul : VString renames Ada.Strings.Unbounded.Null_Unbounded_String;
>
> to use your strings, in local copies of the sources of the
> GNAT.Spitbol hierarchy.

Also To_Unbounded_String has to be replaced by To_Chunked_String, but
this is also conveniently renamed at the same place.

> Of particular interest might be, I think, anything that can improve
> replacement in objects of type Unbounded_String.

It seems we are talking about the transformation of
Prefix & Old & Suffix into Prefix & Replacement & Suffix, with
Old'Length /= Replacement'Length. In that case, Chunked_String is
probably not an improvement over Unbounded_String, since in both cases
the whole Suffix has to be moved, so they should be of similar speed
with a slight advantage for Unbounded_String because the contiguous
memory is more cache-friendly.

However it seems to be the kind of use-cases for which ropes were
designed. So it leads to the question, is there already an Ada
implementation of ropes somewhere?

Assuming there isn't, I wouldn't mind starting one. It seems to be an
interesting toy problem.

When thinking about it, the first tricky part I identified is dealing
with reference-counted object in a thread-safe way (the original paper
assumed a garbage collector). It will probably involve writing tree
algorithms from scratch (to efficiently preserve the rope invariant when
rebalancing or rotating), so I will end up with accesses to reference
counted nodes. I guess using a protected type for nodes will take care
of the thread safety, but I will need to somehow indicate whether I'm
dereferencing the access for reading or for read/write. I might be an
interesting concurrency exercise, but I don't really know how to
properly test it.



Thanks for your interest,
Natasha



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-29 15:16 Ann: Natools.Chunked_Strings, beta 1 Natasha Kerensikova
  2011-11-29 15:37 ` Pascal Obry
@ 2011-11-30 10:33 ` Yannick Duchêne (Hibou57)
  2011-11-30 11:04   ` Natasha Kerensikova
  1 sibling, 1 reply; 30+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-11-30 10:33 UTC (permalink / raw)


Le Tue, 29 Nov 2011 16:16:16 +0100, Natasha Kerensikova  
<lithiumcat@gmail.com> a écrit:
> Chunked_String is a string container that exposes the same interface as
> Unbounded_String
I don't see the same interface as that of Ada.Strings.Unbounded in  
Natools.Chunked_Strings.

> http://fossil.instinctive.eu/natools/dir?ci=tip
Dirty question: is your Natools.Chunked_Strings interface to be with-used  
or withed ?

If you heard about static polymorphism, and you feel that's a good idea,  
you may face some issues with the actual spec.

-- 
“Syntactic sugar causes cancer of the semi-colons.” [1]
“Structured Programming supports the law of the excluded muddle.” [1]
[1]: [Epigrams on Programming — Alan J. — P. Yale University]



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-29 15:37 ` Pascal Obry
  2011-11-29 16:34   ` Natasha Kerensikova
@ 2011-11-30 10:39   ` Yannick Duchêne (Hibou57)
  2011-11-30 10:57     ` Dmitry A. Kazakov
  2011-11-30 13:08     ` Natasha Kerensikova
  1 sibling, 2 replies; 30+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-11-30 10:39 UTC (permalink / raw)


Le Tue, 29 Nov 2011 16:37:27 +0100, Pascal Obry <pascal@obry.net> a écrit:
> Do you have some speed/memory comparison between your Chunked_String and
> GNAT Unbounded_String?
>
> Pascal.
Probably not relevant as a generality: depends on string lengths.

By the way, I feel the original message is based on erroneous assumptions  
about implementations of Ada.Strings.Unbounded. Nothing in the RM requires  
an implementations to use a single array for unbounded strings, and on the  
opposite, it says `type Unbounded_String is private;`.

At least for that reason, I feel this would be even better to stick to the  
exact same interface as the one of Ada.Strings.Unbounded. If that's  
another implementation of Ada.Strings.Unbounded, then, this should expose  
the same interface. This way, the client side could easily use one or the  
other via static polymorphism.

-- 
“Syntactic sugar causes cancer of the semi-colons.” [1]
“Structured Programming supports the law of the excluded muddle.” [1]
[1]: [Epigrams on Programming — Alan J. — P. Yale University]



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-29 16:34   ` Natasha Kerensikova
  2011-11-29 17:08     ` Georg Bauhaus
  2011-11-29 20:25     ` Randy Brukardt
@ 2011-11-30 10:44     ` Yannick Duchêne (Hibou57)
  2 siblings, 0 replies; 30+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-11-30 10:44 UTC (permalink / raw)


Le Tue, 29 Nov 2011 17:34:32 +0100, Natasha Kerensikova  
<lithiumcat@gmail.com> a écrit:
> I was thinking of something like a reference text read line-by-line
I believe there cannot be anything like a reference text or a reference  
use case with string processing.


-- 
“Syntactic sugar causes cancer of the semi-colons.” [1]
“Structured Programming supports the law of the excluded muddle.” [1]
[1]: [Epigrams on Programming — Alan J. — P. Yale University]



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-30 10:39   ` Yannick Duchêne (Hibou57)
@ 2011-11-30 10:57     ` Dmitry A. Kazakov
  2011-12-01  0:11       ` Randy Brukardt
  2011-11-30 13:08     ` Natasha Kerensikova
  1 sibling, 1 reply; 30+ messages in thread
From: Dmitry A. Kazakov @ 2011-11-30 10:57 UTC (permalink / raw)


On Wed, 30 Nov 2011 11:39:30 +0100, Yannick Duch�ne (Hibou57) wrote:

> By the way, I feel the original message is based on erroneous assumptions  
> about implementations of Ada.Strings.Unbounded. Nothing in the RM requires  
> an implementations to use a single array for unbounded strings, and on the  
> opposite, it says `type Unbounded_String is private;`.

I false assumption is IMO that strings are large and need to be manipulated
as a whole, e.g. pieces substituted etc.

In each such case the user should consider:

1. maybe the pattern he uses is wrong (e.g. poor parsing techniques
splitting and merging strings physically)

2. maybe string is an inappropriate data structure and something like text
buffer should be used instead.

It is C where everything is char*, not Ada.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-30 10:33 ` Yannick Duchêne (Hibou57)
@ 2011-11-30 11:04   ` Natasha Kerensikova
  0 siblings, 0 replies; 30+ messages in thread
From: Natasha Kerensikova @ 2011-11-30 11:04 UTC (permalink / raw)


Hello,

On 2011-11-30, Yannick Duchêne <yannick_duchene@yahoo.fr> wrote:
> Le Tue, 29 Nov 2011 16:16:16 +0100, Natasha Kerensikova  
><lithiumcat@gmail.com> a écrit:
>> Chunked_String is a string container that exposes the same interface as
>> Unbounded_String
> I don't see the same interface as that of Ada.Strings.Unbounded in  
> Natools.Chunked_Strings.

Really? I did took all the subprogram names, with the same argument
names and order, and the same behavior. I only changed the type from
Unbounded_String to Chunked_String (obvisouly), and when a new
Chunked_String is created, I added Allocation_Unit and Chunk_Size with
default values, so that old calls to Ada.Strings.Unbounded subprogram
still work. Do you see something else?

>> http://fossil.instinctive.eu/natools/dir?ci=tip
> Dirty question: is your Natools.Chunked_Strings interface to be with-used  
> or withed ?

Same as Ada.Strings.Unbounded, really. I tend to with-rename, so I have
never tried how it feels when with-used. However I don't see anything
preventing it from being with-used.

> If you heard about static polymorphism, and you feel that's a good idea,  
> you may face some issues with the actual spec.

What kind of issues? Or rather, what issues that are not triggered
exactly the same way by Ada.Strings.Unbounded? I genuinely don't see
anything wrong...


Thanks for your help,
Natasha



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-30 10:39   ` Yannick Duchêne (Hibou57)
  2011-11-30 10:57     ` Dmitry A. Kazakov
@ 2011-11-30 13:08     ` Natasha Kerensikova
  2011-11-30 19:39       ` Jeffrey Carter
  1 sibling, 1 reply; 30+ messages in thread
From: Natasha Kerensikova @ 2011-11-30 13:08 UTC (permalink / raw)


Hello,

On 2011-11-30, Yannick Duchêne <yannick_duchene@yahoo.fr> wrote:
> Le Tue, 29 Nov 2011 16:37:27 +0100, Pascal Obry <pascal@obry.net> a écrit:
>> Do you have some speed/memory comparison between your Chunked_String and
>> GNAT Unbounded_String?
> Probably not relevant as a generality: depends on string lengths.

Of course, that's why the x-axis of a benchmark graph has to be string
length.

> By the way, I feel the original message is based on erroneous assumptions  
> about implementations of Ada.Strings.Unbounded. Nothing in the RM requires  
> an implementations to use a single array for unbounded strings, and on the  
> opposite, it says `type Unbounded_String is private;`.

That is not true. The only assumption was that in the specific case of
building a somewhat large string through repeated append, an
implementation optimized for that use case would perform better than
Ada.Strings.Unbounded, which is presumably all-purpose. Hence
String_Accumulator interface.

Then it is a fact that GNAT implementation of Ada.Strings.Unbounded is a
single contiguous array that is dynamically resized, and each resize
costs a full copy of the contents (i.e. there is equivalent to C's
realloc() that may resize an array without copy.

Combined with the fact I use GNAT is enough (for me) to justify writing
Natools.Chunked_Strings.

> At least for that reason, I feel this would be even better to stick to the  
> exact same interface as the one of Ada.Strings.Unbounded. If that's  
> another implementation of Ada.Strings.Unbounded, then, this should expose  
> the same interface. This way, the client side could easily use one or the  
> other via static polymorphism.

And what exactly is preventing the client side from doing exactly that?
That's what I did in the client packages, through changes similar of
what was described about GNAT.Spitbol.


Natasha



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-30 13:08     ` Natasha Kerensikova
@ 2011-11-30 19:39       ` Jeffrey Carter
  2011-12-01 10:57         ` Natasha Kerensikova
  2011-12-02 16:16         ` Tero Koskinen
  0 siblings, 2 replies; 30+ messages in thread
From: Jeffrey Carter @ 2011-11-30 19:39 UTC (permalink / raw)


On 11/30/2011 06:08 AM, Natasha Kerensikova wrote:
>
> [An implementation optimized for appending should have better performance for
> that use and GNAT's Unbounded_String uses a contiguous array] Combined with
> the fact I use GNAT is enough (for me) to justify writing
> Natools.Chunked_Strings.

Justifications for rewriting Unbounded_Strings are

1. For fun
2. As a learning experience
3. You have timing requirements you can't otherwise meet.

3. does not appear to be the case here; you're planning to write something, but
you have yet to demonstrate that Unbounded_Strings isn't suitable for its timing 
requirements. So you're doing it for 1. or 2. Doing it for 3. when you haven't 
demonstrated 3. is premature optimization, AKA the root of all evil.

Doing it for 1. or 2. is fine, but means that comments about efficiency or 
performance are irrelevant.

FWIW, we have a large, soft-real-time system that makes extensive use of 
Unbounded_Strings, and have no problem meeting our timing requirements.

-- 
Jeff Carter
"My dear Mrs. Hemoglobin, when I first saw you, I
was so enamored with your beauty I ran to the basket,
jumped in, went down to the city, and bought myself a
wedding outfit."
Never Give a Sucker an Even Break
111



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-30 10:57     ` Dmitry A. Kazakov
@ 2011-12-01  0:11       ` Randy Brukardt
  2011-12-01  8:30         ` Dmitry A. Kazakov
  2011-12-01  9:02         ` Yannick Duchêne (Hibou57)
  0 siblings, 2 replies; 30+ messages in thread
From: Randy Brukardt @ 2011-12-01  0:11 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 2306 bytes --]

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
news:ouubrb3trn06$.1jl5q3ausoy2v.dlg@40tude.net...
> On Wed, 30 Nov 2011 11:39:30 +0100, Yannick Duch�ne (Hibou57) wrote:
>
>> By the way, I feel the original message is based on erroneous assumptions
>> about implementations of Ada.Strings.Unbounded. Nothing in the RM 
>> requires
>> an implementations to use a single array for unbounded strings, and on 
>> the
>> opposite, it says `type Unbounded_String is private;`.
>
> I false assumption is IMO that strings are large and need to be 
> manipulated
> as a whole, e.g. pieces substituted etc.
>
> In each such case the user should consider:
>
> 1. maybe the pattern he uses is wrong (e.g. poor parsing techniques
> splitting and merging strings physically)
>
> 2. maybe string is an inappropriate data structure and something like text
> buffer should be used instead.
>
> It is C where everything is char*, not Ada.

I don't agree, for a number of reasons:

(1) The Trash-Finder spam filter uses an "append-all" pattern to handling 
text and html filtering (along with a few replacements). That's mainly 
because it is best to ignore line-breaks in such matching. I could have 
invented a different data-structure for that use, but it would have just 
meant more work (especially to recreate the string pattern-matching 
operations, which are used extensively).
(2) I worried about the performance of this code, especially in the 
"service" version of TF -- but there is no evidence that using unbounded 
strings and all of that unstructured heap makes any real difference at all, 
even when TF runs for months between restarts.
(3) The only thing that ever took significant time is the actual pattern 
matching, and a few optimizations to that code eliminated the problem. 
(Obviously, it helps that I have my own compiler and run-time to tweak, but 
the optimizations make sense in general, they don't add much code and reduce 
the runtime a lot in some common circumstances.)

To reiterate, premature optimization is the root of all (well, really most) 
evil. That includes making generalizations about the use of data structures! 
Spending a lot of time writing some other data structure when a predefined 
one will do is just plain silly.

                         Randy.





^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-01  0:11       ` Randy Brukardt
@ 2011-12-01  8:30         ` Dmitry A. Kazakov
  2011-12-01 23:26           ` Vinzent Hoefler
  2011-12-02  0:39           ` Randy Brukardt
  2011-12-01  9:02         ` Yannick Duchêne (Hibou57)
  1 sibling, 2 replies; 30+ messages in thread
From: Dmitry A. Kazakov @ 2011-12-01  8:30 UTC (permalink / raw)


On Wed, 30 Nov 2011 18:11:10 -0600, Randy Brukardt wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
> news:ouubrb3trn06$.1jl5q3ausoy2v.dlg@40tude.net...
>> On Wed, 30 Nov 2011 11:39:30 +0100, Yannick Duch魥 (Hibou57) wrote:
>>
>>> By the way, I feel the original message is based on erroneous assumptions
>>> about implementations of Ada.Strings.Unbounded. Nothing in the RM 
>>> requires
>>> an implementations to use a single array for unbounded strings, and on 
>>> the
>>> opposite, it says `type Unbounded_String is private;`.
>>
>> I false assumption is IMO that strings are large and need to be 
>> manipulated
>> as a whole, e.g. pieces substituted etc.
>>
>> In each such case the user should consider:
>>
>> 1. maybe the pattern he uses is wrong (e.g. poor parsing techniques
>> splitting and merging strings physically)
>>
>> 2. maybe string is an inappropriate data structure and something like text
>> buffer should be used instead.
>>
>> It is C where everything is char*, not Ada.
> 
> I don't agree, for a number of reasons:
> 
> (1) The Trash-Finder spam filter uses an "append-all" pattern to handling 
> text and html filtering (along with a few replacements). That's mainly 
> because it is best to ignore line-breaks in such matching. I could have 
> invented a different data-structure for that use, but it would have just 
> meant more work (especially to recreate the string pattern-matching 
> operations, which are used extensively).

See, the pattern matcher should have the "line end" atom. My pattern
matcher has it.

> (2) I worried about the performance of this code, especially in the 
> "service" version of TF -- but there is no evidence that using unbounded 
> strings and all of that unstructured heap makes any real difference at all, 
> even when TF runs for months between restarts.

This is very likely. But my concern was not performance, rather the idea of
having long strings. Since long text strings do not exist in "nature"
(:-)), nobody should like to have them.

> To reiterate, premature optimization is the root of all (well, really most) 
> evil.

Yes

> That includes making generalizations about the use of data structures! 
> Spending a lot of time writing some other data structure when a predefined 
> one will do is just plain silly.

No. Using unspecialized data structures for special purposes is not a mean
to prevent premature optimization, it is dirty, in some sense weakly typed,
design. A more careful design would possibly bring a better performance,
but that is not the primary concern.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-01  0:11       ` Randy Brukardt
  2011-12-01  8:30         ` Dmitry A. Kazakov
@ 2011-12-01  9:02         ` Yannick Duchêne (Hibou57)
  1 sibling, 0 replies; 30+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-12-01  9:02 UTC (permalink / raw)


Le Thu, 01 Dec 2011 01:11:10 +0100, Randy Brukardt <randy@rrsoftware.com>  
a écrit:
>> It is C where everything is char*, not Ada.
>
> I don't agree, for a number of reasons:
There is also the simple case of a character stream, which is not from a  
file and whose content is not static.

-- 
“Syntactic sugar causes cancer of the semi-colons.” [1]
“Structured Programming supports the law of the excluded muddle.” [1]
[1]: [Epigrams on Programming — Alan J. — P. Yale University]



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-30 19:39       ` Jeffrey Carter
@ 2011-12-01 10:57         ` Natasha Kerensikova
  2011-12-01 19:07           ` Jeffrey Carter
  2011-12-02 16:16         ` Tero Koskinen
  1 sibling, 1 reply; 30+ messages in thread
From: Natasha Kerensikova @ 2011-12-01 10:57 UTC (permalink / raw)


Hello,

On 2011-11-30, Jeffrey Carter <spam.jrcarter.not@spam.not.acm.org> wrote:
> On 11/30/2011 06:08 AM, Natasha Kerensikova wrote:
>>
>> [An implementation optimized for appending should have better performance for
>> that use and GNAT's Unbounded_String uses a contiguous array] Combined with
>> the fact I use GNAT is enough (for me) to justify writing
>> Natools.Chunked_Strings.
>
> Justifications for rewriting Unbounded_Strings are
>
> 1. For fun
> 2. As a learning experience
> 3. You have timing requirements you can't otherwise meet.

1. and 2. were pre-conditions here. Among all the mini-projects that
meet 1. and 2., I used 3. or suspicion of 3. to choose. That's how it is
relevant.

> 3. does not appear to be the case here; you're planning to write
> something, but you have yet to demonstrate that Unbounded_Strings
> isn't suitable for its timing requirements.

I have serious reasons to believe that 3. will probably happen. While
the Ada project with requirements is not yet complete (though already
written, for the most part), it is very similar to a C project I already
wrote. And for that C project, the repeated dynamic array resizes
triggered by repeated string append are demonstrated to be the
performance bottleneck, and therefore the next thing to work on should
performance be improved.

I agree that it might turn out that the Ada project has other
performance issues. But saying that 3. will *never* be a problem is
saying that either there is a point where performance is "good enough"
and that this point is reached sooner than Unbounded_String performance
issue, or that there is some fundamental unavoidable performance hit in
Ada that does not exist in C.

It stills seems to me fair to assume the latter is wrong, and the former
is a suspiciously strong assumption on my project compared to what I
disclosed about it.

> So you're doing it for 1.
> or 2. Doing it for 3. when you haven't demonstrated 3. is premature
> optimization, AKA the root of all evil.

I don't need a hard demonstration with numbers and benchmarks to know
the weak points of my designs. Especially when I do have them for a very
similar design. And unless there is a "good enough" line or an expected
performance flaw in Ada, I'm certain that Unbounded_String append
performance *will* be an issue.

Moreover, I strongly doubt any evil can come from these performance
considerations. A mere package cannot do any evil. The premature
optimization that is actually at the root of any evil is when
unwarranted performance consideration creep up to the design phase. As I
said above, my considerations are far from being unwarranted, and the
only design choice that could be qualified as premature optimization is
the use of String_Accumulator interface (rather than naive direct calls
to Ada.String.Unbounded subprograms). I would be happy to hear arguments
for or against that choice.

> FWIW, we have a large, soft-real-time system that makes extensive use of 
> Unbounded_Strings, and have no problem meeting our timing requirements.

Then I guess you are lucky enough to have a "good enough" line (and to
be on the good side of it).


Natasha



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-01 10:57         ` Natasha Kerensikova
@ 2011-12-01 19:07           ` Jeffrey Carter
  2011-12-01 21:19             ` Yannick Duchêne (Hibou57)
  0 siblings, 1 reply; 30+ messages in thread
From: Jeffrey Carter @ 2011-12-01 19:07 UTC (permalink / raw)


On 12/01/2011 03:57 AM, Natasha Kerensikova wrote:
>
> I have serious reasons to believe that 3. will probably happen. While
> the Ada project with requirements is not yet complete (though already
> written, for the most part), it is very similar to a C project I already
> wrote. And for that C project, the repeated dynamic array resizes
> triggered by repeated string append are demonstrated to be the
> performance bottleneck, and therefore the next thing to work on should
> performance be improved.

First let me say that for fun or as a learning experience (1. and 2. from the 
previous post) are perfectly valid reasons to be doing this, and no further 
justification is needed. Doing it when 1. and 2. do not apply, that is, when the 
sole reason for writing the software is to get something done (such as a SW 
development job), is when inability to meet timing requirements (3.) comes into 
play. If you can meet your timing requirements with off-the-shelf components, 
then unnecessarily creating an optimized component delays completion of the 
project and increases the cost.

So all my following discussion will only apply to cases where 1. or 2. do not 
apply (which is to say, not to your project).

Literature and experience are filled with examples where "serious reasons to 
believe" turned out to be wrong, which is how the adage about premature 
optimization came to be.

(You might be interested in "Ada Outperforms Assembly", a classic case where 
what experts "knew" about SW performance turned out to be wrong:

http://www.seas.gwu.edu/~adagroup/sigada-website/lawlis.html)

Finally, a performance bottleneck is not necessarily a bad thing. It is a 
performance bottleneck that prevents meeting timing requirements that justifies 
expending effort on optimization.

> I agree that it might turn out that the Ada project has other
> performance issues. But saying that 3. will *never* be a problem is
> saying that either there is a point where performance is "good enough"
> and that this point is reached sooner than Unbounded_String performance
> issue, or that there is some fundamental unavoidable performance hit in
> Ada that does not exist in C.

I'm not saying that 3. will never occur. I'm saying that until you've done it 
without the optimization, measured it, and shown that it fails to meet timing 
requirements, you don't know that 3. applies and are not (yet) justified in 
optimizing.

Clearly there is a point where performance is good enough: the point where 
timing requirements are met. Effort expended on improving performance once the 
timing requirements are met is wasted effort.

> Moreover, I strongly doubt any evil can come from these performance
> considerations. A mere package cannot do any evil. The premature
> optimization that is actually at the root of any evil is when
> unwarranted performance consideration creep up to the design phase. As I
> said above, my considerations are far from being unwarranted, and the
> only design choice that could be qualified as premature optimization is
> the use of String_Accumulator interface (rather than naive direct calls
> to Ada.String.Unbounded subprograms). I would be happy to hear arguments
> for or against that choice.

The evil of premature optimization is the waste of effort when the optimization 
is unnecessary.

>> FWIW, we have a large, soft-real-time system that makes extensive use of
>> Unbounded_Strings, and have no problem meeting our timing requirements.
>
> Then I guess you are lucky enough to have a "good enough" line (and to
> be on the good side of it).

We have timing requirements that our SW meets. I'm not saying that we have never 
needed to take steps to improve the performance of the SW, just that the use of 
Unbounded_Strings has never been the reason for not meeting our requirements.

-- 
Jeff Carter
"I wave my private parts at your aunties."
Monty Python & the Holy Grail
13



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-01 19:07           ` Jeffrey Carter
@ 2011-12-01 21:19             ` Yannick Duchêne (Hibou57)
  2011-12-01 22:49               ` Natasha Kerensikova
  0 siblings, 1 reply; 30+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-12-01 21:19 UTC (permalink / raw)


Le Thu, 01 Dec 2011 20:07:09 +0100, Jeffrey Carter  
<spam.jrcarter.not@spam.not.acm.org> a écrit:
> The evil of premature optimization is the waste of effort when the  
> optimization is unnecessary.
Or also when optimization is at the cost of clarity (optimized  
implementation is often less easy to understand and more error prone, as  
it involves special cases everywhere).

-- 
“Syntactic sugar causes cancer of the semi-colons.” [1]
“Structured Programming supports the law of the excluded muddle.” [1]
[1]: [Epigrams on Programming — Alan J. — P. Yale University]



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-01 21:19             ` Yannick Duchêne (Hibou57)
@ 2011-12-01 22:49               ` Natasha Kerensikova
  0 siblings, 0 replies; 30+ messages in thread
From: Natasha Kerensikova @ 2011-12-01 22:49 UTC (permalink / raw)


On 2011-12-01, Yannick Duchêne <yannick_duchene@yahoo.fr> wrote:
> Le Thu, 01 Dec 2011 20:07:09 +0100, Jeffrey Carter  
><spam.jrcarter.not@spam.not.acm.org> a écrit:
>> The evil of premature optimization is the waste of effort when the  
>> optimization is unnecessary.
> Or also when optimization is at the cost of clarity (optimized  
> implementation is often less easy to understand and more error prone, as  
> it involves special cases everywhere).

That's what also I understood as "evil": when a trade-off between
optimization and clarity or simplicity or cleanness is shifted early
towards optimization. I don't consider a waste of time or effort as
"evil", only "bad" or "stupid". To reach the level of "evil" to my eyes,
it needs to be still costly after being unmasked as evil.

In the example here, in the case where I have a genuine performance
problem with Unbounded_String, and where Chunked_String turns out to be
as efficient, or slightly, that's exactly the same waste of time and
effort, without premature optimization or any evil.

Other the other hand, using an interface like String_Accumulator instead
of static calls, if it were only for performance consideration, and if
it turns out that the dynamic dispatch is doing irreparable damage to
performance, to the point of having to rewrite the thing with static
calls, then it would be evil because of the rewriting work needed repair
the code.

At least that's how I've always understood it. Re-reading the context of
Knuth's quote, "these attempts at efficiency actually have a strong
negative impact when debugging and maintenance are considered." The mere
existence of an optimized version of an existing module, that can be
plugged effortlessly into or out of the project, has no debugging or
maintenance impact at all.


Natasha



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-01  8:30         ` Dmitry A. Kazakov
@ 2011-12-01 23:26           ` Vinzent Hoefler
  2011-12-02  8:27             ` Dmitry A. Kazakov
  2011-12-02  0:39           ` Randy Brukardt
  1 sibling, 1 reply; 30+ messages in thread
From: Vinzent Hoefler @ 2011-12-01 23:26 UTC (permalink / raw)


Dmitry A. Kazakov wrote:

> This is very likely. But my concern was not performance, rather the idea of
> having long strings. Since long text strings do not exist in "nature"
> (:-)), nobody should like to have them.

Hmm. What's the average length of a DNA-string? ;)


Vinzent.

-- 
f u cn rd ths, u cn gt a gd jb n cmptr prgrmmng.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-01  8:30         ` Dmitry A. Kazakov
  2011-12-01 23:26           ` Vinzent Hoefler
@ 2011-12-02  0:39           ` Randy Brukardt
  1 sibling, 0 replies; 30+ messages in thread
From: Randy Brukardt @ 2011-12-02  0:39 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
news:de6vkicxgv4x$.1c89iragml7xf$.dlg@40tude.net...
> On Wed, 30 Nov 2011 18:11:10 -0600, Randy Brukardt wrote:
...
>> (1) The Trash-Finder spam filter uses an "append-all" pattern to handling
>> text and html filtering (along with a few replacements). That's mainly
>> because it is best to ignore line-breaks in such matching. I could have
>> invented a different data-structure for that use, but it would have just
>> meant more work (especially to recreate the string pattern-matching
>> operations, which are used extensively).
>
> See, the pattern matcher should have the "line end" atom. My pattern
> matcher has it.

The "pattern matcher" I'm talking about is the one built-into the 
Ada.Strings packages. I'm not going to add anything to that.

Let me reiterate that the desigm of this spam filter was specifically 
intended to use as much as possible from the pre-defined string packages as 
possible. I wanted to learn about using them more (I had not done that in 
the first 8 or so years of their existence) and possibly build an example of 
good use of these strings. Your attitude seems to be that you should 
roll-your-own (my first tendency as well), and that clearly is not the best 
idea for many software projects (as you spend a lot of time rolling that 
would be better spent on the application).

With this in mind, I chose a representation for mail messages as a linked 
list of unbounded strings, one per line. (Line endings are significant for 
many of the header elements in MIME and SMTP, so I need to preserve those 
initially.) [In hindsight, I would have dumped the unbounded strings and 
made the linked list use discriminated records instead -- the memory 
management work would be the same and direct access to slices would be 
convinient for some operations. But I was committed to using unbounded 
strings in order to gain experience. Maybe a different alternative would be 
to declare the linked lists using the containers package, which did not yet 
exist at the time I created this application, then the unbounded strings 
would make more sense.]

Pattern matching of the body (only) of the message occurs on "normalized" 
text that has encodings, extra spaces, and line endings removed. And if the 
body is HTML, the body is split into text and markup portions (with each 
having a separate set of filters applied). This I did in large single 
unbounded strings, because it was a better match for the (simple) matching 
facilities offered in the Ada.Strings packages. Making a copy of the text is 
necessary in any case (since we don't want to modify the original for lots 
of reasons), and putting the result into the most convinient format. 
Declaring some other array of characters type to hold this text would be 
silly -- it surely is a string, and it would require writing pattern 
matching code from scratch (or using lots of type conversions). What would 
be the point?

I understand your comment about "weak typing", but it doesn't make sense 
here -- the entire task is weakly typed. There is hardly anything you can 
assume about incoming messages, because spammers look to exploit such 
assumptions to evade your filters. So you have to verify everything from a 
"blob" format, and once you have done so, it makes little sense to spend a 
lot of extra effort changing the type of everything. I put the strong typing 
into the data structures that store the determined structure of the message, 
for instance (it points at the various lines that start and end each 
section.

                                   Randy. 





^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-01 23:26           ` Vinzent Hoefler
@ 2011-12-02  8:27             ` Dmitry A. Kazakov
  2011-12-02  9:30               ` Georg Bauhaus
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry A. Kazakov @ 2011-12-02  8:27 UTC (permalink / raw)


On Fri, 02 Dec 2011 00:26:29 +0100, Vinzent Hoefler wrote:

> Dmitry A. Kazakov wrote:
> 
>> This is very likely. But my concern was not performance, rather the idea of
>> having long strings. Since long text strings do not exist in "nature"
>> (:-)), nobody should like to have them.
> 
> Hmm. What's the average length of a DNA-string? ;)

Yes, this is what I had in mind when specifically indicated strings as
*text* ones.

DNA chain is not a text string. Furthermore it would likely have some
specific operations and a representation tailored substring search.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-02  8:27             ` Dmitry A. Kazakov
@ 2011-12-02  9:30               ` Georg Bauhaus
  2011-12-02 13:11                 ` Dmitry A. Kazakov
  0 siblings, 1 reply; 30+ messages in thread
From: Georg Bauhaus @ 2011-12-02  9:30 UTC (permalink / raw)


On 02.12.11 09:27, Dmitry A. Kazakov wrote:
> On Fri, 02 Dec 2011 00:26:29 +0100, Vinzent Hoefler wrote:
>
>> Dmitry A. Kazakov wrote:
>>
>>> This is very likely. But my concern was not performance, rather the idea of
>>> having long strings. Since long text strings do not exist in "nature"
>>> (:-)), nobody should like to have them.
>>
>> Hmm. What's the average length of a DNA-string? ;)
>
> Yes, this is what I had in mind when specifically indicated strings as
> *text* ones.
>
> DNA chain is not a text string. Furthermore it would likely have some
> specific operations and a representation tailored substring search.

I have tried this once.  Sequence information is given and is using just
a handful of characters. I mapped those to some 4bit type, even tried less.
Added lots of purportedly smart unchecked conversions, some shifting,
made my head spin by thinking about what combinations of "characters"
might suggest there could be clever additions, not shifts and the like
for obtaining info about substrings or single "characters", noted that
addition is faster than shifting or logical operations on the processor,
etc. Tried specializing searching. If there is a solution, it seems tricky.
Perhaps to be found by someone with more than ordinary combinatorial skills.
In my case this effort has produced only minuscule advantages, sometimes
the opposite, but the cost was a large number of specialized subprograms.

Then I stopped and went back to a comparatively stupid subtype of String.

There might still be an algorithm that uses the standard String type
and some fast String search, but on rearranged data: such that not just
one DNA_Character'(x) maps to same Character'(x) of the subject string, but,
if there are no more than 16 different DNA characters, such that a pair of
DNA characters from enumeration type ('A', 'C', 'G', 'T', ...) maps
to a single ordinary Character, where 'Pos gives the usual sequence of
0, 1, 2, ... So that, for example, String'('G', 'C') becomes
     Character'Val (2 * 2**4 + 1) = '!'.

Or store a few DNA characters in exact floats to be processed
in parallel with SIMD instructions of SSE on Intelian processors,
maybe. Isn't this an interesting challenge in some circles?

If someone achieves any of this using plain Ada, the solution should
create some interest in higher order languages.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-02  9:30               ` Georg Bauhaus
@ 2011-12-02 13:11                 ` Dmitry A. Kazakov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry A. Kazakov @ 2011-12-02 13:11 UTC (permalink / raw)


On Fri, 02 Dec 2011 10:30:11 +0100, Georg Bauhaus wrote:

> On 02.12.11 09:27, Dmitry A. Kazakov wrote:

>> DNA chain is not a text string. Furthermore it would likely have some
>> specific operations and a representation tailored substring search.
> 
> I have tried this once.  Sequence information is given and is using just
> a handful of characters. I mapped those to some 4bit type, even tried less.
> Added lots of purportedly smart unchecked conversions, some shifting,
> made my head spin by thinking about what combinations of "characters"
> might suggest there could be clever additions, not shifts and the like
> for obtaining info about substrings or single "characters", noted that
> addition is faster than shifting or logical operations on the processor,
> etc. Tried specializing searching. If there is a solution, it seems tricky.
> Perhaps to be found by someone with more than ordinary combinatorial skills.
> In my case this effort has produced only minuscule advantages, sometimes
> the opposite, but the cost was a large number of specialized subprograms.

There is a simple way to do such things efficiently, You tabulate
operations you want. It is just 2**16 combinations for a dyadic operation
taking two arguments of 2**8 states. A 64K array is nothing in these times.
The array can be initialized by an aggregate which text is generated once.

I am using such techniques from time to time, e.g. for character conversion
maps, for small images (icons etc) embedded into the application, etc.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-11-30 19:39       ` Jeffrey Carter
  2011-12-01 10:57         ` Natasha Kerensikova
@ 2011-12-02 16:16         ` Tero Koskinen
  2011-12-02 17:36           ` Adam Beneschan
  2011-12-02 18:14           ` Yannick Duchêne (Hibou57)
  1 sibling, 2 replies; 30+ messages in thread
From: Tero Koskinen @ 2011-12-02 16:16 UTC (permalink / raw)


On Wed, 30 Nov 2011 12:39:36 -0700
Jeffrey Carter <spam.jrcarter.not@spam.not.acm.org> wrote:
> Justifications for rewriting Unbounded_Strings are
> 
> 1. For fun
> 2. As a learning experience
> 3. You have timing requirements you can't otherwise meet.

4th case is portability across compilers if you need to have more than
32K bytes of data.

At least Janus/Ada has 16-bit Integers and the size of the
Unbounded_String is limited to Integer'Last.

For example, in my Twitter/Identi.ca client[1], I need to handle
40K..100Kbytes long JSON data from the server and that cannot fit
into Unbounded_String when I use Janus/Ada.

I also run into 3rd item with ICCAda. In my Twitter use case, appending
data and then reading it from Unbounded_String, is multiple times
slower with ICCAda than with Janus/Ada or GNAT.

-- 
Tero Koskinen - http://iki.fi/tero.koskinen/
[1] https://bitbucket.org/tkoskine/ladybird



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-02 16:16         ` Tero Koskinen
@ 2011-12-02 17:36           ` Adam Beneschan
  2011-12-02 18:52             ` Tero Koskinen
  2011-12-02 18:14           ` Yannick Duchêne (Hibou57)
  1 sibling, 1 reply; 30+ messages in thread
From: Adam Beneschan @ 2011-12-02 17:36 UTC (permalink / raw)


On Dec 2, 8:16 am, Tero Koskinen <tero.koski...@iki.fi> wrote:
>
> I also run into 3rd item with ICCAda. In my Twitter use case, appending
> data and then reading it from Unbounded_String, is multiple times
> slower with ICCAda than with Janus/Ada or GNAT.

This problem was addressed recently.

                      -- Adam



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-02 16:16         ` Tero Koskinen
  2011-12-02 17:36           ` Adam Beneschan
@ 2011-12-02 18:14           ` Yannick Duchêne (Hibou57)
  2011-12-02 19:07             ` Adam Beneschan
  1 sibling, 1 reply; 30+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-12-02 18:14 UTC (permalink / raw)


Le Fri, 02 Dec 2011 17:16:53 +0100, Tero Koskinen <tero.koskinen@iki.fi> a  
écrit:
> I also run into 3rd item with ICCAda.
Can you, please, give some pointers to learn more about ICCAda ?


-- 
“Syntactic sugar causes cancer of the semi-colons.” [1]
“Structured Programming supports the law of the excluded muddle.” [1]
[1]: [Epigrams on Programming — Alan J. — P. Yale University]



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-02 17:36           ` Adam Beneschan
@ 2011-12-02 18:52             ` Tero Koskinen
  0 siblings, 0 replies; 30+ messages in thread
From: Tero Koskinen @ 2011-12-02 18:52 UTC (permalink / raw)


On Fri, 2 Dec 2011 09:36:28 -0800 (PST)
Adam Beneschan <adam@irvine.com> wrote:

> On Dec 2, 8:16 am, Tero Koskinen <tero.koski...@iki.fi> wrote:
> >
> > I also run into 3rd item with ICCAda. In my Twitter use case, appending
> > data and then reading it from Unbounded_String, is multiple times
> > slower with ICCAda than with Janus/Ada or GNAT.
> 
> This problem was addressed recently.

Yes, thank you. Now Unbounded_String handling in ICC Ada is
as fast as in Janus/Ada and GNAT.

> 
>                       -- Adam


-- 
Tero Koskinen - http://iki.fi/tero.koskinen/



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Ann: Natools.Chunked_Strings, beta 1
  2011-12-02 18:14           ` Yannick Duchêne (Hibou57)
@ 2011-12-02 19:07             ` Adam Beneschan
  0 siblings, 0 replies; 30+ messages in thread
From: Adam Beneschan @ 2011-12-02 19:07 UTC (permalink / raw)


On Dec 2, 10:14 am, Yannick Duchêne (Hibou57)
<yannick_duch...@yahoo.fr> wrote:
> Le Fri, 02 Dec 2011 17:16:53 +0100, Tero Koskinen <tero.koski...@iki.fi> a
> écrit:> I also run into 3rd item with ICCAda.
>
> Can you, please, give some pointers to learn more about ICCAda ?

http://www.irvine.com

                   -- Adam




^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2011-12-02 19:07 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-29 15:16 Ann: Natools.Chunked_Strings, beta 1 Natasha Kerensikova
2011-11-29 15:37 ` Pascal Obry
2011-11-29 16:34   ` Natasha Kerensikova
2011-11-29 17:08     ` Georg Bauhaus
2011-11-30  9:51       ` Natasha Kerensikova
2011-11-29 20:25     ` Randy Brukardt
2011-11-30 10:44     ` Yannick Duchêne (Hibou57)
2011-11-30 10:39   ` Yannick Duchêne (Hibou57)
2011-11-30 10:57     ` Dmitry A. Kazakov
2011-12-01  0:11       ` Randy Brukardt
2011-12-01  8:30         ` Dmitry A. Kazakov
2011-12-01 23:26           ` Vinzent Hoefler
2011-12-02  8:27             ` Dmitry A. Kazakov
2011-12-02  9:30               ` Georg Bauhaus
2011-12-02 13:11                 ` Dmitry A. Kazakov
2011-12-02  0:39           ` Randy Brukardt
2011-12-01  9:02         ` Yannick Duchêne (Hibou57)
2011-11-30 13:08     ` Natasha Kerensikova
2011-11-30 19:39       ` Jeffrey Carter
2011-12-01 10:57         ` Natasha Kerensikova
2011-12-01 19:07           ` Jeffrey Carter
2011-12-01 21:19             ` Yannick Duchêne (Hibou57)
2011-12-01 22:49               ` Natasha Kerensikova
2011-12-02 16:16         ` Tero Koskinen
2011-12-02 17:36           ` Adam Beneschan
2011-12-02 18:52             ` Tero Koskinen
2011-12-02 18:14           ` Yannick Duchêne (Hibou57)
2011-12-02 19:07             ` Adam Beneschan
2011-11-30 10:33 ` Yannick Duchêne (Hibou57)
2011-11-30 11:04   ` Natasha Kerensikova

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox