comp.lang.ada
 help / color / mirror / Atom feed
* Reserve_Capacity for Unbounded_String?
@ 2007-07-22 19:54 Maciej Sobczak
  2007-07-22 21:32 ` Robert A Duff
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Maciej Sobczak @ 2007-07-22 19:54 UTC (permalink / raw)


Why there is no Reserve_Capacity for Unbounded_String?
Sounds like a natural idea to me.

How to emulate it?

The following seems to work fine:

declare
   S : Unbounded_String;
   My_Capacity : constant := 1000;
begin
   S := To_Unbounded_String (My_Capacity);
   Delete (Source => S, From => 1, Through => My_Capacity);
   -- ...
end;

"Seems to work fine" means that after the above two operations the
string is logically empty, but the subsequent appends run faster
(which is actually the original motivation).

Is this a reasonable approach if performance is of concern?

--
Maciej Sobczak
http://www.msobczak.com/




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-22 19:54 Reserve_Capacity for Unbounded_String? Maciej Sobczak
@ 2007-07-22 21:32 ` Robert A Duff
  2007-07-23 19:29   ` Maciej Sobczak
  2007-07-23  4:28 ` Jeffrey R. Carter
  2007-07-23 15:07 ` Adam Beneschan
  2 siblings, 1 reply; 35+ messages in thread
From: Robert A Duff @ 2007-07-22 21:32 UTC (permalink / raw)


Maciej Sobczak <see.my.homepage@gmail.com> writes:

> Why there is no Reserve_Capacity for Unbounded_String?

Probably for historical reasons.

> Sounds like a natural idea to me.

Yes.

> How to emulate it?
>
> The following seems to work fine:
>
> declare
>    S : Unbounded_String;
>    My_Capacity : constant := 1000;
> begin
>    S := To_Unbounded_String (My_Capacity);

I would write

  To_Unbounded_String (Length => My_Capacity);

to make it clear which To_Unbounded_String you're calling.

>    Delete (Source => S, From => 1, Through => My_Capacity);
>    -- ...
> end;
>
> "Seems to work fine" means that after the above two operations the
> string is logically empty, but the subsequent appends run faster
> (which is actually the original motivation).

How much faster (I'm curious)?

> Is this a reasonable approach if performance is of concern?

I think so.

- Bob



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-22 19:54 Reserve_Capacity for Unbounded_String? Maciej Sobczak
  2007-07-22 21:32 ` Robert A Duff
@ 2007-07-23  4:28 ` Jeffrey R. Carter
  2007-07-23 15:07 ` Adam Beneschan
  2 siblings, 0 replies; 35+ messages in thread
From: Jeffrey R. Carter @ 2007-07-23  4:28 UTC (permalink / raw)


Maciej Sobczak wrote:
> Is this a reasonable approach if performance is of concern?

No. If performance is a concern, you should probably not be using 
Ada.Strings.Unbounded, or any other general-purpose components.

-- 
Jeff Carter
"I'm a kike, a yid, a heebie, a hook nose! I'm Kosher,
Mum! I'm a Red Sea pedestrian, and proud of it!"
Monty Python's Life of Brian
77



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-22 19:54 Reserve_Capacity for Unbounded_String? Maciej Sobczak
  2007-07-22 21:32 ` Robert A Duff
  2007-07-23  4:28 ` Jeffrey R. Carter
@ 2007-07-23 15:07 ` Adam Beneschan
  2007-07-24  1:01   ` Randy Brukardt
  2 siblings, 1 reply; 35+ messages in thread
From: Adam Beneschan @ 2007-07-23 15:07 UTC (permalink / raw)


On Jul 22, 12:54 pm, Maciej Sobczak <see.my.homep...@gmail.com> wrote:
> Why there is no Reserve_Capacity for Unbounded_String?
> Sounds like a natural idea to me.
>
> How to emulate it?
>
> The following seems to work fine:
>
> declare
>    S : Unbounded_String;
>    My_Capacity : constant := 1000;
> begin
>    S := To_Unbounded_String (My_Capacity);
>    Delete (Source => S, From => 1, Through => My_Capacity);
>    -- ...
> end;
>
> "Seems to work fine" means that after the above two operations the
> string is logically empty, but the subsequent appends run faster
> (which is actually the original motivation).
>
> Is this a reasonable approach if performance is of concern?

I sure wouldn't count on this.  I don't know what compiler you're
using or how it implements Unbounded_String; but just based on what I
know about how this package *might* be implemented and about the
suggested implementation in the Rationale, the performance
improvements you're seeing could be just an accident.  (One test: try
doing something like "Intp := new Integer;" after each append.  If the
Delete actually deallocates memory, and a piece of that memory could
be reused by a "new Integer" allocation, you may get very different
performance results than if the deallocated memory is never reused
except by new appends.  Here, though, I'm just guessing, since as I
said I have no idea how Unbounded_Strings is implemented.)

I'm with Jeff here: you may need an Unbounded_Strings package tailored
to your specific needs.  It just isn't possible to write a package
that will achieve the best performance for all programs that might
want to use it; even a simple idea like "Reserve_Capacity" could
introduce some implementation overhead that might be unwelcome in
programs that wouldn't benefit from that feature.

                           -- Adam






^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-22 21:32 ` Robert A Duff
@ 2007-07-23 19:29   ` Maciej Sobczak
  2007-07-23 20:30     ` Robert A Duff
  0 siblings, 1 reply; 35+ messages in thread
From: Maciej Sobczak @ 2007-07-23 19:29 UTC (permalink / raw)


On 22 Lip, 23:32, Robert A Duff <bobd...@shell01.TheWorld.com> wrote:

> > Why there is no Reserve_Capacity for Unbounded_String?
>
> Probably for historical reasons.

Would introducing it break any existing code?

> > declare
> >    S : Unbounded_String;
> >    My_Capacity : constant := 1000;
> > begin
> >    S := To_Unbounded_String (My_Capacity);
>
> I would write
>
>   To_Unbounded_String (Length => My_Capacity);
>
> to make it clear which To_Unbounded_String you're calling.

Yes. I try to follow the "rule" that names associations should be used
for any subprogram call with more than one parameter. I've never
considered hardening this rule for overloaded subprograms.

> > "Seems to work fine" means that after the above two operations the
> > string is logically empty, but the subsequent appends run faster
> > (which is actually the original motivation).
>
> How much faster (I'm curious)?

This of course depends on what is the ration between allocations and
reservations in a given test.

Here:

with Ada.Strings.Unbounded;
procedure String_Test is
begin
   for I in 1 .. 10000 loop
      declare
         S : Ada.Strings.Unbounded.Unbounded_String;
         use Ada.Strings.Unbounded;
      begin
--       S := Ada.Strings.Unbounded.To_Unbounded_String (10000);
--       Ada.Strings.Unbounded.Delete (S, 1, 10000);
         for J in 1 .. 10000 loop
            Ada.Strings.Unbounded.Append (S, 'A');
         end loop;
      end;
   end loop;
end String_Test;

On my machine and with option -O2 uncommenting the two lines makes a
difference between 1.86s and 1.43s, which is about 20% gain. Depending
on a situation it might or might not be worth the hassle (the hassle
is in determining the correct reservation size - and if we can know it
fully in advance, then maybe there is no point in using unbounded at
all).

--
Maciej Sobczak
http://www.msobczak.com/




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-23 19:29   ` Maciej Sobczak
@ 2007-07-23 20:30     ` Robert A Duff
  0 siblings, 0 replies; 35+ messages in thread
From: Robert A Duff @ 2007-07-23 20:30 UTC (permalink / raw)


Maciej Sobczak <see.my.homepage@gmail.com> writes:

> On 22 Lip, 23:32, Robert A Duff <bobd...@shell01.TheWorld.com> wrote:
>
>> > Why there is no Reserve_Capacity for Unbounded_String?
>>
>> Probably for historical reasons.
>
> Would introducing it break any existing code?

Only in rare cases.  For ex., if you had a type called Reserve_Capacity
in an unrelated package, and you had "use" on that one and also on
unbounded strings, the code would now be illegal.

>> > declare
>> >    S : Unbounded_String;
>> >    My_Capacity : constant := 1000;
>> > begin
>> >    S := To_Unbounded_String (My_Capacity);
>>
>> I would write
>>
>>   To_Unbounded_String (Length => My_Capacity);
>>
>> to make it clear which To_Unbounded_String you're calling.
>
> Yes. I try to follow the "rule" that names associations should be used
> for any subprogram call with more than one parameter.

Well, that's close, but not quite the right rule.  For example, "+" on
integers has two parameters, but I would say X+Y rather than
"+"(Left => X, Right => Y).

I don't have a well-defined rule, but I like to use named notation
often.

I think perhaps the designer of the subprogram should decide whether
named notation is appropriate, and all callers have to obey.  But that's
a language design issue, not an Ada issue.

>...I've never
> considered hardening this rule for overloaded subprograms.
>
>> > "Seems to work fine" means that after the above two operations the
>> > string is logically empty, but the subsequent appends run faster
>> > (which is actually the original motivation).
>>
>> How much faster (I'm curious)?
>
> This of course depends on what is the ration between allocations and
> reservations in a given test.
>
> Here:
>
> with Ada.Strings.Unbounded;
> procedure String_Test is
> begin
>    for I in 1 .. 10000 loop
>       declare
>          S : Ada.Strings.Unbounded.Unbounded_String;
>          use Ada.Strings.Unbounded;
>       begin
> --       S := Ada.Strings.Unbounded.To_Unbounded_String (10000);
> --       Ada.Strings.Unbounded.Delete (S, 1, 10000);
>          for J in 1 .. 10000 loop
>             Ada.Strings.Unbounded.Append (S, 'A');
>          end loop;
>       end;
>    end loop;
> end String_Test;
>
> On my machine and with option -O2 uncommenting the two lines makes a
> difference between 1.86s and 1.43s, which is about 20% gain. 

Interesting.  Thanks for posting the measurements.

>...Depending
> on a situation it might or might not be worth the hassle (the hassle
> is in determining the correct reservation size - and if we can know it
> fully in advance, then maybe there is no point in using unbounded at
> all).

Good point.  If you know the max size, the Bounded (or just plain
String) is appropriate.  But if you know the max "typical" size,
but that you still want it to work for atypical sizes, then this
sort of thing seems reasonable.

As others have pointed out, it depends on the implementation.
But that's always true about efficiency -- if you care about
efficiency, then the abstraction leaks.

- Bob



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-23 15:07 ` Adam Beneschan
@ 2007-07-24  1:01   ` Randy Brukardt
  2007-07-24  7:57     ` Pascal Obry
  2007-07-24 23:41     ` Robert A Duff
  0 siblings, 2 replies; 35+ messages in thread
From: Randy Brukardt @ 2007-07-24  1:01 UTC (permalink / raw)



"Adam Beneschan" <adam@irvine.com> wrote in message
news:1185203238.701948.307410@m37g2000prh.googlegroups.com...
...
> > Is this a reasonable approach if performance is of concern?
>
> I sure wouldn't count on this.

I second (or is that "third") Adam and Jeff's comments.

For Janus/Ada, Unbounded_String is just a controlled wrapper around "access
String"; in particular, the length is the length of the item allocated. The
code you have would just cause an extra allocation/deallocation without any
corresponding benefit (and it might even cause additional fragmentation and
slow the program down).

In general, you cannot depend on the performance of the predefined packages;
you can presume that implementations have done a decent job for their
target, but that is about all. (The containers are a bit of an exception, in
that there are performance requirements on them; but those requirements are
vague enough to be pretty meaningless for average-sized containers.)

I usually recommend to use the predefined packages as necessary in your
application, and avoid worrying about performance of them. If it turns out
that there is a performance problem, and that problem is with the one of the
predefined packages, then it makes sense to replace the offending package
with something custom-written to match the problem at hand. (In this case,
just using type String and doing your own storage management would seem to
be appropriate.)

For things like Unbounded_Strings (and the Containers), the main value is to
let the package do the memory management (meaning that you have well-tested
memory management that won't leak or break). Trying to take over some of
that is suspicious (that's true for Reserve_Capacity in the containers as
well), because it is easy to get it wrong and cause *worse* performance.

In any case, implementing Reserve_Capacity to do anything (as opposed to
being a no-op) in Janus/Ada would require junking the entire existing
implementation and starting over. It surely would add overhead (because
you'd continually need to check the allocated length against the "nominal"
length). As such, this is something I would fight strongly.

                                 Randy.






^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-24  1:01   ` Randy Brukardt
@ 2007-07-24  7:57     ` Pascal Obry
  2007-07-24 18:58       ` Randy Brukardt
  2007-07-24 23:41     ` Robert A Duff
  1 sibling, 1 reply; 35+ messages in thread
From: Pascal Obry @ 2007-07-24  7:57 UTC (permalink / raw)
  To: Randy Brukardt

Randy Brukardt a �crit :
> In any case, implementing Reserve_Capacity to do anything (as opposed to
> being a no-op) in Janus/Ada would require junking the entire existing
> implementation and starting over. It surely would add overhead (because
> you'd continually need to check the allocated length against the "nominal"
> length). As such, this is something I would fight strongly.

Well this is already done in GNAT for speed performances. If you are
adding lot of chunk of string into an Unbounded_String it is far too
costly to reallocate the Unbounded_String buffer each time. This was a
huge speed penalty for the templates_parser for example. The current
GNAT implementation is just what users would expect it to be.

Pascal.

-- 

--|------------------------------------------------------
--| Pascal Obry                           Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--|              http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-24  7:57     ` Pascal Obry
@ 2007-07-24 18:58       ` Randy Brukardt
  2007-07-24 23:50         ` Robert A Duff
  2007-07-24 23:54         ` Pascal Obry
  0 siblings, 2 replies; 35+ messages in thread
From: Randy Brukardt @ 2007-07-24 18:58 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 2019 bytes --]

"Pascal Obry" <pascal@obry.net> wrote in message
news:46A5B0FE.3060008@obry.net...
> Randy Brukardt a �crit :
> > In any case, implementing Reserve_Capacity to do anything (as opposed to
> > being a no-op) in Janus/Ada would require junking the entire existing
> > implementation and starting over. It surely would add overhead (because
> > you'd continually need to check the allocated length against the
"nominal"
> > length). As such, this is something I would fight strongly.
>
> Well this is already done in GNAT for speed performances. If you are
> adding lot of chunk of string into an Unbounded_String it is far too
> costly to reallocate the Unbounded_String buffer each time. This was a
> huge speed penalty for the templates_parser for example. The current
> GNAT implementation is just what users would expect it to be.

Maybe some uninformed users, but not anyone who doesn't add non-existent
requirements to the specs.

Unbounded_String is best used for storage (not calculation) of values. It is
a horrible design for calculation, no matter what schemes you use for the
implementation.

Moreover, I've yet to see an example where the performance of
Unbounded_Strings (allocation-wise) matters. Our spam filter was written
solely using Unbounded_Strings, it does a lot of calculation (normalization
of e-mail) and still the allocation overhead is insignificant. (The
bottleneck is in Index, which does no allocation at all.)

I can imagine that it would be possible to write a program where the
allocation overhead would matter, but such a program would be helped a lot
more by avoiding Unbounded_Strings (and *all* of their allocation overhead)
than by tweaking the implementation of Unbounded_Strings to reduce that
overhead in a few rarely used operations.

So I stand by my original statement. Anyone that depends on the performance
characteristics of a particular implementation (of anything!) is an idiot
and will be burned in the future.

                                              Randy.





^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-24  1:01   ` Randy Brukardt
  2007-07-24  7:57     ` Pascal Obry
@ 2007-07-24 23:41     ` Robert A Duff
  2007-07-25  0:16       ` Randy Brukardt
  1 sibling, 1 reply; 35+ messages in thread
From: Robert A Duff @ 2007-07-24 23:41 UTC (permalink / raw)


"Randy Brukardt" <randy@rrsoftware.com> writes:

> "Adam Beneschan" <adam@irvine.com> wrote in message
> news:1185203238.701948.307410@m37g2000prh.googlegroups.com...
> ...
>> > Is this a reasonable approach if performance is of concern?
>>
>> I sure wouldn't count on this.
>
> I second (or is that "third") Adam and Jeff's comments.

I said the opposite, but I guess I'm convinced by your (and Adam and
Jeff's) arguments that one should not depend on this particular
"efficiency hack".  But...

> For Janus/Ada, Unbounded_String is just a controlled wrapper around "access
> String"; in particular, the length is the length of the item
> allocated.

Are you saying that:

    for I in 1..N loop
        Append (Some_Unbounded_String, 'x');

causes a free and a "new" and a copy every time around the loop?

- Bob



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-24 18:58       ` Randy Brukardt
@ 2007-07-24 23:50         ` Robert A Duff
  2007-07-25  0:00           ` Randy Brukardt
  2007-07-24 23:54         ` Pascal Obry
  1 sibling, 1 reply; 35+ messages in thread
From: Robert A Duff @ 2007-07-24 23:50 UTC (permalink / raw)


"Randy Brukardt" <randy@rrsoftware.com> writes:

> So I stand by my original statement. Anyone that depends on the performance
> characteristics of a particular implementation (of anything!) is an idiot
> and will be burned in the future.

I really don't think you mean that literally!

I mean, the Ada language standard does not require that "X := X + 1"
take less than 100 years to execute.  Programmers _must_ trust that
language implementers make reasonably efficient choices.  In fact,
in Ada, I write things like "X := X + 1" all the time, trusting that it
compiles into a single machine instruction, or close to that.

- Bob



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-24 18:58       ` Randy Brukardt
  2007-07-24 23:50         ` Robert A Duff
@ 2007-07-24 23:54         ` Pascal Obry
  2007-07-25  0:52           ` Randy Brukardt
  2007-07-25  1:28           ` Randy Brukardt
  1 sibling, 2 replies; 35+ messages in thread
From: Pascal Obry @ 2007-07-24 23:54 UTC (permalink / raw)
  To: Randy Brukardt

Randy Brukardt a �crit :
> Maybe some uninformed users, but not anyone who doesn't add non-existent
> requirements to the specs.
> 
> Unbounded_String is best used for storage (not calculation) of values. It is
> a horrible design for calculation, no matter what schemes you use for the
> implementation.
> 
> Moreover, I've yet to see an example where the performance of
> Unbounded_Strings (allocation-wise) matters. Our spam filter was written
> solely using Unbounded_Strings, it does a lot of calculation (normalization
> of e-mail) and still the allocation overhead is insignificant. (The
> bottleneck is in Index, which does no allocation at all.)
> 
> I can imagine that it would be possible to write a program where the
> allocation overhead would matter, but such a program would be helped a lot
> more by avoiding Unbounded_Strings (and *all* of their allocation overhead)
> than by tweaking the implementation of Unbounded_Strings to reduce that
> overhead in a few rarely used operations.
> 
> So I stand by my original statement. Anyone that depends on the performance
> characteristics of a particular implementation (of anything!) is an idiot
> and will be burned in the future.

Well Randy, its all too easy to dismiss points like this by saying that
you have not seen a problem with the current implementation and if
somebody show you something wrong in this area then he has to use
something else that Unbounded_String... At least that's the way I'm
reading your message.

Just try this on your implementation:

   C : constant Character := 'x';

   U : Unbounded_String;

   for k in 1 .. 5_000_000 loop
      U := U & C;
   end loop;

And then test it with GNAT. Of course that is not a real example but
very close to something done in my templates engine.

You were speaking about performance about checking the preallocated
buffer. My point is that it is not that costly compared to the price of
reallocating a whole string thousands of time.

Pascal.

-- 

--|------------------------------------------------------
--| Pascal Obry                           Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--|              http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-24 23:50         ` Robert A Duff
@ 2007-07-25  0:00           ` Randy Brukardt
  0 siblings, 0 replies; 35+ messages in thread
From: Randy Brukardt @ 2007-07-25  0:00 UTC (permalink / raw)


"Robert A Duff" <bobduff@shell01.TheWorld.com> wrote in message
news:wccmyxl5plh.fsf@shell01.TheWorld.com...
> "Randy Brukardt" <randy@rrsoftware.com> writes:
>
> > So I stand by my original statement. Anyone that depends on the
performance
> > characteristics of a particular implementation (of anything!) is an
idiot
> > and will be burned in the future.
>
> I really don't think you mean that literally!
>
> I mean, the Ada language standard does not require that "X := X + 1"
> take less than 100 years to execute.  Programmers _must_ trust that
> language implementers make reasonably efficient choices.  In fact,
> in Ada, I write things like "X := X + 1" all the time, trusting that it
> compiles into a single machine instruction, or close to that.

Perhaps; I was thinking of predefined libraries more than basic stuff like
"X := X + 1;". But I think you really do have to test whether something is
fast enough; you can never *assume* that it will be. Vendors don't
intentionally make things slow, but what you think is slow might not be for
a given implementation's model. Also see my answer to your other note.

                                      Randy.





^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-24 23:41     ` Robert A Duff
@ 2007-07-25  0:16       ` Randy Brukardt
  2007-07-25  2:25         ` Robert A Duff
  0 siblings, 1 reply; 35+ messages in thread
From: Randy Brukardt @ 2007-07-25  0:16 UTC (permalink / raw)


"Robert A Duff" <bobduff@shell01.TheWorld.com> wrote in message
news:wccr6mx5pz0.fsf@shell01.TheWorld.com...
> "Randy Brukardt" <randy@rrsoftware.com> writes:
...
> > For Janus/Ada, Unbounded_String is just a controlled wrapper around
"access
> > String"; in particular, the length is the length of the item
> > allocated.
>
> Are you saying that:
>
>     for I in 1..N loop
>         Append (Some_Unbounded_String, 'x');
>
> causes a free and a "new" and a copy every time around the loop?

Yup; I just checked the code to make sure. Moreover, I have yet to see an
real example where the performance difference would matter that much (Free
and new aren't that expensive relative to the alternatives). Our spam filter
does operations like this, and as I said earlier, the performance bottleneck
is with Index and with the file I/O operations and not here. (Nor are there
any fragmentation problems that I can detect; the program runs for many days
without issues.)

In any case, I would expect this operation (of making the string longer) to
do a lot of allocations anyway, because the string will need to be expanded
a lot. And any "over"-allocation will be wasteful of memory; for the spam
filter, that would tend to reduce the maximum size of messages that could be
filtered.(Janus/Ada always emphasizes smaller memory usages versus wasting
memory to get better performance.)

Besides, in Janus/Ada, most the alternatives would also do a (under the
covers) "new" and "free". We do try to optimize those out, but by default

             Var : String := Some_Value & 'x';

would generate three allocations and deallocations. The optimizer probably
would get rid of the descriptors or allocate them statically, but the object
Var probably would end up in the heap.

If performance is a real issue, you'd have to write your own code to do the
actual operations you want; for instance, in your example
To_Unbounded_String (Some_Unbounded_String, (1..N => 'x')); would be a lot
faster on any compiler (far fewer calls to a library; even if there is no
allocation, there still will be overhead checks for bounds and the like).
The examples where an alternative implementation of Unbounded_Strings would
make a significant difference such that the application was fast enough with
it and not fast enough without it have to be extremely rare.

                        Randy.





^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-24 23:54         ` Pascal Obry
@ 2007-07-25  0:52           ` Randy Brukardt
  2007-07-25  1:28           ` Randy Brukardt
  1 sibling, 0 replies; 35+ messages in thread
From: Randy Brukardt @ 2007-07-25  0:52 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 2559 bytes --]

"Pascal Obry" <pascal@obry.net> wrote in message
news:46A69136.3010803@obry.net...
> Randy Brukardt a �crit :
> > Maybe some uninformed users, but not anyone who doesn't add non-existent
> > requirements to the specs.
> >
> > Unbounded_String is best used for storage (not calculation) of values.
It is
> > a horrible design for calculation, no matter what schemes you use for
the
> > implementation.
> >
> > Moreover, I've yet to see an example where the performance of
> > Unbounded_Strings (allocation-wise) matters. Our spam filter was written
> > solely using Unbounded_Strings, it does a lot of calculation
(normalization
> > of e-mail) and still the allocation overhead is insignificant. (The
> > bottleneck is in Index, which does no allocation at all.)
> >
> > I can imagine that it would be possible to write a program where the
> > allocation overhead would matter, but such a program would be helped a
lot
> > more by avoiding Unbounded_Strings (and *all* of their allocation
overhead)
> > than by tweaking the implementation of Unbounded_Strings to reduce that
> > overhead in a few rarely used operations.
> >
> > So I stand by my original statement. Anyone that depends on the
performance
> > characteristics of a particular implementation (of anything!) is an
idiot
> > and will be burned in the future.
>
> Well Randy, its all too easy to dismiss points like this by saying that
> you have not seen a problem with the current implementation and if
> somebody show you something wrong in this area then he has to use
> something else that Unbounded_String... At least that's the way I'm
> reading your message.
>
> Just try this on your implementation:
>
>    C : constant Character := 'x';
>
>    U : Unbounded_String;
>
>    for k in 1 .. 5_000_000 loop
>       U := U & C;
>    end loop;
>
> And then test it with GNAT. Of course that is not a real example but
> very close to something done in my templates engine.
>
> You were speaking about performance about checking the preallocated
> buffer. My point is that it is not that costly compared to the price of
> reallocating a whole string thousands of time.
>
> Pascal.
>
> -- 
>
> --|------------------------------------------------------
> --| Pascal Obry                           Team-Ada Member
> --| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
> --|------------------------------------------------------
> --|              http://www.obry.net
> --| "The best way to travel is by means of imagination"
> --|
> --| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595





^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-24 23:54         ` Pascal Obry
  2007-07-25  0:52           ` Randy Brukardt
@ 2007-07-25  1:28           ` Randy Brukardt
  2007-07-25  7:48             ` Pascal Obry
  2007-07-25  8:50             ` Martin Krischik
  1 sibling, 2 replies; 35+ messages in thread
From: Randy Brukardt @ 2007-07-25  1:28 UTC (permalink / raw)


(I managed to press Send on this without actually writing a reply...sorry)
"Pascal Obry" <pascal@obry.net> wrote in message
news:46A69136.3010803@obry.net...
...
> > So I stand by my original statement. Anyone that depends on the
performance
> > characteristics of a particular implementation (of anything!) is an
idiot
> > and will be burned in the future.
>
> Well Randy, its all too easy to dismiss points like this by saying that
> you have not seen a problem with the current implementation and if
> somebody show you something wrong in this area then he has to use
> something else that Unbounded_String... At least that's the way I'm
> reading your message.

I want to see a real program compiled with using Janus/Ada that shows that
(a) the performance is not good enough currently and (b) that tweaking
Ada.Strings.Unbounded would make the performance good enough. Otherwise,
time spent on Ada.Strings.Unbounded is a waste of time (and potentially a
compatibility problem, such that existing programs might run out of memory
where they do not currently).

If I spent much time on things where (a) and (b) were not true, then I'd be
the idiot, by not making the best use of my time to improve the
compile/runtime/etc.

> Just try this on your implementation:
>
>    C : constant Character := 'x';
>
>    U : Unbounded_String;
>
>    for k in 1 .. 5_000_000 loop
>       U := U & C;
>    end loop;
>
> And then test it with GNAT. Of course that is not a real example but
> very close to something done in my templates engine.

(Aside: I hope not. It's always bad practice to append small things to large
things, be it strings, I/O, or anything else. It always leads to performance
issues, more so the more abstract the interface is. I realize that you have
to balance understandability with performance, and sometimes occassional
appends of small things make sense. But doing it in an inner loop like this
sets off my "this is a very bad idea" detector.)

The old version of GNAT I have installed allocates for each call to '&' (I
just checked the source code), so I doubt testing it would prove anything
whatsoever. (Indeed, looking at the code, the Janus/Ada version is likely to
be more efficient. But, based on your assertions, I have to assume that the
GNAT code is out of date, so there is no point in looking at it further.)

But even if it did, so what? Using any version of "&" (be it for String,
Unbounded_String, or any other type) in a tight loop is silly. The amount of
code needed to implement "&" is huge, and virtually any alternative way of
doing the operation would be better.

Moreover, the only way to save allocations in an increasing length string is
to overallocate the memory; that is, to waste memory. (If you had a way to
find out the blocking factor of the underlying storage pool, you could
possibly avoid that by rounding allocations up, but that would only help in
pathological programs that are adding single characters to end of strings
thousands of times.) Janus/Ada is all about minimum memory usage; speed is a
secondary consideration and in general we will not write runtime packages
such that they waste memory. As such, even if I was to rewrite this, I would
not allocate more than the memory needed by the current operation (modulo a
very small blocking factor), and as such this program would *still* need a
new allocation nearly every iteration (because the string is continuously
getting longer).

I'd expect a program that *deleted* parts of strings to show much more
improvement than one that added strings, because in that case there wouldn't
be a need to free anything (just adjust the length). But such programs are
far rarer than the append everything ones like this example.

> You were speaking about performance about checking the preallocated
> buffer. My point is that it is not that costly compared to the price of
> reallocating a whole string thousands of time.

I'm sure that's true, but my point also was that programs that do that are
generally pathologies (surely, that's the only examples that I've been shown
so far). And the real programs I've looked at (like our spam filter), this
overhead is not a significant part of the entire program. Even though our
spam filter does something like the above when decoding BASE64 encoded
e-mails. And I don't think that the performance improvement of a tweaked
Unbounded_String would be a tenth of simply not using it at all for
calculations -- which would make far more difference in a misbehaving
application.

After all, the first rule of performance improvement is to find a better
algorithm; tweaking an existing one to make it faster can only provide
incremental improvements (and that rarely helps anything). Surely that
applies here, too.

And now I've wasted far too much time on this worthless topic.

                                   Randy.





^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-25  0:16       ` Randy Brukardt
@ 2007-07-25  2:25         ` Robert A Duff
  2007-07-25  6:07           ` Simon Wright
  2007-07-25 19:08           ` Randy Brukardt
  0 siblings, 2 replies; 35+ messages in thread
From: Robert A Duff @ 2007-07-25  2:25 UTC (permalink / raw)


"Randy Brukardt" <randy@rrsoftware.com> writes:

> "Robert A Duff" <bobduff@shell01.TheWorld.com> wrote in message
> news:wccr6mx5pz0.fsf@shell01.TheWorld.com...
>> "Randy Brukardt" <randy@rrsoftware.com> writes:
> ...
>> > For Janus/Ada, Unbounded_String is just a controlled wrapper around
> "access
>> > String"; in particular, the length is the length of the item
>> > allocated.
>>
>> Are you saying that:
>>
>>     for I in 1..N loop
>>         Append (Some_Unbounded_String, 'x');
>>
>> causes a free and a "new" and a copy every time around the loop?
>
> Yup; I just checked the code to make sure. Moreover, I have yet to see an
> real example where the performance difference would matter that much (Free
> and new aren't that expensive relative to the alternatives).

It's the copy I'm wondering about, not the new and free so much.
To append N characters to an unb string requires quadratic copying
(copying a char N**2 / 2 times).

If you double the allocation every time, it's amortized linear.
Yes, you waste some space that way.

> If performance is a real issue, you'd have to write your own code to do the
> actual operations you want; for instance, in your example
> To_Unbounded_String (Some_Unbounded_String, (1..N => 'x')); would be a lot
> faster ...

Yes, but that's just an example.  The real example is where you're
reading characters one-by-one from somewhere, and you don't know how
many there will be, and you want to append some of them to a string.

>...on any compiler (far fewer calls to a library; even if there is no
> allocation, there still will be overhead checks for bounds and the like).
> The examples where an alternative implementation of Unbounded_Strings would
> make a significant difference such that the application was fast enough with
> it and not fast enough without it have to be extremely rare.

Fair enough.  Real-world measurements beat theoretical reasoning.  ;-)

- Bob



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-25  2:25         ` Robert A Duff
@ 2007-07-25  6:07           ` Simon Wright
  2007-07-25 19:08           ` Randy Brukardt
  1 sibling, 0 replies; 35+ messages in thread
From: Simon Wright @ 2007-07-25  6:07 UTC (permalink / raw)


Robert A Duff <bobduff@shell01.TheWorld.com> writes:

> It's the copy I'm wondering about, not the new and free so much.
> To append N characters to an unb string requires quadratic copying
> (copying a char N**2 / 2 times).
>
> If you double the allocation every time, it's amortized linear.
> Yes, you waste some space that way.

I made a limited Unbounded_String type (with an Append operation) that
works this way when faced with a similar problem (I am stuck on a
rather old GNAT which allocates every time). 30% performance
improvement overall, as I recall.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-25  1:28           ` Randy Brukardt
@ 2007-07-25  7:48             ` Pascal Obry
  2007-07-25  9:55               ` Georg Bauhaus
  2007-07-25 18:58               ` Randy Brukardt
  2007-07-25  8:50             ` Martin Krischik
  1 sibling, 2 replies; 35+ messages in thread
From: Pascal Obry @ 2007-07-25  7:48 UTC (permalink / raw)
  To: Randy Brukardt

Randy Brukardt a �crit :
> And now I've wasted far too much time on this worthless topic.

If you think so! GNAT has implemented that as in a real application the
code was running more than 10 times faster. If you think you have wasted
far too much time, no problem. I'm not a Janus/Ada customer anyway!

Pascal.

-- 

--|------------------------------------------------------
--| Pascal Obry                           Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--|              http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-25  1:28           ` Randy Brukardt
  2007-07-25  7:48             ` Pascal Obry
@ 2007-07-25  8:50             ` Martin Krischik
  2007-07-25  9:26               ` AW: " Grein, Christoph (Fa. ESG)
  1 sibling, 1 reply; 35+ messages in thread
From: Martin Krischik @ 2007-07-25  8:50 UTC (permalink / raw)


Randy Brukardt schrieb:
> But, based on your assertions, I have to assume that the
> GNAT code is out of date, so there is no point in looking at it further.)

No need to guess, have a look at the current source here:

http://gnuada.sourceforge.net/source-gcc-4.1.2/index.htm

Martin

-- 
mailto://krischik@users.sourceforge.net
Ada programming at: http://ada.krischik.com



^ permalink raw reply	[flat|nested] 35+ messages in thread

* AW: Reserve_Capacity for Unbounded_String?
  2007-07-25  8:50             ` Martin Krischik
@ 2007-07-25  9:26               ` Grein, Christoph (Fa. ESG)
  2007-07-25 15:32                 ` Martin Krischik
  2007-07-25 15:39                 ` Martin Krischik
  0 siblings, 2 replies; 35+ messages in thread
From: Grein, Christoph (Fa. ESG) @ 2007-07-25  9:26 UTC (permalink / raw)
  To: comp.lang.ada

> No need to guess, have a look at the current source here:
>
> http://gnuada.sourceforge.net/source-gcc-4.1.2/index.htm

There is only a-strunb.ads, the link to a-strunb.adb fails.


Eurocopter Deutschland GmbH
Sitz der Gesellschaft/Registered Office: Donauwoerth
Registergericht/Registration Court: Amtsgericht Augsburg HRB 16508
Vorsitzender des Aufsichtsrates/Chairman of the Supervisory Board: Dr. Lutz Bertling
Geschaeftsfuehrung/Board of Management:
Dr. Wolfgang Schoder, Vorsitzender/CEO; Friedrich-Wilhelm Hormel; Ralf Barnscheidt

CONFIDENTIALITY NOTICE 

This communication and the information it contains is intended for the addressee(s) named above and for no other persons or organizations. It is confidential and may be legally privileged and protected by law. The unauthorized use, copying or disclosure of this communication or any part of it is prohibited and may be unlawful. 
If you have received this communication in error, kindly notify us by return e-mail and discard and/or delete the communication. Thank you very much. 
It is possible for e-mails to be intercepted or affected by viruses. Whilst we maintain virus checks on our e-mails, we accept no liability for viruses or other material which might be introduced with this message. 




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-25  7:48             ` Pascal Obry
@ 2007-07-25  9:55               ` Georg Bauhaus
  2007-07-25 10:02                 ` Georg Bauhaus
  2007-07-25 18:58               ` Randy Brukardt
  1 sibling, 1 reply; 35+ messages in thread
From: Georg Bauhaus @ 2007-07-25  9:55 UTC (permalink / raw)


On Wed, 2007-07-25 at 09:48 +0200, Pascal Obry wrote:
> Randy Brukardt a écrit :
> > And now I've wasted far too much time on this worthless topic.
> 
> If you think so! GNAT has implemented that as in a real application the
> code was running more than 10 times faster. If you think you have wasted
> far too much time, no problem. I'm not a Janus/Ada customer anyway!

In fact, there are many special string-related subprograms
in GNAT that arguably demonstrate the influences of effective
marketing on technical properties of compilers. E.g. there are
variations of

   function Str_Concat_3 (S1, S2, S3 : String) return String;
   --  Concatenate three strings and return resulting string

hidden in System.String_Ops_Concat_3 and used in algorithmically (!)
optimizing two "&"s in the compiler.

Likewise,  "... use of & removed for efficiency reasons"
in package body Ada.Strings.Fixed. (See also
Ada.Strings.Unbounded.Aux.Get_String for the unbounded case.)

Two conclusion seem to follow from this:

1) that GNAT authors agree with Randy when he says that you
have to be careful with "&"-centric algorithms, but

2) that AdaCore has better salespeople because customers are
more satisfied when the vendor has already done the "right" thing
for them. ;-)


Try this:

procedure Strtest is
   S: constant String := "A" & Argument(1) & "B";
     -- occurences of "&" --> GNAT special case
begin
   Text_IO.put_line(S);
end Strtest;


Then, using GNAT, you get

$ nm strtest.o
...
00000019 T _ada_strtest
...
         U system__string_ops_concat_3__str_concat_3  <--
$





^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-25  9:55               ` Georg Bauhaus
@ 2007-07-25 10:02                 ` Georg Bauhaus
  0 siblings, 0 replies; 35+ messages in thread
From: Georg Bauhaus @ 2007-07-25 10:02 UTC (permalink / raw)


On Wed, 2007-07-25 at 11:55 +0200, Georg Bauhaus wrote:

>    function Str_Concat_3 (S1, S2, S3 : String) return String;
>    --  Concatenate three strings and return resulting string
> 
> hidden in System.String_Ops_Concat_3 and used in algorithmically (!)
> optimizing two "&"s in the compiler.

I mean, the compiler optimizes occurences of a small number of "&"
in a row in user programs.





^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: AW: Reserve_Capacity for Unbounded_String?
  2007-07-25  9:26               ` AW: " Grein, Christoph (Fa. ESG)
@ 2007-07-25 15:32                 ` Martin Krischik
  2007-07-25 15:39                 ` Martin Krischik
  1 sibling, 0 replies; 35+ messages in thread
From: Martin Krischik @ 2007-07-25 15:32 UTC (permalink / raw)


Grein, Christoph (Fa. ESG) wrote:

>> No need to guess, have a look at the current source here:
>>
>> http://gnuada.sourceforge.net/source-gcc-4.1.2/index.htm
> 
> There is only a-strunb.ads, the link to a-strunb.adb fails.

Indeed. I have to check why this is so.

Martin
-- 
mailto://krischik@users.sourceforge.net
Ada programming at: http://ada.krischik.com



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: AW: Reserve_Capacity for Unbounded_String?
  2007-07-25  9:26               ` AW: " Grein, Christoph (Fa. ESG)
  2007-07-25 15:32                 ` Martin Krischik
@ 2007-07-25 15:39                 ` Martin Krischik
  1 sibling, 0 replies; 35+ messages in thread
From: Martin Krischik @ 2007-07-25 15:39 UTC (permalink / raw)


Grein, Christoph (Fa. ESG) wrote:

>> No need to guess, have a look at the current source here:
>>
>> http://gnuada.sourceforge.net/source-gcc-4.1.2/index.htm
> 
> There is only a-strunb.ads, the link to a-strunb.adb fails.

The source is here:

http://gnuada.sourceforge.net/source-gcc-4.1.2/____a-strunb__adb.htm

the link is missing the 4 _'s. And only AdaCore will know why!

Martin
-- 
mailto://krischik@users.sourceforge.net
Ada programming at: http://ada.krischik.com



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-25  7:48             ` Pascal Obry
  2007-07-25  9:55               ` Georg Bauhaus
@ 2007-07-25 18:58               ` Randy Brukardt
  1 sibling, 0 replies; 35+ messages in thread
From: Randy Brukardt @ 2007-07-25 18:58 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 1202 bytes --]

"Pascal Obry" <pascal@obry.net> wrote in message
news:46A70064.8090407@obry.net...
> Randy Brukardt a �crit :
> > And now I've wasted far too much time on this worthless topic.
>
> If you think so! GNAT has implemented that as in a real application the
> code was running more than 10 times faster.

I don't think that much of a speed-up would be possible with Janus/Ada even
with a specially tuned version of Ada.Strings.Unbounded. The overhead of
allocation is only a portion of the total overhead, after all; call
overhead, checking overhead, and copying overhead are all significant. The
only way to get close would be to preallocate strings in large blocks, but
that would only make sense for applications with very few unbounded strings
and that do a lot of appends and '&'s. The performance would be worse on
programs that use tens of thousands of Unbounded_Strings (like my spam
filter and most other real apps that I've seen) because of increased paging
overhead and reduced capacity.

> If you think you have wasted
> far too much time, no problem. I'm not a Janus/Ada customer anyway!

Exactly. My response would be different to a good customer...

                             Randy.





^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-25  2:25         ` Robert A Duff
  2007-07-25  6:07           ` Simon Wright
@ 2007-07-25 19:08           ` Randy Brukardt
  2007-07-25 20:37             ` Maciej Sobczak
  1 sibling, 1 reply; 35+ messages in thread
From: Randy Brukardt @ 2007-07-25 19:08 UTC (permalink / raw)


"Robert A Duff" <bobduff@shell01.TheWorld.com> wrote in message
news:wcctzrtrzhz.fsf@shell01.TheWorld.com...
> "Randy Brukardt" <randy@rrsoftware.com> writes:
...
> > Yup; I just checked the code to make sure. Moreover, I have yet to see
an
> > real example where the performance difference would matter that much
(Free
> > and new aren't that expensive relative to the alternatives).
>
> It's the copy I'm wondering about, not the new and free so much.
> To append N characters to an unb string requires quadratic copying
> (copying a char N**2 / 2 times).

True enough, but copying is extremely cheap relative to other operations. It
could only matter if N is extremely large, and that won't work in Janus/Ada
anyway (because Ada.Strings.Unbounded is broken in that it depends on type
Integer, which is 16-bit in Janus/Ada for historical reasons; if you want
really big strings in Janus/Ada you have to declare them yourself. I don't
much like this situation, but it is not practically fixable.)

> If you double the allocation every time, it's amortized linear.
> Yes, you waste some space that way.

Quadratic allocation maximizes heap fragmentation (a major issue for small
heaps, and a performance trap for all), and I no longer use it anywhere for
that reason. It also wastes huge amounts of space for long strings. I could
see using a scheme where you rounded up to the nearest 16 (or some similar
number) bytes (as the storage pool is likely to do something like that
anyway); that would catch the long-hanging fruit (i.e. appending lots of
single characters) without too much impact in terms of memory. But anything
further would have nasty memory use effects on programs with a lot of
unbounded strings. And doing that would cost extra overhead on every string
operation; most of them for which there would be no gain.

I still don't think it is worth it, but obviously a customer example could
change my mind.

                              Randy.





^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-25 19:08           ` Randy Brukardt
@ 2007-07-25 20:37             ` Maciej Sobczak
  2007-07-25 22:06               ` Georg Bauhaus
  2007-07-26 22:11               ` Randy Brukardt
  0 siblings, 2 replies; 35+ messages in thread
From: Maciej Sobczak @ 2007-07-25 20:37 UTC (permalink / raw)


On 25 Lip, 21:08, "Randy Brukardt" <ra...@rrsoftware.com> wrote:

> > If you double the allocation every time, it's amortized linear.
> > Yes, you waste some space that way.
>
> Quadratic allocation maximizes heap fragmentation

Then instead of doubling the allocation use any multiplier that is
smaller than 1.618 (yes, the golden ratio).
1.5 seems to be easy choice.

Building a string by appending small chunks to the end seems to be a
common practice. Optimizing the library for this case is a wise
implementation strategy.

> I still don't think it is worth it, but obviously a customer example could
> change my mind.

What about a customer who is choosing the competing product? Doesn't
sound convincing enough?

--
Maciej Sobczak
http://www.msobczak.com/




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-25 20:37             ` Maciej Sobczak
@ 2007-07-25 22:06               ` Georg Bauhaus
  2007-07-26  6:24                 ` Maciej Sobczak
  2007-07-26 22:11               ` Randy Brukardt
  1 sibling, 1 reply; 35+ messages in thread
From: Georg Bauhaus @ 2007-07-25 22:06 UTC (permalink / raw)


Maciej Sobczak wrote:

> Building a string by appending small chunks to the end seems to be a
> common practice. Optimizing the library for this case is a wise
> implementation strategy.

Is there some material on this? I'm wondering whether concatenating
strings is more common in languages where strings are lists, or at
least not plain arrays.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-25 22:06               ` Georg Bauhaus
@ 2007-07-26  6:24                 ` Maciej Sobczak
  2007-07-26  8:09                   ` Dmitry A. Kazakov
  2007-07-26  8:35                   ` Georg Bauhaus
  0 siblings, 2 replies; 35+ messages in thread
From: Maciej Sobczak @ 2007-07-26  6:24 UTC (permalink / raw)


On 26 Lip, 00:06, Georg Bauhaus <bauhaus.rm.t...@maps.futureapps.de>
wrote:

> > Building a string by appending small chunks to the end seems to be a
> > common practice. Optimizing the library for this case is a wise
> > implementation strategy.
>
> Is there some material on this? I'm wondering whether concatenating
> strings is more common in languages where strings are lists, or at
> least not plain arrays.

Why should that matter? Do you think that the implementation details
like this one can influence the way people *think* about strings in
their programs?

We read text from beginning to the end, acquiring information. It
seems obvious that, conversely, adding information to the string is
best achieved by adding more characters to the end, at least when
human-readable content is involved.

This is what I'm actually doing when typing this post. I append
characters to what is already there (let's forget correcting typos -
automated procedures don't have to do this).

Coming back to your question about different languages - it is not the
programmer who should adapt his way of building the strings according
to how the string is implemented internally in a given language (note
that many languages don't even specify it). It is the language that
should provide the implementation that is best fitted to how
programmers express their algorithms. Why do you think Java added
StringBuilder to its library? Because the immutable String didn't
quite cut it.

--
Maciej Sobczak
http://www.msobczak.com/




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-26  6:24                 ` Maciej Sobczak
@ 2007-07-26  8:09                   ` Dmitry A. Kazakov
  2007-07-26  8:20                     ` Pascal Obry
  2007-07-26  8:35                   ` Georg Bauhaus
  1 sibling, 1 reply; 35+ messages in thread
From: Dmitry A. Kazakov @ 2007-07-26  8:09 UTC (permalink / raw)


On Wed, 25 Jul 2007 23:24:03 -0700, Maciej Sobczak wrote:

> This is what I'm actually doing when typing this post. I append
> characters to what is already there (let's forget correcting typos -
> automated procedures don't have to do this).

You append characters to words, words to sentences and sentences to
paragraphs. String is not the best possible representation for a text.

> It is the language that
> should provide the implementation that is best fitted to how
> programmers express their algorithms. Why do you think Java added
> StringBuilder to its library? Because the immutable String didn't
> quite cut it.

Maybe, but Unbounded_String is not the best way to express algorithms. As
for me I avoid using Unbounded_String. I don't like the design of.

This does not mean that other containers should not have
"accumulation"/"stream" mode. For them I tend to specify the initial size
and the multiplier factor.

Perhaps, one could introduce something like that for Unbounded_String:

   X : Unbounded_String := ...;
   for X'Initial_Size use 1024*8;  -- 1K at least
   for X'Increment_Factor use 1.5; -- Multiply by 1.5 upon expanding


-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-26  8:09                   ` Dmitry A. Kazakov
@ 2007-07-26  8:20                     ` Pascal Obry
  2007-07-26  9:59                       ` Dmitry A. Kazakov
  0 siblings, 1 reply; 35+ messages in thread
From: Pascal Obry @ 2007-07-26  8:20 UTC (permalink / raw)
  To: mailbox

Dmitry A. Kazakov a �crit :
> Perhaps, one could introduce something like that for Unbounded_String:
> 
>    X : Unbounded_String := ...;
>    for X'Initial_Size use 1024*8;  -- 1K at least
>    for X'Increment_Factor use 1.5; -- Multiply by 1.5 upon expanding

I really don't see the need for rep attributes here as Unbounded_String
is not intrinsic. So

    Initial_Size (X, 1024*8);
    Increment_Factor (X, 1.5);

looks more appropriate in this case.

And mind Randy I do not expect to have to use such artifices in my code.
I do expect a compiler to deliver the "expected" best implementation for
every supported features. I'm not in the embedded domain and frankly all
those low-level details are distractions, I prefer to concentrate on my
application and not fighting the compiler :)

The main point for Ada to me is the high level abstraction.

Pascal.

-- 

--|------------------------------------------------------
--| Pascal Obry                           Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--|              http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-26  6:24                 ` Maciej Sobczak
  2007-07-26  8:09                   ` Dmitry A. Kazakov
@ 2007-07-26  8:35                   ` Georg Bauhaus
  1 sibling, 0 replies; 35+ messages in thread
From: Georg Bauhaus @ 2007-07-26  8:35 UTC (permalink / raw)


On Wed, 2007-07-25 at 23:24 -0700, Maciej Sobczak wrote:
> On 26 Lip, 00:06, Georg Bauhaus <bauhaus.rm.t...@maps.futureapps.de>
> wrote:
> 
> > > Building a string by appending small chunks to the end seems to be a
> > > common practice. Optimizing the library for this case is a wise
> > > implementation strategy.
> >
> > Is there some material on this? I'm wondering whether concatenating
> > strings is more common in languages where strings are lists, or at
> > least not plain arrays.
> 
> Why should that matter? Do you think that the implementation details
> like this one can influence the way people *think* about strings in
> their programs?

Precisely; in fact, I think that implementation is what drives
most programmers. Also crazy, with obvious economical consequences,
but also with quality consequences. To some extent, this is inevitable.
Notably when a program is written to control a machine because then
you must concentrate of what the computer does in some detail.
This is opposed to the vague ideas of what is supposed to happen
when a programmer thinks in terms of abstractions of appending
characters to some abstraction of String (which is an array!).

I'd rather not think about the programs that use strcat() and
malloc() and free() a lot for string handling, but I'm almost certain
that these functions "influence the way many people *think* about
strings in their programs." 

> We read text from beginning to the end, acquiring information. It
> seems obvious that, conversely, adding information to the string is
> best achieved by adding more characters to the end, at least when
> human-readable content is involved.

This might be true for paper strings, but programming a text reader
is very different from reading a program (or other text) in my view.
Similar reasoning applies to typing.


> Coming back to your question about different languages

It should have been more about different implementation techniques
for strings. When a string is implemented as a doubly linked list
and not an array, the performance characteristics will likely be
different for a number of operations.  When you want to _replace_
substrings with strings of different sizes, then arrays are not
at all a good choice made by an implementation/language.
When you want to _override_ characters with other characters,
arrays may be a good choice. I see the latter choice present in
Ada systems.


>  - it is not the
> programmer who should adapt his way of building the strings according
> to how the string is implemented internally in a given language (note
> that many languages don't even specify it).

I think that type String is underspecified (in the heads of programmers)
when it comes to performance characteristics. Maybe we are
spoiled by the fact that text seems so natural that we think
we needn't worry about its implementation as we do for other types.

Ada's non-character containers are different in this regard, as
already mentioned in the thread. BTW, I have found
Ada.Containers.Vectors to be a efficient for some string processing
tasks.

> It is the language that
> should provide the implementation that is best fitted to how
> programmers express their algorithms.

Without knowing what drives the design of a string-related
algorithm it is difficult to judge whether a language's string types
have been made for the algorithm. See replace/overwrite above.


>  Why do you think Java added
> StringBuilder to its library?

And StringBuffer, right from the start!

>  Because the immutable String didn't
> quite cut it.

Yes, this is what I was thinking. Ada does not provide a StringBuilder
in the Ada.Strings hierarchy. Unbounded_String is not made for most
efficient text composition, as I think we can learn from this thread.

Maybe an Unbounded_String is very much like a Matrix.
An implementer of Matrix might choose arrays, or lists.
How could he/she know whether the objects are going to
be sparse matrices?






^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-26  8:20                     ` Pascal Obry
@ 2007-07-26  9:59                       ` Dmitry A. Kazakov
  0 siblings, 0 replies; 35+ messages in thread
From: Dmitry A. Kazakov @ 2007-07-26  9:59 UTC (permalink / raw)


On Thu, 26 Jul 2007 10:20:49 +0200, Pascal Obry wrote:

> And mind Randy I do not expect to have to use such artifices in my code.
> I do expect a compiler to deliver the "expected" best implementation for
> every supported features. I'm not in the embedded domain and frankly all
> those low-level details are distractions, I prefer to concentrate on my
> application and not fighting the compiler :)

Wholeheartedly agree
 
> The main point for Ada to me is the high level abstraction.

Yes 

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Reserve_Capacity for Unbounded_String?
  2007-07-25 20:37             ` Maciej Sobczak
  2007-07-25 22:06               ` Georg Bauhaus
@ 2007-07-26 22:11               ` Randy Brukardt
  1 sibling, 0 replies; 35+ messages in thread
From: Randy Brukardt @ 2007-07-26 22:11 UTC (permalink / raw)


"Maciej Sobczak" <see.my.homepage@gmail.com> wrote in message
news:1185395844.104043.194340@o61g2000hsh.googlegroups.com...
...
> Building a string by appending small chunks to the end seems to be a
> common practice. Optimizing the library for this case is a wise
> implementation strategy.

Yes, but not if performance matters. If performance matters, doing a lot of
small operations is the worst possible way to do it, irrespective of the
implementation of those small operations.

We chose

> > I still don't think it is worth it, but obviously a customer example
could
> > change my mind.
>
> What about a customer who is choosing the competing product? Doesn't
> sound convincing enough?

No, because customers don't chose products because of the performance of one
rarely used library call. They chose products because of price, features,
ease of use, reputation of vendor, and quality of support. And it is also
the case that they rarely tell the truth about why they chose something
(partially because they often don't really have a clear understanding of
that either). I'd rather spend our limited efforts on these factors
(especially ease of use, where I don't think Janus/Ada really measures up
very well) than chasing single customers with single changes, especially
ones that will negatively impact other usage patterns.

After all, my experience with chasing customers has uniformly been bad; they
always have a new reason for not buying once you eliminate the last one.

                                               Randy.





^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2007-07-26 22:11 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-22 19:54 Reserve_Capacity for Unbounded_String? Maciej Sobczak
2007-07-22 21:32 ` Robert A Duff
2007-07-23 19:29   ` Maciej Sobczak
2007-07-23 20:30     ` Robert A Duff
2007-07-23  4:28 ` Jeffrey R. Carter
2007-07-23 15:07 ` Adam Beneschan
2007-07-24  1:01   ` Randy Brukardt
2007-07-24  7:57     ` Pascal Obry
2007-07-24 18:58       ` Randy Brukardt
2007-07-24 23:50         ` Robert A Duff
2007-07-25  0:00           ` Randy Brukardt
2007-07-24 23:54         ` Pascal Obry
2007-07-25  0:52           ` Randy Brukardt
2007-07-25  1:28           ` Randy Brukardt
2007-07-25  7:48             ` Pascal Obry
2007-07-25  9:55               ` Georg Bauhaus
2007-07-25 10:02                 ` Georg Bauhaus
2007-07-25 18:58               ` Randy Brukardt
2007-07-25  8:50             ` Martin Krischik
2007-07-25  9:26               ` AW: " Grein, Christoph (Fa. ESG)
2007-07-25 15:32                 ` Martin Krischik
2007-07-25 15:39                 ` Martin Krischik
2007-07-24 23:41     ` Robert A Duff
2007-07-25  0:16       ` Randy Brukardt
2007-07-25  2:25         ` Robert A Duff
2007-07-25  6:07           ` Simon Wright
2007-07-25 19:08           ` Randy Brukardt
2007-07-25 20:37             ` Maciej Sobczak
2007-07-25 22:06               ` Georg Bauhaus
2007-07-26  6:24                 ` Maciej Sobczak
2007-07-26  8:09                   ` Dmitry A. Kazakov
2007-07-26  8:20                     ` Pascal Obry
2007-07-26  9:59                       ` Dmitry A. Kazakov
2007-07-26  8:35                   ` Georg Bauhaus
2007-07-26 22:11               ` Randy Brukardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox