comp.lang.ada
* Re: Large strings in ADA
  2000-04-16  0:00 Large strings in ADA Johan Groth
@ 2000-04-16  0:00 ` David Starner
  2000-04-17  0:00 ` Florian Weimer
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: David Starner @ 2000-04-16  0:00 UTC


On Sun, 16 Apr 2000 23:26:27 +0200, Johan Groth <jgroth@xpress.se> wrote:
>Hello,
>I'm trying to write a program that converts a couple of types to strings
>and concatenates them into one large string, which at the moment is an
>Unbounded_String, but when the string gets to about 50KB it takes longer
>and longer to append to it.
>The code looks like below. I need to concatenate about 2.5MB of data.
>What is the fastest way to do that in Ada? Just for comparison, a similar
>program in C takes about one second.

Then it's not similar. C doesn't have Unbounded_Strings or the 'Image
attribute, so I doubt the two programs are all that similar. At worst, why
didn't/don't you transliterate that C into Ada? It won't be nice, pretty or
friendly, but it will probably be as fast as the C version (with checks
turned off). What you have below isn't similar to any C program I can
conceive of; C just won't let you write at this level.

>Can anyone help me?
>
>TIA,
>Johan
>
>procedure Main is
>   type String32 is
>     record
>	Info : String(1..32) := (others => ' ');
>	Len : Natural range 0 .. 32 := 0;
>     end record;
>
>   Str : String32;
>   No : Basic_Integer;
>   Flt : Basic_Float;
>   Msg : Unbounded_String := Null_Unbounded_String;
>   Last : Natural;
>
>   procedure Append_To_Msg (Str : in String;
>                            Max_Len : in Natural;
>                            Message : in out Asu.Unbounded_String;
>                            Last : in out Natural) is
>   begin
>      Append(Message, " ");
>      Append(Message, Natural'Image(Max_Len));
>      Append(Message, " ");
>      Append(Message, Str);
>      Last := Asu.Length (Message);

It probably really hurts to count the length of the string
over and over. In fact, this is where I would bet a lot of time
is going. Why don't you try gprof (for GNAT) or a profiler
for whatever Ada compiler you're using and see where the time is
being spent?

>   end Append_To_Msg;
>
>begin
>   Str.info(1..6) := "hejsan";
>   Str.Len := 6;
>   No := 10;
>   Flt := 5.5;
>   for I in 1 .. 1600 loop
>      Append_To_Msg(Str.Info, Basic_Natural(Str.Len), Msg, Last);

Why don't you pass the image of Str.Len in (e.g. Basic_Natural'Image
(Basic_Natural (Str.Len)))? Since it doesn't change inside the loop,
most compilers will then do the conversion only once, instead of 1600
times.
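
A minimal sketch of that hoisting in modern Ada (the function Build_Message and the demo procedure are hypothetical names, not from this thread): the text that is the same on every iteration, including the 'Image, is built once before the loop and added with a single Append, so each iteration makes one Append call instead of the four in Johan's code. It cuts the number of Append calls; how Unbounded_String itself grows its storage still depends on the implementation.

```ada
--  Two compilation units; with GNAT, split them into separate files
--  with gnatchop before compiling.

with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

--  Hypothetical helper: builds the same text as Johan's loop, i.e. Count
--  copies of " " & Natural'Image (Len) & " " & Info.
function Build_Message (Info  : String;
                        Len   : Natural;
                        Count : Natural) return Unbounded_String
is
   --  Loop-invariant work (including the 'Image) is done once, here.
   Piece : constant String := " " & Natural'Image (Len) & " " & Info;
   Msg   : Unbounded_String := Null_Unbounded_String;
begin
   for I in 1 .. Count loop
      Append (Msg, Piece);  --  one Append per iteration instead of four
   end loop;
   return Msg;
end Build_Message;

with Ada.Text_IO;
with Ada.Strings.Unbounded;
with Build_Message;

procedure Build_Message_Demo is
   Info : String (1 .. 32) := (others => ' ');
begin
   Info (1 .. 6) := "hejsan";
   --  1600 pieces of 1 + 2 + 1 + 32 = 36 characters; prints " 57600"
   Ada.Text_IO.Put_Line
     (Natural'Image
        (Ada.Strings.Unbounded.Length (Build_Message (Info, 6, 1600))));
end Build_Message_Demo;
```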

Also, you don't mention which compilers you're comparing, running
on what systems, with what switches, which can make a big difference.

-- 
David Starner - dstarner98@aasaa.ofe.org
Only a nerd would worry about wrong parentheses with
square brackets. But that's what mathematicians are.
   -- Dr. Burchard, math professor at OSU





* Large strings in ADA
@ 2000-04-16  0:00 Johan Groth
  2000-04-16  0:00 ` David Starner
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Johan Groth @ 2000-04-16  0:00 UTC


Hello,
I'm trying to write a program that converts a couple of types to strings
and concatenates them into one large string, which at the moment is an
Unbounded_String, but when the string gets to about 50KB it takes longer
and longer to append to it.
The code looks like below. I need to concatenate about 2.5MB of data.
What is the fastest way to do that in Ada? Just for comparison, a similar
program in C takes about one second.
Can anyone help me?

TIA,
Johan

with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Exceptions;

procedure Main is
   package Asu renames Ada.Strings.Unbounded;

   --  Assumed stand-ins for project-specific types not shown here.
   subtype Basic_Integer is Integer;
   subtype Basic_Float is Float;
   subtype Basic_Natural is Natural;

   type String32 is
     record
        Info : String(1..32) := (others => ' ');
        Len : Natural range 0 .. 32 := 0;
     end record;

   Str : String32;
   No : Basic_Integer;
   Flt : Basic_Float;
   Msg : Unbounded_String := Null_Unbounded_String;
   Last : Natural := 0;

   --  Appends " <Max_Len> <Str>" to Message; Last gets the new length.
   procedure Append_To_Msg (Str : in String;
                            Max_Len : in Natural;
                            Message : in out Asu.Unbounded_String;
                            Last : in out Natural) is
   begin
      Append(Message, " ");
      Append(Message, Natural'Image(Max_Len));
      Append(Message, " ");
      Append(Message, Str);
      Last := Asu.Length (Message);
   end Append_To_Msg;

begin
   Str.Info(1..6) := "hejsan";
   Str.Len := 6;
   No := 10;
   Flt := 5.5;
   for I in 1 .. 1600 loop
      Append_To_Msg(Str.Info, Basic_Natural(Str.Len), Msg, Last);
      if (I mod 100) = 0 then
         Put(" " & Integer'Image(I));
      end if;
   end loop;
   Put_Line("finished");
exception
   when E : others =>
      Put_Line("exception in main: " & Ada.Exceptions.Exception_Name(E));
end Main;


-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
   "Better to ask questions and seem stupid
    than not to ask questions and remain stupid" -Unknown
           Johan Groth <jgroth@xpress.se> Kupolen Data





* Re: Large strings in ADA
  2000-04-16  0:00 Large strings in ADA Johan Groth
                   ` (2 preceding siblings ...)
  2000-04-17  0:00 ` Robert Dewar
@ 2000-04-17  0:00 ` Dale Stanbrough
  2000-04-17  0:00   ` tmoran
  2000-04-17  0:00 ` Robert Dewar
  4 siblings, 1 reply; 14+ messages in thread
From: Dale Stanbrough @ 2000-04-17  0:00 UTC


Johan Groth wrote:

> The code looks like below. I need to concatenate about 2.5MB of data.
> What is the fastest way to do that in Ada? Just for comparison, a similar
> program in C takes about one second. 

There can't be a similar program in C, because C does not have controlled
types. Each time you append to your unbounded_string, you could be (and 
most likely are) destroying the entire contents, and reallocating the
unbounded_string all over again.

Perhaps you should try duplicating the -exact- C semantics. Presumably 
you have a -very- large char buffer into which the items are copied.

Alternatively, if you know of an upper bound for the length of the
string, you could instantiate a Bounded_String package.
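
A sketch of that Bounded_String approach, assuming 2_500_000 characters is a safe upper bound (the unit names Msg_Strings and Bounded_Demo are hypothetical). The Ada reference manual advises that bounded strings not be implemented with implicit pointers and dynamic allocation, so in a typical implementation Append just copies the new characters into the fixed-capacity object; going past Max raises Ada.Strings.Length_Error unless another Drop mode is requested.

```ada
--  Two compilation units; with GNAT, split them into separate files
--  with gnatchop before compiling.

with Ada.Strings.Bounded;

--  Assumption: no message is ever longer than 2_500_000 characters.
package Msg_Strings is
  new Ada.Strings.Bounded.Generic_Bounded_Length (Max => 2_500_000);

with Ada.Text_IO;
with Msg_Strings; use Msg_Strings;

procedure Bounded_Demo is
   --  A Bounded_String with Max => 2_500_000 is about 2.5 MB, so it is
   --  allocated on the heap here rather than declared as a local variable.
   type Msg_Access is access Bounded_String;
   Msg   : constant Msg_Access := new Bounded_String;  --  starts out empty
   Piece : constant String := "  6 hejsan";
begin
   for I in 1 .. 1600 loop
      --  Drop defaults to Ada.Strings.Error, so an Append that would
      --  exceed Max raises Ada.Strings.Length_Error.
      Append (Msg.all, Piece);
   end loop;
   Ada.Text_IO.Put_Line (Natural'Image (Length (Msg.all)));  --  prints " 16000"
end Bounded_Demo;
```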


Dale





* Re: Large strings in ADA
  2000-04-17  0:00 ` Dale Stanbrough
@ 2000-04-17  0:00   ` tmoran
  2000-04-17  0:00     ` Johan Groth
  0 siblings, 1 reply; 14+ messages in thread
From: tmoran @ 2000-04-17  0:00 UTC


>Perhaps you should try duplicating the -exact- C semantics. Presumably
>you have a -very- large char buffer into which the items are copied.

  If the original is in C, it must have something like
     char Msg[2500000];  // 2.5 MB
so the straightforward Ada equivalent would be
     Msg : String(1 .. 2_500_000);  -- 2.5 MB

  If Appends average K characters, there will be 2.5E6/K Appends.
If Append is done with strcat, searching for the terminating null
instead of by maintaining an index, the total amount of scanning
would be 0+K+2K+3K+...+2.5E6, or nearly K*(2.5E6/K)**2/2, which is
about 3E12/K.  If the machine can scan for a null at 10**9 bytes/sec,
and that's the only thing that takes any time at all, then for the
whole run to take one second, K, the average string length, must be
at least 3,000.
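
The strcat estimate above, written out step by step (N is the number of appends, K the average appended length, and the one-second budget is the C timing from the original post):

```latex
N = \frac{2.5\times 10^{6}}{K}, \qquad
\text{bytes scanned} = \sum_{i=0}^{N} iK = K\,\frac{N(N+1)}{2}
  \approx \frac{K N^{2}}{2} = \frac{(2.5\times 10^{6})^{2}}{2K}
  \approx \frac{3\times 10^{12}}{K}

t \approx \frac{3\times 10^{12}/K\ \text{bytes}}{10^{9}\ \text{bytes/s}}
  = \frac{3000}{K}\ \text{s} \le 1\ \text{s}
  \quad\Longrightarrow\quad K \ge 3000
```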

  Assuming the average string length is substantially less than
3K characters, we see that the C version must be using an index.
In that case the simple Ada equivalent would be something like:

   Msg : String(1 .. 2_500_000);
   Last : Natural := 0;

   procedure Append_To_Msg (Str : in String;
                            Max_Len : in Natural;
                            Message : in out String;
                            Last : in out Natural) is
     New_Stuff : constant String
       := " " & Natural'Image(Max_Len) & " " & Str;
   begin
     Message(Last+1 .. Last+New_Stuff'Length) := New_Stuff;
     Last := Last + New_Stuff'Length;
   end Append_To_Msg;

I made that change, modified the given program to compilable Ada,
changed the loop from 1 .. 1600 to 1 .. 2_500_000/(32+4)
(each append stores 36 characters), compiled with gnat 3.12p NT
with -O2 -gnato, and the result runs on a Pentium 200 in 0.8 second.





* Re: Large strings in ADA
  2000-04-16  0:00 Large strings in ADA Johan Groth
                   ` (3 preceding siblings ...)
  2000-04-17  0:00 ` Dale Stanbrough
@ 2000-04-17  0:00 ` Robert Dewar
  4 siblings, 0 replies; 14+ messages in thread
From: Robert Dewar @ 2000-04-17  0:00 UTC


In article <38FA3003.A38D7B51@xpress.se>,
  Johan Groth <jgroth@xpress.se> wrote:

> What is the fastest way to do that in Ada? Just for comparison,
> a similar program in C takes about one second.
> Can anyone help me?

Well, there is no such thing as a similar program in C, since
C has no support for strings as such. So that means you have
a low level program in C that represents strings as arrays
of characters, and does its own manipulations to get the
effect of string processing. Do the same in Ada and you will
get comparable performance.


Sent via Deja.com http://www.deja.com/
Before you buy.





* Re: Large strings in ADA
  2000-04-16  0:00 Large strings in ADA Johan Groth
  2000-04-16  0:00 ` David Starner
  2000-04-17  0:00 ` Florian Weimer
@ 2000-04-17  0:00 ` Robert Dewar
  2000-04-17  0:00 ` Dale Stanbrough
  2000-04-17  0:00 ` Robert Dewar
  4 siblings, 0 replies; 14+ messages in thread
From: Robert Dewar @ 2000-04-17  0:00 UTC


In article <38FA3003.A38D7B51@xpress.se>,
  Johan Groth <jgroth@xpress.se> wrote:
> Hello,
> I'm trying to write a program that converts a couple of types to strings
> and concatenates them into one large string, which at the moment is an
> Unbounded_String, but when the string gets to about 50KB it takes longer
> and longer to append to it.

By the way, the use of gigantic contiguous strings is almost
always a very bad data structure choice, and if you are
interested in efficiency, you should probably start by
rethinking your data structures and using a more effective
data structure.

Since C has rather poor data structuring abilities compared
to Ada, it is not unusual to "spin your own" in C, but in
Ada there is certainly a better way of doing things.
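
One way to read that advice, sketched with the Ada 2012 containers library (which did not exist when this thread was written; the names Piece_Vectors and Pieces_Demo are hypothetical): keep each appended piece as a separate element and walk the pieces when the text is consumed, for example when writing it to a file, instead of copying everything into one contiguous string. This is only one possible alternative; the post does not say which data structure it has in mind.

```ada
--  Two compilation units; with GNAT, split them into separate files
--  with gnatchop before compiling.  Requires Ada 2012.

with Ada.Containers.Indefinite_Vectors;

--  Hypothetical instantiation: a growable list of strings of any length.
package Piece_Vectors is new Ada.Containers.Indefinite_Vectors
  (Index_Type => Positive, Element_Type => String);

with Ada.Text_IO;
with Piece_Vectors;

procedure Pieces_Demo is
   Pieces : Piece_Vectors.Vector;
   Total  : Natural := 0;
begin
   for I in 1 .. 1600 loop
      --  Each piece is stored as its own element; text appended earlier
      --  is not copied into one ever-longer contiguous string.
      Pieces.Append ("  6 hejsan");
   end loop;

   --  A consumer walks the pieces in order, e.g. to write them to a file;
   --  here we just add up their lengths.
   for P of Pieces loop
      Total := Total + P'Length;
   end loop;
   Ada.Text_IO.Put_Line (Natural'Image (Total));  --  prints " 16000"
end Pieces_Demo;
```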







* Re: Large strings in ADA
  2000-04-17  0:00     ` Johan Groth
@ 2000-04-17  0:00       ` Robert Dewar
  2000-04-17  0:00         ` David Starner
  2000-04-17  0:00       ` Florian Weimer
  2000-04-17  0:00       ` tmoran
  2 siblings, 1 reply; 14+ messages in thread
From: Robert Dewar @ 2000-04-17  0:00 UTC


In article <38FB4521.D02EA4C9@xpress.se>,
  Johan Groth <jgroth@xpress.se> wrote:
> tmoran@bix.com wrote:
> >
> > >Perhaps you should try duplicating the -exact- C semantics. Presumably
> > >you have a -very- large char buffer into which the items are copied.
> >
> >   If the original is in C, it must have something like
> >      char Msg[2500000];  // 2.5 MB
> > so the straightforward Ada equivalent would be
> >      Msg : String(1 .. 2_500_000);  -- 2.5 MB
>
> The above is the current solution but it doesn't work as it uses too much
> memory.


That's odd, what machine are you on? Almost all operating
systems use commit-on-use allocation these days.







* Re: Large strings in ADA
  2000-04-17  0:00     ` Johan Groth
  2000-04-17  0:00       ` Robert Dewar
  2000-04-17  0:00       ` Florian Weimer
@ 2000-04-17  0:00       ` tmoran
  2000-04-18  0:00         ` tmoran
  2 siblings, 1 reply; 14+ messages in thread
From: tmoran @ 2000-04-17  0:00 UTC


> >      Msg : String(1 .. 2_500_000);  -- 2.5 MB
>
> The above is the current solution but it doesn't work as it uses too much
> memory.
  It seems to me 2.5MB is the smallest possible amount of memory
needed to hold 2.5MB of bytes.

> The C-program reallocates the buffer only if the string added to
> it exceeds its current size and if it reallocates it grows with a power
> of 2. So if the buffer was 8 bytes big and you added 9 bytes the buffer
> would become 32 bytes big.
  So when the buffer is 2,097,152 (2^21) bytes long, and you add
a string on your way to 2,500,000, the C program will for a moment
have a 2^21 old buffer allocated, plus a 2^22 new buffer, for a
total memory requirement of 6,291,456 bytes as the high point.  So
the C approach must be able to allocate 2.5 *times* as much memory
as the simple Ada declaration.  Why?  If it's because your system
allows a large heap, but only a small (less than 2.5MB) area for
a Msg : String(1 .. 2_500_000); declaration, then how about
  type Msgs is array(1 .. 2_500_000) of aliased Character;
  type Ptr_To_Msgs is access all Character;
  P_Msg : Ptr_To_Msgs := new Msgs;
and then use the buffer at P_Msg instead of Msg.
  Note that the "double the size and reallocate" technique requires
copying the old buffer each time.  It will copy nearly 2^22=4M of
old bytes, in addition to the 2.5MB of new bytes.  Even on a fast
machine, that's a chunk.





* Re: Large strings in ADA
  2000-04-17  0:00       ` Robert Dewar
@ 2000-04-17  0:00         ` David Starner
  0 siblings, 0 replies; 14+ messages in thread
From: David Starner @ 2000-04-17  0:00 UTC


On Mon, 17 Apr 2000 19:30:19 GMT, Robert Dewar <robert_dewar@my-deja.com> wrote:
>In article <38FB4521.D02EA4C9@xpress.se>,
>  Johan Groth <jgroth@xpress.se> wrote:
>> tmoran@bix.com wrote:
>> >
>> > >Perhaps you should try duplicating the -exact- C semantics. Presumably
>> > >you have a -very- large char buffer into which the items are copied.
>> >
>> >   If the original is in C, it must have something like
>> >      char Msg[2500000];  // 2.5 MB
>> > so the straightforward Ada equivalent would be
>> >      Msg : String(1 .. 2_500_000);  -- 2.5 MB
>>
>> The above is the current solution but it doesn't work as it uses too much
>> memory.
>
>
>That's odd, what machine are you on? Almost all operating
>systems use commit-on-use allocation these days.

Doesn't this try to get 2.5 MB from the stack, though? The
C equivalent does, and I assumed that most Ada compilers
(esp. GNAT) would just allocate arrays declared in a function/
procedure like that on the stack.






* Re: Large strings in ADA
  2000-04-16  0:00 Large strings in ADA Johan Groth
  2000-04-16  0:00 ` David Starner
@ 2000-04-17  0:00 ` Florian Weimer
  2000-04-17  0:00 ` Robert Dewar
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Florian Weimer @ 2000-04-17  0:00 UTC


Johan Groth <jgroth@xpress.se> writes:

>    for I in 1 .. 1600 loop
>       Append_To_Msg(Str.Info, Basic_Natural(Str.Len), Msg, Last);
>       if (I mod 100) = 0 then
>          Put(" " & Integer'Image(I));
>       end if;
>    end loop;

For example, the GNAT implementation of Unbounded_String results in
four allocator calls and copy operations per iteration, and this is
expensive.  Perhaps you should switch to a different string-like type
which preallocates storage.





* Re: Large strings in ADA
  2000-04-17  0:00   ` tmoran
@ 2000-04-17  0:00     ` Johan Groth
  2000-04-17  0:00       ` Robert Dewar
                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Johan Groth @ 2000-04-17  0:00 UTC


tmoran@bix.com wrote:
> 
> >Perhaps you should try duplicating the -exact- C semantics. Presumably
> >you have a -very- large char buffer into which the items are copied.
> 
>   If the original is in C, it must have something like
>      char Msg[2500000];  // 2.5 MB
> so the straightforward Ada equivalent would be
>      Msg : String(1 .. 2_500_000);  -- 2.5 MB

The above is the current solution but it doesn't work as it uses too much
memory. The C-program reallocates the buffer only if the string added to
it exceeds its current size and if it reallocates it grows with a power
of 2. So if the buffer was 8 bytes big and you added 9 bytes the buffer
would become 32 bytes big.

So I will try to use the C semantics and let the message-buffer grow the
same way and use replace_slice to add the string to the buffer.

/Johan






* Re: Large strings in ADA
  2000-04-17  0:00     ` Johan Groth
  2000-04-17  0:00       ` Robert Dewar
@ 2000-04-17  0:00       ` Florian Weimer
  2000-04-17  0:00       ` tmoran
  2 siblings, 0 replies; 14+ messages in thread
From: Florian Weimer @ 2000-04-17  0:00 UTC


Johan Groth <jgroth@xpress.se> writes:

> So I will try to use the C semantics and let the message-buffer grow the
> same way 

Yes, that's a good idea.  But I don't understand how this solution is
related to the semantics of the programming language C. ;)

> and use replace_slice to add the string to the buffer.

Replace_Slice is unnecessary in this case, a simple array assignment
will do the job.
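
A sketch of Johan's plan combined with that point: a heap buffer that doubles its capacity when an append does not fit, copies the old text once per growth step, and stores new text with a plain slice assignment instead of Replace_Slice. The package Growable_Buffers, the starting capacity of 8 and the explicit Free are assumptions for illustration; a production version might derive from Ada.Finalization.Limited_Controlled so the storage is released automatically.

```ada
--  Three compilation units; with GNAT, split them into separate files
--  with gnatchop before compiling.

--  Hypothetical package: a heap buffer that grows by doubling.
package Growable_Buffers is
   type Buffer is limited private;
   procedure Append (B : in out Buffer; Item : String);
   function Contents (B : Buffer) return String;   --  text appended so far
   function Capacity (B : Buffer) return Natural;  --  0 before first Append
   procedure Free (B : in out Buffer);             --  B is empty afterwards
private
   type String_Access is access String;
   type Buffer is limited record
      Data : String_Access := null;
      Last : Natural := 0;  --  index of the last character in use
   end record;
end Growable_Buffers;

with Ada.Unchecked_Deallocation;

package body Growable_Buffers is
   procedure Dispose is new Ada.Unchecked_Deallocation (String, String_Access);

   procedure Append (B : in out Buffer; Item : String) is
      Needed  : constant Natural := B.Last + Item'Length;
      New_Cap : Positive := 8;  --  assumed initial size
      Bigger  : String_Access;
   begin
      if B.Data = null or else Needed > B.Data'Length then
         if B.Data /= null then
            New_Cap := B.Data'Length;
         end if;
         --  Double until the text fits, e.g. 8 -> 16 -> 32 for 17 bytes.
         while New_Cap < Needed loop
            New_Cap := New_Cap * 2;
         end loop;
         Bigger := new String (1 .. New_Cap);
         if B.Data /= null then
            --  Each growth step copies the old contents once.
            Bigger (1 .. B.Last) := B.Data (1 .. B.Last);
            Dispose (B.Data);
         end if;
         B.Data := Bigger;
      end if;
      --  A plain slice assignment; Replace_Slice is not needed.
      B.Data (B.Last + 1 .. Needed) := Item;
      B.Last := Needed;
   end Append;

   function Contents (B : Buffer) return String is
   begin
      if B.Data = null then
         return "";
      end if;
      return B.Data (1 .. B.Last);
   end Contents;

   function Capacity (B : Buffer) return Natural is
   begin
      if B.Data = null then
         return 0;
      end if;
      return B.Data'Length;
   end Capacity;

   procedure Free (B : in out Buffer) is
   begin
      Dispose (B.Data);  --  also sets B.Data to null
      B.Last := 0;
   end Free;
end Growable_Buffers;

with Ada.Text_IO;
with Growable_Buffers; use Growable_Buffers;

procedure Growable_Demo is
   B : Buffer;
begin
   for I in 1 .. 1600 loop
      Append (B, "  6 hejsan");
   end loop;
   --  prints " 16000 16384"
   Ada.Text_IO.Put_Line
     (Natural'Image (Contents (B)'Length) & Natural'Image (Capacity (B)));
   Free (B);
end Growable_Demo;
```

With Johan's example, a full 8-byte buffer that receives 9 more bytes needs 17, so it grows 8 to 16 to 32. Each growth step still copies the old contents, which is the extra copying and higher peak memory use discussed earlier in the thread.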





* Re: Large strings in ADA
  2000-04-17  0:00       ` tmoran
@ 2000-04-18  0:00         ` tmoran
  2000-04-22  0:00           ` Johan Groth
  0 siblings, 1 reply; 14+ messages in thread
From: tmoran @ 2000-04-18  0:00 UTC


Oops, I meant:
  type Ptr_To_Msgs is access String;
  P_Msg : Ptr_To_Msgs := new String(1 .. 2_500_000);
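
Putting this corrected declaration together with the earlier index-based Append_To_Msg gives something like the following sketch (the package Heap_Msg and the procedure Heap_Msg_Demo are hypothetical names). The 2.5 MB buffer is created by an allocator, so its size is limited by the heap rather than by the stack, which sidesteps the stack-size question raised earlier in the thread.

```ada
--  Three compilation units; with GNAT, split them into separate files
--  with gnatchop before compiling.

--  Hypothetical package holding the message buffer and its index.
package Heap_Msg is
   --  The corrected declarations: the 2.5 MB buffer is created by an
   --  allocator, so it lives on the heap rather than on the stack.
   type Ptr_To_Msgs is access String;
   P_Msg : constant Ptr_To_Msgs := new String (1 .. 2_500_000);

   Last : Natural := 0;  --  index of the last character written so far

   --  Appends " " & Natural'Image (Max_Len) & " " & Str at P_Msg (Last + 1);
   --  raises Constraint_Error if the text would run past the buffer.
   procedure Append_To_Msg (Str : in String; Max_Len : in Natural);
end Heap_Msg;

package body Heap_Msg is
   procedure Append_To_Msg (Str : in String; Max_Len : in Natural) is
      New_Stuff : constant String :=
        " " & Natural'Image (Max_Len) & " " & Str;
   begin
      --  Index-based copy: only the new characters are copied, and
      --  nothing is reallocated.
      P_Msg (Last + 1 .. Last + New_Stuff'Length) := New_Stuff;
      Last := Last + New_Stuff'Length;
   end Append_To_Msg;
end Heap_Msg;

with Ada.Text_IO;
with Heap_Msg;

procedure Heap_Msg_Demo is
   Info : String (1 .. 32) := (others => ' ');
begin
   Info (1 .. 6) := "hejsan";
   --  36 characters per call, as in the earlier timing run.
   for I in 1 .. 2_500_000 / 36 loop
      Heap_Msg.Append_To_Msg (Info, 6);
   end loop;
   Ada.Text_IO.Put_Line (Natural'Image (Heap_Msg.Last));  --  prints " 2499984"
end Heap_Msg_Demo;
```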





* Re: Large strings in ADA
  2000-04-18  0:00         ` tmoran
@ 2000-04-22  0:00           ` Johan Groth
  0 siblings, 0 replies; 14+ messages in thread
From: Johan Groth @ 2000-04-22  0:00 UTC


tmoran@bix.com wrote:
> 
> Oops, I meant:
>   type Ptr_To_Msgs is access String;
>   P_Msg : Ptr_To_Msgs := new String(1 .. 2_500_000);

That is the solution I use now.

/Johan






Thread overview: 14+ messages
2000-04-16  0:00 Large strings in ADA Johan Groth
2000-04-16  0:00 ` David Starner
2000-04-17  0:00 ` Florian Weimer
2000-04-17  0:00 ` Robert Dewar
2000-04-17  0:00 ` Dale Stanbrough
2000-04-17  0:00   ` tmoran
2000-04-17  0:00     ` Johan Groth
2000-04-17  0:00       ` Robert Dewar
2000-04-17  0:00         ` David Starner
2000-04-17  0:00       ` Florian Weimer
2000-04-17  0:00       ` tmoran
2000-04-18  0:00         ` tmoran
2000-04-22  0:00           ` Johan Groth
2000-04-17  0:00 ` Robert Dewar
