comp.lang.ada
 help / color / mirror / Atom feed
* GNAT's stack checking in Ubuntu 9.04  (and Shootout regex-dna)
@ 2009-08-03 22:54 Georg Bauhaus
  2009-08-03 22:56 ` Georg Bauhaus
  2009-08-04 11:59 ` Brian Drummond
  0 siblings, 2 replies; 24+ messages in thread
From: Georg Bauhaus @ 2009-08-03 22:54 UTC (permalink / raw)


The GNAT that comes with Ubuntu 9.04 (GCC 4.3.3)
produces storage errors where GNAT on Debian Lenny
(GCC 4.3.2) and GNAT 2007 on Windows (4.3.1) don't.
This happens with larger data structures.

The workaround is to introduce an indirection ... (.-)
Or at least I got the regex-dna test at the Shootout fixed
by introducing a pointer type and an allocator, diff
appended below.

If we use GNAT's Spitbol Patterns for matching
(from the Ada 2005 GNAT #4 version)
and then the technique of the Ada 2005 GNAT #3
version for match-replace (Bj�rn Persson's variation),
we will get a really, really fast regex-dna.
#4 is a lot faster than #3 when matching,
#3 is a lot faster when replacing matches.


The diff that get's #4 going again on Ubuntu 9.04,
look for type Dna_Lines_Pointer:


--- regexdna.gnat       6d8829eff6a278659ecb6ebc0f03672662de39b8
+++ regexdna.gnat       09cbee8b34b6bc62a921d8bfc66e2f18cb57eeb6
@@ -120,7 +120,8 @@ begin
       Num_Lines := Num_Lines + 1;
    end if;
    declare
-      Sequence_Lines : Dna_Lines(1..Num_Lines);
+      type Dna_Lines_Pointer is access Dna_Lines;  -- avoids stack issues
+      Sequence_Lines : Dna_Lines_Pointer := new Dna_Lines(1..Num_Lines);
       Low, Sub_Len : Natural;
    begin
       -- Distribute Sequence to Sequence_Lines
@@ -149,7 +150,7 @@ begin
       New_Line;
       Put(Item => Code_Length, Width => 1);
       New_Line;
-      Put(Item => Length(Sequence_Lines), Width => 1);
+      Put(Item => Length(Sequence_Lines.all), Width => 1);
       New_Line;
    end;



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04  (and Shootout regex-dna)
  2009-08-03 22:54 GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna) Georg Bauhaus
@ 2009-08-03 22:56 ` Georg Bauhaus
  2009-08-04  7:50   ` Ludovic Brenta
  2009-08-04 11:59 ` Brian Drummond
  1 sibling, 1 reply; 24+ messages in thread
From: Georg Bauhaus @ 2009-08-03 22:56 UTC (permalink / raw)


Georg Bauhaus wrote:
> The GNAT that comes with Ubuntu 9.04 (GCC 4.3.3)
> produces storage errors where GNAT on Debian Lenny
> (GCC 4.3.2) and GNAT 2007 on Windows (4.3.1) don't.
The latter is 4.1.3, sorry.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-03 22:56 ` Georg Bauhaus
@ 2009-08-04  7:50   ` Ludovic Brenta
  2009-08-04  9:17     ` Georg Bauhaus
  0 siblings, 1 reply; 24+ messages in thread
From: Ludovic Brenta @ 2009-08-04  7:50 UTC (permalink / raw)


Georg Bauhaus wrote on comp.lang.ada:
> Georg Bauhaus wrote:
> > The GNAT that comes with Ubuntu 9.04 (GCC 4.3.3)
> > produces storage errors where GNAT on Debian Lenny
> > (GCC 4.3.2) and GNAT 2007 on Windows (4.3.1) don't.
>
> The latter is 4.1.3, sorry.

From the symptoms you describe and the workaround you found, it seems
to me not related to the compiler but rather to the way the kernel is
configured. It looks like Ubuntu's default stack size is much smaller
than on Debian or Windows. What does "ulimit -s" say on your Ubuntu
and Debian?

--
Ludovic Brenta.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-04  7:50   ` Ludovic Brenta
@ 2009-08-04  9:17     ` Georg Bauhaus
  2009-08-04  9:58       ` Vadim Godunko
  2009-08-04 15:38       ` Robert A Duff
  0 siblings, 2 replies; 24+ messages in thread
From: Georg Bauhaus @ 2009-08-04  9:17 UTC (permalink / raw)


Ludovic Brenta wrote:
> Georg Bauhaus wrote on comp.lang.ada:
>> Georg Bauhaus wrote:
>>> The GNAT that comes with Ubuntu 9.04 (GCC 4.3.3)
>>> produces storage errors where GNAT on Debian Lenny
>>> (GCC 4.3.2) and GNAT 2007 on Windows (4.3.1) don't.
>> The latter is 4.1.3, sorry.
> 
> From the symptoms you describe and the workaround you found, it seems
> to me not related to the compiler but rather to the way the kernel is
> configured. It looks like Ubuntu's default stack size is much smaller
> than on Debian or Windows. What does "ulimit -s" say on your Ubuntu
> and Debian?

8192 is the default output of ulimit -s on Ubuntu.

I had set ulimit -s $((8192 * 64)) on both Ubuntu 9.04
and Debian 5 and both exhibit the same symptoms.  Why?

More guesswork: Is an empty Unbounded_String a heavy
construct?  Observing the memory consumption of
regexdna, for reading and matching the ~50 Mo input
it does not consume more than about ~60 Mo, according
to free(1) and top.  Then, the stack allocation of the lines
array seems to hit some memory boundary. (625_000 default
initialized Vstrings, which is a subtype of Unbounded_String.)
valgrind (in default mode) did not find anything IIRC.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-04  9:17     ` Georg Bauhaus
@ 2009-08-04  9:58       ` Vadim Godunko
  2009-08-04 10:44         ` Georg Bauhaus
  2009-08-04 22:20         ` Egil
  2009-08-04 15:38       ` Robert A Duff
  1 sibling, 2 replies; 24+ messages in thread
From: Vadim Godunko @ 2009-08-04  9:58 UTC (permalink / raw)


On Aug 4, 1:17 pm, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:
>
> 8192 is the default output of ulimit -s on Ubuntu.
>
> I had set ulimit -s $((8192 * 64)) on both Ubuntu 9.04
> and Debian 5 and both exhibit the same symptoms.  Why?
>
This size is too small, you can setup unlimited size by

ulimit -s unlimited

> More guesswork: Is an empty Unbounded_String a heavy
> construct?  Observing the memory consumption of
> regexdna, for reading and matching the ~50 Mo input
> it does not consume more than about ~60 Mo, according
> to free(1) and top.  Then, the stack allocation of the lines
> array seems to hit some memory boundary. (625_000 default
> initialized Vstrings, which is a subtype of Unbounded_String.)
> valgrind (in default mode) did not find anything IIRC.

Unbounded_String uses constant size of stack for object itself and
uses dynamic size on heap for actual value. But note, concatenation
operations of strings and arrays can use stack for intermediate
result, thus can be the source of problem.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-04  9:58       ` Vadim Godunko
@ 2009-08-04 10:44         ` Georg Bauhaus
  2009-08-04 12:30           ` Vadim Godunko
  2009-08-04 22:20         ` Egil
  1 sibling, 1 reply; 24+ messages in thread
From: Georg Bauhaus @ 2009-08-04 10:44 UTC (permalink / raw)


Vadim Godunko schrieb:
> On Aug 4, 1:17 pm, Georg Bauhaus <rm.tsoh.plus-
> bug.bauh...@maps.futureapps.de> wrote:
>> 8192 is the default output of ulimit -s on Ubuntu.
>>
>> I had set ulimit -s $((8192 * 64)) on both Ubuntu 9.04
>> and Debian 5 and both exhibit the same symptoms.  Why?
>>
> This size is too small,

What's the reason?

> you can setup unlimited size by
> 
> ulimit -s unlimited

The machine happens to have 500 Mo of RAM; I had
thought that would be sufficient for 2-3 times
50 Mo of input when processing is assumed to be
mostly linear (at worst doubling each string).
The program does fine in ulimit -s 8192 when the
Dna_Lines array is allocated on the heap, so the
overall amount of memory is not the problem.

It hits the limit exaclty when the array of 625_000
default initialized Vstrings is allocated on the stack.
No operations performed at that point.


> Unbounded_String uses constant size of stack for object itself and
> uses dynamic size on heap for actual value.

Yes. How much does a default initialized Unbounded_String
use when it is an array component? (I'll debug this when
there is more time; just in case you happen to know,
would be most helpful.)


> But note, concatenation
> operations of strings and arrays can use stack for intermediate
> result, thus can be the source of problem.

This was one of the fixes for the K-Nucleotide test.
(Do the lower -> upper character mapping in A.S.Unbounded,
then call To_String; not: call To_String and feed that
to To_Upper.)



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04  (and Shootout regex-dna)
  2009-08-03 22:54 GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna) Georg Bauhaus
  2009-08-03 22:56 ` Georg Bauhaus
@ 2009-08-04 11:59 ` Brian Drummond
  2009-08-04 14:18   ` Georg Bauhaus
  2009-08-09 19:13   ` Georg Bauhaus
  1 sibling, 2 replies; 24+ messages in thread
From: Brian Drummond @ 2009-08-04 11:59 UTC (permalink / raw)


On Tue, 04 Aug 2009 00:54:11 +0200, Georg Bauhaus
<rm.tsoh.plus-bug.bauhaus@maps.futureapps.de> wrote:

>The GNAT that comes with Ubuntu 9.04 (GCC 4.3.3)
>produces storage errors where GNAT on Debian Lenny
>(GCC 4.3.2) and GNAT 2007 on Windows (4.3.1) don't.
>This happens with larger data structures.
>
>The workaround is to introduce an indirection ... (.-)

Default stack sizes may change between GCC versions and I've had trouble getting
the stack size flags to work on some versions. 

I've done this too. What I don't like is having to change every use of the
indirect variable, so I confine the changes to the declaration...

>--- regexdna.gnat       6d8829eff6a278659ecb6ebc0f03672662de39b8
>+++ regexdna.gnat       09cbee8b34b6bc62a921d8bfc66e2f18cb57eeb6
>@@ -120,7 +120,8 @@ begin
>       Num_Lines := Num_Lines + 1;
>    end if;
>    declare
>-      Sequence_Lines : Dna_Lines(1..Num_Lines);
>+      type Dna_Lines_Pointer is access Dna_Lines;  -- avoids stack issues
>+     -- Sequence_Lines : Dna_Lines_Pointer := new Dna_Lines(1..Num_Lines);
       Sequence_Lines_ptr : Dna_Lines_Pointer := new Dna_Lines(1..Num_Lines);
       Sequence_Lines : Dna_Lines(1..Num_Lines) renames Sequence_Lines_ptr.all;

and now no changes are necessary to the program (which may access the variable
many times)

>      Put(Item => Length(Sequence_Lines), Width => 1);

- Brian




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-04 10:44         ` Georg Bauhaus
@ 2009-08-04 12:30           ` Vadim Godunko
  2009-08-04 14:15             ` Georg Bauhaus
  0 siblings, 1 reply; 24+ messages in thread
From: Vadim Godunko @ 2009-08-04 12:30 UTC (permalink / raw)


On Aug 4, 2:44 pm, Georg Bauhaus <rm.dash-bauh...@futureapps.de>
wrote:
> It hits the limit exaclty when the array of 625_000
> default initialized Vstrings is allocated on the stack.
> No operations performed at that point.
>
> > Unbounded_String uses constant size of stack for object itself and
> > uses dynamic size on heap for actual value.
>
> Yes. How much does a default initialized Unbounded_String
> use when it is an array component? (I'll debug this when
> there is more time; just in case you happen to know,
> would be most helpful.)
>
Its very easy to discover:

with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
package Tst is
   type T is array (Positive range <>) of Unbounded_String;
   X : T (1 .. 625_000);
end Tst;

> gcc -c -gnatR3 tst.ads

Representation information for unit Tst (spec)
----------------------------------------------

for T'Alignment use 16;
for T'Component_Size use 512;

for X'Size use 320000000;
for X'Alignment use 16;

So, on my platform (OpenSUSE Linux/x86_64) array of Unbounded_String
with 625_000 components occupy approximately 40Mb.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-04 12:30           ` Vadim Godunko
@ 2009-08-04 14:15             ` Georg Bauhaus
  0 siblings, 0 replies; 24+ messages in thread
From: Georg Bauhaus @ 2009-08-04 14:15 UTC (permalink / raw)


Vadim Godunko schrieb:
>> How much does a default initialized Unbounded_String
>> use when it is an array component?

> Its very easy to discover:
> 
> with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
> package Tst is
>    type T is array (Positive range <>) of Unbounded_String;
>    X : T (1 .. 625_000);
> end Tst;
> 
>> gcc -c -gnatR3 tst.ads

Darn, yes, -gnatR!  Had forgotten, thanks.

> Representation information for unit Tst (spec)
> ----------------------------------------------



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04  (and Shootout regex-dna)
  2009-08-04 11:59 ` Brian Drummond
@ 2009-08-04 14:18   ` Georg Bauhaus
  2009-08-09 19:13   ` Georg Bauhaus
  1 sibling, 0 replies; 24+ messages in thread
From: Georg Bauhaus @ 2009-08-04 14:18 UTC (permalink / raw)


Brian Drummond schrieb:

>>    declare
>> -      Sequence_Lines : Dna_Lines(1..Num_Lines);
>> +      type Dna_Lines_Pointer is access Dna_Lines;  -- avoids stack issues
>> +     -- Sequence_Lines : Dna_Lines_Pointer := new Dna_Lines(1..Num_Lines);
>        Sequence_Lines_ptr : Dna_Lines_Pointer := new Dna_Lines(1..Num_Lines);
>        Sequence_Lines : Dna_Lines(1..Num_Lines) renames Sequence_Lines_ptr.all;
> 
> and now no changes are necessary to the program (which may access the variable
> many times)
> 
>>      Put(Item => Length(Sequence_Lines), Width => 1);

Cool... I had already felt comfortable with .all being optional
when there are parens or tick marks, but this one's nice. Thanks.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-04  9:17     ` Georg Bauhaus
  2009-08-04  9:58       ` Vadim Godunko
@ 2009-08-04 15:38       ` Robert A Duff
  1 sibling, 0 replies; 24+ messages in thread
From: Robert A Duff @ 2009-08-04 15:38 UTC (permalink / raw)


Georg Bauhaus <rm.tsoh.plus-bug.bauhaus@maps.futureapps.de> writes:

> More guesswork: Is an empty Unbounded_String a heavy
> construct?

If you're trying to make a program fast, you almost certainly don't want
to be using unbounded strings, or any other controlled types.  At least
not in any time-critical parts.

Controlled types use a lot of memory, and are quite slow.

AdaCore is working on improving this situation, but controlled types
will always be somewhat heavy.

- Bob



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-04  9:58       ` Vadim Godunko
  2009-08-04 10:44         ` Georg Bauhaus
@ 2009-08-04 22:20         ` Egil
  1 sibling, 0 replies; 24+ messages in thread
From: Egil @ 2009-08-04 22:20 UTC (permalink / raw)


On Aug 4, 11:58 am, Vadim Godunko <vgodu...@gmail.com> wrote:
> On Aug 4, 1:17 pm, Georg Bauhaus <rm.tsoh.plus-bug.bauh...@maps.futureapps.de> wrote:
>
>
> This size is too small, you can setup unlimited size by
>
> ulimit -s unlimited
>

This only sets the stacksize for the environment task.
For default stacksize of other tasks in your program,
use these binder options:  (-bargs in gnatmake)

  -dnn[k|m] Default primary stack size = nn [kilo|mega] bytes
  -Dnn[k|m] Default secondary stack size = nnn [kilo|mega] bytes


(for recent versions of gnat)


--
~egilhh



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04  (and Shootout regex-dna)
  2009-08-04 11:59 ` Brian Drummond
  2009-08-04 14:18   ` Georg Bauhaus
@ 2009-08-09 19:13   ` Georg Bauhaus
  2009-08-10 13:10     ` jonathan
  2009-08-10 20:12     ` jonathan
  1 sibling, 2 replies; 24+ messages in thread
From: Georg Bauhaus @ 2009-08-09 19:13 UTC (permalink / raw)


Brian Drummond wrote:

>> The GNAT that comes with Ubuntu 9.04 (GCC 4.3.3)
>> produces storage errors where GNAT on Debian Lenny
>> (GCC 4.3.2) and GNAT 2007 on Windows (4.3.1) don't.
>> This happens with larger data structures.
>>
>> The workaround is to introduce an indirection ... (.-)
> 
> Default stack sizes may change between GCC versions and I've had trouble getting
> the stack size flags to work on some versions. 


Two new versions of regex-dna are available. One of them
performs both pattern matching and match-replace concurrently.

Both are fixed WRT stack limits, using the heap now.

I would be grateful if somebody who has access to a quad
core or single core machine could run these.  If you would
like to change the number of tasks look for the constant
Max_CPUs which is set to 4.

Multitasking version:
http://home.arcor.de/bauhaus/Ada/regexdna-multi.ada

Just the fixed original:
http://home.arcor.de/bauhaus/Ada/regexdna.ada



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-09 19:13   ` Georg Bauhaus
@ 2009-08-10 13:10     ` jonathan
  2009-08-10 20:12     ` jonathan
  1 sibling, 0 replies; 24+ messages in thread
From: jonathan @ 2009-08-10 13:10 UTC (permalink / raw)


On Aug 9, 8:13 pm, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:
> Brian Drummond wrote:
> >> The GNAT that comes with Ubuntu 9.04 (GCC 4.3.3)
> >> produces storage errors where GNAT on Debian Lenny
> >> (GCC 4.3.2) and GNAT 2007 on Windows (4.3.1) don't.
> >> This happens with larger data structures.
>
> >> The workaround is to introduce an indirection ... (.-)
>
> > Default stack sizes may change between GCC versions and I've had trouble getting
> > the stack size flags to work on some versions.
>
> Two new versions of regex-dna are available. One of them
> performs both pattern matching and match-replace concurrently.
>
> Both are fixed WRT stack limits, using the heap now.
>
> I would be grateful if somebody who has access to a quad
> core or single core machine could run these.  If you would
> like to change the number of tasks look for the constant
> Max_CPUs which is set to 4.
>
> Multitasking version:http://home.arcor.de/bauhaus/Ada/regexdna-multi.ada
>
> Just the fixed original:http://home.arcor.de/bauhaus/Ada/regexdna.ada


The new regexdna.adb worked perfectly on 1 core and
4 core tests. (I'm using Debian Lenny.)

regexdna.adb gave the right answers using both the
new GNAT GPL (uses gcc 4.3.4) and the Debian Lenny GNAT.

Actually, the old single-core regexdna.adb, (downloaded
from the benchmark site, gnat # 3), also worked fine on
both compilers (again on Debian Lenny). This older
regexdna.adb uses GNAT.regpat. It ran 20% slower than the
new GNAT.Spitbol version.

Jonathan




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-09 19:13   ` Georg Bauhaus
  2009-08-10 13:10     ` jonathan
@ 2009-08-10 20:12     ` jonathan
  2009-08-10 20:29       ` Ludovic Brenta
                         ` (2 more replies)
  1 sibling, 3 replies; 24+ messages in thread
From: jonathan @ 2009-08-10 20:12 UTC (permalink / raw)


On Aug 9, 8:13 pm, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:


I was going to give up on regexdna.adb, but I had to
try one last thing.  (That's the trouble with these
danged benchmarks).  I wanted to recompile the
GNAT.spitbol packages .. the quickest thing I could
think of was copy the source into my working directory
and rename them. GNAT.spitbol to GNAT_spitbol, and
GNAT.spitbol.patterns to GNAT_spitbol.patterns.
The advantage of this is you can modify them.  The
first thing I tried worked rather well: I replaced

   Stack_Size : constant Positive := 2000;

with

   Stack_Size : constant Positive := 129;

in GNAT_spitbol.patterns.

The program running time (single-core) dropped from
47 seconds to 32 seconds.  (You can set Stack_Size to
17, with all validity checks enabled, and still get
the right answer.) (Used the latest GNAT GPL, which
seems to have fewer optimizations disabled than
earlier GNAT's. Also, I ran it on a 3.2 GHz 4-core
INTEL .. faster than the one used at that
benchmarking site.)

I used gnatmake with

-O3 -gnatnp -funroll-all-loops -march=native regexdna.adb

Unfortunately, you need to do more work to
get earlier GNAT's to do this ... missing
"System.String_Hash" or something.

Last remark: someone with experience using the
spitbol routines might be able to optimize
further.  There are tricks using "anchors"
mentioned in spec of GNAT.spitbol.patterns.


Jonathan





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-10 20:12     ` jonathan
@ 2009-08-10 20:29       ` Ludovic Brenta
  2009-08-10 23:34       ` Georg Bauhaus
  2009-08-11  0:27       ` Georg Bauhaus
  2 siblings, 0 replies; 24+ messages in thread
From: Ludovic Brenta @ 2009-08-10 20:29 UTC (permalink / raw)


jonathan wrote on comp.lang.ada:
[Using one's own copy of GNAT.Spitbol.*]
> The first thing I tried worked rather well: I replaced
>
>    Stack_Size : constant Positive := 2000;
>
> with
>
>    Stack_Size : constant Positive := 129;
>
> in GNAT_spitbol.patterns.
>
> The program running time (single-core) dropped from
> 47 seconds to 32 seconds.
[...]
> Last remark: someone with experience using the
> spitbol routines might be able to optimize
> further.  There are tricks using "anchors"
> mentioned in spec of GNAT.spitbol.patterns.

Oh, I think you just struck the right chord here, Jonathan. Georg has
been harping abouf GNAT.Spitbol for a long time now :) Thanks for your
interesting feedback; I have to use strict discipline not to get
involved in those addictive benchmarks myself :)

--
Ludovic Brenta.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-10 20:12     ` jonathan
  2009-08-10 20:29       ` Ludovic Brenta
@ 2009-08-10 23:34       ` Georg Bauhaus
  2009-08-11 20:02         ` jonathan
  2009-08-11  0:27       ` Georg Bauhaus
  2 siblings, 1 reply; 24+ messages in thread
From: Georg Bauhaus @ 2009-08-10 23:34 UTC (permalink / raw)


jonathan wrote:

> I was going to give up on regexdna.adb, but I had to
> try one last thing. [...]
>    Stack_Size : constant Positive := 2000;
> 
> with
> 
>    Stack_Size : constant Positive := 129;
> 
> in GNAT_spitbol.patterns.

> The program running time (single-core) dropped [...]

Interesting.

Does regexdna-multi.ada work on your system?

On a dual core system it seems to be shortening
the running time by almost a third.

(The slowdown in general is in the match-replace part;
the (first) matching part seems to be performing well.)

> Last remark: someone with experience using the
> spitbol routines might be able to optimize
> further.  There are tricks using "anchors"
> mentioned in spec of GNAT.spitbol.patterns.

Yes, I went up and down all kinds of anchoring and
saving cursor positions and using a function pointer
whose function would replace behind the scene and then
return False (like in the Match case) thus advancing
without requiring the loop in the match-replace part
(relying on the backtracking mechanism which would not
trigger backtracking, since the function returned False).
I also tried a Match and Replace pair.
But the results always seemed to indicate this
issue with VString and replacements:

(1) the match-replace is dominated by VString
which is Unbounded_String which is slow replacing
string slices no matter how the replacement is done.
(See also Bob Duff's remarks on Unbounded_String
in this thread.)
Constructing a new VString, for example, using Append
did not help (I was hoping that no shifting of characters
would be necessary).

(2) the lines in the match-replace section are reasonably
short,  Any("A") is a correspondingly short linear search.
The captured match is exactly the one character that
is replaced.
While the search is restarted from the string's
beginning in every loop, anchoring didn't seem to
make things faster.  For example,
 Pos(Saved_Cursor)  -- saved after last match
  & Break("A")
  & Len(1)
  & Setcur(Saved_Cursor'Access)
will in general capture a span of characters not
to be replaced.  This then requires either copying
the saved parts or saving the cursor position and
calling the Replace procedure with a Match object.
Works, but has been invariably slower in both cases.

But there is no end to ideas using SPITBOL patterns,
so yes, maybe someone has another idea? :-)

SNOBOL-4:

There is a great implementation of SPITBOL by
Phil Budne at http://www.snobol4.org/

I learned about SNOBOL-4 some years ago through
the GNAT packages, and immediately wanted to learn it,
which I did.  A pleasant experience---if you can avoid
debugging patterns :-)  Tables, Lists, user defined
types, type conversions, CODE, computed goto,
multiple function entry points,  and all this accessible
everywhere including in patterns! :)
I liked the SNOBOL-4 introduction by Susan Hockey,
Gimpel's book is a SNOBOL-4 treasure trove.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-10 20:12     ` jonathan
  2009-08-10 20:29       ` Ludovic Brenta
  2009-08-10 23:34       ` Georg Bauhaus
@ 2009-08-11  0:27       ` Georg Bauhaus
  2009-08-11 19:05         ` jonathan
  2 siblings, 1 reply; 24+ messages in thread
From: Georg Bauhaus @ 2009-08-11  0:27 UTC (permalink / raw)


jonathan wrote:
> The first thing I tried worked rather well: I replaced
> 
>    Stack_Size : constant Positive := 2000;
> 
> with
> 
>    Stack_Size : constant Positive := 129;
> 
> in GNAT_spitbol.patterns.

Works here, too, thanks.  Worth a few seconds off 40.
While changing the library is not part of the game,
this is useful in general, I'd think, since the
comment next to Stack_Size invites us to adjust it to
fit a program's needs.

Another way of using modified copies of GNAT library
units seems to be to use -a ; GNAT then considers
the library unit in my source directory.  (There is
probably some risk when the options change from those
used for compiling the library, though.)



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-11  0:27       ` Georg Bauhaus
@ 2009-08-11 19:05         ` jonathan
  2009-08-12  9:32           ` Ludovic Brenta
  0 siblings, 1 reply; 24+ messages in thread
From: jonathan @ 2009-08-11 19:05 UTC (permalink / raw)


On Aug 11, 1:27 am, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:
>
> Another way of using modified copies of GNAT library
> units seems to be to use -a ; GNAT then considers
> the library unit in my source directory.  (There is
> probably some risk when the options change from those
> used for compiling the library, though.)


Thanks, -a is right way to go here.  Final report  (for now):


The following remarks apply to single-core runs using
the GNAT 4.3.2 that comes with Debian Lenny (based
on gcc 4.3.2). I've been saying that GNAT 4.3.2 produces
less well optimized executables, but I was not quite
right in this case as I explain below.

Step 1.  Compile without the -a switch:

  -O3 -gnatnp -funroll-all-loops -march=native regexdna.adb

  Running time of ./regexdna: 63 seconds.

  Comment: compilation near instanteous; I suppose it is
  linking to precompiled  Gnat.Spitbol.Pattern  packages.

Step 2.  Compile with the -a switch:

  -O3 -a -gnatnp -funroll-all-loops -march=native regexdna.adb

  Running time of ./regexdna: 51 seconds.

  Comment: Compilation is slow. Looks like it is recompiling
  the  Gnat.Spitbol.Pattern  packages.

  So it looks like the performance problem I was complaining
  about was due to the suboptimal precompiled Spitbol.Pattern.
  In fact it was almost *entirely* due to the sub-optimal
  Spitbol.Pattern, because running time is now close
  to the running time I get from the GNAT GPL compilation.
  (The GNAT GPL I downloaded was for 64-bit machines. I
  would speculate that the precompiled Spitbol.Pattern
  used by the Debian Lenny GNAT was for more general
  architectures.)

Step 3. Copy source code of the spec of   Gnat.Spitbol.Pattern
        into working directory. (I found the source directory by
        using the -gnatu switch during compilation.) You don't
        need to change its name (g-spipat.ads).
        Then edit the  g-spipat.ads  in working directory and
        change Stack_Size from 2000 to (say) 37.

  cp /usr/lib/gcc/x86_64-linux-gnu/4.3.2/adainclude/g-spipat.ads g-
spipat.ads

  Compile with the -a switch:

  -O3 -a -gnatnp -funroll-all-loops -march=native regexdna.adb

  Running time of ./regexdna: 32 seconds.

  Comment:
  Now the compiler uses the g-spipat.ads (Gnat.Spitbol.Pattern)
  in the working directory, ignores the one in adainclude, and
  recompiles the Spitbol packages.  The 32 seconds is almost the
  same as the running time of the GNAT GPL compilation. (Actually,
  the GNAT GPL compilation runs in 30.5 sec if Stack_Size is 37
  instead of 129. So it is a bit better, but not nearly as much
  as I thought.)

>
>While changing the library is not part of the game,
>this is useful in general, I'd think, since the
>comment next to Stack_Size invites us to adjust it to
>fit a program's needs.
>

Well, I do advocate using a modified g-spipat.ads.
Just append the modified g-spipat.ads to regexdna.adb,
give the usual gnatchop instruction, and the -a switch
for gnatmake.

It seems to me that this is completely kosher.

Looking (quickly) through the other submissions, I see
several common practices: writing your own library modules,
linking to language standard library modules, and linking
to various to 3rd party library modules that have
nothing to do with a particular language standard.

I notice for example that fortran and pascal submissions
will simply append their own source code versions of
hashing modules. The language doesn't provide them.

Another practice is to link with the boost library
for regex or for threads.  Some link to OpenMP for
multiprocessing IIUC.  The Ada pidigits.adb even
links to GMP using interfaces.C.

Jonathan




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-10 23:34       ` Georg Bauhaus
@ 2009-08-11 20:02         ` jonathan
  2009-08-11 21:19           ` jonathan
  2009-08-11 21:38           ` Georg Bauhaus
  0 siblings, 2 replies; 24+ messages in thread
From: jonathan @ 2009-08-11 20:02 UTC (permalink / raw)


On Aug 11, 12:34 am, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:

> Does regexdna-multi.ada work on your system?
>
> On a dual core system it seems to be shortening
> the running time by almost a third.
>
Yes, regexdna-multi.ada so far works
on the compiler and machine combinations I've tried.

With

  Stack_Size : constant Positive := 37;

I get 31 second running time on 1-core, 15 second
running time on 4-cores.

The tasking-free version uses 30.5 seconds on 1-core.
This is a 3.2 GHz 4-core machine.

Actually, as I was writing that I remembered its really
an 8-core machine.  SO just for fun I set

  Max_CPUs : constant := 7;

and it ran in 12 seconds.  In fact it ran in 12 seconds
for 5, 6, 7 and 8 CPU's.  Then I couldn't resist.
I set it to 15:

  Max_CPUs : constant := 15;

It ran  in 9 seconds.  I couldn't believe it. I
timed it with my watch also.  The best time was
8.7 seconds. In fact all settings of Max_CPUs
from 9 to 15 ran in  9 seconds. At Max_CPUs = 24
it was 9.7 sec.

Jonathan




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-11 20:02         ` jonathan
@ 2009-08-11 21:19           ` jonathan
  2009-08-11 21:38           ` Georg Bauhaus
  1 sibling, 0 replies; 24+ messages in thread
From: jonathan @ 2009-08-11 21:19 UTC (permalink / raw)


On Aug 11, 9:02 pm, jonathan <johns...@googlemail.com> wrote:
> On Aug 11, 12:34 am, Georg Bauhaus <rm.tsoh.plus-
>
> bug.bauh...@maps.futureapps.de> wrote:
> > Does regexdna-multi.ada work on your system?
>
> > On a dual core system it seems to be shortening
> > the running time by almost a third.

Final report (this time I mean it):

I tested regexdna-multi.ada on an older dual-core machine
I have access to.  It is almost exactly half as fast (per core)
as the quad-core I used previously.  Main question: is
3 tasks better than 2 at exploiting parallelism. On the
8-core machine, 9 tasks was optimal.

On the dual-core the answer is yes, but the benefit is
much smaller here:

1 task running time: 59 sec
2 task running time: 37.5 sec
3 task running time: 35 sec

where number of tasks is set by Max_CPUs.

Jonathan




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-11 20:02         ` jonathan
  2009-08-11 21:19           ` jonathan
@ 2009-08-11 21:38           ` Georg Bauhaus
  1 sibling, 0 replies; 24+ messages in thread
From: Georg Bauhaus @ 2009-08-11 21:38 UTC (permalink / raw)


jonathan wrote:
>  In fact all settings of Max_CPUs
> from 9 to 15 ran in  9 seconds. At Max_CPUs = 24
> it was 9.7 sec.

Thanks, Jonathan, for these comments.
Your findings make me think that 9 is "magical"
because there are 9 patterns:  The algorithm is
such that it would try to arrange for 9 rendezvous
if there are 9 or more task objects available.
If this turns out to be true, I can simplify the
algorithm, getting rid of a clumsy loop that
distributes work to Max_CPUs task objects.





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-11 19:05         ` jonathan
@ 2009-08-12  9:32           ` Ludovic Brenta
  2009-08-12 16:37             ` jonathan
  0 siblings, 1 reply; 24+ messages in thread
From: Ludovic Brenta @ 2009-08-12  9:32 UTC (permalink / raw)


jonathan wrote on comp.lang.ada:
> Step 1.  Compile without the -a switch:
>
>   -O3 -gnatnp -funroll-all-loops -march=native regexdna.adb
>
>   Running time of ./regexdna: 63 seconds.
>
>   Comment: compilation near instanteous; I suppose it is
>   linking to precompiled  Gnat.Spitbol.Pattern  packages.
>
> Step 2.  Compile with the -a switch:
>
>   -O3 -a -gnatnp -funroll-all-loops -march=native regexdna.adb
>
>   Running time of ./regexdna: 51 seconds.
>
>   Comment: Compilation is slow. Looks like it is recompiling
>   the  Gnat.Spitbol.Pattern  packages.
>
>   So it looks like the performance problem I was complaining
>   about was due to the suboptimal precompiled Spitbol.Pattern.
>   In fact it was almost *entirely* due to the sub-optimal
>   Spitbol.Pattern, because running time is now close
>   to the running time I get from the GNAT GPL compilation.
>   (The GNAT GPL I downloaded was for 64-bit machines. I
>   would speculate that the precompiled Spitbol.Pattern
>   used by the Debian Lenny GNAT was for more general
>   architectures.)

If you use the amd64 port of Debian Lenny, your compiler is optimized
for 64-bit. However, I use the default options for the library. For
example, the i386 (actually i486) build daemon compiled g-spipat.adb
thus:

/build/buildd-gnat-4.3_4.3.2-1.1-i386-3dUv7K/gnat-4.3-4.3.2/build/./
gcc/xgcc -B/build/buildd-gnat-4.3_4.3.2-1.1-i386-3dUv7K/gnat-4.3-4.3.2/
build/./gcc/ -B/usr/i486-linux-gnu/bin/ -B/usr/i486-linux-gnu/lib/ -
isystem /usr/i486-linux-gnu/include -isystem /usr/i486-linux-gnu/sys-
include -c -g -O2      -W -Wall -gnatpg  g-spipat.adb -o g-spipat.o

If anyone is interested, I can try to compile with -O3 -funroll-all-
loops -gnatpg instead.

--
Ludovic Brenta (maintainer of gnat in Debian).



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna)
  2009-08-12  9:32           ` Ludovic Brenta
@ 2009-08-12 16:37             ` jonathan
  0 siblings, 0 replies; 24+ messages in thread
From: jonathan @ 2009-08-12 16:37 UTC (permalink / raw)


On Aug 12, 10:32 am, Ludovic Brenta <ludo...@ludovic-brenta.org>
wrote:

> If you use the amd64 port of Debian Lenny, your compiler is optimized
> for 64-bit. However, I use the default options for the library. For
> example, the i386 (actually i486) build daemon compiled g-spipat.adb
> thus:
>
> /build/buildd-gnat-4.3_4.3.2-1.1-i386-3dUv7K/gnat-4.3-4.3.2/build/./
> gcc/xgcc -B/build/buildd-gnat-4.3_4.3.2-1.1-i386-3dUv7K/gnat-4.3-4.3.2/
> build/./gcc/ -B/usr/i486-linux-gnu/bin/ -B/usr/i486-linux-gnu/lib/ -
> isystem /usr/i486-linux-gnu/include -isystem /usr/i486-linux-gnu/sys-
> include -c -g -O2      -W -Wall -gnatpg  g-spipat.adb -o g-spipat.o
>
> If anyone is interested, I can try to compile with -O3 -funroll-all-
> loops -gnatpg instead.
>

Using the default options for g-spipat.adb sounds reasonable to me.
The user can recompile with -a and with make-benchmark-happy flags,
if he wants to experiment. I doubt that the benchmarking happy-flags
are the safest way to go. (And -funroll-all-loops hardly matters:
seems like 3% improvement. (I finally got around to checking this;)
The
gnatn helped 8%, which was more than I expected.)

Jonathan






^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2009-08-12 16:37 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-03 22:54 GNAT's stack checking in Ubuntu 9.04 (and Shootout regex-dna) Georg Bauhaus
2009-08-03 22:56 ` Georg Bauhaus
2009-08-04  7:50   ` Ludovic Brenta
2009-08-04  9:17     ` Georg Bauhaus
2009-08-04  9:58       ` Vadim Godunko
2009-08-04 10:44         ` Georg Bauhaus
2009-08-04 12:30           ` Vadim Godunko
2009-08-04 14:15             ` Georg Bauhaus
2009-08-04 22:20         ` Egil
2009-08-04 15:38       ` Robert A Duff
2009-08-04 11:59 ` Brian Drummond
2009-08-04 14:18   ` Georg Bauhaus
2009-08-09 19:13   ` Georg Bauhaus
2009-08-10 13:10     ` jonathan
2009-08-10 20:12     ` jonathan
2009-08-10 20:29       ` Ludovic Brenta
2009-08-10 23:34       ` Georg Bauhaus
2009-08-11 20:02         ` jonathan
2009-08-11 21:19           ` jonathan
2009-08-11 21:38           ` Georg Bauhaus
2009-08-11  0:27       ` Georg Bauhaus
2009-08-11 19:05         ` jonathan
2009-08-12  9:32           ` Ludovic Brenta
2009-08-12 16:37             ` jonathan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox