comp.lang.ada
 help / color / mirror / Atom feed
* Ada Shootout program for K-Nucleotide (patches)
@ 2009-08-01 12:21 Georg Bauhaus
  2009-08-01 12:59 ` Ludovic Brenta
                   ` (8 more replies)
  0 siblings, 9 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-01 12:21 UTC (permalink / raw)


This is about the K-Nucleotide program at the
Computer Language Benchmark Game,
http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&lang=gnat&id=1
where some Ada programs have started to fail.

In order to have one of them work again, I have patched
knucleotide.gnat, with some success.  New version is here:
http://home.arcor.de/bauhaus/Ada/knucleotide.gnat

Comments?  Does it work on your machine?

The two changes:

1 - [stack exhaustion] a loop reading input lines into an
 ubounded  string replaces the recursive String "&"ing
 procedure. (Noting that the input text file is ~240MB ...)

2 - [heavy bounded strings] a lightweight bounded string ADT
 replaces an instance of Generic_Bounded_Length, yielding
 an improved running time of ~22s down from ~30s for
 the 2,500,000 case.

Still, on a relatively small virtual machine running 64bit
Debian, the program cannot handle the 25,000,000 case.





^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 12:21 Ada Shootout program for K-Nucleotide (patches) Georg Bauhaus
@ 2009-08-01 12:59 ` Ludovic Brenta
  2009-08-01 13:59   ` Georg Bauhaus
  2009-08-01 13:50 ` Gautier write-only
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 116+ messages in thread
From: Ludovic Brenta @ 2009-08-01 12:59 UTC (permalink / raw)


Georg Bauhaus wrote on comp.lang.ada:
> This is about the K-Nucleotide program at the
> Computer Language Benchmark Game,http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
> where some Ada programs have started to fail.
>
> In order to have one of them work again, I have patched
> knucleotide.gnat, with some success.  New version is
> here:http://home.arcor.de/bauhaus/Ada/knucleotide.gnat

Is there a version control system repository for these submissions?

--
Ludovic Brenta.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 12:21 Ada Shootout program for K-Nucleotide (patches) Georg Bauhaus
  2009-08-01 12:59 ` Ludovic Brenta
@ 2009-08-01 13:50 ` Gautier write-only
  2009-08-01 15:21 ` Ludovic Brenta
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 116+ messages in thread
From: Gautier write-only @ 2009-08-01 13:50 UTC (permalink / raw)


Cool - just did some changes last two weeks (pidigits, fasta,
mandelbrot).
Will test it soon.
BTW, I have made a small generic procedure to test massively two
versions of a procedure, to see which one is faster, hopefully net of
whatever cache effects. I'll post it here - maybe someone is
interested ?
_________________________________________________________
Gautier's Ada programming -- http://sf.net/users/gdemont/
NB: For a direct answer, e-mail address on the Web site!



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 12:59 ` Ludovic Brenta
@ 2009-08-01 13:59   ` Georg Bauhaus
  0 siblings, 0 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-01 13:59 UTC (permalink / raw)


Ludovic Brenta wrote:
> Georg Bauhaus wrote on comp.lang.ada:
>> This is about the K-Nucleotide program at the
>> Computer Language Benchmark Game,http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
>> where some Ada programs have started to fail.
>>
>> In order to have one of them work again, I have patched
>> knucleotide.gnat, with some success.  New version is
>> here:http://home.arcor.de/bauhaus/Ada/knucleotide.gnat
> 
> Is there a version control system repository for these submissions?

There is a public CVS for checkout and browsing.

Submissions by normal people need to be made using
a reasonably bureaucratic Web form if things have not
changed.  A list of FAQ is
http://shootout.alioth.debian.org/u32/faq.php



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 12:21 Ada Shootout program for K-Nucleotide (patches) Georg Bauhaus
  2009-08-01 12:59 ` Ludovic Brenta
  2009-08-01 13:50 ` Gautier write-only
@ 2009-08-01 15:21 ` Ludovic Brenta
  2009-08-01 15:37   ` Gautier write-only
                     ` (2 more replies)
  2009-08-01 17:07 ` jonathan
                   ` (5 subsequent siblings)
  8 siblings, 3 replies; 116+ messages in thread
From: Ludovic Brenta @ 2009-08-01 15:21 UTC (permalink / raw)


Georg Bauhaus wrote on comp.lang.ada:
> This is about the K-Nucleotide program at the
> Computer Language Benchmark Game,http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
> where some Ada programs have started to fail.
>
> In order to have one of them work again, I have patched
> knucleotide.gnat, with some success.  New version is here:http://home.arcor.de/bauhaus/Ada/knucleotide.gnat
>
> Comments?  Does it work on your machine?
>
> The two changes:
>
> 1 - [stack exhaustion] a loop reading input lines into an
>  ubounded  string replaces the recursive String "&"ing
>  procedure. (Noting that the input text file is ~240MB ...)
>
> 2 - [heavy bounded strings] a lightweight bounded string ADT
>  replaces an instance of Generic_Bounded_Length, yielding
>  an improved running time of ~22s down from ~30s for
>  the 2,500,000 case.
>
> Still, on a relatively small virtual machine running 64bit
> Debian, the program cannot handle the 25,000,000 case.

I note that both Martin and yourself use Ada.Text_IO (especially
Read_Line) which is notoriously slow.  I would use Character'Read and
a finite state machine to detect the portion of the input file to be
processed, then to fill in the buffer.  That should be faster.

(the shootout rules state that the program must read "line by line"
but that does not make sense since Unix has no concept of a line and
all programs must therefore read and parse each byte to detect ends of
lines).

--
Ludovic Brenta (too busy to help any further, alas).



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 15:21 ` Ludovic Brenta
@ 2009-08-01 15:37   ` Gautier write-only
  2009-08-02 12:55   ` Georg Bauhaus
  2009-08-03  8:56   ` Jacob Sparre Andersen
  2 siblings, 0 replies; 116+ messages in thread
From: Gautier write-only @ 2009-08-01 15:37 UTC (permalink / raw)


On 1 Aug., 17:21, Ludovic Brenta <ludo...@ludovic-brenta.org>:

> I note that both Martin and yourself use Ada.Text_IO (especially
> Read_Line) which is notoriously slow.  I would use Character'Read and
> a finite state machine to detect the portion of the input file to be
> processed, then to fill in the buffer.  That should be faster.
>
> (the shootout rules state that the program must read "line by line"
> but that does not make sense since Unix has no concept of a line and
> all programs must therefore read and parse each byte to detect ends of
> lines).

Some Ada shootout programs also use GNAT.IO, it might help.
_________________________________________________________
Gautier's Ada programming -- http://sf.net/users/gdemont/
NB: For a direct answer, e-mail address on the Web site!



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 12:21 Ada Shootout program for K-Nucleotide (patches) Georg Bauhaus
                   ` (2 preceding siblings ...)
  2009-08-01 15:21 ` Ludovic Brenta
@ 2009-08-01 17:07 ` jonathan
  2009-08-01 17:59 ` jonathan
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 116+ messages in thread
From: jonathan @ 2009-08-01 17:07 UTC (permalink / raw)


On Aug 1, 1:21 pm, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:
> This is about the K-Nucleotide program at the
> Computer Language Benchmark Game,http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
> where some Ada programs have started to fail.
>
> In order to have one of them work again, I have patched
> knucleotide.gnat, with some success.  New version is here:http://home.arcor.de/bauhaus/Ada/knucleotide.gnat
>
> Comments?  Does it work on your machine?
>
> The two changes:
>
> 1 - [stack exhaustion] a loop reading input lines into an
>  ubounded  string replaces the recursive String "&"ing
>  procedure. (Noting that the input text file is ~240MB ...)
>
> 2 - [heavy bounded strings] a lightweight bounded string ADT
>  replaces an instance of Generic_Bounded_Length, yielding
>  an improved running time of ~22s down from ~30s for
>  the 2,500,000 case.
>
> Still, on a relatively small virtual machine running 64bit
> Debian, the program cannot handle the 25,000,000 case.

The 250Meg version runs fine on my machine, assuming
I did this right...can't make head of tails of the
rules.  I used ./fasta to make the
data files:

  ./fasta 2_500_000 > fasta25.dat     #made a 25Meg file
  ./fasta 25_000_000 > fasta250.dat   #made a 250Meg file

   compiler:          GNAT 4.3.2 (comes with Lenny, thanks to Ludovic
& co)
   Operating system:  Debian Linux, Lenny.
   Processor:         Intel x86-64 (Xeon X5460 @ 3.16GHz).

Running the bench:

  time ./knucleotide < fasta25.dat   # 7.5 sec

  time ./knucleotide < fasta250.dat  # 76.1 sec

Oh, and the new version runs twice as fast as the old - the
old took 14 seconds on fasta25.dat.

Jonathan





^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 12:21 Ada Shootout program for K-Nucleotide (patches) Georg Bauhaus
                   ` (3 preceding siblings ...)
  2009-08-01 17:07 ` jonathan
@ 2009-08-01 17:59 ` jonathan
  2009-08-02 11:39   ` Georg Bauhaus
  2009-08-04  8:23 ` Martin
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 116+ messages in thread
From: jonathan @ 2009-08-01 17:59 UTC (permalink / raw)




I forgot to add I used:

  gnatmake -gnatnp -O3 -funroll-all-loops -march=native knucleotide

with a running time of (usually)  7.25 sec on the 25Meg data file:

    time ./knucleotide < fasta25.dat

j.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 17:59 ` jonathan
@ 2009-08-02 11:39   ` Georg Bauhaus
  2009-08-02 16:00     ` jonathan
  2009-09-03 13:44     ` Olivier Scalbert
  0 siblings, 2 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-02 11:39 UTC (permalink / raw)


jonathan wrote:
> 
> I forgot to add I used:
> 
>   gnatmake -gnatnp -O3 -funroll-all-loops -march=native knucleotide
> 
> with a running time of (usually)  7.25 sec on the 25Meg data file:
> 
>     time ./knucleotide < fasta25.dat

Thanks for running the program, glad to hear how they work.

Meanwhile I have a tasking version which runs a little (25%)
faster.  The two new changes are

1 - make 3 additional tasks call Write; round robin

2 - order the results using a protected Printer object
 with entry families.  Heck, these entry families have clever
 features! :-)


http://home.arcor.de/bauhaus/Ada/knucleotide.tasking.gnat



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 15:21 ` Ludovic Brenta
  2009-08-01 15:37   ` Gautier write-only
@ 2009-08-02 12:55   ` Georg Bauhaus
  2009-08-27 11:17     ` Ole-Hjalmar Kristensen
  2009-08-03  8:56   ` Jacob Sparre Andersen
  2 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-02 12:55 UTC (permalink / raw)


Ludovic Brenta wrote:
> Georg Bauhaus wrote on comp.lang.ada:
>> This is about the K-Nucleotide program at the
>> Computer Language Benchmark Game,http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
>> where some Ada programs have started to fail.
>>
>> In order to have one of them work again, I have patched
>> knucleotide.gnat, with some success.  New version is here:http://home.arcor.de/bauhaus/Ada/knucleotide.gnat
>>
>> Comments?  Does it work on your machine?
>>
>> The two changes:
>>
>> 1 - [stack exhaustion] a loop reading input lines into an
>>  ubounded  string replaces the recursive String "&"ing
>>  procedure. (Noting that the input text file is ~240MB ...)
>>
>> 2 - [heavy bounded strings] a lightweight bounded string ADT
>>  replaces an instance of Generic_Bounded_Length, yielding
>>  an improved running time of ~22s down from ~30s for
>>  the 2,500,000 case.
>>
>> Still, on a relatively small virtual machine running 64bit
>> Debian, the program cannot handle the 25,000,000 case.
> 
> I note that both Martin and yourself use Ada.Text_IO (especially
> Read_Line) which is notoriously slow.  I would use Character'Read and
> a finite state machine to detect the portion of the input file to be
> processed, then to fill in the buffer.  That should be faster.

Not sure that Character'Read will really be faster.
The following Getline has left me unconvinced.
I made a simple test just reading big files and
measuring the time; Text_IO.Get_Line wins on Windows
and even more or Debian.

Also, unless there is bug in the program below,
using Streams and then child packages of Text_IO
for output formatting, seems to confuse the
I/O system.  (Would we be shooting ourselves in
the foot anyway?)

GNAT.IO is, I think, getting us in trouble; it doesn't
signal End_Error, if I'm not mistaken.


with Ada.Text_IO.Text_Streams;

procedure Getline (Item: in out String; Last : out Natural) is
   --
   --  Input via Character'Read.
   --  Assume Unix or Windows line ending conventions.
   --
   Stdin: constant Ada.Text_IO.Text_Streams.Stream_Access :=
     Ada.Text_IO.Text_Streams.Stream (Ada.Text_IO.Standard_Input);
   C : Character;
begin
   Last := Item'First - 1;

   loop
      Character'Read (Stdin, C);
      exit when C = ASCII.LF;
      if C /= ASCII.CR then
         Last := Last + 1;
         Item (Last) := C;
      end if;
      exit when Last = Item'Last;
   end loop;

end Getline;



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-02 11:39   ` Georg Bauhaus
@ 2009-08-02 16:00     ` jonathan
  2009-08-02 16:42       ` jonathan
  2009-09-03 13:44     ` Olivier Scalbert
  1 sibling, 1 reply; 116+ messages in thread
From: jonathan @ 2009-08-02 16:00 UTC (permalink / raw)


On Aug 2, 12:39 pm, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:
> jonathan wrote:
>
> > I forgot to add I used:
>
> >   gnatmake -gnatnp -O3 -funroll-all-loops -march=native knucleotide
>
> > with a running time of (usually)  7.25 sec on the 25Meg data file:
>
> >     time ./knucleotide < fasta25.dat
>
> Thanks for running the program, glad to hear how they work.
>
> Meanwhile I have a tasking version which runs a little (25%)
> faster.  The two new changes are
>
> 1 - make 3 additional tasks call Write; round robin
>
> 2 - order the results using a protected Printer object
>  with entry families.  Heck, these entry families have clever
>  features! :-)
>
> http://home.arcor.de/bauhaus/Ada/knucleotide.tasking.gnat

The new version worked like a charm.  I checked the web page, and
the output is correct.  Your latest version gave an additional
70% speedup on my machine, so:

     time ./knucleotide < fasta25.dat    # 25  Meg data file

ran in 4.2 seconds (fastest result), and

     time ./knucleotide < fasta250.dat   # 250 Meg data file

ran in  43 seconds.  I used

   gnatmake -gnatnp -O3 -march=native knucleotide

The -march=native was unnecesssary, as was the -funroll-all-loops.
My machine is a quadcore like the test machine, but more
recent and faster in GHz.  Still, the results look very respectable.

Jonathan





^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-02 16:00     ` jonathan
@ 2009-08-02 16:42       ` jonathan
  0 siblings, 0 replies; 116+ messages in thread
From: jonathan @ 2009-08-02 16:42 UTC (permalink / raw)


On Aug 2, 5:00 pm, jonathan <johns...@googlemail.com> wrote:
> The new version worked like a charm.  I checked the web page, and
> the output is correct.  Your latest version gave an additional
> 70% speedup on my machine, so:
>
>      time ./knucleotide < fasta25.dat    # 25  Meg data file
>
> ran in 4.2 seconds (fastest result), and
>
>      time ./knucleotide < fasta250.dat   # 250 Meg data file
>
> ran in  43 seconds.  I used
>
>    gnatmake -gnatnp -O3 -march=native knucleotide
>

And I neglected to say that the timings above are wallclock times.
The several tasks ran in parallel on the 4 cores - the actual CPU
time accumulated over the several cores was than twice the 43 seconds.
If this is not allowed by the rules (single core supposedly) then
old version is faster, because it has no tasking overhead.  On the
other hand, some of the other programs seem to be threaded.

j.




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 15:21 ` Ludovic Brenta
  2009-08-01 15:37   ` Gautier write-only
  2009-08-02 12:55   ` Georg Bauhaus
@ 2009-08-03  8:56   ` Jacob Sparre Andersen
  2009-08-03 11:43     ` Georg Bauhaus
  2 siblings, 1 reply; 116+ messages in thread
From: Jacob Sparre Andersen @ 2009-08-03  8:56 UTC (permalink / raw)


Ludovic Brenta wrote:

> I note that both Martin and yourself use Ada.Text_IO (especially
> Read_Line) which is notoriously slow.  I would use Character'Read
> and a finite state machine to detect the portion of the input file
> to be processed, then to fill in the buffer.  That should be faster.

Wouldn't it be even faster to map the data file to a string (using
POSIX.Memory_Mapping.Map_Memory), and then "read" from that string?

I have a ready-to-use package and demonstration procedure for this
purpose on <http://edb.jacob-sparre.dk/Ada/memory_maps/>, if somebody
is interested.

Greetings,

Jacob
-- 
"It is a syntax error to write FORTRAN while not wearing a blue tie."



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-03  8:56   ` Jacob Sparre Andersen
@ 2009-08-03 11:43     ` Georg Bauhaus
  2009-08-03 12:36       ` Jacob Sparre Andersen
  0 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-03 11:43 UTC (permalink / raw)


Jacob Sparre Andersen schrieb:
> Ludovic Brenta wrote:
> 
>> I note that both Martin and yourself use Ada.Text_IO (especially
>> Read_Line) which is notoriously slow.  I would use Character'Read
>> and a finite state machine to detect the portion of the input file
>> to be processed, then to fill in the buffer.  That should be faster.

Some tests require that all lines are read, and kept.

> Wouldn't it be even faster to map the data file to a string (using
> POSIX.Memory_Mapping.Map_Memory), and then "read" from that string?

According to the FAQ, "programs are expected to either take a single
command-line parameter or read text from stdin."
"Each program was run as a child-process of a Python script using Popen"
There also is a "preference for plain vanilla programs".
(E.g., the pidigits description permits using the GMP library,
as an exception, I think.)

It seems, then, that Ada has not the best built in support
for programs that should quickly read and write lines of chars
from and into pipes, respectively.


> I have a ready-to-use package and demonstration procedure for this
> purpose on <http://edb.jacob-sparre.dk/Ada/memory_maps/>, if somebody
> is interested.
> 
> Greetings,
> 
> Jacob



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-03 11:43     ` Georg Bauhaus
@ 2009-08-03 12:36       ` Jacob Sparre Andersen
  2009-08-03 18:31         ` Isaac Gouy
  0 siblings, 1 reply; 116+ messages in thread
From: Jacob Sparre Andersen @ 2009-08-03 12:36 UTC (permalink / raw)


Georg Bauhaus wrote:
> Jacob Sparre Andersen schrieb:

> > Wouldn't it be even faster to map the data file to a string (using
> > POSIX.Memory_Mapping.Map_Memory), and then "read" from that
> > string?
> 
> According to the FAQ, "programs are expected to either take a single
> command-line parameter or read text from stdin."

If we interpret this as the choice of the implementer, then there is
no problem.  If the programs have to be able to read from stdin, then
memory mapped files aren't of much use.

> "Each program was run as a child-process of a Python script using
> Popen"

That's not a problem.

> There also is a "preference for plain vanilla programs".

I would definitely consider POSIX plain vanilla on a Linux system.

> It seems, then, that Ada has not the best built in support for
> programs that should quickly read and write lines of chars from and
> into pipes, respectively.

Agreed.

Greetings,

Jacob
-- 
"We will be restoring normality as soon as we are sure what is normal anyway."



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-03 12:36       ` Jacob Sparre Andersen
@ 2009-08-03 18:31         ` Isaac Gouy
  2009-08-03 20:29           ` Robert A Duff
  2009-08-03 20:47           ` Ludovic Brenta
  0 siblings, 2 replies; 116+ messages in thread
From: Isaac Gouy @ 2009-08-03 18:31 UTC (permalink / raw)


On Aug 3, 5:36 am, Jacob Sparre Andersen <spa...@nbi.dk> wrote:
> Georg Bauhaus wrote:
> > Jacob Sparre Andersen schrieb:
> > > Wouldn't it be even faster to map the data file to a string (using
> > > POSIX.Memory_Mapping.Map_Memory), and then "read" from that
> > > string?
>
> > According to the FAQ, "programs are expected to either take a single
> > command-line parameter or read text from stdin."
>
> If we interpret this as the choice of the implementer, then there is
> no problem.  If the programs have to be able to read from stdin, then
> memory mapped files aren't of much use.


"read line-by-line a redirected FASTA format file from stdin"

http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&lang=all#about




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-03 18:31         ` Isaac Gouy
@ 2009-08-03 20:29           ` Robert A Duff
  2009-08-04 15:43             ` Isaac Gouy
  2009-08-03 20:47           ` Ludovic Brenta
  1 sibling, 1 reply; 116+ messages in thread
From: Robert A Duff @ 2009-08-03 20:29 UTC (permalink / raw)


Isaac Gouy <igouy2@yahoo.com> writes:

> On Aug 3, 5:36�am, Jacob Sparre Andersen <spa...@nbi.dk> wrote:
>> If we interpret this as the choice of the implementer, then there is
>> no problem. �If the programs have to be able to read from stdin, then
>> memory mapped files aren't of much use.
>
> "read line-by-line a redirected FASTA format file from stdin"
>
> http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&lang=all#about

What exactly does that mean?

Would it be "cheating" to read all of stdin until reaching end-of-file,
and then break the result up into lines (by whatever means)?

What if it said "read line-by-line from a file named on the command
line"?  Would it then be "cheating" to mmap the file, and then
break it up into lines?

- Bob



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-03 18:31         ` Isaac Gouy
  2009-08-03 20:29           ` Robert A Duff
@ 2009-08-03 20:47           ` Ludovic Brenta
  1 sibling, 0 replies; 116+ messages in thread
From: Ludovic Brenta @ 2009-08-03 20:47 UTC (permalink / raw)


Isaac Gouy wrote on comp.lang.ada:
> Jacob Sparre Andersen wrote:
>
> > Georg Bauhaus wrote:
> > > Jacob Sparre Andersen schrieb:
> > > > Wouldn't it be even faster to map the data file to a string (using
> > > > POSIX.Memory_Mapping.Map_Memory), and then "read" from that
> > > > string?
>
> > > According to the FAQ, "programs are expected to either take a single
> > > command-line parameter or read text from stdin."
>
> > If we interpret this as the choice of the implementer, then there is
> > no problem.  If the programs have to be able to read from stdin, then
> > memory mapped files aren't of much use.
>
> "read line-by-line a redirected FASTA format file from stdin"
>
> http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...

It is impossible to read line-by-line on Unix because Unix has no
concept of a line.  Instead, every program must read fixed-size chunks
from the file (allowing for the possibility that a read returns fewer
than the requested number of bytes) and interpret the bytes read to
detect "end of lines" in ASCII text.  The smallest chunk that can be
read is one byte.  Whether this is done inside the language's
predefined library, in a third-party library or directly in a program
is irrelevant; the mechanism is always the same.

On the other hand, standard input is a stream of potentialy infinite
length and it is impossible to map a stream in memory.  That is why I
suggested using Ada's stream facility instead of Ada.Text_IO.

--
Ludovic Brenta.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 12:21 Ada Shootout program for K-Nucleotide (patches) Georg Bauhaus
                   ` (4 preceding siblings ...)
  2009-08-01 17:59 ` jonathan
@ 2009-08-04  8:23 ` Martin
  2009-08-16 15:20 ` jonathan
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 116+ messages in thread
From: Martin @ 2009-08-04  8:23 UTC (permalink / raw)


On Aug 1, 1:21 pm, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:
> This is about the K-Nucleotide program at the
> Computer Language Benchmark Game,http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
> where some Ada programs have started to fail.
>
> In order to have one of them work again, I have patched
> knucleotide.gnat, with some success.  New version is here:http://home.arcor.de/bauhaus/Ada/knucleotide.gnat
>
> Comments?  Does it work on your machine?
>
> The two changes:
>
> 1 - [stack exhaustion] a loop reading input lines into an
>  ubounded  string replaces the recursive String "&"ing
>  procedure. (Noting that the input text file is ~240MB ...)
>
> 2 - [heavy bounded strings] a lightweight bounded string ADT
>  replaces an instance of Generic_Bounded_Length, yielding
>  an improved running time of ~22s down from ~30s for
>  the 2,500,000 case.
>
> Still, on a relatively small virtual machine running 64bit
> Debian, the program cannot handle the 25,000,000 case.

Is the line...:

      return Left.Last = Right.Last
	and then Left.Data(1 .. Left.Last) = Right.Data(1 .. Right.Last);

...doing the 1st check twice? Once explicitly and then won't it be
checked again in 'Standard."=" (L, R : String) return Boolean'?

It might be faster to do the check if it is more likely to be False
than True (could be cheaper than a call to Standard."=") but it might
be worth testing out...

Cheers
-- Martin



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-03 20:29           ` Robert A Duff
@ 2009-08-04 15:43             ` Isaac Gouy
  2009-08-04 16:06               ` Isaac Gouy
  0 siblings, 1 reply; 116+ messages in thread
From: Isaac Gouy @ 2009-08-04 15:43 UTC (permalink / raw)


On Aug 3, 1:29 pm, Robert A Duff <bobd...@shell01.TheWorld.com> wrote:
> Isaac Gouy <igo...@yahoo.com> writes:
> > On Aug 3, 5:36 am, Jacob Sparre Andersen <spa...@nbi.dk> wrote:
> >> If we interpret this as the choice of the implementer, then there is
> >> no problem.  If the programs have to be able to read from stdin, then
> >> memory mapped files aren't of much use.
>
> > "read line-by-line a redirected FASTA format file from stdin"
>
> >http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
>
> What exactly does that mean?
>
> Would it be "cheating" to read all of stdin until reaching end-of-file,
> and then break the result up into lines (by whatever means)?

iirc the default buffer size for Java reader is 8192 bytes - use that.


> What if it said "read line-by-line from a file named on the command
> line"?  Would it then be "cheating" to mmap the file, and then
> break it up into lines?

It doesn't say that.

Contribute a program that expects to find a file name on the command
line and the program will fail when it doesn't find a file name on the
command line.





^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-04 15:43             ` Isaac Gouy
@ 2009-08-04 16:06               ` Isaac Gouy
  2009-08-04 17:08                 ` Georg Bauhaus
  0 siblings, 1 reply; 116+ messages in thread
From: Isaac Gouy @ 2009-08-04 16:06 UTC (permalink / raw)


On Aug 4, 8:43 am, Isaac Gouy <igo...@yahoo.com> wrote:
> On Aug 3, 1:29 pm, Robert A Duff <bobd...@shell01.TheWorld.com> wrote:
>
> > Isaac Gouy <igo...@yahoo.com> writes:
> > > On Aug 3, 5:36 am, Jacob Sparre Andersen <spa...@nbi.dk> wrote:
> > >> If we interpret this as the choice of the implementer, then there is
> > >> no problem.  If the programs have to be able to read from stdin, then
> > >> memory mapped files aren't of much use.
>
> > > "read line-by-line a redirected FASTA format file from stdin"
>
> > >http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
>
> > What exactly does that mean?
>
> > Would it be "cheating" to read all of stdin until reaching end-of-file,
> > and then break the result up into lines (by whatever means)?
>
> iirc the default buffer size for Java reader is 8192 bytes - use that.


Better yet, the C++ programn seems to do just fine with this:


char buffer[128];




> > What if it said "read line-by-line from a file named on the command
> > line"?  Would it then be "cheating" to mmap the file, and then
> > break it up into lines?
>
> It doesn't say that.
>
> Contribute a program that expects to find a file name on the command
> line and the program will fail when it doesn't find a file name on the
> command line.




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-04 16:06               ` Isaac Gouy
@ 2009-08-04 17:08                 ` Georg Bauhaus
  2009-08-05 15:58                   ` Isaac Gouy
  0 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-04 17:08 UTC (permalink / raw)


Isaac Gouy schrieb:
> On Aug 4, 8:43 am, Isaac Gouy <igo...@yahoo.com> wrote:
>> On Aug 3, 1:29 pm, Robert A Duff <bobd...@shell01.TheWorld.com> wrote:
>>
>>> Isaac Gouy <igo...@yahoo.com> writes:
>>>> On Aug 3, 5:36 am, Jacob Sparre Andersen <spa...@nbi.dk> wrote:
>>>>> If we interpret this as the choice of the implementer, then there is
>>>>> no problem.  If the programs have to be able to read from stdin, then
>>>>> memory mapped files aren't of much use.
>>>> "read line-by-line a redirected FASTA format file from stdin"
>>>> http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
>>> What exactly does that mean?
>>> Would it be "cheating" to read all of stdin until reaching end-of-file,
>>> and then break the result up into lines (by whatever means)?
>> iirc the default buffer size for Java reader is 8192 bytes - use that.
> 
> 
> Better yet, the C++ programn seems to do just fine with this:
> 
> 
> char buffer[128];


But the more pressing question is, to me at least---others
will have better ideas---what functionality may we use for
reading, say, 128 char values?  Can we import OS functions?
Or the POSIX library? I have already install libflorist in
a new testing VM, in order to compare Text_IO agains POSIX
input.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-04 17:08                 ` Georg Bauhaus
@ 2009-08-05 15:58                   ` Isaac Gouy
  2009-08-05 21:18                     ` Georg Bauhaus
  0 siblings, 1 reply; 116+ messages in thread
From: Isaac Gouy @ 2009-08-05 15:58 UTC (permalink / raw)


On Aug 4, 10:08 am, Georg Bauhaus <rm.dash-bauh...@futureapps.de>
wrote:
> Isaac Gouy schrieb:
>
>
>
> > On Aug 4, 8:43 am, Isaac Gouy <igo...@yahoo.com> wrote:
> >> On Aug 3, 1:29 pm, Robert A Duff <bobd...@shell01.TheWorld.com> wrote:
>
> >>> Isaac Gouy <igo...@yahoo.com> writes:
> >>>> On Aug 3, 5:36 am, Jacob Sparre Andersen <spa...@nbi.dk> wrote:
> >>>>> If we interpret this as the choice of the implementer, then there is
> >>>>> no problem.  If the programs have to be able to read from stdin, then
> >>>>> memory mapped files aren't of much use.
> >>>> "read line-by-line a redirected FASTA format file from stdin"
> >>>>http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
> >>> What exactly does that mean?
> >>> Would it be "cheating" to read all of stdin until reaching end-of-file,
> >>> and then break the result up into lines (by whatever means)?
> >> iirc the default buffer size for Java reader is 8192 bytes - use that.
>
> > Better yet, the C++ programn seems to do just fine with this:
>
> > char buffer[128];
>
> But the more pressing question is, to me at least---others
> will have better ideas---what functionality may we use for
> reading, say, 128 char values?  Can we import OS functions?
> Or the POSIX library? I have already install libflorist in
> a new testing VM, in order to compare Text_IO agains POSIX
> input.


And was that an improvement?



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-05 15:58                   ` Isaac Gouy
@ 2009-08-05 21:18                     ` Georg Bauhaus
  2009-08-05 22:04                       ` Ludovic Brenta
  2009-08-06  0:07                       ` Isaac Gouy
  0 siblings, 2 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-05 21:18 UTC (permalink / raw)


Isaac Gouy wrote:
> On Aug 4, 10:08 am, Georg Bauhaus <rm.dash-bauh...@futureapps.de>
> wrote:
>> Isaac Gouy schrieb:
>>
>>
>>
>>> On Aug 4, 8:43 am, Isaac Gouy <igo...@yahoo.com> wrote:
>>>> On Aug 3, 1:29 pm, Robert A Duff <bobd...@shell01.TheWorld.com> wrote:
>>>>> Isaac Gouy <igo...@yahoo.com> writes:
>>>>>> On Aug 3, 5:36 am, Jacob Sparre Andersen <spa...@nbi.dk> wrote:
>>>>>>> If we interpret this as the choice of the implementer, then there is
>>>>>>> no problem.  If the programs have to be able to read from stdin, then
>>>>>>> memory mapped files aren't of much use.
>>>>>> "read line-by-line a redirected FASTA format file from stdin"
>>>>>> http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
>>>>> What exactly does that mean?
>>>>> Would it be "cheating" to read all of stdin until reaching end-of-file,
>>>>> and then break the result up into lines (by whatever means)?
>>>> iirc the default buffer size for Java reader is 8192 bytes - use that.
>>> Better yet, the C++ programn seems to do just fine with this:
>>> char buffer[128];
>> But the more pressing question is, to me at least---others
>> will have better ideas---what functionality may we use for
>> reading, say, 128 char values?  Can we import OS functions?
>> Or the POSIX library? I have already install libflorist in
>> a new testing VM, in order to compare Text_IO agains POSIX
>> input.
> 
> 
> And was that an improvement?

Yes.

Using GNAT's binding to C streams improves running time
of fasta 25000000 from ~14s to ~10s on Ubuntu 9.04 64bit
running in VMware. ($ time ./fasta 25000000 > /dev/null)
The result is even more noticeable when either version's
output is redirected into a fresh (new) file on disk.
(~42s versus ~20s).
Using Florist should end up making similar calls,
so I expect similar improvements (TBD, soon).

The change is small (to fasta #1 as is):

1 - Add a small Print procedure that takes an Ada string,
copies it into a C string with \n and \0, and sends that
to fputs(3).

2 - Print then replaces every occurence of Text_IO.Put_Line

(I.e., even when building new strings from Ada strings for
the C functions the speedup seems quite noticeable.  There
are no occurrences of unchecked conversions or other high
speed techniques in the change (yet?).)

Might a change like this be acceptable?  (Obviously, fgets
could similarly be used when reading input in other programs.)

*** fasta.ada	old
--- fasta.ada	new
***************
*** 23 ****
! with Ada.Text_IO; use Ada.Text_IO;
--- 23,25 ----
! with Interfaces.C_Streams;  use Interfaces.C_Streams;
! with Interfaces.C;  use Interfaces.C;
! with Ada.IO_Exceptions;
***************
*** 74 ****
--- 77,84 ----
+    procedure Print (Item : String) is
+       Data : char_array renames To_C (Item & ASCII.LF);
+    begin
+       if fputs (Data'Address, stdout) < 0 then
+          raise Ada.IO_Exceptions.Device_Error;
+       end if;
+    end Print;
+
***************
*** 82 ****
!       Put_Line (">" & Id & ' ' & Desc);
--- 92 ----
!       Print (">" & Id & ' ' & Desc);
***************
*** 91 ****
!          Put_Line (Pick (1 .. M));
--- 101 ----
!          Print (Pick (1 .. M));
***************
*** 101 ****
!       Put_Line (">" & Id & ' ' & Desc);
--- 111 ----
!       Print (">" & Id & ' ' & Desc);
***************
*** 104 ****
!          Put_Line (S_App (K .. K + Integer'Min(Todo, Line_Length) - 1));
--- 114 ----
!          Print (S_App (K .. K + Integer'Min(Todo, Line_Length) - 1));



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-05 21:18                     ` Georg Bauhaus
@ 2009-08-05 22:04                       ` Ludovic Brenta
  2009-08-05 22:31                         ` Georg Bauhaus
  2009-08-06  0:07                       ` Isaac Gouy
  1 sibling, 1 reply; 116+ messages in thread
From: Ludovic Brenta @ 2009-08-05 22:04 UTC (permalink / raw)


Georg Bauhaus wrote on comp.lang.ada:
> +    procedure Print (Item : String) is
> +       Data : char_array renames To_C (Item & ASCII.LF);
> +    begin
> +       if fputs (Data'Address, stdout) < 0 then
> +          raise Ada.IO_Exceptions.Device_Error;
> +       end if;
> +    end Print;

I'm curious to know how the following performs:

Stdout : constant Ada.Text_IO.Text_Streams.Stream_Access :=
  Ada.Text_IO.Text_Streams.Stream (Ada.Text_IO.Standard_Output);

procedure Print (Item : String) is
begin
   String'Write (Stdout, Item);
   Character'Write (Stdout, ASCII.LF);
end Print;

--
Ludovic Brenta.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-05 22:04                       ` Ludovic Brenta
@ 2009-08-05 22:31                         ` Georg Bauhaus
  2009-08-05 23:44                           ` Jeffrey R. Carter
  2009-08-06  7:51                           ` Dmitry A. Kazakov
  0 siblings, 2 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-05 22:31 UTC (permalink / raw)


Ludovic Brenta wrote:
> Georg Bauhaus wrote on comp.lang.ada:
>> +    procedure Print (Item : String) is
>> +       Data : char_array renames To_C (Item & ASCII.LF);
>> +    begin
>> +       if fputs (Data'Address, stdout) < 0 then
>> +          raise Ada.IO_Exceptions.Device_Error;
>> +       end if;
>> +    end Print;
> 
> I'm curious to know how the following performs:
>
> Stdout : constant Ada.Text_IO.Text_Streams.Stream_Access :=
>   Ada.Text_IO.Text_Streams.Stream (Ada.Text_IO.Standard_Output);
> 
> procedure Print (Item : String) is
> begin
>    String'Write (Stdout, Item);
>    Character'Write (Stdout, ASCII.LF);
> end Print;


Terribly slow.  Requesting only a tenth of the required amount
(i.e. N = 2500000) has taken ~30s. By extrapolation the full
set is produced only after ~5m. That's almost 20x slower
than the original Ada.Text_IO.

(The RM explains that String'Write calls procedure
Character'Write for each component of the String...)



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-05 22:31                         ` Georg Bauhaus
@ 2009-08-05 23:44                           ` Jeffrey R. Carter
  2009-08-06  8:38                             ` Georg Bauhaus
  2009-08-06  7:51                           ` Dmitry A. Kazakov
  1 sibling, 1 reply; 116+ messages in thread
From: Jeffrey R. Carter @ 2009-08-05 23:44 UTC (permalink / raw)


Georg Bauhaus wrote:
> 
> Terribly slow.  Requesting only a tenth of the required amount
> (i.e. N = 2500000) has taken ~30s. By extrapolation the full
> set is produced only after ~5m. That's almost 20x slower
> than the original Ada.Text_IO.
> 
> (The RM explains that String'Write calls procedure
> Character'Write for each component of the String...)

What about if you unchecked convert the String into an appropriately sized 
Stream_Element_Array and write it?

-- 
Jeff Carter
"My legs are gray, my ears are gnarled, my eyes are old and bent."
Monty Python's Life of Brian
81



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-05 21:18                     ` Georg Bauhaus
  2009-08-05 22:04                       ` Ludovic Brenta
@ 2009-08-06  0:07                       ` Isaac Gouy
  2009-08-06  8:36                         ` Georg Bauhaus
  1 sibling, 1 reply; 116+ messages in thread
From: Isaac Gouy @ 2009-08-06  0:07 UTC (permalink / raw)


On Aug 5, 2:18 pm, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:
-snip-
> > And was that an improvement?
>
> Yes.
-snip-

> Using Florist should end up making similar calls,
> so I expect similar improvements (TBD, soon).
-snip-

Programs using Florist would be interesting enough to make it as
alternative implementations - and probably I'd just show them
alongside everything else.

The jokes are predictable - How do you speed up Ada? Call a C library.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-05 22:31                         ` Georg Bauhaus
  2009-08-05 23:44                           ` Jeffrey R. Carter
@ 2009-08-06  7:51                           ` Dmitry A. Kazakov
  1 sibling, 0 replies; 116+ messages in thread
From: Dmitry A. Kazakov @ 2009-08-06  7:51 UTC (permalink / raw)


On Thu, 06 Aug 2009 00:31:00 +0200, Georg Bauhaus wrote:

> Ludovic Brenta wrote:
>> Georg Bauhaus wrote on comp.lang.ada:
>>> +    procedure Print (Item : String) is
>>> +       Data : char_array renames To_C (Item & ASCII.LF);
>>> +    begin
>>> +       if fputs (Data'Address, stdout) < 0 then
>>> +          raise Ada.IO_Exceptions.Device_Error;
>>> +       end if;
>>> +    end Print;
>> 
>> I'm curious to know how the following performs:
>>
>> Stdout : constant Ada.Text_IO.Text_Streams.Stream_Access :=
>>   Ada.Text_IO.Text_Streams.Stream (Ada.Text_IO.Standard_Output);
>> 
>> procedure Print (Item : String) is
>> begin
>>    String'Write (Stdout, Item);
>>    Character'Write (Stdout, ASCII.LF);
>> end Print;
> 
> Terribly slow.  Requesting only a tenth of the required amount
> (i.e. N = 2500000) has taken ~30s. By extrapolation the full
> set is produced only after ~5m. That's almost 20x slower
> than the original Ada.Text_IO.
> 
> (The RM explains that String'Write calls procedure
> Character'Write for each component of the String...)

Newer GNATs use blocks of 256 characters or so, I don't remember the
figure. But as Jeff has suggested, you could use raw stream Write from
Ada.Streams.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-06  0:07                       ` Isaac Gouy
@ 2009-08-06  8:36                         ` Georg Bauhaus
  2009-08-06  9:09                           ` Dmitry A. Kazakov
  0 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-06  8:36 UTC (permalink / raw)


Isaac Gouy schrieb:
> On Aug 5, 2:18 pm, Georg Bauhaus <rm.tsoh.plus-
> bug.bauh...@maps.futureapps.de> wrote:
> -snip-
>>> And was that an improvement?
>> Yes.
> -snip-
> 
>> Using Florist should end up making similar calls,
>> so I expect similar improvements (TBD, soon).
> -snip-
> 
> Programs using Florist would be interesting enough to make it as
> alternative implementations - and probably I'd just show them
> alongside everything else.
> 
> The jokes are predictable - How do you speed up Ada? Call a C library.


Yeah. True of any language performing I/O on Unix I should
think...?

Does it matter?  Gives C programmers the illusion that
it is their language that makes I/O fast when it is
Unix I/O and the wrappers that have naturally made it
into its handy byproduct C.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-05 23:44                           ` Jeffrey R. Carter
@ 2009-08-06  8:38                             ` Georg Bauhaus
  2009-08-06 21:39                               ` Georg Bauhaus
  0 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-06  8:38 UTC (permalink / raw)


Jeffrey R. Carter schrieb:
> Georg Bauhaus wrote:
>>
>> Terribly slow.  Requesting only a tenth of the required amount
>> (i.e. N = 2500000) has taken ~30s. By extrapolation the full
>> set is produced only after ~5m. That's almost 20x slower
>> than the original Ada.Text_IO.
>>
>> (The RM explains that String'Write calls procedure
>> Character'Write for each component of the String...)
> 
> What about if you unchecked convert the String into an appropriately
> sized Stream_Element_Array and write it?


A similar thing is done in florist AFAICS; I'm working
on it.




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-06  8:36                         ` Georg Bauhaus
@ 2009-08-06  9:09                           ` Dmitry A. Kazakov
  2009-08-06 10:34                             ` Georg Bauhaus
  0 siblings, 1 reply; 116+ messages in thread
From: Dmitry A. Kazakov @ 2009-08-06  9:09 UTC (permalink / raw)


On Thu, 06 Aug 2009 10:36:38 +0200, Georg Bauhaus wrote:

> Does it matter?  Gives C programmers the illusion that
> it is their language that makes I/O fast when it is
> Unix I/O and the wrappers that have naturally made it
> into its handy byproduct C.

Huh, Unix I/O is fast? I still remember good old times. Unix Sys V was at
least 5 times slower on a PDP-11 compared to RSX-11M. The reason? Unix used
an incredible amount of buffers mounted on buffers. An input character
travelled through 3-5 buffers before it landed in the program. RSX-11 I/O
was in-place with user-space single buffer passed to the driver. I think
the present situation is only worse, but there is nothing to compare with
(nobody would count Windows as a contender).

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-06  9:09                           ` Dmitry A. Kazakov
@ 2009-08-06 10:34                             ` Georg Bauhaus
  2009-08-06 10:49                               ` Dmitry A. Kazakov
  0 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-06 10:34 UTC (permalink / raw)


Dmitry A. Kazakov schrieb:
> On Thu, 06 Aug 2009 10:36:38 +0200, Georg Bauhaus wrote:
> 
>> Does it matter?  Gives C programmers the illusion that
>> it is their language that makes I/O fast when it is
>> Unix I/O and the wrappers that have naturally made it
>> into its handy byproduct C.
> 
> Huh, Unix I/O is fast?

OK, I meant that most programs in the game use simple
I/O functions for line output, functions in particular that
were developed with Unix. (Calling fputs, or even routines
that have read(2) under the hood, I imagine.)

No one uses fprintf for line output; we have to, if
we stick to Text_IO.   If all goes well, Streams.Write
will do something close to what everyone else does, and
in plain Ada.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-06 10:34                             ` Georg Bauhaus
@ 2009-08-06 10:49                               ` Dmitry A. Kazakov
  2009-08-06 15:21                                 ` Isaac Gouy
  0 siblings, 1 reply; 116+ messages in thread
From: Dmitry A. Kazakov @ 2009-08-06 10:49 UTC (permalink / raw)


On Thu, 06 Aug 2009 12:34:14 +0200, Georg Bauhaus wrote:

> Dmitry A. Kazakov schrieb:
>> On Thu, 06 Aug 2009 10:36:38 +0200, Georg Bauhaus wrote:
>> 
>>> Does it matter?  Gives C programmers the illusion that
>>> it is their language that makes I/O fast when it is
>>> Unix I/O and the wrappers that have naturally made it
>>> into its handy byproduct C.
>> 
>> Huh, Unix I/O is fast?
> 
> OK, I meant that most programs in the game use simple
> I/O functions for line output, functions in particular that
> were developed with Unix. (Calling fputs, or even routines
> that have read(2) under the hood, I imagine.)
> 
> No one uses fprintf for line output; we have to, if
> we stick to Text_IO.   If all goes well, Streams.Write
> will do something close to what everyone else does, and
> in plain Ada.

These shoot-outs are usually poorly thought and stated. Clearly I/O has
nothing to do with the language performance when the problem is not
directly related to communication with the file system. Otherwise, it is
usually the file system which is a problem and not the language... (:-)) 

OK, this is not a joke. In fact, Ada.Text_IO is fast enough for its
purpose, i.e. dealing with *text* files. If we have MBytes of *text*, there
must be something wrong with us, if we are using crippled OSes, which do
not distinguish texts and data. Ada was designed in the times when crippled
OSes didn't dominate the market...

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-06 10:49                               ` Dmitry A. Kazakov
@ 2009-08-06 15:21                                 ` Isaac Gouy
  0 siblings, 0 replies; 116+ messages in thread
From: Isaac Gouy @ 2009-08-06 15:21 UTC (permalink / raw)


On Aug 6, 3:49 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:
> On Thu, 06 Aug 2009 12:34:14 +0200, Georg Bauhaus wrote:
> > Dmitry A. Kazakov schrieb:
> >> On Thu, 06 Aug 2009 10:36:38 +0200, Georg Bauhaus wrote:
>
> >>> Does it matter?  Gives C programmers the illusion that
> >>> it is their language that makes I/O fast when it is
> >>> Unix I/O and the wrappers that have naturally made it
> >>> into its handy byproduct C.
>
> >> Huh, Unix I/O is fast?
>
> > OK, I meant that most programs in the game use simple
> > I/O functions for line output, functions in particular that
> > were developed with Unix. (Calling fputs, or even routines
> > that have read(2) under the hood, I imagine.)
>
> > No one uses fprintf for line output; we have to, if
> > we stick to Text_IO.   If all goes well, Streams.Write
> > will do something close to what everyone else does, and
> > in plain Ada.
>
> These shoot-outs are usually poorly thought and stated. Clearly I/O has
> nothing to do with the language performance when the problem is not
> directly related to communication with the file system. Otherwise, it is
> usually the file system which is a problem and not the language... (:-))
>
> OK, this is not a joke. In fact, Ada.Text_IO is fast enough for its
> purpose, i.e. dealing with *text* files. If we have MBytes of *text*, there
> must be something wrong with us, if we are using crippled OSes, which do
> not distinguish texts and data. Ada was designed in the times when crippled
> OSes didn't dominate the market...
>
> --
> Regards,
> Dmitry A. Kazakovhttp://www.dmitry-kazakov.de


Criticisms of these shoot-outs are usually poorly thought and stated.

If we have MBytes of *text*, we must be doing bioinformatics, if we
are using crippled OSes, which do not distinguish texts and data, we
must be concerned with the present rather than some bygone age.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-06  8:38                             ` Georg Bauhaus
@ 2009-08-06 21:39                               ` Georg Bauhaus
  2009-08-06 22:42                                 ` Jeffrey R. Carter
  2009-08-13 20:56                                 ` jonathan
  0 siblings, 2 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-06 21:39 UTC (permalink / raw)


Georg Bauhaus wrote:
> Jeffrey R. Carter schrieb:
>> Georg Bauhaus wrote:
>>> Terribly slow.  Requesting only a tenth of the required amount
>>> (i.e. N = 2500000) has taken ~30s. By extrapolation the full
>>> set is produced only after ~5m. That's almost 20x slower
>>> than the original Ada.Text_IO.
>>>
>>> (The RM explains that String'Write calls procedure
>>> Character'Write for each component of the String...)
>> What about if you unchecked convert the String into an appropriately
>> sized Stream_Element_Array and write it?
> 
> 
> A similar thing is done in florist AFAICS; I'm working
> on it.

A new fasta program using Ada's Stream_IO is here:

http://home.arcor.de/bauhaus/Ada/fasta.ada

It is pleasantly fast, even a bit quicker than
calling the C functions as previously explained.
Good to stay inside Ada.

A stylistic issue:
The new Print procedure is currently written using
named parameter association and package prefixes.
This makes it differ from the rest of the program.
Should it be made a better match?



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-06 21:39                               ` Georg Bauhaus
@ 2009-08-06 22:42                                 ` Jeffrey R. Carter
  2009-08-07  8:53                                   ` Georg Bauhaus
  2009-08-13 20:56                                 ` jonathan
  1 sibling, 1 reply; 116+ messages in thread
From: Jeffrey R. Carter @ 2009-08-06 22:42 UTC (permalink / raw)


Georg Bauhaus wrote:
> 
> http://home.arcor.de/bauhaus/Ada/fasta.ada

>    procedure Print (Item : String) is
>       subtype Index is Stream_Element_Offset range
>         Stream_Element_Offset(Item'First) .. Stream_Element_Offset(Item'Last);
>       subtype XString is String (Item'Range);
>       subtype XBytes is Stream_Element_Array (Index);
>       function To_Bytes is new Unchecked_Conversion
>         (Source => XString,
>          Target => XBytes);
>       LF : constant Stream_Element_Array := (1 => To_Byte (ASCII.LF));
>    begin
>       Stream_IO.Write (stdout, To_Bytes (Item));
>       Stream_IO.Write (stdout, LF);
>    end Print;

How about

subtype Index is Stream_Element_Offset range
    Stream_Element_Offset(Item'First) .. Stream_Element_Offset(Item'Last + 1);
subtype XString is String (Item'First .. Item'Last + 1);

followed by

Stream_IO.Write (stdout, To_Bytes (Item & ASCII.LF) );

?

-- 
Jeff Carter
"Blessed are they who convert their neighbors'
oxen, for they shall inhibit their girth."
Monty Python's Life of Brian
83



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-06 22:42                                 ` Jeffrey R. Carter
@ 2009-08-07  8:53                                   ` Georg Bauhaus
  2009-08-07  9:21                                     ` Jacob Sparre Andersen
  0 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-07  8:53 UTC (permalink / raw)


Jeffrey R. Carter wrote:

> How about
> 
> subtype Index is Stream_Element_Offset range
>    Stream_Element_Offset(Item'First) .. Stream_Element_Offset(Item'Last
> + 1);
> subtype XString is String (Item'First .. Item'Last + 1);
> 
> followed by
> 
> Stream_IO.Write (stdout, To_Bytes (Item & ASCII.LF) );

Shorter, and consistently faster when strings are short (10:11).
I'll add this.

I don't know, though, whether or not exchanging one  "&" for
one less Stream_IO.Write is also good when strings are larger,
because of the copying in "&", CPU cache sizes, ...
The test results I get from the program below vary a lot
for anything but the short string (1..70) variant.

Since the test has been run in a VM on top of Vista, I don't
trust the timings that much anyway.  Maybe the order
of tests has an effect, too?  Does somebody have a stable
Linux (or Unix) test environment?  Could you try this?


package Print_Pak is

   procedure Print_Old (Item : String);

   procedure Print (Item : String);

end Print_Pak;


with Ada.Real_Time;  use Ada.Real_Time;
with Print_Pak;

procedure Test_Print is

   type String_Kind is (Short_Strings,
                        Medium_Strings,
                        Long_Strings);
   type String_Access is access String;
   type Run is record
      Data : String_Access;
      Rounds : Positive;
   end record;

   Test_Size : constant := 50_000_000;
   Max_Run : constant array (String_Kind) of Run :=
     (Short_Strings =>
        (Data => new String(1 .. 70 * 1),
         Rounds => Test_Size / 1),
      Medium_Strings =>
        (Data => new String(1 .. 70 * 100),
         Rounds => Test_Size / 100),
      Long_Strings =>
        (Data => new String(1 .. 70 * 10_000),
         Rounds => Test_Size / 10_000));

   Start, Stop: array (String_Kind) of Time;

   procedure Terminal (Message : String) is separate;

begin

   Old: for Kind in Short_Strings .. Long_Strings loop

      Start(Kind) := Clock;
      for K in 1 .. Max_Run(Kind).Rounds loop
         Print_Pak.Print_Old (Max_Run(Kind).Data.all);
      end loop;
      Stop(Kind) := Clock;

      Terminal ("Print_Old, "
                  & String_Kind'Image(Kind)
                  & ": "
                  & Duration'Image (To_Duration (Stop(Kind)
                                                   - Start(Kind))));
   end loop Old;

   for Kind in Short_Strings .. Long_Strings loop

      Start(Kind) := Clock;
      for K in 1 .. Max_Run(Kind).Rounds loop
         Print_Pak.Print (Max_Run(Kind).Data.all);
      end loop;
      Stop(Kind) := Clock;

      Terminal ("Print, "
                  & String_Kind'Image(Kind)
                  & ": "
                  & Duration'Image (To_Duration (Stop(Kind)
                                                   - Start(Kind))));
   end loop;


end Test_Print;


with Ada.Text_IO;

separate (Test_Print)
procedure Terminal (Message : String) is
begin
   Ada.Text_IO.Put_Line(Ada.Text_IO.Current_Error, Message);
end Terminal;


with Ada.Streams.Stream_IO;  use Ada.Streams;
with Unchecked_Conversion;

package body Print_Pak is

    function To_Byte is new Unchecked_Conversion
     (Source => Character,
      Target => Stream_Element);

   stdout : Stream_IO.File_Type;

   procedure Print_Old (Item : String) is
      subtype Index is Stream_Element_Offset range
        Stream_Element_Offset(Item'First) ..
Stream_Element_Offset(Item'Last);
      subtype XString is String (Item'Range);
      subtype XBytes is Stream_Element_Array (Index);
      function To_Bytes is new Unchecked_Conversion
        (Source => XString,
         Target => XBytes);
      LF : constant Stream_Element_Array := (1 => To_Byte (ASCII.LF));
   begin
      Stream_IO.Write (stdout, To_Bytes (Item));
      Stream_IO.Write (stdout, LF);
   end Print_Old;

   procedure Print (Item : String) is
      subtype Index is Stream_Element_Offset range
        Stream_Element_Offset(Item'First)
        .. Stream_Element_Offset(Item'Last + 1);
      subtype XString is String (Item'First .. Item'Last + 1);
      subtype XBytes is Stream_Element_Array (Index);
      function To_Bytes is new Unchecked_Conversion
        (Source => XString,
         Target => XBytes);
   begin
      Stream_IO.Write (stdout, To_Bytes (Item & ASCII.LF));
   end Print;

begin
   Stream_IO.Open (stdout,
                   Mode => Stream_IO.Out_File,
                   Name => "/dev/stdout");
end Print_Pak;



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-07  8:53                                   ` Georg Bauhaus
@ 2009-08-07  9:21                                     ` Jacob Sparre Andersen
  2009-08-07  9:23                                       ` Jacob Sparre Andersen
  2009-08-09 19:17                                       ` Georg Bauhaus
  0 siblings, 2 replies; 116+ messages in thread
From: Jacob Sparre Andersen @ 2009-08-07  9:21 UTC (permalink / raw)


Georg Bauhaus wrote:

> I don't know, though, whether or not exchanging one "&" for one less
> Stream_IO.Write is also good when strings are larger, because of the
> copying in "&", CPU cache sizes, ...

I fear that it might be slower in that case. - But it may not be relevant here.

> Since the test has been run in a VM on top of Vista, I don't trust
> the timings that much anyway.  Maybe the order of tests has an
> effect, too?  Does somebody have a stable Linux (or Unix) test
> environment?  Could you try this?

I tried on my Debian/stable workstation (made the program repeat the
test twice in case of initialisation effects):

Print_Old, SHORT_STRINGS:  12.368589000
Print_Old, MEDIUM_STRINGS:  1.143411000
Print_Old, LONG_STRINGS:  0.553761000
Print, SHORT_STRINGS:  13.874468000
Print, MEDIUM_STRINGS:  3.012395000
Print, LONG_STRINGS:  2.274855000
Print_Old, SHORT_STRINGS:  12.254596000
Print_Old, MEDIUM_STRINGS:  1.140442000
Print_Old, LONG_STRINGS:  0.573983000
Print, SHORT_STRINGS:  13.666659000
Print, MEDIUM_STRINGS:  3.014188000
Print, LONG_STRINGS:  2.286064000

The program was apparently using a nearly constant 94% of the CPU
resources during the whole run (except for the first few seconds).

Greetings,

Jacob
-- 
"Politicians are a lot like diapers. They should be changed
 frequently, and for the same reasons."



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-07  9:21                                     ` Jacob Sparre Andersen
@ 2009-08-07  9:23                                       ` Jacob Sparre Andersen
  2009-08-09 19:17                                       ` Georg Bauhaus
  1 sibling, 0 replies; 116+ messages in thread
From: Jacob Sparre Andersen @ 2009-08-07  9:23 UTC (permalink / raw)


Jacob Sparre Andersen wrote:

> I tried on my Debian/stable workstation (made the program repeat the
> test twice in case of initialisation effects):

Compiler arguments:

gnatmake -gnat05 -gnatE -gnato -gnatv -gnati1 -gnatf -fstack-check -m -O2 -funroll-loops -funwind-tables -gnatn

Kernel:

Linux jspa-nykredit 2.6.26-2-686 #1 SMP Sun Jul 26 21:25:33 UTC 2009 i686 GNU/Linux

CPU:

Intel(R) Core(TM)2 Duo CPU     T7300  @ 2.00GHz

Greetings,

Jacob
-- 
"They that can give up essential liberty to obtain a little
 temporary safety deserve neither liberty nor safety."
                                        -- Benjamin Franklin



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-07  9:21                                     ` Jacob Sparre Andersen
  2009-08-07  9:23                                       ` Jacob Sparre Andersen
@ 2009-08-09 19:17                                       ` Georg Bauhaus
  1 sibling, 0 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-09 19:17 UTC (permalink / raw)


Jacob Sparre Andersen wrote:

[printing via Stream_IO]

> I tried on my Debian/stable workstation (made the program repeat the
> test twice in case of initialisation effects):
> 
> Print_Old, SHORT_STRINGS:  12.368589000
> Print_Old, MEDIUM_STRINGS:  1.143411000
> Print_Old, LONG_STRINGS:  0.553761000
> Print, SHORT_STRINGS:  13.874468000
> Print, MEDIUM_STRINGS:  3.012395000
> Print, LONG_STRINGS:  2.274855000
> Print_Old, SHORT_STRINGS:  12.254596000
> Print_Old, MEDIUM_STRINGS:  1.140442000
> Print_Old, LONG_STRINGS:  0.573983000
> Print, SHORT_STRINGS:  13.666659000
> Print, MEDIUM_STRINGS:  3.014188000
> Print, LONG_STRINGS:  2.286064000


Thanks.  (I have in the meantime found similar results
on Mac OS X, BTW.)
Possible conclusion: for long strings, copying dominates
and should be avoided, call Stream_IO.Write for the text
to be appended ...?



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-06 21:39                               ` Georg Bauhaus
  2009-08-06 22:42                                 ` Jeffrey R. Carter
@ 2009-08-13 20:56                                 ` jonathan
  2009-08-13 21:30                                   ` jonathan
  2009-08-13 22:08                                   ` Georg Bauhaus
  1 sibling, 2 replies; 116+ messages in thread
From: jonathan @ 2009-08-13 20:56 UTC (permalink / raw)


On Aug 6, 10:39 pm, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:
> Georg Bauhaus wrote:
> > Jeffrey R. Carter schrieb:
> >> Georg Bauhaus wrote:
> >>> Terribly slow.  Requesting only a tenth of the required amount
> >>> (i.e. N = 2500000) has taken ~30s. By extrapolation the full
> >>> set is produced only after ~5m. That's almost 20x slower
> >>> than the original Ada.Text_IO.
>
> >>> (The RM explains that String'Write calls procedure
> >>> Character'Write for each component of the String...)
> >> What about if you unchecked convert the String into an appropriately
> >> sized Stream_Element_Array and write it?
>
> > A similar thing is done in florist AFAICS; I'm working
> > on it.
>
> A new fasta program using Ada's Stream_IO is here:
>
> http://home.arcor.de/bauhaus/Ada/fasta.ada
>
> It is pleasantly fast, even a bit quicker than
> calling the C functions as previously explained.
> Good to stay inside Ada.
>
> A stylistic issue:
> The new Print procedure is currently written using
> named parameter association and package prefixes.
> This makes it differ from the rest of the program.
> Should it be made a better match?

FWIW, I finally found time to experiment with compiler flags
on my x86_64 GNU/Linux pc. I used GNAT GPL  2009 (gcc 4.3.4).

fasta   running times:

with -gnatN         get 20% to 25% speedup (NOT -gnatn)
with -funroll-loops get 4% to 5% speedup (NOT -funroll-all-loops)
with -O2 same running time (almost) as -O3

and if you change

   Last : Natural := 42;

to

   type Unsigned_32 is mod 2**32;
   Last : Unsigned_32 := 42;

get 10% speedup! (I bet arithmetic on Natural is 64 bit;
likely same will happen on some of the shootout benchmarks.)

So I'ld use

 gnatmake -O3 -gnatNp  -funroll-loops  -march=native ...

Jonathan











^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-13 20:56                                 ` jonathan
@ 2009-08-13 21:30                                   ` jonathan
  2009-08-13 22:08                                   ` Georg Bauhaus
  1 sibling, 0 replies; 116+ messages in thread
From: jonathan @ 2009-08-13 21:30 UTC (permalink / raw)


On Aug 13, 9:56 pm, jonathan <johns...@googlemail.com> wrote:
> On Aug 6, 10:39 pm, Georg Bauhaus <rm.tsoh.plus-
>
>
>
> bug.bauh...@maps.futureapps.de> wrote:
> > Georg Bauhaus wrote:
> > > Jeffrey R. Carter schrieb:
> > >> Georg Bauhaus wrote:
> > >>> Terribly slow.  Requesting only a tenth of the required amount
> > >>> (i.e. N = 2500000) has taken ~30s. By extrapolation the full
> > >>> set is produced only after ~5m. That's almost 20x slower
> > >>> than the original Ada.Text_IO.
>
> > >>> (The RM explains that String'Write calls procedure
> > >>> Character'Write for each component of the String...)
> > >> What about if you unchecked convert the String into an appropriately
> > >> sized Stream_Element_Array and write it?
>
> > > A similar thing is done in florist AFAICS; I'm working
> > > on it.
>
> > A new fasta program using Ada's Stream_IO is here:
>
> >http://home.arcor.de/bauhaus/Ada/fasta.ada
>
> > It is pleasantly fast, even a bit quicker than
> > calling the C functions as previously explained.
> > Good to stay inside Ada.
>
> > A stylistic issue:
> > The new Print procedure is currently written using
> > named parameter association and package prefixes.
> > This makes it differ from the rest of the program.
> > Should it be made a better match?
>
> FWIW, I finally found time to experiment with compiler flags
> on my x86_64 GNU/Linux pc. I used GNAT GPL  2009 (gcc 4.3.4).
>
> fasta   running times:
>
> with -gnatN         get 20% to 25% speedup (NOT -gnatn)
> with -funroll-loops get 4% to 5% speedup (NOT -funroll-all-loops)
> with -O2 same running time (almost) as -O3
>
> and if you change
>
>    Last : Natural := 42;
>
> to
>
>    type Unsigned_32 is mod 2**32;
>    Last : Unsigned_32 := 42;
>
> get 10% speedup! (I bet arithmetic on Natural is 64 bit;
> likely same will happen on some of the shootout benchmarks.)

As soon as I posted this I decided to test -ffast-math, which
usually doesn't do much.  Here it gave a spendid
speedup of about 10% to 15%. (There's a bit of floating point
in fasta).

Overall, we seem to get a roughly 25-30% speedup from optimization
switches alone (ok, plus a bit from the 32 bit modular type).

In other words

gnatmake -gnatNp -O3 -funroll-loops -ffast-math -march=native
fasta.adb

produced an executable that ran in ~75% the running time of

gnatmake -gnatp -O3 -march=native fasta.adb

Oh, and the infamous -a flag gave another 2% speedup.  I won't
even speculate.

Jonathan






^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-13 20:56                                 ` jonathan
  2009-08-13 21:30                                   ` jonathan
@ 2009-08-13 22:08                                   ` Georg Bauhaus
  2009-08-14 15:31                                     ` jonathan
  1 sibling, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-13 22:08 UTC (permalink / raw)


jonathan wrote:
> On Aug 6, 10:39 pm, Georg Bauhaus <rm.tsoh.plus-

> FWIW, I finally found time to experiment with compiler flags
> on my x86_64 GNU/Linux pc. I used GNAT GPL  2009 (gcc 4.3.4).
> 
> fasta   running times:
> 
> with -gnatN         get 20% to 25% speedup (NOT -gnatn)

Based on earlier findings regarding the inlining options,
I had suggested, as a followup to the submission,
to add -gnatn and -gnatN to the flags;
this has already been done a few days ago. (AFAICS, the
effects seem to have been less impressive than I would have
hoped.)

Also, while working on the regexdna program, adding -gnatN
caused exception followed by some valid outputs and then
something that hangs. Whatever this might mean in general.

> with -funroll-loops get 4% to 5% speedup (NOT -funroll-all-loops)
> with -O2 same running time (almost) as -O3
> 
> and if you change
> 
>    Last : Natural := 42;
> 
> to
> 
>    type Unsigned_32 is mod 2**32;
>    Last : Unsigned_32 := 42;
> 
> get 10% speedup! (I bet arithmetic on Natural is 64 bit;
> likely same will happen on some of the shootout benchmarks.)

I'll try this on some machines to which I have access;
next week, though.

> So I'ld use
> 
>  gnatmake -O3 -gnatNp  -funroll-loops  -march=native ...
> 
> Jonathan
> 
> 
> 
> 
> 
> 
> 
> 



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-13 22:08                                   ` Georg Bauhaus
@ 2009-08-14 15:31                                     ` jonathan
  0 siblings, 0 replies; 116+ messages in thread
From: jonathan @ 2009-08-14 15:31 UTC (permalink / raw)



OK, I found a few more minutes to look at these
things.

Most important thing to remember: Whether you
should use  -funroll-loops, -funroll-all-loops,
-gnatn, -gnatN, or none-of-the-above
depends on the compiler.

After a quick test using GNAT 4.3.2,
none-of-the-above  seems an acceptible choice.

After another test using GNAT 4.3.4 (GPL 2009),
-gnatN -funroll-loops  seems quite a good choice.

On both  GNAT 4.3.2 and  GNAT 4.3.4,
-ffast-math  is near essential for top speed.
In fact it seemed even better with GNAT 4.3.2;
15% to 20% improvement.

Well, -ffast-math isn't usually that good, so I
took the trouble to find out what's going on.

One thing (in retrospect the only thing) that could
have shaved 15% off the running time was removal
of the floating point division by replacing
"x / constant" with "x * (1/constant)".

To verify this I modified the code by replacing
the "x / constant" with "x * (1/constant)" explicitly.
When I did this the  -ffast-math  switch became
unnecessary.

I didn't know -ffast-math could do that.

Jonathan




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 12:21 Ada Shootout program for K-Nucleotide (patches) Georg Bauhaus
                   ` (5 preceding siblings ...)
  2009-08-04  8:23 ` Martin
@ 2009-08-16 15:20 ` jonathan
  2009-08-17 15:46   ` Georg Bauhaus
  2009-08-27  9:05 ` Martin
  2009-10-07 11:44 ` Olivier Scalbert
  8 siblings, 1 reply; 116+ messages in thread
From: jonathan @ 2009-08-16 15:20 UTC (permalink / raw)


On Aug 1, 1:21 pm, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:
> This is about the K-Nucleotide program at the
> Computer Language Benchmark Game,http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
> where some Ada programs have started to fail.

I took a 2nd look at your single core knucleotide.adb.

http://home.arcor.de/bauhaus/Ada/knucleotide.gnat

I was able to speed it up by 40% to 50% with one
change: I inlined the hashing function by hand.
Its not the prettiest code, but this is after
all an optimization exercise.

Running time under GNAT 4.3.2 went from 79 sec to
to 53 sec. Running time under GNAT 4.3.4 went from
70 sec to 50 sec. On both  GNAT 4.3.2, and
GNAT 4.3.4 (GPL 2009) the best compiler switches
I could find were:

   gnatmake -O3 -gnatp -march=native

Here is the original Hash function:

-- function Hash (Key : Fragment) return Hash_Type is
-- begin
--    return Hash_Function (Fragments_1.To_String (Key));
-- end Hash;

and here is the replacement:

 function Hash (Key : Fragment) return Hash_Type is
    type Uns_32 is mod 2**32;
    H : Uns_32 := 0;
    pragma Assert (Hash_Type'First = 0);
 begin
    for J in 1 .. Key.Last loop
      H := Character'Pos(Key.Data(J)) + H * 2**6 + H * 2**16 - H;
    end loop;
    return Hash_Type'Base (H mod Uns_32 (Hash_Type'Last + 1));
 end Hash;

A few more changes:

1. REMOVE:  pragma Restrictions (No_Finalization);
   Won't compile under GNAT 4.3.4, with mysterious
   error message. Could this be a problem on GNAT 4.3.3
   used at the shootout site? In any case, -gnatu
   says we're linking to storage pool routines, so
   this is probably an error.

2. I replaced:  type Hash_Type is range 0 .. 2**16;  with

      type Hash_Type is range 0 .. 2**17 - 1;

   Both work, but latter is a few % faster.

3. I moved  type Bounded_String  out of the private part
   and made it public so that function Hash could see it.
   (Just for convenience.)

      type Bounded_String is record
         Data : String(Frequencies);
         Last : Natural;
      end record;

4. The knucleotide.adb version that uses Ada.Containers is
   3 times slower than this new one, and seems to fail
   on the 250 Meg fasta text file used in the benchmark.

Final comment:

According to the shootout site, they're using
GNAT 4.3.3.  I wonder if this is the source of our
misery.  I can get GNAT 4.3.2 and  GNAT 4.3.4 (GPL)
the work very nicely on all machines I have access to.
The best optimation switches are often very different
on these 2 compilers, so I have no idea what GNAT 4.3.3
likes.

Jonathan




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-16 15:20 ` jonathan
@ 2009-08-17 15:46   ` Georg Bauhaus
  2009-08-18 16:59     ` jonathan
  0 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-17 15:46 UTC (permalink / raw)


jonathan schrieb:
> On Aug 1, 1:21 pm, Georg Bauhaus <rm.tsoh.plus-
> bug.bauh...@maps.futureapps.de> wrote:
>> This is about the K-Nucleotide program at the
>> Computer Language Benchmark Game,http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
>> where some Ada programs have started to fail.
> 
> I took a 2nd look at your single core knucleotide.adb.
> 
> http://home.arcor.de/bauhaus/Ada/knucleotide.gnat
> 
> I was able to speed it up by 40% to 50% with one
> change: I inlined the hashing function by hand.
> Its not the prettiest code, but this is after
> all an optimization exercise.



I had played with the Hashing algorithm to no avail and forgot
something that ... if it is ... obvious!  Thanks.
For comparison I have tried Hash calling a new inlined
Element function from the Fragments string package.
With just that change, then, and full inlining the speedup
is 20%...

Because of the desructive effects of -gnatN in an experimental
version of the regexdna program, I speculate that inlining
by hand might be more reliable.




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-17 15:46   ` Georg Bauhaus
@ 2009-08-18 16:59     ` jonathan
  2009-08-19 14:52       ` Georg Bauhaus
  0 siblings, 1 reply; 116+ messages in thread
From: jonathan @ 2009-08-18 16:59 UTC (permalink / raw)



OK, I did some more work on knuceotide.adb.  I'll say in
advance that the code may be .. a bit messy.  There is sometimes
no way around this if you are trying for fast rather than elegant.

The 1st and most important change improves running time of
the bounded string comparison operator "=".  This "=" is how the
hash routine decides if it has found the desired item, so it
could not be more important.  I'm rather proud of this because it
the speedup surprised me: apparently comparison of 32 or 64 bit ints
is a lot faster than comparing the Characters one by one in the
string.  So I used an unchecked_conversion from strings of 4
characters to 32 bit int.  The speedup was 15-20%. With this change
and with the second change described below at the end, the program
running time on my machine falls from about 53 sec to about 37 sec,
using GNAT 4.3.2 and gnatmake -O3 -gnatp.

with Unchecked_Conversion;
separate (KNucleotide_1)
package body Fragments_1 is

-- old version of "="

-- function "=" (Left, Right: Bounded_String) return Boolean is
-- begin
--      return Left.Last = Right.Last
--         and then Left.Data(1 .. Left.Last) = Right.Data(1 ..
Right.Last);
-- end "=";

-- new version of "=".  (Tell me if its right;)

   Bytes_per_Word : constant := 4;    -- 4 good, even on 64-bit archs.
   type Uns is mod 2**(8*Bytes_per_Word);
   for Uns'Size use 8*Bytes_per_Word;
   subtype Str is String(1..Bytes_per_Word);
   function To_Uns is new Unchecked_Conversion (Str, Uns);

   function "=" (Left, Right: Bounded_String) return Boolean is
      strt : Integer := 1;
      fnsh : Integer := Bytes_per_Word;
      Last : constant Integer := Left.Last;
   begin
      if Last /= Right.Last then
        return False;
      end if;

      loop
         exit when fnsh > Last;
         if To_Uns (Left.Data(strt..fnsh)) /= To_Uns (Right.Data
(strt..fnsh)) then
            return False;
         end if;
         strt := strt + Bytes_per_Word;
         fnsh := fnsh + Bytes_per_Word;
      end loop;

      for i in strt .. Last loop
         if Left.Data(i) /= Right.Data(i) then
            return False;
         end if;
      end loop;
      return True;
   end "=";

The second change gives another speedup of about half
that of the previous one.  I am shamelessly advocating
removing a layer of abstraction.  Remember, I moved  the
declaration of the bounded string to the public part, so that
the procedures can now see the 2 components of the record:
Data, and Last. Here all we do is avoid a call to
function To_Bounded_String in package Fragments_1.
Below this call is commented out:
  --
  --  Calculate the calculates the nucleotide frequencies
  --
  procedure Calculate_Frequencies (Length : Frequencies) is
     Key : Fragment;
     Len : constant Integer := Length;
  begin
     -- NEW initialization of Key.Last:
     Key.Last := Len;
     Table.Reset;
     for I in  1 .. Buffer'Last - Len + 1 loop

        -- NEW initialization of Key.Data:
        Key.Data(1 .. Len) := Buffer (I .. I + Len - 1);

        declare
         -- OLD initialization of Key.Data:
         --Key     : constant Fragment       :=
         --  Fragments_1.To_Bounded_String (Buffer (I .. I+Len-1));
           Element : constant Element_Access := Table.Get (Key);
        begin

I glance at the shootout benchmark site suggests we get a
respectable result.  Their machine is slower than mine, so
I would expect somewhere around 55 sec on their machine.
(But I am all the more impressed by the ones that are faster
than 50 sec.)  By the way, I notice that the top c++ entry
uses a static 250 megabyte array to store the data.  We
can also gain a few seconds by doing the following. (Notice
the optional 250 megabyte string. Yes, it worked.)

 procedure Read_Section is
   --Buffer     : String (1 .. 1024*1024*256); -- best; ~8% faster
than 10240
     Buffer     : String (1 .. 1024*1024*64);  -- almost as good as
above.
   --Buffer     : String (1 .. 1024*10);       -- standard setting
   --Buffer     : String (1 .. 8);             -- this works fine


Jonathan





^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-18 16:59     ` jonathan
@ 2009-08-19 14:52       ` Georg Bauhaus
  2009-08-19 16:40         ` Martin
                           ` (2 more replies)
  0 siblings, 3 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-19 14:52 UTC (permalink / raw)


jonathan schrieb:
> OK, I did some more work on knuceotide.adb.  I'll say in
> advance that the code may be .. a bit messy.  There is sometimes
> no way around this if you are trying for fast rather than elegant.

Ada programs won't be alone in that camp, I think? ;-)


> -- new version of "=".  (Tell me if its right;) [...]

Gradually integrating all of your patches into the multitasking
version (in reversed order for no planned reason) gives the
following improvements on a machine with an AMD Athlon(tm) 64 X2
Dual Core Processor 3800+, 2 GHz, 2GB RAM:

N = 2,500,000

10s - no patches
8.7s - new "=" and public definition of Fragments.Bounded_String
8.1s - increasing the read buffer size
5.7s - manual inlining of the hashing function
5.1s - changing Hash_Type to include 1 .. 2**17 - 1

GNAT 4.2.3

(Running the last version on 1 core, not 2, the real time doubled.)

With N = 25,000,000 I get

real	0m50.467s
user	1m28.540s
sys	0m0.410s

http://home.arcor.de/bauhaus/Ada/knucleotide.j.tasking.gnat

As a precaution, I have allocated the larger read buffer on the heap.
Emacs Ada mode may have capitalized some identifiers; I hadn't
changed the presets on this machine.




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-19 14:52       ` Georg Bauhaus
@ 2009-08-19 16:40         ` Martin
  2009-08-19 18:24           ` Robert A Duff
  2009-08-19 21:37           ` Georg Bauhaus
  2009-08-19 19:22         ` jonathan
  2009-08-19 19:27         ` jonathan
  2 siblings, 2 replies; 116+ messages in thread
From: Martin @ 2009-08-19 16:40 UTC (permalink / raw)


On Aug 19, 3:52 pm, Georg Bauhaus <rm.dash-bauh...@futureapps.de>
wrote:
> jonathan schrieb:
>
> > OK, I did some more work on knuceotide.adb.  I'll say in
> > advance that the code may be .. a bit messy.  There is sometimes
> > no way around this if you are trying for fast rather than elegant.
>
> Ada programs won't be alone in that camp, I think? ;-)
>
> > -- new version of "=".  (Tell me if its right;) [...]
>
> Gradually integrating all of your patches into the multitasking
> version (in reversed order for no planned reason) gives the
> following improvements on a machine with an AMD Athlon(tm) 64 X2
> Dual Core Processor 3800+, 2 GHz, 2GB RAM:
>
> N = 2,500,000
>
> 10s - no patches
> 8.7s - new "=" and public definition of Fragments.Bounded_String
> 8.1s - increasing the read buffer size
> 5.7s - manual inlining of the hashing function
> 5.1s - changing Hash_Type to include 1 .. 2**17 - 1
>
> GNAT 4.2.3
>
> (Running the last version on 1 core, not 2, the real time doubled.)
>
> With N = 25,000,000 I get
>
> real    0m50.467s
> user    1m28.540s
> sys     0m0.410s
>
> http://home.arcor.de/bauhaus/Ada/knucleotide.j.tasking.gnat
>
> As a precaution, I have allocated the larger read buffer on the heap.
> Emacs Ada mode may have capitalized some identifiers; I hadn't
> changed the presets on this machine.

Isn't there something that could be done about the lines:

      return Unbounded.To_String (Data_Buffer);
   end Read;
and
   Buffer : constant String := Read;

...not sure what the answer is (don't have a compiler / debugger to
hand just now but it just looks like there's a copy too many in there
somewhere...

Also, is using "Ada.Text_IO.Get_Line" particularly fast?

Cheers
-- Martin




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-19 16:40         ` Martin
@ 2009-08-19 18:24           ` Robert A Duff
  2009-08-19 22:15             ` jonathan
  2009-08-19 21:37           ` Georg Bauhaus
  1 sibling, 1 reply; 116+ messages in thread
From: Robert A Duff @ 2009-08-19 18:24 UTC (permalink / raw)


Martin <martin.dowie@btopenworld.com> writes:

> Isn't there something that could be done about the lines:
>
>       return Unbounded.To_String (Data_Buffer);
>    end Read;
> and
>    Buffer : constant String := Read;
>
> ...not sure what the answer is (don't have a compiler / debugger to
> hand just now but it just looks like there's a copy too many in there
> somewhere...
>
> Also, is using "Ada.Text_IO.Get_Line" particularly fast?

No.  Neither is Strings.Unbounded.  If you're trying to write fast code,
I would experiment with avoiding both.

- Bob



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-19 14:52       ` Georg Bauhaus
  2009-08-19 16:40         ` Martin
@ 2009-08-19 19:22         ` jonathan
  2009-08-19 19:27         ` jonathan
  2 siblings, 0 replies; 116+ messages in thread
From: jonathan @ 2009-08-19 19:22 UTC (permalink / raw)




Quick report on the the multitasking version
of knucleotide, using an 8-core machine.

Running on a single core, (single worker task)
it takes 35 seconds (same as the task-free version
of knucleotide):

   Split_Work: array (1 .. 1) of Writer;

begin
   Split_Work(1).Set (Nucleotide_Length => 1);
   Split_Work(1).Set (Nucleotide_Length => 2);
   Split_Work(1).Set (Fragments.Length (Fragment_3), Fragment_3);
   Split_Work(1).Set (Fragments.Length (Fragment_4), Fragment_4);
   Split_Work(1).Set (Fragments.Length (Fragment_6), Fragment_6);
   Split_Work(1).Set (Fragments.Length (Fragment_12), Fragment_12);
   Split_Work(1).Set (Fragments.Length (Fragment_18), Fragment_18);

time ./knucleotide < fasta250.dat

real    0m35.267s
user    0m35.002s
sys     0m0.252s


Using 4 worker tasks, it takes 13.6 seconds (best result). But
come to think of it, if there is a fifth env task aiding execution,
then it may benefit from the fact that there are 8 cores on
my machine.  Tasking overhead is negligible.

   Split_Work: array (1 .. 4) of Writer;

begin
   Split_Work(3).Set (Fragments.Length (Fragment_12), Fragment_12);
   Split_Work(4).Set (Fragments.Length (Fragment_18), Fragment_18);
   Split_Work(1).Set (Nucleotide_Length => 1);
   Split_Work(2).Set (Nucleotide_Length => 2);
   Split_Work(1).Set (Fragments.Length (Fragment_3), Fragment_3);
   Split_Work(2).Set (Fragments.Length (Fragment_4), Fragment_4);
   Split_Work(1).Set (Fragments.Length (Fragment_6), Fragment_6);

time ./knucleotide < fasta250.dat

real    0m13.653s
user    0m35.478s
sys     0m0.288s


I have put my version of the 1-core knucleotide.adb at

   http://web.am.qub.ac.uk/users/j.parker/

in directory bench_depository.
I made an attempt to prettify the text to my tastes, then lost
steam.  It seems to be identical to the new multicore version.
The only difference is a marginally stronger hash_function, which
I played with out of curiosity.  (Concluded that overhead
from calculating H is essentially insignificant, certainly
compared to benefit of avoiding collisions.)

  for J in 1 .. Key.Last loop
     H := Character'Pos (Key.Data (J)) +
           H * 2**3 + H * 2**6 + H * 2**11 - H;
  end loop;
  return Hash_Type'Base (H mod Uns_32(Hash_Type'Last + 1));

The older one, which may be better on more
general (ie longer) strings, and it is cetainly safer:
someone verified that it works in general.


Jonathan




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-19 14:52       ` Georg Bauhaus
  2009-08-19 16:40         ` Martin
  2009-08-19 19:22         ` jonathan
@ 2009-08-19 19:27         ` jonathan
  2 siblings, 0 replies; 116+ messages in thread
From: jonathan @ 2009-08-19 19:27 UTC (permalink / raw)


On Aug 19, 3:52 pm, Georg Bauhaus <rm.dash-bauh...@futureapps.de>
wrote:

> (Running the last version on 1 core, not 2, the real time doubled.)
>
> With N = 25,000,000 I get
>
> real    0m50.467s
> user    1m28.540s
> sys     0m0.410s
>
I forgot to add, I got the same result as you on my 2-core machine.
I used the 3 tasks declared in the version you put on line.  Running
time fell by exactly a factor of 2 compared to the 1-core.

j.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-19 16:40         ` Martin
  2009-08-19 18:24           ` Robert A Duff
@ 2009-08-19 21:37           ` Georg Bauhaus
  1 sibling, 0 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-19 21:37 UTC (permalink / raw)


Martin wrote:

> Isn't there something that could be done about the lines:
> 
>       return Unbounded.To_String (Data_Buffer);
>    end Read;
> and
>    Buffer : constant String := Read;

As a first try, I'm playing with the lightweight bounded
strings package (from the program), which is generic in
Jonathan Parker's version;  since a C++ program uses a
large buffer, we might try this as well, i.e. instantiate
the package for large strings and replace Unbounded.Append
with our own Append, effectively reading directly into the
target string.

> ...not sure what the answer is (don't have a compiler / debugger to
> hand just now but it just looks like there's a copy too many in there
> somewhere...
> 
> Also, is using "Ada.Text_IO.Get_Line" particularly fast?

Stream_IO might be faster, but the rules say that we have to
read line-by-line.  Would we not end up reinventing a buffering
scheme for line oriented standard I/O?
(Could be of some general utility, anyway, I'd think.
GNAT seems to be using C library function wrappers for its
high speed reading.)



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-19 18:24           ` Robert A Duff
@ 2009-08-19 22:15             ` jonathan
  2009-08-19 23:50               ` Georg Bauhaus
  0 siblings, 1 reply; 116+ messages in thread
From: jonathan @ 2009-08-19 22:15 UTC (permalink / raw)


On Aug 19, 7:24 pm, Robert A Duff <bobd...@shell01.TheWorld.com>
wrote:
> Martin <martin.do...@btopenworld.com> writes:
> > Isn't there something that could be done about the lines:
>
> >       return Unbounded.To_String (Data_Buffer);
> >    end Read;
> > and
> >    Buffer : constant String := Read;
>
> > ...not sure what the answer is (don't have a compiler / debugger to
> > hand just now but it just looks like there's a copy too many in there
> > somewhere...
>
> > Also, is using "Ada.Text_IO.Get_Line" particularly fast?
>
> No.  Neither is Strings.Unbounded.  If you're trying to write fast code,
> I would experiment with avoiding both.
>
> - Bob


Ok, I finally put some print statements in to check.

   begin -- function Read:

      ada.text_io.put("1");
      Skip_To_Section;

      ada.text_io.put("2");
      Read_Section;

      ada.text_io.put("3");
      Unbounded.Translate (Data_Buffer, Mapping =>
Maps.Constants.Upper_Case_Map);

      ada.text_io.put("4");
      return Unbounded.To_String (Data_Buffer);

   end Read;

   --  execute function Read:

   Buffer : constant String := Read;

...
begin -- execution of the program
ada.text_io.put("5");

...

From 1 to 2 takes 3.5 sec.  It is reading 125 Meg of text line by line
and
discarding it. (Using Ada.Text_IO.Get_Line as the rules suggest.)

From 2 to 3 takes 3.5 sec.  It is reading 125 Meg of text line by line
and
storing it. (Using Ada.Text_IO.Get_Line as the rules suggest.)

From 3 to 5 is well under 1 sec.  From 4 to 5 is negligible.

Full running time is 36 sec.

Jonathan



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-19 22:15             ` jonathan
@ 2009-08-19 23:50               ` Georg Bauhaus
  2009-08-20 19:47                 ` jonathan
  2009-08-20 19:55                 ` jonathan
  0 siblings, 2 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-19 23:50 UTC (permalink / raw)


jonathan wrote:

> Ok, I finally put some print statements in to check.

> ...
> 
> From 1 to 2 takes 3.5 sec.  It is reading 125 Meg of text line by line
> and
> discarding it. (Using Ada.Text_IO.Get_Line as the rules suggest.)
> 
> From 2 to 3 takes 3.5 sec.  It is reading 125 Meg of text line by line
> and
> storing it. (Using Ada.Text_IO.Get_Line as the rules suggest.)
> 
> From 3 to 5 is well under 1 sec.  From 4 to 5 is negligible.

Similar durations here, although tested on Windows.

Also, reading directly into a 128 MB target Buffer of
type String (i.e. not appending chunks to an Unbounded_String
and then performing To_String) seems to have a small
effect only.  Since a fixed buffer size makes the program
fail when inputs grow, I think the temporary Unbounded_String
is reasonable.

There are only a few calls to Unbounded.* anyway.
Get_Line seems to be the dominating subprogram here.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-19 23:50               ` Georg Bauhaus
@ 2009-08-20 19:47                 ` jonathan
  2009-08-20 22:15                   ` Georg Bauhaus
  2009-08-20 19:55                 ` jonathan
  1 sibling, 1 reply; 116+ messages in thread
From: jonathan @ 2009-08-20 19:47 UTC (permalink / raw)


On Aug 20, 12:50 am, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:

> Since a fixed buffer size makes the program
> fail when inputs grow, I think the temporary Unbounded_String
> is reasonable.
>
> There are only a few calls to Unbounded.* anyway.
> Get_Line seems to be the dominating subprogram here.


I agree. These timings mean that overhead from unbounded
strings is insignificant.

Unfortunately I got my timings a bit wrong.  Fortunately the
conclusion remains the same.  (I redid everything
carefully.)  IO is consistently ~11% running time on
my machine, so no great source of inefficiency.

I also measured the benefit of the 64 Megabyte buffer
more carefully:

      procedure Read_Section is
         ...
         Local_Buffer : String_Access;
      begin
         Local_Buffer := new String (1 .. 1024 * 1024 * 64);

I would now recommend

         Local_Buffer := new String (1 .. 1024 * 1024 * 16);

Its only .25 sec slower.

Finally, I found one last useful optimization, which
consistently shaves 6% off the running time on my machine.
(I was unpleasantly surprised that it wasn't better.)

The 125 Meg array (Buffer) is read through from beginning
to end numerous times and and each time about 125 million
substrings are written to string Key.Data:

  Key.Data(1 .. Length) := Buffer (I-Length+1 .. I);

The string Key.Data already contains most of the desired
data. Since Buffer may be in slower memory, and Key.Data
in very fast memory, it makes sense to update
Key.Data from itself whenever possible:

 Key.Last            := Length;
 Key.Data(2..Length) := Buffer (1..Length-1);--new essential init.

 for I in Length .. Buffer'Last loop

 -- Key.Data(1 .. Length) := Buffer (I-Length+1 .. I);--Old

    -- New. Use loop not sliding, (and -funroll-all-loops):
    for j in 1 .. Length-1 loop
       Key.Data(j) := Key.Data(j+1);
    end loop;
    Key.Data(Length) := Buffer (I);

    declare
       Element : constant Element_Access := Table.Get (Key);
    ...



Jonathan




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-19 23:50               ` Georg Bauhaus
  2009-08-20 19:47                 ` jonathan
@ 2009-08-20 19:55                 ` jonathan
  1 sibling, 0 replies; 116+ messages in thread
From: jonathan @ 2009-08-20 19:55 UTC (permalink / raw)



Now that I've measured IO overheard, I can
say more about the efficiency of the multitasking
knuckeotide.
The optimal result would be about 11 sec on
4 cores. The best I measure is about 13.2 seconds -
hard to improve on.

Here is the best timing I get on 4 cores. (I put
in the optimization I described previously;
results are a bit improved over yesterday. The
actual times fluctuate wildly. This is the
lower bound.)

real    0m13.174s
user    0m32.294s
sys     0m0.420s

3.7 sec go to IO, which won't parallelize.
Those 3.7 sec can't be reduced by adding more tasks.

The remaining 28.6 sec we hope to see reduced to
28.6 / 4 = 7.15 sec on 4 cores in the perfect limit.
That means the best result we can hope for
is a running time of 3.7 + 7.15 = 10.85 sec.

Jonathan




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-20 19:47                 ` jonathan
@ 2009-08-20 22:15                   ` Georg Bauhaus
  2009-08-21 21:43                     ` jonathan
  0 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-20 22:15 UTC (permalink / raw)


jonathan wrote:

>   Key.Data(1 .. Length) := Buffer (I-Length+1 .. I);
> 
> The string Key.Data already contains most of the desired
> data. Since Buffer may be in slower memory, and Key.Data
> in very fast memory, it makes sense to update
> Key.Data from itself whenever possible:

Maybe, then, it is worthwhile loading a page worth
of characters from the big string for the next round
of fill-up completions?  If this loading doesn't happen
perfectly anyway, that is.

Or if we are at it, we might experiment with slice renamings
or pointer artihmetic; requires a rewrite, I'd think.
If the rules permit this at all ;-)




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-20 22:15                   ` Georg Bauhaus
@ 2009-08-21 21:43                     ` jonathan
  2009-08-22 22:35                       ` Georg Bauhaus
  0 siblings, 1 reply; 116+ messages in thread
From: jonathan @ 2009-08-21 21:43 UTC (permalink / raw)


On Aug 20, 11:15 pm, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:
> jonathan wrote:
> >   Key.Data(1 .. Length) := Buffer (I-Length+1 .. I);
>
> > The string Key.Data already contains most of the desired
> > data. Since Buffer may be in slower memory, and Key.Data
> > in very fast memory, it makes sense to update
> > Key.Data from itself whenever possible:
>
> Maybe, then, it is worthwhile loading a page worth
> of characters from the big string for the next round
> of fill-up completions?  If this loading doesn't happen
> perfectly anyway, that is.
>
> Or if we are at it, we might experiment with slice renamings
> or pointer artihmetic; requires a rewrite, I'd think.
> If the rules permit this at all ;-)

I did make another attempt here.  I made the string:

   type Bounded_String is record
      Data  : String(1 .. Max_String_Length);
      First : Natural;
      Last  : Natural;
   end record;

Now make Max_String_Length large, and if you want to slide
array Data a distance Shift_Length all you have to do in most
cases is increment First and Last by Shift_Length.

This essentially removed all the overhead in sliding
array Data.  I was very pleased with myself. (And again
astonished at how much an Ada compiler can do to help you
get it right.)  Unfortunately, it slowed the program
down quite a bit.  I think it just made the string "="
more complicated and slower. SO I've happily dropped that
approach.

I'll be off-line for a few days, so let me summarize what
we have.  The program is in good shape and very fast.
Martin Krischik wrote a good program, and we've now added a
light-weight string, inlined a few things by hand, and
worked around a GNAT 4.3.3 problem by using unbound strings.

Here's a rough speed estimate, single-core version. My Intel
quad core is similar to the benchmarking machine, just more
MHz.  knucleotide.adb runs in ~33 sec on my machine (either
GNAT 4.3.2 or GNAT 4.3.4).  I would expect it to run a
bit above 40 sec on the benchmarking machine. On the
benchmarking machine the top entries have running times of:
  20 sec, 43 sec, 48 sec, and 55 sec.
Most entries are much slower. So knucleotide.adb is
very respectible by my estimate.

Here's how I made that speed estimate.  I took the
faster-than-light (20 second) entry and compiled it
on my quad-core:

g++ -O3 -I ../boost/boost_1_36_0 -fopenmp knucleotide.c++

It ran in 16 sec on my machine, 20 sec on the benchmarking
machine. If that ratio holds, knucleotide.adb runs in
40 sec on the benchmarking machine. (By the way, I can't
make head or tails of the source code of knucleotide.c++.
It doesn't help that I don't know c++;)
It seems to be designed to make good use of OpenMP
style multiprocessing, but resemblance to the original
benchmark is somewhat obscure.  Still I am very
impressed, and I wonder how it works so well.)

Anyone who wants to report bugs or offer improvements can
find my single-core only version (knucleotide_3.adb) at

   http://web.am.qub.ac.uk/users/j.parker/

in directory  bench_depository. Georg has given
his site in the 1st post in this thread.

Jonathan



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-21 21:43                     ` jonathan
@ 2009-08-22 22:35                       ` Georg Bauhaus
  2009-08-23 22:21                         ` jonathan
  0 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-22 22:35 UTC (permalink / raw)


jonathan wrote:

>> Or if we are at it, we might experiment with slice renamings
>> or pointer artihmetic; requires a rewrite, I'd think.
>> If the rules permit this at all ;-)
> 
> I did make another attempt here.  [...]

> This essentially removed all the overhead in sliding
> array Data.

With the help of generics I have been able to further
reduce string processing load.  Instances will adapt String
to fragments, as needed.
The result is another 20% speed-up.  A pleasant surprise,
and fairly easy.  This allows using simple renames for
key data. :-)

I'd like to further merge the programs, following the
comments so far.  In particular, there doesn't seem to be
a real loss when using tasks on a single core machine,
IIUC.  I'll try to confirm this some more.
We could then submit just one program, if you agree.


Meanwhile, the latest is here,
http://home.arcor.de/bauhaus/Ada/knucleotide.string.tasking.gnat



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-22 22:35                       ` Georg Bauhaus
@ 2009-08-23 22:21                         ` jonathan
  0 siblings, 0 replies; 116+ messages in thread
From: jonathan @ 2009-08-23 22:21 UTC (permalink / raw)


On Aug 22, 11:35 pm, Georg Bauhaus <rm.tsoh.plus-
bug.bauh...@maps.futureapps.de> wrote:
> jonathan wrote:
> >> Or if we are at it, we might experiment with slice renamings
> >> or pointer artihmetic; requires a rewrite, I'd think.
> >> If the rules permit this at all ;-)
>
> > I did make another attempt here.  [...]
> > This essentially removed all the overhead in sliding
> > array Data.
>
> With the help of generics I have been able to further
> reduce string processing load.  Instances will adapt String
> to fragments, as needed.
> The result is another 20% speed-up.  A pleasant surprise,
> and fairly easy.  This allows using simple renames for
> key data. :-)
>


The program now runs in a splendid 23 sec on my
machine. For those who have not been paying
attention, we started some time ago with a program
that ran in > 72 sec.  Its now (I should think) quite
a bit faster than all but one at the benchmarking
site.

The string renaming turned out to be such a neat
trick, I'll write it out:

  Key(1 .. Length) := Buffer(I .. I + Length - 1);

(in its various forms) was replaced by

  Key : String renames Buffer(I .. I + Length - 1);

This combined with the smaller strings did the
trick.  On my machine execution fell from ~32 sec to
~23 sec from these 2 steps alone.

One useful feature of the new design is we can
explicitly "grow the hash table" as the rules
seem to specify, (although they accepted the static
sized table last time).  We now have the option
of changing Table_Size with each instantiation:

--Table_Size : constant Integer := 2 ** 17;
  Table_Size : constant Integer
           := 2 ** Integer'Min (Fragment'Last*4, 17);

  subtype Hash_Type is Integer range 0 .. Table_Size-1;

This is the form used by many other entries.  The new
table size does not change the speed in the slightest.
I prefer the static  2 ** 17 myself.


> I'd like to further merge the programs, following the
> comments so far.  In particular, there doesn't seem to be
> a real loss when using tasks on a single core machine,
> IIUC.  I'll try to confirm this some more.
> We could then submit just one program, if you agree.
>

To fuse single-core and multitasking version I tried
the following:

if Multitasking_Desired then
   Work_On_2.Writer.Set (Nucleotide_Length => 2);
   Work_On_18.Writer.Set (Fragment_18'Length, Fragment_18);
   Work_On_1.Writer.Set (Nucleotide_Length => 1);
   Work_On_12.Writer.Set (Fragment_12'Length, Fragment_12);
   Work_On_6.Writer.Set (Fragment_6'Length, Fragment_6);
   Work_On_4.Writer.Set (Fragment_4'Length, Fragment_4);
   Work_On_3.Writer.Set (Fragment_3'Length, Fragment_3);
else
   Work_On_1.Write (Nucleotide_Length => 1);
   Work_On_2.Write (Nucleotide_Length => 2);
   Work_On_3.Write (Fragment_3'Length, Fragment_3);
   Work_On_4.Write (Fragment_4'Length, Fragment_4);
   Work_On_6.Write (Fragment_6'Length, Fragment_6);
   Work_On_12.Write (Fragment_12'Length, Fragment_12);
   Work_On_18.Write (Fragment_18'Length, Fragment_18);
end if;

With Multitasking_Desired set to False, a typical
(good) timing might look like:

real    0m23.552s
user    0m23.141s
sys     0m0.404s

With Multitasking_Desired set to True, a typical
(good) timing might look like:

real    0m10.373s
user    0m23.625s
sys     0m0.432s

Fluctuations are not tiny; maybe .7 sec.
And remember, Linux in probably putting the 7 Work
tasks on 7 cores of the 8 available on my machine.

And there's a problem I can't solve.  One out of 5 to
20 runs gives this result:

real    0m15.691s
user    0m34.150s
sys     0m0.396s

Always the same ~16 seconds.
I can't tell if its a Linux hiccup, the phase of the
moon, or what, but I can't get rid of it.

That's one reason I am not sure it is a good idea
just yet to fuse the single-core version and the
tasking version. I'ld rather see the cleaner
single-core version polished up and submitted quick.

To make a task-free version I just directly
removed all the tasking from your latest. Works
well .. my versions are obsolete .. just an
option to consider.

Jonathan




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 12:21 Ada Shootout program for K-Nucleotide (patches) Georg Bauhaus
                   ` (6 preceding siblings ...)
  2009-08-16 15:20 ` jonathan
@ 2009-08-27  9:05 ` Martin
  2009-08-27  9:08   ` Martin
  2009-10-07 11:44 ` Olivier Scalbert
  8 siblings, 1 reply; 116+ messages in thread
From: Martin @ 2009-08-27  9:05 UTC (permalink / raw)


Just tried to build the new version from the shootout site and it
doesn't build:

46:16     warning: unreachable code

in Data_Input.Read_Section.

And sure enough I can't see how you get out the outer loop...

Cheers
-- Martin



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-27  9:05 ` Martin
@ 2009-08-27  9:08   ` Martin
  2009-08-27 10:01     ` Georg Bauhaus
  0 siblings, 1 reply; 116+ messages in thread
From: Martin @ 2009-08-27  9:08 UTC (permalink / raw)


On Aug 27, 10:05 am, Martin <martin.do...@btopenworld.com> wrote:
> Just tried to build the new version from the shootout site and it
> doesn't build:
>
> 46:16     warning: unreachable code
>
> in Data_Input.Read_Section.
>
> And sure enough I can't see how you get out the outer loop...
>
> Cheers
> -- Martin

Sorry, it does build - it's a warning. My question is - if you can't
get out of the loop (except by exception) what are these lines doing
(other than making the SLOC metric very slightly worse)?

Cheers
-- Martin



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-27  9:08   ` Martin
@ 2009-08-27 10:01     ` Georg Bauhaus
  0 siblings, 0 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-27 10:01 UTC (permalink / raw)


Martin schrieb:
> On Aug 27, 10:05 am, Martin <martin.do...@btopenworld.com> wrote:
>> Just tried to build the new version from the shootout site and it
>> doesn't build:
>>
>> 46:16     warning: unreachable code
>>
>> in Data_Input.Read_Section.
>>
>> And sure enough I can't see how you get out the outer loop...
>>
>> Cheers
>> -- Martin
> 
> Sorry, it does build - it's a warning. My question is - if you can't
> get out of the loop (except by exception) what are these lines doing
> (other than making the SLOC metric very slightly worse)?


They read data into the package global buffer which is
then used for producing the result of the Read function.  The loop
(and all) is odd indeed but "justified" by the fact that
Skip_To_Section reads until the third (and last) section begins.
Read_Section starts there.  A second "justification" is
that Read_Section is a dumb rewrite of a previously recursive
routine that concatenated strings (thereby hitting stack
limits, no TCE, and making the test fail at the site).

I had seen that warning, too, but other things being
more pressing...
We are in the process of preparing the multitasking version
for submission.  How should we change Read_Section?



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-02 12:55   ` Georg Bauhaus
@ 2009-08-27 11:17     ` Ole-Hjalmar Kristensen
  2009-08-27 12:25       ` Georg Bauhaus
  0 siblings, 1 reply; 116+ messages in thread
From: Ole-Hjalmar Kristensen @ 2009-08-27 11:17 UTC (permalink / raw)


Character'read could very well be slower if there is no buffering
involved. The way to read efficiently is to read a large buffer
(multiple of disk block size) and chop it into lines yourself.

-- 
   C++: The power, elegance and simplicity of a hand grenade.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-27 11:17     ` Ole-Hjalmar Kristensen
@ 2009-08-27 12:25       ` Georg Bauhaus
  2009-09-01  8:06         ` Ole-Hjalmar Kristensen
  0 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-08-27 12:25 UTC (permalink / raw)


Ole-Hjalmar Kristensen schrieb:
> Character'read could very well be slower if there is no buffering
> involved. The way to read efficiently is to read a large buffer
> (multiple of disk block size) and chop it into lines yourself.
> 

Indeed, stream attributes won't help.

However, replacing Text_IO.Get_Line---which we use now---
might still be considered controversial:

- Is "chopping" an acceptable interpretation of "read line by line"?

- writing a correct double buffering scheme (or a buffer
  with end-of-buffer triggers for Read_A_Line_From_Buffer)
  is still a bit of work, unless there is something we could
  copy.

A package containing a quick getline will definitely be
useful in general Unix programming, I should think.
What is being used in the Socket packages, BTW?

Equally useful will be a package for fast unbounded strings.
Replace_Slice is what prevents showcasing GNAT's Spitbol
pattern matching as one of the fastest things on this planet!



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-27 12:25       ` Georg Bauhaus
@ 2009-09-01  8:06         ` Ole-Hjalmar Kristensen
  2009-09-01  9:29           ` Georg Bauhaus
  2009-09-01 10:16           ` Ole-Hjalmar Kristensen
  0 siblings, 2 replies; 116+ messages in thread
From: Ole-Hjalmar Kristensen @ 2009-09-01  8:06 UTC (permalink / raw)


I see the C++ version uses fgets, which is a reasonably fast way of
getting a line. Is there any reason not to use fgets?
fgets (OpenSolaris version) uses memccpy to actually copy the buffer
until the newline:

     43 extern int _filbuf();
     44 extern char *memccpy();
     45 
     46 char *
     47 fgets(ptr, size, iop)
     48 char *ptr;
     49 register int size;
     50 register FILE *iop;
     51 {
     52 	char *p, *ptr0 = ptr;
     53 	register int n;
     54 
     55 	if ( !(iop->_flag & (_IOREAD|_IORW)) ) {
     56 		iop->_flag |= _IOERR;
     57 		return (NULL);
     58 	}
     59 
     60 	for (size--; size > 0; size -= n) {
     61 		if (iop->_cnt <= 0) { /* empty buffer */
     62 			if (_filbuf(iop) == EOF) {
     63 				if (ptr0 == ptr)
     64 					return (NULL);
     65 				break; /* no more data */
     66 			}
     67 			iop->_ptr--;
     68 			iop->_cnt++;
     69 		}
     70 		n = MIN(size, iop->_cnt);
     71 		if ((p = memccpy(ptr, (char *) iop->_ptr, '\n', n)) != NULL)
     72 			n = p - ptr;
     73 		ptr += n;
     74 		iop->_cnt -= n;
     75 		iop->_ptr += n;
     76 		_BUFSYNC(iop);
     77 		if (p != NULL)
     78 			break; /* found '\n' in buffer */
     79 	}
     80 	*ptr = '\0';
     81 	return (ptr0);
     82 }
>>>>> "GB" == Georg Bauhaus <rm.dash-bauhaus@futureapps.de> writes:

    GB> Ole-Hjalmar Kristensen schrieb:
    >> Character'read could very well be slower if there is no buffering
    >> involved. The way to read efficiently is to read a large buffer
    >> (multiple of disk block size) and chop it into lines yourself.
    >> 

    GB> Indeed, stream attributes won't help.

    GB> However, replacing Text_IO.Get_Line---which we use now---
    GB> might still be considered controversial:

    GB> - Is "chopping" an acceptable interpretation of "read line by line"?

    GB> - writing a correct double buffering scheme (or a buffer
    GB>   with end-of-buffer triggers for Read_A_Line_From_Buffer)
    GB>   is still a bit of work, unless there is something we could
    GB>   copy.

    GB> A package containing a quick getline will definitely be
    GB> useful in general Unix programming, I should think.
    GB> What is being used in the Socket packages, BTW?

    GB> Equally useful will be a package for fast unbounded strings.
    GB> Replace_Slice is what prevents showcasing GNAT's Spitbol
    GB> pattern matching as one of the fastest things on this planet!

-- 
   C++: The power, elegance and simplicity of a hand grenade.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-01  8:06         ` Ole-Hjalmar Kristensen
@ 2009-09-01  9:29           ` Georg Bauhaus
  2009-09-01 10:48             ` Ole-Hjalmar Kristensen
  2009-09-01 10:16           ` Ole-Hjalmar Kristensen
  1 sibling, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-01  9:29 UTC (permalink / raw)


Ole-Hjalmar Kristensen schrieb:
> I see the C++ version uses fgets, which is a reasonably fast way of
> getting a line. Is there any reason not to use fgets?
> fgets (OpenSolaris version) uses memccpy to actually copy the buffer
> until the newline:

There were two reasons not to import fgets:

1 - fgets turns out not to be faster than Stream_IO.
2 - It's C. ;-)


The fasta entry at the Shootout which uses Unchecked_Conversion,
not 'Address could even be improved a small bit if we change
it to use 'Address.  But then, this adds to the line count.
Is it worth it, for a shorter program?

There was a discussion leading to this
(<4a7b4daa$0$31333$9b4e6d93@newsspool4.arcor-online.net>)
early in August.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-01  8:06         ` Ole-Hjalmar Kristensen
  2009-09-01  9:29           ` Georg Bauhaus
@ 2009-09-01 10:16           ` Ole-Hjalmar Kristensen
  2009-09-01 11:34             ` Georg Bauhaus
  1 sibling, 1 reply; 116+ messages in thread
From: Ole-Hjalmar Kristensen @ 2009-09-01 10:16 UTC (permalink / raw)


Or you could use the Unix read system call and roll your own get_line function
along these lines. It will read 1000000 lines of 60 characters line by line in about a second.

with Interfaces.C_Streams; use Interfaces.C_Streams;
with Ada.Text_Io;

procedure Get_Line_Test is
   In_Buf : String(1..8192);
   for In_Buf'Alignment use 8192;

   First: Integer := In_Buf'last;
   Last: Integer := 0;
   N : Size_T := 1;

   function Read(Fd : Int; Buffer : Voids; Nbyte : Size_T) return Size_T;
   pragma Import(C,read);

   function Empty_Buffer return Boolean is
   begin
      return First > Last;
   end Empty_Buffer;
   pragma  Inline(Empty_Buffer);

   function End_File return Boolean is
   begin
      return N <= 0;
   end End_File;
   pragma  Inline(End_File);
   
   -- Get line from stdin, return "" if end of file
   function Get_Line return String is
   begin
      -- Get buffer from file if empty
      if Empty_buffer then
         N := Read(0,In_Buf(1)'Address, Size_T(In_Buf'Last));
         First := 1;
         Last := Integer(N);
      end if;
      if Empty_Buffer then
         return "";
      end if;

      -- Look for end of line and return it
      for I in First..Last loop
         if In_Buf(I) = Ascii.LF then
            declare
               Start : Integer := First;
            begin
               First := I + 1;
               return In_Buf(Start..I-1);
            end;
         end if;
      end loop;

      -- Did not find end of line, seek further
      declare
         Copy: string := In_Buf(First..Last);
      begin
         Last := 0;
         return Copy & Get_Line;
      end;
   end Get_Line;
   pragma Inline(Get_Line);

begin
   while not End_File loop
      declare
         Line : String := Get_Line;
      begin
         -- Do something with line here
         null;
      end;
   end loop;
end Get_Line_Test;

astra06:~/ada/div-ada> uname -a
SunOS astra06 5.10 Generic_137137-09 sun4u sparc SUNW,Sun-Fire-V440
astra06:~/ada/div-ada> ls -l x.txt
-rw-------   1 ok145024 clustra  60000000 Sep  1 12:03 x.txt
astra06:~/ada/div-ada> time ./get_line_test < x.txt
0.49u 0.09s 0:00.59 98.3%

-- 
   C++: The power, elegance and simplicity of a hand grenade.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-01  9:29           ` Georg Bauhaus
@ 2009-09-01 10:48             ` Ole-Hjalmar Kristensen
  0 siblings, 0 replies; 116+ messages in thread
From: Ole-Hjalmar Kristensen @ 2009-09-01 10:48 UTC (permalink / raw)


>>>>> "GB" == Georg Bauhaus <rm.dash-bauhaus@futureapps.de> writes:
OK, I see. In GNAT, the stream_io is a relatively thin layer on top of
the C streams, isn't it? So you will get essentially the same speed as
the C standard IO library. But does Stream_Io have a version of
get_line? I know the C streams interface has a binding to fgets, but
that is just using fgets directly. How did you read line by line with
Stream_Io?

The only way I can think of to improve on fgets is to get
rid of the copying, but would you then strictly be reading line by
line?

I am thinking of something like 

procedure get_line(buffer : in out string; first : out natural; last :out natural);

This get_line would only need to copy when it encounters a string which
extends past the end of the buffer, in the normal case it would only
return the indices of the slice.

    GB> Ole-Hjalmar Kristensen schrieb:
    >> I see the C++ version uses fgets, which is a reasonably fast way of
    >> getting a line. Is there any reason not to use fgets?
    >> fgets (OpenSolaris version) uses memccpy to actually copy the buffer
    >> until the newline:

    GB> There were two reasons not to import fgets:

    GB> 1 - fgets turns out not to be faster than Stream_IO.
    GB> 2 - It's C. ;-)


    GB> The fasta entry at the Shootout which uses Unchecked_Conversion,
    GB> not 'Address could even be improved a small bit if we change
    GB> it to use 'Address.  But then, this adds to the line count.
    GB> Is it worth it, for a shorter program?

    GB> There was a discussion leading to this
    GB> (<4a7b4daa$0$31333$9b4e6d93@newsspool4.arcor-online.net>)
    GB> early in August.

-- 
   C++: The power, elegance and simplicity of a hand grenade.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-01 10:16           ` Ole-Hjalmar Kristensen
@ 2009-09-01 11:34             ` Georg Bauhaus
  2009-09-01 12:11               ` Ole-Hjalmar Kristensen
  0 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-01 11:34 UTC (permalink / raw)


Ole-Hjalmar Kristensen schrieb:

> astra06:~/ada/div-ada> uname -a
> SunOS astra06 5.10 Generic_137137-09 sun4u sparc SUNW,Sun-Fire-V440
> astra06:~/ada/div-ada> ls -l x.txt
> -rw-------   1 ok145024 clustra  60000000 Sep  1 12:03 x.txt
> astra06:~/ada/div-ada> time ./get_line_test < x.txt
> 0.49u 0.09s 0:00.59 98.3%
> 


Comparing the two programs, they are on a par, almost
exactly.

However, yours is IMHO less messy. It also has

        return Copy & Get_Line;

which, I guess, solves the problem of long lines, even
at the expense of copying.

Can we reduce the copying overhead?
The copying would be have O(f(Item'Length)) with
f(n) ~= s*(s+1)/2 where s = n/BUFSIZ, so it is in O(n**2).
We could, I think, use a storage pool:

- whenever copying is needed, allocate another buffer
  right next to the current buffer. Use the newly allocated
  buffer for the recursive call.

- on reaching of line/file, reinterpret the sequence of
  buffers in the storage pool as a String. Finally, Free.

Can this be done?

(This won't make a Shootout program any smaller, but
seems good for the general case...)



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-01 11:34             ` Georg Bauhaus
@ 2009-09-01 12:11               ` Ole-Hjalmar Kristensen
  2009-09-01 14:36                 ` Georg Bauhaus
  2009-09-02  8:44                 ` Georg Bauhaus
  0 siblings, 2 replies; 116+ messages in thread
From: Ole-Hjalmar Kristensen @ 2009-09-01 12:11 UTC (permalink / raw)


>>>>> "GB" == Georg Bauhaus <rm.dash-bauhaus@futureapps.de> writes:

I think the problem is not the explicit copy which happens when we hit
the end of the buffer (easily verified, just increase the buffer size
10 times, the copying will happen 1/10 of the time, but execution time
stays the same), but the implicit copy which seems to happen in the statement
return In_Buf(Start..I-1); I do not know if this copy is unavoidable,
or if the compiler could have optimized it away.

The idea of using a storage pool is an interesting one, since it could
allow you to avoid copying when you want to return a string from a
function. But how would you overlay the string on top of the read
buffer? An Ada string is not a C string, you would need to put the
descriptor somewhere as well.

    GB> Ole-Hjalmar Kristensen schrieb:
    >> astra06:~/ada/div-ada> uname -a
    >> SunOS astra06 5.10 Generic_137137-09 sun4u sparc SUNW,Sun-Fire-V440
    >> astra06:~/ada/div-ada> ls -l x.txt
    >> -rw-------   1 ok145024 clustra  60000000 Sep  1 12:03 x.txt
    >> astra06:~/ada/div-ada> time ./get_line_test < x.txt
    >> 0.49u 0.09s 0:00.59 98.3%
    >> 


    GB> Comparing the two programs, they are on a par, almost
    GB> exactly.

    GB> However, yours is IMHO less messy. It also has

    GB>         return Copy & Get_Line;

    GB> which, I guess, solves the problem of long lines, even
    GB> at the expense of copying.

    GB> Can we reduce the copying overhead?
    GB> The copying would be have O(f(Item'Length)) with
    GB> f(n) ~= s*(s+1)/2 where s = n/BUFSIZ, so it is in O(n**2).
    GB> We could, I think, use a storage pool:

    GB> - whenever copying is needed, allocate another buffer
    GB>   right next to the current buffer. Use the newly allocated
    GB>   buffer for the recursive call.

    GB> - on reaching of line/file, reinterpret the sequence of
    GB>   buffers in the storage pool as a String. Finally, Free.

    GB> Can this be done?

    GB> (This won't make a Shootout program any smaller, but
    GB> seems good for the general case...)

-- 
   C++: The power, elegance and simplicity of a hand grenade.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-01 12:11               ` Ole-Hjalmar Kristensen
@ 2009-09-01 14:36                 ` Georg Bauhaus
  2009-09-03  8:49                   ` Ole-Hjalmar Kristensen
  2009-09-02  8:44                 ` Georg Bauhaus
  1 sibling, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-01 14:36 UTC (permalink / raw)


Ole-Hjalmar Kristensen schrieb:
>>>>>> "GB" == Georg Bauhaus <rm.dash-bauhaus@futureapps.de> writes:
> 
> I think the problem is not the explicit copy which happens when we hit
> the end of the buffer (easily verified, just increase the buffer size
> 10 times, the copying will happen 1/10 of the time, but execution time
> stays the same), but the implicit copy which seems to happen in the statement
> return In_Buf(Start..I-1); I do not know if this copy is unavoidable,
> or if the compiler could have optimized it away.

WRT return In_Buf(Start..I-1), there seems to be a way to have the
bounds shift towards 1 .. Result'Length without copying:

with Ada.Text_IO;
procedure Bnds is
   function S return String is
   begin
      return String'(10 .. 20 => '*');
   end S;

   X : constant String := S;
begin
   Ada.Text_IO.Put_Line(Positive'Image(X'First));
   declare
      subtype XString is String(1 .. X'Length);
      Z : XString;
      pragma Import (Ada, Z);
      for Z'Address use X'Address;
   begin
      Ada.Text_IO.Put_Line(Positive'Image(Z'First));
   end;
end Bnds;

Is this "phantom sliding" accidental or can it be relied upon?


> The idea of using a storage pool is an interesting one, since it could
> allow you to avoid copying when you want to return a string from a
> function. But how would you overlay the string on top of the read
> buffer? An Ada string is not a C string, you would need to put the
> descriptor somewhere as well.

I'll reread "Accessing Memory as a String",
http://adapower.com/adapower1/lang/accessmem.html

Maybe this leads somewhere regarding this question.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-01 12:11               ` Ole-Hjalmar Kristensen
  2009-09-01 14:36                 ` Georg Bauhaus
@ 2009-09-02  8:44                 ` Georg Bauhaus
  2009-09-02 11:07                   ` Ole-Hjalmar Kristensen
  2009-09-03  8:13                   ` Ole-Hjalmar Kristensen
  1 sibling, 2 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-02  8:44 UTC (permalink / raw)


Ole-Hjalmar Kristensen wrote:
>>>>>> "GB" == Georg Bauhaus <rm.dash-bauhaus@futureapps.de> writes:
> 
> I think the problem is not the explicit copy which happens when we hit
> the end of the buffer (easily verified, just increase the buffer size
> 10 times, the copying will happen 1/10 of the time, but execution time
> stays the same), but the implicit copy which seems to happen in the statement
> return In_Buf(Start..I-1); I do not know if this copy is unavoidable,
> or if the compiler could have optimized it away.

It looks like copying In_Buf(Start..I-1) be circumvented.
I have added a solution to Line_IO.

read() performance vs. Stream_IO.Read performance seems
to depend on the OS.  On a Mac here, read() is faster.
On Ubuntu, Stream_IO.Read is faster.

There is an issue with GNATs and End_File (GNATs other
than I guess the one you have). When comparing N <= 0,
``operator for type "System.Crtl.size_t" is not
directly visible''.  To work around this, I wrote

   function End_File return Boolean is
      type Size_T is mod 2**Standard'Address_Size;
      N0 : constant Size_T := Size_T(N);
      Zero : constant Size_T := 0;
   begin
      return N0 <= Zero;
   end End_File;

Finally, I think Get_Line_Test can be made more portable among Unix
compilers by changing GNAT specific Interfaces.C_Streams
into Interfaces.C; Could be slightly less convenient to
write, though?



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-02  8:44                 ` Georg Bauhaus
@ 2009-09-02 11:07                   ` Ole-Hjalmar Kristensen
  2009-09-03  8:13                   ` Ole-Hjalmar Kristensen
  1 sibling, 0 replies; 116+ messages in thread
From: Ole-Hjalmar Kristensen @ 2009-09-02 11:07 UTC (permalink / raw)


>>>>> "GB" == Georg Bauhaus <rm.tsoh.plus-bug.bauhaus@maps.futureapps.de> writes:

    GB> Ole-Hjalmar Kristensen wrote:
    >>>>>>> "GB" == Georg Bauhaus <rm.dash-bauhaus@futureapps.de> writes:
    >> 
    >> I think the problem is not the explicit copy which happens when we hit
    >> the end of the buffer (easily verified, just increase the buffer size
    >> 10 times, the copying will happen 1/10 of the time, but execution time
    >> stays the same), but the implicit copy which seems to happen in the statement
    >> return In_Buf(Start..I-1); I do not know if this copy is unavoidable,
    >> or if the compiler could have optimized it away.

    GB> It looks like copying In_Buf(Start..I-1) be circumvented.
    GB> I have added a solution to Line_IO.

    GB> read() performance vs. Stream_IO.Read performance seems
    GB> to depend on the OS.  On a Mac here, read() is faster.
    GB> On Ubuntu, Stream_IO.Read is faster.

Strange. I cannot really imagine how Stream_IO.Read is faster since it has to
do the system call to do the actual read anyway. One possibility is
that In_Buf is not properly aligned, so that read() into In_Buf is
slower than the equivalent read() into the C stdio buffers. The system
call should be the same in any case. 

    GB> There is an issue with GNATs and End_File (GNATs other
    GB> than I guess the one you have). When comparing N <= 0,
    GB> ``operator for type "System.Crtl.size_t" is not
    GB> directly visible''.  To work around this, I wrote

    GB>    function End_File return Boolean is
    GB>       type Size_T is mod 2**Standard'Address_Size;
    GB>       N0 : constant Size_T := Size_T(N);
    GB>       Zero : constant Size_T := 0;
    GB>    begin
    GB>       return N0 <= Zero;
    GB>    end End_File;

Yes. I was using an old version of GNAT.

    GB> Finally, I think Get_Line_Test can be made more portable among Unix
    GB> compilers by changing GNAT specific Interfaces.C_Streams
    GB> into Interfaces.C; Could be slightly less convenient to
    GB> write, though?

I only included Interfaces.C_Streams because it was a convenient place
to get hold of the types needed to import the read() function, so if
you find them in Interfaces.C, that's just fine.

-- 
   C++: The power, elegance and simplicity of a hand grenade.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-02  8:44                 ` Georg Bauhaus
  2009-09-02 11:07                   ` Ole-Hjalmar Kristensen
@ 2009-09-03  8:13                   ` Ole-Hjalmar Kristensen
  2009-09-03 10:39                     ` Georg Bauhaus
  1 sibling, 1 reply; 116+ messages in thread
From: Ole-Hjalmar Kristensen @ 2009-09-03  8:13 UTC (permalink / raw)


>>>>> "GB" == Georg Bauhaus <rm.tsoh.plus-bug.bauhaus@maps.futureapps.de> writes:

    GB> Ole-Hjalmar Kristensen wrote:
    >>>>>>> "GB" == Georg Bauhaus <rm.dash-bauhaus@futureapps.de> writes:
    >> 
    >> I think the problem is not the explicit copy which happens when we hit
    >> the end of the buffer (easily verified, just increase the buffer size
    >> 10 times, the copying will happen 1/10 of the time, but execution time
    >> stays the same), but the implicit copy which seems to happen in the statement
    >> return In_Buf(Start..I-1); I do not know if this copy is unavoidable,
    >> or if the compiler could have optimized it away.

    GB> It looks like copying In_Buf(Start..I-1) be circumvented.
    GB> I have added a solution to Line_IO.

How did you avoid the copying?

-- 
   C++: The power, elegance and simplicity of a hand grenade.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-01 14:36                 ` Georg Bauhaus
@ 2009-09-03  8:49                   ` Ole-Hjalmar Kristensen
  0 siblings, 0 replies; 116+ messages in thread
From: Ole-Hjalmar Kristensen @ 2009-09-03  8:49 UTC (permalink / raw)


Just another thought: Would reading the string into a large enough
buffer and just return a string descriptor (pointer and bounds) to the line
be considered cheating? That way, you could avoid ever copying
strings, just copy the whole of stdin to a large buffer and return the
lines as descriptors.

-- 
   C++: The power, elegance and simplicity of a hand grenade.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-03  8:13                   ` Ole-Hjalmar Kristensen
@ 2009-09-03 10:39                     ` Georg Bauhaus
  0 siblings, 0 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-03 10:39 UTC (permalink / raw)


Ole-Hjalmar Kristensen schrieb:
>>>>>> "GB" == Georg Bauhaus <rm.tsoh.plus-bug.bauhaus@maps.futureapps.de> writes:

>     GB> It looks like copying In_Buf(Start..I-1) be circumvented.
>     GB> I have added a solution to Line_IO.
> 
> How did you avoid the copying?

Only in the non-recursive case, and only "inside" the subprogram, by
renaming the slice in question and then either

- create an overlay where some string object (of a subtype
  with the needed bounds)   uses the renamed slice
  as storage.  (Perhaps use Unchecked_Conversion if Randy's
  comments on Gem #39 suggest it.)  Or,

- convert said renamed slice to a string subtype with
  proper bounds.

I'm still struggling (in spare free time) with long lines
with or without terminators, proper End_Error (to mimick
Text_IO.Get_Line's behavior), terminal I/O, and recursion.


(An interesting test case (from a still open Python
bug) is this:

$ cat | ./line_reading_program
asdf
^D

$ ./line_reading_program
asdf
^D



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-02 11:39   ` Georg Bauhaus
  2009-08-02 16:00     ` jonathan
@ 2009-09-03 13:44     ` Olivier Scalbert
  2009-09-03 14:08       ` Ludovic Brenta
  2009-09-03 15:28       ` Georg Bauhaus
  1 sibling, 2 replies; 116+ messages in thread
From: Olivier Scalbert @ 2009-09-03 13:44 UTC (permalink / raw)


Georg Bauhaus wrote:
> jonathan wrote:
>> I forgot to add I used:
>>
>>   gnatmake -gnatnp -O3 -funroll-all-loops -march=native knucleotide
>>
>> with a running time of (usually)  7.25 sec on the 25Meg data file:
>>
>>     time ./knucleotide < fasta25.dat
> 
> Thanks for running the program, glad to hear how they work.
> 
> Meanwhile I have a tasking version which runs a little (25%)
> faster.  The two new changes are
> 
> 1 - make 3 additional tasks call Write; round robin
> 
> 2 - order the results using a protected Printer object
>  with entry families.  Heck, these entry families have clever
>  features! :-)
> 
> 
> http://home.arcor.de/bauhaus/Ada/knucleotide.tasking.gnat

Hi,

Same result for me !

One question:
I have tried to profile knucleotide, so I have recompile it after 
removing all the inlines and optimization:

gnatmake -gnatNp -march=native -g -pg -f knucleotide.adb -o 
knucleotide.gnat_run

After running it with the 25M file, I have:
$ gprof ./knucleotide.gnat_run -b
Flat profile:

Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls   s/call   s/call  name
100.00     19.30    19.30        1    19.30    19.30  _ada_knucleotide
   0.00     19.30     0.00        1     0.00     0.00  adainit


			Call graph


granularity: each sample hit covers 4 byte(s) for 0.05% of 19.30 seconds

index % time    self  children    called     name
                                                  <spontaneous>
[1]    100.0    0.00   19.30                 main [1]
                19.30    0.00       1/1           _ada_knucleotide [2]
                 0.00    0.00       1/1           adainit [3]
-----------------------------------------------
                              279444692             _ada_knucleotide [2]
                19.30    0.00       1/1           main [1]
[2]    100.0   19.30    0.00       1+279444692 _ada_knucleotide [2]
                              279444692             _ada_knucleotide [2]
-----------------------------------------------
                 0.00    0.00       1/1           main [1]
[3]      0.0    0.00    0.00       1         adainit [3]
-----------------------------------------------


Index by function name

    [2] _ada_knucleotide        [3] adainit

Why I do not see all the different functions and procedures ? Should I 
miss something ?

Thanks,

Olivier



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-03 13:44     ` Olivier Scalbert
@ 2009-09-03 14:08       ` Ludovic Brenta
  2009-09-03 15:13         ` Olivier Scalbert
  2009-09-03 15:28       ` Georg Bauhaus
  1 sibling, 1 reply; 116+ messages in thread
From: Ludovic Brenta @ 2009-09-03 14:08 UTC (permalink / raw)


Olivier Scalbert wrote on comp.lang.ada:
> gnatmake -gnatNp -march=native -g -pg -f knucleotide.adb -o
> knucleotide.gnat_run
>
> After running it with the 25M file, I have:
> $ gprof ./knucleotide.gnat_run -b
> Flat profile:
>
> Each sample counts as 0.01 seconds.
>    %   cumulative   self              self     total
>   time   seconds   seconds    calls   s/call   s/call  name
> 100.00     19.30    19.30        1    19.30    19.30  _ada_knucleotide
>    0.00     19.30     0.00        1     0.00     0.00  adainit
[...]
> Why I do not see all the different functions and procedures ? Should I
> miss something ?

You enabled front-end inlining with -gnatN; GNAT turned the whole
program into one procedure. Even then, it should be possible to run
the program under valgrind's callgrind tool to get accurate, per-line
(indeed per-instruction) execution costs. I don't know whether gprof
has such granularity or not.

--
Ludovic Brenta.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-03 14:08       ` Ludovic Brenta
@ 2009-09-03 15:13         ` Olivier Scalbert
  2009-09-03 19:11           ` sjw
  0 siblings, 1 reply; 116+ messages in thread
From: Olivier Scalbert @ 2009-09-03 15:13 UTC (permalink / raw)


Ludovic Brenta wrote:

> You enabled front-end inlining with -gnatN; GNAT turned the whole
> program into one procedure. Even then, it should be possible to run
> the program under valgrind's callgrind tool to get accurate, per-line
> (indeed per-instruction) execution costs. I don't know whether gprof
> has such granularity or not.
> 
> --
> Ludovic Brenta.

Thanks Ludovic.

I will try callgrind tomorrow.

Anyway with:
$ gnatmake -pg -f knucleotide.adb -o knucleotide.gnat_run

I still have:
$ gprof -b ./knucleotide.gnat_run
Flat profile:

Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls   s/call   s/call  name
100.00     24.30    24.30        1    24.30    24.30  _ada_knucleotide
   0.00     24.30     0.00        1     0.00     0.00  adainit


			Call graph


granularity: each sample hit covers 4 byte(s) for 0.04% of 24.30 seconds

index % time    self  children    called     name
                                                  <spontaneous>
[1]    100.0    0.00   24.30                 main [1]
                24.30    0.00       1/1           _ada_knucleotide [2]
                 0.00    0.00       1/1           adainit [3]
-----------------------------------------------
                              306009664             _ada_knucleotide [2]
                24.30    0.00       1/1           main [1]
[2]    100.0   24.30    0.00       1+306009664 _ada_knucleotide [2]
                              306009664             _ada_knucleotide [2]
-----------------------------------------------
                 0.00    0.00       1/1           main [1]
[3]      0.0    0.00    0.00       1         adainit [3]
-----------------------------------------------


Index by function name

    [2] _ada_knucleotide        [3] adainit

So, no more details yet ...

Olivier



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-03 13:44     ` Olivier Scalbert
  2009-09-03 14:08       ` Ludovic Brenta
@ 2009-09-03 15:28       ` Georg Bauhaus
  2009-09-04  1:24         ` Georg Bauhaus
  1 sibling, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-03 15:28 UTC (permalink / raw)


Olivier Scalbert schrieb:

> One question:
> I have tried to profile knucleotide, so I have recompile it after
> removing all the inlines and optimization:

Thanks for doing so.  Ludovic has mentioned -gnatN.

If all goes well I will put a new multitasking
version at the given address later this evening.
This will include all new patches we have collected
so far.

In order for gprof to show something that makes sense
(to me, at least), it seems a good idea to use an alphabetically
named Fragments."=" function.

package Fragments ...

   function Equals (Left, Right: Fragment) return Boolean is ...
   function "=" (Left, Right: Fragment) return Boolean
     renames Equals;

Then, use Equals in the actual parameter list for the table
generic.  I have used the single task program
the Shootout collection.

$ gnatmake -gnato -march=native -g -pg \
  -f knucleotide.adb -o knucleotide.gnat_run

Maybe without -gnato, and gradually increasing optimization.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-03 15:13         ` Olivier Scalbert
@ 2009-09-03 19:11           ` sjw
  2009-09-04  6:11             ` Olivier Scalbert
  0 siblings, 1 reply; 116+ messages in thread
From: sjw @ 2009-09-03 19:11 UTC (permalink / raw)


On Sep 3, 4:13 pm, Olivier Scalbert <olivier.scalb...@algosyn.com>
wrote:
> Ludovic Brenta wrote:
> > You enabled front-end inlining with -gnatN; GNAT turned the whole
> > program into one procedure. Even then, it should be possible to run
> > the program under valgrind's callgrind tool to get accurate, per-line
> > (indeed per-instruction) execution costs. I don't know whether gprof
> > has such granularity or not.
..
> Anyway with:
> $ gnatmake -pg -f knucleotide.adb -o knucleotide.gnat_run

You need -ftest-coverage -fprofile-arcs for gprof to give per-line
coverage. Though not sure what it will do with really massive
inlining! (our experience on PowerPC has been that inlining usually
makes things worse - but that's for a large program with much logic,
little maths)



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-03 15:28       ` Georg Bauhaus
@ 2009-09-04  1:24         ` Georg Bauhaus
  0 siblings, 0 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-04  1:24 UTC (permalink / raw)


Georg Bauhaus wrote:

> If all goes well I will put a new multitasking
> version at the given address later this evening.

It went almost well.  Here is a new single tasking
version which incorporates many if not all of the
latest patches, including a new Line_IO.

Interesting to play with, for example, is

 Bytes_Per_Word : constant := ? ;

in KNucleotide's String_Fragments, where ? = 4 or ? = 8.
Seems to make a difference in some environments.

To see for yourself you would need these sources:
http://home.arcor.de/bauhaus/Ada/knucleotide.single.gnat
http://home.arcor.de/bauhaus/Ada/line_io.ada

(The final Line_IO for K-Nucleotide will be with a null
Write implementation only as Write isn't used. Should
be a few lines shorter, then.)



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-03 19:11           ` sjw
@ 2009-09-04  6:11             ` Olivier Scalbert
  2009-09-04  8:18               ` Ludovic Brenta
  2009-09-04  9:27               ` Georg Bauhaus
  0 siblings, 2 replies; 116+ messages in thread
From: Olivier Scalbert @ 2009-09-04  6:11 UTC (permalink / raw)


sjw wrote:
> On Sep 3, 4:13 pm, Olivier Scalbert <olivier.scalb...@algosyn.com>
> wrote:
>> Ludovic Brenta wrote:
>>> You enabled front-end inlining with -gnatN; GNAT turned the whole
>>> program into one procedure. Even then, it should be possible to run
>>> the program under valgrind's callgrind tool to get accurate, per-line
>>> (indeed per-instruction) execution costs. I don't know whether gprof
>>> has such granularity or not.
> ..
>> Anyway with:
>> $ gnatmake -pg -f knucleotide.adb -o knucleotide.gnat_run
> 
> You need -ftest-coverage -fprofile-arcs for gprof to give per-line
> coverage. Though not sure what it will do with really massive
> inlining! (our experience on PowerPC has been that inlining usually
> makes things worse - but that's for a large program with much logic,
> little maths)

Thanks Simon.
But same result !
$ gnat -version
GNAT 4.3.3
Copyright 1996-2007, Free Software Foundation, Inc.

$ gnatmake -pg -f -ftest-coverage -fprofile-arcs knucleotide.adb -o 
knucleotide.gnat_run

$ ./knucleotide.gnat_run < fasta/fasta25m.dat
$ gprof -b ./knucleotide.gnat_run

Flat profile:

Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls   s/call   s/call  name
100.00    114.75   114.75        1   114.75   114.75  _ada_knucleotide
   0.00    114.75     0.00        1     0.00     0.00  adainit


			Call graph


granularity: each sample hit covers 4 byte(s) for 0.01% of 114.75 seconds

index % time    self  children    called     name
                                                  <spontaneous>
[1]    100.0    0.00  114.75                 main [1]
               114.75    0.00       1/1           _ada_knucleotide [2]
                 0.00    0.00       1/1           adainit [3]
-----------------------------------------------
                              460000417             _ada_knucleotide [2]
               114.75    0.00       1/1           main [1]
[2]    100.0  114.75    0.00       1+460000417 _ada_knucleotide [2]
                              460000417             _ada_knucleotide [2]
-----------------------------------------------
                 0.00    0.00       1/1           main [1]
[3]      0.0    0.00    0.00       1         adainit [3]
-----------------------------------------------


Index by function name

    [2] _ada_knucleotide        [3] adainit

Olivier



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04  6:11             ` Olivier Scalbert
@ 2009-09-04  8:18               ` Ludovic Brenta
  2009-09-04 10:17                 ` Olivier Scalbert
  2009-09-04  9:27               ` Georg Bauhaus
  1 sibling, 1 reply; 116+ messages in thread
From: Ludovic Brenta @ 2009-09-04  8:18 UTC (permalink / raw)


I cannot help more with gprof as I've never used it before but it
seems to me that profiling the unoptimized program is pointless. It is
much better to profile the fully optimized and inlined program; for
this I still recommend valgrind because it gives accurate measurements
for every instruction.

--
Ludovic Brenta.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04  6:11             ` Olivier Scalbert
  2009-09-04  8:18               ` Ludovic Brenta
@ 2009-09-04  9:27               ` Georg Bauhaus
  1 sibling, 0 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-04  9:27 UTC (permalink / raw)


Olivier Scalbert schrieb:

> $ ./knucleotide.gnat_run < fasta/fasta25m.dat
> $ gprof -b ./knucleotide.gnat_run
> 
> Flat profile:
> 
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
> 100.00    114.75   114.75        1   114.75   114.75  _ada_knucleotide
>   0.00    114.75     0.00        1     0.00     0.00  adainit
> 
> 
>             Call graph
> 
> 
> granularity: each sample hit covers 4 byte(s) for 0.01% of 114.75 seconds
> 
> index % time    self  children    called     name
>                                                  <spontaneous>
> [1]    100.0    0.00  114.75                 main [1]
>               114.75    0.00       1/1           _ada_knucleotide [2]
>                 0.00    0.00       1/1           adainit [3]
> -----------------------------------------------
>                              460000417             _ada_knucleotide [2]
>               114.75    0.00       1/1           main [1]
> [2]    100.0  114.75    0.00       1+460000417 _ada_knucleotide [2]
>                              460000417             _ada_knucleotide [2]
> -----------------------------------------------
>                 0.00    0.00       1/1           main [1]
> [3]      0.0    0.00    0.00       1         adainit [3]
> -----------------------------------------------
> 
> 
> Index by function name
> 
>    [2] _ada_knucleotide        [3] adainit
> 
> Olivier


This looks as though gprof's option -a might have been given
(somewhere).  Does /full/path/to/normal/gprof -b ...give the same
result?



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04  8:18               ` Ludovic Brenta
@ 2009-09-04 10:17                 ` Olivier Scalbert
  2009-09-04 12:37                   ` Ludovic Brenta
  2009-09-04 12:50                   ` Ludovic Brenta
  0 siblings, 2 replies; 116+ messages in thread
From: Olivier Scalbert @ 2009-09-04 10:17 UTC (permalink / raw)


Ludovic Brenta wrote:
> I cannot help more with gprof as I've never used it before but it
> seems to me that profiling the unoptimized program is pointless. It is
> much better to profile the fully optimized and inlined program; for
> this I still recommend valgrind because it gives accurate measurements
> for every instruction.
> 
> --
> Ludovic Brenta.

Ok, let's switch to valgrind !

First, let's check memory leaks:

$ gnatmake -gnatNp -O3 -g -fomit-frame-pointer -march=native -f 
knucleotide.adb -o knucleotide.gnat_run

$ valgrind --max-stackframe=20000000 --leak-check=full 
--show-reachable=yes ./knucleotide.gnat_run < fasta/fasta25m.dat
==13802==
==13802== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 21 from 1)
==13802== malloc/free: in use at exit: 12,512,380 bytes in 3 blocks.
==13802== malloc/free: 145 allocs, 142 frees, 396,773,180 bytes allocated.
==13802== For counts of detected errors, rerun with: -v
==13802== searching for pointers to 3 not-freed blocks.
==13802== checked 12,680,612 bytes.
==13802==
==13802== 12,512,380 bytes in 3 blocks are still reachable in loss 
record 1 of 1
==13802==    at 0x4026FDE: malloc (vg_replace_malloc.c:207)
==13802==    by 0x42C1664: __gnat_malloc (in /usr/lib/libgnat-4.3.so.1)
==13802==    by 0x406993E: system__task_primitives__operations__new_atcb 
(in /usr/lib/libgnarl-4.3.so.1)
==13802==    by 0x406C7AD: system__tasking__initialize (in 
/usr/lib/libgnarl-4.3.so.1)
==13802==    by 0x406C186: system__tasking__initialization__init_rts (in 
/usr/lib/libgnarl-4.3.so.1)
==13802==    by 0x406C2A6: system__tasking__initialization___elabb (in 
/usr/lib/libgnarl-4.3.so.1)
==13802==    by 0x804AFA0: adainit (b~knucleotide.adb:162)
==13802==    by 0x804AD50: ??? (start.S:119)
==13802==
==13802== LEAK SUMMARY:
==13802==    definitely lost: 0 bytes in 0 blocks.
==13802==      possibly lost: 0 bytes in 0 blocks.
==13802==    still reachable: 12,512,380 bytes in 3 blocks.
==13802==         suppressed: 0 bytes in 0 blocks.

Mmmm, 3 blocks are still reachable. 3 workers ?

Second, let's profile:
$ valgrind --max-stackframe=20000000 --leak-check=yes 
./knucleotide.gnat_run < fasta/fasta25m.dat

25 minutes later ...
callgrind_annotate --inclusive=yes --tree=both --auto=yes 
callgrind.out.14048  > callgrind_out_14048.txt

As the file is quit big, you can find it there:
http://scalbert.dyndns.org/ada/knucleotide/callgrind_out_14048.txt

As it is the first time I use this tool, do not hesitate to ask me a new 
run with other options !

Enjoy,

Olivier






^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 10:17                 ` Olivier Scalbert
@ 2009-09-04 12:37                   ` Ludovic Brenta
  2009-09-04 13:50                     ` Olivier Scalbert
  2009-09-04 12:50                   ` Ludovic Brenta
  1 sibling, 1 reply; 116+ messages in thread
From: Ludovic Brenta @ 2009-09-04 12:37 UTC (permalink / raw)


On Sep 4, 12:17 pm, Olivier Scalbert <olivier.scalb...@algosyn.com>
wrote:
> Mmmm, 3 blocks are still reachable. 3 workers ?

Yes, these are the task control blocks. Perhaps there is a way to
teach valgrind to ignore those as normal.

> Second, let's profile:
> $ valgrind --max-stackframe=20000000 --leak-check=yes
> ./knucleotide.gnat_run < fasta/fasta25m.dat
>
> 25 minutes later ...
> callgrind_annotate --inclusive=yes --tree=both --auto=yes
> callgrind.out.14048  > callgrind_out_14048.txt

OK, now try "kcachegrind callgrind.out.14048" and you can navigate
graphically in the call graph and see where the time is spent.

--
Ludovic Brenta.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 10:17                 ` Olivier Scalbert
  2009-09-04 12:37                   ` Ludovic Brenta
@ 2009-09-04 12:50                   ` Ludovic Brenta
  2009-09-04 16:22                     ` Georg Bauhaus
  2009-09-04 17:56                     ` Martin
  1 sibling, 2 replies; 116+ messages in thread
From: Ludovic Brenta @ 2009-09-04 12:50 UTC (permalink / raw)


Olivier Scalbert wrote on comp.lang.ada:
> http://scalbert.dyndns.org/ada/knucleotide/callgrind_out_14048.txt

Let's see:

43,644,857,598  PROGRAM TOTALS
[...]

            .
----------------------------------------------------------------------------
            .     --
            .     --  Calculate and write data - either a percentage
for all fragments found or - when
            .     --  Nucleotide_Fragment is given - the count for
that fragment.
            .     --
  262,499,981     procedure Write
8,251,549,172  => ???:system__secondary_stack__ss_mark (7x)
            .       (Nucleotide_Length   : in Frequencies;
            .        Nucleotide_Fragment : in Fragment :=
Fragments.Null_Bounded_String)
            .     is

Apparently, passing unconstrained strings to procedure Write involves
allocations on the secondary stack which account for 20% of the entire
execution time. That's hot spot #1. I wonder whether it is feasible to
pass access values to the strings.

--
Ludovic Brenta.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 12:37                   ` Ludovic Brenta
@ 2009-09-04 13:50                     ` Olivier Scalbert
  0 siblings, 0 replies; 116+ messages in thread
From: Olivier Scalbert @ 2009-09-04 13:50 UTC (permalink / raw)


Ludovic Brenta wrote:
> On Sep 4, 12:17 pm, Olivier Scalbert <olivier.scalb...@algosyn.com>
> wrote:
>> Mmmm, 3 blocks are still reachable. 3 workers ?
> 
> Yes, these are the task control blocks. Perhaps there is a way to
> teach valgrind to ignore those as normal.
> 
>> Second, let's profile:
>> $ valgrind --max-stackframe=20000000 --leak-check=yes
>> ./knucleotide.gnat_run < fasta/fasta25m.dat
>>
>> 25 minutes later ...
>> callgrind_annotate --inclusive=yes --tree=both --auto=yes
>> callgrind.out.14048  > callgrind_out_14048.txt
> 
> OK, now try "kcachegrind callgrind.out.14048" and you can navigate
> graphically in the call graph and see where the time is spent.
> 
> --
> Ludovic Brenta.

What an impressive program !
Should I rerun with --dump-instr=yes and --trace-jump=yes ?
(I run it on a 32 bits linux)

Olivier



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 12:50                   ` Ludovic Brenta
@ 2009-09-04 16:22                     ` Georg Bauhaus
  2009-09-04 16:29                       ` Georg Bauhaus
  2009-09-04 17:56                     ` Martin
  1 sibling, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-04 16:22 UTC (permalink / raw)


Ludovic Brenta schrieb:

>   262,499,981     procedure Write
> 8,251,549,172  => ???:system__secondary_stack__ss_mark (7x)
>             .       (Nucleotide_Length   : in Frequencies;
>             .        Nucleotide_Fragment : in Fragment :=
> Fragments.Null_Bounded_String)
>             .     is
> 
> Apparently, passing unconstrained strings to procedure Write involves
> allocations on the secondary stack which account for 20% of the entire
> execution time. That's hot spot #1.

Indeed, and this particular hot spot had been cooled down twice:

Step 1 - we replaced Bounded_String with our own Bounded_String

Step 2 - we replaced this new Bounded_String with plain
 constrained  strings of suitable fixed length (using generics)



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 16:22                     ` Georg Bauhaus
@ 2009-09-04 16:29                       ` Georg Bauhaus
  2009-09-04 16:38                         ` Ludovic Brenta
  0 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-04 16:29 UTC (permalink / raw)


Georg Bauhaus schrieb:
> Ludovic Brenta schrieb:

>> Apparently, passing unconstrained strings to procedure Write involves
>> allocations on the secondary stack which account for 20% of the entire
>> execution time. That's hot spot #1.
> 
> Indeed, and this particular hot spot had been cooled down twice:
> 
> Step 1 - we replaced Bounded_String with our own Bounded_String
> 
> Step 2 - we replaced this new Bounded_String with plain
>  constrained  strings of suitable fixed length (using generics)

I should add that the current program spends much of its time
in equality comparison of fragment strings,
and then some in the hash function.
So not only are the bounded_strings gone;
Jonathan has also contributed a highly efficient hashing
function and a cute string equality function.

(As mentioned, to actually see the effects (of the current
program), String_Fragments."=" should be a renaming of Equals.
Operator subprograms seem to confuse the profiling programs,
or am I missing some setting?)



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 16:29                       ` Georg Bauhaus
@ 2009-09-04 16:38                         ` Ludovic Brenta
  2009-09-04 17:58                           ` Georg Bauhaus
  2009-09-07  7:45                           ` Olivier Scalbert
  0 siblings, 2 replies; 116+ messages in thread
From: Ludovic Brenta @ 2009-09-04 16:38 UTC (permalink / raw)


Georg Bauhaus wrote on comp.lang.ada:
> Georg Bauhaus schrieb:
>
> > Ludovic Brenta schrieb:
> >> Apparently, passing unconstrained strings to procedure Write involves
> >> allocations on the secondary stack which account for 20% of the entire
> >> execution time. That's hot spot #1.
>
> > Indeed, and this particular hot spot had been cooled down twice:
>
> > Step 1 - we replaced Bounded_String with our own Bounded_String
>
> > Step 2 - we replaced this new Bounded_String with plain
> >  constrained  strings of suitable fixed length (using generics)
>
> I should add that the current program spends much of its time
> in equality comparison of fragment strings,
> and then some in the hash function.
> So not only are the bounded_strings gone;
> Jonathan has also contributed a highly efficient hashing
> function and a cute string equality function.
>
> (As mentioned, to actually see the effects (of the current
> program), String_Fragments."=" should be a renaming of Equals.
> Operator subprograms seem to confuse the profiling programs,
> or am I missing some setting?)

So I gather that Olivier was profiling an old version of the program.
Correct?

--
Ludovic Brenta.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 12:50                   ` Ludovic Brenta
  2009-09-04 16:22                     ` Georg Bauhaus
@ 2009-09-04 17:56                     ` Martin
  2009-09-04 18:26                       ` Georg Bauhaus
  2009-09-04 18:51                       ` Ludovic Brenta
  1 sibling, 2 replies; 116+ messages in thread
From: Martin @ 2009-09-04 17:56 UTC (permalink / raw)


On Sep 4, 1:50 pm, Ludovic Brenta <ludo...@ludovic-brenta.org> wrote:
> Olivier Scalbert wrote on comp.lang.ada:
>
> >http://scalbert.dyndns.org/ada/knucleotide/callgrind_out_14048.txt
>
> Let's see:
>
> 43,644,857,598  PROGRAM TOTALS
> [...]
>
>             .
> ----------------------------------------------------------------------------
>             .     --
>             .     --  Calculate and write data - either a percentage
> for all fragments found or - when
>             .     --  Nucleotide_Fragment is given - the count for
> that fragment.
>             .     --
>   262,499,981     procedure Write
> 8,251,549,172  => ???:system__secondary_stack__ss_mark (7x)
>             .       (Nucleotide_Length   : in Frequencies;
>             .        Nucleotide_Fragment : in Fragment :=
> Fragments.Null_Bounded_String)
>             .     is
>
> Apparently, passing unconstrained strings to procedure Write involves
> allocations on the secondary stack which account for 20% of the entire
> execution time. That's hot spot #1. I wonder whether it is feasible to
> pass access values to the strings.
>
> --
> Ludovic Brenta.

I'm not familiar with this tool but if 'big numbers' for innocuous
looking code is to be checked then what about:

            .           function Get_Key (E : Element_Access) return
Fragment is
            .           begin
1,395,255,084              return E.all.Key;

Is there some accessibility check going on in this? Could a 'pragma
Assert (E.all /= null);' help?

Cheers
-- Martin



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 16:38                         ` Ludovic Brenta
@ 2009-09-04 17:58                           ` Georg Bauhaus
  2009-09-04 18:12                             ` Georg Bauhaus
  2009-09-05 20:16                             ` Georg Bauhaus
  2009-09-07  7:45                           ` Olivier Scalbert
  1 sibling, 2 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-04 17:58 UTC (permalink / raw)


Ludovic Brenta schrieb:

> So I gather that Olivier was profiling an old version of the program.
> Correct?

Yes.  It likely is the one that has the home made Bounded_String.
But this one does not have the plain String generic yet.
(And a few other improvements.)

I haven't finished merging the new multitasking edition and
all of the new patches yet.  A final draft should
be ready this evening.  (Just drop the file name from the link
then to see a scratch paper style "manifest" if you like.)

Another possible improvement (maybe for regexdna, too) that
I have been thinking about is this:

Currently, the working tasks will themselves print
their results, after having finished all their work.
They do so in the required order with the help
of a protected Printer.  The printer features an
entry family (effectively one member per task). Guarding
is such that tasks can print only one after the other,
in the required order. (Printing is the last small thing
that the tasks perform before they can be terminated.)

The option I have been thinking of is, in the regexdna
case in particular, to allow some "future" work to
be done while the shared variable is collecting
results:  There might be free CPU capacity because, say
on a 4-core, one task hasn't finished, leaving two out of
four cores idling until the next rush of tasks
starts working on the second part of the program, e.g..

Miscellanea:

Does the size of the entry family (18, some not used) matter
much? (Most of the work is done by the tasks undisturbed.)

Should we requeue instead?

For K-Nucleotide all this didn't seem to matter much,
at least on the machines to which we have access:
Dropping the requirement of orderly printing is not
a big win. But who knows.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 17:58                           ` Georg Bauhaus
@ 2009-09-04 18:12                             ` Georg Bauhaus
  2009-09-05 20:16                             ` Georg Bauhaus
  1 sibling, 0 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-04 18:12 UTC (permalink / raw)


Georg Bauhaus schrieb:

> The option I have been thinking of is, in the regexdna
> case in particular, to allow some "future" work to
> be done while the shared variable is collecting
> results:  There might be free CPU capacity because, say
> on a 4-core, one task hasn't finished, leaving two out of
> four cores idling until the next rush of tasks
> starts working on the second part of the program, e.g..

Maybe there is another reasonably simple solution to
the "optimal" task:cpu allocation problem. Add priorities
reflecting:

- required printing order (minor issue - printing not much work)

- time needed per task (loads of computation)

We could estimate the latter, more or less I think,
by separating out single tasks that have been assigned
a particular piece of work and then measure how long
they run on average.




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 17:56                     ` Martin
@ 2009-09-04 18:26                       ` Georg Bauhaus
  2009-09-04 21:51                         ` Robert A Duff
  2009-09-04 18:51                       ` Ludovic Brenta
  1 sibling, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-04 18:26 UTC (permalink / raw)


Martin schrieb:

>             .           function Get_Key (E : Element_Access) return
> Fragment is
>             .           begin
> 1,395,255,084              return E.all.Key;
> 
> Is there some accessibility check going on in this? Could a 'pragma
> Assert (E.all /= null);' help?

I'll try this.  Thanks. (-gnatp is on; does it suppress
accessibility checks, too?)

Hm. Null excluding subtypes...



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 17:56                     ` Martin
  2009-09-04 18:26                       ` Georg Bauhaus
@ 2009-09-04 18:51                       ` Ludovic Brenta
  1 sibling, 0 replies; 116+ messages in thread
From: Ludovic Brenta @ 2009-09-04 18:51 UTC (permalink / raw)


On Sep 4, 7:56 pm, Martin <martin.do...@btopenworld.com> wrote:
>> Olivier Scalbert wrote on comp.lang.ada:
>
>>>http://scalbert.dyndns.org/ada/knucleotide/callgrind_out_14048.txt
>
>> Let's see:
>
>> 43,644,857,598  PROGRAM TOTALS
>> [...]
>
>>             .
>> ----------------------------------------------------------------------------
>>             .     --
>>             .     --  Calculate and write data - either a percentage
>> for all fragments found or - when
>>             .     --  Nucleotide_Fragment is given - the count for
>> that fragment.
>>             .     --
>>   262,499,981     procedure Write
>> 8,251,549,172  => ???:system__secondary_stack__ss_mark (7x)
>>             .       (Nucleotide_Length   : in Frequencies;
>>             .        Nucleotide_Fragment : in Fragment :=
>> Fragments.Null_Bounded_String)
>>             .     is

> I'm not familiar with this tool but if 'big numbers' for innocuous
> looking code is to be checked then what about:

Right.  The numbers, big or small, are "ticks" of valgrind's internal
virtual machine, or if you will, number of processor instructions
executed.  Olivier's annotated output hides the smallest numbers and
only shows the larger ones.

--
Ludovic Brenta.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 18:26                       ` Georg Bauhaus
@ 2009-09-04 21:51                         ` Robert A Duff
  0 siblings, 0 replies; 116+ messages in thread
From: Robert A Duff @ 2009-09-04 21:51 UTC (permalink / raw)


Georg Bauhaus <rm.dash-bauhaus@futureapps.de> writes:

> Martin schrieb:
>
>>             .           function Get_Key (E : Element_Access) return
>> Fragment is
>>             .           begin
>> 1,395,255,084              return E.all.Key;
>> 
>> Is there some accessibility check going on in this? Could a 'pragma
>> Assert (E.all /= null);' help?

No, pragma Assert will not help, efficiency-wise.

There is no accessibility check, here.  There's a null check,
which can be suppressed, assuming you "know" it can't fail.

> I'll try this.  Thanks. (-gnatp is on; does it suppress
> accessibility checks, too?)

-gnatp controlls language-defined checks, including the above null
 check.  Pragma Assert is not affected by -gnatp -- pragma Assert
is off by default, and is turned on by -gnata.  If pragma Assert
is turned off, the compiler does not base any optimization decisions on
it -- it's as if it were just erased from the program.

> Hm. Null excluding subtypes...

Yes, I believe adding "not null" will eliminate the above null check in
recent versions of GNAT.  The null check then moves to the call site.
I didn't look at the whole code, but if you're lucky, the call site
will also have "not null" -- the idea is to push the "not null"s as
far as possible, until you get to "new" or "'Access", which obviously
cannot return null.

But if you're using -gnatp, all null checks will be eliminated anyway.

- Bob



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 17:58                           ` Georg Bauhaus
  2009-09-04 18:12                             ` Georg Bauhaus
@ 2009-09-05 20:16                             ` Georg Bauhaus
  1 sibling, 0 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-05 20:16 UTC (permalink / raw)


Georg Bauhaus wrote:
> Ludovic Brenta schrieb:
> 
>> So I gather that Olivier was profiling an old version of the program.
>> Correct?
> 
> Yes.  It likely is the one that has the home made Bounded_String.
> But this one does not have the plain String generic yet.
> (And a few other improvements.)
> 
> I haven't finished merging the new multitasking edition and
> all of the new patches yet.  A final draft should
> be ready this evening.  (Just drop the file name from the link
> then to see a scratch paper style "manifest" if you like.)

Here is the new multicore Knucleotide to be submitted
to the Benchmark Game.  It has all the latest changes.

http://home.arcor.de/bauhaus/Ada/knucleotide.multicore.gnat
http://home.arcor.de/bauhaus/Ada/line_io.ada

(Submitted with the two files suitably merged.)

Measurements, shouts, bug reports, improvements most welcome.

- Georg



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-04 16:38                         ` Ludovic Brenta
  2009-09-04 17:58                           ` Georg Bauhaus
@ 2009-09-07  7:45                           ` Olivier Scalbert
  2009-09-07  9:19                             ` Georg Bauhaus
  1 sibling, 1 reply; 116+ messages in thread
From: Olivier Scalbert @ 2009-09-07  7:45 UTC (permalink / raw)


Ludovic Brenta wrote:
> Georg Bauhaus wrote on comp.lang.ada:
>> Georg Bauhaus schrieb:
>>
>>> Ludovic Brenta schrieb:
>>>> Apparently, passing unconstrained strings to procedure Write involves
>>>> allocations on the secondary stack which account for 20% of the entire
>>>> execution time. That's hot spot #1.
>>> Indeed, and this particular hot spot had been cooled down twice:
>>> Step 1 - we replaced Bounded_String with our own Bounded_String
>>> Step 2 - we replaced this new Bounded_String with plain
>>>  constrained  strings of suitable fixed length (using generics)
>> I should add that the current program spends much of its time
>> in equality comparison of fragment strings,
>> and then some in the hash function.
>> So not only are the bounded_strings gone;
>> Jonathan has also contributed a highly efficient hashing
>> function and a cute string equality function.
>>
>> (As mentioned, to actually see the effects (of the current
>> program), String_Fragments."=" should be a renaming of Equals.
>> Operator subprograms seem to confuse the profiling programs,
>> or am I missing some setting?)
> 
> So I gather that Olivier was profiling an old version of the program.
> Correct?
> 
> --
> Ludovic Brenta.

Ooops, sorry for that ...

Today I can provide profile for the last version on:
- 32 bits - Intel(R) Core(TM)2 Quad  CPU   Q9300  @ 2.50GHz - gcc 
version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
- 64 bits - AMD Athlon(tm) 64 Processor 3000+ - gcc version 4.3.4 
(Debian 4.3.4-1)

Can it help ?

Olivier



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-07  7:45                           ` Olivier Scalbert
@ 2009-09-07  9:19                             ` Georg Bauhaus
  2009-09-07 13:31                               ` Olivier Scalbert
  0 siblings, 1 reply; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-07  9:19 UTC (permalink / raw)


Olivier Scalbert schrieb:

> Today I can provide profile for the last version on:
> - 32 bits - Intel(R) Core(TM)2 Quad  CPU   Q9300  @ 2.50GHz - gcc
> version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
> - 64 bits - AMD Athlon(tm) 64 Processor 3000+ - gcc version 4.3.4
> (Debian 4.3.4-1)
> 
> Can it help ?

Yes, as we have few measurements of what happens on 4core and 1core
CPUs, and for GCC 4.3.3.

If you like, arrange the task starts in different order:
placing Work_On_1.Writer.Set (1) first seems to be a must.
The following two (12, 18) have run longest.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-07  9:19                             ` Georg Bauhaus
@ 2009-09-07 13:31                               ` Olivier Scalbert
  2009-09-07 14:38                                 ` jonathan
  2009-09-08  0:22                                 ` Georg Bauhaus
  0 siblings, 2 replies; 116+ messages in thread
From: Olivier Scalbert @ 2009-09-07 13:31 UTC (permalink / raw)


Georg Bauhaus wrote:
> Olivier Scalbert schrieb:
> 
>> Today I can provide profile for the last version on:
>> - 32 bits - Intel(R) Core(TM)2 Quad  CPU   Q9300  @ 2.50GHz - gcc
>> version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
>> - 64 bits - AMD Athlon(tm) 64 Processor 3000+ - gcc version 4.3.4
>> (Debian 4.3.4-1)
>>
>> Can it help ?
> 
> Yes, as we have few measurements of what happens on 4core and 1core
> CPUs, and for GCC 4.3.3.
> 
> If you like, arrange the task starts in different order:
> placing Work_On_1.Writer.Set (1) first seems to be a must.
> The following two (12, 18) have run longest.

Here are the results !

On 32 bits - Intel(R) Core(TM)2 Quad  CPU   Q9300  @ 2.50GHz - gcc 
version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
---------------------------------------
$ gnatmake -f -g -gnatnp -O3 knucleotide.adb -o knucleotide.gnat_run
$ time ./knucleotide.gnat_run < fasta/fasta250m.dat
real    0m14.607s
user    0m32.798s
sys     0m0.568s

$ valgrind --tool=callgrind --dump-instr=yes --trace-jump=yes 
./knucleotide.gnat_run < fasta/fasta25m.dat

see:http://scalbert.dyndns.org/ada/knucleotide/callgrind.out.32bits.1

--

On 64 bits - AMD Athlon(tm) 64 Processor 3000+ - gcc version 4.3.4 
(Debian 4.3.4-1) �
$ gnatmake -f -g -gnatnp -O3 knucleotide.adb -o knucleotide.gnat_run
$ time ./knucleotide.gnat_run < fasta/fasta250m.dat
real    1m10.190s
user    1m9.252s
sys     0m0.724s
$ valgrind --tool=callgrind --dump-instr=yes --trace-jump=yes 
./knucleotide.gnat_run < fasta/fasta25m.dat

see:http://scalbert.dyndns.org/ada/knucleotide/callgrind.out.64bits.1

Olivier



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-07 13:31                               ` Olivier Scalbert
@ 2009-09-07 14:38                                 ` jonathan
  2009-09-07 15:03                                   ` Olivier Scalbert
  2009-09-08  0:22                                 ` Georg Bauhaus
  1 sibling, 1 reply; 116+ messages in thread
From: jonathan @ 2009-09-07 14:38 UTC (permalink / raw)


On Sep 7, 2:31 pm, Olivier Scalbert <olivier.scalb...@algosyn.com>
wrote:

> On 64 bits - AMD Athlon(tm) 64 Processor 3000+ - gcc version 4.3.4
> (Debian 4.3.4-1) µ
> $ gnatmake -f -g -gnatnp -O3 knucleotide.adb -o knucleotide.gnat_run
> $ time ./knucleotide.gnat_run < fasta/fasta250m.dat
> real    1m10.190s
> user    1m9.252s
> sys     0m0.724s
> $ valgrind --tool=callgrind --dump-instr=yes --trace-jump=yes
> ./knucleotide.gnat_run < fasta/fasta25m.dat
>
> see:http://scalbert.dyndns.org/ada/knucleotide/callgrind.out.64bits.1
>
> Olivier

This last test is worrisome.  Are you sharing the machine with other
processes?  Here is what I get when I have 8-cores to myself
(using GNAT 4.3.4 (GPL2009)):

 time ./knucleotide.gnat_run < /tmp/fasta250.dat

 real    0m6.647s
 user    0m17.273s
 sys     0m0.448s

and here is what I get when I share with another (heavy)
user of the machine:

 time ./knucleotide.gnat_run < /tmp/fasta250.dat

 real    0m25.475s
 user    0m24.766s
 sys     0m0.708s


Jonathan



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-07 14:38                                 ` jonathan
@ 2009-09-07 15:03                                   ` Olivier Scalbert
  2009-09-07 17:31                                     ` jonathan
  2009-09-07 18:07                                     ` Georg Bauhaus
  0 siblings, 2 replies; 116+ messages in thread
From: Olivier Scalbert @ 2009-09-07 15:03 UTC (permalink / raw)


jonathan wrote:

> This last test is worrisome.  Are you sharing the machine with other
> processes?  Here is what I get when I have 8-cores to myself
> (using GNAT 4.3.4 (GPL2009)):
> 
>  time ./knucleotide.gnat_run < /tmp/fasta250.dat
> 
>  real    0m6.647s
>  user    0m17.273s
>  sys     0m0.448s
> 
> and here is what I get when I share with another (heavy)
> user of the machine:
> 
>  time ./knucleotide.gnat_run < /tmp/fasta250.dat
> 
>  real    0m25.475s
>  user    0m24.766s
>  sys     0m0.708s
> 
> 
> Jonathan

My 64 bits machine is an "old" single core Athlon (512KB cache size). So 
perhaps it is normal !

Olivier



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-07 15:03                                   ` Olivier Scalbert
@ 2009-09-07 17:31                                     ` jonathan
  2009-09-07 17:54                                       ` Olivier Scalbert
  2009-09-07 18:07                                     ` Georg Bauhaus
  1 sibling, 1 reply; 116+ messages in thread
From: jonathan @ 2009-09-07 17:31 UTC (permalink / raw)


On Sep 7, 4:03 pm, Olivier Scalbert <olivier.scalb...@algosyn.com>
wrote:

> My 64 bits machine is an "old" single core Athlon (512KB cache size). So
> perhaps it is normal !
>
> Olivier

My error .. I saw the "64 processor" and thought, ooh, lovely, 64
cores.

I have a machine like yours with similar timings:

real    0m58.509s
user    0m57.928s
sys     0m0.588s

so your results are reassuring.

Jonathan



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-07 17:31                                     ` jonathan
@ 2009-09-07 17:54                                       ` Olivier Scalbert
  0 siblings, 0 replies; 116+ messages in thread
From: Olivier Scalbert @ 2009-09-07 17:54 UTC (permalink / raw)


jonathan wrote:
> 
> My error .. I saw the "64 processor" and thought, ooh, lovely, 64
> cores.
> 
> I have a machine like yours with similar timings:
> 
> real    0m58.509s
> user    0m57.928s
> sys     0m0.588s
> 
> so your results are reassuring.
> 
> Jonathan

64 cores !
Perhaps for Christmas in few years ...
;-)

Olivier



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-07 15:03                                   ` Olivier Scalbert
  2009-09-07 17:31                                     ` jonathan
@ 2009-09-07 18:07                                     ` Georg Bauhaus
  1 sibling, 0 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-07 18:07 UTC (permalink / raw)


Olivier Scalbert schrieb:

> My 64 bits machine is an "old" single core Athlon (512KB cache size). So
> perhaps it is normal !

FWIW, I get ~30 sec for the single core version and
~31 sec (� .5 sec) for the multicore version on a VMware
virtual machine: it runs Debian 5.0 and has one VM core
assigned to it, of an Intel(R) Core(TM)2 Duo CPU
E6750  @ 2.66GHz

In another Virtual Box that displays one CPU inside,
but runs on an Intel Core Duo 1300 @ 1.66GHz,
the results are 1m3s (� 3s) versus ~45 - ~55 sec.

Both programs are 32 bit executables.

I don't really know what causes the greater fluctuations within
the runs on the older Core Duo, except for the (obvious?) reasons:
It is a VirtualBox (2.1.4) and runs on a desktop computer...



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-09-07 13:31                               ` Olivier Scalbert
  2009-09-07 14:38                                 ` jonathan
@ 2009-09-08  0:22                                 ` Georg Bauhaus
  1 sibling, 0 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-09-08  0:22 UTC (permalink / raw)


Olivier Scalbert wrote:

> Here are the results !

Thanks for the data.  System.Pool_Size.Allocate
is called a lot by Work_On_12 and Work_On_18.
I don't know how to change that.  Still it has led
me to another idea---but that should not delay submitting
Knucleotide any further:

http://home.arcor.de/bauhaus/Ada/knucleotide.multi.gnat

Should I?  Are there any objections?

***

That idea will require that some essential parts be
rewritten, so please ignore it for the moment:

1/ enumerate the characters we are looking at in a type,
2/ pack them into 4 bits each (a representation)
3/ write new hash function...
4/ optionally, do (3) for fragments of length 12 and 18 only

The idea is that the packing, by assumption,
can decrease the number of times System.Pool_Size.Allocate
is called (which seems to involve the threading library
a lot, too).



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-08-01 12:21 Ada Shootout program for K-Nucleotide (patches) Georg Bauhaus
                   ` (7 preceding siblings ...)
  2009-08-27  9:05 ` Martin
@ 2009-10-07 11:44 ` Olivier Scalbert
  2009-10-07 12:04   ` Georg Bauhaus
  2009-10-07 13:22   ` Martin
  8 siblings, 2 replies; 116+ messages in thread
From: Olivier Scalbert @ 2009-10-07 11:44 UTC (permalink / raw)


Georg Bauhaus wrote:
> This is about the K-Nucleotide program at the
> Computer Language Benchmark Game,
> http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&lang=gnat&id=1
> where some Ada programs have started to fail.
> 
> In order to have one of them work again, I have patched
> knucleotide.gnat, with some success.  New version is here:
> http://home.arcor.de/bauhaus/Ada/knucleotide.gnat
> 
> Comments?  Does it work on your machine?
> 
> The two changes:
> 
> 1 - [stack exhaustion] a loop reading input lines into an
>  ubounded  string replaces the recursive String "&"ing
>  procedure. (Noting that the input text file is ~240MB ...)
> 
> 2 - [heavy bounded strings] a lightweight bounded string ADT
>  replaces an instance of Generic_Bounded_Length, yielding
>  an improved running time of ~22s down from ~30s for
>  the 2,500,000 case.
> 
> Still, on a relatively small virtual machine running 64bit
> Debian, the program cannot handle the 25,000,000 case.
> 
> 

Hi all,

I have just go to K-Nucleotide program at the Computer Language 
Benchmark Game.
Ada is running in 28.51 secs (and now C++ in 16.47 ...)
Is it the last version ?

Thanks,

Olivier



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-10-07 11:44 ` Olivier Scalbert
@ 2009-10-07 12:04   ` Georg Bauhaus
  2009-10-07 13:22   ` Martin
  1 sibling, 0 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-10-07 12:04 UTC (permalink / raw)


Olivier Scalbert schrieb:

> I have just go to K-Nucleotide program at the Computer Language
> Benchmark Game.
> Ada is running in 28.51 secs (and now C++ in 16.47 ...)
> Is it the last version ?

The Ada K-Nucleotide program that is online is the most recent
I'm aware of.  The leading C++ program (C++ at rank #1 - #3,
Ada at #4) seems to try hard to specialize hash functions
and comparison for the data given.  The Ada program's
hash function handles strings of all of the sizes.

Guessing, I think that someone with sufficient interest
in the set of DNA strings in the test _could_ try to spend
some time on producing specialized hash functions (and possibly
comparison functions) for strings of the given lengths.
Notice that the C++ programs are now comparing strings
using sequences of bytes, not characters, in specilizations...
;-)



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-10-07 11:44 ` Olivier Scalbert
  2009-10-07 12:04   ` Georg Bauhaus
@ 2009-10-07 13:22   ` Martin
  2009-10-07 14:15     ` Olivier Scalbert
  1 sibling, 1 reply; 116+ messages in thread
From: Martin @ 2009-10-07 13:22 UTC (permalink / raw)


On Oct 7, 12:44 pm, Olivier Scalbert <olivier.scalb...@algosyn.com>
wrote:
> Georg Bauhaus wrote:
> > This is about the K-Nucleotide program at the
> > Computer Language Benchmark Game,
> >http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&...
> > where some Ada programs have started to fail.
>
> > In order to have one of them work again, I have patched
> > knucleotide.gnat, with some success.  New version is here:
> >http://home.arcor.de/bauhaus/Ada/knucleotide.gnat
>
> > Comments?  Does it work on your machine?
>
> > The two changes:
>
> > 1 - [stack exhaustion] a loop reading input lines into an
> >  ubounded  string replaces the recursive String "&"ing
> >  procedure. (Noting that the input text file is ~240MB ...)
>
> > 2 - [heavy bounded strings] a lightweight bounded string ADT
> >  replaces an instance of Generic_Bounded_Length, yielding
> >  an improved running time of ~22s down from ~30s for
> >  the 2,500,000 case.
>
> > Still, on a relatively small virtual machine running 64bit
> > Debian, the program cannot handle the 25,000,000 case.
>
> Hi all,
>
> I have just go to K-Nucleotide program at the Computer Language
> Benchmark Game.
> Ada is running in 28.51 secs (and now C++ in 16.47 ...)
> Is it the last version ?
>
> Thanks,
>
> Olivier

I notice that the CPU usage isn't particularly high - unlike the
fastest solutions...is that significant?

Cheers
-- Martin



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-10-07 13:22   ` Martin
@ 2009-10-07 14:15     ` Olivier Scalbert
  2009-10-08  9:54       ` Georg Bauhaus
  0 siblings, 1 reply; 116+ messages in thread
From: Olivier Scalbert @ 2009-10-07 14:15 UTC (permalink / raw)


Martin wrote:
> 
> I notice that the CPU usage isn't particularly high - unlike the
> fastest solutions...is that significant?
> 
> Cheers
> -- Martin

And the memory consumption is higher than C++ (254 Mb versus 131 Mb). 
Perhaps it is also significant.

Olivier



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Ada Shootout program for K-Nucleotide (patches)
  2009-10-07 14:15     ` Olivier Scalbert
@ 2009-10-08  9:54       ` Georg Bauhaus
  0 siblings, 0 replies; 116+ messages in thread
From: Georg Bauhaus @ 2009-10-08  9:54 UTC (permalink / raw)


Olivier Scalbert schrieb:
> Martin wrote:
>>
>> I notice that the CPU usage isn't particularly high - unlike the
>> fastest solutions...is that significant?
>>
>> Cheers
>> -- Martin
> 
> And the memory consumption is higher than C++ (254 Mb versus 131 Mb).
> Perhaps it is also significant.

Possibly.  We'd have to know where and which memory
is being used by the Ada program in excess of what
the C++ program is using; just speculating, it could
be the data structure of elements in GNAT's Table.

FWIW, I have tried equality functions for each fragment
length specifically.  This is worth another 5% less time.

I guess it is possible to write tiny hashing functions
for at least small fragments without yet more person
days being wasted on this program.  The C++ programs
offer inspiration  ;-)



^ permalink raw reply	[flat|nested] 116+ messages in thread

end of thread, other threads:[~2009-10-08  9:54 UTC | newest]

Thread overview: 116+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-01 12:21 Ada Shootout program for K-Nucleotide (patches) Georg Bauhaus
2009-08-01 12:59 ` Ludovic Brenta
2009-08-01 13:59   ` Georg Bauhaus
2009-08-01 13:50 ` Gautier write-only
2009-08-01 15:21 ` Ludovic Brenta
2009-08-01 15:37   ` Gautier write-only
2009-08-02 12:55   ` Georg Bauhaus
2009-08-27 11:17     ` Ole-Hjalmar Kristensen
2009-08-27 12:25       ` Georg Bauhaus
2009-09-01  8:06         ` Ole-Hjalmar Kristensen
2009-09-01  9:29           ` Georg Bauhaus
2009-09-01 10:48             ` Ole-Hjalmar Kristensen
2009-09-01 10:16           ` Ole-Hjalmar Kristensen
2009-09-01 11:34             ` Georg Bauhaus
2009-09-01 12:11               ` Ole-Hjalmar Kristensen
2009-09-01 14:36                 ` Georg Bauhaus
2009-09-03  8:49                   ` Ole-Hjalmar Kristensen
2009-09-02  8:44                 ` Georg Bauhaus
2009-09-02 11:07                   ` Ole-Hjalmar Kristensen
2009-09-03  8:13                   ` Ole-Hjalmar Kristensen
2009-09-03 10:39                     ` Georg Bauhaus
2009-08-03  8:56   ` Jacob Sparre Andersen
2009-08-03 11:43     ` Georg Bauhaus
2009-08-03 12:36       ` Jacob Sparre Andersen
2009-08-03 18:31         ` Isaac Gouy
2009-08-03 20:29           ` Robert A Duff
2009-08-04 15:43             ` Isaac Gouy
2009-08-04 16:06               ` Isaac Gouy
2009-08-04 17:08                 ` Georg Bauhaus
2009-08-05 15:58                   ` Isaac Gouy
2009-08-05 21:18                     ` Georg Bauhaus
2009-08-05 22:04                       ` Ludovic Brenta
2009-08-05 22:31                         ` Georg Bauhaus
2009-08-05 23:44                           ` Jeffrey R. Carter
2009-08-06  8:38                             ` Georg Bauhaus
2009-08-06 21:39                               ` Georg Bauhaus
2009-08-06 22:42                                 ` Jeffrey R. Carter
2009-08-07  8:53                                   ` Georg Bauhaus
2009-08-07  9:21                                     ` Jacob Sparre Andersen
2009-08-07  9:23                                       ` Jacob Sparre Andersen
2009-08-09 19:17                                       ` Georg Bauhaus
2009-08-13 20:56                                 ` jonathan
2009-08-13 21:30                                   ` jonathan
2009-08-13 22:08                                   ` Georg Bauhaus
2009-08-14 15:31                                     ` jonathan
2009-08-06  7:51                           ` Dmitry A. Kazakov
2009-08-06  0:07                       ` Isaac Gouy
2009-08-06  8:36                         ` Georg Bauhaus
2009-08-06  9:09                           ` Dmitry A. Kazakov
2009-08-06 10:34                             ` Georg Bauhaus
2009-08-06 10:49                               ` Dmitry A. Kazakov
2009-08-06 15:21                                 ` Isaac Gouy
2009-08-03 20:47           ` Ludovic Brenta
2009-08-01 17:07 ` jonathan
2009-08-01 17:59 ` jonathan
2009-08-02 11:39   ` Georg Bauhaus
2009-08-02 16:00     ` jonathan
2009-08-02 16:42       ` jonathan
2009-09-03 13:44     ` Olivier Scalbert
2009-09-03 14:08       ` Ludovic Brenta
2009-09-03 15:13         ` Olivier Scalbert
2009-09-03 19:11           ` sjw
2009-09-04  6:11             ` Olivier Scalbert
2009-09-04  8:18               ` Ludovic Brenta
2009-09-04 10:17                 ` Olivier Scalbert
2009-09-04 12:37                   ` Ludovic Brenta
2009-09-04 13:50                     ` Olivier Scalbert
2009-09-04 12:50                   ` Ludovic Brenta
2009-09-04 16:22                     ` Georg Bauhaus
2009-09-04 16:29                       ` Georg Bauhaus
2009-09-04 16:38                         ` Ludovic Brenta
2009-09-04 17:58                           ` Georg Bauhaus
2009-09-04 18:12                             ` Georg Bauhaus
2009-09-05 20:16                             ` Georg Bauhaus
2009-09-07  7:45                           ` Olivier Scalbert
2009-09-07  9:19                             ` Georg Bauhaus
2009-09-07 13:31                               ` Olivier Scalbert
2009-09-07 14:38                                 ` jonathan
2009-09-07 15:03                                   ` Olivier Scalbert
2009-09-07 17:31                                     ` jonathan
2009-09-07 17:54                                       ` Olivier Scalbert
2009-09-07 18:07                                     ` Georg Bauhaus
2009-09-08  0:22                                 ` Georg Bauhaus
2009-09-04 17:56                     ` Martin
2009-09-04 18:26                       ` Georg Bauhaus
2009-09-04 21:51                         ` Robert A Duff
2009-09-04 18:51                       ` Ludovic Brenta
2009-09-04  9:27               ` Georg Bauhaus
2009-09-03 15:28       ` Georg Bauhaus
2009-09-04  1:24         ` Georg Bauhaus
2009-08-04  8:23 ` Martin
2009-08-16 15:20 ` jonathan
2009-08-17 15:46   ` Georg Bauhaus
2009-08-18 16:59     ` jonathan
2009-08-19 14:52       ` Georg Bauhaus
2009-08-19 16:40         ` Martin
2009-08-19 18:24           ` Robert A Duff
2009-08-19 22:15             ` jonathan
2009-08-19 23:50               ` Georg Bauhaus
2009-08-20 19:47                 ` jonathan
2009-08-20 22:15                   ` Georg Bauhaus
2009-08-21 21:43                     ` jonathan
2009-08-22 22:35                       ` Georg Bauhaus
2009-08-23 22:21                         ` jonathan
2009-08-20 19:55                 ` jonathan
2009-08-19 21:37           ` Georg Bauhaus
2009-08-19 19:22         ` jonathan
2009-08-19 19:27         ` jonathan
2009-08-27  9:05 ` Martin
2009-08-27  9:08   ` Martin
2009-08-27 10:01     ` Georg Bauhaus
2009-10-07 11:44 ` Olivier Scalbert
2009-10-07 12:04   ` Georg Bauhaus
2009-10-07 13:22   ` Martin
2009-10-07 14:15     ` Olivier Scalbert
2009-10-08  9:54       ` Georg Bauhaus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox