Tasking for Mandelbrot program

comp.lang.ada

* Tasking for Mandelbrot program
From: Georg Bauhaus @ 2009-09-27  1:08 UTC


A Mandelbrot program has been submitted to the language
Shootout by Jim Rogers, Pascal Obry, and
Gautier de Montmollin.  Given the no-tricks approach
that it shares with other entries, it performs
well, I think.  But it is sequential.

The patch below adds tasking.

A few notes on the patch:
The computation is split into parts, one per task;
many entries seem to use a similar approach.
The number of tasks is set high; task switching does
not seem to matter much in this program.
The alternative, making the environment task execute
the same subprogram as the worker tasks concurrently
and matching the number of tasks to the number of cores,
was not significantly faster.  It also spoils
the simplicity of the main block, so I set
a higher number of tasks and got basically the same
performance.

Maybe these are possible improvements:

- Ada.Streams.Stream_IO? (In total absence of Text_IO,
  GNAT.IO performs well.)

- adjust compilation options to match the FPT hardware
  (type Real is now digits 16 because of this, cf.
  the spectral-norm entry at the Shootout)

Comments?  Can I ask what the authors think?


pragma Restrictions (No_Abort_Statements);
pragma Restrictions (Max_Asynchronous_Select_Nesting => 0);

with Ada.Command_Line; use Ada.Command_Line;
with Ada.Numerics.Generic_Complex_Types;

with Interfaces;       use Interfaces;
with GNAT.IO;          use GNAT.IO;


procedure Mandelbrot is
   type Real is digits 16;
   package R_Complex is new Ada.Numerics.Generic_Complex_Types (Real);
   use R_Complex;
   Iter           : constant := 50;
   Limit          : constant := 4.0;
   Size           : Positive;
   Zero           : constant Complex := (0.0, 0.0);
   Two_on_Size    : Real;

   subtype Output_Queue is String;
   type Output is access Output_Queue;

   task type X_Step is
      entry Compute_Z (Y1, Y2 : Natural);
      entry Get_Output (Result : out Output; Last : out Natural);
   end X_Step;

   procedure Allocate_Output_Queue (Y1, Y2 : Natural; Result : out Output);

   procedure Compute
     (Y1, Y2 : Natural; Result : Output; Last : out Natural)
   is
      Bit_Num     : Natural    := 0;
      Byte_Acc    : Unsigned_8 := 0;
      Z, C        : Complex;
   begin
      Last := 0;
      for Y in Y1 .. Y2 - 1 loop
         for X in 0 .. Size - 1 loop
            Z := Zero;
            C := Two_on_Size * (Real (X), Real (Y)) - (1.5, 1.0);

            declare
               Z2 : Complex;
            begin
               for I in 1 .. Iter + 1 loop
                  Z2 := (Z.re ** 2, Z.im ** 2);
                  Z  := (Z2.re - Z2.im, 2.0 * Z.re * Z.im) + C;
                  exit when Z2.re + Z2.im > Limit;
               end loop;

               if Z2.re + Z2.im > Limit then
                  Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#00#;
               else
                  Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#01#;
               end if;
            end;

            Bit_Num := Bit_Num + 1;

            if Bit_Num = 8 then
               Last := Last + 1;
               Result (Last) := Character'Val (Byte_Acc);
               Byte_Acc := 0;
               Bit_Num  := 0;
            elsif X = Size - 1 then
               Byte_Acc := Shift_Left (Byte_Acc, 8 - (Size mod 8));
               Last := Last + 1;
               Result (Last) := Character'Val (Byte_Acc);
               Byte_Acc := 0;
               Bit_Num  := 0;
            end if;
         end loop;
      end loop;
   end Compute;

   task body X_Step is
      Data        : Output;
      Pos         : Natural;
      Y1, Y2      : Natural;
   begin
      accept Compute_Z (Y1, Y2 : Natural) do
         X_Step.Y1 := Y1;
         X_Step.Y2 := Y2;
      end Compute_Z;

      Allocate_Output_Queue (Y1, Y2, Data);
      Compute (Y1, Y2, Data, Pos);

      accept Get_Output (Result : out Output; Last : out Natural) do
         Result := Data;
         Last := Pos;
      end Get_Output;
   end X_Step;

   procedure Allocate_Output_Queue
     (Y1, Y2 : Natural; Result : out Output) is
   begin
      Result := new Output_Queue (1 .. (Y2 - Y1 + 8) * Size / 8);
   end Allocate_Output_Queue;


begin
   Size := Positive'Value (Argument (1));
   Two_on_Size := 2.0 / Real (Size);

   Put_Line ("P4");
   Put_Line (Argument (1) & " " & Argument (1));

   declare
      No_Of_Workers : constant := 16;
      Chunk_Size    : constant Positive :=
        (Size + No_Of_Workers) / No_Of_Workers;
      --  One task per chunk; 0 .. No_Of_Workers would start one extra,
      --  mostly idle task.
      Pool          : array (0 .. No_Of_Workers - 1) of X_Step;
      pragma Assert (Pool'Length * Chunk_Size >= Size);
      Buffer        : Output;
      Last          : Natural;
   begin
      pragma Assert (Pool'First = 0);

      for P in Pool'Range loop
         Pool (P).Compute_Z
           (Y1 => P * Chunk_Size,
            Y2 => Positive'Min ((P + 1) * Chunk_Size, Size));
      end loop;

      for P in Pool'Range loop
         Pool (P).Get_Output (Buffer, Last);
         Put (Buffer (Buffer'First .. Last));
      end loop;
   end;

end Mandelbrot;



* Re: Tasking for Mandelbrot program
From: Martin @ 2009-09-27 11:24 UTC


On Sep 27, 2:08 am, Georg Bauhaus <rm.tsoh.plus-bug.bauh...@maps.futureapps.de> wrote:
[snip]
>                   Z2 := (Z.re ** 2, Z.im ** 2);
[snip]

Don't know about the rest of the program but on some platforms I've
found it to be much faster to replace "** 2" with a straight
multiplication, e.g. "Z.re * Z.re, Z.im * Z.im".

Some "**" functions actually look for the "** 2" and just use a
straight multiplication - doing it explicitly removes the function
call and the check.
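Applied to the inner loop, the suggestion might look like the following minimal sketch. It assumes the same Iter and Limit as the program above but works on separate real and imaginary parts instead of the Complex type; the names Square_By_Multiply and Escapes are hypothetical. Whether this is faster is platform-dependent, since a compiler may already turn "** 2" into a single multiplication.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Square_By_Multiply is
   type Real is digits 15;
   Iter  : constant := 50;
   Limit : constant := 4.0;

   --  Hypothetical helper: True if the point (Cr, Ci) escapes within
   --  Iter + 1 steps.  Squares are written as products (Zr * Zr)
   --  instead of Zr ** 2.
   function Escapes (Cr, Ci : Real) return Boolean is
      Zr, Zi : Real := 0.0;
      Tr, Ti : Real;
   begin
      for I in 1 .. Iter + 1 loop
         Tr := Zr * Zr;
         Ti := Zi * Zi;
         --  Same test as the original: the squares of the current Z.
         if Tr + Ti > Limit then
            return True;
         end if;
         Zi := 2.0 * Zr * Zi + Ci;
         Zr := Tr - Ti + Cr;
      end loop;
      return False;
   end Escapes;
begin
   Put_Line ("(0, 0) escapes: " & Boolean'Image (Escapes (0.0, 0.0)));
   Put_Line ("(1, 1) escapes: " & Boolean'Image (Escapes (1.0, 1.0)));
end Square_By_Multiply;
```

The first line should report FALSE (the origin stays at zero) and the second TRUE, since (1, 1) leaves the radius-2 disc after three steps.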

Cheers
-- Martin



* Re: Tasking for Mandelbrot program
From: Georg Bauhaus @ 2009-09-27 21:27 UTC


Martin wrote:
> On Sep 27, 2:08 am, Georg Bauhaus <rm.tsoh.plus-bug.bauh...@maps.futureapps.de> wrote:
> [snip]
>>                   Z2 := (Z.re ** 2, Z.im ** 2);
> [snip]
> 
> Don't know about the rest of the program but on some platforms I've
> found it to be much faster to replace "** 2" with a straight
> multiplication, e.g. "Z.re * Z.re, Z.im * Z.im".

I have just checked.  The code that gcc generates looks
the same for ** 2 and for Z.xx * Z.xx, whether it emits
SSE instructions or i387 instructions.

Some more observations:

- SSE2 code performs 8% faster when suitable compilation options
  are present, -mfpmath=sse -msse2 (this is currently the case).
  Then digits 15 should probably stay in the declaration of Real.

- writing the image bytes with Stream_IO removes 6% running time
  when compared to GNAT.IO.Put.
  This adds standard Ada but also adds about 10 lines of code for
  the Put procedure and a Stdout variable. Is it worth it?
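A minimal sketch of such a stream-based Put, assuming the image buffer is a String as in the program above. Note that it uses the standard child package Ada.Text_IO.Text_Streams to get a stream view of standard output, rather than opening a file with Ada.Streams.Stream_IO; that substitution and the names Stream_Put_Demo and Put_Bytes are assumptions for illustration.

```ada
with Ada.Text_IO;              use Ada.Text_IO;
with Ada.Text_IO.Text_Streams; use Ada.Text_IO.Text_Streams;

procedure Stream_Put_Demo is
   --  Stream view of standard output; String'Write sends the
   --  characters as raw bytes, bypassing Text_IO line handling.
   Stdout : constant Stream_Access := Stream (Standard_Output);

   --  Hypothetical replacement for GNAT.IO.Put in the Mandelbrot program.
   procedure Put_Bytes (Item : String) is
   begin
      String'Write (Stdout, Item);
   end Put_Bytes;

   --  A tiny 8x2 PBM image body: two rows, one byte per row.
   Rows : constant String :=
     Character'Val (16#F0#) & Character'Val (16#0F#);
begin
   --  The header goes through the same stream, so nothing is
   --  interleaved with Text_IO buffering.
   Put_Bytes ("P4" & ASCII.LF & "8 2" & ASCII.LF);
   Put_Bytes (Rows);
end Stream_Put_Demo;
```

On POSIX systems the bytes should reach stdout unchanged; on systems with text-mode translation of line ends the result may differ.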



* Re: Tasking for Mandelbrot program
From: Martin @ 2009-09-28  5:48 UTC


On Sep 27, 10:27 pm, Georg Bauhaus <rm.tsoh.plus-bug.bauh...@maps.futureapps.de> wrote:
> Martin wrote:
> > On Sep 27, 2:08 am, Georg Bauhaus <rm.tsoh.plus-bug.bauh...@maps.futureapps.de> wrote:
> > [snip]
> >>                   Z2 := (Z.re ** 2, Z.im ** 2);
> > [snip]
>
> > Don't know about the rest of the program but on some platforms I've
> > found it to be much faster to replace "** 2" with a straight
> > multiplication, e.g. "Z.re * Z.re, Z.im * Z.im".
>
> I have just checked.  The code that gcc is generating looks
> the same for ** 2 and for * Z.xx.  Irrespective of either
> sse instructions or i387 instructions.

Well done that compiler! I guess with inlining and decent
optimisations a "**" would collapse down to the sensible 'optimal'
code.


> Some more observations:
>
> - SSE2 code performs 8% faster when suitable compilation options
>   are present, -mfpmath=sse -msse2 (this is currently the case).
>   Then digits 15 should probably stay in the declaration of Real.
>
> - writing the image bytes with Stream_IO removes 6% running time
>   when compared to GNAT.IO.Put.
>   This adds standard Ada but also adds about 10 lines of code for
>   the Put procedure and a Stdout variable. Is it worth it?

Yes!!!

I'm not sure there is much in the way of 'bragging rights' to be had from
the SLOC count metrics on the shootout - only raw speed is sexy :-)

Cheers
-- Martin



* Re: Tasking for Mandelbrot program
From: jonathan @ 2009-09-28 19:27 UTC

On Sep 27, 10:27 pm, Georg Bauhaus <rm.tsoh.plus-bug.bauh...@maps.futureapps.de> wrote:
> Some more observations:
>
> - SSE2 code performs 8% faster when suitable compilation options
>   are present, -mfpmath=sse -msse2 (this is currently the case).
>   Then digits 15 should probably stay in the declaration of Real.
>

Some more notes on this puzzle ...

As far as I can tell, using

  type Real is digits 16;

is like using   -mfpmath=387   on the command line during
compilation.   -mfpmath=sse -msse2   is usually (always?)
the default.  Amazingly, you can now use -mfpmath=387,sse.
(When I try -mfpmath=387,sse, it usually makes things worse,
but not always.)

With mandelbrot.adb, I find   -mfpmath=387  (or digits 16)
the faster option.  It may just be an accident of my
machine+compiler combination.

My timings are all on Intel processors.  On AMD processors
I would not be surprised if digits 15 is the faster.
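One way to see what the compiler chose for each declaration is to print a few representation attributes. A minimal probe with hypothetical type names Real_15 and Real_16: with GNAT on x86 one would expect digits 15 to map to the 64-bit IEEE double (53-bit mantissa), which SSE2 can handle, while digits 16 exceeds double precision and should therefore map to the 80-bit x87 extended type (64-bit mantissa), which SSE2 cannot process. That would explain why digits 16 behaves like -mfpmath=387. The values actually printed depend on compiler and target.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Digits_Probe is
   type Real_15 is digits 15;
   type Real_16 is digits 16;
begin
   --  'Size is the representation size in bits; 'Machine_Mantissa is
   --  the number of binary digits in the hardware mantissa.
   Put_Line ("digits 15: Size =" & Integer'Image (Real_15'Size)
             & ", Machine_Mantissa ="
             & Integer'Image (Real_15'Machine_Mantissa));
   Put_Line ("digits 16: Size =" & Integer'Image (Real_16'Size)
             & ", Machine_Mantissa ="
             & Integer'Image (Real_16'Machine_Mantissa));
end Digits_Probe;
```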

Here are some timings of mandelbrot.adb, using 1 worker
task, and

   gnatmake -O3 -gnatnp mandelbrot.adb

 (same as: gnatmake -O2 -gnatp  mandelbrot.adb)

On a fairly new PC, single core, with
gnat 4.3.4 or 4.3.2, Xeon X5460 3.16GHz:

  digits 16:

   real    0m34.871s
   user    0m34.446s
   sys     0m0.068s

  digits 15 (with  -mfpmath=sse -msse2):

   real    0m43.657s
   user    0m43.247s
   sys     0m0.056s

On an old PC, single core,
gnat 4.3.2, Xeon 2.8 GHz:

  digits 16:

   real    1m31.885s
   user    1m31.210s
   sys     0m0.224s

  digits 15 (with  -mfpmath=sse -msse2):

   real    1m42.453s
   user    1m41.706s
   sys     0m0.184s

As mentioned in an earlier post, on spectralnorm.adb
(another one of these benchmarks at the shootout site),
"digits 16" was the faster choice.  It was a lot faster
than "digits 15" on my 2 PCs. On the test machine it was
faster, but by a smaller margin.  But spectralnorm.adb
may not predict mandelbrot.adb very well.

Jonathan

* Re: Tasking for Mandelbrot program
From: jonathan @ 2009-09-28 19:52 UTC

On Sep 27, 10:27 pm, Georg Bauhaus <rm.tsoh.plus-bug.bauh...@maps.futureapps.de> wrote:

> - writing the image bytes with Stream_IO removes 6% running time
>   when compared to GNAT.IO.Put.
>   This adds standard Ada but also adds about 10 lines of code for
>   the Put procedure and a Stdout variable. Is it worth it?

Speeding up IO would give you a very detectable improvement
in the multi-core benchmark, since the present program
parallelizes the computation well, and the remaining
problem is a small but irritating IO overhead that can't
be parallelized.

Here are a few timings on 8 cores.
Perfect parallelization would give speed-up factor = 8.

With Output enabled:

  No_Of_Workers (tasks) =  8, speed-up factor = 4.45
  No_Of_Workers (tasks) = 16, speed-up factor = 6.30
  No_Of_Workers (tasks) = 24, speed-up factor = 6.66
  No_Of_Workers (tasks) = 32, speed-up factor = 6.86

With Output disabled, it is nearer the optimal
speed-up factor of 8:

  No_Of_Workers (tasks) = 32, speed-up factor = 7.66
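These figures can be read through Amdahl's law, S = 1 / (F + (1 - F) / N), where F is the fraction of the run that stays serial. Solving for F gives F = (1/S - 1/N) / (1 - 1/N). The sketch below plugs in the two 32-task speed-ups on N = 8 cores. It assumes that all lost speed-up is serial work, whereas load imbalance and tasking overhead also contribute, so the result is only a rough indication; the names are hypothetical.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Serial_Fraction is
   type Real is digits 15;
   package Real_IO is new Float_IO (Real);

   --  Amdahl's law solved for the serial fraction F, given a measured
   --  speed-up S on N cores.
   function Implied_Serial_Fraction (S : Real; N : Positive) return Real is
      Inv_N : constant Real := 1.0 / Real (N);
   begin
      return (1.0 / S - Inv_N) / (1.0 - Inv_N);
   end Implied_Serial_Fraction;

   procedure Report (Label : String; S : Real) is
   begin
      Put (Label & ": serial fraction");
      Real_IO.Put (100.0 * Implied_Serial_Fraction (S, 8),
                   Fore => 3, Aft => 2, Exp => 0);
      Put_Line (" %");
   end Report;
begin
   Report ("32 tasks, output enabled ", 6.86);
   Report ("32 tasks, output disabled", 7.66);
end Serial_Fraction;
```

With these numbers the implied serial share drops from roughly 2.4% to roughly 0.6% when output is disabled, which is consistent with the IO overhead being the main remaining limit.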

The actual benchmark uses 4 cores, so I suspect that the
present standard setting of No_Of_Workers = 16 is
good.

For those who are interested in this problem as
much as I am, a few more words of explanation ...
The difficulty with mandelbrot is that if you
parallelize it by breaking it up into
work-segments (break up the outer loop into
segments of equal length), then some work-segments
finish quickly, some slowly, so we have a load-balancing
problem.  The solution Georg came up with breaks
the problem into a number of independent tasks four
times greater than the number of cores.
The operating system successfully distributes the
tasks over the cores in such a way that the cores
do comparable amounts of work.
(Hope my description is accurate.)
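For comparison, the load-balancing problem can also be attacked dynamically: instead of one fixed chunk per task, each worker repeatedly asks a protected object for the next row, so workers that draw cheap rows simply come back for more. This is a different technique from the fixed-chunk scheme in Georg's program, shown only as a minimal sketch; the names (Row_Pool, Dispenser, Worker, Row_Count), the image size and the worker count are hypothetical, and the sketch counts points in the set instead of producing PBM output.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Row_Pool is
   Size        : constant := 200;   --  hypothetical image width/height
   Num_Workers : constant := 4;     --  e.g. one task per core
   type Real is digits 15;

   --  Hands out row numbers one at a time; -1 means "no rows left".
   protected Dispenser is
      procedure Next (Row : out Integer);
   private
      Current : Natural := 0;
   end Dispenser;

   protected body Dispenser is
      procedure Next (Row : out Integer) is
      begin
         if Current < Size then
            Row := Current;
            Current := Current + 1;
         else
            Row := -1;
         end if;
      end Next;
   end Dispenser;

   --  Points per row that stayed bounded; each row has exactly one writer.
   In_Set : array (0 .. Size - 1) of Natural := (others => 0);

   function Row_Count (Y : Natural) return Natural is
      Count : Natural := 0;
   begin
      for X in 0 .. Size - 1 loop
         declare
            Cr : constant Real := 2.0 * Real (X) / Real (Size) - 1.5;
            Ci : constant Real := 2.0 * Real (Y) / Real (Size) - 1.0;
            Zr, Zi, Tr, Ti : Real := 0.0;
            Escaped : Boolean := False;
         begin
            for I in 1 .. 50 loop
               Tr := Zr * Zr;
               Ti := Zi * Zi;
               if Tr + Ti > 4.0 then
                  Escaped := True;
                  exit;
               end if;
               Zi := 2.0 * Zr * Zi + Ci;
               Zr := Tr - Ti + Cr;
            end loop;
            if not Escaped then
               Count := Count + 1;
            end if;
         end;
      end loop;
      return Count;
   end Row_Count;

   task type Worker;
   task body Worker is
      Row : Integer;
   begin
      loop
         Dispenser.Next (Row);
         exit when Row < 0;
         In_Set (Row) := Row_Count (Row);
      end loop;
   end Worker;

   Total : Natural := 0;
begin
   declare
      Pool : array (1 .. Num_Workers) of Worker;
   begin
      null;   --  the block is left only after every worker has terminated
   end;
   for Y in In_Set'Range loop
      Total := Total + In_Set (Y);
   end loop;
   Put_Line ("Points in set:" & Natural'Image (Total));
end Row_Pool;
```

The trade-off is one protected call per row instead of one rendezvous per chunk; with rows as the unit of work that overhead should be small next to the arithmetic.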

Jonathan

* Re: Tasking for Mandelbrot program
From: Georg Bauhaus @ 2009-09-29 15:26 UTC

jonathan wrote:
> On Sep 27, 10:27 pm, Georg Bauhaus <rm.tsoh.plus-bug.bauh...@maps.futureapps.de> wrote:
>> Some more observations:
>>
>> - SSE2 code performs 8% faster when suitable compilation options
>>   are present, -mfpmath=sse -msse2 (this is currently the case).
>>   Then digits 15 should probably stay in the declaration of Real.
>>
> 
> Some more notes on this puzzle ...

Indeed...  I might well have lost track in the
labyrinth of settings, but seeing, as I do, a factor near 2
in Mandelbrot speed when switching from 15 to 16 digits
(or back) looks odd, especially when the combinations
alluded to below do not appear to be similarly far
from each other.

Next thing I'll do is work through the exponential
table of options and FPT definitions and CPUs
and compilers and OSs and ... carefully measuring
each cell. For now, here is a little test setup that does some
of this.
If you like, unpack in a fresh directory and type "make".
This will compile and run a few FPT related combinations
taking the core loop of Mandelbrot as an example.
(On Windows, type "make all-not-native" , after switching
three OS-related variables near the head of the Makefile.)

http://home.arcor.de/bauhaus/Ada/test1516fpt.zip

I will be short of time during the next few
days, and maybe off line.

FTR, as an experiment I have tried
  pragma Import (Intrinsic, MULPD, "__builtin_ia32_mulpd")
i.e. GCC builtins for SIMD multiplication etc., like the leading
programs use.  Formally, this appears to work, but
the compiler eventually crashed with a bug box.

* Re: Tasking for Mandelbrot program
From: Georg Bauhaus @ 2009-10-12 16:58 UTC


Lo and behold!  An Ada program is at #1 in two lists
at the Shootout site.  Look for mandelbrot, the 64
bit rankings.  Enjoy the moment while it lasts. :-)

The high speed is largely due to the new inner loop,
composed by Jonathan Parker.
I have learned from it the possibility of setting
up variables for a computation, tailored to the
CPU's facilities, so that the compiler can
effectively distribute calculations to SSE registers.



* Re: Tasking for Mandelbrot program
From: jonathan @ 2009-10-12 22:46 UTC

On Oct 12, 5:58 pm, Georg Bauhaus <rm.dash-bauh...@futureapps.de> wrote:
> Lo and behold!  An Ada program is at #1 in two lists
> at the Shootout site.  Look for mandelbrot, the 64
> bit rankings.  Enjoy the moment while it lasts. :-)
>

Here is the address:

http://shootout.alioth.debian.org/u64q/benchmark.php?test=mandelbrot&lang=all

> The high speed is largely due to the new inner loop,
> composed by Jonathan Parker.

Well, I figured it out by reading the C program;) The C and C++
programs use INTEL sse2 intrinsics to get the best
performance out of the sse2 floating point units.
The INTEL sse2 intrinsics are IIUC a convenient interface
to intel sse2 assembly language:
http://msdn.microsoft.com/en-us/library/kcwz153a(VS.71).aspx

The result we can be proud of is the finding that GNAT/gcc
is just smart enough to produce optimal code without the
use of INTEL sse2 intrinsics, at least on the 64 bit machine.
The key to getting the best performance out of
the SSE hardware was presenting it
with 2 identical streams of instructions, each of which
starts with different initial conditions. It can
be done in high-level language as well as
the sse2 intrinsics.
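The idea of two identical instruction streams can be sketched in plain Ada by iterating two points in lockstep over a two-element array. This illustrates the idea only and is not Jonathan's actual code, which is not shown in this thread; the names, and the choice to reset an escaped point to zero so that its values stay finite while the other point continues, are assumptions of the sketch. Whether GCC really packs each Pair into one SSE2 register depends on the compiler version and options.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Two_Points is
   type Real is digits 15;
   type Pair is array (0 .. 1) of Real;
   Iter  : constant := 50;
   Limit : constant := 4.0;

   --  Returns a 2-bit value: bit 1 is set if point 0 stayed bounded,
   --  bit 0 if point 1 did (first pixel in the high bit, as in PBM).
   function In_Set (Cr, Ci : Pair) return Natural is
      Zr, Zi  : Pair := (others => 0.0);
      Tr, Ti  : Pair;
      Escaped : array (Pair'Range) of Boolean := (others => False);
   begin
      for I in 1 .. Iter loop
         --  Identical arithmetic for both lanes.
         for K in Pair'Range loop
            Tr (K) := Zr (K) * Zr (K);
            Ti (K) := Zi (K) * Zi (K);
            Zi (K) := 2.0 * Zr (K) * Zi (K) + Ci (K);
            Zr (K) := Tr (K) - Ti (K) + Cr (K);
         end loop;
         --  Escape test on the squares computed before the update.
         --  Escaped lanes are reset so they cannot overflow.
         for K in Pair'Range loop
            if Tr (K) + Ti (K) > Limit then
               Escaped (K) := True;
               Zr (K) := 0.0;
               Zi (K) := 0.0;
            end if;
         end loop;
         exit when Escaped (0) and Escaped (1);
      end loop;
      return 2 * Boolean'Pos (not Escaped (0)) + Boolean'Pos (not Escaped (1));
   end In_Set;
begin
   --  Point 0 is (0, 0), inside the set; point 1 is (1, 1), which escapes.
   --  Prints "bits: 2".
   Put_Line ("bits:"
             & Natural'Image (In_Set (Cr => (0.0, 1.0), Ci => (0.0, 1.0))));
end Two_Points;
```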

The challenge in this part of the shootout was
non-trivial:  to get the mandelbrot calculation to
exploit all 4 cores of the test machine in parallel, (a
load-balancing/distributed-processing problem I mentioned
earlier in this thread) and to simultaneously put the
SSE floating point units to best use. The 4 programs
that did the best at this used:
C   + pthreads + INTEL sse2 intrinsics,
C++ + OpenMP   + INTEL sse2 intrinsics,
ATS + pthreads(?) + INTEL sse2 intrinsics,
and
Ada + Ada + Ada.

Jonathan

* Re: Tasking for Mandelbrot program
From: Anh Vo @ 2009-10-12 23:42 UTC


On Oct 12, 3:46 pm, jonathan <johns...@googlemail.com> wrote:
> On Oct 12, 5:58 pm, Georg Bauhaus <rm.dash-bauh...@futureapps.de> wrote:
>
> > Lo and behold!  An Ada program is at #1 in two lists
> > at the Shootout site.  Look for mandelbrot, the 64
> > bit rankings.  Enjoy the moment while it lasts. :-)
>
> Here is the address:
>
> http://shootout.alioth.debian.org/u64q/benchmark.php?test=mandelbrot&...
>
> > The high speed is largely due to the new inner loop,
> > composed by Jonathan Parker.
>
> Well, I figured it out by reading the C program;) The C and C++
> programs use INTEL sse2 intrinsics to get the best
> performance out of the sse2 floating point units.
> The INTEL sse2 intrinsics are IIUC a convenient interface
> to intel sse2 assembly language: http://msdn.microsoft.com/en-us/library/kcwz153a(VS.71).aspx
>
> The result we can be proud of is the finding that GNAT/gcc
> is just smart enough to produce optimal code without the
> use of INTEL sse2 intrinsics, at least on the 64 bit machine.
> The key to getting the best performance out of
> the SSE hardware was presenting it
> with 2 identical streams of instructions, each of which
> starts with different initial conditions. It can
> be done in high-level language as well as
> the sse2 intrinsics.
>
> The challenge in this part of the shootout was
> non-trivial:  to get the mandelbrot calculation to
> exploit all 4 cores of the test machine in parallel, (a
> load-balancing/distributed-processing problem I mentioned
> earlier in this thread) and to simultaneously put the
> SSE floating point units to best use. The 4 programs
> that did the best at this used:
> C   + pthreads + INTEL sse2 intrinsics,
> C++ + OpenMP   + INTEL sse2 intrinsics,
> ATS + pthreads(?) + INTEL sse2 intrinsics,
> and
> Ada + Ada + Ada.

It is impressive. I feel proud that Ada comes in first. So, Ada foes
have no more excuses for saying Ada is slow. Great job, Jonathan.

Anh Vo




* Re: Tasking for Mandelbrot program
From: Mark Lorenzen @ 2009-10-13  9:11 UTC


On 12 Oct., 18:58, Georg Bauhaus <rm.dash-bauh...@futureapps.de> wrote:
> Lo and behold!  An Ada program is at #1 in two lists
> at the Shootout site.  Look for mandelbrot, the 64
> bit rankings.  Enjoy the moment while it lasts. :-)

Great!

Please note that the "bit-wise or" operation in the following line has
no effect:
     Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#00#

When shifting left, you are guaranteed that zeroes are shifted in.

I have no idea if this extremely small optimization actually has any
impact on the performance though...
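The point can be checked directly with the modular type Unsigned_8 from package Interfaces. A small sketch with a hypothetical procedure name; the pragma Assert checks are evaluated only when assertions are enabled (for GNAT, -gnata).

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Shift_Demo is
   B : constant Unsigned_8 := 2#1011_0111#;
begin
   --  Shift_Left drops the high bit and fills the low bit with 0 ...
   pragma Assert (Shift_Left (B, 1) = 2#0110_1110#);
   --  ... so "or 16#00#" cannot change the result,
   pragma Assert ((Shift_Left (B, 1) or 16#00#) = Shift_Left (B, 1));
   --  while "or 16#01#" sets exactly that low bit.
   pragma Assert ((Shift_Left (B, 1) or 16#01#) = 2#0110_1111#);
   Put_Line (Unsigned_8'Image (Shift_Left (B, 1)));   --  prints " 110"
end Shift_Demo;
```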

- Mark L



* Re: Tasking for Mandelbrot program
From: Gautier write-only @ 2009-10-13  9:39 UTC


On 13 Oct., 11:11, Mark Lorenzen <mark.loren...@gmail.com> wrote:

> Please note that the "bit-wise or" operation in the following line has
> no effect:
>      Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#00#
>
> When shifting left, you are guaranteed that zeroes are shifted in.

Well, "or 0" has no effect anyway (shift or not shift). If you wanted
to filter something, you would write "and 16#FFFFFF00#" :-)

Gautier



* Re: Tasking for Mandelbrot program
From: Georg Bauhaus @ 2009-10-13 12:57 UTC


Gautier write-only wrote:
> On 13 Oct., 11:11, Mark Lorenzen <mark.loren...@gmail.com> wrote:
> 
>> Please note that the "bit-wise or" operation in the following line has
>> no effect:
>>      Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#00#
>>
>> When shifting left, you are guaranteed that zeroes are shifted in.
> 
> Well "or 0" has no effect anyway (shift or not shift). If you wanted
> to filter something, you would write "and 16#FFFFFF00# :-)

I'm stating the obvious when saying that the compiler
knows that "or 16#00#" has no effect... ;-)  FWIW, the
simple symmetric if/else around the above line seems
to give the fastest program with this distinction,
at least when produced by the Shootout compiler;
with GNATs such as GPL GNAT we might be able to reduce source
size by using Boolean'Pos instead of if/else or even anonymous
conditional expressions as are proposed for Ada 1Y.

Compile with -gnatX if using GNAT, to see timing differences,
if any:


with Interfaces; use Interfaces;
with Ada.Command_Line ; use Ada.Command_Line;
with Ada.Calendar;  use Ada.Calendar;
with Ada.Text_IO;  use Ada.Text_IO;

procedure Bittest is

   Byte_Acc : Unsigned_8;
   Ntests : constant := 100;

   procedure Work (This_Way: Boolean) is
      pragma Inline (Work);
   begin
      for K in 1 .. Ntests loop
         Byte_Acc := Shift_Left (Byte_Acc, 1)
                       or Boolean'Pos (not This_Way);
      end loop;
   end Work;

   procedure Work2 (This_Way: Boolean) is  -- original
      pragma Inline (Work2);
   begin
      for K in 1 .. Ntests loop
         if This_Way then
            Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#00#;
         else
            Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#01#;
         end if;
      end loop;
   end Work2;

   procedure WorkX (This_Way: Boolean) is
      pragma Inline (WorkX);
   begin
      for K in 1 .. Ntests loop
         Byte_Acc := Shift_Left (Byte_Acc, 1) or (if This_Way
                                                   then 16#00#
                                                   else 16#01#);
      end loop;
   end WorkX;

   Start, Finish: Time;
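begin
   --  Argument (1) ("yes" or anything else) is required below; the
   --  presence of a second argument only sets the initial bit.
   if Argument_Count < 1 then
      Put_Line ("usage: bittest yes|no [any]");
      return;
   end if;

   Byte_Acc := Boolean'Pos(Argument_Count > 1);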

   Start := Clock;
   for K in 1 .. 5_000_000 loop
      Work (Argument(1) = "yes");
   end loop;
   Finish := Clock;
   Put_Line ("Work: Byte_Acc = " & Unsigned_8'Image (Byte_Acc) &
             " in " & Duration'Image (Finish - Start) & " seconds");

   Byte_Acc := Boolean'Pos(Argument_Count > 1);

   Start := Clock;
   for K in 1 .. 5_000_000 loop
      Work2 (Argument(1) = "yes");
   end loop;
   Finish := Clock;
   Put_Line ("Work2: Byte_Acc = " & Unsigned_8'Image (Byte_Acc) &
             " in " & Duration'Image (Finish - Start) & " seconds");

   Byte_Acc := Boolean'Pos(Argument_Count > 1);

   Start := Clock;
   for K in 1 .. 5_000_000 loop
      WorkX (Argument(1) = "yes");
   end loop;
   Finish := Clock;
   Put_Line ("WorkX: Byte_Acc = " & Unsigned_8'Image (Byte_Acc) &
             " in " & Duration'Image (Finish - Start) & " seconds");

end Bittest;


