* Tasking for Mandelbrot program
@ 2009-09-27 1:08 Georg Bauhaus
2009-09-27 11:24 ` Martin
0 siblings, 1 reply; 13+ messages in thread
From: Georg Bauhaus @ 2009-09-27 1:08 UTC
A Mandelbrot program has been submitted to the language
Shootout by Jim Rogers, Pascal Obry, and
Gautier de Montmollin. Given the no-tricks approach
that it shares with other entries, it performs
well, I'd think. But it is sequential.
The patch below adds tasking.
A few notes on the patch:
The computation is split into parts, one per task;
many entries seem to use a similar approach.
The number of tasks is set high; task switching does
not seem to matter much in this program.
The alternative, making the environment task execute
the same subprogram concurrently with the other tasks
and matching the number of tasks to the number of cores,
was not significantly faster. It also spoils
the simplicity of the main block, so I set
a higher number of tasks and got basically the same
performance.
Maybe these are possible improvements:
- Ada.Streams.Stream_IO? (In total absence of Text_IO,
GNAT.IO performs well.)
- adjust compilation options to match the FPT hardware
(type Real is now digits 16 because of this, cf.
the spectral-norm entry at the Shootout)
Comments? Can I ask what the authors think?
pragma Restrictions (No_Abort_Statements);
pragma Restrictions (Max_Asynchronous_Select_Nesting => 0);
with Ada.Command_Line; use Ada.Command_Line;
with Ada.Numerics.Generic_Complex_Types;
with Interfaces; use Interfaces;
with GNAT.IO; use GNAT.IO;

procedure Mandelbrot is
   type Real is digits 16;
   package R_Complex is new Ada.Numerics.Generic_Complex_Types (Real);
   use R_Complex;

   Iter        : constant := 50;   -- iteration limit (loop runs Iter + 1 times)
   Limit       : constant := 4.0;  -- escape test on |Z|**2
   Size        : Positive;         -- image is Size x Size pixels
   Zero        : constant Complex := (0.0, 0.0);
   Two_on_Size : Real;

   subtype Output_Queue is String;
   type Output is access Output_Queue;

   --  Each worker computes rows Y1 .. Y2 - 1, then hands back
   --  its buffer of packed output bytes.
   task type X_Step is
      entry Compute_Z (Y1, Y2 : Natural);
      entry Get_Output (Result : out Output; Last : out Natural);
   end X_Step;

   procedure Allocate_Output_Queue (Y1, Y2 : Natural; Result : out Output);

   --  Computes rows Y1 .. Y2 - 1, packing 8 pixels per byte (1 = inside
   --  the set); a partial byte at the end of a row is padded with zeros.
   procedure Compute
     (Y1, Y2 : Natural; Result : Output; Last : out Natural)
   is
      Bit_Num  : Natural := 0;
      Byte_Acc : Unsigned_8 := 0;
      Z, C     : Complex;
   begin
      Last := 0;
      for Y in Y1 .. Y2 - 1 loop
         for X in 0 .. Size - 1 loop
            Z := Zero;
            C := Two_on_Size * (Real (X), Real (Y)) - (1.5, 1.0);
            declare
               Z2 : Complex;
            begin
               for I in 1 .. Iter + 1 loop
                  Z2 := (Z.re ** 2, Z.im ** 2);
                  Z := (Z2.re - Z2.im, 2.0 * Z.re * Z.im) + C;
                  exit when Z2.re + Z2.im > Limit;
               end loop;
               if Z2.re + Z2.im > Limit then
                  Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#00#;
               else
                  Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#01#;
               end if;
            end;
            Bit_Num := Bit_Num + 1;
            if Bit_Num = 8 then
               Last := Last + 1;
               Result (Last) := Character'Val (Byte_Acc);
               Byte_Acc := 0;
               Bit_Num := 0;
            elsif X = Size - 1 then
               Byte_Acc := Shift_Left (Byte_Acc, 8 - (Size mod 8));
               Last := Last + 1;
               Result (Last) := Character'Val (Byte_Acc);
               Byte_Acc := 0;
               Bit_Num := 0;
            end if;
         end loop;
      end loop;
   end Compute;

   task body X_Step is
      Data   : Output;
      Pos    : Natural;
      Y1, Y2 : Natural;
   begin
      accept Compute_Z (Y1, Y2 : Natural) do
         X_Step.Y1 := Y1;
         X_Step.Y2 := Y2;
      end Compute_Z;
      Allocate_Output_Queue (Y1, Y2, Data);
      Compute (Y1, Y2, Data, Pos);
      accept Get_Output (Result : out Output; Last : out Natural) do
         Result := Data;
         Last := Pos;
      end Get_Output;
   end X_Step;

   --  Large enough for Y2 - Y1 rows of (Size + 7) / 8 bytes each
   procedure Allocate_Output_Queue (Y1, Y2 : Natural; Result : out Output) is
   begin
      Result := new Output_Queue (1 .. (Y2 - Y1 + 8) * Size / 8);
   end Allocate_Output_Queue;

begin
   Size := Positive'Value (Argument (1));
   Two_on_Size := 2.0 / Real (Size);

   --  PBM "P4" header: binary bitmap, then width and height
   Put_Line ("P4");
   Put_Line (Argument (1) & " " & Argument (1));

   declare
      No_Of_Workers : constant := 16;
      Chunk_Size : constant Positive :=
        (Size + No_Of_Workers) / No_Of_Workers;
      --  No_Of_Workers + 1 tasks; the last bands may be empty
      Pool : array (0 .. No_Of_Workers) of X_Step;
      pragma Assert (Pool'Length * Chunk_Size >= Size);
      Buffer : Output;
      Last   : Natural;
   begin
      pragma Assert (Pool'First = 0);
      --  Start all workers, each on its own band of rows
      for P in Pool'Range loop
         Pool (P).Compute_Z
           (Y1 => P * Chunk_Size,
            Y2 => Positive'Min ((P + 1) * Chunk_Size, Size));
      end loop;
      --  Collect the bands in order and write them out
      for P in Pool'Range loop
         Pool (P).Get_Output (Buffer, Last);
         Put (Buffer (Buffer'First .. Last));
      end loop;
   end;
end Mandelbrot;
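The band split in the main block can be checked in isolation. Pool is indexed 0 .. No_Of_Workers, so 17 tasks are started, and for small images several of the last bands are empty. The following is a hypothetical stand-alone check (Check_Partition and Partition_OK are invented names) that reuses the program's formulas for Chunk_Size, Y1, Y2 and the buffer length from Allocate_Output_Queue; it exercises only the arithmetic, not the tasks.

```ada
with Ada.Text_IO; use Ada.Text_IO;

--  Hypothetical stand-alone check of the row partition and buffer
--  size used by Mandelbrot (same formulas, no tasks involved).
procedure Check_Partition is
   No_Of_Workers : constant := 16;

   --  True if the bands P * Chunk_Size .. min ((P + 1) * Chunk_Size, Size) - 1
   --  for P in 0 .. No_Of_Workers cover rows 0 .. Size - 1 exactly once
   --  and every buffer from Allocate_Output_Queue is large enough.
   function Partition_OK (Size : Positive) return Boolean is
      Chunk_Size : constant Positive :=
        (Size + No_Of_Workers) / No_Of_Workers;
      Covered    : Natural := 0;  -- first row not yet covered
   begin
      for P in 0 .. No_Of_Workers loop
         declare
            Y1        : constant Natural := P * Chunk_Size;
            Y2        : constant Natural :=
              Natural'Min ((P + 1) * Chunk_Size, Size);
            Rows      : constant Integer := Y2 - Y1;  -- may be negative
            Allocated : constant Integer := (Rows + 8) * Size / 8;
            Needed    : constant Integer := Rows * ((Size + 7) / 8);
         begin
            if Rows > 0 then
               if Y1 /= Covered then
                  return False;  -- gap or overlap between bands
               end if;
               Covered := Y2;
            end if;
            if Allocated < Needed then
               return False;     -- buffer too small for its rows
            end if;
         end;
      end loop;
      return Covered = Size;
   end Partition_OK;

   Failures : Natural := 0;
begin
   for Size in 1 .. 20_000 loop
      if not Partition_OK (Size) then
         Failures := Failures + 1;
         Put_Line ("fails for Size =" & Positive'Image (Size));
      end if;
   end loop;
   Put_Line ("sizes checked: 20000, failures:" & Natural'Image (Failures));
end Check_Partition;
```

Compile with -gnata if the assertions in Mandelbrot itself should be checked as well.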
* Re: Tasking for Mandelbrot program
From: Martin @ 2009-09-27 11:24 UTC
On Sep 27, 2:08 am, Georg Bauhaus <rm.tsoh.plus-bug.bauh...@maps.futureapps.de> wrote:
[snip]
> Z2 := (Z.re ** 2, Z.im ** 2);
[snip]
Don't know about the rest of the program but on some platforms I've
found it to be much faster to replace "** 2" with a straight
multiplication, e.g. "Z.re * Z.re, Z.im * Z.im".
Some "**" functions actually look for the "** 2" and just use a
straight multiplication - doing it explicitly removes the function
call and the check.
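For anyone who wants to repeat the comparison, here is a minimal sketch with the complex arithmetic written out on two Real components; Step_Pow, Step_Mul and Same_Orbit are hypothetical names. Whether "** 2" really costs a call has to be read off the generated assembly (for example with gcc -S); GNAT is expected to expand a small constant exponent into multiplications, in which case both forms compile to the same code.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Square_Forms is
   type Real is digits 15;

   --  One Mandelbrot step Z := Z**2 + C using exponentiation
   procedure Step_Pow (Zr, Zi : in out Real; Cr, Ci : Real) is
      Zr2 : constant Real := Zr ** 2;
      Zi2 : constant Real := Zi ** 2;
   begin
      Zi := 2.0 * Zr * Zi + Ci;   -- uses the old Zr
      Zr := Zr2 - Zi2 + Cr;
   end Step_Pow;

   --  The same step using explicit multiplication
   procedure Step_Mul (Zr, Zi : in out Real; Cr, Ci : Real) is
      Zr2 : constant Real := Zr * Zr;
      Zi2 : constant Real := Zi * Zi;
   begin
      Zi := 2.0 * Zr * Zi + Ci;
      Zr := Zr2 - Zi2 + Cr;
   end Step_Mul;

   --  Run N steps of both forms from Z = 0; True if the orbits agree
   function Same_Orbit (Cr, Ci : Real; N : Positive) return Boolean is
      Ar, Ai, Br, Bi : Real := 0.0;
   begin
      for K in 1 .. N loop
         Step_Pow (Ar, Ai, Cr, Ci);
         Step_Mul (Br, Bi, Cr, Ci);
      end loop;
      return Ar = Br and then Ai = Bi;
   end Same_Orbit;
begin
   Put_Line ("same orbit: " & Boolean'Image (Same_Orbit (-0.5, 0.25, 20)));
end Square_Forms;
```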
Cheers
-- Martin
* Re: Tasking for Mandelbrot program
From: Georg Bauhaus @ 2009-09-27 21:27 UTC
Martin wrote:
> On Sep 27, 2:08 am, Georg Bauhaus <rm.tsoh.plus-
> bug.bauh...@maps.futureapps.de> wrote:
> [snip]
>> Z2 := (Z.re ** 2, Z.im ** 2);
> [snip]
>
> Don't know about the rest of the program but on some platforms I've
> found it to be much faster to replace "** 2" with a straight
> multiplication, e.g. "Z.re * Z.re, Z.im * Z.im".
I have just checked. The code that gcc generates looks
the same for ** 2 and for Z.xx * Z.xx, whether it uses
SSE instructions or i387 instructions.
Some more observations:
- SSE2 code performs 8% faster when suitable compilation options
are present, -mfpmath=sse -msse2 (this is currently the case).
Then digits 15 should probably stay in the declaration of Real.
- writing the image bytes with Stream_IO cuts the running time by 6%
compared to GNAT.IO.Put.
This adds standard Ada but also adds about 10 lines of code for
the Put procedure and a Stdout variable. Is it worth it?
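For comparison, a minimal sketch of binary output through a stream on standard output. It uses Ada.Text_IO.Text_Streams, which is standard Ada but a different package from the Ada.Streams.Stream_IO mentioned above; it is chosen here because it yields a stream for Standard_Output without opening a device file. Header, Put_Bytes and the 8 x 8 demo image are assumptions of the sketch, not part of the Shootout program.

```ada
with Ada.Text_IO;              use Ada.Text_IO;
with Ada.Text_IO.Text_Streams; use Ada.Text_IO.Text_Streams;
with Ada.Strings.Fixed;        use Ada.Strings.Fixed;

procedure Stream_Output is
   --  Stream view of standard output; bytes bypass Put/Put_Line
   Stdout : constant Stream_Access := Stream (Standard_Output);

   --  P4 header: magic number, then "width height", each ended by LF
   function Header (Size : Positive) return String is
      Img : constant String := Trim (Positive'Image (Size), Ada.Strings.Left);
   begin
      return "P4" & ASCII.LF & Img & " " & Img & ASCII.LF;
   end Header;

   --  Write the characters of Item as raw bytes (String'Write omits bounds)
   procedure Put_Bytes (Item : String) is
   begin
      String'Write (Stdout, Item);
   end Put_Bytes;

   Size : constant Positive := 8;  -- tiny 8 x 8 demo image
   Row  : constant String (1 .. 1) := (1 => Character'Val (16#F0#));
begin
   Put_Bytes (Header (Size));
   for Y in 1 .. Size loop
      Put_Bytes (Row);  -- one byte holds the 8 pixels of a row
   end loop;
end Stream_Output;
```

In the real program, Put_Bytes would take the place of GNAT.IO.Put for each band's buffer slice.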
* Re: Tasking for Mandelbrot program
From: Martin @ 2009-09-28 5:48 UTC
On Sep 27, 10:27 pm, Georg Bauhaus <rm.tsoh.plus-bug.bauh...@maps.futureapps.de> wrote:
> Martin wrote:
> > On Sep 27, 2:08 am, Georg Bauhaus <rm.tsoh.plus-
> > bug.bauh...@maps.futureapps.de> wrote:
> > [snip]
> >> Z2 := (Z.re ** 2, Z.im ** 2);
> > [snip]
>
> > Don't know about the rest of the program but on some platforms I've
> > found it to be much faster to replace "** 2" with a straight
> > multiplication, e.g. "Z.re * Z.re, Z.im * Z.im".
>
> I have just checked. The code that gcc is generating looks
> the same for ** 2 and for * Z.xx. Irrespective of either
> sse instructions or i387 instructions.
Well done that compiler! I guess with inlining and decent
optimisations a "**" would collapse down to the sensible 'optimal'
code.
> Some more observations:
>
> - SSE2 code performs 8% faster when suitable compilation options
> are present, -mfpmath=sse -msse2 (this is currently the case).
> Then digits 15 should probably stay in the declaration of Real.
>
> - writing the image bytes with Stream_IO removes 6% running time
> when compared to GNAT.IO.Put.
> This adds standard Ada but also adds about 10 lines of code for
> the Put procedure and a Stdout variable. Is it worth it?
Yes!!!
I'm not sure there are many 'bragging rights' to be had from
the SLOC metrics on the Shootout; only raw speed is sexy :-)
Cheers
-- Martin
* Re: Tasking for Mandelbrot program
From: jonathan @ 2009-09-28 19:27 UTC
On Sep 27, 10:27 pm, Georg Bauhaus <rm.tsoh.plus-bug.bauh...@maps.futureapps.de> wrote:
> Some more observations:
>
> - SSE2 code performs 8% faster when suitable compilation options
> are present, -mfpmath=sse -msse2 (this is currently the case).
> Then digits 15 should probably stay in the declaration of Real.
>
Some more notes on this puzzle ...
As far as I can tell, using
type Real is digits 16;
is like using -mfpmath=387 on the command line during
compilation. -mfpmath=sse -msse2 is usually (always?)
the default. Amazingly, you can now use -mfpmath=387,sse.
(When I try -mfpmath=387,sse, it usually makes things worse,
but not always.)
With mandelbrot.adb, I find -mfpmath=387 (or digits 16)
the faster option. It may just be an accident of my
machine+compiler combination.
My timings are all on Intel processors. On AMD processors
I would not be surprised if digits 15 is the faster.
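The link between digits 16 and 387 code can be made explicit: an IEEE double (Long_Float in GNAT) carries 15 decimal digits, so digits 16 forces a wider base type, which on x86 is the 80-bit x87 extended format that SSE2 cannot operate on. A minimal sketch, assuming GNAT on an x86 target (on other targets digits 16 may be rejected), prints what the compiler chose:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Show_Real_Types is
   type Real_15 is digits 15;  -- fits the 64-bit IEEE double (SSE2)
   type Real_16 is digits 16;  -- needs a wider base type
begin
   Put_Line ("digits 15: base digits =" & Integer'Image (Real_15'Base'Digits)
             & ", size =" & Integer'Image (Real_15'Size) & " bits");
   Put_Line ("digits 16: base digits =" & Integer'Image (Real_16'Base'Digits)
             & ", size =" & Integer'Image (Real_16'Size) & " bits");
   Put_Line ("Long_Float'Digits      =" & Integer'Image (Long_Float'Digits));
   Put_Line ("Long_Long_Float'Digits =" & Integer'Image (Long_Long_Float'Digits));
end Show_Real_Types;
```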
Here are some timings of mandelbrot.adb, using 1 worker
task, and
gnatmake -O3 -gnatnp mandelbrot.adb
(same as: gnatmake -O2 -gnatp mandelbrot.adb)
On a fairly new PC, single core, with
gnat 4.3.4 or 4.3.2, Xeon X5460 3.16 GHz:
                                   real        user        sys
digits 16                          0m34.871s   0m34.446s   0m0.068s
digits 15 (-mfpmath=sse -msse2)    0m43.657s   0m43.247s   0m0.056s
On an old PC, single core:
gnat 4.3.2, Xeon 2.8 GHz
                                   real        user        sys
digits 16                          1m31.885s   1m31.210s   0m0.224s
digits 15 (-mfpmath=sse -msse2)    1m42.453s   1m41.706s   0m0.184s
As mentioned in an earlier post, on spectralnorm.adb
(another one of these benchmarks at the shootout site),
"digits 16" was the faster choice. It was a lot faster
than "digits 15" on my two PCs. On the test machine it was
faster, but by a smaller margin. But spectralnorm.adb
may not predict mandelbrot.adb very well.
Jonathan
* Re: Tasking for Mandelbrot program
From: jonathan @ 2009-09-28 19:52 UTC
On Sep 27, 10:27 pm, Georg Bauhaus <rm.tsoh.plus-bug.bauh...@maps.futureapps.de> wrote:
> - writing the image bytes with Stream_IO removes 6% running time
> when compared to GNAT.IO.Put.
> This adds standard Ada but also adds about 10 lines of code for
> the Put procedure and a Stdout variable. Is it worth it?
Speeding up IO would give you a very detectable improvement
in the multi-core benchmark, since the present program
parallelizes the computation well, and the remaining
problem is a small but irritating IO overhead that can't
be parallelized.
Here are a few timings on 8 cores.
Perfect parallelization would give speed-up factor = 8.
With Output enabled:
No_Of_Workers (tasks) = 8, speed-up factor = 4.45
No_Of_Workers (tasks) = 16, speed-up factor = 6.30
No_Of_Workers (tasks) = 24, speed-up factor = 6.66
No_Of_Workers (tasks) = 32, speed-up factor = 6.86
With Output disabled, it is nearer the optimal
speed-up factor of 8:
No_Of_Workers (tasks) = 32, speed-up factor = 7.66
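These speed-ups can be turned into a rough estimate of the part that does not parallelize, using the Karp-Flatt metric e = (1/S - 1/P) / (1 - 1/P) for speed-up S on P cores. This formula is not used in the thread, and e lumps together output, load imbalance and tasking overhead; the sketch simply plugs in the 32-task figures reported above.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Serial_Fraction is
   subtype Core_Count is Positive range 2 .. Positive'Last;

   --  Karp-Flatt metric: experimentally determined serial fraction
   --  e = (1/S - 1/P) / (1 - 1/P) for speed-up S on P cores
   function Karp_Flatt (Speed_Up : Float; Cores : Core_Count) return Float is
      P : constant Float := Float (Cores);
   begin
      return (1.0 / Speed_Up - 1.0 / P) / (1.0 - 1.0 / P);
   end Karp_Flatt;

   --  Speed-ups measured on 8 cores with 32 tasks
   With_Output    : constant Float := 6.86;
   Without_Output : constant Float := 7.66;
begin
   Put_Line ("serial fraction, output enabled: "
             & Float'Image (Karp_Flatt (With_Output, 8)));
   Put_Line ("serial fraction, output disabled:"
             & Float'Image (Karp_Flatt (Without_Output, 8)));
end Serial_Fraction;
```

With these inputs e comes out at roughly 0.024 with output enabled and 0.006 without, which fits the observation that output is the main part that cannot be parallelized.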
The actual benchmark uses 4 cores, so I suspect that the
present standard setting of No_Of_Workers = 16 is
good.
For those who are interested in this problem as
much as I am, a few more words of explanation ...
The difficulty with mandelbrot is that if you
parallelize it by breaking it up into
work-segments (break up the outer loop into
segments of equal length), then some work-segments
finish quickly and some slowly, so we have a load-balancing
problem. The solution Georg came up with breaks
the problem into independent tasks, about four
times as many as there are cores.
The operating system then distributes the
tasks over the cores in such a way that the cores
do comparable amounts of work.
(Hope my description is accurate.)
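Georg's scheme is static over-decomposition: more bands than cores, with the balancing left to the OS scheduler. A common alternative, shown here only as a sketch under assumed parameters (a 200 x 200 image, 4 workers, and the hypothetical names Row_Counter, Inside_Count and Total_Parallel), is dynamic distribution: workers take one row at a time from a protected counter, so a slow row delays only the worker that picked it up.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Row_Queue is
   Size    : constant := 200;  -- image is Size x Size pixels (assumed)
   Iter    : constant := 50;
   Workers : constant := 4;

   type Count_Array is array (0 .. Size - 1) of Natural;

   --  Hands out row numbers 0 .. Size - 1, each exactly once
   protected type Row_Counter is
      procedure Get (Row : out Integer);  -- Row = -1 when all rows are taken
   private
      Next : Natural := 0;
   end Row_Counter;

   protected body Row_Counter is
      procedure Get (Row : out Integer) is
      begin
         if Next < Size then
            Row  := Next;
            Next := Next + 1;
         else
            Row := -1;
         end if;
      end Get;
   end Row_Counter;

   --  Number of pixels in row Y whose orbit stays bounded for Iter steps
   function Inside_Count (Y : Natural) return Natural is
      Ci    : constant Long_Float :=
        2.0 * Long_Float (Y) / Long_Float (Size) - 1.0;
      Count : Natural := 0;
   begin
      for X in 0 .. Size - 1 loop
         declare
            Cr      : constant Long_Float :=
              2.0 * Long_Float (X) / Long_Float (Size) - 1.5;
            Zr, Zi  : Long_Float := 0.0;
            Tr      : Long_Float;
            Bounded : Boolean := True;
         begin
            for I in 1 .. Iter loop
               Tr := Zr * Zr - Zi * Zi + Cr;
               Zi := 2.0 * Zr * Zi + Ci;
               Zr := Tr;
               if Zr * Zr + Zi * Zi > 4.0 then
                  Bounded := False;
                  exit;
               end if;
            end loop;
            if Bounded then
               Count := Count + 1;
            end if;
         end;
      end loop;
      return Count;
   end Inside_Count;

   --  Workers pull rows from a shared counter until none are left
   function Total_Parallel return Natural is
      Counter : Row_Counter;
      Results : Count_Array := (others => 0);  -- each row written by one task
      Sum     : Natural := 0;
   begin
      declare
         task type Worker;
         task body Worker is
            Row : Integer;
         begin
            loop
               Counter.Get (Row);
               exit when Row < 0;
               Results (Row) := Inside_Count (Row);
            end loop;
         end Worker;

         Pool : array (1 .. Workers) of Worker;
      begin
         null;  -- leaving this block waits for all workers to terminate
      end;
      for R in Results'Range loop
         Sum := Sum + Results (R);
      end loop;
      return Sum;
   end Total_Parallel;

   --  Same computation without tasks, for comparison
   function Total_Sequential return Natural is
      Sum : Natural := 0;
   begin
      for Y in 0 .. Size - 1 loop
         Sum := Sum + Inside_Count (Y);
      end loop;
      return Sum;
   end Total_Sequential;
begin
   Put_Line ("parallel  :" & Natural'Image (Total_Parallel));
   Put_Line ("sequential:" & Natural'Image (Total_Sequential));
end Row_Queue;
```

The price is one protected call per row instead of one rendezvous per band; whether that pays off against the simple static split would have to be measured.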
Jonathan
* Re: Tasking for Mandelbrot program
From: Georg Bauhaus @ 2009-09-29 15:26 UTC
jonathan wrote:
> On Sep 27, 10:27 pm, Georg Bauhaus <rm.tsoh.plus-
> bug.bauh...@maps.futureapps.de> wrote:
>> Some more observations:
>>
>> - SSE2 code performs 8% faster when suitable compilation options
>> are present, -mfpmath=sse -msse2 (this is currently the case).
>> Then digits 15 should probably stay in the declaration of Real.
>>
>
> Some more notes on this puzzle ...
Indeed... I might well have lost track in the
labyrinth of settings, but the factor of nearly 2
in Mandelbrot speed that I see when switching between
15 and 16 digits looks odd, especially since the
combinations alluded to below do not appear to differ
by nearly as much.
Next thing I'll do is work through the exponential
table of options and FPT definitions and CPUs
and compilers and OSs and ... carefully measuring
each cell. For now, here is a little test setup that does some
of this.
If you like, unpack in a fresh directory and type "make".
This will compile and run a few FPT related combinations
taking the core loop of Mandelbrot as an example.
(On Windows, type "make all-not-native", after switching
three OS-related variables near the head of the Makefile.)
http://home.arcor.de/bauhaus/Ada/test1516fpt.zip
I will be short of time during the next few
days, and maybe off line.
FTR, as an experiment I have tried
pragma Import (Intrinsic, MULPD, "__builtin_ia32_mulpd"),
i.e., GCC builtins for SIMD multiplication etc., as the leading
programs do. Formally, this appears to work, but
the compiler eventually produced a bug box.
* Re: Tasking for Mandelbrot program
From: Georg Bauhaus @ 2009-10-12 16:58 UTC
Lo and behold! An Ada program is at #1 in two lists
at the Shootout site. Look for mandelbrot, the 64
bit rankings. Enjoy the moment while it lasts. :-)
The high speed is largely due to the new inner loop,
composed by Jonathan Parker.
I have learned from it the possibility of setting
up variables for a computation, tailored to the
CPU's facilities, so that the compiler can
effectively distribute calculations to SSE registers.
* Re: Tasking for Mandelbrot program
From: jonathan @ 2009-10-12 22:46 UTC
On Oct 12, 5:58 pm, Georg Bauhaus <rm.dash-bauh...@futureapps.de>
wrote:
> Lo and behold! An Ada program is at #1 in two lists
> at the Shootout site. Look for mandelbrot, the 64
> bit rankings. Enjoy the moment while it lasts. :-)
>
Here is the address:
http://shootout.alioth.debian.org/u64q/benchmark.php?test=mandelbrot&lang=all
> The high speed is largely due to the new inner loop,
> composed by Jonathan Parker.
Well, I figured it out by reading the C program;) The C and C++
programs use INTEL sse2 intrinsics to get the best
performance out of the sse2 floating point units.
The INTEL sse2 intrinsics are IIUC a convenient interface
to intel sse2 assembly language:
http://msdn.microsoft.com/en-us/library/kcwz153a(VS.71).aspx
The result we can be proud of is the finding that GNAT/gcc
is just smart enough to produce optimal code without the
use of INTEL sse2 intrinsics, at least on the 64 bit machine.
The key to getting the best performance out of
the SSE hardware was presenting it
with 2 identical streams of instructions, each of which
starts with different initial conditions. It can
be done in high-level language as well as
the sse2 intrinsics.
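The two-stream idea can be written in plain Ada. The sketch below is an illustration under assumptions, not the code of the Shootout entry: Inside_2 iterates two pixels of the same row with identical statements on separate variables, which gives the compiler the opportunity (not a guarantee) to use packed SSE2 operations; Inside is the one-pixel reference, and all names are hypothetical. An escaped orbit is reset to zero so that its values stay finite while the other pixel keeps iterating.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Two_Streams is
   type Real is digits 15;  -- 64-bit double; two fit in one SSE2 register
   Iter  : constant := 50;
   Limit : constant := 4.0;

   --  Reference version: one pixel at a time
   function Inside (Cr, Ci : Real) return Boolean is
      Zr, Zi, Tr : Real := 0.0;
   begin
      for I in 1 .. Iter loop
         Tr := Zr * Zr - Zi * Zi + Cr;
         Zi := 2.0 * Zr * Zi + Ci;
         Zr := Tr;
         if Zr * Zr + Zi * Zi > Limit then
            return False;
         end if;
      end loop;
      return True;
   end Inside;

   --  Two pixels of one row (same Ci) in lock step: identical
   --  statements, different starting values
   procedure Inside_2 (Cr1, Cr2, Ci : Real; In1, In2 : out Boolean) is
      Zr1, Zi1, Zr2, Zi2, T1, T2 : Real := 0.0;
   begin
      In1 := True;
      In2 := True;
      for I in 1 .. Iter loop
         T1  := Zr1 * Zr1 - Zi1 * Zi1 + Cr1;
         T2  := Zr2 * Zr2 - Zi2 * Zi2 + Cr2;
         Zi1 := 2.0 * Zr1 * Zi1 + Ci;
         Zi2 := 2.0 * Zr2 * Zi2 + Ci;
         Zr1 := T1;
         Zr2 := T2;
         --  An escaped orbit is reset to 0 so its values stay finite
         if Zr1 * Zr1 + Zi1 * Zi1 > Limit then
            In1 := False; Zr1 := 0.0; Zi1 := 0.0;
         end if;
         if Zr2 * Zr2 + Zi2 * Zi2 > Limit then
            In2 := False; Zr2 := 0.0; Zi2 := 0.0;
         end if;
         exit when not (In1 or In2);
      end loop;
   end Inside_2;

   --  True if the paired version matches the reference for both pixels
   function Agree (Cr1, Cr2, Ci : Real) return Boolean is
      A, B : Boolean;
   begin
      Inside_2 (Cr1, Cr2, Ci, A, B);
      return A = Inside (Cr1, Ci) and then B = Inside (Cr2, Ci);
   end Agree;

   Size       : constant := 64;
   Mismatches : Natural := 0;
begin
   for Y in 0 .. Size - 1 loop
      for X in 0 .. Size / 2 - 1 loop
         if not Agree (2.0 * Real (2 * X) / Real (Size) - 1.5,
                       2.0 * Real (2 * X + 1) / Real (Size) - 1.5,
                       2.0 * Real (Y) / Real (Size) - 1.0)
         then
            Mismatches := Mismatches + 1;
         end if;
      end loop;
   end loop;
   Put_Line ("mismatching pairs:" & Natural'Image (Mismatches));
end Two_Streams;
```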
The challenge in this part of the shootout was
non-trivial: to get the mandelbrot calculation to
exploit all 4 cores of the test machine in parallel (a
load-balancing/distributed-processing problem I mentioned
earlier in this thread) and to simultaneously put the
SSE floating point units to best use. The 4 programs
that did the best at this used:
C + pthreads + INTEL sse2 intrinsics,
C++ + OpenMP + INTEL sse2 intrinsics,
ATS + pthreads(?) + INTEL sse2 intrinsics,
and
Ada + Ada + Ada.
Jonathan
* Re: Tasking for Mandelbrot program
From: Anh Vo @ 2009-10-12 23:42 UTC
On Oct 12, 3:46 pm, jonathan <johns...@googlemail.com> wrote:
> On Oct 12, 5:58 pm, Georg Bauhaus <rm.dash-bauh...@futureapps.de>
> wrote:
>
> > Lo and behold! An Ada program is at #1 in two lists
> > at the Shootout site. Look for mandelbrot, the 64
> > bit rankings. Enjoy the moment while it lasts. :-)
>
> Here is the address:
>
> http://shootout.alioth.debian.org/u64q/benchmark.php?test=mandelbrot&...
>
> > The high speed is largely due to the new inner loop,
> > composed by Jonathan Parker.
>
> Well, I figured it out by reading the C program;) The C and C++
> programs use INTEL sse2 intrinsics to get the best
> performance out of the sse2 floating point units.
> The INTEL sse2 intrinsics are IIUC a convenient interface
> to intel sse2 assembly language: http://msdn.microsoft.com/en-us/library/kcwz153a(VS.71).aspx
>
> The result we can be proud of is the finding that GNAT/gcc
> is just smart enough to produce optimal code without the
> use of INTEL sse2 intrinsics, at least on the 64 bit machine.
> The key to getting the best performance out of
> the SSE hardware was presenting it
> with 2 identical streams of instructions, each of which
> starts with different initial conditions. It can
> be done in high-level language as well as
> the sse2 intrinsics.
>
> The challenge in this part of the shootout was
> non-trivial: to get the mandelbrot calculation to
> exploit all 4 cores of the test machine in parallel, (a
> load-balancing/distributed-processing problem I mentioned
> earlier in this thread) and to simultaneously put the
> SSE floating point units to best use. The 4 programs
> that did the best at this used:
> C + pthreads + INTEL sse2 intrinsics,
> C++ + OpenMP + INTEL sse2 intrinsics,
> ATS + pthreads(?) + INTEL sse2 intrinsics,
> and
> Ada + Ada + Ada.
It is impressive. I feel proud that Ada comes in first. So, Ada foes
have no more excuses for saying Ada is slow. Great job Jonathan.
Anh Vo
* Re: Tasking for Mandelbrot program
From: Mark Lorenzen @ 2009-10-13 9:11 UTC
On 12 Okt., 18:58, Georg Bauhaus <rm.dash-bauh...@futureapps.de>
wrote:
> Lo and behold! An Ada program is at #1 in two lists
> at the Shootout site. Look for mandelbrot, the 64
> bit rankings. Enjoy the moment while it lasts. :-)
Great!
Please note that the "bit-wise or" operation in the following line has
no effect:
Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#00#
When shifting left, you are guaranteed that zeroes are shifted in.
I have no idea if this extremely small optimization actually has any
impact on the performance though...
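Mark's point can be checked exhaustively for the 8-bit accumulator. In this sketch (function names hypothetical) the original if/else, the same without the no-op "or 16#00#", and a branch-free Boolean'Pos variant are compared for every accumulator value; escaped pixels contribute a 0 bit, as in the program.

```ada
with Interfaces;  use Interfaces;
with Ada.Text_IO; use Ada.Text_IO;

procedure Or_Zero is
   --  Append one pixel bit to the accumulator, three ways
   function With_If (Acc : Unsigned_8; Escaped : Boolean) return Unsigned_8 is
   begin
      if Escaped then
         return Shift_Left (Acc, 1) or 16#00#;  -- "or 16#00#" changes nothing
      else
         return Shift_Left (Acc, 1) or 16#01#;
      end if;
   end With_If;

   function Without_Or (Acc : Unsigned_8; Escaped : Boolean) return Unsigned_8 is
   begin
      if Escaped then
         return Shift_Left (Acc, 1);  -- low bit is already 0 after the shift
      else
         return Shift_Left (Acc, 1) or 16#01#;
      end if;
   end Without_Or;

   function Branch_Free (Acc : Unsigned_8; Escaped : Boolean) return Unsigned_8 is
   begin
      return Shift_Left (Acc, 1) or Boolean'Pos (not Escaped);
   end Branch_Free;

   --  True if all three forms agree for every accumulator value
   function All_Agree return Boolean is
   begin
      for A in Unsigned_8 loop
         for E in Boolean loop
            if With_If (A, E) /= Without_Or (A, E)
              or else With_If (A, E) /= Branch_Free (A, E)
            then
               return False;
            end if;
         end loop;
      end loop;
      return True;
   end All_Agree;
begin
   Put_Line ("all forms agree: " & Boolean'Image (All_Agree));
end Or_Zero;
```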
- Mark L
* Re: Tasking for Mandelbrot program
From: Gautier write-only @ 2009-10-13 9:39 UTC
On 13 Okt., 11:11, Mark Lorenzen <mark.loren...@gmail.com> wrote:
> Please note that the "bit-wise or" operation in the following line has
> no effect:
> Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#00#
>
> When shifting left, you are guaranteed that zeroes are shifted in.
Well "or 0" has no effect anyway (shift or not shift). If you wanted
to filter something, you would write "and 16#FFFFFF00#" :-)
Gautier
* Re: Tasking for Mandelbrot program
From: Georg Bauhaus @ 2009-10-13 12:57 UTC
Gautier write-only schrieb:
> On 13 Okt., 11:11, Mark Lorenzen <mark.loren...@gmail.com> wrote:
>
>> Please note that the "bit-wise or" operation in the following line has
>> no effect:
>> Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#00#
>>
>> When shifting left, you are guaranteed that zeroes are shifted in.
>
> Well "or 0" has no effect anyway (shift or not shift). If you wanted
> to filter something, you would write "and 16#FFFFFF00# :-)
I'm stating the obvious when saying that the compiler
knows that "or 16#00#" has no effect... ;-) FWIW, the
simple symmetric if/else around the above line seems
to give the fastest program of the variants below,
at least when produced by the Shootout compiler.
With GNATs such as GPL GNAT, we might be able to reduce
source size by using Boolean'Pos instead of if/else, or even
the conditional expressions proposed for Ada 1Y.
Compile with -gnatX if using GNAT, to see timing differences,
if any:
with Interfaces; use Interfaces;
with Ada.Command_Line; use Ada.Command_Line;
with Ada.Calendar; use Ada.Calendar;
with Ada.Text_IO; use Ada.Text_IO;

--  Usage: bittest yes   (any other first argument selects the other case)
procedure Bittest is
   Byte_Acc : Unsigned_8;
   Ntests   : constant := 100;

   procedure Work (This_Way : Boolean) is  -- Boolean'Pos, no branch
      pragma Inline (Work);
   begin
      for K in 1 .. Ntests loop
         Byte_Acc := Shift_Left (Byte_Acc, 1)
           or
           Boolean'Pos (not This_Way);
      end loop;
   end Work;

   procedure Work2 (This_Way : Boolean) is  -- original
      pragma Inline (Work2);
   begin
      for K in 1 .. Ntests loop
         if This_Way then
            Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#00#;
         else
            Byte_Acc := Shift_Left (Byte_Acc, 1) or 16#01#;
         end if;
      end loop;
   end Work2;

   procedure WorkX (This_Way : Boolean) is  -- conditional expression (-gnatX)
      pragma Inline (WorkX);
   begin
      for K in 1 .. Ntests loop
         Byte_Acc := Shift_Left (Byte_Acc, 1) or (if This_Way
                                                   then 16#00#
                                                   else 16#01#);
      end loop;
   end WorkX;

   Start, Finish : Time;
begin
   Byte_Acc := Boolean'Pos (Argument_Count > 1);
   Start := Clock;
   for K in 1 .. 5_000_000 loop
      Work (Argument (1) = "yes");
   end loop;
   Finish := Clock;
   Put_Line ("Work: Byte_Acc = " & Unsigned_8'Image (Byte_Acc) &
             " in " & Duration'Image (Finish - Start) & " seconds");

   Byte_Acc := Boolean'Pos (Argument_Count > 1);
   Start := Clock;
   for K in 1 .. 5_000_000 loop
      Work2 (Argument (1) = "yes");
   end loop;
   Finish := Clock;
   Put_Line ("Work2: Byte_Acc = " & Unsigned_8'Image (Byte_Acc) &
             " in " & Duration'Image (Finish - Start) & " seconds");

   Byte_Acc := Boolean'Pos (Argument_Count > 1);
   Start := Clock;
   for K in 1 .. 5_000_000 loop
      WorkX (Argument (1) = "yes");
   end loop;
   Finish := Clock;
   Put_Line ("WorkX: Byte_Acc = " & Unsigned_8'Image (Byte_Acc) &
             " in " & Duration'Image (Finish - Start) & " seconds");
end Bittest;
Thread overview: 13+ messages
2009-09-27 1:08 Tasking for Mandelbrot program Georg Bauhaus
2009-09-27 11:24 ` Martin
2009-09-27 21:27 ` Georg Bauhaus
2009-09-28 5:48 ` Martin
2009-09-28 19:27 ` jonathan
2009-09-29 15:26 ` Georg Bauhaus
2009-09-28 19:52 ` jonathan
2009-10-12 16:58 ` Georg Bauhaus
2009-10-12 22:46 ` jonathan
2009-10-12 23:42 ` Anh Vo
2009-10-13 9:11 ` Mark Lorenzen
2009-10-13 9:39 ` Gautier write-only
2009-10-13 12:57 ` Georg Bauhaus