From: "Jeffrey R. Carter" <spam.not.jrcarter@acm.not.spam.org>
Subject: Re: GNAT compiler switches and optimization
Date: Sun, 22 Oct 2006 07:39:27 GMT
Message-ID: <OeF_g.1031657$084.91539@attbi_s22>
In-Reply-To: <sj3r04-rlv.ln1@newserver.thecreems.com>
Jeffrey Creem wrote:
>
> Actually, as a result of this, I submitted a bug report to the GCC
> bugzilla list. You can follow progress on it here:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29543
>
> Interesting initial feedback is that
> 1) Not an Ada bug.
> 2) Is a FORTRAN bug
> 3) Is a backend limitation of the optimizer.
>
> Of course, the FORTRAN one still runs correctly so I don't think most
> users will care that it is because of a bug :)

Interesting. I've been experimenting with some variations simply out of
curiosity and found some things that seem a bit strange. (All results
for an argument of 800.)

Adding the Sum variable makes an important difference, as others have
reported, in my case from 5.82 to 4.38 s. Hoisting the indexing
calculation for the result (C) matrix location is a basic optimization,
and I would be surprised if it isn't done. The only thing I can think of
is that it's a cache issue: that all 3 matrices can't be kept in cache
at once. Perhaps compiler writers would be able to make sense of this.
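
For readers who have not followed the whole thread, here is a minimal sketch of the two inner-loop forms being compared. It assumes the version without Sum accumulated directly into the result element, which is what the remark about hoisting the index calculation suggests. The names Sum_Demo, C_Direct and C_Summed are made up for illustration, and the 3 x 3 size is only there to keep the check quick; it is not a benchmark.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Sum_Demo is
   N : constant := 3;  -- small illustrative size, not the 800 used for timing
   type Real_Matrix is array (1 .. N, 1 .. N) of Float;

   A : constant Real_Matrix := (others => (others => 1.0));
   B : constant Real_Matrix := (others => (others => 2.0));

   C_Direct : Real_Matrix := (others => (others => 0.0));
   C_Summed : Real_Matrix := (others => (others => 0.0));

   Sum : Float;
begin
   --  Form 1 (assumed original): accumulate straight into the result
   --  element, so C_Direct (Row, Column) is indexed on every iteration
   --  of the innermost loop unless the compiler hoists it.
   for Row in A'Range (1) loop
      for Column in B'Range (2) loop
         for R in A'Range (2) loop
            C_Direct (Row, Column) :=
              C_Direct (Row, Column) + A (Row, R) * B (R, Column);
         end loop;
      end loop;
   end loop;

   --  Form 2: accumulate in a local scalar and store once per element.
   for Row in A'Range (1) loop
      for Column in B'Range (2) loop
         Sum := 0.0;
         for R in A'Range (2) loop
            Sum := Sum + A (Row, R) * B (R, Column);
         end loop;
         C_Summed (Row, Column) := Sum;
      end loop;
   end loop;

   --  Both forms compute the same product; only the store pattern differs.
   Put_Line (Boolean'Image (C_Direct = C_Summed));
end Sum_Demo;
```

With these exactly representable inputs the two results compare equal, so any timing difference between the forms comes from how the stores to the result matrix are generated, not from the arithmetic.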

Previously, I found no difference between -O2 and -O3. With this change,
-O2 is faster.

The issue of using 'range compared to using "1 .. N" makes no difference
in my version of the program.

Something I found really surprising is that putting the multiplication
in a procedure makes the program faster, down to 4.03 s. I have no idea
why this would be so.

Compiled with MinGW GNAT 3.4.2, -O2, -gnatnp -fomit-frame-pointer. Run
under Windows XP SP2 on a 3.2 GHz Pentium 4 HT with 1 GB RAM.

Here's the code:

with Ada.Numerics.Float_Random;
with Ada.Command_Line; use Ada.Command_Line;
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Calendar; use Ada.Calendar;

procedure Tst_Array is
   package F_IO is new Ada.Text_IO.Float_IO (Float);
   package D_IO is new Ada.Text_Io.Fixed_Io (Duration);

   N : constant Positive := Integer'Value (Argument (1) );

   type Real_Matrix is array (1 .. N, 1 .. N) of Float;
   pragma Convention (FORTRAN, Real_Matrix);

   G : Ada.Numerics.Float_Random.Generator;

   A, B : Real_Matrix :=
      (others => (others => Ada.Numerics.Float_Random.Random (G) ) );
   C : Real_Matrix := (others => (others => 0.0) );
   Start, Finish : Ada.Calendar.Time;

   procedure Multiply is
      Sum : Float;
   begin -- Multiply
      All_Rows : for Row in A'range (1) loop
         All_Columns : for Column in B'range (2) loop
            Sum := 0.0;

            All_Common : for R in A'range (2) loop
               Sum := Sum + A (Row, R) * B (R, Column);
            end loop All_Common;

            C (Row, Column) := Sum;
         end loop All_Columns;
      end loop All_Rows;
   end Multiply;
begin
   Start := Ada.Calendar.Clock;
   Multiply;
   Finish := Ada.Calendar.Clock;

   F_IO.Put (C (1, 1) );
   F_IO.Put (C (1, 2) );
   New_Line;
   F_IO.Put (C (2, 1) );
   F_IO.Put (C (2, 2) );
   New_Line;

   Put ("Time: ");
   D_IO.Put (Finish - Start);
   New_Line;
end Tst_Array;

Next, since some meaningful speed-ups of quick sort on a Pentium 4 HT
processor from using 2 tasks have been reported, I thought I'd see what
effect that had here. With 2 tasks, I got a time of 3.70 s. That's not a
significant speed-up, about 9.1%.

Same compilation options and platform.

Here's that code:

with Ada.Numerics.Float_Random;
with Ada.Command_Line; use Ada.Command_Line;
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Calendar; use Ada.Calendar;

procedure Tst_Array is
   package F_IO is new Ada.Text_IO.Float_IO (Float);
   package D_IO is new Ada.Text_Io.Fixed_Io (Duration);

   N : constant Positive := Integer'Value (Argument (1) );

   type Real_Matrix is array (1 .. N, 1 .. N) of Float;
   pragma Convention (FORTRAN, Real_Matrix);

   G : Ada.Numerics.Float_Random.Generator;

   A, B : Real_Matrix :=
      (others => (others => Ada.Numerics.Float_Random.Random (G) ) );
   C : Real_Matrix := (others => (others => 0.0) );
   Start, Finish : Ada.Calendar.Time;

   procedure Multiply is
      procedure Multiply
         (Start_Row : in Positive; Stop_Row : in Positive)
      is
         Sum : Float;
      begin -- Multiply
         All_Rows : for Row in Start_Row .. Stop_Row loop
            All_Columns : for Column in B'range (2) loop
               Sum := 0.0;

               All_Common : for R in A'range (2) loop
                  Sum := Sum + A (Row, R) * B (R, Column);
               end loop All_Common;

               C (Row, Column) := Sum;
            end loop All_Columns;
         end loop All_Rows;
      end Multiply;

      task type Multiplier (Start_Row : Positive; Stop_Row : Positive);

      task body Multiplier is
         -- null;
      begin -- Multiplier
         Multiply (Start_Row => Start_Row, Stop_Row => Stop_Row);
      end Multiplier;

      Stop : constant Positive := N / 2;
      Start : constant Positive := Stop + 1;

      Mult : Multiplier (Start_Row => 1, Stop_Row => Stop);
   begin -- Multiply
      Multiply (Start_Row => Start, Stop_Row => N);
   end Multiply;
begin
   Start := Ada.Calendar.Clock;
   Multiply;
   Finish := Ada.Calendar.Clock;

   F_IO.Put (C (1, 1) );
   F_IO.Put (C (1, 2) );
   New_Line;
   F_IO.Put (C (2, 1) );
   F_IO.Put (C (2, 2) );
   New_Line;

   Put ("Time: ");
   D_IO.Put (Finish - Start);
   New_Line;
end Tst_Array;

If I inline the inner Multiply, or put equivalent code in the task and
the outer Multiply, the time is much more than for the sequential
version, presumably due to cache effects.
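
To make "equivalent code in the task and the outer Multiply" concrete, here is a rough sketch of what that variant might look like: the loop nest is written out twice, once in the task body and once in the enclosing procedure, instead of both calling a shared inner procedure. This is a reconstruction under assumptions, not the code that was timed. The names Inline_Demo and Upper_Half are hypothetical, a single task is used rather than a task type, and the fixed 4 x 4 size and constant inputs are only for illustration.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Inline_Demo is
   N : constant := 4;  -- illustrative size only
   type Real_Matrix is array (1 .. N, 1 .. N) of Float;

   A : constant Real_Matrix := (others => (others => 1.0));
   B : constant Real_Matrix := (others => (others => 2.0));
   C : Real_Matrix := (others => (others => 0.0));

   procedure Multiply is
      Stop  : constant Positive := N / 2;
      Start : constant Positive := Stop + 1;
      Sum   : Float;  -- accumulator for the rows handled by the caller

      task Upper_Half;  -- hypothetical name; handles rows 1 .. Stop

      task body Upper_Half is
         Sum : Float;  -- the task's own accumulator
      begin
         --  Loop nest duplicated here instead of calling a shared procedure.
         for Row in 1 .. Stop loop
            for Column in B'Range (2) loop
               Sum := 0.0;
               for R in A'Range (2) loop
                  Sum := Sum + A (Row, R) * B (R, Column);
               end loop;
               C (Row, Column) := Sum;
            end loop;
         end loop;
      end Upper_Half;
   begin
      --  Same loop nest again for rows Start .. N, run by the caller
      --  while Upper_Half works on the other rows of C.
      for Row in Start .. N loop
         for Column in B'Range (2) loop
            Sum := 0.0;
            for R in A'Range (2) loop
               Sum := Sum + A (Row, R) * B (R, Column);
            end loop;
            C (Row, Column) := Sum;
         end loop;
      end loop;
   end Multiply;  -- does not return until Upper_Half has terminated
begin
   Multiply;
   Put_Line (Float'Image (C (1, 1)) & Float'Image (C (N, N)));
end Inline_Demo;
```

The two halves write disjoint rows of C, so no synchronization is needed beyond the implicit wait for the task when Multiply completes. Whether this form is slower than the shared-procedure form will depend on the compiler version, switches and hardware.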
Since it appears you have 2 physical processors ("Dual Xeon 2.8 Ghz"), I
would be interested in seeing what effect this concurrent version has on
that platform. I also wonder how easy such a version would be to create
in FORTRAN.
--
Jeff Carter
"Ada has made you lazy and careless. You can write programs in C that
are just as safe by the simple application of super-human diligence."
E. Robert Tisdale