From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4
X-Google-Thread: 103376,7767a311e01e1cd
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII-7-bit
Path: g2news2.google.com!news3.google.com!newsfeed2.dallas1.level3.net!news.level3.com!newsfeed-00.mathworks.com!newscon02.news.prodigy.net!prodigy.net!wns14feed!worldnet.att.net!attbi_s22.POSTED!53ab2750!not-for-mail
From: "Jeffrey R. Carter"
Organization: jrcarter at acm dot org
User-Agent: Thunderbird 1.5.0.7 (Windows/20060909)
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: GNAT compiler switches and optimization
References: <1161341264.471057.252750@h48g2000cwc.googlegroups.com>
 <9Qb_g.111857$aJ.65708@attbi_s21>
 <434o04-7g7.ln1@newserver.thecreems.com>
 <4539ce34$1_2@news.bluewin.ch>
 <453A532F.2070709@obry.net>
 <9kfq04-sgm.ln1@newserver.thecreems.com>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID:
NNTP-Posting-Host: 12.201.97.213
X-Complaints-To: abuse@mchsi.com
X-Trace: attbi_s22 1161502766 12.201.97.213 (Sun, 22 Oct 2006 07:39:26 GMT)
NNTP-Posting-Date: Sun, 22 Oct 2006 07:39:26 GMT
Date: Sun, 22 Oct 2006 07:39:27 GMT
Xref: g2news2.google.com comp.lang.ada:7125
Date: 2006-10-22T07:39:27+00:00
List-Id:

Jeffrey Creem wrote:
>
> Actually, as a result of this, I submitted a bug report to the GCC
> bugzilla list. You can follow progress on it here:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29543
>
> Interesting initial feedback is that
> 1) Not an Ada bug.
> 2) Is a FORTRAN bug
> 3) Is a backend limitation of the optimizer.
>
> Of course, the FORTRAN one still runs correctly so I don't think most
> users will care that it is because of a bug :)

Interesting.
I've been experimenting with some variations simply out of curiosity
and found some things that seem a bit strange. (All results are for an
argument of 800.)

Adding the Sum variable makes an important difference, as others have
reported, in my case from 5.82 to 4.38 s. Hoisting the indexing
calculation for the result (C) matrix location is a basic optimization,
and I would be surprised if it isn't done. The only thing I can think of
is that it's a cache issue: that all 3 matrices can't be kept in cache
at once. Perhaps compiler writers would be able to make sense of this.

Previously, I found no difference between -O2 and -O3. With this
change, -O2 is faster.

The issue of using 'range compared to using "1 .. N" makes no
difference in my version of the program.

Something I found really surprising is that putting the multiplication
in a procedure makes the program faster, down to 4.03 s. I have no idea
why this would be so.

Compiled with MinGW GNAT 3.4.2, -O2, -gnatnp -fomit-frame-pointer. Run
under Windows XP SP2 on a 3.2 GHz Pentium 4 HT with 1 GB RAM.

Here's the code:

with Ada.Numerics.Float_Random;
with Ada.Command_Line; use Ada.Command_Line;
with Ada.Text_IO;      use Ada.Text_IO;
with Ada.Calendar;     use Ada.Calendar;

procedure Tst_Array is
   package F_IO is new Ada.Text_IO.Float_IO (Float);
   package D_IO is new Ada.Text_IO.Fixed_IO (Duration);

   N : constant Positive := Integer'Value (Argument (1) );

   type Real_Matrix is array (1 .. N, 1 .. N) of Float;
   pragma Convention (FORTRAN, Real_Matrix);

   G : Ada.Numerics.Float_Random.Generator;

   A, B : Real_Matrix :=
      (others => (others => Ada.Numerics.Float_Random.Random (G) ) );
   C : Real_Matrix := (others => (others => 0.0) );
   Start, Finish : Ada.Calendar.Time;

   procedure Multiply is
      Sum : Float; -- Accumulator, so C is written once per element
   begin -- Multiply
      All_Rows : for Row in A'range (1) loop
         All_Columns : for Column in B'range (2) loop
            Sum := 0.0;

            All_Common : for R in A'range (2) loop
               Sum := Sum + A (Row, R) * B (R, Column);
            end loop All_Common;

            C (Row, Column) := Sum;
         end loop All_Columns;
      end loop All_Rows;
   end Multiply;
begin
   Start := Ada.Calendar.Clock;
   Multiply;
   Finish := Ada.Calendar.Clock;

   F_IO.Put (C (1, 1) ); F_IO.Put (C (1, 2) ); New_Line;
   F_IO.Put (C (2, 1) ); F_IO.Put (C (2, 2) ); New_Line;

   Put ("Time: "); D_IO.Put (Finish - Start); New_Line;
end Tst_Array;

Next, since some meaningful speed-ups of quick sort on a Pentium 4 HT
processor from using 2 tasks have been reported, I thought I'd see what
effect that had here. With 2 tasks, I got a time of 3.70 s. That's not a
significant speed-up, about 9.1%. Same compilation options and platform.

Here's that code:

with Ada.Numerics.Float_Random;
with Ada.Command_Line; use Ada.Command_Line;
with Ada.Text_IO;      use Ada.Text_IO;
with Ada.Calendar;     use Ada.Calendar;

procedure Tst_Array is
   package F_IO is new Ada.Text_IO.Float_IO (Float);
   package D_IO is new Ada.Text_IO.Fixed_IO (Duration);

   N : constant Positive := Integer'Value (Argument (1) );

   type Real_Matrix is array (1 .. N, 1 .. N) of Float;
   pragma Convention (FORTRAN, Real_Matrix);

   G : Ada.Numerics.Float_Random.Generator;

   A, B : Real_Matrix :=
      (others => (others => Ada.Numerics.Float_Random.Random (G) ) );
   C : Real_Matrix := (others => (others => 0.0) );
   Start, Finish : Ada.Calendar.Time;

   procedure Multiply is
      procedure Multiply (Start_Row : in Positive; Stop_Row : in Positive) is
         Sum : Float;
      begin -- Multiply
         All_Rows : for Row in Start_Row .. Stop_Row loop
            All_Columns : for Column in B'range (2) loop
               Sum := 0.0;

               All_Common : for R in A'range (2) loop
                  Sum := Sum + A (Row, R) * B (R, Column);
               end loop All_Common;

               C (Row, Column) := Sum;
            end loop All_Columns;
         end loop All_Rows;
      end Multiply;

      task type Multiplier (Start_Row : Positive; Stop_Row : Positive);

      task body Multiplier is
         -- null;
      begin -- Multiplier
         Multiply (Start_Row => Start_Row, Stop_Row => Stop_Row);
      end Multiplier;

      -- These local names hide the outer Start within Multiply.
      Stop  : constant Positive := N / 2;
      Start : constant Positive := Stop + 1;

      -- Mult handles rows 1 .. Stop; the body below handles the rest.
      Mult : Multiplier (Start_Row => 1, Stop_Row => Stop);
   begin -- Multiply
      Multiply (Start_Row => Start, Stop_Row => N);
      -- Multiply does not return until Mult has terminated.
   end Multiply;
begin
   Start := Ada.Calendar.Clock;
   Multiply;
   Finish := Ada.Calendar.Clock;

   F_IO.Put (C (1, 1) ); F_IO.Put (C (1, 2) ); New_Line;
   F_IO.Put (C (2, 1) ); F_IO.Put (C (2, 2) ); New_Line;

   Put ("Time: "); D_IO.Put (Finish - Start); New_Line;
end Tst_Array;

If I inline the inner Multiply, or put equivalent code in the task and
the outer Multiply, the time is much more than for the sequential
version, presumably due to cache effects.

Since it appears you have 2 physical processors ("Dual Xeon 2.8 Ghz"), I
would be interested in seeing what effect this concurrent version has on
that platform. I also wonder how easy such a version would be to create
in FORTRAN.

--
Jeff Carter
"Ada has made you lazy and careless. You can write programs in C
that are just as safe by the simple application of super-human
diligence."
E. Robert Tisdale
72