From: claude.simon@equipement.gouv.fr
Newsgroups: comp.lang.ada
Subject: Re: GNAT compiler switches and optimization
Date: 25 Oct 2006 08:32:53 -0700

A way to have better performance is to
optimize the use of the memory cache. Below is a matmul procedure written in Ada that comes close to the speed of the Fortran intrinsic matmul, compiled with -O2 and -gnatp:

  Ada version,     array size 600: time = 0.6722
  Fortran version, array size 600: time = 0.6094

package Array_Product is

   type Real_Matrix is array (Integer range <>, Integer range <>) of Float;
   type Matrix_Access is access Real_Matrix;

   --  C := A * B.  Expects A'Length (2) = B'Length (1), and C indexed
   --  by A'Range (1) and B'Range (2).
   procedure Matmul (A, B : in Real_Matrix; C : out Real_Matrix);

end Array_Product;

package body Array_Product is

   procedure Matmul (A, B : in Real_Matrix; C : out Real_Matrix) is
      --  B transposed: column J of B is stored as one contiguous slice
      BL  : array (1 .. B'Length (1) * B'Length (2)) of Float;
      D   : Natural := 0;
      Sum : Float;
   begin
      --  Transposing B gave the best speedup: better cache use, and the
      --  inner loop needs only one index
      Transpose_B_In_BL :
      for J in B'Range (2) loop
         for R in B'Range (1) loop
            BL (D + R - B'First (1) + 1) := B (R, J);
         end loop;
         D := D + B'Length (1);
      end loop Transpose_B_In_BL;

      for I in A'Range (1) loop
         declare
            --  Copying row I of A into a local array gives the cache its
            --  best chance (the second speedup)
            Ai : array (1 .. A'Length (2)) of Float;
         begin
            for R in Ai'Range loop
               Ai (R) := A (I, A'First (2) + R - 1);
            end loop;
            D := 0;
            for J in B'Range (2) loop
               Sum := 0.0;
               for R in Ai'Range loop
                  Sum := Sum + Ai (R) * BL (D + R);
               end loop;
               D := D + B'Length (1);
               C (I, J) := Sum;
            end loop;
         end;
      end loop;
   end Matmul;

end Array_Product;

Jeffrey Creem wrote:
> Jeffrey R. Carter wrote:
> > Jeffrey Creem wrote:
> >
> >>
> >> 2) Small changes to the original code that still meet the original
> >> structure and intent of the original code can move the run time up and
> >> down by at least 50%
> >> 3) The two task based version of the code is running more than 2x
> >> faster than the single task version on a 2 processor machine. Some of
> >> this is from the two tasks but looking at the assembly, another
> >> portion of it is related to #2 above in that the re-arrangement of
> >> the math allows the compiler to get less brain dead.
> >
> >
> > These seem quite odd to me.
> > Perhaps whatever is causing this is also the
> > cause of the speedup I saw in the sequential case when the
> > multiplication is put in a procedure.
> >
>
> Ok, we are all probably tired of talking about this topic but I posted
> what I think will be the final write-up on this (until the bugzilla
> issue for it is resolved).
>
> http://gnuada.sourceforge.net/pmwiki.php/Main/Oct2006CLAGFORTRANComparison
>
> And I agree that the >2x speedup for the two task version is not real.
> There are now other versions that are coded in a similar manner to the 2
> task version but which are single threaded. The 2 task version runs
> almost exactly 2x faster (sometimes slightly slower) on my dual
> processor machine.
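To isolate the effect of the transposition used in Array_Product.Matmul, here is a minimal, self-contained sketch that times a naive triple loop against a variant that first copies B into a transposed buffer, so the inner loop reads both operands along a row (GNAT stores Ada arrays in row-major order). The procedure name Cache_Demo, the size N = 300 and the test data are hypothetical choices for illustration only; timings will depend on the machine and on the switches used (for example -O2 -gnatp).

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Ada.Calendar; use Ada.Calendar;

--  Hypothetical demo: naive i-j-k product versus a product that first
--  transposes B so the inner loop walks both operands with unit stride.
procedure Cache_Demo is
   N : constant := 300;  --  assumed size, chosen arbitrarily

   type Matrix is array (1 .. N, 1 .. N) of Float;
   type Matrix_Ptr is access Matrix;  --  heap objects keep the stack small

   A  : constant Matrix_Ptr := new Matrix;
   B  : constant Matrix_Ptr := new Matrix;
   BT : constant Matrix_Ptr := new Matrix;  --  BT (J, K) = B (K, J)
   C1 : constant Matrix_Ptr := new Matrix;  --  naive result
   C2 : constant Matrix_Ptr := new Matrix;  --  transposed-B result

   Start    : Time;
   Sum      : Float;
   Max_Diff : Float := 0.0;
begin
   --  Deterministic, small-valued test data
   for I in 1 .. N loop
      for J in 1 .. N loop
         A (I, J) := Float ((I + J) mod 7) / 7.0;
         B (I, J) := Float ((I * J) mod 5) / 5.0;
      end loop;
   end loop;

   --  Naive version: B (K, J) walks down a column, a stride of N floats
   Start := Clock;
   for I in 1 .. N loop
      for J in 1 .. N loop
         Sum := 0.0;
         for K in 1 .. N loop
            Sum := Sum + A (I, K) * B (K, J);
         end loop;
         C1 (I, J) := Sum;
      end loop;
   end loop;
   Put_Line ("naive:      " & Duration'Image (Clock - Start) & " s");

   --  Transposed version: the copy costs O(N**2), the product O(N**3)
   Start := Clock;
   for K in 1 .. N loop
      for J in 1 .. N loop
         BT (J, K) := B (K, J);
      end loop;
   end loop;
   for I in 1 .. N loop
      for J in 1 .. N loop
         Sum := 0.0;
         for K in 1 .. N loop
            Sum := Sum + A (I, K) * BT (J, K);
         end loop;
         C2 (I, J) := Sum;
      end loop;
   end loop;
   Put_Line ("transposed: " & Duration'Image (Clock - Start) & " s");

   --  Both versions add the same terms in the same order; allow a small
   --  tolerance in case intermediate sums are kept at different precision
   for I in 1 .. N loop
      for J in 1 .. N loop
         Max_Diff := Float'Max (Max_Diff, abs (C1 (I, J) - C2 (I, J)));
      end loop;
   end loop;
   if Max_Diff > 1.0E-2 then
      raise Program_Error with "results differ";
   end if;
   Put_Line ("results agree");
end Cache_Demo;
```

The transposed copy is paid once, while the inner loop runs N**3 times, which is why trading extra memory for unit-stride access tends to pay off at sizes like the 600 used above.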