From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Thread: 103376,7767a311e01e1cd
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII
Path: 
 g2news2.google.com!postnews.google.com!i3g2000cwc.googlegroups.com!not-for-mail
From: claude.simon@equipement.gouv.fr
Newsgroups: comp.lang.ada
Subject: Re: GNAT compiler switches and optimization
Date: 25 Oct 2006 08:32:53 -0700
Organization: http://groups.google.com
Message-ID: <1161790373.277521.105970@i3g2000cwc.googlegroups.com>
References: <1161341264.471057.252750@h48g2000cwc.googlegroups.com>
   <9Qb_g.111857$aJ.65708@attbi_s21>
   <434o04-7g7.ln1@newserver.thecreems.com>
   <4539ce34$1_2@news.bluewin.ch>
   <nrup04-5hj.ln1@newserver.thecreems.com>
   <453A532F.2070709@obry.net>
   <9kfq04-sgm.ln1@newserver.thecreems.com>
   <sj3r04-rlv.ln1@newserver.thecreems.com>
   <5vgs04-64f.ln1@newserver.thecreems.com>
   <453bc74e$0$19614$426a74cc@news.free.fr>
   <4jit04-0gq.ln1@newserver.thecreems.com>
   <jcot04-l2u.ln1@newserver.thecreems.com>
   <Ld__g.206207$FQ1.45478@attbi_s71>
   <c6nu04-m31.ln1@newserver.thecreems.com>
   <453d1d36$0$25551$bf4948fe@news.tele2.nl>
   <pan.2006.10.24.09.54.07.340442@linuxchip.demon.co.uk.uk.uk>
   <e8b114-7k1.ln1@newserver.thecreems.com>
   <Z6r%g.207607$FQ1.56563@attbi_s71>
   <2h3314-gak.ln1@newserver.thecreems.com>
NNTP-Posting-Host: 212.23.162.39
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Trace: posting.google.com 1161790386 31390 127.0.0.1 (25 Oct 2006 15:33:06
 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Wed, 25 Oct 2006 15:33:06 +0000 (UTC)
In-Reply-To: <2h3314-gak.ln1@newserver.thecreems.com>
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (Windows; U; Windows NT 5.1; fr;
 rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7,gzip(gfe),gzip(gfe)
X-HTTP-Via: 1.1 sesame.setra.fr:3128 (squid/2.5.STABLE3)
Complaints-To: groups-abuse@google.com
Injection-Info: i3g2000cwc.googlegroups.com; posting-host=212.23.162.39;
   posting-account=SnKwIw0AAABJwJI8idy3oqasZLmqdeAT
Xref: g2news2.google.com comp.lang.ada:7196
Date: 2006-10-25T08:32:53-07:00
List-Id: <comp.lang.ada>

A way to have better performance is to optimize the use of the memory
cache.

An Ada writen matmul procedure to tend to the fortran intrinsic matmul
speed with -O2 and -gnatp


ada     version with array size of 600 : time =3D 0.6722
fortran version with array size of 600 : time =3D 0.6094


package Array_Product is

   type Real_Matrix is array(Integer range <>,Integer range <>) of
Float;
   type Matrix_Access is access Real_Matrix;

   procedure Matmul (A, B : in Real_Matrix; C : out Real_Matrix);

end Array_Product;


package body Array_Product is

   procedure Matmul (A, B : in Real_Matrix; C : out Real_Matrix)
   is
      BL : array (1 .. B'Length (1) * B'Length (2)) of Float;
      D  : Natural :=3D 0;
      Sum : Float :=3D 0.0;
   begin
      D :=3D 0;
      -- transposition gave the best speedup cache + only one indice
     Transpose_B_in_BL :
      for J in B'Range (1) loop
         for R in B'Range (2) loop
            BL( D + R) :=3D B(R,J);
         end loop;
         D :=3D D + B'Length (2);
      end loop Transpose_B_in_BL;

      for I in A'Range (1) loop
         declare
            -- give the cache best chance with A second speedup
            Ai : array (A'range (2)) of Float;
         begin
            for R in Ai'Range loop
               Ai (R) :=3D A (I, R);
            end loop;
            D :=3D 0;
            for J in A'range(2) loop
               Sum :=3D 0.0;
               for R in A'Range (2) loop
                  Sum :=3D Sum + Ai(R)*BL(D + R);
               end loop;
               D :=3D D + A'Length (2);
               C(I,J) :=3D Sum;
            end loop;
         end;
      end loop;
   end Matmul;

end Array_Product;


Jeffrey Creem a =E9crit :

> Jeffrey R. Carter wrote:
> > Jeffrey Creem wrote:
> >
> >>
> >> 2) Small changes to the original code that still meet the original
> >> structure and intent of the original code can move the run time up and
> >> down by at least 50%
> >> 3) The two task based version of the code is running more than 2x
> >> faster than the single task version on a 2 processor machine. Some of
> >> this is from the two tasks but looking at the assembly, another
> >> portion of it is  related to #2 above in that the re-arrangement of
> >> the math allows the compiler to get less brain dead.
> >
> >
> > These seem quite odd to me. Perhaps whatever is causing this is also the
> > cause of the speedup I saw in the sequential case when the
> > multiplication is put in a procedure.
> >
>
> Ok, we are all probably tired of taking about this topic but I posted
> what I think will be the final write-up on this (until the bugzilla
> issue for it is resolved).
>
> http://gnuada.sourceforge.net/pmwiki.php/Main/Oct2006CLAGFORTRANComparison
>
> And I agree that the > 2x speedup for the two task version is not real.
> There are now other versions that are coded in a similar maner to the 2
> task version but which are single threaded. The 2 task version runs
> almost exactly 2x faster (sometimes slightly slower) on my dual
> processor machine.