From: jonathan
Newsgroups: comp.lang.ada
Subject: Re: compiler settings in AdaGIDE
Date: Sat, 24 Jul 2010 11:21:45 -0700 (PDT)

On Jul 23, 10:52 am, Ada novice wrote:
> Hi,
>     I'm using the AdaGIDE editor (version 7.45.2) together with the
> GNAT AdaCore libre compiler (release 2010) on a Win XP machine. I
> would like my codes to run as fast as possible and here is the content
> of a typical gnat.ago file that I use. Please let me know what
> improvements I can make in order for an Ada program to run in the
> minimum amount of time.
>
> Thanks.
> YC

I wrote a quick benchmark using our favorite complex number
eigen-routine, so now we can test some of these things rather quickly.
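If you want to reproduce this, the driver is roughly the following shape (a sketch only: the spec of Complex_Eigenvalues and its array types are assumed here, not shown; the real packages are in the benchmark directory mentioned below):

```ada
--  bench.adb -- sketch of the benchmark driver.  Complex_Eigenvalues
--  and its Complex_Matrix / Complex_Vector types are assumed; fill P
--  with any fixed test matrix before the loop.

with Complex_Eigenvalues;

procedure Bench is
   use Complex_Eigenvalues;
   N    : constant := 121;
   P, V : Complex_Matrix (1 .. N, 1 .. N);
   W    : Complex_Vector (1 .. N);
   Fail : Boolean;
begin
   --  ... initialize P here ...
   for Trial in 1 .. 100 loop
      Eigen (P, W, V, Fail);
   end loop;
end Bench;
```

Then time the whole run from the shell with `time ./bench`, as below.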
The benchmark does 100 calls to

   Complex_Eigenvalues.Eigen (P, W, V, FAIL);

using 121 x 121 complex matrices. I compiled it with both the 2009
and 2010 GNAT GPL compilers and timed it on a fast Intel PC (Xeon
X5460 @ 3.16GHz) with:

   time ./bench

Let's look at the best results first, and compare the '09 and '10
GNAT GPL compilers:

2010 GNAT GPL:
   gnatmake bench.adb -gnatnp -O2 -march=native -ffast-math -funroll-loops
   3.61 seconds

2009 GNAT GPL:
   gnatmake bench.adb -gnatnp -O2 -march=native -ffast-math -funroll-loops
   4.57 seconds

This came as a nice surprise to me!

We can learn some more about the compiler switches by toggling them.
I'll stick to the 2010 GNAT GPL from now on:

change -O2 to -O3:
   (gnatmake bench.adb -gnatnp -O3 -march=native -ffast-math -funroll-loops)
   running time changes from 3.61 to 3.63 seconds

remove -funroll-loops:
   (gnatmake bench.adb -gnatnp -O2 -march=native -ffast-math)
   running time changes from 3.61 to 3.66 seconds

remove -ffast-math:
   (gnatmake bench.adb -gnatnp -O2 -march=native)
   running time changes from 3.61 to 4.35 seconds

The -ffast-math had an amazing effect. I've never seen that before ...
maybe an interaction with complex number arithmetic.

Now let's check -gnato:

add -gnato:
   (gnatmake bench.adb -gnatnp -O2 -march=native -ffast-math -funroll-loops -gnato)
   running time changes from 3.61 to 3.64 seconds

(Good news again.)

I suspect that -mfpmath=sse is the default on Intel. If you are on an
Intel processor, then you have the option of running on the 387 stack
by using -mfpmath=387. The compiler even tries to use both sse and 387
if you use -mfpmath=387,sse, or so the man pages say. (-mfpmath=387,sse
isn't very good yet ... usually slower.)

Finally, I always use the more portable:

   type Real is digits 15;

It will give you standard double precision without much loss in speed
as far as I can tell.
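In practice that means declaring the working precision once and instantiating the generics against it, so changing precision later is a one-line edit. A minimal sketch (Generic_Elementary_Functions is the standard Ada generic; everything else here is just illustration):

```ada
--  One declaration controls the precision of the whole program;
--  every generic package is instantiated with Real.

with Ada.Numerics.Generic_Elementary_Functions;

procedure Precision_Demo is

   type Real is digits 15;   -- standard double precision on most machines

   package Real_Functions is
     new Ada.Numerics.Generic_Elementary_Functions (Real);
   use Real_Functions;

   X : constant Real := Sqrt (2.0);

begin
   null;  -- real work goes here
end Precision_Demo;
```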
To get single precision:

   type Real is digits 6;

It's not much faster than the 15 digit version in the present
benchmark. My experience is that it is almost always a bad idea to do
much arithmetic (especially something like eigenvalue calculations) in
single precision ... maybe at best useful for minimizing data storage
size if the data is known to be inaccurate.

I wasn't planning to do any more benchmarking than this, but the 2009
vs 2010 results were so surprising and so welcome that I took the time
to do one more test. You can find the code in the following public
directory:

   http://web.am.qub.ac.uk/users/j.parker/bench_depository/

You might also want to look at some other math routines I keep at:

   http://web.am.qub.ac.uk/users/j.parker/miscellany/

On this set of floating point routines, by the way, here's the best I
could come up with for maximum speed:

   gnatmake xxx.adb -gnatnp -O2 -march=native -ffast-math -funroll-loops

I've never found anything much better ... I always start with this set
of switches (-gnatnp -O2 -march=native -ffast-math -funroll-loops) and
turn other switches on and off to see if they help.

In the benchmark directory you can find some programs I've either
collected or written over the years. I'll compare Ada and Fortran 90
versions of a jacobi iterative eigen-decomposition for symmetric real
matrices. The Ada version has departed slightly from the fortran
version over the years, but the two are still very close ... they do
almost the same amount of computation in the same way.
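For reference, the heart of both versions is the usual Jacobi rotation; the inner loops look roughly like this (a sketch with assumed names, not the actual packages -- those are in the benchmark directory):

```ada
--  Sketch of one Jacobi rotation: rows P and Q of the symmetric
--  matrix A are combined using a precomputed cosine C and sine S.
--  Matrix and Real are assumed types (2-d array of Real, digits 15).

procedure Rotate_Rows
  (A    : in out Matrix;
   P, Q : in     Integer;
   C, S : in     Real)
is
   Tmp : Real;
begin
   for J in A'Range (2) loop
      Tmp      := C * A (P, J) + S * A (Q, J);
      A (Q, J) := C * A (Q, J) - S * A (P, J);
      A (P, J) := Tmp;
   end loop;
end Rotate_Rows;
```

These tight dependent loops are exactly the kind of thing that gives optimizers like ifort room to rearrange the arithmetic.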
Mostly I just want to compare the 2009 and the 2010 GNAT GPL though -
here is a test on a 500 x 500 Moler's matrix:

2010 GNAT GPL:
   gnatmake jacobi_eigen_bench_1 -gnatnp -O2 -march=native -ffast-math -funroll-loops
   2.036 seconds

2009 GNAT GPL:
   gnatmake jacobi_eigen_bench_1 -gnatnp -O2 -march=native -ffast-math -funroll-loops
   2.676 seconds

Again, I found this so surprising that I copied over the test routine
jacobi_eigen_tst_1.adb from another directory, just to make sure that
the jacobi packages were actually working. You may want to verify
this. You'll need packages matrix_sampler_3.ads and
matrix_sampler_3.adb. For testing I start with:

   gnatmake jacobi_eigen_tst_1.adb -gnato -gnatVa -gnata

To make the Fortran comparison I used (1) jacobi_eigen.f90, (2) a
recent version (11.0) of the Intel Fortran compiler, and (3) a not
very recent version of gfortran (based on gcc 4.3.2):

   ifort -fast jacobi_eigen.f90
   gfortran jacobi_eigen.f90 -O3 -march=native -funroll-loops -ffast-math

Results on a 500 x 500 matrix:

   2.038 seconds (2010 GNAT GPL)
   2.012 seconds (gfortran (based on gcc 4.3.2))
   1.824 seconds (ifort -fast, version 11.0)

Actually, Intel's ifort is usually 20% or more faster than the gcc
compilers. It can do miraculous things with nested loops, but in this
case I suspect that it may be doing some algebra on the Jacobi inner
loops and then factoring out some constants (you get gcc speed if you
suppress that kind of thing with the ifort -fp-model strict switch).

J.