From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=ham autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
Path: 
 g2news2.google.com!postnews.google.com!z5g2000vba.googlegroups.com!not-for-mail
From: johnscpg@googlemail.com
Newsgroups: comp.lang.ada
Subject: benchmarking GPL
Date: Mon, 1 Jun 2009 04:38:24 -0700 (PDT)
Organization: http://groups.google.com
Message-ID: <222bea32-77df-49a3-9df3-b848e63daa68@z5g2000vba.googlegroups.com>
NNTP-Posting-Host: 143.117.23.126
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Trace: posting.google.com 1243856305 15335 127.0.0.1 (1 Jun 2009 11:38:25
 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Mon, 1 Jun 2009 11:38:25 +0000 (UTC)
Complaints-To: groups-abuse@google.com
Injection-Info: z5g2000vba.googlegroups.com; posting-host=143.117.23.126;
	posting-account=Jzt5lQoAAAB4PhTgRLOPGuTLd_K1LY-C
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.9.0.9)
	Gecko/2009050519 Iceweasel/3.0.6 (Debian-3.0.6-1),gzip(gfe),gzip(gfe)
Xref: g2news2.google.com comp.lang.ada:6160
Date: 2009-06-01T04:38:24-07:00
List-Id: <comp.lang.ada>

Dear All,

In celebration of the arrival of the new GNAT GPL
(20090519) I decided to do some benchmarking
to see how the optimizer's coming along.
I was slightly more than rather pleased with
the results.

Results in gory detail are appended below.

I compared 6 (smallish) programs, written in
both Ada and Fortran. Four of them are in C
also. All of the routines can be found at:

http://web.am.qub.ac.uk/users/j.parker/bench_depository/

4 compilers are used:

   gcc 4.3.4,
   Intel Fortran ifort 11.0 (latest version),
   GNAT GPL (20090519), and
   gfortran (based on gcc version 4.3.2).

Operating system:  Debian Linux, Lenny.
Processor:  Intel x86-64 (Xeon X5460 @ 3.16GHz).

The Intel Fortran (ifort) is an aggressive
optimizing compiler, especially on numerical
linear algebra. Its use here is reassuring:
if our gcc-family results on numerical
calculations were suboptimal by a large factor,
ifort would likely let us know. It also
has the easiest optimization flags: it's a
simple choice between -O3, -fast, -ipo
and a few other things that make almost no
difference.

Two of the 6 programs I wrote
myself: an FFT benchmark called fft1tst.adb,
and a Jacobi eigendecomposition called
jacobi_eigen_bench_1.adb.  Both are
accompanied by near identical Fortran
versions. This exhausted my entire supply of
inter-language benchmarking routines, so 4
of the test programs I downloaded from a
depository of small benchmarking routines:
 http://shootout.alioth.debian.org/gp4/
I downloaded C, Fortran, Ada versions of:
nsievebits, nbody, binarytree, mandelbrot.
I made small modifications to 2 of the Ada
programs. In one case I degraded the Ada code
to a slower, older version so that it was
identical to the C version I was comparing
with. In the other case I replaced a packed
boolean array with an array of unsigned
ints. These were the only changes to any of
the programs, so the exercise mostly amounted
to finding good optimization flags. I tried
to find good optimization flags for the C and
Fortran compilations too, and managed to speed
up a few of them after several attempts. The
original Ada test programs were written by
Pat Rogers and Pascal Obry.  (Thanks!)

Inter-language benchmarks shouldn't be taken too
seriously, but I learned a few useful things:
the -mfpmath=387 and -mfpmath=387,sse flags came
as a real surprise to me. In several cases they
made all the difference. I also noticed that
GNAT seems to be doing a better job of
optimizing operations on packed boolean arrays,
and a better job on some linear algebra
problems. Unless I'm mistaken, in the lin alg case
(jacobi_eigen below) the improvement over the
old days is almost a factor of 2. Thanks_gnat!

cheers,
jonathan


FFT1TST2:

  Compilation Commands:

     gnatmake  fft1tst2.adb -O3 -gnatNp -march=native
     gfortran  fft1tst2.f   -O3 -march=native  -o fft1tst2
     ifort     fft1tst2.f   -O3  -WB -xT       -o fft1tst2

  Execute:

     time ./fft1tst2

  Running Time (using 8192 data points):

     gnat:     2.748 seconds
     gfortran: 2.812 seconds
     ifort:    2.816 seconds

  Running Time (using 4096 data points):

     gnat:     1.000 seconds
     gfortran: 1.004 seconds
     ifort:    1.088 seconds

  Running Time (using 1024 data points):

     gnat:     0.168 seconds
     gfortran: 0.184 seconds
     ifort:    0.176 seconds

  Notes:

     The Ada and the Fortran77 FFTs were written over 17
     years ago. Both are Radix 4 fast Fourier transforms.
     These versions were written for benchmarking rather
     than ultimate speed: the idea was to make the 2
     the same wherever possible, for language/compiler
     comparisons.

     If I use -ffast-math in the GNAT compilation, it runs
     at almost exactly the same speed as the gfortran
     version (very slightly slower), so I suspect
     -ffast-math is always used by gfortran.


JACOBI_EIGEN:

  Compilation Commands:

     gnatmake jacobi_eigen_bench_1.adb -O3 -gnatNp -march=native
                  -ffast-math -funroll-loops -o eig
     gfortran jacobi_eigen.f90 -O3 -march=native
                  -ffast-math -funroll-loops -o eig
     ifort    jacobi_eigen.f90 -O3 -ipo -static -o eig

  Execute:

     time ./eig

  Running Time (100x100 matrices (100 iterations)):

     gnat:     1.636 seconds
     gfortran: 1.720 seconds
     ifort:    1.440 seconds

  Running Time (1000x1000 matrices (1 iteration)):

     gnat:     23.7 seconds
     gfortran: 39.7 seconds
     ifort:    37.9 seconds

  Notes:

     Matrix size and number of iterations have to be
     typed in at the top of the 2 routines:
       jacobi_eigen.f90, jacobi_eigen_bench_1.adb.

     A year ago the GNAT executables for Jacobi
     were much slower than those from gfortran and ifort.

     Don't know what happened in the 1000x1000
     case, but I am not displeased.
     Notice we are using the same compiler
     flags in the gfortran and GNAT cases.

     The number of arithmetical operations performed
     by these routines is exactly proportional to
        No_of_Rotations,
     which is output on completion. The difference
     between the Fortran No_of_Rotations and the Ada
     No_of_Rotations is under 3% here.


NBODY:

  Compilation Commands:

    gnatmake nbody.adb -O3 -gnatNp -march=native -ffast-math
                   -funroll-loops -ftracer
                   -freorder-blocks-and-partition -mfpmath=387,sse
    gfortran nbody.f90  -O3 -march=native -funroll-all-loops -o nbody
    ifort    nbody.f90  -O3 -no-prec-div -o nbody
    gcc      nbody.c    -O3 -o nbody

  Execute:

    time ./nbody 24000000

  Running Time:

    gnat:     4.908 seconds
    gfortran: 5.602 seconds
    ifort:    4.660 seconds
    gcc:      4.472 seconds

  Notes:

    The obscure compilation flags
     (-ftracer -freorder-blocks-and-partition)
    are not needed if you write the inner loop of
    nbody_pck.Advance a bit differently.  I just
    wanted to use the original version of nbody.adb.
    nbody2.adb is more like the C version and has
    simpler optimization flags, but runs at the same
    speed as nbody.adb.
    The gcc C is about 9% faster than nbody.adb -
    a small but interesting difference I don't
    understand.


NSIEVEBITS:

  Compilation Commands:

    gnatmake nsievebits2.adb -O3 -gnatnp -march=native -funroll-loops
                                 -ftracer
    gfortran nsievebits.f90  -O3 -march=native -funroll-loops
                                 -o nsievebits2
    ifort    nsievebits.f90  -O3 -ipo -static -o nsievebits2
    gcc      nsievebits.c    -O3 -march=native -funroll-loops
                                 -o nsievebits2

  Execute:

    time ./nsievebits2 11

  Running Time:

    gnat:     0.320 seconds
    gfortran: 0.388 seconds
    ifort:    0.372 seconds
    gcc:      0.364 seconds

  Notes:

    nsievebits2.adb uses an array of unsigned
    ints to replace the packed boolean array in
    nsievebits.adb. Both methods could not be
    more legitimate in this exercise. GNAT has
    improved remarkably: the packed boolean array
    version (nsievebits.adb) is now competitive
    with the other languages.
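
    For readers unfamiliar with the technique, the
    following is a minimal sketch in C of a sieve whose
    flags are packed one per bit into an array of unsigned
    ints. It is not the shootout code or nsievebits2.adb:
    the function name count_primes_bits and the WORD_BITS
    macro are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Number of flag bits held in one unsigned int (32 on typical platforms). */
#define WORD_BITS (sizeof(unsigned int) * CHAR_BIT)

/* Count the primes in 2..n with a sieve of Eratosthenes whose flags
   are stored one per bit in an array of unsigned ints.
   Returns -1 if the flag array cannot be allocated. */
long count_primes_bits(size_t n)
{
    assert(n >= 2);

    size_t words = n / WORD_BITS + 1;            /* enough bits for index n */
    unsigned int *flags = malloc(words * sizeof *flags);
    if (flags == NULL)
        return -1;
    memset(flags, 0xFF, words * sizeof *flags);  /* every bit set: "maybe prime" */

    long count = 0;
    for (size_t i = 2; i <= n; i++) {
        /* Test bit i: word i / WORD_BITS, bit position i % WORD_BITS. */
        if (flags[i / WORD_BITS] & (1u << (i % WORD_BITS))) {
            count++;
            /* Clear the bit of every multiple of i. */
            for (size_t j = i + i; j <= n; j += i)
                flags[j / WORD_BITS] &= ~(1u << (j % WORD_BITS));
        }
    }

    free(flags);
    return count;
}
```

    The shift-and-mask indexing here is done by hand, which
    is what a compiler has to generate behind the scenes
    for a packed boolean array.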


MANDELBROT:

  Compilation Commands:

    gnatmake  mandelbrot.adb -O3 -gnatnp -march=native -ffast-math
                                 -funroll-loops -mfpmath=387
    gfortran  mandelbrot.f90 -O3 -march=native -funroll-loops
                                 -o mandelbrot
    ifort     mandelbrot.f90 -O3 -ipo -static  -o mandelbrot
    gcc       mandelbrot.c   -O3 -march=native -ffast-math
                                 -funroll-loops -mfpmath=387 -o mandelbrot

  Execute:

    time ./mandelbrot 3000

  Running Time:

    print-to-screen disabled:
     (present configuration.)
     (these are meaningful timings.)

      gnat:     0.980 seconds
      gfortran: 1.112 seconds
      ifort:    1.180 seconds
      gcc:      0.960 seconds

    print-to-screen enabled
     (timings don't mean much here):

      gnat:     1.012 seconds
      gfortran: 1.582 seconds
      ifort:    1.376 seconds
      gcc:      1.000 seconds

  Notes:

    print-to-screen was enabled only to verify that
    all 3 gave the same output.

    The Fortran uses the original complex number
    implementation of the mandelbrot inner loop.
    I modified the Ada version to use exactly
    the same inner loop as the C version (even though
    the modification slowed down the Ada version).
    In both the Ada and the C versions it was the
    -mfpmath=387 flag (which I assume disables sse)
    that did the trick speeding them up.
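
    As a rough illustration of the difference between the
    two styles of inner loop, here is a sketch in C of a
    mandelbrot iteration written with separate real and
    imaginary parts instead of a complex type. It is not
    the shootout source: the function name
    mandel_iterations and its parameters are assumptions
    made for illustration.

```c
#include <assert.h>

/* Iterate z := z*z + c from z = 0, holding the real and imaginary
   parts in separate doubles rather than in a complex type.
   Returns the index of the first iteration at which |z|^2 > 4 is
   detected, or max_iter if that never happens. */
int mandel_iterations(double cr, double ci, int max_iter)
{
    assert(max_iter > 0);

    double zr = 0.0, zi = 0.0;
    for (int i = 0; i < max_iter; i++) {
        double zr2 = zr * zr;
        double zi2 = zi * zi;
        if (zr2 + zi2 > 4.0)          /* escaped: |z| > 2 */
            return i;
        zi = 2.0 * zr * zi + ci;      /* Im(z*z + c) */
        zr = zr2 - zi2 + cr;          /* Re(z*z + c) */
    }
    return max_iter;
}
```

    Writing the loop this way reuses the squares zr2 and
    zi2 for both the escape test and the update, which a
    complex-number formulation leaves to the compiler.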


BINARYTREES:

  Compilation Commands:

     gnatmake  binarytrees.adb -O3 -gnatnp -march=native -ftracer
     gfortran  binarytrees.f90 -O3 -march=native   -o binarytrees
     ifort     binarytrees.f90 -fast -static       -o binarytrees
     gcc       binarytrees.c -O3 -march=native -lm -o binarytrees

  Execute:

     time ./binarytrees 16

  Running Time (fastest observed):

     gnat:     1.232 seconds
     gfortran: 1.084 seconds
     ifort:    1.676 seconds
     gcc:      1.060 seconds

  Notes:

     Insensitive to optimization flags.