benchmarking GPL

comp.lang.ada

From: johnscpg @ 2009-06-01 11:38 UTC


Dear All,

In celebration of the arrival of the new GNAT GPL
(20090519), I decided to do some benchmarking
to see how the optimizer is coming along.
I was slightly more than rather pleased with
the results.

Results in gory detail are appended below.

I compared 6 (smallish) programs, written in
both Ada and Fortran. Four of them are in C
also. All of the routines can be found at:

http://web.am.qub.ac.uk/users/j.parker/bench_depository/

4 compilers are used:

   gcc 4.3.4,
   Intel Fortran ifort 11.0 (latest version),
   GNAT GPL (20090519), and
   gfortran (based on gcc version 4.3.2).

Operating system:  Debian Linux, Lenny.
Processor:  Intel x86-64 (Xeon X5460 @ 3.16GHz).

The Intel Fortran (ifort) is an aggressive
optimizing compiler, especially on numerical
linear algebra. Its use here is reassuring:
if our gcc-family results on numerical
calculations were suboptimal by a large factor,
ifort would likely let us know. It also
has the easiest optimization flags: it's a
simple choice between -O3, -fast, -ipo
and a few other things that make almost no
difference.

Two of the 6 programs I wrote myself: an FFT
benchmark called fft1tst.adb, and a Jacobi
eigendecomposition called jacobi_eigen_bench_1.adb.
Both are accompanied by near-identical Fortran
versions. This exhausted my entire supply of
inter-language benchmarking routines, so I
downloaded the other 4 test programs from a
repository of small benchmarking routines:
 http://shootout.alioth.debian.org/gp4/
I downloaded C, Fortran, and Ada versions of:
nsievebits, nbody, binarytrees, mandelbrot.
I made small modifications to 2 of the Ada
programs. In one case (mandelbrot) I degraded
the Ada code to a slower, older version so that
it was identical to the C version I was comparing
with. In the other case (nsievebits) I replaced a
packed boolean array with an array of unsigned
ints. These were the only changes to any of
the programs, so the exercise mostly amounted
to finding good optimization flags. I tried
to find good optimization flags for the C and
Fortran compilations too, and managed to speed
up a few of them after several attempts. The
original Ada test programs were written by
Pat Rogers and Pascal Obry.  (Thanks!)

Inter-language benchmarks shouldn't be taken too
seriously, but I learned a few useful things:
the -mfpmath=387 and -mfpmath=387,sse flags came
as a real surprise to me. In several cases they
made all the difference. I also noticed that
GNAT seems to be doing a better job of
optimizing operations on packed boolean arrays,
and a better job on some linear algebra
problems. Unless I'm mistaken, in the linear algebra
case (jacobi_eigen below) the improvement over the
old days is almost a factor of 2. Thanks_gnat!

cheers,
jonathan


FFT1TST2:

  Compilation Commands:

     gnatmake  fft1tst2.adb -O3 -gnatNp -march=native
     gfortran  fft1tst2.f   -O3 -march=native  -o fft1tst2
     ifort     fft1tst2.f   -O3  -WB -xT       -o fft1tst2

  Execute:

     time ./fft1tst2

  Running Time (using 8192 data points):

     gnat:     2.748 seconds
     gfortran: 2.812 seconds
     ifort:    2.816 seconds

  Running Time (using 4096 data points):

     gnat:     1.000 seconds
     gfortran: 1.004 seconds
     ifort:    1.088 seconds

  Running Time (using 1024 data points):

     gnat:     0.168 seconds
     gfortran: 0.184 seconds
     ifort:    0.176 seconds

  Notes:

     The Ada and the Fortran77 FFTs were written over 17
     years ago. Both are radix-4 fast Fourier transforms.
     These versions were written for benchmarking rather
     than ultimate speed: the idea was to make the two
     the same wherever possible, for language/compiler
     comparisons.

     If I use -ffast-math in the GNAT compilation, it runs
     at essentially the same speed as the gfortran build
     (very slightly slower), so I suspect -ffast-math is
     always used by gfortran.
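
     For context on what -ffast-math permits: among other
     things it lets the compiler reorder floating-point sums
     (for example to vectorize a loop), which strict IEEE
     arithmetic forbids because addition is not associative.
     A small C illustration, assuming ordinary IEEE double
     arithmetic as on x86-64 with SSE2 (x87 extended-precision
     intermediates, as selected by -mfpmath=387, could give a
     different answer); sum_in_order() is a hypothetical helper.

```c
#include <stddef.h>

/* Adds x[0] + x[1] + ... strictly left to right: the order the
   compiler must preserve unless reassociation is allowed
   (e.g. by -ffast-math). */
static double sum_in_order(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
```

     With {1e16, 1.0, -1e16} the sum is 0.0, because 1e16 + 1.0
     rounds back to 1e16; reordered as {1e16, -1e16, 1.0} it is
     1.0. A compiler allowed to reassociate may produce either,
     which is one reason -ffast-math can change results as well
     as speed.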


JACOBI_EIGEN

  Compilation Commands:

     gnatmake jacobi_eigen_bench_1.adb -O3 -gnatNp -march=native
              -ffast-math -funroll-loops -o eig
     gfortran jacobi_eigen.f90 -O3 -march=native -ffast-math
              -funroll-loops -o eig
     ifort    jacobi_eigen.f90 -O3 -ipo -static -o eig

  Execute:

     time ./eig

  Running Time (100x100 matrices, 100 iterations):

     gnat:     1.636 seconds
     gfortran: 1.720 seconds
     ifort:    1.440 seconds

  Running Time (1000x1000 matrices, 1 iteration):

     gnat:     23.7 seconds
     gfortran: 39.7 seconds
     ifort:    37.9 seconds

  Notes:

     The matrix size and number of iterations have to
     be set by hand at the top of the 2 routines:
       jacobi_eigen.f90, jacobi_eigen_bench_1.adb.

     A year ago the GNAT executables for Jacobi
     were much slower than the gfortran and ifort ones.

     I don't know what happened in the 1000x1000
     case, but I am not displeased.
     Notice we are using the same compiler
     flags in the gfortran and GNAT cases.

     The number of arithmetical operations performed
     by these routines is exactly proportional to
        No_of_Rotations,
     which is output on completion. The difference
     between the Fortran No_of_Rotations and the Ada
     No_of_Rotations is under 3% here.


NBODY:

 Compilation Commands:

    gnatmake nbody.adb -O3 -gnatNp -march=native -ffast-math -funroll-loops
                   -ftracer -freorder-blocks-and-partition -mfpmath=387,sse
    gfortran nbody.f90  -O3 -march=native -funroll-all-loops -o nbody
    ifort    nbody.f90  -O3 -no-prec-div -o nbody
    gcc      nbody.c    -O3 -o nbody

  Execute:

    time ./nbody 24000000

  Running Time:

    gnat:     4.908 seconds
    gfortran: 5.602 seconds
    ifort:    4.660 seconds
    gcc:      4.472 seconds

  Notes:

    The obscure compilation flags
     (-ftracer -freorder-blocks-and-partition)
    are not needed if you write the inner loop of
    nbody_pck.Advance a bit differently; I just
    wanted to use the original version of nbody.adb.
    nbody2.adb is more like the C version and needs
    simpler optimization flags, but runs at the same
    speed as nbody.adb.
    The gcc-compiled C is about 9% faster than
    nbody.adb (4.472 vs 4.908 seconds), a small but
    interesting difference I don't understand.


NSIEVEBITS:

  Compilation Commands:

    gnatmake nsievebits2.adb -O3 -gnatnp -march=native -funroll-loops -ftracer
    gfortran nsievebits.f90  -O3 -march=native -funroll-loops -o nsievebits2
    ifort    nsievebits.f90  -O3 -ipo -static -o nsievebits2
    gcc      nsievebits.c    -O3 -march=native -funroll-loops -o nsievebits2

  Execute:

    time ./nsievebits2 11

  Running Time:

    gnat:     0.320 seconds
    gfortran: 0.388 seconds
    ifort:    0.372 seconds
    gcc:      0.364 seconds

  Notes:

    nsievebits2.adb uses an array of unsigned
    ints to replace the packed boolean array in
    nsievebits.adb. Both methods are perfectly
    legitimate in this exercise. GNAT has
    improved remarkably: the packed boolean array
    version (nsievebits.adb) is now competitive
    with the other languages.
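
    The unsigned-int alternative amounts to packing the
    sieve's flags into machine words by hand. A minimal C
    sketch, assuming 32-bit words; nsieve_bits() is a
    hypothetical name, not taken from nsievebits2.adb.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Counts the primes in [2, m] with a sieve of Eratosthenes whose flags
   are packed 32 to a word: bit k set means "k is still a candidate". */
static size_t nsieve_bits(size_t m)
{
    size_t nwords = m / 32 + 1;  /* enough bits for indices 0..m */
    uint32_t *bits = malloc(nwords * sizeof *bits);
    if (bits == NULL)
        return 0;
    memset(bits, 0xff, nwords * sizeof *bits);  /* all candidates */
    size_t count = 0;
    for (size_t i = 2; i <= m; i++) {
        if (bits[i / 32] & (UINT32_C(1) << (i % 32))) {
            count++;
            /* Clear the bit of every multiple of the prime i. */
            for (size_t j = i + i; j <= m; j += i)
                bits[j / 32] &= ~(UINT32_C(1) << (j % 32));
        }
    }
    free(bits);
    return count;
}
```

    For example, nsieve_bits(100) counts 25 primes and
    nsieve_bits(10000) counts 1229.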


MANDELBROT:

  Compilation Commands:

    gnatmake  mandelbrot.adb -O3 -gnatnp -march=native -ffast-math
                                 -funroll-loops -mfpmath=387
    gfortran  mandelbrot.f90 -O3 -march=native -funroll-loops -o mandelbrot
    ifort     mandelbrot.f90 -O3 -ipo -static  -o mandelbrot
    gcc       mandelbrot.c   -O3 -march=native -ffast-math -funroll-loops
                                 -mfpmath=387 -o mandelbrot

  Execute:

    time ./mandelbrot 3000

  Running Time:

    print-to-screen disabled
     (the present configuration; these are the
     meaningful timings):

      gnat:     0.980 seconds
      gfortran: 1.112 seconds
      ifort:    1.180 seconds
      gcc:      0.960 seconds

    print-to-screen enabled
     (timings don't mean much here):

      gnat:     1.012 seconds
      gfortran: 1.582 seconds
      ifort:    1.376 seconds
      gcc:      1.000 seconds

  Notes:

    print-to-screen was enabled only to verify that
    all versions gave the same output.

    The Fortran uses the original complex number
    implementation of the mandelbrot inner loop.
    I modified the Ada version to use exactly
    the same inner loop as the C version (even though
    the modification slowed down the Ada version).
    In both the Ada and the C versions it was the
    -mfpmath=387 flag (which I assume switches scalar
    floating point from SSE to the x87 unit) that did
    the trick, speeding them up.
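
    The C-style inner loop keeps the real and imaginary parts
    (and their squares) in separate doubles instead of using a
    complex type. A minimal sketch of that idea, assuming the
    usual escape radius of 2; in_mandelbrot() and max_iter are
    illustrative names, not taken from mandelbrot.c.

```c
#include <stdbool.h>

/* Escape-time test for c = cr + i*ci: iterates z = z*z + c from z = 0
   using separate real/imaginary parts, and returns true if |z| <= 2
   for all max_iter iterations. tr and ti cache zr^2 and zi^2. */
static bool in_mandelbrot(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0, tr = 0.0, ti = 0.0;
    for (int i = 0; i < max_iter; i++) {
        zi = 2.0 * zr * zi + ci;  /* uses the old zr */
        zr = tr - ti + cr;        /* zr^2 - zi^2 + cr, old values */
        tr = zr * zr;
        ti = zi * zi;
        if (tr + ti > 4.0)        /* |z|^2 > 2^2: escaped */
            return false;
    }
    return true;
}
```

    Caching the squares means each iteration needs only three
    multiplications, which is the kind of loop where the choice
    between x87 and SSE arithmetic can show up in the timings.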


BINARYTREES:

  Compilation Commands:

     gnatmake  binarytrees.adb -O3 -gnatnp -march=native -ftracer
     gfortran  binarytrees.f90 -O3 -march=native   -o binarytrees
     ifort     binarytrees.f90 -fast -static       -o binarytrees
     gcc       binarytrees.c -O3 -march=native -lm -o binarytrees

  Execute:

     time ./binarytrees 16

  Running Time (fastest observed):

     gnat:     1.232 seconds
     gfortran: 1.084 seconds
     ifort:    1.676 seconds
     gcc:      1.060 seconds

  Notes:

     Timings were insensitive to the optimization flags.



