From: johnscpg@googlemail.com
Newsgroups: comp.lang.ada
Subject: benchmarking GPL
Date: Mon, 1 Jun 2009 04:38:24 -0700 (PDT)
Message-ID: <222bea32-77df-49a3-9df3-b848e63daa68@z5g2000vba.googlegroups.com>

Dear All,

In celebration of the arrival of the new GNAT GPL (20090519) I decided to do some benchmarking to see how the optimizer's coming along. I was slightly more than rather pleased with the results. Results in gory detail are appended below.

I compared 6 (smallish) programs, written in both Ada and Fortran. Four of them are in C also. All of the routines can be found at:

http://web.am.qub.ac.uk/users/j.parker/bench_depository/

4 compilers are used: gcc 4.3.4, Intel Fortran ifort 11.0 (latest version), GNAT GPL (20090519), and gfortran (based on gcc version 4.3.2).

Operating system: Debian Linux, Lenny.
Processor: Intel x86-64 (Xeon X5460 @ 3.16GHz).

The Intel Fortran compiler (ifort) is an aggressive optimizing compiler, especially on numerical linear algebra. Its use here is reassuring: if our gcc-family results on numerical calculations were suboptimal by a large factor, ifort would likely let us know. It also has the easiest optimization flags: it's a simple choice between -O3, -fast, -ipo and a few other things that make almost no difference.

Two of the 6 programs I wrote myself: an FFT benchmark called fft1tst.adb, and a Jacobi eigendecomposition called jacobi_eigen_bench_1.adb. Both are accompanied by near-identical Fortran versions. This exhausted my entire supply of inter-language benchmarking routines, so 4 of the test programs I downloaded from a depository of small benchmarking routines:

http://shootout.alioth.debian.org/gp4/

I downloaded C, Fortran, and Ada versions of: nsievebits, nbody, binarytree, mandelbrot.

I made small modifications to 2 of the Ada programs. In one case I degraded the Ada code to a slower, older version so that it was identical to the C version I was comparing with. In the other case I replaced a packed boolean array with an array of unsigned ints. These were the only changes to any of the programs, so the exercise mostly amounted to finding good optimization flags. I tried to find good optimization flags for the C and Fortran compilations too, and managed to speed up a few of them after several attempts.

The original Ada test programs were written by Pat Rogers and Pascal Obry. (Thanks!)

Inter-language benchmarks shouldn't be taken too seriously, but I learned a few useful things: the -mfpmath=387 and -mfpmath=387,sse flags came as a real surprise to me. In several cases they made all the difference. I also noticed that GNAT seems to be doing a better job of optimizing operations on packed boolean arrays, and a better job on some linear algebra problems. Unless I'm mistaken, in the linear algebra case (jacobi_eigen below) the improvement over the old days is almost a factor of 2. Thanks, GNAT!

cheers,
jonathan


FFT1TST2:

Compilation Commands:

   gnatmake fft1tst2.adb -O3 -gnatNp -march=native
   gfortran fft1tst2.f -O3 -march=native -o fft1tst2
   ifort fft1tst2.f -O3 -WB -xT -o fft1tst2

Execute:

   time ./fft1tst2

Running Time (using 8192 data points):

   gnat:      2.748 seconds
   gfortran:  2.812 seconds
   ifort:     2.816 seconds

Running Time (using 4096 data points):

   gnat:      1.000 seconds
   gfortran:  1.004 seconds
   ifort:     1.088 seconds

Running Time (using 1024 data points):

   gnat:      0.168 seconds
   gfortran:  0.184 seconds
   ifort:     0.176 seconds

Notes:

The Ada and the Fortran77 FFTs were written over 17 years ago. Both are radix-4 fast Fourier transforms. These versions were written for benchmarking rather than ultimate speed: the idea was to make the 2 the same wherever possible, for language/compiler comparisons. If I use -ffast-math in the GNAT compilation, it runs at exactly the same speed as the gfortran version (that is, very slightly slower), so I suspect -ffast-math is always used by gfortran.


JACOBI_EIGEN:

Compilation Commands:

   gnatmake jacobi_eigen_bench_1.adb -O3 -gnatNp -march=native -ffast-math -funroll-loops -o eig
   gfortran jacobi_eigen.f90 -O3 -march=native -ffast-math -funroll-loops -o eig
   ifort jacobi_eigen.f90 -O3 -ipo -static -o eig

Execute:

   time ./eig

Running Time (100x100 matrices, 100 iterations):

   gnat:      1.636 seconds
   gfortran:  1.720 seconds
   ifort:     1.440 seconds

Running Time (1000x1000 matrices, 1 iteration):

   gnat:      23.7 seconds
   gfortran:  39.7 seconds
   ifort:     37.9 seconds

Notes:

Matrix size and number of iterations have to be typed in at the top of the 2 routines: jacobi_eigen.f90, jacobi_eigen_bench_1.adb. A year ago the GNAT executables for Jacobi were much slower than gfortran and ifort. I don't know what happened in the 1000x1000 case, but I am not displeased. Notice we are using the same compiler flags in the gfortran and GNAT cases. The number of arithmetical operations performed by these routines is exactly proportional to No_of_Rotations, which is output on completion.
The difference between the Fortran No_of_Rotations and the Ada No_of_Rotations is under 3% here.


NBODY:

Compilation Commands:

   gnatmake nbody.adb -O3 -gnatNp -march=native -ffast-math -funroll-loops -ftracer -freorder-blocks-and-partition -mfpmath=387,sse
   gfortran nbody.f90 -O3 -march=native -funroll-all-loops -o nbody
   ifort nbody.f90 -O3 -no-prec-div -o nbody
   gcc nbody.c -O3 -o nbody

Execute:

   time ./nbody 24000000

Running Time:

   gnat:      4.908 seconds
   gfortran:  5.602 seconds
   ifort:     4.660 seconds
   gcc:       4.472 seconds

Notes:

The obscure compilation flags (-ftracer -freorder-blocks-and-partition) are not needed if you write the inner loop of nbody_pck.Advance a bit differently. I just wanted to use the original version of nbody.adb. nbody2.adb is more like the C version and has simpler optimization flags, but runs at the same speed as nbody.adb. The gcc C version is about 9% faster than nbody.adb, a small but interesting difference I don't understand.


NSIEVEBITS:

Compilation Commands:

   gnatmake nsievebits2.adb -O3 -gnatnp -march=native -funroll-loops -ftracer
   gfortran nsievebits.f90 -O3 -march=native -funroll-loops -o nsievebits2
   ifort nsievebits.f90 -O3 -ipo -static -o nsievebits2
   gcc nsievebits.c -O3 -march=native -funroll-loops -o nsievebits2

Execute:

   time ./nsievebits2 11

Running Time:

   gnat:      0.320 seconds
   gfortran:  0.388 seconds
   ifort:     0.372 seconds
   gcc:       0.364 seconds

Notes:

nsievebits2.adb uses an array of unsigned ints to replace the packed boolean array in nsievebits.adb. Both methods are entirely legitimate in this exercise. GNAT has improved remarkably: the packed boolean array version (nsievebits.adb) is now competitive with the other languages.

MANDELBROT:

Compilation Commands:

   gnatmake mandelbrot.adb -O3 -gnatnp -march=native -ffast-math -funroll-loops -mfpmath=387
   gfortran mandelbrot.f90 -O3 -march=native -funroll-loops -o mandelbrot
   ifort mandelbrot.f90 -O3 -ipo -static -o mandelbrot
   gcc mandelbrot.c -O3 -march=native -ffast-math -funroll-loops -mfpmath=387 -o mandelbrot

Execute:

   time ./mandelbrot 3000

Running Time, print-to-screen disabled (the present configuration; these are the meaningful timings):

   gnat:      0.980 seconds
   gfortran:  1.112 seconds
   ifort:     1.180 seconds
   gcc:       0.960 seconds

Running Time, print-to-screen enabled (timings don't mean much here):

   gnat:      1.012 seconds
   gfortran:  1.582 seconds
   ifort:     1.376 seconds
   gcc:       1.000 seconds

Notes:

print-to-screen was enabled only to verify that all 3 gave the same output. The Fortran uses the original complex number implementation of the mandelbrot inner loop. I modified the Ada version to use exactly the same inner loop as the C version (even though the modification slowed down the Ada version). In both the Ada and the C versions it was the -mfpmath=387 flag (which I assume disables SSE) that did the trick of speeding them up.


BINARYTREES:

Compilation Commands:

   gnatmake binarytrees.adb -O3 -gnatnp -march=native -ftracer
   gfortran binarytrees.f90 -O3 -march=native -o binarytrees
   ifort binarytrees.f90 -fast -static -o binarytrees
   gcc binarytrees.c -O3 -march=native -lm -o binarytrees

Execute:

   time ./binarytrees 16

Running Time (fastest observed):

   gnat:      1.232 seconds
   gfortran:  1.084 seconds
   ifort:     1.676 seconds
   gcc:       1.060 seconds

Notes:

Insensitive to optimization flags.