From: jonathan
Newsgroups: comp.lang.ada
Subject: Re: compiler settings in AdaGIDE
Date: Sat, 24 Jul 2010 11:21:45 -0700 (PDT)

On Jul 23, 10:52 am, Ada novice wrote:
> Hi,
>     I'm using the AdaGIDE editor (version 7.45.2) together with the
> GNAT AdaCore libre compiler (release 2010) on a Win XP machine. I
> would like my codes to run as fast as possible and here is the content
> of a typical gnat.ago file that I use. Please let me know what
> improvements I can make in order for an Ada program to run in the
> minimum amount of time.
>
> Thanks.
> YC

I wrote a quick benchmark using our favorite complex number
eigen-routine, so now we can test some of these things rather quickly.
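If you want to reproduce this, the driver is roughly the following shape (a sketch only: the spec of Complex_Eigenvalues and its array types are assumed here, not shown; the real packages are in the benchmark directory mentioned below):

```ada
--  bench.adb -- sketch of the benchmark driver.  Complex_Eigenvalues
--  and its Complex_Matrix / Complex_Vector types are assumed; fill P
--  with any fixed test matrix before the loop.

with Complex_Eigenvalues;

procedure Bench is
   use Complex_Eigenvalues;
   N    : constant := 121;
   P, V : Complex_Matrix (1 .. N, 1 .. N);
   W    : Complex_Vector (1 .. N);
   Fail : Boolean;
begin
   --  ... initialize P here ...
   for Trial in 1 .. 100 loop
      Eigen (P, W, V, Fail);
   end loop;
end Bench;
```

Then time the whole run from the shell with `time ./bench`, as below.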
The benchmark does 100 calls to

   Complex_Eigenvalues.Eigen (P, W, V, FAIL);

using 121 x 121 complex matrices. I compiled it with both the 2009
and 2010 GNAT GPL compilers and timed it on a fast Intel PC (Xeon
X5460 @ 3.16GHz) with:

   time ./bench

Let's look at the best results first, and compare the '09 and '10
GNAT GPL compilers:

2010 GNAT GPL:
   gnatmake bench.adb -gnatnp -O2 -march=native -ffast-math -funroll-loops
   3.61 seconds

2009 GNAT GPL:
   gnatmake bench.adb -gnatnp -O2 -march=native -ffast-math -funroll-loops
   4.57 seconds

This came as a nice surprise to me!

We can learn some more about the compiler switches by toggling them.
I'll stick to the 2010 GNAT GPL from now on:

change -O2 to -O3:
   (gnatmake bench.adb -gnatnp -O3 -march=native -ffast-math -funroll-loops)
   running time changes from 3.61 to 3.63 seconds

remove -funroll-loops:
   (gnatmake bench.adb -gnatnp -O2 -march=native -ffast-math)
   running time changes from 3.61 to 3.66 seconds

remove -ffast-math:
   (gnatmake bench.adb -gnatnp -O2 -march=native)
   running time changes from 3.61 to 4.35 seconds

The -ffast-math had an amazing effect. I've never seen that before ...
maybe an interaction with complex number arithmetic.

Now let's check -gnato:

add -gnato:
   (gnatmake bench.adb -gnatnp -O2 -march=native -ffast-math -funroll-loops -gnato)
   running time changes from 3.61 to 3.64 seconds

(Good news again.)

I suspect that -mfpmath=sse is the default on Intel. If you are on an
Intel processor, then you have the option of running on the 387 stack
by using -mfpmath=387. The compiler even tries to use both sse and 387
if you use -mfpmath=387,sse, or so the man pages say. (-mfpmath=387,sse
isn't very good yet ... usually slower.)

Finally, I always use the more portable:

   type Real is digits 15;

It will give you standard double precision without much loss in speed
as far as I can tell.
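In practice that means declaring the working precision once and instantiating the generics against it, so changing precision later is a one-line edit. A minimal sketch (Generic_Elementary_Functions is the standard Ada generic; everything else here is just illustration):

```ada
--  One declaration controls the precision of the whole program;
--  every generic package is instantiated with Real.

with Ada.Numerics.Generic_Elementary_Functions;

procedure Precision_Demo is

   type Real is digits 15;   -- standard double precision on most machines

   package Real_Functions is
     new Ada.Numerics.Generic_Elementary_Functions (Real);
   use Real_Functions;

   X : constant Real := Sqrt (2.0);

begin
   null;  -- real work goes here
end Precision_Demo;
```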
To get single precision:

   type Real is digits 6;

It's not much faster than the 15 digit version in the present
benchmark. My experience is that it is almost always a bad idea to do
much arithmetic (especially something like eigenvalue calculations) in
single precision ... maybe at best useful for minimizing data storage
size if the data is known to be inaccurate.

I wasn't planning to do any more benchmarking than this, but the 2009
vs 2010 results were so surprising and so welcome that I took the time
to do one more test. You can find the code in the following public
directory:

   http://web.am.qub.ac.uk/users/j.parker/bench_depository/

You might also want to look at some other math routines I keep at:

   http://web.am.qub.ac.uk/users/j.parker/miscellany/

On this set of floating point routines, by the way, here's the best I
could come up with for maximum speed:

   gnatmake xxx.adb -gnatnp -O2 -march=native -ffast-math -funroll-loops

I've never found anything much better ... I always start with this set
of switches (-gnatnp -O2 -march=native -ffast-math -funroll-loops) and
turn other switches on and off to see if they help.

In the benchmark directory you can find some programs I've either
collected or written over the years. I'll compare Ada and Fortran 90
versions of a jacobi iterative eigen-decomposition for symmetric real
matrices. The Ada version has departed slightly from the fortran
version over the years, but the two are still very close ... they do
almost the same amount of computation in the same way.
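For reference, the heart of both versions is the usual Jacobi rotation; the inner loops look roughly like this (a sketch with assumed names, not the actual packages -- those are in the benchmark directory):

```ada
--  Sketch of one Jacobi rotation: rows P and Q of the symmetric
--  matrix A are combined using a precomputed cosine C and sine S.
--  Matrix and Real are assumed types (2-d array of Real, digits 15).

procedure Rotate_Rows
  (A    : in out Matrix;
   P, Q : in     Integer;
   C, S : in     Real)
is
   Tmp : Real;
begin
   for J in A'Range (2) loop
      Tmp      := C * A (P, J) + S * A (Q, J);
      A (Q, J) := C * A (Q, J) - S * A (P, J);
      A (P, J) := Tmp;
   end loop;
end Rotate_Rows;
```

These tight dependent loops are exactly the kind of thing that gives optimizers like ifort room to rearrange the arithmetic.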
Mostly I just want to compare the 2009 and the 2010 GNAT GPL though -
here is a test on a 500 x 500 Moler's matrix:

2010 GNAT GPL:
   gnatmake jacobi_eigen_bench_1 -gnatnp -O2 -march=native -ffast-math -funroll-loops
   2.036 seconds

2009 GNAT GPL:
   gnatmake jacobi_eigen_bench_1 -gnatnp -O2 -march=native -ffast-math -funroll-loops
   2.676 seconds

Again, I found this so surprising that I copied over the test routine
jacobi_eigen_tst_1.adb from another directory, just to make sure that
the jacobi packages were actually working. You may want to verify
this. You'll need packages matrix_sampler_3.ads and
matrix_sampler_3.adb. For testing I start with:

   gnatmake jacobi_eigen_tst_1.adb -gnato -gnatVa -gnata

To make the Fortran comparison I used (1) jacobi_eigen.f90, (2) a
recent version (11.0) of the Intel Fortran compiler, and (3) a not
very recent version of gfortran (based on gcc 4.3.2):

   ifort -fast jacobi_eigen.f90
   gfortran jacobi_eigen.f90 -O3 -march=native -funroll-loops -ffast-math

Results on a 500 x 500 matrix:

   2.038 seconds (2010 GNAT GPL)
   2.012 seconds (gfortran (based on gcc 4.3.2))
   1.824 seconds (ifort -fast, version 11.0)

Actually, Intel's ifort is usually 20% or more faster than the gcc
compilers. It can do miraculous things with nested loops, but in this
case I suspect that it may be doing some algebra on the Jacobi inner
loops and then factoring out some constants (you get gcc speed if you
suppress that kind of thing with the ifort -fp-model strict switch).

J.