From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Thread: 103376,7767a311e01e1cd
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII-7-bit
Path: 
 g2news2.google.com!news3.google.com!border1.nntp.dca.giganews.com!nntp.giganews.com!local01.nntp.dca.giganews.com!nntp.comcast.com!news.comcast.com.POSTED!not-for-mail
NNTP-Posting-Date: Sat, 21 Oct 2006 11:42:02 -0500
Date: Sat, 21 Oct 2006 12:35:54 -0400
From: Jeffrey Creem <jeff@thecreems.com>
User-Agent: Mozilla Thunderbird 1.0.7 (Windows/20050923)
X-Accept-Language: en-us, en
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: GNAT compiler switches and optimization
References: <1161341264.471057.252750@h48g2000cwc.googlegroups.com>
 <9Qb_g.111857$aJ.65708@attbi_s21> <434o04-7g7.ln1@newserver.thecreems.com>
 <4539ce34$1_2@news.bluewin.ch>
In-Reply-To: <4539ce34$1_2@news.bluewin.ch>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <nrup04-5hj.ln1@newserver.thecreems.com>
NNTP-Posting-Host: 24.147.74.171
X-Trace: 
 sv3-cmU/4PlJci7w5ysyvpp127PnxIqzQ3coLKcWK+zEUkKTPtoB1Jrj4uPVhPoLY8S2+LMjIghID6Ynq9S!siXsl/qgiHr5Ry9ibSQ8+TNJEFmGK/1MhCdTUiFBTkCcoOANX5iXvTV0qv08Xabiy8591Z33lI9z!sLo=
X-Complaints-To: abuse@comcast.net
X-DMCA-Complaints-To: dmca@comcast.net
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint
 properly
X-Postfilter: 1.3.32
Xref: g2news2.google.com comp.lang.ada:7116
Date: 2006-10-21T12:35:54-04:00
List-Id: <comp.lang.ada>

Gautier wrote:
> Jeffrey Creem:
> 
>> Note, I am the first one to jump to the defense of "Ada" in general 
>> but in this case, GNAT just plain sucks compared to GNU FORTRAN as it 
>> does a poor job on (at least) the inner loop (verified by looking at 
>> the output assembly)
> 
> 
> There is something strange... Martin Krischik was able to trim the 
> overall time for the Ada code down to 24% of the first version (GNAT/GCC 
> 4.1.1).
> This should make the Ada program as fast as the FORTRAN one, shouldn't it ?
> Maybe it's because the test is done on a 64 bit machine ?
> It needs some reconciliation...
> A good thing in that discussion would be that everybody shows each time
> - which GCC version
> - which machine
> - the execution time of the multiplication for both Ada and Fortran
> - which version of the Ada code (matrix on stack/heap, Fortran or Ada 
> convention)
> 
> Cheers, Gautier
> ______________________________________________________________
> Ada programming -- http://www.mysunrise.ch/users/gdm/gsoft.htm
> 
> NB: For a direct answer, e-mail address on the Web site!

I'd certainly be willing to run a few benchmarks, but the important thing 
here is that rather innocent-looking code is running 2-4x slower than it 
"should".

There are things that I think we can really rule out as being "the" factor.

1) Random number generator - I did timings (for both the Ada and 
FORTRAN) with timing moved to only cover matrix multiply.
2) Different GCC versions - I built a fresh GCC from the GCC trunk for 
both Ada and FORTRAN
3) The machine - I am running both on the same machine, though I suppose 
there could be differences in 32-bit vs. 64-bit comparisons.
4) Runtime checks - both the original author and I ran with checks 
suppressed
5) O2/O3 - Actually, I could look at this some more with some other 
versions but a quick look when I first started seemed to indicate this 
was not the issue.
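
To illustrate point 1, here is a minimal sketch of what "timing moved to 
only cover matrix multiply" means. It is in Python purely for 
illustration; it is not the Ada or FORTRAN code from this thread, and the 
names make_matrix, matmul and time_multiply_only are hypothetical. The 
random fill happens before the clock starts, so the generator cannot 
affect the measured time.

```python
import random
import time


def make_matrix(n, rng):
    """Build an n x n matrix of random floats (setup work, never timed)."""
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def matmul(a, b):
    """Plain triple-loop multiply of two square matrices."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def time_multiply_only(n, seed=0):
    """Return (product, seconds); only the multiply is inside the timed region."""
    rng = random.Random(seed)
    a = make_matrix(n, rng)  # random number generation: outside the timer
    b = make_matrix(n, rng)
    start = time.perf_counter()
    c = matmul(a, b)
    elapsed = time.perf_counter() - start
    return c, elapsed
```

Calling time_multiply_only(200) returns the product and the elapsed 
seconds for the multiply alone.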

A few other thoughts.


Once the timing is limited to just the matrix multiply, the stack/heap 
choice 'should' generally not matter.

Some of the changes made to the Ada version make it not really the same 
program as the FORTRAN version, and the same changes made to the FORTRAN 
one would also cause it to speed up (e.g. not counting the zeroing 
of the target array during the accumulation phase).
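
One reading of the zeroing remark, sketched in Python for illustration 
only (this is not code from the thread; zero_fill, accumulate and 
timed_run are hypothetical names): the result is identical either way, 
but the two measurements cover different amounts of work, so they are 
only comparable if both languages are timed the same way.

```python
import time


def zero_fill(c):
    """Reset every element of the target matrix to 0.0 in place."""
    for row in c:
        for j in range(len(row)):
            row[j] = 0.0


def accumulate(a, b, c):
    """Add the product a*b into c, which is assumed to be zeroed already."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]


def timed_run(a, b, c, include_zeroing):
    """Time the accumulation, with or without the zeroing inside the timer."""
    if not include_zeroing:
        zero_fill(c)  # done before the clock starts, so it is not counted
    start = time.perf_counter()
    if include_zeroing:
        zero_fill(c)  # counted as part of the measured work
    accumulate(a, b, c)
    return time.perf_counter() - start
```

A fair comparison uses the same include_zeroing setting for every 
version being timed.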

I have certainly seen some amazing performance from some Ada compilers 
in the past, and in general, on non-trivial benchmarks, I am usually 
pretty happy with the output of GNAT as well, but in this case it is not 
great.

Further, I tried playing a bit with the new autovectorization capability 
of the recent 4.X series of GCC (has to be specifically enabled) and found 
that even very trivial cases would refuse to vectorize under Ada 
(though after I submitted the bug report to GCC, I found that FORTRAN 
fails to vectorize these too).

One thing everyone needs to remember is that this example was (probably) 
not "Find the way to get the smallest value out of this test program", 
because there are always ways of doing some tweaks to a small enough 
region of code to make it better. If there is a 2-4x global slowdown in 
your 100KLOC program, you will never "get there" following the 
conventional wisdom of profiling and looking for the problems.

Now, I am not suggesting that GNAT is globally 2-4x slower than GFORTRAN 
or anything like that (since that does not line up with what I have 
generally seen on larger code bases), but if I were a manager picking a 
new language based on a set of long-term goals for a project and saw 
that GNAT was running 2-4x slower and was still running 1.X to 3X 
slower after 2 days of Ada gurus looking at it, I'd probably jettison 
Ada (I know, I am mixing compilers and languages here, but in reality, 
that is what happens in the real world) and go with something else.

And before the "processors are so fast that performance does 
not matter as much as safety and correctness" chorus starts getting too 
loud, let me point out that there are still many segments of the 
industry where performance does still indeed matter. Especially when one 
is trading adding a second processor to an embedded box against a vague 
promise of "betterness" in terms of safety down the road... OK, off the 
soapbox.

So, in closing, if someone thinks they have "the best" version of that 
program they want timed against gfortran, post it here and I'll run it.