From mboxrd@z Thu Jan  1 00:00:00 1970
X-Google-Thread: 103376,7767a311e01e1cd
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII-7-bit
Path: 
 g2news2.google.com!news4.google.com!news.glorb.com!proxad.net!cleanfeed3-b.proxad.net!nnrp12-1.free.fr!not-for-mail
Sender: sam@willow.rfc1149.net
From: Samuel Tardieu <sam@rfc1149.net>
Newsgroups: comp.lang.ada
Subject: Re: GNAT compiler switches and optimization
References: <1161341264.471057.252750@h48g2000cwc.googlegroups.com>
Date: 20 Oct 2006 14:09:45 +0200
Message-ID: <871wp3p4s6.fsf@willow.rfc1149.net>
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.4
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Leafnode-NNTP-Posting-Host: 2001:6f8:37a:2::2
Organization: Guest of ProXad - France
NNTP-Posting-Date: 20 Oct 2006 14:10:01 MEST
NNTP-Posting-Host: 88.191.14.223
X-Trace: 1161346201 news-2.free.fr 29389 88.191.14.223:55384
X-Complaints-To: abuse@proxad.net
Xref: g2news2.google.com comp.lang.ada:7071
Date: 2006-10-20T14:10:01+02:00
List-Id: <comp.lang.ada>

>>>>> "tkrauss" == tkrauss  <thomas.krauss@gmail.com> writes:

tkrauss> Running them on 800x800 matrices (on my 2GHz laptop)

tkrauss> for Ada, "tst_array 800" runs in 18 seconds; for Fortran,
tkrauss> "tst_array 800" runs in 6 seconds

tkrauss> (if I use the Fortran "matmul" intrinsic, the Fortran time
tkrauss> drops to 2.5 seconds)

tkrauss> Note: I tried reordering the loops, removing the random
tkrauss> calls, etc., none of which made a huge difference. There is
tkrauss> something killing performance and/or a switch or two that I'm
tkrauss> missing, but I can't seem to find it. Any thoughts?

First of all, what you measure is not only the matrix multiplication
time but also the time spent filling the matrices with random
numbers. I've moved the initialization of "start" to after the
matrices are filled.
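The timing fix can be sketched in C as follows. This is only an
illustration, not the original tst_array code (which isn't quoted
here): the flat row-major layout and the names fill_random, matmul
and time_matmul are assumptions. The point is simply that the clock
is read after the inputs are ready, so random-number generation is
excluded. Note that clock() reports processor time, not wall-clock
time.

```c
#include <stdlib.h>
#include <time.h>

/* Naive product c = a * b for n x n matrices stored flat, row-major:
   element (i, j) lives at index i * n + j. */
static void matmul(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Fill an n x n matrix with pseudo-random values in [0, 1]. */
void fill_random(size_t n, double *m)
{
    for (size_t i = 0; i < n * n; i++)
        m[i] = (double)rand() / RAND_MAX;
}

/* Return the processor time (seconds) spent in the multiplication only.
   The caller fills a and b beforehand, so that work is not counted. */
double time_matmul(size_t n, const double *a, const double *b, double *c)
{
    clock_t start = clock();   /* read only once the inputs are ready */
    matmul(n, a, b, c);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Usage: call fill_random(n, a) and fill_random(n, b) first, then
time_matmul(n, a, b, c); only the second step is measured.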

The following optimizations make the difference smaller (9.47 seconds
for Fortran vs. 11.90 seconds for Ada on my machine):

  - use -fomit-frame-pointer on the gnatmake command line (this doesn't
    change anything in the Fortran case)

  - add pragma Convention (Fortran, Real_Matrix) to switch the
    storage order from row-major to column-major; I guess this helps
    keep more data in the cache

  - use 1 .. N as loop indices instead of A'Range (1) and friends;
    this is closer to the Fortran code you posted
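For readers who think in C, the storage-order point can be sketched
like this. It is an illustration under stated assumptions, not what
GNAT generates: a flat array indexed as i + j * n reproduces the
column-major layout that pragma Convention (Fortran, ...) requests,
and the hypothetical matmul_colmajor uses a j-k-i loop order so the
innermost loop walks through consecutive memory. Whether the layout
change pays off depends on matching the loop order to it.

```c
#include <stddef.h>

/* Column-major ("Fortran") layout: element (i, j) of an n x n matrix
   is stored at offset i + j * n, so each column is contiguous. */
static inline size_t cm(size_t n, size_t i, size_t j)
{
    return i + j * n;
}

/* c = a * b with all three matrices column-major. In the j-k-i order
   the innermost loop runs down column j of c and column k of a, i.e.
   through consecutive addresses, which is usually cache-friendly. */
void matmul_colmajor(size_t n, const double *a, const double *b, double *c)
{
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < n; i++)
            c[cm(n, i, j)] = 0.0;
        for (size_t k = 0; k < n; k++) {
            double bkj = b[cm(n, k, j)];   /* reused across the inner loop */
            for (size_t i = 0; i < n; i++)
                c[cm(n, i, j)] += a[cm(n, i, k)] * bkj;
        }
    }
}
```

The fixed bounds 0 .. n - 1 play the role of the 1 .. N suggestion.
For row-major storage the equivalent cache-friendly order would be
i-k-j instead.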

Still, this is a huge penalty for Ada. Unfortunately, I don't have the
time to investigate further right now. However, I would be interested
in other people's findings.

  Sam
-- 
Samuel Tardieu -- sam@rfc1149.net -- http://www.rfc1149.net/