From mboxrd@z Thu Jan  1 00:00:00 1970
From: sands@clipper.ens.fr (Duncan Sands)
Subject: GNAT function calling overhead
Date: 1995/04/06
Message-ID: <3m0nv1$pv2@nef.ens.fr>#1/1
X-Deja-AN: 100939351
distribution: world
organization: Ecole Normale Superieure, PARIS, France
newsgroups: comp.lang.ada
List-Id: <comp.lang.ada>

system: DOS 486 (with math coprocessor), gcc 2.6.3, GNAT 2.03

Essentially the question is: why so much function calling overhead
in GNAT?

I'm writing a set of matrix factorization routines (Schur etc.), so
of course I need routines for multiplying matrices and the like.
For me a matrix is
   type Matrix is array(Positive range <>, Positive range <>) of Float;

I overloaded "*" for matrix multiplication:
   function "*" (Left : in Matrix; Right : in Matrix) return Matrix;

Multiplying two 15 by 15 matrices 10_000 times using this function
takes about 55 seconds on my machine.  The algorithm is the obvious
one: loop over rows and columns, add up the appropriate products and
assign them.
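
For concreteness, here is a minimal sketch of what such a "*" might
look like.  It is a reconstruction from the description above, not
the actual code: the names Matmul_Sketch, Result and Sum and the
small 2 by 2 test matrices are assumptions made up for illustration,
and the inner dimensions are assumed to match (no check is made).

```ada
with Ada.Text_IO;

procedure Matmul_Sketch is

   type Matrix is array (Positive range <>, Positive range <>) of Float;

   function "*" (Left : in Matrix; Right : in Matrix) return Matrix is
      --  The result is built in a local matrix and returned by value.
      Result : Matrix (Left'Range (1), Right'Range (2));
      Sum    : Float;
   begin
      --  Assumption: Left'Range (2) = Right'Range (1); not checked here.
      for I in Left'Range (1) loop
         for J in Right'Range (2) loop
            Sum := 0.0;
            for K in Left'Range (2) loop
               Sum := Sum + Left (I, K) * Right (K, J);
            end loop;
            Result (I, J) := Sum;
         end loop;
      end loop;
      return Result;
   end "*";

   --  Hypothetical test data, small enough to check by hand.
   A : constant Matrix (1 .. 2, 1 .. 2) := ((1.0, 2.0), (3.0, 4.0));
   B : constant Matrix (1 .. 2, 1 .. 2) := ((0.0, 1.0), (1.0, 0.0));
   C : Matrix (1 .. 2, 1 .. 2);

begin
   C := A * B;
   for I in C'Range (1) loop
      for J in C'Range (2) loop
         Ada.Text_IO.Put (Float'Image (C (I, J)));
      end loop;
      Ada.Text_IO.New_Line;
   end loop;
end Matmul_Sketch;
```

Note that in this form the function fills a local Result and then
returns it, so the caller's assignment may involve at least one
further copy of the whole matrix.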

I then "inlined" this function: rather than using "*", I put the code
for "*" directly into my 10_000 times loop, of course renaming Left
and Right to the names of my matrices, and assigning directly to the
matrix which is to hold the answer.  In this way I eliminated the
function calling overhead.  Using this method, multiplying two 15 by
15 matrices 10_000 times takes about 44 seconds.
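
A sketch of the hand-inlined version, roughly as described above,
might look as follows.  Again this is an assumed reconstruction:
the names Inline_Sketch, N, Rep and Start, the all-ones input
matrices, and the use of Ada.Calendar for timing are illustrative
choices, not taken from the real test program.

```ada
with Ada.Calendar; use Ada.Calendar;
with Ada.Text_IO;

procedure Inline_Sketch is

   N : constant := 15;

   type Matrix is array (Positive range <>, Positive range <>) of Float;

   --  Hypothetical input data: every entry set to 1.0.
   A, B  : constant Matrix (1 .. N, 1 .. N) := (others => (others => 1.0));
   C     : Matrix (1 .. N, 1 .. N);
   Sum   : Float;
   Start : Time;

begin
   Start := Clock;
   for Rep in 1 .. 10_000 loop
      --  Body of "*" written out in place: no call, no returned
      --  temporary, each entry is assigned straight into C.
      for I in 1 .. N loop
         for J in 1 .. N loop
            Sum := 0.0;
            for K in 1 .. N loop
               Sum := Sum + A (I, K) * B (K, J);
            end loop;
            C (I, J) := Sum;
         end loop;
      end loop;
   end loop;
   Ada.Text_IO.Put_Line ("Elapsed (s):" & Duration'Image (Clock - Start));
   --  Print one entry so the computed result is actually used.
   Ada.Text_IO.Put_Line ("C (1, 1) =" & Float'Image (C (1, 1)));
end Inline_Sketch;
```

Since the inputs never change inside the loop, an optimiser is in
principle free to simplify a benchmark of this shape, so timings
from such a sketch should be read with some care.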

All this was done with optimisation (-O3) and -gnatp (i.e. no range
checking etc).

In summary: 55 seconds with function calling overhead.
            44 seconds without function calling overhead.

Now, a 15 by 15 matrix means 225 entries.  225 entries at,
say, 8 bytes an entry makes a bit less than 2k.  So, depending on
whether GNAT takes function parameters by reference or by copy,
this makes anything between 2k and, say, 10k bytes to be copied
on each function call.

Does this explain the time difference?  It shouldn't!  The amount
of time spent copying memory should be completely overwhelmed by
the amount of time taken to do the floating point operations!
That is, for each of the 225 entries there are 15 floating point
multiplications to be performed, or 3375 multiplications per
product.  The amount of time taken to copy the 225 entries, even
if you need to copy them several times, should be MUCH smaller
than the amount of time spent in the calculation.  But the timings
above indicate that function calling overhead makes up something
like 20% of the time taken (11 seconds out of 55)!

So, the question is: why so much function calling overhead in GNAT?

Can anyone please enlighten me?

Thanks a lot,

Duncan Sands.

PS: The corresponding C code takes about 6 seconds.  This surprises
me too.