* GNAT function calling overhead @ 1995-04-06 0:00 Duncan Sands 1995-04-06 0:00 ` Norman H. Cohen ` (4 more replies) 0 siblings, 5 replies; 20+ messages in thread From: Duncan Sands @ 1995-04-06 0:00 UTC (permalink / raw) system: DOS 486 (with math coprocessor), gcc 2.6.3, GNAT 2.03 Essentially the question is: why so much function calling overhead in GNAT? I'm writing a set of matrix factorization routines (Schur etc) so of course I need routines for multiplying matrices etc. For me a matrix is type Matrix is array(Positive range <>, Positive range <>) of Float; I overloaded "*" for matrix multiplication: function "*"(Left : in Matrix; Right : in Matrix) return Matrix; Multiplying two 15 by 15 matrices 10_000 times using this function takes about 55 seconds on my machine. The algorithm is the obvious one: loop over rows and columns, add up the appropriate products and assign them. I then "inlined" this function: rather than using "*", I put the code for "*" directly into my 10_000 times loop, of course renaming Left and Right to the names of my matrices, and assigning directly to the matrix which is to hold the answer. In this way I eliminated the function calling overhead. Using this method, multiplying two 15 by 15 matrices 10_000 times takes about 44 seconds. All this was done with optimisation (-O3) and -gnatp (i.e. no range checking etc). In summary: 55 seconds with function calling overhead. 44 seconds without function calling overhead. Now, a 15 by 15 matrix means 225 entries. 225 entries at, say, 8 bytes an entry makes a bit less than 2k. So, depending on whether GNAT takes function parameters by reference or by copy, this makes anything between 2k and, say, 10k bytes to be copied on each function call. Does this explain the time difference? It shouldn't! The amount of time spent copying memory should be completely overwhelmed by the amount of time taken to do the floating point operations! 
That is, for each of the 225 entries there are 15 floating point multiplications to be performed. The amount of time taken to copy the 225 entries, even if you need to copy them several times, should be MUCH smaller than the amount of time spent in the calculation. But the timings above indicate that function calling overhead makes up something like 25% of the time taken! So, the question is: why so much function calling overhead in GNAT? Can anyone please enlighten me? Thanks a lot, Duncan Sands. PS: The corresponding C code takes about 6 seconds. This surprises me too. ^ permalink raw reply [flat|nested] 20+ messages in thread
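[Editor's note: for readers who want to reproduce Duncan's experiment, here is a rough C harness. The 15x15 size, Float element type, "obvious" row-by-column algorithm, and 10_000 repetitions follow the post; the variable names and initial values are the editor's inventions, not code from the thread.]

```c
#include <time.h>

#define N 15

static float left[N][N], right[N][N], result[N][N];

/* One 15x15 multiply, the "obvious" row-by-column algorithm
   described in the post. */
static void multiply(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float sum = 0.0f;
            for (int k = 0; k < N; k++)
                sum += left[i][k] * right[k][j];
            result[i][j] = sum;
        }
}

/* Time `reps` back-to-back multiplications; returns elapsed seconds. */
double benchmark(int reps) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            left[i][j]  = (float)(i + j);  /* arbitrary test values */
            right[i][j] = (float)(i - j);
        }
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        multiply();
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Compiled with `gcc -O3`, `benchmark(10000)` is the C-side analogue of the 10_000-iteration Ada loop, and can be compared against the same loop with the multiply wrapped in a function that returns its result by value.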
* Re: GNAT function calling overhead 1995-04-06 0:00 GNAT function calling overhead Duncan Sands @ 1995-04-06 0:00 ` Norman H. Cohen 1995-04-06 0:00 ` Colin James III ` (3 subsequent siblings) 4 siblings, 0 replies; 20+ messages in thread From: Norman H. Cohen @ 1995-04-06 0:00 UTC (permalink / raw) In article <3m0nv1$pv2@nef.ens.fr>, sands@clipper.ens.fr (Duncan Sands) writes: |> system: DOS 486 (with math coprocessor), gcc 2.6.3, GNAT 2.03 |> |> Essentially the question is: why so much function calling overhead |> in GNAT? |> |> I'm writing a set of matrix factorization routines (Schur etc) so |> of course I need routines for multiplying matrices etc. |> For me a matrix is |> type Matrix is array(Positive range <>, Positive range <>) of Float; |> |> I overloaded "*" for matrix multiplication: |> function "*"(Left : in Matrix; Right : in Matrix) return Matrix; Functions returning results of an unconstrained array type are notoriously expensive. Because the compiler cannot determine the size of the result beforehand, it cannot leave space for it in an ordinary stack frame, so the result must be put somewhere else and then copied to its final destination after the function returns. Unless your compiler is clever enough to realize that your local variable inside the function, say Result, is the only variable used in a return statement, it will probably copy twice: from Result to "somewhere else" and from "somewhere else" to the variable in the calling subprogram to which you assign the function result. Here are some other experiments you could try: 1. Use a procedure procedure Multiply (Left, Right: in Matrix; Product: out Matrix); instead of a function. (This makes the caller responsible for knowing the size of the result and declaring an object with those dimensions to be passed as the third actual parameter.) 2. (Poor SW engineering, but a worthwhile experiment:) Restrict your function to work with a CONSTRAINED array subtype: subtype Matrix_15 is Matrix (1 ..
15, 1 .. 15); function "*" (Left : in Matrix_15; Right : in Matrix_15) return Matrix_15; |> Multiplying two 15 by 15 matrices 10_000 times using this function |> takes about 55 seconds on my machine. The algorithm is the obvious |> one: loop over rows and columns, add up the appropriate products and |> assign them. |> |> I then "inlined" this function: rather than using "*", I put the code |> for "*" directly into my 10_000 times loop, of course renaming Left |> and Right to the names of my matrices, and assigning directly to the |> matrix which is to hold the answer. In this way I eliminated the |> function calling overhead. Using this method, multiplying two 15 by |> 15 matrices 10_000 times takes about 44 seconds. If you wrote for I in Left'Range(1) loop for J in Right'Range(2) loop for K in Left'Range(2) loop Ultimate_Target(I,J) := Ultimate_Target(I,J) + Left(I,K) * Right(K,J); end loop; end loop; end loop; rather than for I in Left'Range(1) loop for J in Right'Range(2) loop for K in Left'Range(2) loop Result(I,J) := Result(I,J) + Left(I,K) * Right(K,J); end loop; end loop; end loop; Ultimate_Target := Result; (as seems sensible) then you eliminated more than the function-call overhead: You also eliminated the overhead that was originally associated with the ":=" in Ultimate_Target := Left * Right; |> All this was done with optimisation (-O3) and -gnatp (i.e. no range |> checking etc). |> |> In summary: 55 seconds with function calling overhead. |> 44 seconds without function calling overhead. |> |> Now, a 15 by 15 matrix means 225 entries. 225 entries at, |> say, 8 bytes an entry makes a bit less than 2k. So, depending on |> whether GNAT takes function parameters by reference or by copy, |> this makes anything between 2k and, say, 10k bytes to be copied |> on each function call. |> |> Does this explain the time difference? It shouldn't!
The amount |> of time spent copying memory should be completely overwhelmed by |> the amount of time taken to do the floating point operations! As modern processors have become faster and faster, loads and stores have become the bottleneck in many computations. I don't know the details of timings on the 486, but on the PowerPC architecture, once you fill your floating-point pipeline with multiply-adds the way that the inner loop above does, you get one righthand side expression completely evaluated on each cycle, PROVIDED that you can get your operands into floating-point registers fast enough. ("Floating-point registers?" ask the Intel users, "What are floating-point registers?") Accounting for leading-edge and trailing-edge effects, the fifteen iterations of the inner loop could take on the order of 20-25 cycles. In contrast, a single load of a value not in cache could cost you on the order of 10 cycles. (Once you pay that penalty, an entire cache line is read in, which, assuming row-major order, buys you something as you traverse a row of Left, but not as you traverse a column of Right.) If you get unlucky, you find that parts of different matrices you are touching at about the same time (say row I of Result and Row I of Left), or different parts of the same matrix that you are touching at about the same time (say Right(K,J) and Right(K+1,J)) are mapped to the same cache line. Then, unless you have a highly associative cache, you encounter an inordinate number of cache misses and slow your computation down dramatically. Memory latencies play such an important role in the running time of numerical algorithms that professional matrix multipliers almost never use the familiar "for i/for j/for k" loops that I wrote above. 
We are more likely to see something like for K in Left'Range(2) loop for I in Left'Range(1) loop for J in Right'Range(2) loop Result(I,J) := Result(I,J) + Left(I,K) * Right(K,J); end loop; end loop; end loop; or, better yet, if it is convenient to keep the matrices that will be used as left operands in transposed form, Result(I,J) := Result(I,J) + Transpose_Of_Left(K,I) * Right(K,J); which (again assuming row-major ordering of array components) does a much better job of reusing the contents of cache lines once they have been loaded from memory and the contents of registers once they have been loaded from cache. It is because these effects are so powerful that Fortran preprocessors performing these kinds of transformations are able to increase certain SPECfp benchmark scores by orders of magnitude. |> That is, for each of the 225 entries there are 15 floating point |> multiplications to be performed. The amount of time taken to |> copy the 225 entries, even if you need to copy them several times, |> should be MUCH smaller than the amount of time spent in the |> calculation. But the timings above indicate that function |> calling overhead makes up something like 25% of the time taken! Well, 20% (11/55), but in any event I'm not surprised. Adding and multiplying floating-point numbers is the easy part. It's copying them that can slow you down. -- Norman H. Cohen ncohen@watson.ibm.com ^ permalink raw reply [flat|nested] 20+ messages in thread
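[Editor's note: Cohen's point about loop order and cache-line reuse can be illustrated in C (his examples are Ada; this sketch and its names are the editor's). Both orderings compute the same product, but in the i-k-j version the inner loop walks rows of both the right operand and the result, so consecutive iterations reuse cache lines that were just loaded, whereas the i-j-k version strides down a column of the right operand.]

```c
#include <string.h>

#define N 15

/* Naive i-j-k order: the inner loop walks a COLUMN of b, so
   (row-major) nearly every access lands on a different cache line. */
void mat_mul_ijk(float a[N][N], float b[N][N], float c[N][N]) {
    memset(c, 0, sizeof(float) * N * N);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* i-k-j order: the inner loop walks ROWS of b and c, so each loaded
   cache line serves several consecutive iterations. */
void mat_mul_ikj(float a[N][N], float b[N][N], float c[N][N]) {
    memset(c, 0, sizeof(float) * N * N);
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            float aik = a[i][k];   /* hoist the invariant operand */
            for (int j = 0; j < N; j++)
                c[i][j] += aik * b[k][j];
        }
    }
```

For 15x15 matrices the whole working set may already fit in cache, so the effect is most visible at larger sizes; the transformation is the same one Cohen describes Fortran preprocessors applying automatically.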
* Re: GNAT function calling overhead 1995-04-06 0:00 GNAT function calling overhead Duncan Sands 1995-04-06 0:00 ` Norman H. Cohen @ 1995-04-06 0:00 ` Colin James III 1995-04-06 0:00 ` Robb Nebbe ` (4 more replies) 1995-04-07 0:00 ` Robert Dewar ` (2 subsequent siblings) 4 siblings, 5 replies; 20+ messages in thread From: Colin James III @ 1995-04-06 0:00 UTC (permalink / raw) In article <3m0nv1$pv2@nef.ens.fr>, Duncan Sands <sands@clipper.ens.fr> wrote: >system: DOS 486 (with math coprocessor), gcc 2.6.3, GNAT 2.03 > >Essentially the question is: why so much function calling overhead >in GNAT? > >I'm writing a set of matrix factorization routines (Schur etc) so >of course I need routines for multiplying matrices etc. >For me a matrix is > type Matrix is array(Positive range <>, Positive range <>) of Float; > >I overloaded "*" for matrix multiplication: > function "*"(Left : in Matrix; Right : in Matrix) return Matrix; > >Multiplying two 15 by 15 matrices 10_000 times using this function >takes about 55 seconds on my machine. The algorithm is the obvious >one: loop over rows and columns, add up the appropriate products and >assign them. > >I then "inlined" this function: rather than using "*", I put the code >for "*" directly into my 10_000 times loop, of course renaming Left >and Right to the names of my matrices, and assigning directly to the >matrix which is to hold the answer. In this way I eliminated the >function calling overhead. Using this method, multiplying two 15 by >15 matrices 10_000 times takes about 44 seconds. > >All this was done with optimisation (-O3) and -gnatp (i.e. no range >checking etc). > >In summary: 55 seconds with function calling overhead. > 44 seconds without function calling overhead. > >Now, a 15 by 15 matrix means 225 entries. 225 entries at, >say, 8 bytes an entry makes a bit less than 2k. 
So, depending on >whether GNAT takes function parameters by reference or by copy, >this makes anything between 2k and, say, 10k bytes to be copied >on each function call. > >Does this explain the time difference? It shouldn't! The amount >of time spent copying memory should be completely overwhelmed by >the amount of time taken to do the floating point operations! >That is, for each of the 225 entries there are 15 floating point >multiplications to be performed. The amount of time taken to >copy the 225 entries, even if you need to copy them several times, >should be MUCH smaller than the amount of time spent in the >calculation. But the timings above indicate that function >calling overhead makes up something like 25% of the time taken! > >So, the question is: why so much function calling overhead in GNAT? > >Can anyone please enlighten me? > >Thanks a lot, > >Duncan Sands. > >PS: The corresponding C code takes about 6 seconds. This surprises >me too. At the most abstract level, it's because GNAT is a failed government project which was never finished and was mismanaged from the start by a bunch of flaky educators posing as "capable professionals". At the most detailed level, it's because GNAT emits poorly optimized, and hence very evil, C code. And what makes anyone think that ACT will change anything with regard to GNAT support, documentation or enhancements. The ACT principals have already demonstrated that they failed with GNAT, by even starting ACT. In other words, if GNAT were such a smashing success and quality product, then there would be no need for ACT. Good grief, what moral and intellectual dishonesty ! ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-06 0:00 ` Colin James III @ 1995-04-06 0:00 ` Robb Nebbe 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Duncan Sands 1995-04-06 0:00 ` Samuel Tardieu ` (3 subsequent siblings) 4 siblings, 2 replies; 20+ messages in thread From: Robb Nebbe @ 1995-04-06 0:00 UTC (permalink / raw) In article <3m0nv1$pv2@nef.ens.fr>, Duncan Sands <sands@clipper.ens.fr> wrote: >PS: The corresponding C code takes about 6 seconds. This surprises >me too. The main reason is most likely that the Ada code is not at all equivalent to the C code. If you declare type Matrix is array( 0 .. 14, 0 .. 14 ) of Float; and write the loops in a way that allows the compiler to optimize out the bounds checks (not sure if GNAT does this) then you should get the same result as with C. Robb Nebbe ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-06 0:00 ` Robb Nebbe @ 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Duncan Sands 1 sibling, 0 replies; 20+ messages in thread From: Robert Dewar @ 1995-04-07 0:00 UTC (permalink / raw) Robb comments that the bounds checks can make a difference, yes indeed! and GNAT is not yet doing much on optimizing bounds checks. But if you look at the post carefully, you will see that the comparison was with checks turned off, at least that is the way I read it, in which case more subtle things are at work! ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-06 0:00 ` Robb Nebbe 1995-04-07 0:00 ` Robert Dewar @ 1995-04-07 0:00 ` Duncan Sands 1 sibling, 0 replies; 20+ messages in thread From: Duncan Sands @ 1995-04-07 0:00 UTC (permalink / raw) In article <1995Apr6.163740@di.epfl.ch>, Robb.Nebbe@di.epfl.ch (Robb Nebbe) writes: |> In article <3m0nv1$pv2@nef.ens.fr>, Duncan Sands <sands@clipper.ens.fr> wrote: |> >PS: The corresponding C code takes about 6 seconds. This surprises |> >me too. |> |> |> The main reason is most likely that the Ada code is not at all equivalent |> to the C code. |> |> If you declare |> |> type Matrix is array( 0 .. 14, 0 .. 14 ) of Float; |> |> and write the loops in a way that allow the compiler to optimize out |> the bounds checks (not sure if GNAT does this) then you should get |> the same result as with C; Thanks for your comments. Actually I compiled with all checking suppressed (range checking and others) for exactly this reason. Therefore range checking is not the culprit. I'm not too sure what the culprit could possibly be. In any case, the comparison I made with C was quick and nasty so should be taken with a pinch of salt. Duncan Sands. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-06 0:00 ` Colin James III 1995-04-06 0:00 ` Robb Nebbe @ 1995-04-06 0:00 ` Samuel Tardieu 1995-04-07 0:00 ` Tom Griest ` (2 subsequent siblings) 4 siblings, 0 replies; 20+ messages in thread From: Samuel Tardieu @ 1995-04-06 0:00 UTC (permalink / raw) From: cjames@stout.entertain.com (Colin James III) Subject: Re: GNAT function calling overhead Newsgroups: comp.lang.ada Date: 6 Apr 1995 07:21:30 -0600 Organization: A poorly-installed InterNetNews site ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Colin, go and configure your site first, then you can write your crap. Sam -- "La cervelle des petits enfants, ca doit avoir comme un petit gout de noisette" Charles Baudelaire ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-06 0:00 ` Colin James III 1995-04-06 0:00 ` Robb Nebbe 1995-04-06 0:00 ` Samuel Tardieu @ 1995-04-07 0:00 ` Tom Griest 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Philip Brashear 4 siblings, 1 reply; 20+ messages in thread From: Tom Griest @ 1995-04-07 0:00 UTC (permalink / raw) In article <3m0nv1$pv2@nef.ens.fr>, Duncan Sands <sands@clipper.ens.fr> wrote: [snipped stuff about 10_000 matrix-multiplies in an application] >>In summary: 55 seconds with function calling overhead. >> 44 seconds without function calling overhead. >> >>Now, a 15 by 15 matrix means 225 entries. 225 entries at, >>say, 8 bytes an entry makes a bit less than 2k. So, depending on >>whether GNAT takes function parameters by reference or by copy, >>this makes anything between 2k and, say, 10k bytes to be copied >>on each function call. >> >>Does this explain the time difference? It shouldn't! The amount >>of time spent copying memory should be completely overwhelmed by >>the amount of time taken to do the floating point operations! First, I doubt very much that the matrix is passed by copy. Basically, any composite object larger than 8 bytes will be passed by reference. Second, what are your assumptions about the time to perform a floating point multiply on a 486DX? The 486 ref manual indicates that fpadds are typically 10 clocks and an fpmult is around 16 clocks. Since the formal parameters for your function are unconstrained types, there is probably a dynamic allocation/initialization/deallocation of the dope vectors for each of the parameters. This might account for some of the overhead. To really give you an answer, you should either get an assembly listing (-S flag) or provide us the source of both versions. It is very hard to give a detailed answer (as opposed to the sort Colin likes to supply :-)) without this information. -Tom ^ permalink raw reply [flat|nested] 20+ messages in thread
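[Editor's note: Tom's clock counts give a quick back-of-envelope check on how much of the measured time raw floating point could account for. This helper is the editor's, and the 33 MHz clock rate below is an assumption about Duncan's 486, not stated in the thread.]

```c
/* Estimate total FPU clocks for `reps` n-by-n matrix multiplies,
   using the 486 timings Tom Griest quotes: ~16 clocks per fmul,
   ~10 per fadd. Each of the n*n result entries needs n multiplies
   and n adds. */
long long fp_clocks(int n, int reps) {
    const int fmul = 16, fadd = 10;
    return (long long)n * n * n * (fmul + fadd) * reps;
}
```

`fp_clocks(15, 10000)` gives 877,500,000 clocks, i.e. roughly 27 seconds on a 33 MHz 486 if the FPU were the only cost. That is a sizable fraction of the 44 seconds Duncan measured, which would leave memory traffic and loop overhead, rather than the arithmetic alone, to explain the rest.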
* Re: GNAT function calling overhead 1995-04-07 0:00 ` Tom Griest @ 1995-04-07 0:00 ` Robert Dewar 0 siblings, 0 replies; 20+ messages in thread From: Robert Dewar @ 1995-04-07 0:00 UTC (permalink / raw) Tom Griest said: "Since the formal parameters for your function are unconstrained types, there is probably a dynamic allocation/initialization/deallocation of the dope vectors for each of the parameters. This might account for some of the overhead." Nope, no dynamic allocation is ever involved for bounds templates for arrays in this situation. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-06 0:00 ` Colin James III ` (2 preceding siblings ...) 1995-04-07 0:00 ` Tom Griest @ 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Philip Brashear 4 siblings, 0 replies; 20+ messages in thread From: Robert Dewar @ 1995-04-07 0:00 UTC (permalink / raw) >At the most abstract level, it's because GNAT is a failed government >project which was never finished and was mismanaged from the start by a >bunch of flaky educators posing as "capable professionals". Maybe things are different in other government projects, and they all get finished before the expected termination date ?? Anyway, to get things absolutely clear on this. The GNAT project is indeed not finished. The project terminates on June 30th, 1995, and by that time, we will indeed be finished, in the sense of having completed the full implementation of Ada 95, including all the annexes. It's right to be suspicious of anything coming out of academic environments. I am myself one of the most sceptical people when it comes to software coming out of universities. So I understand this concern. The best advice is to pay no attention to what I or CJIII say on this, but instead take a close look at GNAT itself! >At the most detailed level, it's because GNAT emits poorly optimized, and >hence very evil, C code. This can't be based on looking at the alleged "very evil" C code. How do I know this? Because in no sense does GNAT emit C code AT ALL. It is a true compiler, not a translator to C. Both the C and GNAT front ends for GCC emit a common intermediate language (RTL) that is optimized by the backend of GCC. So this remark is nothing but fantasy. >And what makes anyone think that ACT will change anything with regard to >GNAT support, documentation or enhancements. The ACT principals have >already demonstrated that they failed with GNAT, by even starting ACT. >In other words, if GNAT were such a smashing success and quality product, >then there would be no need for ACT.
Here I think that Colin James misunderstands what ACT is about. The idea that quality compilers need no support might make some sense in an ideal world, but in practice I know of no major project that would use a compiler for *any* language without having guaranteed support. After all we expect warranties on any products we buy, no matter how excellent. Actually if GNAT is such a dismal failure, *then* there is definitely no need for commercial support. No one is going to use a junk compiler, even if support is available (you do not buy products that are rated as terrible by Consumer Reports just because they have guarantees!) If people sign up for support for GNAT, then it is because they think it meets their needs. By the way, this is a good time to reemphasize that GNAT will continue to be freely available, and continue to be maintained after the official government contract is completed. All improvements and maintenance fixes will continue to be available free via anonymous FTP, on CD-ROM's etc. This is one of the advantages of the free software mode of operation. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-06 0:00 ` Colin James III ` (3 preceding siblings ...) 1995-04-07 0:00 ` Robert Dewar @ 1995-04-07 0:00 ` Philip Brashear 4 siblings, 0 replies; 20+ messages in thread From: Philip Brashear @ 1995-04-07 0:00 UTC (permalink / raw) In article <3m0psq$fl2@stout.entertain.com>, Colin James III <cjames@stout.entertain.com> wrote: > >At the most abstract level, it's because GNAT is a failed government >project which was never finished and was mismanaged from the start by a >bunch of flaky educators posing as "capable professionals". > >At the most detailed level, it's because GNAT emits poorly optimized, and >hence very evil, C code. > >And what makes anyone think that ACT will change anything with regard to >GNAT support, documentation or enhancements. The ACT principals have >already demonstrated that they failed with GNAT, by even starting ACT. >In other words, if GNAT were such a smashing success and quality product, >then there would be no need for ACT. > >Good grief, what moral and intellectual dishonesty ! I KNOW that one shouldn't waste time responding to either Mr. James or Mr. Aharonian, but this is ridiculous and near the point of libel. First, GNAT is not strictly a government project; other organizations have partially funded it (yes?). Second, it doesn't claim to be finished. Third, I don't believe that it emits C code at all. Fourth, ACT was founded to provide services related to GNAT, not to "finish" it. Colin, for Heaven's sake, learn the meaning of "homework" and "self-control"!!! Phil Brashear ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-06 0:00 GNAT function calling overhead Duncan Sands 1995-04-06 0:00 ` Norman H. Cohen 1995-04-06 0:00 ` Colin James III @ 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Theodore Dennison 1995-04-07 0:00 ` Kenneth Almquist 4 siblings, 0 replies; 20+ messages in thread From: Robert Dewar @ 1995-04-07 0:00 UTC (permalink / raw) Two issues here: first, in your example, you return unconstrained arrays. This always involves a fair amount of overhead. Some compilers will use the heap for this (GNAT used to, and I think Alsys still does in some of their compilers), and do two copies. Some other compilers will do two copies, using a secondary stack (that's what GNAT does now). Some compilers will use specialized calling sequences, and manage to do only one copy in some cases, but still two copies in many cases. Anyway, there will be at least one extra copy, so that probably accounts for the overhead of the call that you see. If you are concerned with maximum efficiency, try to avoid returning unconstrained arrays (note that this facility does not exist at all in Fortran, C or C++). Second, the comparisons between GNAT and C are odd. Normally when you write equivalent code in Ada and C and compile both with GCC you will get identical object code. In almost all cases that we have examined, it turns out that such discrepancies are caused by using high level features in Ada that have no analog in C, thus rendering it an apples-vs-oranges comparison. Anyway, I can't comment further without details. Send me the sources at dewar@cs.nyu.edu, and I will analyze what is going on, and post a followup when I figure it out. ^ permalink raw reply [flat|nested] 20+ messages in thread
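[Editor's note: the "extra copy" Dewar describes for unconstrained-array function results has a loose C analogue: returning a large struct by value versus filling a caller-supplied destination, which is the shape of Cohen's suggested procedure Multiply. This sketch and its names are the editor's; a real compiler may elide the struct-return copy, just as an Ada compiler sometimes avoids one of the two copies.]

```c
#include <string.h>

#define N 15

typedef struct { float e[N][N]; } Matrix;

/* Function-return style: the ~900-byte result is built locally and
   returned by value -- analogous to function "*" returning a Matrix,
   where the result may be copied to its final destination. */
Matrix multiply_ret(const Matrix *a, const Matrix *b) {
    Matrix r;
    memset(&r, 0, sizeof r);
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                r.e[i][j] += a->e[i][k] * b->e[k][j];
    return r;   /* potential extra copy happens here */
}

/* Out-parameter style: the caller supplies the destination, so the
   result is written in place with no return copy -- analogous to
   procedure Multiply (Left, Right : in Matrix; Product : out Matrix). */
void multiply_out(const Matrix *a, const Matrix *b, Matrix *product) {
    memset(product, 0, sizeof *product);
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                product->e[i][j] += a->e[i][k] * b->e[k][j];
}
```

The trade-off is the one Cohen noted: the out-parameter form avoids the copy but makes the caller responsible for declaring an object of the right size.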
* Re: GNAT function calling overhead 1995-04-06 0:00 GNAT function calling overhead Duncan Sands ` (2 preceding siblings ...) 1995-04-07 0:00 ` Robert Dewar @ 1995-04-07 0:00 ` Theodore Dennison 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Kenneth Almquist 4 siblings, 1 reply; 20+ messages in thread From: Theodore Dennison @ 1995-04-07 0:00 UTC (permalink / raw) sands@clipper.ens.fr (Duncan Sands) wrote: >Essentially the question is: why so much function calling overhead >in GNAT? > >In summary: 55 seconds with function calling overhead. > 44 seconds without function calling overhead. >PS: The corresponding C code takes about 6 seconds. This surprises >me too. Did you try compiling gnat with the optimizations turned on? T.E.D. (structured programming bigot) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-07 0:00 ` Theodore Dennison @ 1995-04-07 0:00 ` Robert Dewar 0 siblings, 0 replies; 20+ messages in thread From: Robert Dewar @ 1995-04-07 0:00 UTC (permalink / raw) T.E.D. asks a good question, did you turn optimizations on? The Unix style in compilers is to default to no optimization. The code generated by GCC with no optimization is horrible! It is very important that any performance measurements are made with optimization turned on (-O3), otherwise they are completely meaningless. We have wondered whether on the PC ports, it would be better to have optimization on be the default, because this is more common with PC compilers, and the extra time for compiling in -O3 mode on the PC is very small (unlike some of the RISC machines). I would be interested in people's input on this issue (optimization default). ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-06 0:00 GNAT function calling overhead Duncan Sands ` (3 preceding siblings ...) 1995-04-07 0:00 ` Theodore Dennison @ 1995-04-07 0:00 ` Kenneth Almquist 1995-04-07 0:00 ` Colin James III ` (2 more replies) 4 siblings, 3 replies; 20+ messages in thread From: Kenneth Almquist @ 1995-04-07 0:00 UTC (permalink / raw) (Duncan Sands wants to know why matrix multiplication is slow using: type Matrix is array(Positive range <>, Positive range <>) of Float; function "*"(Left : in Matrix; Right : in Matrix) return Matrix; He observes that he gets about a 20% speedup if he manually inlines the matrix multiplication and wants to know why the calling overhead is so high.) This has little to do with function call overhead per se. The problem is that GNAT produces inefficient code for subscripting operations on Matrix variables when the bounds of the matrix are not known at compile time. GNAT does somewhat better when the matrix is not passed as an argument; hence the performance improvement from inlining. I hope that GNAT funding will cover a bunch of performance tuning, but unimplemented features and bug fixing presumably have higher priority right now. Of course one of the advantages that GNAT gains from using the GCC back end is that even if nobody on the GNAT project gets around to looking at this particular performance problem, somebody from another project, such as GNU Fortran, might fix it. Kenneth Almquist ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-07 0:00 ` Kenneth Almquist @ 1995-04-07 0:00 ` Colin James III 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Larry Kilgallen 2 siblings, 1 reply; 20+ messages in thread From: Colin James III @ 1995-04-07 0:00 UTC (permalink / raw) In article <D6nA9u.Hq7@nntpa.cb.att.com>, Kenneth Almquist <ka@socrates.hr.att.com> wrote: > >I hope that GNAT funding will cover a bunch of performance tunin ... > Kenneth Almquist In about April, 1994 (one year ago) at a FRAWG meeting, I believe it was General Little, before he retired, answered my question of whether GNAT would be funded said, "No, no more money for GNAT". If ACT goes public, maybe you could buy stock in that "hope". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-07 0:00 ` Colin James III @ 1995-04-07 0:00 ` Robert Dewar 0 siblings, 0 replies; 20+ messages in thread From: Robert Dewar @ 1995-04-07 0:00 UTC (permalink / raw) Just to cut through some of the smoke here! There will be no more funding for the GNAT project per se from the government after the contract termination at the end of June. The government may wish to establish support contracts with SGI, Labtek, ACT or other organizations supporting GNAT, but that's a different matter entirely. GNAT stands on its own feet after June 30th, and that seems quite appropriate to me! ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-07 0:00 ` Kenneth Almquist 1995-04-07 0:00 ` Colin James III @ 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Larry Kilgallen 2 siblings, 0 replies; 20+ messages in thread From: Robert Dewar @ 1995-04-07 0:00 UTC (permalink / raw) Kenneth says: " This has little to do with function call overhead per se. The problem is that GNAT produces inefficient code for subscripting operations on Matrix variables when the bounds of the matrix are not known at compile time. GNAT does somewhat better when the matrix is not passed as an argument; hence the performance improvement from inlining." I don't see this: in the inner loop, with checks turned off, the lower bound calculation should be moved out of the loop. I can't yet duplicate the reported results. I asked for the code but did not get it yet. Certainly for the straightforward way of computing matrix multiplication, for example, there is no extra overhead in the inner loop in the GNAT code, at least on the i386 where I am looking at the assembly code. In my experience, it is futile to guess what might be behind such differences without the actual code at hand; there can be MANY variables. In future, if people want to discuss performance differences between Ada and C on particular code, it would be useful to post the allegedly comparable source code. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-07 0:00 ` Kenneth Almquist 1995-04-07 0:00 ` Colin James III 1995-04-07 0:00 ` Robert Dewar @ 1995-04-07 0:00 ` Larry Kilgallen 1995-04-07 0:00 ` Robert Dewar 2 siblings, 1 reply; 20+ messages in thread From: Larry Kilgallen @ 1995-04-07 0:00 UTC (permalink / raw) In article <D6nA9u.Hq7@nntpa.cb.att.com>, ka@socrates.hr.att.com (Kenneth Almquist) writes: > I hope that GNAT funding will cover a bunch of performance tuning, but > unimplemented features and bug fixing presumably have higher priority Actually, I would hope GNAT funds would be devoted toward correctness on the largest possible number of platforms. Then let commercial vendors sell high-performance compilers to those who need high performance. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: GNAT function calling overhead 1995-04-07 0:00 ` Larry Kilgallen @ 1995-04-07 0:00 ` Robert Dewar 0 siblings, 0 replies; 20+ messages in thread From: Robert Dewar @ 1995-04-07 0:00 UTC (permalink / raw) Larry says: "Actually, I would hope GNAT funds would be devoted toward correctness on the largest possible number of platforms. Then let commercial vendors sell high-performance compilers to those who need high performance." I am not quite sure what "GNAT funds" means here. If it means the money we have left for the remaining 84 days of the contract, then this will be devoted to finishing off the implementation of Ada 95, and fixing bugs. If you mean the funds that SGI, Labtek, ACT etc generate for maintenance of GNAT, those will be directed in whatever manner corresponds to customer needs, and high performance will definitely be one of these needs. At that point extension of GNAT to new platforms will happen only if volunteers do ports, or if people want ports to appear and can pay for them. But in any case, it has always been our intention to generate a high-performance compiler that will compete on its own terms. This will help push the quality barrier for all Ada 95 compilers, which can only help users of the language, no matter what compiler they are using. Remember that the ground on which we are building GNAT, namely GCC, is itself a high performance system. On many machines, GCC is the fastest C compiler available. On some systems, such as Nextstep, it is the ONLY C compiler available. Of course there are lots more optimizations that could be done to improve GNAT, but then that's a statement that can be made about most Ada compilers! Note that a relatively small amount of the NYU resources (which are after all fairly limited), has been spent on generating new ports. Yet there are lots of ports of GNAT. These have come from volunteers around the world. I am sure that this will continue to occur! ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~1995-04-07 0:00 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 1995-04-06 0:00 GNAT function calling overhead Duncan Sands 1995-04-06 0:00 ` Norman H. Cohen 1995-04-06 0:00 ` Colin James III 1995-04-06 0:00 ` Robb Nebbe 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Duncan Sands 1995-04-06 0:00 ` Samuel Tardieu 1995-04-07 0:00 ` Tom Griest 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Philip Brashear 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Theodore Dennison 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Kenneth Almquist 1995-04-07 0:00 ` Colin James III 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Robert Dewar 1995-04-07 0:00 ` Larry Kilgallen 1995-04-07 0:00 ` Robert Dewar