comp.lang.ada
* Re: Ada-95 for numerics?
  1996-04-01  0:00 Ada-95 for numerics? Joerg Rodemann
@ 1996-04-01  0:00 ` Robert Dewar
  1996-04-02  0:00   ` michael
  1996-04-01  0:00 ` Ted Dennison
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Robert Dewar @ 1996-04-01  0:00 UTC (permalink / raw)


"  c.) Is this a GNAT specific result or do the commercial compilers show the
      same behaviour?
  d.) Is there a way to speed up the program? Perhaps without suppressing all
      checks? (It seems somewhat brute to take away a lot of those things
      that are real advantages of Ada...)"

First, comparing GNAT with checks to C without checks is apples and
oranges, so undoubtedly a big part of the factor of 6 comes from
these checks. You can definitely minimize the checks by proper use
of subtypes etc., but right now GNAT does not do much to minimize
checks; there are lots of ways in which it could do a lot better.

Second, are you using unconstrained arrays? This again is a big
difference between C and the Ada code. 

You will find that if you turn off checks and use constrained arrays,
that you come much closer in speed. 

(GNAT does not handle unconstrained arrays very well on some
architectures, for a very simple reason that we understand well,
and intend to fix at some point).
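
To make the contrast concrete, here is a minimal sketch (illustrative
names only, not code from this thread) of the two declaration styles:

```ada
procedure Styles is
   N : constant := 100;

   --  Constrained: both bounds are compile-time known, so the compiler
   --  can prove loop indices in range and use simple addressing.
   type Fixed_Matrix is array (1 .. N, 1 .. N) of Long_Float;

   --  Unconstrained: the bounds travel with each object at run time,
   --  which is the case that GNAT currently handles less well.
   type Any_Matrix is
      array (Integer range <>, Integer range <>) of Long_Float;

   A : Fixed_Matrix;
begin
   for I in A'Range (1) loop
      for J in A'Range (2) loop
         A (I, J) := 0.0;  --  indices provably in range: no checks needed
      end loop;
   end loop;
end Styles;
```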

Our priorities in GNAT development are first to get the functionality
complete and reliable, and second to optimize and improve the
quality of the code. There are a LOT of optimization opportunities,
which we will visit as GNAT development continues.

Actually I must say that numerical codes of this kind are not our
first priority. That's because relatively few GNAT users have
critical performance requirements in this area. 

By comparison, exception handling efficiency is a lot more important,
and 3.04 will feature VERY much improved performance in this area,
especially on Solaris and SunOS.

P.S. You might post the exact code you are using; that would enable
answering your questions more precisely. The ratio of speed between
different languages can often be very dependent on coding details (in
any language).





^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Ada-95 for numerics?
  1996-04-01  0:00 ` Ted Dennison
@ 1996-04-01  0:00   ` Robert Dewar
  0 siblings, 0 replies; 12+ messages in thread
From: Robert Dewar @ 1996-04-01  0:00 UTC (permalink / raw)


"BTW: The other compilers you mentioned are all more "mature" than
GNAT. This means they have had way more time to work on optimizations."

Actually GNAT is rather strange in this respect. The gcc backend is
*quite* "mature", and often generates code that compares with the
best code generated by any compiler for any language on a given
machine.

If you write Ada that corresponds to the stuff gcc knows how to do well,
then you can encounter performance levels that look VERY good, often
much better than other Ada (or C) compilers.

On the other hand, it is quite true that the GNAT front end is quite young,
and some of the code for Ada specific constructs (e.g. exceptions,
aggregates, finalization, slices, runtime checks) is far from optimal,
and can be expected to improve significantly as time goes by!

We certainly are interested in detailed reports (with code) for examples
where GNAT seems slow. It is important to give the exact code. Often
it is the case that the performance is VERY dependent on small details
of how the code is written.

Let me give a specific example, to see what I mean:

procedure a is
   x : integer;
   subtype r is integer range x .. x + x;
   type a is array (r) of integer;
   va : a;
begin
   for j in r loop
      va(j) := 0;
   end loop;

   for j in a'range loop
      va(j) := 0;
   end loop;
end a;

Now these two loops should obviously be treated exactly the same. But
in fact the first loop is treated optimally, with no junk checks, and
generates essentially perfect machine code, while the second loop generates
a silly check inside the loop. The fix is actually quite simple (it might
get into 3.04; we only noticed this anomaly recently). Certainly it is
on the list.

Note: I had to use a dynamic range here; if I had written range 1 .. 10
for r, then the code for both loops would have been fine.






* Ada-95 for numerics?
@ 1996-04-01  0:00 Joerg Rodemann
  1996-04-01  0:00 ` Robert Dewar
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Joerg Rodemann @ 1996-04-01  0:00 UTC (permalink / raw)


Hello!

Since I will have to do a lot of matrix diagonalization in the near future
(matrix dimensions will be about 1000x1000) I checked the speed different
languages produce. I used the routine tred2 from the "Numerical Recipes in C"
which brings a real symmetric matrix to a tri-diagonal form. 
I tested the following languages/compilers:

   1.) C:
       Compilers: HP/UX cc, gcc-2.7.2
                  Solaris   gcc-2.7.2
   2.) Fortran-77: 
          the routine was called from an Ada-95 main.
       Compilers: HP-UX f77
   3.) Ada-95:
       Compilers: GNAT 3.01/gcc-2.7.2

In order to minimize other effects I entered NxN symmetric random matrices
and called the routine 10000 times. With gcc/GNAT I used optimization level 4,
for cc I used +Oall. The Ada package containing tred2 included a pragma
Optimize (Time).

These were the results:

   1.) All programs showed for larger N the expected N**3 behaviour.
   2.) Their speeds differed by a seemingly constant factor. (This is not
       so for small N --- there the overhead dominates.)
   3.) The C and f77 versions had quite similar speed. (C needed about 1.x
       times the time the f77 routine did, with x between 1 and 2.)
   4.) The Ada-95 version was about a factor of 6 slower than the other ones.
   5.) For C the results from gcc and the original HP cc were nearly equal.
   6.) Using pragma Suppress (All_Checks) in the tred2 routine still left
       the Ada program a factor of 2 behind the other ones.

My questions now are the following:

  a.) Has anybody out there used/tested Ada for similar problems? What were
      the results?
  b.) What is the reason for the slow down? Especially if suppressing all
      checks still leaves a factor of 2...what is the runtime system doing
      with the extra time?
  c.) Is this a GNAT specific result or do the commercial compilers show the
      same behaviour?
  d.) Is there a way to speed up the program? Perhaps without suppressing all
      checks? (It seems somewhat brute to take away a lot of those things
      that are real advantages of Ada...)

For now I will do the global program structure in Ada and call some
Fortran routines with pragma Import. By the way: I was really surprised how
easy it is to import Fortran routines. IMHO if more languages provided such
an easy-to-use interface, perhaps there would not be any language wars anymore.
Just use the language that is best suited for each single problem and build
the whole thing together... (Just thinking...)

Thanks to all

greetings

George







* Re: Ada-95 for numerics?
  1996-04-01  0:00 Ada-95 for numerics? Joerg Rodemann
  1996-04-01  0:00 ` Robert Dewar
@ 1996-04-01  0:00 ` Ted Dennison
  1996-04-01  0:00   ` Robert Dewar
  1996-04-02  0:00 ` Dale Stanbrough
  1996-04-03  0:00 ` Robert I. Eachus
  3 siblings, 1 reply; 12+ messages in thread
From: Ted Dennison @ 1996-04-01  0:00 UTC (permalink / raw)


Joerg Rodemann wrote:
> 
> I tested the following languages/compilers:
> 
>    3.) Ada-95:
>        Compilers: GNAT 3.01/gcc-2.7.2
...
> These were the results:
> 
>    4.) The Ada-95 version was about a factor of 6 slower than the other ones.

You mean the "GNAT version". Just because your one Ada-95 compiler was 
slow says nothing about the language.

BTW: The other compilers you mentioned are all more "mature" than
GNAT. This means they have had way more time to work on optimizations.

-- 
T.E.D.          
                |  Work - mailto:dennison@escmail.orl.mmc.com  |
                |  Home - mailto:dennison@iag.net              |
                |  URL  - http://www.iag.net/~dennison         |





* Re: Ada-95 for numerics?
  1996-04-01  0:00 Ada-95 for numerics? Joerg Rodemann
  1996-04-01  0:00 ` Robert Dewar
  1996-04-01  0:00 ` Ted Dennison
@ 1996-04-02  0:00 ` Dale Stanbrough
  1996-04-03  0:00 ` Robert I. Eachus
  3 siblings, 0 replies; 12+ messages in thread
From: Dale Stanbrough @ 1996-04-02  0:00 UTC (permalink / raw)


Joerg Rodemann writes:

"My questions now are the following:

 a.) Has anybody out there used/tested Ada for similar problems? What were
     the results?
 b.) What is the reason for the slow down? Especially if suppressing all
     checks still leaves a factor of 2...what is the runtime system doing
     with the extra time?
 c.) Is this a GNAT specific result or do the commercial compilers show the
     same behaviour?
 d.) Is there a way to speed up the program? Perhaps without suppressing all
     checks? (It seems somewhat brute to take away a lot of those things
     that are real advantages of Ada...)"
     

----------------------------------------------------
The following was posted by Bevin Brett, a long time Ada-83'er at
Digital... (Bevin, come back, all is forgiven! :-)

Hope it helps.

Dale
--------------------------------------------------
From: brett@ada9x.enet.dec.com (Bevin R. Brett)
Newsgroups: comp.lang.ada
Subject: Fortran and Ada, a partial answer
Date: 14 JUL 94 15:26:21 EST


I have been asked to comment on the relative performance of algorithms
coded in Ada and in Fortran.

This question has come up repeatedly over the years, and deserves a
complete answer, rather than a simplistic one.

There are many factors which influence the size and execution speed of
the running program, and they all play together to get a full answer.
I shall then discuss an exact Ada v. Fortran comparison that Digital
was involved in.


First, a position statement:  The variation between Ada and Fortran is
less than the variation within the language caused by the exact
implementation details.  A person versed in the Ada issues should do
as well in Ada as a person versed in the Fortran issues will do in
Fortran.  The size and execution speed of the result should be within
a few percent of each other.


(a) Differences due to the compiler

	In the case of the DEC Ada and Fortran compilers, the optimizer and
	code generator are the same.  Nevertheless, the exact inputs into
	the optimizer and code generator may differ slightly when the same
	algorithm is compiled by the Ada and Fortran compilers, and this can
	result in major differences in the generated code.  In these cases the
	compiler front ends can usually be modified to correct the slower one.

	We have not observed any major differences in generated code quality
	between the DEC Ada and DEC Fortran compilers caused by such issues.


(b) Differences due to the language

	It is very important that the same algorithm be written in the two
	languages.  The biggest differences we have observed are

	    i.  Having the wrong dimension varying fastest, since it is
		desirable to have the first dimension changing fastest
		in Fortran, and the last dimension in Ada.  Thus when
		an algorithm is transliterated, the array indexes must be
		reversed.

	    ii. Using compile-time-known bounds for arrays in Fortran, and
		using unconstrained arrays in the Ada code.  Knowing the
		exact values of the dimensions at compile-time results in
		much better code.

	    iii. Not suppressing all the runtime checks in Ada.  The Fortran
		compiler assumes all array bounds are in range, and all
		arithmetic operations do not overflow.  You must use a
		pragma Suppress to tell this to the Ada compiler as well.

	    iv. Using arrays of Ada Booleans to match arrays of Fortran
		Integers, because accessing bytes on a RISC system might
		be much worse than accessing fullwords.
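
A minimal sketch of point (i), with illustrative names (assuming the
row-major layout Ada compilers typically use by default):

```ada
procedure Traverse is
   type Matrix is array (1 .. 100, 1 .. 100) of Long_Float;
   A : Matrix;
begin
   --  In Ada the LAST index should vary fastest in the inner loop,
   --  since elements A(I, 1), A(I, 2), ... are stored contiguously.
   for I in A'Range (1) loop
      for J in A'Range (2) loop
         A (I, J) := 0.0;
      end loop;
   end loop;
   --  A transliterated Fortran loop nest must have its indices swapped,
   --  because Fortran stores the FIRST index contiguously (column-major).
end Traverse;
```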


(c) Differences due to the bindings

	The biggest bindings differences are related to Fortran's built-in
	support for complex types, and for various math routines such as
	SQRT and SIN, compared with Ada code that often uses hand-coded or
	ISO standardised versions of these functions with different
	requirements than are imposed on the Fortran versions.

	DEC Ada has built-in support for complex types, and also has bindings
	directly to the same primitives that Fortran uses for its math
	routines, and so gets the same performance as Fortran does.


(d) Differences due to the author

	The use of good Ada and Fortran style can also affect the generated
	code.  Provided the author writes in good Ada style, and follows the
	above guidelines, the generated code should do as well as Fortran.





* Re: Ada-95 for numerics?
  1996-04-01  0:00 ` Robert Dewar
@ 1996-04-02  0:00   ` michael
  0 siblings, 0 replies; 12+ messages in thread
From: michael @ 1996-04-02  0:00 UTC (permalink / raw)


dewar@cs.nyu.edu (Robert Dewar) wrote:

> Second, are you using unconstrained arrays? This again is a big
> difference between C and the Ada code. 
> 
> You will find that if you turn off checks and use constrained arrays,
> that you come much closer in speed. 

This is certainly true, but in many cases it is just impractical to use
constrained arrays.

> (GNAT does not handle unconstrained arrays very well on some
> architectures, for a very simple reason that we understand well,
> and intend to fix at some point).

Can you already give an estimate when that will be?

> Our priorities in GNAT development are first to get te functionality
> complete and realiable, and second to optimize and improve the
> quality of the code. There are a LOT of optimization opportunities,
> which we wlil visit as GNAT development continues.
> 
> Actually I must say that numerical codes of this kind are not our
> first priority. That's because relatively few GNAT users have
> critical performance requirements in this area. 

Please add me to the list of these few GNAT users, because "numerical
codes of this kind" is what our current project is all about. I did
some tests with a simple matrix multiplication routine using unconstrained
arrays and found that on all architectures the GNAT code is about a
factor of 3 - 4 slower than an equivalent hand-optimized C function. But
I also found that when I write a C function that is as similar as possible
to the original Ada procedure, I get almost exactly the same timing
results as for the Ada version. To me this indicates that the problem
with unconstrained arrays is actually in the gcc backend and not so
much in the Ada frontend. As far as I know the Fortran community
has the same problems with the performance of g77.

Of all the architectures I have tested, the SUN/SPARC architecture
appeared to be the worst. The original performance difference
was a factor of 12. Just recently I found a switch in the gcc manual,
"-msupersparc", which seems to improve the situation quite a bit. By
using this switch the factor came down to the usual 4.0. So, to
SPARC users with a reasonably modern machine I can only recommend
using this switch, provided portability of the binaries to older
machines is no issue. If anybody knows about other switches of this
kind, please let me know. Two more of them and we will be able to
execute an infinite loop in less than a second :-)

For all the tests I did, of course, suppress all Ada checks and use full
optimization!

Michael

-- 
------------------------------------------------------------------------
--Dipl.-Ing. Michael Paus   (Member: Team Ada)
--University of Stuttgart, Inst. of Flight Mechanics and Flight Control
--Forststrasse 86, 70176 Stuttgart, Germany
--Phone: (+49) 711-121-1434  FAX: (+49) 711-634856
--Email: Michael.Paus@ifr.luftfahrt.uni-stuttgart.de (NeXT-Mail welcome)





* Re: Ada-95 for numerics?
@ 1996-04-02  0:00 Jean-Pierre Rosen
  1996-04-03  0:00 ` Robert I. Eachus
  0 siblings, 1 reply; 12+ messages in thread
From: Jean-Pierre Rosen @ 1996-04-02  0:00 UTC (permalink / raw)


At 08:28 02/04/1996 GMT, you wrote:
>[...] So I started coding some of _my_ problems in Ada-95 and check
>for the speed. Yet, I am still a beginner in Ada-95, so maybe the coding style
>is somewhat unlucky...

Something you have to remember is that Ada has nice facilities for dealing
with arrays globally. For example, a typical benchmark trap is to translate:

DO 10 I=1,N
   DO 10 J=1,N
10 A(I,J) = B(I,J)

into:
for I in 1..N loop
   for J in 1..N loop
      A(I,J) := B(I,J);
   end loop;
end loop;

The correct translation is of course (assuming N is the size of the array):
A := B;

It is important because, since the compiler KNOWS that global assignment is
available, it will not spend a huge amount of time trying to unwind loops
the way the FORTRAN compiler does. And your FORTRAN might look faster...

In the same vein, you wrote :
>   function Unit (n : Integer) return Matrix is
>      C : Matrix (1..N, 1..N);
>   begin
>      for i in C'range(1) loop
>         for j in C'first(2)..i loop
>            C(i,j) := 0.0;
>            C(j,i) := 0.0;
>         end loop;
>         C(i,i) := 1.0;
>      end loop;
>      return (C);
>   end Unit;

This should be:
   function Unit (n : Integer) return Matrix is
      C : Matrix (1..N, 1..N) := (Others => (Others => 0.0));
   begin
      for i in C'range(1) loop
         C(i,i) := 1.0;
      end loop;
      return (C);
   end Unit;

If the compiler is not very clever, the generated code will be the same as
with what you wrote. But it is much easier for the compiler to recognize
that a simple "move byte string" instruction is enough to do the trick.

Can't be worse, likely to be better....
+------------------------------------o-------------------------------------+
| P-mail:                            | E-mail: rosen@enst.fr               |
|   ADALOG - 27 avenue de Verdun     |    Tel: +33 1 46 45 51 12           |
|   92170 Vanves - FRANCE            |    Fax: +33 1 46 45 52 49           |
+------------------------------------o-------------------------------------+





* Ada-95 for numerics?
@ 1996-04-02  0:00 Joerg Rodemann
  1996-04-02  0:00 ` Robert Dewar
  0 siblings, 1 reply; 12+ messages in thread
From: Joerg Rodemann @ 1996-04-02  0:00 UTC (permalink / raw)


Hello again!

Thanks for all your postings. First I want to make the following clear:
I am not criticizing the GNAT compiler --- sadly it is the only Ada-95
compiler I am able to use for now. And it is a fine thing to use.

I am now working at a theoretical physics department, and lots of
the software written here is still done in Fortran-77. Many of our problems
need quite a lot of computing time. I guess for some applications, especially
matrices, Fortran may still be the best choice; at least there are already
many routines available (Numerical Recipes, NAG, EISPACK, ...). But there
are other applications that may be done better in another language...perhaps
Ada-95 (complex and dynamic data structures, usage of generics and OO
techniques). So I started coding some of _my_ problems in Ada-95 and
checking the speed. Yet I am still a beginner in Ada-95, so maybe the
coding style is somewhat unlucky...

Some of you wanted to look at my code, so I appended it to this article. 

Thanks and greetings

George


-----------------------------------------------------------------------------
--
--                                matrices.ads
--
--            A generic matrix package (only a few functions shown...)
--
-----------------------------------------------------------------------------
generic
   type Floating is digits <>;
package Matrices is

   type Matrix is array (integer range <>, integer range <>) of Floating;
   pragma Convention (Fortran, Matrix);

   function Unit (n : integer) return matrix;

   procedure Get (a : out Matrix);
   procedure Put (a : Matrix);

end Matrices;
-----------------------------------------------------------------------------
--
--                         eigensystems.ads
--
--      A generic library of routines taken from "Numerical Recipes in C".
-- The code closely follows the C version; only the generic character
-- has been added.
--
-----------------------------------------------------------------------------
with vectors;
with matrices;

generic
   type Floating is digits <>;
   with function sqrt (x : Floating) return Floating;
   with package v is new vectors  (Floating);
   with package m is new matrices (Floating);

   use v, m;
package eigensystems is

   procedure tred2 (a : in out Matrix; d, e : out vector);

end eigensystems;
-----------------------------------------------------------------------------
--
--                         vectors.ads
--
-----------------------------------------------------------------------------
generic
   type Floating is digits <>;
package Vectors is

   type Vector is array (integer range <>) of Floating;
   
   procedure Get (x : out Vector);
   procedure Put (x : Vector);
  
end Vectors;
-----------------------------------------------------------------------------
--
--                             matrices.adb
--
-- A generic matrix package (only a few functions shown here...)
--
-----------------------------------------------------------------------------
with Ada.Text_Io;

use Ada.Text_Io;

package body Matrices is

   function Unit (n : Integer) return Matrix is
      C : Matrix (1..N, 1..N);
   begin
      for i in C'range(1) loop
         for j in C'first(2)..i loop
            C(i,j) := 0.0;
            C(j,i) := 0.0;
         end loop;
         C(i,i) := 1.0;
      end loop;
      return (C);
   end Unit;
            


   package Floating_Io is new Float_io (Floating); use Floating_Io;

   procedure Get (a : out matrix) is
   begin
      for i in a'Range(1) loop
         for j in a'Range(2) loop
            Get (a(i,j));
         end loop;
      end loop;
   end Get;

   procedure Put (a : matrix) is
   begin
      for i in a'Range(1) loop
         for j in a'Range(2) loop
            Put (a(i,j)); New_Line;
         end loop;
         New_Line;
      end loop;
   end Put;

end Matrices;
-----------------------------------------------------------------------------
--
--                             mytest.adb
--
-- Main program: reads in a symmetric matrix and brings it to tridiagonal form
--
-----------------------------------------------------------------------------
with Vectors;
with Matrices;
with Eigensystems;
with Ada.Text_Io;
with Ada.Numerics.Aux;

use Ada.Text_Io;
use Ada.Numerics.Aux;

procedure mytest is

   package MyVectors  is new Vectors  (Double); use MyVectors;
   package MyMatrices is new Matrices (Double); use MyMatrices;
   package MyEigensystems is new 
      Eigensystems (Double, Sqrt, MyVectors, MyMatrices);

   N : integer := 5;

   package Int_Io is new Integer_Io(integer); use Int_Io;

begin
   Get (N);
   declare
      a, b : Matrix (1..N,1..N);
      d, e : Vector (1..N);
   begin
      Get (b);
      for i in 1..100000 loop
         a := b;
         MyEigensystems.Tred2 (a, d, e);
      end loop;
      Put (d);
      Put (e);
   end;
end mytest;
-----------------------------------------------------------------------------
--
--                       eigensystems.adb
--
--           Contains the tred2 routine from "Numerical Recipes in C"
--
-----------------------------------------------------------------------------
package body eigensystems is

   pragma Optimize (Time);
   pragma Suppress (All_Checks);

   procedure tred2 (a : in out Matrix; d, e : out vector) is
      scale : Floating;
      hh    : Floating;
      h     : Floating;
      g     : Floating;
      f     : Floating;
      n     : constant integer := a'Last(1);
      l     : integer range 0..n;
      i, j, k : integer range 1..n;
   begin
      for i in reverse 2..n loop
         l := i-1;
         h := 0.0;
         scale := 0.0;      
         if l > 1 then
            for k in 1..l loop
               scale := scale + abs (a(i,k));
            end loop;

            if scale = 0.0 then 
              e(i) := a(i,l);
            else
               for k in 1..l loop
                  a(i,k) := a(i,k) / scale;
                  h := h + a(i,k) * a(i,k);
               end loop;
               f := a(i,l);
               if f >= 0.0 then
                  g := -sqrt(h);
               else
                  g := sqrt(h);
               end if;
               e(i) := scale * g;
               h := h - f*g;
               a(i,l) := f - g;
               f := 0.0;
               for j in 1..l loop
                  a(j,i) := a(i,j) / h;
                  g := 0.0;
                  for k in 1..j loop
                     g := g + a(j,k) * a(i,k);
                  end loop;
                  for k in j+1..l loop
                     g := g + a(k,j)*a(i,k);
                  end loop;
                  e(j) := g / h;
                  f := f + e(j) * a(i,j);
               end loop;
               hh := f / (h+h);
               for j in 1..l loop
                  f := a(i,j);
                  e(j) := e(j) - hh*f;
                  g := e(j);
                  for k in 1..j loop
                     a(j,k) := a(j,k) - (f*e(k) + g*a(i,k));
                  end loop;
               end loop;
            end if;
         else
            e(i) := a(i,l);   
         end if;
         d(i) := h;
      end loop;

      d(1) := 0.0;
      e(1) := 0.0;
      for i in 1..n loop
         l := i - 1;
         if d(i) /= 0.0 then
            for j in 1..l loop
               g := 0.0;
               for k in 1..l loop
                  g := g + a(i,k)*a(k,j);
               end loop;
               for k in 1..l loop
                  a(k,j) := a(k,j) - g*a(k,i);
               end loop;
            end loop;
         end if;
         d(i) := a(i,i);
         a(i,i) := 1.0;
         for j in 1..l loop
            a(j,i) := 0.0;
            a(i,j) := 0.0;
         end loop; 
      end loop;
   end tred2;

end eigensystems;
-----------------------------------------------------------------------------
--
--                         vectors.adb
--
-- A generic Vector package (this is just a stripped version)
--
-----------------------------------------------------------------------------
with Ada.Text_Io;

use Ada.Text_Io;

package body vectors is
   

   package Floating_Io is new Float_Io (Floating); use Floating_Io;

   procedure Get (x : out Vector) is
   begin
      for i in x'Range loop
         Get (x(i));
      end loop;
   end Get;

   procedure Put (x : Vector) is
   begin
      for i in x'Range loop
         Put (x(i)); New_Line;
      end loop;
   end Put;

end vectors;





* Re: Ada-95 for numerics?
  1996-04-02  0:00 Joerg Rodemann
@ 1996-04-02  0:00 ` Robert Dewar
  0 siblings, 0 replies; 12+ messages in thread
From: Robert Dewar @ 1996-04-02  0:00 UTC (permalink / raw)


"Thanks for all your postings. First I want to make the following clear:
I am not criticizing the GNAT compiler --- sadly it is the only Ada-95
compiler I am able to use by now. And it is a fine thing to use."

No need to apologize; one of the things you should expect from GNAT
is good performance of generated code. Right now GNAT will meet this
expectation for some cases but not others :-)

You did not say what machine you are on (or if you did, I missed it).
Please make this clear; it can make a BIG difference (i.e. do not
assume that GNAT performance measurements on one machine have any
relevance whatsoever to another machine!)






* Re: Ada-95 for numerics?
  1996-04-01  0:00 Ada-95 for numerics? Joerg Rodemann
                   ` (2 preceding siblings ...)
  1996-04-02  0:00 ` Dale Stanbrough
@ 1996-04-03  0:00 ` Robert I. Eachus
  3 siblings, 0 replies; 12+ messages in thread
From: Robert I. Eachus @ 1996-04-03  0:00 UTC (permalink / raw)


In article <4jocek$h5h@rigel.rz.uni-ulm.de> rodemann@mathematik.uni-ulm.de (Joerg Rodemann) writes:

  >  6.) Using pragma Suppress (All_Checks) in the tred2 routine still left
  >  the Ada program a factor of 2 behind the other ones.

  First a caution. Remember to check that the suppression is really
valid on the actual data sets.  If your matrices are, by their nature,
well conditioned, you don't have a problem.  But in the FORTRAN, C, and
Ada-with-suppression cases you will get junk answers with no warning
for some inputs.
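
  One way to keep some protection is to suppress checks selectively
rather than all at once; a sketch (hypothetical names, not from the
posted code):

```ada
procedure Sum_Demo is
   type Vector is array (Positive range <>) of Long_Float;
   V : Vector (1 .. 100) := (others => 1.0);
   S : Long_Float := 0.0;
   --  Drop only the index checks, which the loop below makes redundant;
   --  the remaining checks stay active, unlike with Suppress (All_Checks).
   pragma Suppress (Index_Check);
begin
   for I in V'Range loop
      S := S + V (I);
   end loop;
end Sum_Demo;
```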

  Okay, now to the "fun stuff."

  >   a.) Has anybody out there used/tested Ada for similar problems? What were
  >	 the results?

   That factor of two is probably real, and is often seen.  The issue
is a residue of the exception checking.  The Ada code for matrix
operations often has to create a matrix value as a temporary, then
copy it to the final location.  If you know how, you can design this
out, sometimes at the cost of more storage space.  The problem is that
Ada code is not allowed to ignore aliasing among parameters and
results.  Partial results can only be written to an out variable of a
scalar type if the compiler can verify that for all calls there is no
aliasing to in parameters and no exceptions can be raised.  Most Ada
compilers just ignore the opportunity for optimization in these cases.

  >   b.) What is the reason for the slow down? Especially if suppressing all
  >	 checks still leaves a factor of 2...what is the runtime system doing
  >	 with the extra time?

    Copying values from here to there.

  >  c.) Is this a GNAT specific result or do the commercial compilers show the
  >	 same behaviour?

    I've seen it on a lot of Ada 83 compilers, including for various
Crays.

  > d.) Is there a way to speed up the program? Perhaps without
  >     suppressing all checks? (It seems somewhat brute to take away
  >	a lot of those things that are real advantages of Ada...)

  The solution is to use intrinsics, but I don't know that GNAT
supports them.  In any case they would likely be hardware specific.
(What you want is a machine-level vector inner product operation which
is indeterminate in where the overflow occurs.  With hardware
support, and if implemented right, you can get FORTRAN speed with Ada
overflow checking, even on a Cray.)

  > For now I will do the global program structure with Ada and call
  > some Fortran routines with pragma Import. By the way: I was really
  > surprised how easy it is to import Fortran routines. IMHO if more
  > languages provide such an easy to use interface perhaps there
  > would not be any language wars anymore.  Just use the language
  > that is best suited for each single problem and build the whole
  > thing together... (Just thinking...)

    See above, but be very careful about the shape of your data,
especially if the matrices are large.  The other thing to watch is
the accuracy of your results.  For seriously large matrices I use
fixed point with multiple precision intermediate operations to provide
greater ranges.  It is often very SLOW, but I have this thing about
knowing that I can trust at least the first three digits of the
answer.  (The old saw about testing with single and double precision
arithmetic sometimes works well, other times you just get two
different junk results.)

    Another good idea, besides conditioning the matrices, is to check
the values of all pivots and other divisors.  If they get distant from
1.0, your accuracy goes away fast.  (In one version of simplex code, i
allowed the user to set the check limits on the pivots.  If the pivot
fell out of range, rebuild the basis.  If that doesn't fix it, stop
and print an error message.)
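  That pivot guard might look like this in C (a sketch only -- the
limit value and the function name are invented for illustration; the
original code let the user choose the limit):

```c
#include <math.h>

/* Accept a pivot only while its magnitude stays within
   [1/limit, limit], i.e. "not too distant from 1.0".  If this
   returns 0 the caller rebuilds the basis, and if that does not
   fix it, stops with an error.  Returns 1 if the pivot is safe. */
int pivot_ok(double pivot, double limit)
{
    double mag = fabs(pivot);
    return mag >= 1.0 / limit && mag <= limit;
}
```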


					Robert I. Eachus

with Standard_Disclaimer;
use  Standard_Disclaimer;
function Message (Text: in Clever_Ideas) return Better_Ideas is...
--




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Ada-95 for numerics?
  1996-04-02  0:00 Jean-Pierre Rosen
@ 1996-04-03  0:00 ` Robert I. Eachus
  0 siblings, 0 replies; 12+ messages in thread
From: Robert I. Eachus @ 1996-04-03  0:00 UTC (permalink / raw)


In article <199604021629.SAA00724@email.enst.fr> Jean-Pierre Rosen <rosen@EMAIL.ENST.FR> writes:

  > If the compiler is not very clever, generated code will be the
  > same as with what you wrote. But it is much easier for the
  > compiler to recognize that a simple "move byte string" instruction
  > is enough to do the trick.

   Or on a lot of modern hardware, the compiler can often make use of
hardware or OS features for zeroing memory.  For extremely large
arrays that have to be initialized to zero, it is better to allocate
them on the heap on certain platforms as a heap allocation always
returns zeroed memory.  (The OS can do this for you faster in many
cases because it "knows" that the data is going to be overwritten, so
it never reads it, in some cases disabling the cache to do so.)

   Incidentally this "trick" is more than just a parlor trick.  I once
used hardware calls that blanked a memory bank to run an O(N**2 log N)
algorithm in O(N log N) time.  Clearing a 128K memory bank to zero out
an O(4K) array may seem wasteful, but boy did it help the performance.
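  In C the equivalent of the zeroed-heap observation is calloc: on
many modern systems a large calloc is served from pages the OS has
already zeroed (or zeroes lazily), so the clearing costs almost
nothing up front.  Note this cheapness is platform behaviour, not
something the C standard promises; the standard only guarantees the
memory reads as zero:

```c
#include <stdlib.h>
#include <stddef.h>

/* Allocate an n-element double array initialised to zero.  calloc
   guarantees zero-filled memory; on many platforms large requests
   come straight from fresh zero pages, so nothing is read or
   written until first touch. */
double *zeroed_vector(size_t n)
{
    return calloc(n, sizeof(double));
}
```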

--

					Robert I. Eachus

with Standard_Disclaimer;
use  Standard_Disclaimer;
function Message (Text: in Clever_Ideas) return Better_Ideas is...




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Ada-95 for numerics?
@ 1996-04-05  0:00  Dr J Parker
  0 siblings, 0 replies; 12+ messages in thread
From:  Dr J Parker @ 1996-04-05  0:00 UTC (permalink / raw)



 Joerg Rodemann writes:
 "Since I will have to do a lot of matrix diagonalization in the near future
(matrix dimensions will be about 1000x1000) I checked the speed different
languages produce. I used the routine tred2 from the "Numerical Recipes in C"
which brings a real symmetric matrix to a tri-diagonal form.
I tested the following languages/compilers:

   1.) C:
       Compilers: HP/UX cc, gcc-2.7.2
                  Solaris   gcc-2.7.2
   2.) Fortran-77:
          the routine was called from an Ada-95 main.
       Compilers: HP-UX f77
   3.) Ada-95:
       Compilers: GNAT 3.01/gcc-2.7.2
 a.) Has anybody out there used/tested Ada for similar problems? What were
     the results?"

Sure, I spent about 2 weeks working with gnat 2.02 on an HP 735,
trying to  optimize some scalar and some linear algebra programs.
I'll tell you what I learned, but remember it's limited
to a specific machine, an old gnat, and 2 weeks of tinkering.

Summary of everything I'm going to say: know the difference between
a machine portable compiler like gcc/gnat and a native Fortran compiler;
don't use unconstrained arrays; in multidimensional arrays, vary outer
indices fastest; unroll critical linear algebra loops by hand;
after that use the -funroll-all-loops flag; write critical inner
loops in lots of different ways and try various combinations of
compiler flags; if you are doing linear algebra,
then no matter what language or compiler you are using, consider
linking to BLAS (basic linear algebra subroutines); if you don't
want to use BLAS, write your own BLAS-like package - here is 
where you definitely suppress checks; the
scalar benchmark presented below (an fft) has gcc/gnat running
at 80% to 110%  HP Fortran speed; gcc/gnat peaked at 64 megaflops, 
and the HP's native Fortran at 78 megaflops, on a machine with a
(100 x 100) linpack rating of about 50 megaflops.  (You can stop
reading here :)

Regarding gcc/gnat vs. native compilers:

The machine portable code generator  used by gcc/gnat will
only very rarely give you the top performance that the
native compiler gives you, I mean if you are comparing number
crunching inner loops on a native Fortran compiler. For example
if gnat is giving you 80% HP-Fortran speed on the HP, then you are
doing very well. The native compiler's code generator knows tricks
about the machine that gcc will never know and it has been carefully
tuned for many years to do well on the kinds of simple loops
that show up in standard benchmarks like linpack.  On the other hand
if you can't get gcc/gnat to produce executables that run at the same 
speed as gcc/C and gcc/Fortran (g77) then you are doing something 
wrong.  (Actually I ran some comparisons of g77 and gnat: gnat was 
consistently slightly faster.  From this you cannot conclude that Ada
is faster than Fortran!   Anyway it was an early g77.)

So on an engineering workstation with an  optimized native Fortran,
you can expect the Fortran to consistently produce better average
performance and better peak performance than portable compilers.
Nevertheless there is a down side to this: you are fooling yourself
if you think any Fortran compiler will consistently get a high
percentage of the machine's peak  from most loops you feed it.
First of all, beyond a few simple loops that appear in common
benchmarks, you shouldn't expect too much from the optimizer. Second,
it's very often the case in my experience that you have to spend
a lot of time rewriting loops to find the one that the optimizer
smiles on.  In fact, honest truth, I just spent a week
rewriting Fortran inner loops until I found  the ones that the
optimizer liked - doubled the speed of some of them.   (The program
runs for days on a massively parallel supercomputer.)  Good
Fortran programmers will often automatically write their
programs to use BLAS, so they don't have to reinvent these particular
wheels the hard way.

So that's why many of us are happy using the gcc family - you can
expect to do more work optimizing, but you may have to do that
work in any language, with any compiler. And of course we get to
use our favorite language.

Regarding performance of gcc/gnat:

Here are a few notes on optimizing with GNAT 2.02.

1. Avoid unconstrained arrays in inner loops - they often make life
difficult for the optimizer, especially if they are generic formals.  I greatly
prefer the services and abstraction provided by generics to that provided
by unconstrained arrays.  You don't need them.

2.  Try to write loops to vary the outer indices  of the array fastest.
Remember, Fortran prefers inner indices varied fastest.
Gnat will give you a choice on which indices you should vary fastest,
but you have to know what pragma to use and most compilers don't yet
offer this Ada 95 feature.

In your householder routine, the arrays are symmetric, a (i, j) = a (j, i).
It's easy to rewrite the inner loops so that the outer indices vary
fastest.  I always declare the arrays as one dimensional arrays of
one dimensional arrays. Inner loops then become dot products
between rows of the matrix, which I pass to a dot product  routine
in which the loop has been unrolled by hand.  Gnat gets pretty good
performance this way. But if you write this dot product in a 
naive way, gnat's performance can drop by a factor of 3 to 6 on the HP.

Actually people who want the best performance unroll both the inner
and outer loops in your problem.  In fact there are automatic tools
that try every conceivable combination of unroll parameters to find
the best. I'll never bother with that kind of thing!  That's what
BLAS was invented for. 
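  A C sketch of that hand-unrolled dot product between two rows (the
unroll depth of 4 and the use of four separate accumulators are
arbitrary illustrative choices -- as the post says, the best variant
differs per machine and compiler, and multiple accumulators are what
enable possible parallel execution):

```c
#include <stddef.h>

/* Dot product of two rows, unrolled by hand 4 deep with independent
   accumulators, plus a cleanup loop for the remainder.  Whether this
   beats the naive loop depends entirely on the compiler and machine. */
double row_dot(const double *x, const double *y, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    double sum = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)          /* remainder, fewer than 4 elements */
        sum += x[i] * y[i];
    return sum;
}
```

An "array of rows" layout (a row pointer per matrix row, mirroring
the one-dimensional-arrays-of-one-dimensional-arrays trick above)
then calls row_dot on pairs of rows.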

3. If you are writing a BLAS-like package for gnat, here are the
kinds of things you should consider. This is from a package I wrote:

--  If you are going to do the dot-products in a high level language, then the
--  method that gives best efficiency will usually be different on each machine
--  and compiler. So the loop unrolling here should be taken as a hint at best,
--  but usually works better than no unrolling at all.
--
--  The procedure exports the best of several attempts to optimize
--  dot-products for gcc/GNAT (2.02) on an HP735. Three method are used.
--  The 3 methods are placed in a case statement.  This is convenient, and
--  its also a lot less error prone than alternatives.  (But the code
--  is considerably uglier, and the extra overhead noticable.)  Its important
--  to experiment with different compilation flags.  On gcc/GNAT, the
--  following seems to work best in most cases:
--
--   gcc -c -gnatp -O3 -funroll-all-loops -fomit-frame-pointer File_Name.adb
--
--  Even though some of the loops are unrolled by hand, subsequent application
--  of -funroll-all-loops is very important.  On the other hand, a better
--  compiler than gcc/gnat, one that knows more about the target arch., will
--  also know best how to unroll the loops for you.  In that case you should
--  use Ordinary_Loops:
--
--  Ordinary_Loops does no unrolling by hand. 
--  This should be compiled with gcc's -funroll-all-loops, because loop
--  bounds are known (in general) only at run time.
--
        Sum := 0.0;
        for L in Starting_Index..Ending_Index loop
           Sum := Sum + X(L)*Y(L);
        end loop;

--  In the 2nd method, the loops are also unrolled by hand (6 levels
--  deep) but the gcc/GNAT flag -funroll-all-loops is still required for
--  best performance.  This one is called Standard_Unrolled. Here is
--   part of the inner loop:
--
          for Segments in 2..Unroll_Segments loop
              --L0 := L5 + 1; L1 := L0 + 1; L2 := L1 + 1;
              --L3 := L2 + 1; L4 := L3 + 1; L5 := L4 + 1;
              L0 := L0 + Unroll_Length_Stnd; L1 := L1 + Unroll_Length_Stnd;
              L2 := L2 + Unroll_Length_Stnd; L3 := L3 + Unroll_Length_Stnd;
              L4 := L4 + Unroll_Length_Stnd; L5 := L5 + Unroll_Length_Stnd;
              Sum := Sum + X(L0)*Y(L0) + X(L1)*Y(L1) + X(L2)*Y(L2) + X(L3)*Y(L3)
                         + X(L4)*Y(L4) + X(L5)*Y(L5);
           end loop;
           --  Best choice of increments varies.  Chosen increments of L0, L1,
           -- Enable possible parallel execution.

--  The 3rd method wraps a slightly unrolled dot-product in a loop with
--  static bounds.  This can be compiled with -funroll-loops rather than
--  -funroll-all-loops (which is required for non-static bounds).  This got
--  the best performance from gcc/GNAT on the hppa.
--  This one is called Static_Unrolled.

           for Segments in 2..Unroll_Segments loop
           for N in Integer range 1..Half_Unroll_Length_Static loop
              L0 := L1 + 1; L1 := L0 + 1;
            --L0 := L0 + 2; L1 := L1 + 2;
              Sum := Sum + X(L0)*Y(L0) + X(L1)*Y(L1);
           end loop;
           end loop;
           --  Best choice of increments varies.

 Two final notes:
 a) there are lots of other tricks worth trying.
 b) to get gcc to unroll, instantiate generics with 0 based arrays.

  
4.   To give you an idea of what to expect from gnat, here's a 
benchmark of an fft I wrote.  It doesn't use any of the tricks
I describe above (there's no linear algebra in it). I didn't do
much work to optimize, so I'm fairly impressed with gnat here.

The comparison is with an identical version of the fft in Fortran,
compiled with the latest version of HP's native Fortran compiler.  
All results are megaflops of performance on an HP-735/125.

1. Performance in megaflops of procedure FFT, on data in an array of length
   8192.  The FFT is performed on subsets of the data.  The subsets are of
   length N.  [Note: f77 inlined; GNAT 2.02 didn't.]


                    N = 256    512    1024   2048    4096   8192
      ---------------------------------------------------------

gnat/Ada   gcc -O3      54      55     61     60      64     63   megaflops

Fortran    f77 +O4      66      69     78     77      70     55   megaflops

Ratio      gcc/f77      82%     80%    78%    78%     91%    114%

   Megaflops are calculated by using  4*N*Log_Base_2(N)  as the number
   of floating point operations per call of the Radix 4 FFT.  Bit
   reversals of the data are performed each call, so the timings include
   overhead from bit-reversals and the procedure calls.  They are insensitive 
   to overhead from initialization of arrays of Sin's and Cos's.

   The +O4 optimization gave the fastest HP f77 results.  The best GNAT/gcc
   results came from:

     gcc -c -gnatp -O3 -funroll-all-loops File_Name.adb

   On machines with fewer registers, the -fomit-frame-pointer gcc flag might
   be useful.  The -funroll-all-loops only helped a little  (1%).

   SUMMARY: 
   
   The GNAT/gcc compilation ran at about 80% to 110% the speed of the HP f77 
   compilations.  80% was typical in the common benchmark limits.  As the
   arrays grew larger, performance of the two grew comparable.  (Cache size
   limitations begin to matter in the limit of large arrays.)
   
2. On an array with 16384 points, with N = 16384 (the number of points the
   FFT is performed on) we get the following results in megaflops: 

    gcc -O3   43 megaflops

    f77 +O4   52 megaflops 

3. On an array with 2**15 + 219 points, with N = 2**15 (the number of points
   the FFT is performed on) we get the following results in megaflops: 

    gcc -O3   16 megaflops

    f77 +O4   16 megaflops 
  
   The + 219 is there because it reduces cache thrashing, improving
   performance.  Other FFT algorithms are needed to really get
   around these cache problems.  Here speed is limited by the fact that
   the FFT reads quasi randomly from an array that is too large for the
   cache.

4. The FFT's are standard Radix 4 Cooley-Tukey FFT's.  All
   arithmetic is real.  The Fortran and Ada programs are virtually
   identical.  When the same program is rewritten to use complex
   arrays, (but still real arithmetic in the inner loops)
   then the GNAT/gcc compilation runs 15% slower.

	
--               Jonathan Parker (tcp0063@harrier.am.qub.ac.uk)
--               Department of Applied Mathematics and Theoretical Physics
--               The Queen's University of Belfast
-- 






^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~1996-04-05  0:00 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1996-04-01  0:00 Ada-95 for numerics? Joerg Rodemann
1996-04-01  0:00 ` Robert Dewar
1996-04-02  0:00   ` michael
1996-04-01  0:00 ` Ted Dennison
1996-04-01  0:00   ` Robert Dewar
1996-04-02  0:00 ` Dale Stanbrough
1996-04-03  0:00 ` Robert I. Eachus
  -- strict thread matches above, loose matches on Subject: below --
1996-04-02  0:00 Jean-Pierre Rosen
1996-04-03  0:00 ` Robert I. Eachus
1996-04-02  0:00 Joerg Rodemann
1996-04-02  0:00 ` Robert Dewar
1996-04-05  0:00  Dr J Parker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox