From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Thread: 103376,703c4f68db81387d
X-Google-Thread: 115aec,703c4f68db81387d
X-Google-Thread: f43e6,703c4f68db81387d
X-Google-Thread: 108717,a7c8692cac750b5e
X-Google-Attributes: gid103376,gid115aec,gidf43e6,gid108717,public
X-Google-Language: ENGLISH,ASCII
Path: 
 g2news1.google.com!news1.google.com!proxad.net!newsfeed.stueberl.de!uucp.gnuu.de!newsfeed.arcor.de!news.arcor.de!not-for-mail
Date: Sat, 12 Mar 2005 08:16:13 +0100
From: Georg Bauhaus <sb463ba@user1.uni-duisburg.de>
User-Agent: Debian Thunderbird 1.0 (X11/20050116)
X-Accept-Language: en-us, en
MIME-Version: 1.0
Newsgroups: comp.lang.ada,comp.realtime,comp.software-eng,comp.programming
Subject: Re: 10 rules for benchmarking (was Re: Teaching new tricks to an
 old dog (C++ -->Ada))
References: <4229bad9$0$1019$afc38c87@news.optusnet.com.au>
   <1110032222.447846.167060@g14g2000cwa.googlegroups.com>
   <871xau9nlh.fsf@insalien.org>
   <3SjWd.103128$Vf.3969241@news000.worldonline.dk>
   <87r7iu85lf.fsf@insalien.org>   <1110052142.832650@athnrd02>
   <d0dkq6$781$1@titan.btinternet.com>
   <1110284070.410136.205090@o13g2000cwo.googlegroups.com>
   <395uqaF5rhu2mU1@individual.net>   <112rs0bdr2aftdf@corp.supernews.com>
   <1inxxr988rxgg$.1w9dedak41k89.dlg@40tude.net>
   <112s1r0rf0o8nca@corp.supernews.com>   <d0l2gq$egt$1@titan.btinternet.com>
   <112sonip5v4dca6@corp.supernews.com>
   <E1vXd.352556$w62.20389@bgtnsc05-news.ops.worldnet.att.net>
   <112t3de6fu04f38@corp.supernews.com>
   <1110396477.596174.285520@o13g2000cwo.googlegroups.com>
   <112vb2t8eonuhed@corp.supernews.com>
 <1110422108.925127.54110@o13g2000cwo.googlegroups.com>
 <11329cb96h2p19f@corp.supernews.com> <c92s0d.vgk.ln@hunter.axlog.fr>
 <113394jjvppao64@corp.supernews.com>
In-Reply-To: <113394jjvppao64@corp.supernews.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Message-ID: <42329704$0$1096$9b4e6d93@newsread2.arcor-online.net>
Organization: Arcor
NNTP-Posting-Date: 12 Mar 2005 08:15:17 MET
NNTP-Posting-Host: ad331e10.newsread2.arcor-online.net
X-Trace: 
 DXC=FR:3Ii]:h1DJ2=Pn8=T1[JQ5U85hF6f;DjW\KbG]kaMHliQbn6H@_EIN71C_Yj7TQN8JM^O\[iIdC^0dG=GSCo8D
X-Complaints-To: abuse@arcor.de
Xref: g2news1.google.com comp.lang.ada:9202 comp.realtime:1338
 comp.software-eng:4905 comp.programming:17852
Date: 2005-03-12T08:15:17+01:00
List-Id: <comp.lang.ada>

CTips wrote:
> Jean-Pierre Rosen wrote:
> 
>> CTips wrote:
>>
>>>
>>> Since it appears that several people out here are making some very 
>>> basic mistakes when they report benchmark numbers, I thought I'd 
>>> write a small note on how to do benchmarking.
>>>
>> [lots of good stuff deleted]
>>
>> You are absolutely right, but there is one thing that this benchmark 
>> *proves*:
>>
>> Blindly claiming that C is faster than Ada is not supported by hard 
>> figures.
>>
> 
> Actually, it is.

It does not seem as easy as you claim.
I have done some more tests following your advice.
Still, I find my computer far too complicated to understand
or predict, even when speculating about caches and the TLB. Can a
PC be described in sufficient detail to say anything precise,
or to write predictable assembly language programs for all kinds
of situations under PC OSs?

Anyway, here goes.
The starred columns show seconds with checks turned off
(in both languages).

Your program has, at C module level,
int perm[N];
so I did the same in Ada, results are in the T and TC columns.
The TU and TUC columns show seconds when dynamically allocated
arrays are used, from an unconstrained array type.

                      C             |             Ada
N        R       *T*         TC      *T*      TC      TU      TUC
4000     50000   1.01        2.46    1.01     1.67    1.65    2.0
8000     25000   1.01 (2.31) 2.45    1.01     1.67    1.65    2.0
10000    20000   1.01        2.46    1.01     1.65    1.65    2.0
20000    10000   1.02 (2.32) 2.48    1.02+    1.7++   2.5-    2.0
25000    8000    1.7++       2.46++  1.78++   2.6+--  2.6+-   2.3++-
40000    5000    4.6+        5.48+   4.6++    5.0+-   5.5+-   5.1+-
50000    4000    6.1  (6.4+) 6.7     6.1      6.4+-   5.9+-   7.1-
100000   2000    7.4         7.25    7.2      7.0     7.0     7.2
200000   1000    7.38        7.26    7.22     6.9     6.8     7.1

N: number of integers in the arrays

R: number of runs (calls of do_perm)

T: time without checks

TC: time with checks
   C = checks hand-coded as conditionals; times in () use short-circuit operators
   Ada = compiler switches turning all checks on

TU: time, unconstrained Ada arrays (dynamically allocated)
TUC: again, with all checks on

+ and - indicate strong deviations in running time in:
$ while true ; do time ./perm ; done
(I guess the fluctuations might indicate where something becomes
full or loaded.)
The numbers given are the shortest times except when the results
were jumpy and showed a faster result just once in a long series.
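
The "shortest time of a series" idea can also be sketched inside a
program. This is not how the numbers above were obtained (they come
from the shell's time in the loop shown above); it is only a minimal
illustration using clock() from <time.h>, which reports CPU time
rather than elapsed time. The names shortest_time and sample_work are
made up for this sketch.

```c
#include <time.h>

/* Hypothetical workload standing in for a call of do_perm(). */
volatile long sink;

void sample_work(void)
{
    long i;
    for (i = 0; i < 100000L; i++) {
        sink += i;
    }
}

/* Return the shortest CPU time, in seconds, observed over `runs`
   calls of fn(). Returns -1.0 when runs is not positive. */
double shortest_time(void (*fn)(void), int runs)
{
    double best = -1.0;
    int r;

    for (r = 0; r < runs; r++) {
        clock_t start = clock();
        double secs;

        fn();
        secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best) {
            best = secs;
        }
    }
    return best;
}
```

Calling shortest_time(sample_work, 10) gives the best of ten runs. A
single faster outlier in a jumpy series would still have to be judged
by looking at all the individual times, as described above.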

There is 1G of memory in the machine, and two PIII at 800 MHz.
The OS (GNU/Linux 2.6.8) has been operating in single user mode,
unloaded, compiler is GCC 4.0.0 20050215 in all cases.

gcc -O2 -ansi -pedantic
gnatmake -O2 -gnatwa -gnatp
gnatmake -O2 -gnatwa -gnatVa -gnato

I have used your C code for the C-no-checks test. For the checks
test I modified the commented conditional (see below).
The reason is that it should match the array range checking in Ada
and other languages, where semantically each array has two bounds,
for a total of four for perm[] and val[].  In (some) theory.
So I wrote a dumb test, and a slightly less dumb test.

The parenthesised times in the table were measured after the bit
operators had been replaced with short-circuit operators.
Obviously, when I want to compare checking of unknown bounds, I
have to pass bounds to do_perm(). Just comparing the loop counter to
N isn't the same thing. The parameters are named perm_n and val_n.

     /**/
     if ( i < 0 | i >= perm_n ) {
       abort();
     }
     if( perm[i] < 0 | perm[i] >= val_n) {
       abort();
     }
     /**/

For sure this is not the optimal or necessary check in this
loop, but for the test case it provides a basis for comparing
_unconstrained_ array types.
Otherwise 0 as the first index in C would have served as a built-in
constraint.

(With this particular loop the checking can be made faster by
reversing the order of comparisons, as for example i < 0 will
invariably be false.)
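
To make the two checked C variants concrete, here is a minimal sketch.
The real body of do_perm() is not reproduced in this post, so the
assignment val[perm[i]] = i is only an assumed stand-in, and the two
function names are made up. The checks and the parameters perm_n and
val_n are the ones described above.

```c
#include <stdlib.h>

/* Variant with bit operators: both comparisons of each check are
   always evaluated before the results are combined. */
void do_perm_bitwise(int *val, int val_n, const int *perm, int perm_n)
{
    int i;

    for (i = 0; i < perm_n; i++) {
        if ((i < 0) | (i >= perm_n)) {
            abort();
        }
        if ((perm[i] < 0) | (perm[i] >= val_n)) {
            abort();
        }
        val[perm[i]] = i;   /* assumed loop body, not the original */
    }
}

/* Variant with short-circuit operators: the second comparison is
   evaluated only when the first one is false. */
void do_perm_short_circuit(int *val, int val_n, const int *perm, int perm_n)
{
    int i;

    for (i = 0; i < perm_n; i++) {
        if (i < 0 || i >= perm_n) {
            abort();
        }
        if (perm[i] < 0 || perm[i] >= val_n) {
            abort();
        }
        val[perm[i]] = i;   /* assumed loop body, not the original */
    }
}
```

Both functions compute the same result for valid input. Which of them
is faster depends on the compiler and the machine, as the table
suggests.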

Ada text, the dynamically allocated arrays:

    type Perm_Array is array(Natural range <>)  of Integer;
...
    setup_perm(perm.all);
    for rep in 1 .. R  loop
       do_perm(val.all, perm.all);
    end loop;

My conclusion is that for some data the C version is faster,
and that for some data the Ada version is faster, on this
machine. It takes a lot of time to really find out what
and why. Still, you get fast code from either language front
end, so if an Ada compiler checks more things at
compile time and offers more language support, and you want
your compiler to do this, that can be a plus.

Adding a complete set of runtime checks can be quite expensive,
as some lines in the table show. (GNAT doesn't add all
checks by default.)
Paradoxically, for both C and Ada, checks seem to speed up
processing (see the last line) when the amount of data
is growing.

The unconstrained array version has been faster with some data
than the constrained array type version...

Both versions can run more quickly on this machine when
the arrays can live on the stack. :-)

Georg


> In most cases, it is possible to get close to assembly performance out 
> of C [one major case where it isn't is when you want to use 
> labels-as-addresses].

I take it you mean the one best performing assembly language
program found by a very clever programmer beating all optimizers.
Possible, but how do you get there systematically?
Given that you can insert assembly instructions in a number of PLs,
this alone is not a competitive argument.

> In fact, it is _NOT_ possible to beat that number 
> using any language.

For this to be true, someone would have to actually prove
that a given assembly language program cannot be beaten by
any other assembly language program that maps the same set
of inputs to outputs and reacts in the same ways as the
given assembly language program does. Possible, OK, but
feasible for any reasonably sized program?

> If there are any run-time checks added by the Ada 
> compiler, then the performance will not be the assembly level 
> performance.

How do you explain, in terms of assembly language instructions,
the speedup in the last line of the table above (T -> TC)?
(I have double-checked them for both languages.)

> Consequently, either one must program so that no checks 
> need to be added OR one must disable all run-time checking.


> BTW - all the psuedo-benchmarking people have done on this thread so far 
> has provided meaningless numbers - too small run-times, performance 
> dominated by cache misses etc.

And VERY_LARGE_NUMBER meaning not too large for ... :-)

Georg