From: CTips
Newsgroups: comp.lang.ada,comp.realtime,comp.software-eng,comp.programming
Subject: Re: 10 rules for benchmarking (was Re: Teaching new tricks to an old dog (C++ -->Ada))
Date: Sat, 12 Mar 2005 08:39:50 -0500
Message-ID: <1135sbpj9e9cd79@corp.supernews.com>
In-Reply-To: <42329704$0$1096$9b4e6d93@newsread2.arcor-online.net>
Georg Bauhaus wrote:
> CTips wrote:
>
>> Jean-Pierre Rosen wrote:
>>
>>> CTips wrote:
>>>
>>>> Since it appears that several people out here are making some very
>>>> basic mistakes when they report benchmark numbers, I thought I'd
>>>> write a small note on how to do benchmarking.
>>>
>>> [lots of good stuff deleted]
>>>
>>> You are absolutely right, but there is one thing that this benchmark
>>> *proves*:
>>>
>>> Blindly claiming that C is faster than Ada is not supported by hard
>>> figures.
>>
>> Actually, it is.
>
> Seems not so easy as you claim it to be.

It seemed easy for me.

> I have done some more tests following your advice.
> Though, I find my computer far too complicated for me to understand
> or predict, even speculating about caches and TLB. Can a
> PC be described in sufficient detail to say anything precise
> or to write predictable assembly language programs for all kinds
> of situations with PC OSs?

Umm, good question. I know very little about the specifics of the x86
architecture implementations, but I do know how microprocessors in
general are designed. I'll try to answer this question in a separate
note. I think the answer is yes, it can be described adequately.

> Anyway, here goes.

This is pretty good work.

> The starred columns show seconds with checks turned off
> (in both languages).
> Your program has, at C module level,
>     int perm[N];
> so I did the same in Ada; results are in the T and TC columns.
> The TU and TUC columns show seconds when dynamically allocated
> arrays are used, from an unconstrained array type.
>             C                |              Ada
>      N     R    *T*    TC        *T*    TC      TU      TUC
>   4000  50000  1.01   2.46       1.01   1.67    1.65    2.0
>   8000  25000  1.01  (2.31) 2.45 1.01   1.67    1.65    2.0
>  10000  20000  1.01   2.46       1.01   1.65    1.65    2.0
>  20000  10000  1.02  (2.32) 2.48 1.02+  1.7++   2.5-    2.0
>  25000   8000  1.7++  2.46++     1.78++ 2.6+--  2.6+-   2.3++-
>  40000   5000  4.6+   5.48+      4.6++  5.0+-   5.5+-   5.1+-
>  50000   4000  6.1   (6.4+) 6.7  6.1    6.4+-   5.9+-   7.1-
> 100000   2000  7.4    7.25       7.2    7.0     7.0     7.2
> 200000   1000  7.38   7.26       7.22   6.9     6.8     7.1

For some reason your numbers at the higher end are anomalous. The
checked code should not take less time than the unchecked code; there
is something else going on. You're running on a two-processor system,
so you could be ping-ponging between processors. Does the version of
Linux you are running have processor affinity? Also, if you're running
on a loaded system, that would explain things. I would rerun the
numbers on a machine with less load before concluding anything.

> N: number of integers in the arrays
>
> R: number of runs (calls of do_perm)
>
> T: time without checks
>
> TC: time with checks
>     C   = checking via handcoded conditionals; in (): short-circuit checks
>     Ada = compiler switches turning all checks on
>
> TU: time, unconstrained Ada arrays (dynamically allocated)
> TUC: again, with all checks on
>
> + and - indicate strong deviations in running time in:
>     $ while true ; do time ./perm ; done
> (I guess the fluctuations might indicate where something becomes
> full or loaded.)
> The numbers given are the shortest times, except when the results
> were jumpy and showed a faster result just once in a long series.
>
> There is 1G of memory in the machine, and two PIIIs at 800 MHz.
> The OS (GNU/Linux 2.6.8) has been operating in single-user mode,
> unloaded; the compiler is GCC 4.0.0 20050215 in all cases.
>
>     gcc -O2 -ansi -pedantic
>     gnatmake -O2 -gnatwa -gnatp
>     gnatmake -O2 -gnatwa -gnatVa -gnato
>
> I have used your C code for the C-no-checks test. For the checks
> test I modified the commented conditional (see below).

Your check could be better; see why below.

> The reason is that it should match the array range checking in Ada
> and other languages, where semantically each array has two bounds,
> for a total of four for perm[] and val[].

In (some) theory.

> So I wrote a dumb test, and a slightly less dumb test.
>
> The parenthesised times in the table have been measured after
> short-circuit operators had replaced the bit operators.
> Obviously, when I want to compare checking of unknown bounds, I
> have to pass bounds to do_perms(). Just comparing the loop counter to
> N isn't the same thing. The parameters are named perm_n and val_n.
>
>     if ( i < 0 | i >= perm_n ) {
>         abort();
>     }
>     if ( perm[i] < 0 | perm[i] >= val_n ) {
>         abort();
>     }
>
> For sure this is not the optimal or necessary check in this
> loop, but for the test case it provides for comparing _unconstrained_
> array types.
> Otherwise 0 as the first index in C would have served as a built-in
> constraint.

Instead of using

    if ( i < 0 || i >= N ) { abort(); }

try using

    if ( (unsigned) i >= (unsigned) N ) { abort(); }

It's the same check, only a lot faster. This probably explains any
difference between checked and unchecked C. [It might be interesting
to look at the assembler output for Ada and see if that's what they
are using.]

> (With this particular loop the checking can be made faster by
> reversing the order of comparisons, as for example i < 0 will
> invariably be false.)
>
> Ada text, the dynamically allocated arrays:
>
>     type Perm_Array is array(Natural range <>) of Integer;
>     ...
>     setup_perm(perm.all);
>     for rep in 1 .. R loop
>         do_perm(val.all, perm.all);
>     end loop;
>
> My conclusion is that for some data the C version is faster,
> and that for some data the Ada version is faster, on this
> machine.

Ada is not faster: the results where Ada is faster are not trustworthy.
Replace the C checks with the alternate form, and run on an *unloaded*
machine.

> It takes a lot of time to really find out what
> and why. Still, you get fast code from either language front
> end, so if an Ada compiler will check more things at
> compile time, support more of the programming language, and you want
> your compiler to do this, it can be a plus.

And if you want performance, use C instead.

> Adding a complete set of runtime checks can be quite expensive,
> as some lines in the table show. (GNAT doesn't add all
> checks by default.)
> Paradoxically, for both C and Ada, checks seem to speed up
> processing (see the last line) when the amount of data
> is growing.

I think that is unlikely. It's much more likely that you're seeing
effects from something else that is happening at the same time.

> The unconstrained array version has been faster with some data
> than the constrained array type version...
>
> Both versions can run more quickly on this machine when
> the arrays can live on the stack. :-)
>
> Georg
>
>> In most cases, it is possible to get close to assembly performance
>> out of C [one major case where it isn't is when you want to use
>> labels-as-addresses].
>
> I take it you mean the one best-performing assembly language
> program found by a very clever programmer, beating all optimizers.
> Possible, but how do you get there systematically?
> Given that you can insert assembly instructions in a number of PLs,
> this alone is not a competitive argument.
>
>> In fact, it is _NOT_ possible to beat that number using any language.
>
> For this to be true, someone would have to actually prove
> that a given assembly language program cannot be beaten by
> any other assembly language program taking the same set
> of inputs to outputs, and reacting in the same ways that the
> given assembly language program does. Possible, OK, but
> feasible for any reasonably sized program?
>> If there are any run-time checks added by the Ada compiler, then the
>> performance will not be the assembly-level performance.
>
> How do you explain, in terms of assembly language instructions,
> the speedup in the last line of the table above (T -> TC)?
> (I have double-checked them for both languages.)
>
>> Consequently, either one must program so that no checks need to be
>> added OR one must disable all run-time checking.
>
>> BTW - all the pseudo-benchmarking people have done on this thread so
>> far has provided meaningless numbers - too small run-times,
>> performance dominated by cache misses, etc.
>
> And VERY_LARGE_NUMBER meaning not too large for ... :-)
>
> Georg