From: Georg Bauhaus
Date: Sat, 12 Mar 2005 08:16:13 +0100
Newsgroups: comp.lang.ada,comp.realtime,comp.software-eng,comp.programming
Subject: Re: 10 rules for benchmarking (was Re: Teaching new tricks to an old dog (C++ -->Ada))
Message-ID: <42329704$0$1096$9b4e6d93@newsread2.arcor-online.net>
In-Reply-To: <113394jjvppao64@corp.supernews.com>

CTips wrote:
> Jean-Pierre Rosen wrote:
>
>> CTips wrote:
>>
>>> Since it appears that several people out here are making some very
>>> basic mistakes when they report benchmark numbers, I thought I'd
>>> write a small note on how to do benchmarking.
>>>
>> [lots of good stuff deleted]
>>
>> You are absolutely right, but there is one thing that this benchmark
>> *proves*:
>>
>> Blindly claiming that C is faster than Ada is not supported by hard
>> figures.
>>
>
> Actually, it is.

It does not seem to be as easy as you claim. I have done some more tests following your advice. Still, I find my computer far too complicated for me to understand or predict, even when speculating about caches and the TLB. Can a PC be described in sufficient detail to say anything precise, or to write predictable assembly language programs for all kinds of situations with PC OSs?

Anyway, here goes. The starred columns show seconds with checks turned off (in both languages). Your program has, at C module level, int perm[N]; so I did the same in Ada; the results are in the T and TC columns. The TU and TUC columns show seconds when dynamically allocated arrays are used, from an unconstrained array type.

                C                      | Ada
     N      R   *T*             TC     | *T*     TC      TU      TUC
  4000  50000   1.01            2.46   | 1.01    1.67    1.65    2.0
  8000  25000   1.01    (2.31)  2.45   | 1.01    1.67    1.65    2.0
 10000  20000   1.01            2.46   | 1.01    1.65    1.65    2.0
 20000  10000   1.02    (2.32)  2.48   | 1.02+   1.7++   2.5-    2.0
 25000   8000   1.7++           2.46++ | 1.78++  2.6+--  2.6+-   2.3++-
 40000   5000   4.6+            5.48+  | 4.6++   5.0+-   5.5+-   5.1+-
 50000   4000   6.1     (6.4+)  6.7    | 6.1     6.4+-   5.9+-   7.1-
100000   2000   7.4             7.25   | 7.2     7.0     7.0     7.2
200000   1000   7.38            7.26   | 7.22    6.9     6.8     7.1

N:   number of integers in the arrays
R:   number of runs (calls of do_perm)
T:   time without checks
TC:  time with checks
       C   = checking with hand-coded conditionals; times in () use
             short-circuit checks
       Ada = compiler switches turning all checks on
TU:  time, unconstrained Ada arrays (dynamically allocated)
TUC: again, with all checks on

+ and - indicate strong deviations in running time in:

  $ while true ; do time ./perm ; done

(I guess the fluctuations might indicate where something becomes full or loaded.) The numbers given are the shortest times, except when the results were jumpy and showed a faster result just once in a long series.

There is 1G of memory in the machine, and two PIII at 800 MHz. The OS (GNU/Linux 2.6.8) was running in single user mode, unloaded. The compiler is GCC 4.0.0 20050215 in all cases.

  gcc -O2 -ansi -pedantic
  gnatmake -O2 -gnatwa -gnatp
  gnatmake -O2 -gnatwa -gnatVa -gnato

I have used your C code for the C-no-checks test. For the checks test I modified the commented conditional (see below). The reason is that it should match the array range checking in Ada and other languages, where semantically each array has two bounds, for a total of four for perm[] and val[]. In (some) theory. So I wrote a dumb test, and a slightly less dumb test. The parenthesised times in the table were measured after short-circuit operators had replaced the bit operators.

Obviously, when I want to compare checking of unknown bounds, I have to pass the bounds to do_perm(). Just comparing the loop counter to N isn't the same thing.
The parameters are named perm_n and val_n.

  /**/
      if ( i < 0 | i >= perm_n ) { abort(); }
      if( perm[i] < 0 | perm[i] >= val_n) { abort(); }
  /**/

This is certainly not the optimal check, or even a necessary one, in this loop, but for the test case it allows a comparison with _unconstrained_ array types. Otherwise 0 as the first index in C would have served as a built-in constraint. (With this particular loop the checking can be made faster by reversing the order of comparisons, as for example i < 0 will invariably be false.)

Ada text, the dynamically allocated arrays:

  type Perm_Array is array(Natural range <>) of Integer;
  ...
  setup_perm(perm.all);
  for rep in 1 .. R loop
     do_perm(val.all, perm.all);
  end loop;

My conclusion is that for some data the C version is faster, and that for some data the Ada version is faster, on this machine. It takes a lot of time to really find out what and why.

Still, you get fast code from either language front end, so if an Ada compiler checks more things at compile time and supports more programming language features, and you want your compiler to do this, it can be a plus.

Adding a complete set of runtime checks can be quite expensive; some lines in the table show this. (GNAT doesn't add all checks by default.)

Paradoxically, for both C and Ada, checks seem to speed up processing (see the last line) when the amount of data is growing.

The unconstrained array version has been faster with some data than the constrained array type version... Both versions can run more quickly on this machine when the arrays can live on the stack. :-)

Georg

> In most cases, it is possible to get close to assembly performance out
> of C [one major case where it isn't is when you want to use
> labels-as-addresses].

I take it you mean the one best-performing assembly language program, found by a very clever programmer beating all optimizers. Possible, but how do you get there systematically?
Given that you can insert assembly instructions in a number of PLs, this alone is not a competitive argument.

> In fact, it is _NOT_ possible to beat that number
> using any language.

For this to be true, someone would have to actually prove that a given assembly language program cannot be beaten by any other assembly language program that takes the same set of inputs to outputs and reacts in the same ways as the given program does. Possible, OK, but feasible for any reasonably sized program?

> If there are any run-time checks added by the Ada
> compiler, then the performance will not be the assembly level
> performance.

How do you explain, in terms of assembly language instructions, the speedup in the last line of the table above (T -> TC)? (I have double-checked the numbers for both languages.)

> Consequently, either one must program so that no checks
> need to be added OR one must disable all run-time checking.

> BTW - all the psuedo-benchmarking people have done on this thread so far
> has provided meaningless numbers - too small run-times, performance
> dominated by cache misses etc.

And VERY_LARGE_NUMBER meaning not too large for ... :-)

Georg
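
For anyone who wants to reproduce the C TC column, the two hand-coded check variants described above might look roughly like the sketch below. Only the conditionals and the parameter names perm_n and val_n come from this post. The function names do_perm_checked and do_perm_short_circuit are hypothetical, and the loop body (incrementing val[perm[i]]) is an assumed stand-in, since the actual work done in do_perm is not shown here.

```c
#include <stdlib.h>

/* Variant 1: bit operators, so both comparisons of each check are
   always evaluated. ASSUMPTION: the increment of val[perm[i]] is a
   stand-in for whatever the real do_perm does with the element. */
void do_perm_checked(int *val, int val_n, const int *perm, int perm_n)
{
    int i;
    for (i = 0; i < perm_n; i++) {
        if ((i < 0) | (i >= perm_n)) { abort(); }
        if ((perm[i] < 0) | (perm[i] >= val_n)) { abort(); }
        val[perm[i]] += 1;
    }
}

/* Variant 2: short-circuit operators, so the second comparison is
   skipped whenever the first one is already true (the parenthesised
   times in the table). Same assumed loop body as above. */
void do_perm_short_circuit(int *val, int val_n, const int *perm, int perm_n)
{
    int i;
    for (i = 0; i < perm_n; i++) {
        if (i < 0 || i >= perm_n) { abort(); }
        if (perm[i] < 0 || perm[i] >= val_n) { abort(); }
        val[perm[i]] += 1;
    }
}
```

Both variants take the bounds as parameters rather than comparing against a compile-time N, which is the point of the comparison with unconstrained Ada arrays. Whether the compiler removes the redundant i checks at -O2 is something to verify in the generated assembly rather than assume.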