From: CTips
Newsgroups: comp.lang.ada,comp.realtime,comp.software-eng,comp.programming
Subject: 10 rules for benchmarking (was Re: Teaching new tricks to an old dog (C++ -->Ada))
Date: Thu, 10 Mar 2005 23:58:22 -0500
Since it appears that several people out here are making some very basic mistakes when they report benchmark numbers, I thought I'd write a small note on how to do benchmarking.

Disclaimer: this stuff applies primarily to micro-benchmarks. It does not apply where the measurement is being done via a tool that uses built-in performance registers. Feel free to disregard the following recommendations if you know your system (OS+hardware+compiler) well enough to know when to break these rules.

1. Make sure your benchmark runs for at least 0.1 sec, and preferably 1.0 sec.
   - The timing method may only be accurate to 0.01 sec, in which case smaller measurements may have inaccuracies of more than 10%.
   - A longer run absorbs any start-up costs [e.g., under cygwin, time reports that the null program takes 0.03 s].

2. When running the benchmark program:
   * use an unloaded machine (i.e., with as little else as possible running),
   * do multiple (3-5) runs,
   * report the minimum time.
   - The first couple of runs warm the memory hierarchy, so cache misses/page misses don't add to the run-time.
   - Using an unloaded machine prevents pollution from other programs.
   - If the times do not settle down after multiple runs, consult an expert.

3. If you are investigating cache effects as part of your benchmarking efforts, make sure that you flush the cache between runs. This is best done as follows:
   * if there is an instruction that can flush the cache in one shot, use it;
   * if there is an instruction that can flush individual cache sets, figure out how to flush every set in the cache;
   * otherwise, write to a large [2M+] array and then read it back. This is _NOT_ guaranteed to flush the cache, but it should do a decent job.

4.
Make sure that the compiler has not done something really smart, like figuring out that the benchmark code is dead and eliminating the entire function.
   * *Always* look at a disassembly of your function.
   * In C, the best way to avoid this is to:
     - put the benchmark in one file,
     - repeatedly call the benchmark function from a main() that is in a separate file,
     - compile the two files separately.

5. Do not do any I/O in your code.
   - The cost of I/O can vary unpredictably across runs.
   - It can dominate the cost of the run (i.e., in a 1 sec run, the I/O might account for 0.999 s).
   * Instead of reading data from a file, use a const array.
   * Instead of printing out values, xor them together and return the result from main.

6. Replace the benchmark function with the null function, and time the code.
   * This tells you how much overhead you have.
   * Subtract this number to get the true benchmark number.

7. If you're using arrays, make sure you vary their sizes.
   - You should see performance change abruptly as you grow past the sizes of the L1/L2 caches.
   - You MAY see the odd situation, when you have multiple arrays (or multiple references within the same array), where the performance for size N is much worse than for N-16 or N+16. This may be because the references in the same iteration of the loop all map to the same cache set, and there are more references than the cache's associativity.

8. If you've got branchy code, make sure that you choose inputs that appropriately exercise the branch predictor.
   - If you have actual input sequences, use those.
   - If you have actual input values, but not sequences, you may want to randomize the order in which they are used.
   - You can report the best- and worst-case numbers separately:
     * for the best case, send the same input multiple times before switching to the next input;
     * for the worst case, successive inputs should exercise different paths.
   [There's more to it, but this should suffice.]

9.
Check for sanity by varying the number of times the benchmark function is called.
   * I like to call it 0.5x, 2x, and 10x the base number.
   * If the time does not scale almost linearly with the number of calls, talk to an expert.

10. Make sure that your numbers are reproducible. Report:
   - your times,
   - your hardware,
   - your system:
     * OS (with version),
     * compiler (with version),
   - the compiler flags used.