From: CTips
Newsgroups: comp.lang.ada,comp.realtime,comp.software-eng,comp.programming
Subject: 10 rules for benchmarking (was Re: Teaching new tricks to an old dog (C++ -->Ada))
Date: Thu, 10 Mar 2005 23:58:22 -0500
Since it appears that several people out here are making some very basic mistakes when they report benchmark numbers, I thought I'd write a small note on how to do benchmarking.

Disclaimer: this stuff applies primarily to micro-benchmarks. It does not apply where the measurement is being done via a tool that uses built-in performance registers. Feel free to disregard the following recommendations if you know your system (OS+hardware+compiler) well enough to know when to break these rules.

1. Make sure your benchmark runs for at least 0.1 sec, and preferably 1.0 sec.
   - The timing method may only be accurate to 0.01 sec, in which case smaller measurements may have inaccuracies of more than 10%.
   - A longer run absorbs any start-up costs [e.g., under cygwin, time reports that the null program takes 0.03 s].

2. When running the benchmark program:
   * use an unloaded machine (i.e., with as little else as possible running),
   * do multiple (3-5) runs,
   * report the minimum time.
   - The first couple of runs warm the memory hierarchy, so cache misses/page misses don't add to the run-time.
   - Using an unloaded machine prevents pollution from other programs.
   - If the times do not settle down after multiple runs, consult an expert.

3. If you are investigating cache effects as part of your benchmarking efforts, make sure that you flush the cache between runs. This is best done as follows:
   * if there is an instruction that can flush the cache in one shot, use it;
   * if there is an instruction that can flush individual cache sets, figure out how to flush every set in the cache;
   * otherwise, write to a large [2M+] array and then read it back. This is _NOT_ guaranteed to flush the cache, but it should do a decent job.

4.
Make sure that the compiler has not done something really smart, like figuring out that the benchmark code is dead and eliminating the entire function.
   * *Always* look at a disassembly of your function.
   * In C, the best way to avoid this is to:
     - put the benchmark in one file,
     - repeatedly call the benchmark function from a main() that is in a separate file,
     - compile the two files separately.

5. Do not do any I/O in your code.
   - The cost of I/O can vary unpredictably across runs.
   - It can dominate the cost of the run (i.e., in a 1 sec run, the I/O might account for 0.999 s).
   * Instead of reading data from a file, use a const array.
   * Instead of printing out values, xor them together and return the result from main.

6. Replace the benchmark function with the null function, and time the code.
   * This tells you how much overhead you have.
   * Subtract this number to get the true benchmark number.

7. If you're using arrays, make sure you vary their sizes.
   - You should see performance change abruptly as you grow past the sizes of the L1/L2 caches.
   - You MAY see the odd situation, when you have multiple arrays (or multiple references within the same array), where the performance for size N is much worse than for N-16 or N+16. This may be because the references in the same iteration of the loop all map to the same cache set, and there are more references than the cache's associativity.

8. If you've got branchy code, make sure that you choose inputs that appropriately exercise the branch predictor.
   - If you have actual input sequences, use those.
   - If you have actual input values, but not sequences, you may want to randomize the order in which they are used.
   - You can report the best- and worst-case numbers separately:
     * for the best case, send the same input multiple times before switching to the next input;
     * for the worst case, successive inputs should exercise different paths.
   [There's more to it, but this should suffice.]

9.
Check for sanity by varying the number of times the benchmark function is called.
   * I like to call it 0.5x, 2x, and 10x the base number.
   * If the time does not scale almost linearly with the number of calls, talk to an expert.

10. Make sure that your numbers are reproducible. Report:
   - your times,
   - your hardware,
   - your system:
     * OS (with version),
     * compiler (with version),
   - the compiler flags used.