Path: utzoo!attcan!utgpu!watmath!att!dptg!rutgers!cs.utexas.edu!uunet!philmtl!ncc!alberta!ccu!roseman
From: roseman@ccu.UManitoba.CA (roseman)
Newsgroups: comp.lang.ada
Subject: performance benchmarking
Message-ID: <275@ccu.UManitoba.CA>
Date: 9 Aug 89 00:27:50 GMT
Reply-To: roseman@ccu.UManitoba.CA
Organization: University of Manitoba, Winnipeg, Manitoba, Canada

I'm involved in doing some performance benchmarking work on a couple of
Ada compilers, and I'm wondering if anyone else out there is doing
similar work.  I've also got a few questions.

Right now we're using the PIWG (Performance Issues Working Group) test
suite to do the tests.  This seems to be "the" standard Ada test suite
out there.  I'm wondering first off whether people are using other
tests, and if so, which ones?  (Furthermore, where did they come from,
why are you using them, etc.?)

Second, the machine we're running PIWG on is a Unix-based system, and
we're running into some problems with it.  The tests themselves are
very, very short - the total time, including iteration, is well under a
second for most of them!  The problem is that it's almost impossible to
get accurate measurements that way - you've got all the little Unix
daemons popping in and out and using up time.  We have tests whose
reported times vary from 0 microseconds to almost 4 per iteration,
which is most unacceptable!

What can you do to correct this?  Run each test, say, 25 times and take
the best?  The average?  Increase the iteration count to some
ridiculous amount to try to compensate?  I guess this is getting into
general benchmarking procedure (are there any digests or lists devoted
to this out there?) - but how are tests like this supposed to be used?

Surely, this must be an old problem.  You have various companies out
there publishing PIWG numbers for their compilers, but what are they
actually measuring?  Is it reasonable to measure on a souped-up system
(e.g. high priority, kill the daemons), or do people want to see
results on a real Unix system?

If anyone has any answers, comments, pointers to papers covering these
issues, etc., I would very much like to hear from you.  I ask only that
if you post to the list, you also send a copy to my userid directly, as
my time is so tight these days that I can't keep up with the digest.

Thanks.

Mark Roseman, University of Manitoba
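
P.S.  To make the "run it N times and take the best" idea concrete,
here is a rough sketch (my own, not part of the PIWG suite) of the kind
of harness I have in mind: time a large inner loop, repeat the whole
measurement several times, and keep the minimum, on the theory that
daemon wakeups can only ever add time, never subtract it.  The
iteration and run counts are made-up numbers, and a real harness would
also time an empty control loop and subtract it (the dual-loop method
the PIWG tests use) and make sure the optimizer can't delete the loop
body.

    with Calendar;  use Calendar;
    with Text_IO;   use Text_IO;

    procedure Bench is

       package Flt_IO is new Text_IO.Float_IO (Float);

       Iterations : constant Integer := 100_000;  -- big enough to swamp
                                                  -- the clock resolution
       Runs       : constant Integer := 25;       -- repeat, keep the minimum
       Best       : Duration := Duration'Last;
       Start      : Time;
       Elapsed    : Duration;
       X          : Integer := 0;

    begin
       for R in 1 .. Runs loop
          Start := Clock;
          for I in 1 .. Iterations loop
             X := X + 1;             -- the feature under test goes here
          end loop;
          Elapsed := Clock - Start;
          if Elapsed < Best then
             Best := Elapsed;        -- minimum is the run least polluted
          end if;                    -- by daemon wakeups
       end loop;

       Put ("Best time per iteration, in microseconds: ");
       Flt_IO.Put (Float (Best) / Float (Iterations) * 1.0E6);
       New_Line;
    end Bench;

Note that Calendar.Clock itself may only tick every 10 or 20
milliseconds on many systems, which is another reason the inner loop
has to run long enough for the elapsed time to be much larger than one
clock tick.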