Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/5/84; site spp1.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!sdcsvax!sdcrdcf!trwrb!trwspp!spp2!spp1!colbert
From: colbert@spp1.UUCP
Newsgroups: net.lang.ada
Subject: LA AdaTEC Ada Fair '84 Report
Message-ID: <134@spp1.UUCP>
Date: Fri, 7-Dec-84 10:04:49 EST
Article-I.D.: spp1.134
Posted: Fri Dec 7 10:04:49 1984
Date-Received: Sun, 9-Dec-84 05:41:44 EST
Distribution: net
Organization: TRW, Redondo Beach CA

              Report on the L.A. AdaTEC Ada* Fair '84
                      Compiler Test Results

                         Bryce M. Bardin
                     Hughes Aircraft Company
                  Software Engineering Division
                       Ground Systems Group
                          Fullerton, CA

On June 30th, 1984, L.A. AdaTEC held its second annual Ada Fair.
Again this year, compiler vendors were invited to run a suite of test
programs selected by L.A. AdaTEC. Each vendor was asked to report its
own results in accordance with the set of rules which were supplied
with the test suite. This report summarizes the results reported by
the vendors.

Source listings of the programs and copies of the rules were
distributed to the people who attended the Fair. They are now
available on the ARPAnet by logging into EV-INFORMATION at ECLB (with
a password of EV) and typing "HELP TESTS-ADA-FAIR-84" or by FTPing
TESTS-ADA-FAIR-84.HLP. As an alternative, L.A. AdaTEC, in the person
of Ed Colbert, will mail you the tests over Usenet if you contact him
at "trwrb!trwspp!colbert".

The test suite was assembled by Ed Colbert (TRW), Gerry Fisher (IBM
Research), and me.

The vendors who participated by running the tests were:

1) Data General Corporation (DG), running the DGC/Rolm ADE compiler
   on a DG MV8000 under AOS/VS,
2) Irvine Computer Sciences Corporation (ICSC), running the ICSC-Ada
   Compiler on a Gould 32/87, and
3) RR Software, Inc. (RR), running the JANUS/Ada compiler on an IBM
   PC-XT under DOS.

This year, with the advent of more validated compilers, the tests
were chosen without trying to limit the Ada constructs used in any
way. The intent of the suite was to reveal the current status of Ada
implementations to the entire Ada community, to the extent this is
possible with a very small set of tests.

* Ada is a registered trademark of the U.S. Government, Ada Joint
  Program Office.

Since we wished to enable vendors and end users alike to make simple
performance comparisons on a uniform and equitable basis, we assumed
that package Calendar was implemented. Because evaluation of the
differences in performance which depend on slight differences in
source code is almost impossible, we established the rule that making
unauthorized changes to any test automatically removes a vendor from
consideration on that test.

Additionally, in order to challenge the vendors of validated
compilers a bit, we included a few tests of features that are needed
in order to build serious real-time embedded systems -- features that
only a rather complete Ada implementation would be likely to support.

Where possible, the tests were designed to be self-checking and to
report their success or failure. The tests were checked out as far as
possible with validated versions of NYU Ada/Ed, although some
features not supported by Ada/Ed were simulated. In spite of our best
efforts, two tests were clearly incorrect as given to the vendors
and, in accordance with the rules, these tests were dropped from the
suite.
The Boolean vector "and" test had two errors: "v2(N) := true;" should
have been "v1(N) := true;" and "vector_result(n) := v1(n) and v2(n);"
should have been "vector_result := v1 and v2;". The derived type
inter-conversion test had the record representation clause and length
clause commented out, which defeats the purpose of the test (although
it turns out that no vendor could have performed this test even if
the source text had been correct). A third test, the sets package,
was challenged by Data General at the time their results were
submitted. Several experts have now agreed that the test (and also
version 1.2.9 of Ada/Ed) is in error, so the test has been dropped.
The ARPAnet version of the test suite has been corrected.

One group of the tests attempted to produce serious timing results
using the package Calendar. These tests were quite interesting
because of the problems in test construction they revealed. In order
to assure adequate precision in the results, the vendors were
instructed to modify the loop counts to obtain significant net time
differences. The criterion used to determine whether the loop count
was adequate to pass these tests was based on the assumption that the
resolution of the Clock function is determined by Duration'Small.
The tests therefore compared the net time with 100 times
Duration'Small in order to be sure of at least one percent precision
in the average times. However, according to the Ada Reference Manual
(ARM), "Duration'Small need not correspond to the basic clock cycle,
the named number System.Tick" (ARM 9.6/4). Although the ARM does not
define "basic clock cycle", I interpret it to mean the resolution of
the function Calendar.Clock. The comparison should therefore have
been against 100 times the maximum of Duration'Small and System.Tick.
Since the disparity between the clock resolution and Duration'Small
may be very large (e.g., in the case of Data General it is 1.0 vs.
1/(2**9) seconds, a ratio of 512 to 1), the results of the timing
tests as written are not guaranteed to be very accurate even when the
test itself announces that it "passed". It should be emphasized that
the cause of this problem is primarily poor test design.

The major reasons that compilers did not pass some tests can be
simply stated:

1) The test was not attempted. (We speculate that this is likely to
   be due to the fact that some feature or features necessary to the
   proper functioning of the test are not implemented or have
   significant bugs.)
2) The vendor was disqualified on the test due to the use of
   unauthorized changes to the source code. (Initially, all vendors
   were disqualified on one or more tests for this reason. This was
   particularly likely to be the cause for the non-validated
   implementations, since they need work-arounds for unimplemented
   features, in order to make a program compilable. However, in a few
   cases, there was no apparent reason for the vendor to modify the
   code. In such cases we asked the vendor to re-run the test without
   the modifications.)
3) The test was run correctly, but the results did not meet the
   accuracy criterion, so the test itself indicated that it failed.
   (This was generally due to poor test design.)

The following overall comments apply to the results from each of the
vendors individually:

1) The DG implementation has an apparent inconsistency in the
   implementation of the Calendar.Clock function and the definition
   of System.Tick. The value of System.Tick is 0.1 seconds and the
   resolution of Clock is 1.0 seconds. I believe their implementation
   to be incorrect. (DG says that they are aware of this discrepancy
   and are taking steps to improve the resolution of their clock
   function to equal System.Tick.) Errors were present in the output
   format for type Duration and an apparent bug was revealed in the
   operation of division of type Duration by type Integer.
2) The ICSC compiler, which is not yet validated, currently
   implements type Calendar.Duration as a (hidden) subtype of float
   and uses the floating point output routines. This leads to an
   incorrect format for a Put of Duration values with both Fore and
   Exp set to 0.
3) The RR compiler is also not yet validated. Contrary to the
   benchmarking rules, no compilation or execution listings were
   provided by RR. Their results have been compiled from the summary
   they submitted.

How the vendors fared on each individual test is given in Table 1.

Most of the timing results reported by the vendors are summarized in
Table 2, regardless of whether the test was passed, "failed" due to
insufficient precision, or the vendor was disqualified on the test,
since these results are generally not too sensitive to the
work-arounds which may have been used. The original intent of the
tests to provide times accurate to 1% was not realized due to
problems in test design. Some of the times are only accurate to about
one significant digit. Therefore we are reporting the results in the
exact format given by the vendor, where possible, in order to avoid
biasing the data further. Interpretation of the data may be easier
with the aid of the values of the clock function resolution and
Duration'Small, which are included in the table along with their
ratio. The greater the ratio of the resolution value to
Duration'Small, the less accurate the results would be if the minimum
iteration count that met the precision criterion were used in the
test. In general, the iteration counts used by the vendors were
greater than necessary to pass the Duration'Small criterion, but not
greatly so. All times are given in seconds.

Some of the size information supplied by the vendors is summarized in
Table 3.
Because most vendors did not report all of the sizes requested, only
the size of the object module compiled for the test (the columns
labelled "Object") and the maximum memory size used (the columns
labelled "Memory") are given here. It should be noted that the DG
data include the stack/heap allocation in the size reported. All
sizes are given in (decimal) bytes.

One thing is clear about the results, and that is that all of the
timing tests need further refinement and, in some cases, drastic
surgery to improve their precision. In particular, besides using both
System.Tick and Duration'Small in checking the precision, better
strategies are needed for the measurement of some of the I/O times.

Another problem is that some of the tests were nominally "failed" for
reasons of inadequate precision because iteration counts or array
sizes greater than the maximum the implementation can support would
have been required. This is manifestly unfair when the goal of a test
is to measure timing rather than capacity. Future tests should have a
better separation of test concerns, making sure that timing tests and
capacity tests are kept distinct, and designing timing tests to run
properly on machines with small word sizes and small address spaces
wherever that is feasible.

We need to iterate the test design and trial use process until the
results are satisfactory to users and implementers alike. I believe
the current set of tests will have served their purpose, in spite of
their obvious flaws, if they help to point us in the right direction.

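The revised precision check argued for in this report (compare the
net time with 100 times the larger of Duration'Small and the clock
resolution, rather than with 100 times Duration'Small alone) can be
sketched as follows. This is only an illustration written in Python,
not Ada, and it is not taken from the test suite; the function and
parameter names are hypothetical. The DG figures are the ones quoted
in the report. On an implementation where System.Tick matches the
actual resolution of Calendar.Clock, clock_resolution would simply be
System.Tick.

```python
def required_net_time(duration_small, clock_resolution, factor=100):
    """Smallest net time (seconds) giving roughly 1/factor precision.

    Uses the coarser of Duration'Small and the clock resolution,
    as the report suggests the tests should have done.
    """
    return factor * max(duration_small, clock_resolution)


def loop_count_adequate(net_time, duration_small, clock_resolution):
    """True if the measured net time meets the revised criterion."""
    return net_time >= required_net_time(duration_small, clock_resolution)


# Figures quoted in the report for the DG implementation.
DG_DURATION_SMALL = 1.95312e-03   # Duration'Small, about 1/(2**9) s
DG_CLOCK_RESOLUTION = 1.0         # observed resolution of Calendar.Clock

# Threshold as the tests were written (Duration'Small only).
original_threshold = 100 * DG_DURATION_SMALL
# Threshold under the revised criterion.
revised_threshold = required_net_time(DG_DURATION_SMALL, DG_CLOCK_RESOLUTION)

print("original threshold (s):", original_threshold)
print("revised threshold (s): ", revised_threshold)
```

With the DG figures, a run can satisfy the original threshold while
lasting less than a single clock resolution interval, which is the
failure mode described above.
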
Test Name                    Vendor:  DG        ICSC      RR

Ackermann's Function                  A[a,b]    A[a,c]    D[d,e]
Boolean Vector And Test               I         I         I
Binary Search                         P         N         N
Cauchy Matrices - Floating Point      F[f]      N         N
Cauchy Matrices - Fixed Point         F[f]      N         N
Cauchy Matrices - Universal Numbers   F[f]      N         N
Character Direct I/O                  P[a]      P[a]      D[d,e]
Character Enumeration I/O             P[a]      P[a]      N
Character Text I/O                    P[a]      P[a]      D[d,e]
Consumer/Producer                     P         N         N
Derived Type Inter-conversion         I         I         I
Floating Point Vector Addition        P[a]      F[a,g]    D[d,e]
Friendliness Test                     P[h]      N         N
Integer Direct I/O                    P[a]      P[a]      D[d,e]
Integer Text I/O                      P[a]      P[a]      D[d,e]
Integer Vector Addition               P[a]      F[a,g]    D[d,e]
Low Level Test                        N         N         N
Procedure Call Timing                 P[a]      P[a]      D[d,e]
Quick Sort - Parallel                 P         D[d]      N
Quick Sort - Sequential               P         D[d]      N
Readers/Writers Problem               P         N         N
Rendezvous Call Timing                P[a]      P[a]      N
Sets Package                          I[i]      I         I

Legend:
   P = Passed
   A = Anomalous (Program behavior was slightly anomalous)
   F = Failed
   N = Not Attempted
   D = Disqualified
   I = Invalid Test (Test Dropped)

Notes:
   a  Output had errors in format.
   b  Output had errors in values.
   c  Stack overflow occurred after Ackermann (3,7), but
      Storage_Error was not raised or handled.
   d  Disqualified due to source code changes.
   e  No listing was provided by the vendor.
   f  Compiler passed the syntax and semantics checking phases, but
      couldn't generate correct code.
   g  Array size could not be set large enough to give adequate
      timing precision. Otherwise, program executed correctly.
   h  Compiled and executed correctly, but no set/use errors (use of
      a variable before initialization) or 'hard' exceptions
      (exceptions which will always be raised by the program) were
      detected by the compiler. Procedure Dont_Do_It was not called
      in the generated code, but was included in the load module.
      The run-time environment did not identify the name of the
      exception which is deliberately raised by the program
      (Program_Error).
   i  Compiler diagnosed source errors. Vendor successfully
      challenged the validity of the test.

                     Table 1: Overall Results


Test Name               Vendor: DG                ICSC              RR
                       Machine: DG MV8000         Gould 32/87       IBM PC-XT

Clock resolution (System.Tick)  1.0[a]            1.66667E-02       0.0549
Duration'Small                  1.95312E-03       1.66667E-02       0.01
Ratio (a pure number)           512.0             1.0               5.49

Ackermann's function:                                               3.26E-4[b]
  (3,1)                         0.00000E+00[c]    0.00000E+00[c]    --
  (3,2)                         0.00000E+00[c]    3.08059E-05[d]    --
  (3,3)                         0.00000E+00[c]    1.37056E-05[d]    --
  (3,4)                         9.70214E-05[d,f]  1.13187E-05[d]    --
  (3,5)                         2.35638E-05[d,f]  1.13887E-05[e]    --
  (3,6)                         3.48365E-05[d,f]  1.13214E-05       --
  (3,7)                         3.60249E-05[e,f]  1.15515E-05       --
  (3,8)                         3.62527E-05[f]    [g]               --
  (3,9)                         [h]               --                --
Character Direct I/O Write      1.00000E-04[d]    8.49966E-04       4.73E-3
Character Direct I/O Read       8.33333E-05[d]    5.20812E-04       3.63E-3
Character Enumer. I/O Write     1.33333E-03[e]    9.83294E-04       --
Character Enumer. I/O Read      5.60000E-03[e]    1.54994E-03       --
Character Text I/O Write        4.33333E-04[e]    4.79147E-05       1.54E-3
Character Text I/O Read         5.33333E-04[e]    9.41629E-05       1.40E-3
Float Vector Add                1.53846E-05[d]    0.00000E+00[c]    3.30E-4
Integer Direct I/O Write        2.70000E-04[e]    1.09579E-03       4.88E-3
Integer Direct I/O Read         1.10000E-04[e]    5.33312E-04       3.79E-3
Integer Text I/O Write          2.80000E-03       1.64993E-03       3.93E-3
Integer Text I/O Read           3.97500E-03       2.26658E-03       4.81E-3
Integer Vector Add              2.50000E-05[d]    3.33320E-06[d]    2.70E-4
No Parameter Call               1.50000E-05[d]    6.19975E-06       1.37E-4
In Parameter Call               1.50000E-05[d]    5.49978E-06       2.11E-4
Out Parameter Call              2.00000E-05[d]    5.89976E-06       1.77E-4
In Out Parameter Call           2.00000E-05[d]    6.06642E-06       1.77E-4
No Parameter Rendezvous         8.36666E-03       8.99964E-04       --

Notes:
   a  Clock resolution is 1.0, although System.Tick is 0.1 seconds.
   b  No individual results were provided by vendor.
   c  Net time was less than one resolution interval.
   d  Net time was at least 1 but less than 10 resolution intervals.
   e  Net time was at least 10 but less than 100 resolution
      intervals.
   f  Calculated by hand from intermediate results. (Due to a
      compiler bug the values printed were all zero.)
   g  Storage_Error exception not raised or handled. The system
      detected stack overflow and terminated the program.
   h  Terminated (as expected) by Storage_Error exception.

                     Table 2: Timing Results


Test Name               Vendor: DG[a]             ICSC[b]           RR[b]
                          Size: Object   Memory   Object   Memory   Object   Memory

Ackermann's function            --       348160   1792     75016    1540     86784
Binary Search                   --       243712   --       --       --       --
Character Direct I/O            --       251904   3392     81872    2531     90112
Character Enumeration I/O       --       251904   4104     77328    --       --
Character Text I/O              --       249856   3336     76560    2527     87680
Consumer/Producer               --       352256   --       --       --       --
Floating Point Vector Addition  --       948224   1744     74968    1467     86656
Friendliness Test               --       241664   --       --       --       --
Integer Direct I/O              --       251904   3392     81872    2528     90240
Integer Text I/O                --       249856   3352     76576    2557     87680
Integer Vector Addition         --       948224   1712     74936    1422     86656
Procedure Call Timing           --       249856   2080     75304    2083     87296
Quick Sort - Parallel           --       354304   3080     84648    --       --
Quick Sort - Sequential         --       243712   3296     84864    --       --
Readers/Writers Problem         --       356352   --       --       --       --
Rendezvous Call Timing          --       360448   1672     86824    --       --

Notes:
   a  Stack/heap storage is included in size
   b  Stack/heap storage is not included in size

-------
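
The "Ratio" row and the precision notes c, d and e of Table 2 follow
simple rules that can be sketched in a few lines. The sketch below is
in Python rather than Ada and is only an illustration under stated
assumptions: the function names are hypothetical, and the vendors'
net times are not given in this report, so the net time passed to
precision_note is a made-up example value. The resolution and
Duration'Small figures are copied from Table 2.

```python
def resolution_ratio(clock_resolution, duration_small):
    """Ratio of clock resolution to Duration'Small (a pure number)."""
    return clock_resolution / duration_small


def precision_note(net_time, clock_resolution):
    """Return the Table 2 note letter for a measured net time.

    "c": less than one resolution interval
    "d": at least 1 but less than 10 intervals
    "e": at least 10 but less than 100 intervals
    "":  100 intervals or more (no precision note)
    """
    intervals = net_time / clock_resolution
    if intervals < 1:
        return "c"
    if intervals < 10:
        return "d"
    if intervals < 100:
        return "e"
    return ""


# (clock resolution, Duration'Small) in seconds, from Table 2.
VENDORS = {
    "DG": (1.0, 1.95312e-03),
    "ICSC": (1.66667e-02, 1.66667e-02),
    "RR": (0.0549, 0.01),
}

for name, (resolution, small) in VENDORS.items():
    print(name, "ratio:", round(resolution_ratio(resolution, small), 2))

# Hypothetical net time of 5 seconds on the DG clock (1.0 s resolution).
print("note for a 5 s net time on DG:", precision_note(5.0, 1.0))
```
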