From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.3 required=5.0 tests=BAYES_00,INVALID_MSGID
	autolearn=no autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,ad4aec717fd8556e
X-Google-Attributes: gid103376,public
From: dewar@merv.cs.nyu.edu (Robert Dewar)
Subject: Re: 'size attribute inheritance
Date: 1997/09/06
Message-ID: <dewar.873602116@merv>#1/1
X-Deja-AN: 270401749
References: <33ECF679.4B5D@lmco.com> <dewar.871839941@merv>
 <EF4u22.715@world.std.com> <dewar.872433846@merv> <EFy4Gt.BLA@world.std.com>
Organization: New York University
Newsgroups: comp.lang.ada
Date: 1997-09-06T00:00:00+00:00
List-Id: <comp.lang.ada>


<<That part, I don't buy, and I won't buy unless I see measurements of
real code.  My wild guess is that the damage would be not-much-worse
than the damage due to any of the other range checks and whatnot defined
by the language.  Robert's wild guess is that the damage would be much
worse.  The only way to know who's right would be to implement a
compiler that does the "extra" checks, and measure the speed of typical
programs with and without those checks.>>

These are not "wild guesses", they are based on data obtained from looking
at the Alsys compiler on the x86 when I was working on that. As I recall,
introducing simple assignment subrange checks increased the penalty due to
checks by a significant factor. This is not surprising.
Particularly in the floating-point case, range checks are extremely
expensive, because they may break the pipeline, and a comparison takes
as much time as a multiply on typical machines. I am sorry, I no longer
have the exact data, because I discarded all this material when I
stopped working for Alsys.

But just think about it a bit, and I think you will see why it is a significant
extra hit. Consider a loop

   for J in ....
     S := S + A(J)*B(J);
   end loop;

Now on a typical high performance RISC machine, with a multiply add
instruction, we are seeing in the loop

two loads -- which can be scheduled if the loop is unrolled
one fused-multiply-add
one increment-and-loop operation

Now on many modern RISC machines with superscalar capability, we can
approach 1 clock to issue all these instructions, and certainly 2 clocks
is achievable on a number of architectures.

But Bob wants to add two comparison instructions (yes, you need them,
because the uninitialized value may look like a NaN or infinity).

This can easily double the number of instructions in the loop, and on
some architectures, would triple the number of instructions in the loop.

Note that the requirement for checking that actually is *in* the RM adds
virtually no overhead to this loop, but Bob's wished-for change in the RM
could double or triple the execution time of this very typical
floating-point inner loop.
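
To make the comparison concrete, here is a rough sketch of the loop above
in two forms: as written, and with two explicit comparisons added on each
assignment. The names (Vector, Dot, Dot_Checked, Lo, Hi), the choice of
Float, and the placement of the check on the value assigned to S are all
assumptions made for illustration; they are not what any compiler or the
RM specifies. The sketch also assumes both vectors share the same bounds.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Dot_Product_Checks is

   --  Hypothetical types and bounds, for illustration only.
   type Vector is array (Positive range <>) of Float;

   Lo : constant Float := Float'First;
   Hi : constant Float := Float'Last;

   X : constant Vector (1 .. 4) := (1.0, 2.0, 3.0, 4.0);
   Y : constant Vector (1 .. 4) := (4.0, 3.0, 2.0, 1.0);

   --  The loop as in the post: two loads, one multiply-add, one
   --  increment-and-loop per iteration, with no per-element check.
   function Dot (A, B : Vector) return Float is
      S : Float := 0.0;
   begin
      for J in A'Range loop
         S := S + A (J) * B (J);
      end loop;
      return S;
   end Dot;

   --  The same loop with two comparisons per iteration. Both are
   --  needed: a NaN fails every ordered comparison, and an infinity
   --  lies outside Lo .. Hi, so neither bound can be skipped.
   function Dot_Checked (A, B : Vector) return Float is
      S : Float := 0.0;
      T : Float;
   begin
      for J in A'Range loop
         T := S + A (J) * B (J);
         if not (T >= Lo and then T <= Hi) then
            raise Constraint_Error;
         end if;
         S := T;
      end loop;
      return S;
   end Dot_Checked;

begin
   --  Both calls compute the same sum; only the work per iteration differs.
   Put_Line (Float'Image (Dot (X, Y)));
   Put_Line (Float'Image (Dot_Checked (X, Y)));
end Dot_Product_Checks;
```

Counting by hand, the unchecked loop body is about four instructions, and
the checked one adds two comparisons plus whatever conditional branches
the target needs, which is where the "double or triple" estimate comes
from. Actual timings depend on the compiler and the machine, so this is
only a way to see the shape of the cost, not a measurement.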