From: "Norman H. Cohen"
Subject: Re: Hotspot. Dynamic compilers "better" than static one?
Date: 1998/06/01
Message-ID: <3572FDA2.22E1@watson.ibm.com>
References: <6kocc1$d80@drn.newsguy.com> <6kpk0h$qmo$1@supernews.com>
Organization: IBM Thomas J. Watson Research Center
Reply-To: ncohen@watson.ibm.com
Newsgroups: comp.lang.java.programmer,comp.lang.ada

Roedy Green wrote:

> Here are a couple of examples where information gleaned from dynamic
> analysis could help substantially in optimisation.
>
> (1) Most machine architectures run fastest if there are no jumps, and if
> any conditional jumps don't jump, just fall through.
>
> By dynamic analysis you can determine which branch is the more likely, and
> put that one inline and branch off to code elsewhere for the less likely
> one.
>
> Over a decade ago I spent months writing screaming fast code for the
> nucleus of a 32-bit Forth compiler that hand optimised every jump this way.
> The secondary advantage is that the most commonly used code is more likely
> to be pre-fetched or in cache. A static optimiser can't do this, since it
> has no knowledge of which branch is the more likely.

...

In particular, dynamic analysis has been shown to be effective in
optimizing dynamically bound function calls in object-oriented
languages.
Suppose there is an Ada tagged type T with primitive subprogram

   procedure P(X: in out T);

and types T1, T2, T3, ... derived from T. If we know that most of the
calls on P are going to be with actual parameters of type T1, the call

   P(A);

(where A is of type T'Class) can, in effect, be transformed into

   if A'Tag = T1'Tag then
      P( T1(A) );  -- Static call on T1's version of P;
                   -- tag check for conversion can be
                   -- optimized away
   else
      P(A);        -- Dispatching call based on tag of A
   end if;

(Translation into Java: Suppose there is a class T with method p and
final subclasses T1, T2, T3, .... If we know that most of the calls on
p are going to be for objects of class T1, the call

   x.p();

(where x is of type T) can, in effect, be transformed into

   if ( x instanceof T1 ) {
      ((T1)x).p();
   } else {
      x.p();
   }

where the check for the cast to T1 can be optimized away and the call
to ((T1)x).p can be compiled as a direct method call to T1's version
of p.)

By eliminating the indirect branch (dynamic call) in the most common
case, this transformation (sometimes called "if-conversion") improves
I-cache locality and instruction prefetching. More importantly
(especially when cache effects are negligible compared to the overhead
of interpreting Java byte codes), the static call on T1's version of P
can be inlined. Besides eliminating the procedure-call overhead, the
inlining enables further optimizations, because the inlined copy of
the procedure body can be optimized with all the information available
in the calling context.

It makes no sense to perform "if-conversion" unless there is
information available indicating that more of the dispatching calls
from a given call site go to one target than to others. This
information can be gleaned from profiling or from dynamic compilation.
The Java Hotspot compiler goes even further, rechecking dispatching
frequencies as a program executes and recompiling if-conversions
favoring different target methods as the program's dynamic behavior
evolves.
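To make the Java translation above concrete, here is a minimal,
compilable sketch of the guarded call site written out by hand. The
class names T and T1 follow the text; the helper class CallSite and
its callP method are illustrative inventions, since in practice a
dynamic compiler such as Hotspot generates the equivalent machine code
itself, based on the receiver types it has observed.

```java
class T {
    int p() { return 0; }        // dynamically bound method
}

final class T1 extends T {
    int p() { return 1; }        // the common dispatch target
}

class CallSite {
    // The transformed call site: a cheap type test guards a direct
    // (and therefore inlinable) call to T1's p, with an ordinary
    // dispatching call as the fallback.
    static int callP(T x) {
        if (x instanceof T1) {
            return ((T1) x).p(); // static call; cast check can be elided
        } else {
            return x.p();        // dispatching call based on x's class
        }
    }
}
```

Because T1 is final, the cast and the call in the fast path resolve to
exactly one method, which is what makes inlining it safe.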
It has even been claimed that inlining optimizations are redone
dynamically, but I suspect that the inlining optimizations performed
dynamically are much less ambitious than those traditionally performed
statically.

Studies of OO code indicate that if-conversions can be very
worthwhile. (See, for example, the paper "Reducing Indirect Function
Call Overhead in C++ Programs" by Brad Calder and Dirk Grunwald of the
University of Colorado, in the 1994 POPL proceedings.) Indeed, often
an apparently dynamic call has only one possible target in the program
(e.g., when an Ada programmer declared a type tagged to provide
flexibility for future enhancements, but did not derive from that
type, or when a Java programmer neglected to declare a method final);
in this case the if-conversion can actually be done at link time,
without any profiling information. In other cases, a class library
with several subclasses may be linked in, but the program might
actually construct objects of only one class; this is detectable by
profiling or dynamic compilation. Of course there are also cases where
more than one target is dispatched to at run time, but the
distribution of dispatch targets is highly skewed.

--
Norman H. Cohen
mailto:ncohen@watson.ibm.com
http://www.research.ibm.com/people/n/ncohen