From: "Norman H. Cohen"
Subject: Re: Hotspot. Dynamic compilers "better" than static one?
Date: 1998/06/01
Message-ID: <3572FDA2.22E1@watson.ibm.com>
References: <6kocc1$d80@drn.newsguy.com> <6kpk0h$qmo$1@supernews.com>
Organization: IBM Thomas J. Watson Research Center
Reply-To: ncohen@watson.ibm.com
Newsgroups: comp.lang.java.programmer,comp.lang.ada

Roedy Green wrote:

> Here are a couple of examples where information gleaned from dynamic
> analysis could help substantially in optimisation.
>
> (1) Most machine architectures run fastest if there are no jumps, and if
> any conditional jumps don't jump, just fall through.
>
> By dynamic analysis you can determine which branch is the more likely, and
> put that one inline and branch off to code elsewhere for the less likely
> one.
>
> Over a decade ago I spent months writing screaming fast code for the
> nucleus of a 32-bit Forth compiler that hand optimised every jump this way.
> The secondary advantage is that the most commonly used code is more likely
> to be pre-fetched or in cache. A static optimiser can't do this, since it
> has no knowledge of which branch is the more likely.

...

In particular, dynamic analysis has been shown to be effective in
optimizing dynamically bound function calls in object-oriented
languages.
Suppose there is an Ada tagged type T with primitive subprogram

   procedure P(X: in out T);

and types T1, T2, T3, ... derived from T. If we know that most of the
calls on P are going to be with actual parameters of type T1, the call

   P(A);

(where A is of type T'Class) can, in effect, be transformed into

   if A'Tag = T1'Tag then
      P( T1(A) );  -- Static call on T1's version of P;
                   -- tag check for conversion can be
                   -- optimized away
   else
      P(A);        -- Dispatching call based on tag of A
   end if;

(Translation into Java: Suppose there is a class T with method p and
final subclasses T1, T2, T3, .... If we know that most of the calls on
p are going to be for objects of class T1, the call

   x.p();

(where x is of type T) can, in effect, be transformed into

   if ( x instanceof T1 ) {
      ((T1)x).p();
   } else {
      x.p();
   }

where the check for the cast to T1 can be optimized away and the call
to ((T1)x).p can be compiled as a direct method call to T1's version
of p.)

By eliminating the indirect branch (dynamic call) in the most common
case, this transformation (sometimes called "if-conversion") improves
I-cache locality and instruction prefetching. More importantly
(especially when cache effects are negligible compared to the overhead
of interpreting Java byte codes), the static call on T1's version of P
can be inlined. Besides eliminating the procedure-call overhead, the
inlining enables further optimizations, because the inlined copy of
the procedure body can be optimized with all the information available
in the calling context.

It makes no sense to perform "if-conversion" unless there is
information available indicating that more of the dispatching calls
from a given call site go to one target than to others. This
information can be gleaned from profiling or from dynamic compilation.
The Java Hotspot compiler goes even further, rechecking dispatching
frequencies as a program executes and recompiling if-conversions
favoring different target methods as the program's dynamic behavior
evolves.
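To make the Java translation above concrete, here is a minimal,
compilable sketch of the guarded call site written out by hand. The
class names T and T1 follow the text; the helper class CallSite and
its callP method are illustrative inventions, since in practice a
dynamic compiler such as Hotspot generates the equivalent machine code
itself, based on the receiver types it has observed.

```java
class T {
    int p() { return 0; }        // dynamically bound method
}

final class T1 extends T {
    int p() { return 1; }        // the common dispatch target
}

class CallSite {
    // The transformed call site: a cheap type test guards a direct
    // (and therefore inlinable) call to T1's p, with an ordinary
    // dispatching call as the fallback.
    static int callP(T x) {
        if (x instanceof T1) {
            return ((T1) x).p(); // static call; cast check can be elided
        } else {
            return x.p();        // dispatching call based on x's class
        }
    }
}
```

Because T1 is final, the cast and the call in the fast path resolve to
exactly one method, which is what makes inlining it safe.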
It has even been claimed that inlining optimizations are redone
dynamically, but I suspect that the inlining optimizations performed
dynamically are much less ambitious than those traditionally performed
statically.

Studies of OO code indicate that if-conversions can be very
worthwhile. (See, for example, the paper "Reducing Indirect Function
Call Overhead in C++ Programs" by Brad Calder and Dirk Grunwald of the
University of Colorado, in the 1994 POPL proceedings.) Indeed, often
an apparently dynamic call has only one possible target in the program
(e.g., when an Ada programmer declared a type tagged to provide
flexibility for future enhancements, but did not derive from that
type, or when a Java programmer neglected to declare a method final);
in this case the if-conversion can actually be done at link time,
without any profiling information. In other cases, a class library
with several subclasses may be linked in, but the program might
actually construct objects of only one class; this is detectable by
profiling or dynamic compilation. Of course there are also cases where
more than one target is dispatched to at run time, but the
distribution of dispatch targets is highly skewed.

--
Norman H. Cohen
mailto:ncohen@watson.ibm.com
http://www.research.ibm.com/people/n/ncohen