From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4 X-Google-Language: ENGLISH,ASCII-7-bit X-Google-Thread: 103376,edfa88b682578b7 X-Google-Attributes: gid103376,public X-Google-ArrivalTime: 2003-07-27 06:39:51 PST Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!fu-berlin.de!uni-berlin.de!82-43-33-75.cable.ubr01.croy.blueyonder.co.UK!not-for-mail From: "Nick Roberts" Newsgroups: comp.lang.ada Subject: Loop Optimisation [was: Overhead for type conversions?] Date: Sun, 27 Jul 2003 14:41:48 +0100 Message-ID: References: NNTP-Posting-Host: 82-43-33-75.cable.ubr01.croy.blueyonder.co.uk (82.43.33.75) X-Trace: news.uni-berlin.de 1059313189 20225551 82.43.33.75 (16 [25716]) X-Priority: 3 X-MSMail-Priority: Normal X-Newsreader: Microsoft Outlook Express 5.50.4807.1700 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700 Xref: archiver1.google.com comp.lang.ada:40864 Date: 2003-07-27T14:41:48+01:00 List-Id: "Bobby D. Bryant" wrote in message news:pan.2003.07.26.23.39.45.556296@mail.utexas.edu... > On Sat, 26 Jul 2003 12:01:43 -0400, Warren W. Gay VE3WWG wrote: > ... > > I once suggested something like this to Simon Wright regarding > > some procedure/function calls within the Booch Components. He > > did some profiling, and discovered that inlining that I suggested > > actually worsened the performance slightly under Linux. I don't > > know if he did more investigations along this line, but for > > the examples that I suggested, it actually made things worse > > (due to instruction caching reasons I expect). > > > > So inlining should probably be tested before the conclusion is > > made. [...] In order to illustrate this, perhaps bizarre-seeming, phenomenon, consider the following loop: loop ... if C then P(...); else Q(...); end if; ... end loop; Typical modern processors have a very limited (second level) instruction cache (a few kilobytes), with limited (or no) associativity. Let's assume that the code for this loop, where the call (in the Ada sense) of procedure P is expanded to call instructions (in the machine code sense), all fits into the instruction cache. Let's also assume that P will tend to be called less frequently than Q. If procedure P is inlined, the call to procedure P is replaced by the code representing P's body; this could cause the size of the loop code to increase so much that it causes the overall loop to become bigger than the instruction cache. In this case, the slowing down caused by having to retrieve some of the instructions in the loop from the first-level cache on each iteration of the loop could outweigh the speeding up achieved by eliminating the instructions associated with the machine call to and return from P. However, look at a possible skeleton assembly expansion of the loop: L1: ... [test C, truth in flag x] JNx L2 [call or expand P] JMP L3 L2: [Q] L3: ... L4: ;end of loop JMP means 'jump'. JNx means 'jump if not x'. A clever compiler can (and will!) change this expansion to the following: L1: ... [test C, truth in flag x] Jx L2 [Q] L3: ... L4: ;end of loop ... L2: [call or expand P] JMP L3 In other words, it pushes the code associated with the less frequently executed branch of the 'if' statement outside the loop, precisely so as to avoid the cache overflow problem. This is a classic form of loop optimisation in the literature. I'd be very interested to know how many Ada compilers are capable of performing it in practice. The much-touted modern technique for determining which branch of an if-then-else is least often executed is profiling. Personally, my preference would be for the compiler to assume the 'else' part is more often executed by default, and to provide a pragma to instruct the compiler otherwise. I might propose a 'straw man' for such a pragma as follows: ---------- Syntax pragma Principal_Path; Static Semantics The effect defined by this standard applies only when at most one pragma Principal_Path occurs within an if statement which contains more than one sequence of statements. In all other cases, the effect of pragma Principal_Path is implementation-defined. Pragma Principal_Path advises the implementation that optimization is to be based on the assumption that the sequence of statements in which it occurs will be executed more frequently than any other sequences of statements in the same if statement. In the absence of any pragma Principal_Path within an if statement which contains more than one sequence of statements, the implementation should assume that the last sequence of statements in the if statement will be executed more frequently than any of the other sequences of statements. An implementation should always assume, for any if statement, that any sequence of statements within the if statement will be executed less frequently than the if statement itself. Implementatation Permissions An implementation may ignore pragma Principal_Path; in particular, it may override the effect of this pragma (for example, on the basis of a profiling analysis). An implementation may provide more sophisticated ways for the programmer to express relative path importance -- for example by ranking paths numerically -- possibly by defining the effect of pragma Principal_Path in places where this standard does not, or by defining parameters for pragma Principal_Path, or by some other means. Such extensions should be documented. ---------- I'd also suggest a pragma for the parallelisation of 'for' loops. Maybe in another post. -- Nick Roberts Jabber: debater@charente.de [ICQ: 159718630]