From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: GNAT (GCC) Profile Guided Compilation
Date: Wed, 04 Jul 2012 14:38:45 +0200
Message-ID: <4ff43956$0$6576$9b4e6d93@newsspool3.arcor-online.net>
In-Reply-To: <26b778c4-5abc-4fbf-94b0-888c2ce71831@googlegroups.com>

On 04.07.12 12:57, Keean Schupke
wrote:
> On Wednesday, 4 July 2012 11:38:57 UTC+1, Georg Bauhaus wrote:
>> On 03.07.12 01:48, Keean Schupke wrote:
>>> I have done some testing with the linux "perf" tool. These are some figures for the Ada version:
>>>
>>>       1,014,900 l1-dcache-load-misses  #  0.01% of all L1-dcache hits
>>>  12,462,973,199 l1-dcache-loads
>>>       7,311,495 cache-references
>>>          38,804 cache-misses           #  0.531 % of all cache refs
>>>   2,588,686,069 branch-instructions
>>>     388,460,030 branch-misses          # 15.01% of all branches
>>>    21.885512117 seconds time elapsed
>>>
>>> And here are the results for the C++ version:
>>>
>>>         840,245 l1-dcache-load-misses  #  0.01% of all L1-dcache hits
>>>  11,140,761,995 l1-dcache-loads
>>>       6,019,321 cache-references
>>>          27,584 cache-misses           #  0.458 % of all cache refs
>>>   3,049,597,029 branch-instructions
>>>     560,173,316 branch-misses          # 18.37% of all branches
>>>    17.823476294 seconds time elapsed
>>>
>>>
>>> So the interesting thing is that the Ada version has less overall branches and less branch misses than the C++ version, so it seems the profile-guided compilation has achieved as much. There is another factor limiting performance. The interesting figure would appear to be the cache-misses.
>>>
>>> So it would appear I need to focus on the cache utilisation of the Ada code.
>>
>> FWIW, looking at the 1D vs 2D subprograms in order to learn
>> about a (dis)advantage of writing 2D arrays, I found some
>> things potentially interesting.
>>
>> When there is no additional test in the loops,
>> Apple's Instruments shows two orders of magnitude fewer
>> branch instructions executed by the 2D subprogram
>> compared to the 1D subprogram, 5M : 2G. This seems huge to me,
>> but is reproducible. A naive look at the assembly listing offers
>> some confirmation, mentioned below, though not on the same order.
>>
>> With the "mod" based test added to the respective loops the number
>> of branch instructions executed by the 2D subprogram increases
>> to about one half of that of the 1D subprogram's. Still better.
>>
>> The assembly listing of the subprograms without tests added has
>>
>> - [compute_1d] 3 pairs of forward je and 1 backward jne near
>>   the end
>>
>> - [compute_2] 1 pair of backward jne near the end,
>>
>> It appears that unrolling yields two somewhat differently
>> structured lists of instructions, but I'm drifting away
>> from Ada.
>>
>> Compiling with profile data rearranges the jumps for 1D, adds jumps to 2D,
>> and shortens both procedures. However, this slows both down using the latest
>> GNAT GPL on Core i7; there is some speed-up of the 1D procedure with
>> Debian's GNAT 4.4.5 on Xeon E5645, though. (-O2 -funroll-loops -gnatp)
>>
>> All of this breaks once I turn on -O3.
>> Not sure whether this is a lottery or a mine field. ;-)
>>
>> Cheers,
>> Georg
>
>
> How can I turn off inlining for a function in GNAT?

Sometimes by reordering code, making sure the body hasn't been seen
when the compiler sees the call statement. Or try separate compilation.
The following arrangement appears to prevent inline expansion of Inc,
even when just the main unit is fed to gnatmake -O3 -gnatNp, so that
GNAT translates everything else automatically, using the same switches.

-fno-inline is another switch to consider. However, it appears to
be interfering with other optimizations (loop unrolling, vectorizer,
from what I can guess).

package Prevent_Inline is

   type List is array (Positive range <>) of Integer;

   procedure Inc (X : in out Integer);
   procedure Inc_All (A : in out List);

end Prevent_Inline;

with Prevent_Inline.Aux;
package body Prevent_Inline is

   procedure Inc (X : in out Integer) is
   begin
      X := X + 1;
   end Inc;

   procedure Inc_All (A : in out List) renames Prevent_Inline.Aux;

end Prevent_Inline;

procedure Prevent_Inline.Aux (A : in out List) is
begin
   for X of A loop
      Inc (X);
   end loop;
end Prevent_Inline.Aux;

with Prevent_Inline; use Prevent_Inline;
procedure Test_Prevent_Inline is
   X : List (1 .. 10);
begin
   Inc_All (X);
end Test_Prevent_Inline;
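
A further option, not part of the arrangement above, is GNAT's
implementation-defined pragma No_Inline, which asks the compiler not
to inline calls to the named subprogram. The following is only a
minimal single-file sketch: the unit name No_Inline_Sketch and its
local List type are hypothetical and chosen for illustration, and I
have not measured how the pragma interacts with the other
optimizations discussed in this thread.

```ada
with Ada.Text_IO;

procedure No_Inline_Sketch is

   type List is array (Positive range <>) of Integer;

   A : List (1 .. 10) := (others => 0);

   procedure Inc (X : in out Integer);
   --  GNAT-specific pragma (not standard Ada): request that calls
   --  to Inc are not expanded inline, whatever the -O level.
   pragma No_Inline (Inc);

   procedure Inc (X : in out Integer) is
   begin
      X := X + 1;
   end Inc;

begin
   --  Each iteration should remain a real call to Inc.
   for I in A'Range loop
      Inc (A (I));
   end loop;

   Ada.Text_IO.Put_Line (Integer'Image (A (A'First)));
end No_Inline_Sketch;
```

Whether the call really survives can be checked the same way as
before, by looking at the assembly listing (for example with -S)
for a call to Inc inside the loop.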