From: Keean Schupke
Newsgroups: comp.lang.ada
Subject: Re: GNAT (GCC) Profile Guided Compilation
Date: Sat, 14 Jul 2012 13:17:27 -0700 (PDT)
Organization: http://groups.google.com
Message-ID: <2dba1140-4f28-4fb8-ace4-2c10f3a02313@googlegroups.com>
References: <38b9c365-a2b2-4b8b-8d2a-1ea39d08ce86@googlegroups.com>
 <982d531a-3972-4971-b802-c7e7778b8649@googlegroups.com>
 <520bdc39-6004-4142-a227-facf14ebb0e8@googlegroups.com>
 <4ff08cb2$0$6575$9b4e6d93@newsspool3.arcor-online.net>
 <4ff1d731$0$6582$9b4e6d93@newsspool3.arcor-online.net>
 <4ff41d38$0$6577$9b4e6d93@newsspool3.arcor-online.net>
 <26b778c4-5abc-4fbf-94b0-888c2ce71831@googlegroups.com>
 <4ff43956$0$6576$9b4e6d93@newsspool3.arcor-online.net>
In-Reply-To: <4ff43956$0$6576$9b4e6d93@newsspool3.arcor-online.net>

On Wednesday, 4 July 2012 13:38:45
UTC+1, Georg Bauhaus wrote:
> On 04.07.12 12:57, Keean Schupke wrote:
> > On Wednesday, 4 July 2012 11:38:57 UTC+1, Georg Bauhaus wrote:
> >> On 03.07.12 01:48, Keean Schupke wrote:
> >>> I have done some testing with the linux "perf" tool. These are some figures for the Ada version:
> >>>
> >>>      1,014,900 l1-dcache-load-misses   #  0.01% of all L1-dcache hits
> >>> 12,462,973,199 l1-dcache-loads
> >>>      7,311,495 cache-references
> >>>         38,804 cache-misses            #  0.531 % of all cache refs
> >>>  2,588,686,069 branch-instructions
> >>>    388,460,030 branch-misses           #  15.01% of all branches
> >>>   21.885512117 seconds time elapsed
> >>>
> >>> And here are the results for the C++ version:
> >>>
> >>>        840,245 l1-dcache-load-misses   #  0.01% of all L1-dcache hits
> >>> 11,140,761,995 l1-dcache-loads
> >>>      6,019,321 cache-references
> >>>         27,584 cache-misses            #  0.458 % of all cache refs
> >>>  3,049,597,029 branch-instructions
> >>>    560,173,316 branch-misses           #  18.37% of all branches
> >>>   17.823476294 seconds time elapsed
> >>>
> >>>
> >>> So the interesting thing is that the Ada version has less overall branches and less branch misses than the C++ version, so it seems the profile-guided compilation has achieved as much. There is another factor limiting performance. The interesting figure would appear to be the cache-misses.
> >>>
> >>> So it would appear I need to focus on the cache utilisation of the Ada code.
> >>
> >> FWIW, looking at the 1D vs 2D subprograms in order to learn
> >> about a (dis)advantage of writing 2D arrays, I found some
> >> things potentially interesting.
> >>
> >> When there is no additional test in the loops,
> >> Apple's Instruments shows two orders of magnitude fewer
> >> branch instructions executed by the 2D subprogram
> >> compared to the 1D subprogram, 5M : 2G. This seems huge to me,
> >> but is reproducible. A naive look at the assembly listing offers
> >> some confirmation, mentioned below, though not on the same order.
> >>
> >> With the "mod" based test added to the respective loops the number
> >> of branch instructions executed by the 2D subprogram increases
> >> to about one half of that of the 1D subprogram's. Still better.
> >>
> >> The assembly listing of the subprograms without tests added has
> >>
> >> - [compute_1d] 3 pairs of forward je and 1 backward jne near
> >>   the end
> >>
> >> - [compute_2] 1 pair of backward jne near the end,
> >>
> >> It appears that unrolling yields two somewhat differently
> >> structured lists of instructions, but I'm drifting away
> >> from Ada.
> >>
> >> Compiling with profile data rearranges the jumps for 1D, adds jumps to 2D,
> >> and shortens both procedures. However, this slows both down using the latest
> >> GNAT GPL on Core i7; there is some speed-up of the 1D procedure with
> >> Debian's GNAT 4.4.5 on Xeon E5645, though. (-O2 -funroll-loops -gnatp)
> >>
> >> All of this breaks once I turn on -O3.
> >> Not sure whether this is a lottery or a mine field. ;-)
> >>
> >> Cheers,
> >> Georg
> >
> >
> > How can I turn off inlining for a function in GNAT?
>
> Sometimes by reordering code, making sure the body hasn't
> been seen when the compiler sees the call statement.
> Or try separate compilation. The following arrangement
> appears to prevent inline expansion of Inc, even when
> just the main unit is fed to gnatmake -O3 -gnatNp, so that
> GNAT translates everything else automatically, using the
> same switches.
>
> -fno-inline is another switch to consider. However, it
> appears to be interfering with other optimizations (loop
> unrolling, vectorizer, from what I can guess).
>
> package Prevent_Inline is
>    type List is array (Positive range <>) of Integer;
>    procedure Inc (X : in out Integer);
>    procedure Inc_All (A : in out List);
> end Prevent_Inline;
>
> with Prevent_Inline.Aux;
> package body Prevent_Inline is
>
>    procedure Inc (X : in out Integer) is
>    begin
>       X := X + 1;
>    end Inc;
>
>    procedure Inc_All (A : in out List)
>       renames Prevent_Inline.Aux;
>
> end Prevent_Inline;
>
> procedure Prevent_Inline.Aux (A : in out List) is
> begin
>    for X of A loop
>       Inc (X);
>    end loop;
> end Prevent_Inline.Aux;
>
> with Prevent_Inline; use Prevent_Inline;
> procedure Test_Prevent_Inline is
>    X : List (1 .. 10);
> begin
>    Inc_All (X);
> end Test_Prevent_Inline;

Okay, I think I have tracked down the performance problem, but I am not sure how to fix it. It appears that C++ code returning a boolean from a function generates a decision tree of tests and branches, whereas Ada computes the whole result into a Boolean variable. As a result, C++ bails out of the evaluation as soon as it can (i.e. if one side of an 'and' is false, or one side of an 'or' is true), but Ada always evaluates all parts of the expression.

Is this a difference in language semantics, and what is the best way to deal with it? Do I need to rewrite all 'and' and 'or' expressions in conditionals as nested if statements to get the evaluate-only-as-far-as-necessary semantics of C/C++?

Cheers,
Keean.
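
For reference, here is a minimal sketch of the evaluation difference described above. It is not taken from the program under test: the names Short_Circuit_Demo, Check, Calls and Report are made up for illustration. It counts how many operands get evaluated by the plain "and" operator and by Ada's short-circuit control forms "and then" and "or else", which are the nearest counterparts of && and || in C/C++.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Short_Circuit_Demo is

   --  Counts how many operands were evaluated (illustration only).
   Calls : Natural := 0;

   --  Hypothetical operand with a visible side effect, so that the
   --  compiler cannot silently drop its evaluation.
   function Check (Value : Boolean) return Boolean is
   begin
      Calls := Calls + 1;
      return Value;
   end Check;

   --  Prints the count for the expression just evaluated as the
   --  actual parameter, then resets the counter.
   procedure Report (Label : String; Result : Boolean) is
   begin
      Put_Line (Label & Natural'Image (Calls)
                & " operand(s) evaluated, result "
                & Boolean'Image (Result));
      Calls := 0;
   end Report;

begin
   --  Plain "and": both operands are evaluated.
   Report ("and      :", Check (False) and Check (True));

   --  "and then": the right operand is skipped when the left is False.
   Report ("and then :", Check (False) and then Check (True));

   --  "or else": the right operand is skipped when the left is True.
   Report ("or else  :", Check (True) or else Check (False));
end Short_Circuit_Demo;
```

If the short-circuit forms are what is wanted, replacing "and" with "and then" and "or" with "or else" in the affected conditions is probably a smaller change than rewriting them as nested if statements. Whether that closes the gap to the C++ timings would still have to be measured.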