From: Keean Schupke
Newsgroups: comp.lang.ada
Subject: Re: GNAT (GCC) Profile Guided Compilation
Date: Sat, 14 Jul 2012 13:33:28 -0700 (PDT)

On Saturday, 14 July 2012 21:17:27
UTC+1, Keean Schupke wrote:
> On Wednesday, 4 July 2012 13:38:45 UTC+1, Georg Bauhaus wrote:
> > On 04.07.12 12:57, Keean Schupke wrote:
> > > On Wednesday, 4 July 2012 11:38:57 UTC+1, Georg Bauhaus wrote:
> > >> On 03.07.12 01:48, Keean Schupke wrote:
> > >>> I have done some testing with the linux "perf" tool. These are some figures for the Ada version:
> > >>>
> > >>>       1,014,900 l1-dcache-load-misses   #  0.01% of all L1-dcache hits
> > >>>  12,462,973,199 l1-dcache-loads
> > >>>       7,311,495 cache-references
> > >>>          38,804 cache-misses            #  0.531 % of all cache refs
> > >>>   2,588,686,069 branch-instructions
> > >>>     388,460,030 branch-misses           # 15.01% of all branches
> > >>>    21.885512117 seconds time elapsed
> > >>>
> > >>> And here are the results for the C++ version:
> > >>>
> > >>>         840,245 l1-dcache-load-misses   #  0.01% of all L1-dcache hits
> > >>>  11,140,761,995 l1-dcache-loads
> > >>>       6,019,321 cache-references
> > >>>          27,584 cache-misses            #  0.458 % of all cache refs
> > >>>   3,049,597,029 branch-instructions
> > >>>     560,173,316 branch-misses           # 18.37% of all branches
> > >>>    17.823476294 seconds time elapsed
> > >>>
> > >>>
> > >>> So the interesting thing is that the Ada version has less overall branches and less branch misses than the C++ version, so it seems the profile-guided compilation has achieved as much. There is another factor limiting performance. The interesting figure would appear to be the cache-misses.
> > >>>
> > >>> So it would appear I need to focus on the cache utilisation of the Ada code.
> > >>
> > >> FWIW, looking at the 1D vs 2D subprograms in order to learn
> > >> about a (dis)advantage of writing 2D arrays, I found some
> > >> things potentially interesting.
> > >>
> > >> When there is no additional test in the loops,
> > >> Apple's Instruments shows two orders of magnitude fewer
> > >> branch instructions executed by the 2D subprogram
> > >> compared to the 1D subprogram, 5M : 2G. This seems huge to me,
> > >> but is reproducible. A naive look at the assembly listing offers
> > >> some confirmation, mentioned below, though not on the same order.
> > >>
> > >> With the "mod" based test added to the respective loops the number
> > >> of branch instructions executed by the 2D subprogram increases
> > >> to about one half of that of the 1D subprogram's. Still better.
> > >>
> > >> The assembly listing of the subprograms without tests added has
> > >>
> > >> - [compute_1d] 3 pairs of forward je and 1 backward jne near
> > >> the end
> > >>
> > >> - [compute_2] 1 pair of backward jne near the end,
> > >>
> > >> It appears that unrolling yields two somewhat differently
> > >> structured lists of instructions, but I'm drifting away
> > >> from Ada.
> > >>
> > >> Compiling with profile data rearranges the jumps for 1D, adds jumps to 2D,
> > >> and shortens both procedures. However, this slows both down using the latest
> > >> GNAT GPL on Core i7; there is some speed-up of the 1D procedure with
> > >> Debian's GNAT 4.4.5 on Xeon E5645, though. (-O2 -funroll-loops -gnatp)
> > >>
> > >> All of this breaks once I turn on -O3.
> > >> Not sure whether this is a lottery or a mine field. ;-)
> > >>
> > >> Cheers,
> > >> Georg
> > >
> > >
> > > How can I turn off inlining for a function in GNAT?
> >
> > Sometimes by reordering code, making sure the body hasn't
> > been seen when the compiler sees the call statement.
> > Or try separate compilation.
> > The following arrangement
> > appears to prevent inline expansion of Inc, even when
> > just the main unit is fed to gnatmake -O3 -gnatNp, so that
> > GNAT translates everything else automatically, using the
> > same switches.
> >
> > -fno-inline is another switch to consider. However, it
> > appears to be interfering with other optimizations (loop
> > unrolling, vectorizer, from what I can guess).
> >
> > package Prevent_Inline is
> >    type List is array (Positive range <>) of Integer;
> >    procedure Inc (X : in out Integer);
> >    procedure Inc_All (A : in out List);
> > end Prevent_Inline;
> >
> > with Prevent_Inline.Aux;
> > package body Prevent_Inline is
> >
> >    procedure Inc (X : in out Integer) is
> >    begin
> >       X := X + 1;
> >    end Inc;
> >
> >    procedure Inc_All (A : in out List)
> >      renames Prevent_Inline.Aux;
> >
> > end Prevent_Inline;
> >
> > procedure Prevent_Inline.Aux (A : in out List) is
> > begin
> >    for X of A loop
> >       Inc (X);
> >    end loop;
> > end Prevent_Inline.Aux;
> >
> > with Prevent_Inline; use Prevent_Inline;
> > procedure Test_Prevent_Inline is
> >    X : List (1 .. 10);
> > begin
> >    Inc_All (X);
> > end Test_Prevent_Inline;
>
> Okay, I think I have tracked down the performance problem, but I am not sure how to fix it. It would appear that C++ code that returns a boolean from a function, generates a decision tree using tests and branches, whereas Ada is setting the result into a Boolean variable. This has the result that C++ is bailing out of the evaluation as soon as it can (IE if one side of an and is false, or one side of an or is true), but Ada is always evaluating all parts of the expressions.
>
> Is this a difference in language semantics, and what is the best way to deal with it? Do I need to rewrite all 'and' and 'or' statements in conditionals as nested if statements to get the evaluate only as far as necessary semantics like C/C++?
>
>
> Cheers,
> Keean.
Just answering my own question: it looks like I should be using "and then" and "or else" instead of "and" and "or" in Boolean expressions.

Cheers,
Keean.
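For anyone landing on this message later, here is a minimal sketch of the difference, not taken from the program discussed in this thread. The names Short_Circuit_Demo, Checked, Report and Calls are made up for illustration. Checked counts how often it is called, so the number of operand evaluations becomes visible.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Short_Circuit_Demo is

   Calls : Natural := 0;  --  number of operand evaluations so far

   --  Hypothetical operand with a visible side effect, so that we can
   --  tell whether it was evaluated at all.
   function Checked (Value : Boolean) return Boolean is
   begin
      Calls := Calls + 1;
      return Value;
   end Checked;

   --  Prints the result and the evaluation count, then resets the count.
   procedure Report (Form : String; Result : Boolean) is
   begin
      Put_Line (Form & ": result " & Boolean'Image (Result)
                & "," & Natural'Image (Calls) & " operand evaluation(s)");
      Calls := 0;
   end Report;

begin
   --  "and" is an ordinary operator: both operands are evaluated.
   Report ("and     ", Checked (False) and Checked (True));

   --  "and then" evaluates the right operand only if the left is True.
   Report ("and then", Checked (False) and then Checked (True));

   --  "or else" evaluates the right operand only if the left is False.
   Report ("or else ", Checked (True) or else Checked (False));
end Short_Circuit_Demo;
```

Built with gnatmake, this should report two operand evaluations for plain "and" and one each for the short-circuit forms, which corresponds to what "&&" and "||" do in C and C++. Whether switching forms recovers the timing difference in a given program is something to measure rather than assume.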