From: Keean Schupke
Newsgroups: comp.lang.ada
Subject: Re: GNAT (GCC) Profile Guided Compilation
Date: Mon, 2 Jul 2012 16:48:33 -0700 (PDT)

On Monday, 2 July 2012 18:26:58 UTC+1, Keean Schupke wrote:
> On Monday, 2 July 2012 18:15:28 UTC+1, Georg Bauhaus wrote:
> > On 02.07.12 00:57, Keean Schupke wrote:
> > > The real benefit (and performance gains) from profile guided
> > > compilation come from correcting branch prediction. As such the
> > > gains will be most apparent when there is an 'if' statement in the
> > > inner loop of the code. Try something where you are taking the sign
> > > of an int in the formula and have three cases <0 =0 >0.
> >
> > Thanks for your lucid words, I was mostly guessing at what profile
> > guided compilation might actually do. Indeed, now that I have started
> > playing with conditionals, the translations show very different effects
> > already, for variations of the procedure below,
> >
> >    procedure Compute_1D (A : in out Matrix_1D) is
> >    begin
> >       for K in A'First + Len + 1 .. A'Last - Len - 1 loop
> >          case K mod Len is
> >             when 0 | Len - 1 => null;
> >             when others =>
> >                A (K) := (A (K + 1)
> >                          + A (K - Len)
> >                          + A (K - 1)
> >                          + A (K + Len)) mod Num'Last;
> >          end case;
> >          if A (K) mod 6 = 0 then
> >             A (K) := (A (K) - 1) mod Num'Last;
> >          else
> >             A (K) := K mod Num'Last;
> >          end if;
> >       end loop;
> >    end Compute_1D;
> >
> > Ada and C++ are mostly on a par without help from a profile
> > (the 2D approach is still better in the Ada case; perhaps mod 6
> > isn't true for that many K). C++ gains 8%, Ada only 4%, though.
> >
> > Cheers,
> > Georg
>
> As it happens, the branch predictor is quite good at predicting
> regular 'mod' patterns. See:
>
> http://en.wikipedia.org/wiki/Branch_predictor
>
> and look for the section on the two-level adaptive predictor.
>
> I think Monte-Carlo techniques must be particularly sensitive to
> branch predictor error, as on each iteration the branching is
> controlled by a pseudo-random number (and we hope the branch predictor
> cannot predict that).
>
> So if for each iteration you pick a random number, and that controls
> your branch pattern in the inner loop, you should see a stronger
> effect from the profile-guided optimisation.
>
> Cheers,
> Keean.
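To make the quoted point concrete, a loop of roughly this shape (an
untested sketch; the unit name and iteration count are made up) gives the
predictor a branch it cannot learn, because each decision depends on a
fresh pseudo-random draw:

   with Ada.Text_IO;
   with Ada.Numerics.Float_Random;

   procedure Random_Branch_Demo is
      use Ada.Numerics.Float_Random;
      G     : Generator;
      Total : Long_Integer := 0;
   begin
      Reset (G);
      for I in 1 .. 10_000_000 loop
         --  The branch direction changes with every pseudo-random draw,
         --  so a two-level adaptive predictor has no pattern to learn.
         if Random (G) < 0.5 then
            Total := Total + 1;
         else
            Total := Total - 1;
         end if;
      end loop;
      Ada.Text_IO.Put_Line (Long_Integer'Image (Total));
   end Random_Branch_Demo;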
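For reference, the GCC profile-guided build is a two-pass affair. With
gnatmake it looks roughly like the following (a sketch only: compute.adb
stands in for the main unit, and gnatmake passes switches it does not
itself recognise, such as -fprofile-generate/-fprofile-use, straight
through to the gcc back end):

   # pass 1: build an instrumented binary and do a training run
   gnatmake -O3 -fprofile-generate compute.adb -largs -fprofile-generate
   ./compute                 # writes *.gcda profile data

   # pass 2: force a rebuild that uses the recorded profile
   gnatmake -f -O3 -fprofile-use compute.adb

The link step needs -fprofile-generate as well, so that the profiling
run-time support gets pulled in.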
I have done some testing with the Linux "perf" tool. These are some
figures for the Ada version:

         1,014,900  l1-dcache-load-misses  #    0.01% of all L1-dcache hits
    12,462,973,199  l1-dcache-loads
         7,311,495  cache-references
            38,804  cache-misses           #    0.531 % of all cache refs
     2,588,686,069  branch-instructions
       388,460,030  branch-misses          #   15.01% of all branches

      21.885512117  seconds time elapsed

And here are the results for the C++ version:

           840,245  l1-dcache-load-misses  #    0.01% of all L1-dcache hits
    11,140,761,995  l1-dcache-loads
         6,019,321  cache-references
            27,584  cache-misses           #    0.458 % of all cache refs
     3,049,597,029  branch-instructions
       560,173,316  branch-misses          #   18.37% of all branches

      17.823476294  seconds time elapsed

So the interesting thing is that the Ada version executes fewer branches
overall and suffers fewer branch misses than the C++ version, so the
profile-guided compilation seems to have achieved as much as it can there;
some other factor is limiting performance.

The telling figure would appear to be the cache-misses: the Ada version
takes more of them, both in absolute terms and as a fraction of cache
references. So it would appear I need to focus on the cache utilisation
of the Ada code.

Cheers,
Keean.
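P.S. For anyone who wants to collect the same counters, a perf stat
invocation along these lines should do it (the exact event names can
differ between kernels, and ./compute stands in for whichever benchmark
binary you built):

   perf stat -e branch-instructions,branch-misses \
             -e cache-references,cache-misses \
             -e L1-dcache-loads,L1-dcache-load-misses \
             ./compute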