From: Keean Schupke
Newsgroups: comp.lang.ada
Subject: Re: GNAT (GCC) Profile Guided Compilation
Date: Wed, 4 Jul 2012 03:57:16 -0700 (PDT)
Message-ID: <26b778c4-5abc-4fbf-94b0-888c2ce71831@googlegroups.com>
In-Reply-To: <4ff41d38$0$6577$9b4e6d93@newsspool3.arcor-online.net>

On Wednesday, 4 July 2012 11:38:57 UTC+1, Georg Bauhaus wrote:
> On 03.07.12 01:48, Keean Schupke wrote:
> > I have done some testing with the linux
> > "perf" tool. These are some figures for the Ada version:
> >
> >       1,014,900 l1-dcache-load-misses   #  0.01% of all L1-dcache hits
> >  12,462,973,199 l1-dcache-loads
> >       7,311,495 cache-references
> >          38,804 cache-misses            #  0.531 % of all cache refs
> >   2,588,686,069 branch-instructions
> >     388,460,030 branch-misses           # 15.01% of all branches
> >    21.885512117 seconds time elapsed
> >
> > And here are the results for the C++ version:
> >
> >         840,245 l1-dcache-load-misses   #  0.01% of all L1-dcache hits
> >  11,140,761,995 l1-dcache-loads
> >       6,019,321 cache-references
> >          27,584 cache-misses            #  0.458 % of all cache refs
> >   3,049,597,029 branch-instructions
> >     560,173,316 branch-misses           # 18.37% of all branches
> >    17.823476294 seconds time elapsed
> >
> >
> > So the interesting thing is that the Ada version has fewer branches overall and fewer branch misses than the C++ version, so it seems the profile-guided compilation has achieved as much. There is another factor limiting performance. The interesting figure would appear to be the cache-misses.
> >
> > So it would appear I need to focus on the cache utilisation of the Ada code.
>
> FWIW, looking at the 1D vs 2D subprograms in order to learn
> about a (dis)advantage of writing 2D arrays, I found some
> things potentially interesting.
>
> When there is no additional test in the loops,
> Apple's Instruments shows two orders of magnitude fewer
> branch instructions executed by the 2D subprogram
> compared to the 1D subprogram, 5M : 2G. This seems huge to me,
> but is reproducible. A naive look at the assembly listing offers
> some confirmation, mentioned below, though not on the same order.
>
> With the "mod" based test added to the respective loops the number
> of branch instructions executed by the 2D subprogram increases
> to about one half of the 1D subprogram's. Still better.
>
> The assembly listing of the subprograms without tests added has
>
> - [compute_1d] 3 pairs of forward je and 1 backward jne near
>   the end
>
> - [compute_2] 1 pair of backward jne near the end.
>
> It appears that unrolling yields two somewhat differently
> structured lists of instructions, but I'm drifting away
> from Ada.
>
> Compiling with profile data rearranges the jumps for 1D, adds jumps to 2D,
> and shortens both procedures. However, this slows both down using the latest
> GNAT GPL on Core i7; there is some speed-up of the 1D procedure with
> Debian's GNAT 4.4.5 on Xeon E5645, though. (-O2 -funroll-loops -gnatp)
>
> All of this breaks once I turn on -O3.
> Not sure whether this is a lottery or a mine field. ;-)
>
> Cheers,
> Georg

How can I turn off inlining for a function in GNAT? GNAT seems to be automatically inlining some functions even though -gnatn is not enabled and the functions have no pragma Inline.

On the profile-guided stuff, I think you have to benchmark with your real application. I get a consistent improvement of 25% with C++ and 15% for Ada. I just can't work out at the moment why the Ada is slower.

Cheers,
Keean.