From mboxrd@z Thu Jan  1 00:00:00 1970
From: Keean Schupke <keean.schupke@googlemail.com>
Newsgroups: comp.lang.ada
Subject: Re: GNAT (GCC) Profile Guided Compilation
Date: Wed, 4 Jul 2012 03:57:16 -0700 (PDT)
Organization: http://groups.google.com
Message-ID: <26b778c4-5abc-4fbf-94b0-888c2ce71831@googlegroups.com>
References: <dac2857a-6f74-4ecb-a5d2-f6b73fbd0ecc@googlegroups.com>
 <dd9d3648-4538-4aa2-8a0e-557bed1799b3@googlegroups.com>
 <38b9c365-a2b2-4b8b-8d2a-1ea39d08ce86@googlegroups.com>
 <d15a813f-d697-4c80-ad7c-d110382b92d7@googlegroups.com>
 <982d531a-3972-4971-b802-c7e7778b8649@googlegroups.com>
 <520bdc39-6004-4142-a227-facf14ebb0e8@googlegroups.com>
 <4ff08cb2$0$6575$9b4e6d93@newsspool3.arcor-online.net>
 <a4f2a43e-5593-48f6-9e0f-7d0057874f94@googlegroups.com>
 <4ff1d731$0$6582$9b4e6d93@newsspool3.arcor-online.net>
 <cdbe38d2-c8b0-41b2-9830-d913aefa200c@googlegroups.com>
 <fed934c8-9cff-4905-811d-9f9d3050d0b1@googlegroups.com>
 <4ff41d38$0$6577$9b4e6d93@newsspool3.arcor-online.net>
Mime-Version: 1.0
In-Reply-To: <4ff41d38$0$6577$9b4e6d93@newsspool3.arcor-online.net>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
List-Id: <comp.lang.ada>

On Wednesday, 4 July 2012 11:38:57 UTC+1, Georg Bauhaus wrote:
> On 03.07.12 01:48, Keean Schupke wrote:
> > I have done some testing with the Linux "perf" tool. These are some figures for the Ada version:
> >
> >           1,014,900 l1-dcache-load-misses     #    0.01% of all L1-dcache hits
> >      12,462,973,199 l1-dcache-loads
> >           7,311,495 cache-references
> >              38,804 cache-misses              #    0.531 % of all cache refs
> >       2,588,686,069 branch-instructions
> >         388,460,030 branch-misses             #   15.01% of all branches
> >        21.885512117 seconds time elapsed
> >
> > And here are the results for the C++ version:
> >
> >             840,245 l1-dcache-load-misses     #    0.01% of all L1-dcache hits
> >      11,140,761,995 l1-dcache-loads
> >           6,019,321 cache-references
> >              27,584 cache-misses              #    0.458 % of all cache refs
> >       3,049,597,029 branch-instructions
> >         560,173,316 branch-misses             #   18.37% of all branches
> >        17.823476294 seconds time elapsed
> >
> >
> > So the interesting thing is that the Ada version has fewer branches overall
> > and fewer branch misses than the C++ version, so it seems the profile-guided
> > compilation has achieved that much. Some other factor is limiting
> > performance, and the interesting figure would appear to be the cache-misses
> > (38,804 for Ada vs. 27,584 for C++).
> >
> > So it would appear I need to focus on the cache utilisation of the Ada code.
>
> FWIW, looking at the 1D vs 2D subprograms in order to learn
> about the (dis)advantages of writing 2D arrays, I found some
> potentially interesting things.
>
> When there is no additional test in the loops,
> Apple's Instruments shows two orders of magnitude fewer
> branch instructions executed by the 2D subprogram than
> by the 1D subprogram (5M vs. 2G). This seems huge to me,
> but it is reproducible. A naive look at the assembly listing,
> described below, offers some confirmation, though not of the
> same magnitude.
>
> With the "mod"-based test added to the respective loops, the number
> of branch instructions executed by the 2D subprogram increases
> to about half that of the 1D subprogram. Still better.
>
> The assembly listings of the subprograms without the added tests show
>
> - [compute_1d] 3 pairs of forward je and 1 backward jne near
>    the end;
>
> - [compute_2] 1 pair of backward jne near the end.
>
> It appears that unrolling yields two somewhat differently
> structured lists of instructions, but I'm drifting away
> from Ada.
>
> Compiling with profile data rearranges the jumps for 1D, adds jumps to 2D,
> and shortens both procedures. However, this slows both down with the latest
> GNAT GPL on a Core i7, though there is some speed-up of the 1D procedure with
> Debian's GNAT 4.4.5 on a Xeon E5645 (switches: -O2 -funroll-loops -gnatp).
>
> All of this breaks once I turn on -O3.
> Not sure whether this is a lottery or a minefield. ;-)
>
> Cheers,
> Georg
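
For reference, the two array shapes being compared presumably look something like the following minimal Ada sketch. Georg's actual subprograms aren't shown in this thread, so the names Compute_1D/Compute_2D, the size N and the summing loop body are hypothetical stand-ins of mine. The 1D version folds (Row, Col) into one index by hand; the 2D version leaves the address arithmetic to the compiler, which is where the different branch and unrolling structure in the assembly would come from.

```ada
with Ada.Text_IO;

procedure Array_Layout_Demo is
   --  Hypothetical size; the real benchmark's dimensions are not shown.
   N : constant := 64;

   type Flat_Array is array (0 .. N * N - 1) of Integer;
   type Grid_Array is array (0 .. N - 1, 0 .. N - 1) of Integer;

   Flat : constant Flat_Array := (others => 1);
   Grid : constant Grid_Array := (others => (others => 1));

   --  1D version: row and column are folded into one index by hand.
   function Compute_1D (A : Flat_Array) return Integer is
      Sum : Integer := 0;
   begin
      for Row in 0 .. N - 1 loop
         for Col in 0 .. N - 1 loop
            Sum := Sum + A (Row * N + Col);
         end loop;
      end loop;
      return Sum;
   end Compute_1D;

   --  2D version: the compiler computes the element address
   --  from (Row, Col) itself.
   function Compute_2D (A : Grid_Array) return Integer is
      Sum : Integer := 0;
   begin
      for Row in A'Range (1) loop
         for Col in A'Range (2) loop
            Sum := Sum + A (Row, Col);
         end loop;
      end loop;
      return Sum;
   end Compute_2D;

begin
   Ada.Text_IO.Put_Line ("1D:" & Integer'Image (Compute_1D (Flat)));
   Ada.Text_IO.Put_Line ("2D:" & Integer'Image (Compute_2D (Grid)));
end Array_Layout_Demo;
```

Both calls sum the same N * N elements, so any difference in the generated code comes from the indexing style rather than the work done.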


How can I turn off inlining for a function in GNAT?

GNAT seems to be inlining some functions automatically, even though -gnatn is not enabled and there is no pragma Inline for those functions.
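
One possible answer is GNAT's implementation-defined (non-standard) pragma No_Inline, which suppresses inlining of the named subprogram. It may not exist in the 2012-era compilers discussed here; I haven't checked which releases support it. A plausible cause of the automatic inlining is that GCC's back end inlines small functions, and functions called only once, within a unit at -O2 and above. That happens independently of -gnatn, which governs cross-unit inlining of pragma Inline subprograms. As a global fallback, -fno-inline suppresses inlining except where pragma Inline_Always requests it. The sketch below uses made-up names (No_Inline_Demo, Score).

```ada
with Ada.Text_IO;

procedure No_Inline_Demo is

   Total : Integer := 0;

   --  Hypothetical small helper that GCC might otherwise inline
   --  at -O2/-O3 even without -gnatn or pragma Inline.
   function Score (X : Integer) return Integer;

   --  GNAT-specific pragma (not standard Ada): keep Score out of line.
   pragma No_Inline (Score);

   function Score (X : Integer) return Integer is
   begin
      return X * 3 + 1;
   end Score;

begin
   for I in 1 .. 10 loop
      Total := Total + Score (I);
   end loop;
   Ada.Text_IO.Put_Line ("Total:" & Integer'Image (Total));
end No_Inline_Demo;
```

Whether the call really stays out of line can be confirmed by looking for a call to Score in the generated assembly.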


On the profile-guided compilation, I think you have to benchmark with your real application: I get a consistent improvement of 25% with C++ and 15% with Ada. I just can't work out at the moment why the Ada version is slower.
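
For the benchmarking itself, a minimal timing harness along these lines could wrap the real workload. Workload is a hypothetical stand-in, not the actual application. The assumed workflow is:

- build once with -fprofile-generate, passed to both the compile and link steps (e.g. via gnatmake's -cargs and -largs sections);
- run a representative input;
- rebuild with -fprofile-use and time the same run again.

```ada
with Ada.Real_Time; use Ada.Real_Time;
with Ada.Text_IO;

procedure Time_Workload is

   --  Hypothetical stand-in for the real application's hot loop;
   --  replace with the code you actually want to profile.
   function Workload (Iterations : Natural) return Long_Integer is
      Acc : Long_Integer := 0;
   begin
      for I in 1 .. Iterations loop
         Acc := Acc + Long_Integer (I mod 7);
      end loop;
      return Acc;
   end Workload;

   --  Declarations are elaborated in order: read the clock,
   --  run the workload, then read the clock again.
   Start   : constant Time := Clock;
   Result  : constant Long_Integer := Workload (10_000_000);
   Elapsed : constant Duration := To_Duration (Clock - Start);

begin
   --  Print the result so the optimiser cannot discard the work.
   Ada.Text_IO.Put_Line ("Result:" & Long_Integer'Image (Result));
   Ada.Text_IO.Put_Line ("Elapsed:" & Duration'Image (Elapsed) & " s");
end Time_Workload;
```

Running the same binary on the same input before and after -fprofile-use keeps the comparison about the profile data rather than the test case.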


Cheers,
Keean.