From: Keean Schupke
Newsgroups: comp.lang.ada
Subject: Re: GNAT (GCC) Profile Guided Compilation
Date: Mon, 2 Jul 2012 16:48:33 -0700 (PDT)

On Monday, 2 July 2012 18:26:58 UTC+1, Keean Schupke wrote:
> On Monday, 2 July 2012 18:15:28 UTC+1, Georg Bauhaus wrote:
> > On 02.07.12 00:57, Keean Schupke wrote:
> > > The real benefit (and performance gains) from profile guided
> > > compilation come from correcting branch prediction. As such the
> > > gains will be most apparent when there is an 'if' statement in the
> > > inner loop of the code. Try something where you are taking the sign
> > > of an int in the formula and have three cases <0 =0 >0.
> >
> > Thanks for your lucid words, I was mostly guessing at what profile
> > guided compilation might actually do. Indeed, now that I have started
> > playing with conditionals, the translations show very different effects
> > already, for variations of the procedure below,
> >
> >    procedure Compute_1D (A : in out Matrix_1D) is
> >    begin
> >       for K in A'First + Len + 1 .. A'Last - Len - 1 loop
> >          case K mod Len is
> >             when 0 | Len - 1 => null;
> >             when others =>
> >                A (K) := (A (K + 1)
> >                          + A (K - Len)
> >                          + A (K - 1)
> >                          + A (K + Len)) mod Num'Last;
> >          end case;
> >          if A (K) mod 6 = 0 then
> >             A (K) := (A (K) - 1) mod Num'Last;
> >          else
> >             A (K) := K mod Num'Last;
> >          end if;
> >       end loop;
> >    end Compute_1D;
> >
> > Ada and C++ are mostly on a par without help from a profile
> > (the 2D approach is still better in the Ada case; perhaps mod 6
> > isn't true for that many K). C++ gains 8%, Ada only 4%, though.
> >
> > Cheers,
> > Georg
>
> As it happens, the branch predictor is quite good at predicting
> regular 'mod' patterns. See:
>
> http://en.wikipedia.org/wiki/Branch_predictor
>
> and look for the section on the two-level adaptive predictor.
>
> I think Monte-Carlo techniques must be particularly sensitive to
> branch predictor error, as on each iteration the branching is
> controlled by a pseudo-random number (and we hope the branch predictor
> cannot predict that).
>
> So if for each iteration you pick a random number, and that controls
> your branch pattern in the inner loop, you should see a stronger
> effect from the profile-guided optimisation.
>
> Cheers,
> Keean.
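To make the quoted point concrete, a loop of roughly this shape (an
untested sketch; the unit name and iteration count are made up) gives the
predictor a branch it cannot learn, because each decision depends on a
fresh pseudo-random draw:

   with Ada.Text_IO;
   with Ada.Numerics.Float_Random;

   procedure Random_Branch_Demo is
      use Ada.Numerics.Float_Random;
      G     : Generator;
      Total : Long_Integer := 0;
   begin
      Reset (G);
      for I in 1 .. 10_000_000 loop
         --  The branch direction changes with every pseudo-random draw,
         --  so a two-level adaptive predictor has no pattern to learn.
         if Random (G) < 0.5 then
            Total := Total + 1;
         else
            Total := Total - 1;
         end if;
      end loop;
      Ada.Text_IO.Put_Line (Long_Integer'Image (Total));
   end Random_Branch_Demo;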
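For reference, the GCC profile-guided build is a two-pass affair. With
gnatmake it looks roughly like the following (a sketch only: compute.adb
stands in for the main unit, and gnatmake passes switches it does not
itself recognise, such as -fprofile-generate/-fprofile-use, straight
through to the gcc back end):

   # pass 1: build an instrumented binary and do a training run
   gnatmake -O3 -fprofile-generate compute.adb -largs -fprofile-generate
   ./compute                 # writes *.gcda profile data

   # pass 2: force a rebuild that uses the recorded profile
   gnatmake -f -O3 -fprofile-use compute.adb

The link step needs -fprofile-generate as well, so that the profiling
run-time support gets pulled in.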
I have done some testing with the Linux "perf" tool. These are some
figures for the Ada version:

         1,014,900  l1-dcache-load-misses  #    0.01% of all L1-dcache hits
    12,462,973,199  l1-dcache-loads
         7,311,495  cache-references
            38,804  cache-misses           #    0.531 % of all cache refs
     2,588,686,069  branch-instructions
       388,460,030  branch-misses          #   15.01% of all branches

      21.885512117  seconds time elapsed

And here are the results for the C++ version:

           840,245  l1-dcache-load-misses  #    0.01% of all L1-dcache hits
    11,140,761,995  l1-dcache-loads
         6,019,321  cache-references
            27,584  cache-misses           #    0.458 % of all cache refs
     3,049,597,029  branch-instructions
       560,173,316  branch-misses          #   18.37% of all branches

      17.823476294  seconds time elapsed

So the interesting thing is that the Ada version executes fewer branches
overall and suffers fewer branch misses than the C++ version, so the
profile-guided compilation seems to have achieved as much as it can there;
some other factor is limiting performance.

The telling figure would appear to be the cache-misses: the Ada version
takes more of them, both in absolute terms and as a fraction of cache
references. So it would appear I need to focus on the cache utilisation
of the Ada code.

Cheers,
Keean.
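P.S. For anyone who wants to collect the same counters, a perf stat
invocation along these lines should do it (the exact event names can
differ between kernels, and ./compute stands in for whichever benchmark
binary you built):

   perf stat -e branch-instructions,branch-misses \
             -e cache-references,cache-misses \
             -e L1-dcache-loads,L1-dcache-load-misses \
             ./compute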