From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=ham autolearn_force=no version=3.4.4
X-Google-Thread: 103376,103803355c3db607
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII-7-bit
Received: by 10.68.228.227 with SMTP id sl3mr6581294pbc.5.1342340859908;
        Sun, 15 Jul 2012 01:27:39 -0700 (PDT)
Path: 
 l9ni11846pbj.0!nntp.google.com!news1.google.com!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
From: Keean Schupke <keean.schupke@googlemail.com>
Newsgroups: comp.lang.ada
Subject: Re: GNAT (GCC) Profile Guided Compilation
Date: Sun, 15 Jul 2012 01:27:39 -0700 (PDT)
Organization: http://groups.google.com
Message-ID: <505460ad-495d-41b4-a88f-e95eb99a6be3@googlegroups.com>
References: <dac2857a-6f74-4ecb-a5d2-f6b73fbd0ecc@googlegroups.com>
 <dd9d3648-4538-4aa2-8a0e-557bed1799b3@googlegroups.com>
 <38b9c365-a2b2-4b8b-8d2a-1ea39d08ce86@googlegroups.com>
 <d15a813f-d697-4c80-ad7c-d110382b92d7@googlegroups.com>
 <982d531a-3972-4971-b802-c7e7778b8649@googlegroups.com>
 <520bdc39-6004-4142-a227-facf14ebb0e8@googlegroups.com>
 <4ff08cb2$0$6575$9b4e6d93@newsspool3.arcor-online.net>
 <a4f2a43e-5593-48f6-9e0f-7d0057874f94@googlegroups.com>
 <4ff1d731$0$6582$9b4e6d93@newsspool3.arcor-online.net>
 <cdbe38d2-c8b0-41b2-9830-d913aefa200c@googlegroups.com>
 <fed934c8-9cff-4905-811d-9f9d3050d0b1@googlegroups.com>
 <4ff41d38$0$6577$9b4e6d93@newsspool3.arcor-online.net>
 <26b778c4-5abc-4fbf-94b0-888c2ce71831@googlegroups.com>
 <4ff43956$0$6576$9b4e6d93@newsspool3.arcor-online.net>
 <2dba1140-4f28-4fb8-ace4-2c10f3a02313@googlegroups.com>
 <a6e3v8FvsvU1@mid.individual.net>
 <e70a8424-47df-4141-a0ac-2e0bc6fbfc77@googlegroups.com>
 <e215fb37-2e94-464c-bfc0-2298a415d874@googlegroups.com>
 <a6f90pFj55U1@mid.individual.net>
NNTP-Posting-Host: 82.44.19.199
Mime-Version: 1.0
X-Trace: posting.google.com 1342340859 6417 127.0.0.1 (15 Jul 2012 08:27:39
 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Sun, 15 Jul 2012 08:27:39 +0000 (UTC)
In-Reply-To: <a6f90pFj55U1@mid.individual.net>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=82.44.19.199;
 posting-account=T5Z2vAoAAAB8ExE3yV3f56dVATtEMNcM
User-Agent: G2/1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Date: 2012-07-15T01:27:39-07:00
List-Id: <comp.lang.ada>

On Sunday, 15 July 2012 08:15:35 UTC+1, Niklas Holsti  wrote:
> On 12-07-15 02:40 , Keean Schupke wrote:
> > After a bit of checking line-by-line to make sure I am using
> > 'and then', 'or else' and 'constant' everywhere I can in the code,
> > Ada is outperforming C++ when using profile-guided compilation for
> > the first time. Both C++ and Ada are getting about 40k simulations
> > per second with normal compilation, C++ is achieving 56k simulations
> > per second profile-guided, and Ada 57k per second.
>
> Good news.
>
> An Ada compiler is of course free to implement the ordinary
> "long-circuit" operators "and"/"or" using short-circuit code, if the
> evaluation of the operands has no side effects. What are the operands
> typically like in your program? Are they function calls, or simple
> variables?
>
> It seems to me that the general belief, regarding the expected relative
> speeds of the short-circuit code versus the long-circuit code for
> Boolean expressions with simple operands, is that the branch penalties
> on modern processors are so large that the short-circuit form is not
> obviously faster. This may explain why the Ada compiler is not using the
> short-circuit code automatically.
>
> Clearly, if the expression is "(simple operand likely to be True) and
> (longer and longer expression)", at some point the short-circuit code
> (or changing to "and then") will become faster than the long-circuit
> code, whatever the branch penalty. This point will come sooner if
> profile-guidance is used to reduce the branch penalty.
>
> -- 
> Niklas Holsti
> Tidorum Ltd
> niklas holsti tidorum fi
>        .      @       .


The code is side effect free. I wonder if using "pragma Pure_Function(X)"
would help here?

Actually I think the assumption that branch misprediction has a high cost
is wrong on modern speculative processors. In my tests on different Core 2
processors, simple branching beats all the branchless methods. This is
probably because they can decode multiple instructions per cycle and
execute both branch paths in parallel speculatively, so a branch
misprediction just throws away the micro-ops from the not-taken path, and
the cost is fairly low. I realise this somewhat contradicts what I said
about profile-guided compilation, but there is still a higher cost to
misprediction, and for my code that was worth about 6k simulations per
second using profile-guided compilation. However, short-circuit
evaluation plus profile-guided compilation was worth another 10k
simulations per second. It would appear the biggest benefit is in
re-arranging the order of the clauses of Boolean logic so the most often
taken short-circuit path is first, although the re-arranging of general
branches still has a significant effect.
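
To illustrate the clause-ordering point, here is a minimal sketch (not the
simulation code; Costly, Calls and the 90/10 split are made up for the
demo). It counts how often the second operand is evaluated when a cheap,
usually-true test comes first, with "or else" and with plain "or":

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Short_Circuit_Demo is
   Calls : Natural := 0;  --  hypothetical counter, for this demo only
   Hits  : Natural := 0;

   --  Stands in for an expensive clause; it counts its own invocations,
   --  so (unlike the real code) it is deliberately not side effect free.
   function Costly (X : Integer) return Boolean is
   begin
      Calls := Calls + 1;
      return X mod 2 = 0;
   end Costly;
begin
   --  "or else": Costly is evaluated only when the cheap test is False.
   for I in 1 .. 100 loop
      if I <= 90 or else Costly (I) then
         Hits := Hits + 1;
      end if;
   end loop;
   Put_Line ("or else:" & Natural'Image (Calls) & " calls,"
             & Natural'Image (Hits) & " hits");

   Calls := 0;
   Hits  := 0;

   --  Plain "or": both operands are evaluated on every iteration.
   for I in 1 .. 100 loop
      if I <= 90 or Costly (I) then
         Hits := Hits + 1;
      end if;
   end loop;
   Put_Line ("or:" & Natural'Image (Calls) & " calls,"
             & Natural'Image (Hits) & " hits");
end Short_Circuit_Demo;
```

Both loops should report the same number of hits, but the "or else" loop
calls Costly only for the 10 values where the cheap test fails, against
100 calls for plain "or". Putting the clause that most often decides the
result first is what keeps that call count low.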

The most critical Boolean expression has a format like this:

function F (N : Node; V : Value) return Boolean is
begin
    return (N.Enum = Const) or else ((N.Enum = V) = (N.Number = 0));
end F;

B : constant Boolean := F (N1, V)
    and then F (N2, V)
    and then F (N3, V)
    and then F (N4, V);
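
For anyone who wants to try this, below is a self-contained sketch of the
same shape of expression. The Value and Node types, the choice of Const and
the four sample nodes are assumptions for illustration only; the real
program's types are not shown in this post.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Critical_Expr is
   --  All types and values below are assumed, purely for illustration.
   type Value is (Red, Green, Blue);
   Const : constant Value := Red;  --  stands in for the post's "Const"

   type Node is record
      Enum   : Value;
      Number : Natural;
   end record;

   --  Same shape as F in the post: a cheap equality test first, then a
   --  comparison of two Boolean sub-results.
   function F (N : Node; V : Value) return Boolean is
   begin
      return (N.Enum = Const)
        or else ((N.Enum = V) = (N.Number = 0));
   end F;

   V  : constant Value := Green;
   N1 : constant Node := (Enum => Red,   Number => 3);  --  first clause
   N2 : constant Node := (Enum => Green, Number => 0);  --  True = True
   N3 : constant Node := (Enum => Blue,  Number => 2);  --  False = False
   N4 : constant Node := (Enum => Blue,  Number => 0);  --  False /= True

   --  Evaluation stops at the first F that returns False.
   B : constant Boolean := F (N1, V)
     and then F (N2, V)
     and then F (N3, V)
     and then F (N4, V);
begin
   Put_Line ("B = " & Boolean'Image (B));
end Critical_Expr;
```

With these sample nodes the first three calls return True and the fourth
returns False, so the program prints "B = FALSE". Since F has no side
effects, replacing "and then" with "and" would give the same result and
only change how much work is done.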


Cheers,
Keean.