From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=ham autolearn_force=no version=3.4.4
X-Google-Thread: 103376,103803355c3db607
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII-7-bit
Received: by 10.68.223.73 with SMTP id qs9mr5482025pbc.7.1342298008543;
        Sat, 14 Jul 2012 13:33:28 -0700 (PDT)
Path: 
 l9ni11739pbj.0!nntp.google.com!news2.google.com!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
From: Keean Schupke <keean.schupke@googlemail.com>
Newsgroups: comp.lang.ada
Subject: Re: GNAT (GCC) Profile Guided Compilation
Date: Sat, 14 Jul 2012 13:33:28 -0700 (PDT)
Organization: http://groups.google.com
Message-ID: <ae25ef43-5b69-47df-8bfe-60eaff51d190@googlegroups.com>
References: <dac2857a-6f74-4ecb-a5d2-f6b73fbd0ecc@googlegroups.com>
 <dd9d3648-4538-4aa2-8a0e-557bed1799b3@googlegroups.com>
 <38b9c365-a2b2-4b8b-8d2a-1ea39d08ce86@googlegroups.com>
 <d15a813f-d697-4c80-ad7c-d110382b92d7@googlegroups.com>
 <982d531a-3972-4971-b802-c7e7778b8649@googlegroups.com>
 <520bdc39-6004-4142-a227-facf14ebb0e8@googlegroups.com>
 <4ff08cb2$0$6575$9b4e6d93@newsspool3.arcor-online.net>
 <a4f2a43e-5593-48f6-9e0f-7d0057874f94@googlegroups.com>
 <4ff1d731$0$6582$9b4e6d93@newsspool3.arcor-online.net>
 <cdbe38d2-c8b0-41b2-9830-d913aefa200c@googlegroups.com>
 <fed934c8-9cff-4905-811d-9f9d3050d0b1@googlegroups.com>
 <4ff41d38$0$6577$9b4e6d93@newsspool3.arcor-online.net>
 <26b778c4-5abc-4fbf-94b0-888c2ce71831@googlegroups.com>
 <4ff43956$0$6576$9b4e6d93@newsspool3.arcor-online.net>
 <2dba1140-4f28-4fb8-ace4-2c10f3a02313@googlegroups.com>
NNTP-Posting-Host: 82.44.19.199
Mime-Version: 1.0
X-Trace: posting.google.com 1342298008 25255 127.0.0.1 (14 Jul 2012 20:33:28
 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Sat, 14 Jul 2012 20:33:28 +0000 (UTC)
In-Reply-To: <2dba1140-4f28-4fb8-ace4-2c10f3a02313@googlegroups.com>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=82.44.19.199;
 posting-account=T5Z2vAoAAAB8ExE3yV3f56dVATtEMNcM
User-Agent: G2/1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Date: 2012-07-14T13:33:28-07:00
List-Id: <comp.lang.ada>

On Saturday, 14 July 2012 21:17:27 UTC+1, Keean Schupke  wrote:
> On Wednesday, 4 July 2012 13:38:45 UTC+1, Georg Bauhaus  wrote:
> > On 04.07.12 12:57, Keean Schupke wrote:
> > > On Wednesday, 4 July 2012 11:38:57 UTC+1, Georg Bauhaus  wrote:
> > >> On 03.07.12 01:48, Keean Schupke wrote:
> > >>> I have done some testing with the linux "perf" tool. These are
> > >>> some figures for the Ada version:
> > >>>
> > >>>           1,014,900 l1-dcache-load-misses     #    0.01% of all L1-dcache hits
> > >>>      12,462,973,199 l1-dcache-loads
> > >>>           7,311,495 cache-references
> > >>>              38,804 cache-misses              #    0.531 % of all cache refs
> > >>>       2,588,686,069 branch-instructions
> > >>>         388,460,030 branch-misses             #   15.01% of all branches
> > >>>        21.885512117 seconds time elapsed
> > >>>
> > >>> And here are the results for the C++ version:
> > >>>
> > >>>             840,245 l1-dcache-load-misses     #    0.01% of all L1-dcache hits
> > >>>      11,140,761,995 l1-dcache-loads
> > >>>           6,019,321 cache-references
> > >>>              27,584 cache-misses              #    0.458 % of all cache refs
> > >>>       3,049,597,029 branch-instructions
> > >>>         560,173,316 branch-misses             #   18.37% of all branches
> > >>>        17.823476294 seconds time elapsed
> > >>>
> > >>>
> > >>> So the interesting thing is that the Ada version has less overall
> > >>> branches and less branch misses than the C++ version, so it seems
> > >>> the profile-guided compilation has achieved as much. There is
> > >>> another factor limiting performance. The interesting figure would
> > >>> appear to be the cache-misses.
> > >>>
> > >>> So it would appear I need to focus on the cache utilisation of the
> > >>> Ada code.
> > >>
> > >> FWIW, looking at the 1D vs 2D subprograms in order to learn
> > >> about a (dis)advantage of writing 2D arrays,I found some
> > >> things potentially interesting.
> > >>
> > >> When there is no additional test in the loops,
> > >> Apple's Instruments shows two orders of magnitude fewer
> > >> branch instructions executed by the 2D subprogram
> > >> compared to the 1D subprogram, 5M : 2G. This seems huge to me,
> > >> but is reproducible. A naive look at the assembly listing offers
> > >> some confirmation, mentioned below, though not on the same order.
> > >>
> > >> With the "mod" based test added to the respective loops the number
> > >> of branch instructions executed by the 2D subprogram increases
> > >> to about one half of that of the 1D subprogram's. Still better.
> > >>
> > >> The assembly listing of the subprograms without tests added has
> > >>
> > >> - [compute_1d] 3 pairs of forward je and 1 backward jne near
> > >>    the end
> > >>
> > >> - [compute_2] 1 pair of backward jne near the end,
> > >>
> > >> It appears that unrolling yields two somewhat differently
> > >> structured lists of instructions, but I'm drifting away
> > >> from Ada.
> > >>
> > >> Compiling with profile data rearranges the jumps for 1D, adds jumps to 2D,
> > >> and shortens both procedures. However, this slows both down using the latest
> > >> GNAT GPL on Core i7; there is some speed-up of the 1D procedure with
> > >> Debian's GNAT 4.4.5 on Xeon E5645, though. (-O2 -funroll-loops -gnatp)
> > >>
> > >> All of this breaks once I turn on -O3.
> > >> Not sure whether this is a lottery or a mine field. ;-)
> > >>
> > >> Cheers,
> > >> Georg
> > >
> > >
> > > How can I turn off inlining for a function in GNAT?
> >
> > Sometimes by reordering code, making sure the body hasn't
> > been seen when the compiler sees the call statement.
> > Or try separate compilation.  The following arrangement
> > appears to prevent inline expansion of Inc, even when
> > just the main unit is fed to gnatmake -O3 -gnatNp, so that
> > GNAT translates everything else automatically, using the
> > same switches.
> >
> > -fno-inline is another switch to consider. However, it
> > appears to be interfering with other optimizations (loop
> > unrolling, vectorizer, from what I can guess).
> >
> > package Prevent_Inline is
> >    type List is array (Positive range <>) of Integer;
> >    procedure Inc (X : in out Integer);
> >    procedure Inc_All (A : in out List);
> > end Prevent_Inline;
> >
> > with Prevent_Inline.Aux;
> > package body Prevent_Inline is
> >
> >    procedure Inc (X : in out Integer) is
> >    begin
> >       X := X + 1;
> >    end Inc;
> >
> >    procedure Inc_All (A : in out List)
> >      renames Prevent_Inline.Aux;
> >
> > end Prevent_Inline;
> >
> > procedure Prevent_Inline.Aux (A : in out List) is
> > begin
> >    for X of A loop
> >       Inc (X);
> >    end loop;
> > end Prevent_Inline.Aux;
> >
> > with Prevent_Inline;    use Prevent_Inline;
> > procedure Test_Prevent_Inline is
> >    X : List (1 .. 10);
> > begin
> >    Inc_All (X);
> > end Test_Prevent_Inline;
>
> Okay, I think I have tracked down the performance problem, but I am not
> sure how to fix it. It would appear that C++ code that returns a boolean
> from a function, generates a decision tree using tests and branches,
> whereas Ada is setting the result into a Boolean variable. This has the
> result that C++ is bailing out of the evaluation as soon as it can (IE if
> one side of an and is false, or one side of an or is true), but Ada is
> always evaluating all parts of the expressions.
>
> Is this a difference in language semantics, and what is the best way to
> deal with it? Do I need to rewrite all 'and' and 'or' statements in
> conditionals as nested if statements to get the evaluate only as far as
> necessary semantics like C/C++?
>
>
> Cheers,
> Keean.

Answering my own question: it looks like I should be using the
short-circuit forms "and then" and "or else" in place of plain "and"
and "or" in Boolean expressions.

Cheers,
Keean.
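
P.S. For the archive, here is a minimal sketch of the difference. The
names Short_Circuit_Demo, Check, Calls and Report are made up for this
example and are not from my program. Check has a side effect (it bumps
a counter), so it is easy to see which operands actually get evaluated.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Short_Circuit_Demo is

   --  Hypothetical counter: how many times Check has been called.
   Calls : Natural := 0;

   --  Hypothetical operand with a side effect, so that we can
   --  observe whether it was evaluated at all.
   function Check (Value : Boolean) return Boolean is
   begin
      Calls := Calls + 1;
      return Value;
   end Check;

   --  Prints the result and the number of operands evaluated,
   --  then resets the counter for the next experiment.
   procedure Report (Label : String; Result : Boolean) is
   begin
      Put_Line (Label & ": result " & Boolean'Image (Result)
                & ", operands evaluated:" & Natural'Image (Calls));
      Calls := 0;
   end Report;

begin
   --  "and" is an ordinary operator: both operands are evaluated.
   Report ("and     ", Check (False) and Check (True));

   --  "and then" skips the right operand when the left one is False.
   Report ("and then", Check (False) and then Check (True));

   --  "or" again evaluates both operands.
   Report ("or      ", Check (True) or Check (False));

   --  "or else" skips the right operand when the left one is True.
   Report ("or else ", Check (True) or else Check (False));
end Short_Circuit_Demo;
```

Saved as short_circuit_demo.adb, this should build with
"gnatmake short_circuit_demo.adb". The plain "and" and "or" lines
should report two operands evaluated, the short-circuit forms one.
When the operands have no side effects the compiler may be free to
generate either kind of code for plain "and"/"or", so how much this
matters for speed probably depends on the expressions involved.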