comp.lang.ada
From: Keean Schupke <keean.schupke@googlemail.com>
Subject: Re: GNAT (GCC) Profile Guided Compilation
Date: Sat, 14 Jul 2012 13:17:27 -0700 (PDT)
Message-ID: <2dba1140-4f28-4fb8-ace4-2c10f3a02313@googlegroups.com>
In-Reply-To: <4ff43956$0$6576$9b4e6d93@newsspool3.arcor-online.net>

On Wednesday, 4 July 2012 13:38:45 UTC+1, Georg Bauhaus  wrote:
> On 04.07.12 12:57, Keean Schupke wrote:
> > On Wednesday, 4 July 2012 11:38:57 UTC+1, Georg Bauhaus  wrote:
> >> On 03.07.12 01:48, Keean Schupke wrote:
> >>> I have done some testing with the linux "perf" tool. These are some figures for the Ada version:
> >>>
> >>>           1,014,900 l1-dcache-load-misses     #    0.01% of all L1-dcache hits
> >>>      12,462,973,199 l1-dcache-loads
> >>>           7,311,495 cache-references
> >>>              38,804 cache-misses              #    0.531 % of all cache refs
> >>>       2,588,686,069 branch-instructions
> >>>         388,460,030 branch-misses             #   15.01% of all branches
> >>>        21.885512117 seconds time elapsed
> >>>
> >>> And here are the results for the C++ version:
> >>>
> >>>             840,245 l1-dcache-load-misses     #    0.01% of all L1-dcache hits
> >>>      11,140,761,995 l1-dcache-loads
> >>>           6,019,321 cache-references
> >>>              27,584 cache-misses              #    0.458 % of all cache refs
> >>>       3,049,597,029 branch-instructions
> >>>         560,173,316 branch-misses             #   18.37% of all branches
> >>>        17.823476294 seconds time elapsed
> >>>
> >>>
> >>> So the interesting thing is that the Ada version has less overall branches and less branch misses than the C++ version, so it seems the profile-guided compilation has achieved as much. There is another factor limiting performance. The interesting figure would appear to be the cache-misses.
> >>>
> >>> So it would appear I need to focus on the cache utilisation of the Ada code.
> >>
> >> FWIW, looking at the 1D vs 2D subprograms in order to learn
> >> about a (dis)advantage of writing 2D arrays, I found some
> >> things potentially interesting.
> >>
> >> When there is no additional test in the loops,
> >> Apple's Instruments shows two orders of magnitude fewer
> >> branch instructions executed by the 2D subprogram
> >> compared to the 1D subprogram, 5M : 2G. This seems huge to me,
> >> but is reproducible. A naive look at the assembly listing offers
> >> some confirmation, mentioned below, though not on the same order.
> >>
> >> With the "mod" based test added to the respective loops the number
> >> of branch instructions executed by the 2D subprogram increases
> >> to about one half of that of the 1D subprogram's. Still better.
> >>
> >> The assembly listing of the subprograms without tests added has
> >>
> >> - [compute_1d] 3 pairs of forward je and 1 backward jne near
> >>    the end
> >>
> >> - [compute_2] 1 pair of backward jne near the end,
> >>
> >> It appears that unrolling yields two somewhat differently
> >> structured lists of instructions, but I'm drifting away
> >> from Ada.
> >>
> >> Compiling with profile data rearranges the jumps for 1D, adds jumps to 2D,
> >> and shortens both procedures. However, this slows both down using the latest
> >> GNAT GPL on Core i7; there is some speed-up of the 1D procedure with
> >> Debian's GNAT 4.4.5 on Xeon E5645, though. (-O2 -funroll-loops -gnatp)
> >>
> >> All of this breaks once I turn on -O3.
> >> Not sure whether this is a lottery or a mine field. ;-)
> >>
> >> Cheers,
> >> Georg
> >
> >
> > How can I turn off inlining for a function in GNAT?
> 
> Sometimes by reordering code, making sure the body hasn't
> been seen when the compiler sees the call statement.
> Or try separate compilation.  The following arrangement
> appears to prevent inline expansion of Inc, even when
> just the main unit is fed to gnatmake -O3 -gnatNp, so that
> GNAT translates everything else automatically, using the
> same switches.
> 
> -fno-inline is another switch to consider. However, it
> appears to be interfering with other optimizations (loop
> unrolling, vectorizer, from what I can guess).
> 
> package Prevent_Inline is
>    type List is array (Positive range <>) of Integer;
>    procedure Inc (X : in out Integer);
>    procedure Inc_All (A : in out List);
> end Prevent_Inline;
> 
> with Prevent_Inline.Aux;
> package body Prevent_Inline is
> 
>    procedure Inc (X : in out Integer) is
>    begin
>       X := X + 1;
>    end Inc;
> 
>    procedure Inc_All (A : in out List)
>      renames Prevent_Inline.Aux;
> 
> end Prevent_Inline;
> 
> procedure Prevent_Inline.Aux (A : in out List) is
> begin
>    for X of A loop
>       Inc (X);
>    end loop;
> end Prevent_Inline.Aux;
> 
> with Prevent_Inline;    use Prevent_Inline;
> procedure Test_Prevent_Inline is
>    X : List (1 .. 10);
> begin
>    Inc_All (X);
> end Test_Prevent_Inline;

Okay, I think I have tracked down the performance problem, but I am not sure how to fix it. It appears that C++ code that returns a boolean from a function generates a decision tree of tests and branches, whereas the Ada code stores the result in a Boolean variable. As a result, the C++ version bails out of the evaluation as soon as it can (i.e. if one side of an 'and' is false, or one side of an 'or' is true), but the Ada version always evaluates all parts of the expression.

Is this a difference in language semantics, and what is the best way to deal with it? Do I need to rewrite all 'and' and 'or' operators in conditionals as nested if statements to get the evaluate-only-as-far-as-necessary semantics of C/C++?
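
For reference, here is a minimal sketch of the two forms side by side. In Ada the plain Boolean operators 'and' and 'or' evaluate both operands, while the short-circuit control forms 'and then' and 'or else' skip the right operand once the left one decides the result, much like && and || in C++. The names Short_Circuit_Demo, Check, Calls and Report are hypothetical and exist only for this illustration; they are not taken from the program being profiled.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Short_Circuit_Demo is

   --  Hypothetical counter: number of operands evaluated so far.
   Calls : Natural := 0;

   --  Hypothetical helper: notes that it was evaluated, then returns Value.
   function Check (Value : Boolean) return Boolean is
   begin
      Calls := Calls + 1;
      return Value;
   end Check;

   --  Print the result and the number of operands evaluated, then reset.
   procedure Report (Form : String; Result : Boolean) is
   begin
      Put_Line (Form & " => " & Boolean'Image (Result)
                & ", operands evaluated:" & Natural'Image (Calls));
      Calls := 0;
   end Report;

begin
   --  Logical operators: both operands are evaluated.
   Report ("and     ", Check (False) and Check (True));
   Report ("or      ", Check (True) or Check (False));

   --  Short-circuit control forms: the right operand is skipped
   --  once the left operand decides the result.
   Report ("and then", Check (False) and then Check (True));
   Report ("or else ", Check (True) or else Check (False));
end Short_Circuit_Demo;
```

Saved as short_circuit_demo.adb, this should build with gnatmake. The side effect in Check is there only to make the evaluation visible. Whether switching to the short-circuit forms actually closes the gap to the C++ timings is something only a measurement on the real code can show.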


Cheers,
Keean.


