comp.lang.ada
From: Georg Bauhaus <rm.dash-bauhaus@futureapps.de>
Subject: Re: GNAT (GCC) Profile Guided Compilation
Date: Wed, 04 Jul 2012 12:38:57 +0200
Date: 2012-07-04T12:38:49+02:00
Message-ID: <4ff41d38$0$6577$9b4e6d93@newsspool3.arcor-online.net>
In-Reply-To: <fed934c8-9cff-4905-811d-9f9d3050d0b1@googlegroups.com>

On 03.07.12 01:48, Keean Schupke wrote:
> I have done some testing with the linux "perf" tool. These are some figures for the Ada version:
>
>           1,014,900 l1-dcache-load-misses     #    0.01% of all L1-dcache hits
>      12,462,973,199 l1-dcache-loads
>           7,311,495 cache-references
>              38,804 cache-misses              #    0.531 % of all cache refs
>       2,588,686,069 branch-instructions
>         388,460,030 branch-misses             #   15.01% of all branches
>        21.885512117 seconds time elapsed
>
> And here are the results for the C++ version:
>
>             840,245 l1-dcache-load-misses     #    0.01% of all L1-dcache hits
>      11,140,761,995 l1-dcache-loads
>           6,019,321 cache-references
>              27,584 cache-misses              #    0.458 % of all cache refs
>       3,049,597,029 branch-instructions
>         560,173,316 branch-misses             #   18.37% of all branches
>        17.823476294 seconds time elapsed
>
>
> So the interesting thing is that the Ada version has less overall branches and less branch misses than the C++ version, so it seems the profile-guided compilation has achieved as much. There is another factor limiting performance. The interesting figure would appear to be the cache-misses.
>
> So it would appear I need to focus on the cache utilisation of the Ada code.

FWIW, looking at the 1D vs 2D subprograms in order to learn
about a (dis)advantage of writing 2D arrays, I found some
potentially interesting things.

When there is no additional test in the loops,
Apple's Instruments shows roughly two orders of magnitude fewer
branch instructions executed by the 2D subprogram than by the
1D subprogram, about 5M versus 2G. This seems huge to me,
but it is reproducible. A naive look at the assembly listing offers
some confirmation, mentioned below, though not of the same magnitude.

With the "mod"-based test added to the respective loops, the number
of branch instructions executed by the 2D subprogram increases
to about one half of the 1D subprogram's. Still better.
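
For readers who do not have the subprograms from earlier in the thread
at hand, here is a minimal sketch of the kind of 1D vs 2D loop pair
being compared. It is written in C rather than Ada and is not the code
that was measured; the names sum_1d, sum_2d, ROWS, COLS and divisor are
hypothetical, and the assumption that the 1D variant computes its index
by hand is mine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen for illustration only. */
enum { ROWS = 64, COLS = 64 };

/* 2D variant: nested loops over a true two-dimensional array.
   The compiler sees both index ranges directly. */
long sum_2d(int grid[ROWS][COLS], int divisor)
{
    long total = 0;
    assert(divisor > 0);                     /* avoid division by zero */
    for (size_t r = 0; r < ROWS; ++r)
        for (size_t c = 0; c < COLS; ++c)
            if (grid[r][c] % divisor == 0)   /* the "mod"-based test */
                total += grid[r][c];
    return total;
}

/* 1D variant: the same data laid out flat, with the index
   r * COLS + c computed by hand (an assumption about the original). */
long sum_1d(const int *flat, int divisor)
{
    long total = 0;
    assert(divisor > 0);                     /* avoid division by zero */
    for (size_t r = 0; r < ROWS; ++r)
        for (size_t c = 0; c < COLS; ++c)
            if (flat[r * COLS + c] % divisor == 0)
                total += flat[r * COLS + c];
    return total;
}
```

Called on the same data, for example sum_1d(&grid[0][0], 3) and
sum_2d(grid, 3), both should return the same total. Only the indexing
differs, and that may be enough for an optimizer to structure the loops
differently. Passing 1 as the divisor makes the test always true, which
approximates the variant without an additional test.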

The assembly listing of the subprograms without tests added has

- [compute_1d] 3 pairs of forward je and 1 backward jne near
   the end

- [compute_2] 1 pair of backward jne near the end.

It appears that unrolling yields two somewhat differently
structured lists of instructions, but I'm drifting away
from Ada.

Compiling with profile data rearranges the jumps for 1D, adds jumps to 2D,
and shortens both procedures. However, with the latest GNAT GPL on a
Core i7 this slows both down, whereas with Debian's GNAT 4.4.5 on a
Xeon E5645 the 1D procedure speeds up somewhat.
(Options: -O2 -funroll-loops -gnatp.)

All of this breaks once I turn on -O3.
Not sure whether this is a lottery or a minefield. ;-)

Cheers,
Georg


