comp.lang.ada
 help / color / mirror / Atom feed
From: Georg Bauhaus <rm.dash-bauhaus@futureapps.de>
Subject: Re: GNAT (GCC) Profile Guided Compilation
Date: Sun, 01 Jul 2012 19:45:22 +0200
Date: 2012-07-01T19:45:22+02:00	[thread overview]
Message-ID: <4ff08cb2$0$6575$9b4e6d93@newsspool3.arcor-online.net> (raw)
In-Reply-To: <520bdc39-6004-4142-a227-facf14ebb0e8@googlegroups.com>

On 29.06.12 19:03, Keean Schupke wrote:
> Note: this is not really answering my question about the lack of improvement due to profile-guided compilation - but perhaps that side of things is more a compiler issue. Does anyone have any ideas about that side of things?

Some more observations, collected with the help of the simple
two example programs appended below.

For Ada and GNAT at least, any advantages to be had from profile-guided
compilation seem to vary with optimization options and sizes of data.
The same is true about a 1D approach vs a 2D approach.

Among the various arrangements, one of the best I have got
from both GNAT GPL 2012, and from an FSF GNAT as of June 2012
uses
*) -O2 -funroll-loops -gnatp
*)  the 2D approach

(If it means a anything, adding -ftree-vectorize to the -O2 set
produces only the second best Ada result, the same as the -O3 times
listed below. I had chosen -ftree-vectorize as this switch is in the
O3 set.) Trying profile guided compilation at higher optimization
levels consistently slowed the Ada program down (other Ada programs
were faster after profile guided compilation, as mentioned elsewhere).

The result at -O2 seems nice, though, in that the 2D approach is
natural, the compiler switches are the ones that the manual recommends.
and this combination produces the fastest Ada program.

In short, from worst to best 1D / 2D, Runs = 100:

Ada profile guided -> .88 / .80   (this program only!)
Ada at -O3         -> .68 / .66
C++ at -O3         -> .66
Ada at -O2 -funr.. -> .68 / .47
C++ profile guided -> .31


Some differences seem to vary with hardware, too.
On one computer I have seen a minor speedup, on another
a very small slowdown, and a different ratio of 1D / 2D.

For the C++ case I have tested a 1D approach only, not
knowing how to write 2D arrays in C++ using C style arrays.
I'm not a C++ programmer, apologies for my C++.


Ada at -O2 -funroll-loops -gnatp -mtune=native
1D:  0.680750000 27303
2D:  0.469094000 27303  <-- Best for Ada!

Ada at -O3 -gnatNp -fomit-frame-pointer -mtune=native
1D:  0.676616000 27303
2D:  0.664194000 27303

previous with profile-guided compilation
1D:  0.888206000 27303
2D:  0.806196000 27303

C++ at -O3 -mtune=native
1D: 0.681721 0

previous with profile-guided compilation
1D: 0.31165 0

Clang++
1D: 0.611522 0

The GNU C++ compiler is from the GNAT GPL 2012 distribution for Mac OS X,
Clang++ is Apple's v 3.1.


=== 8< === 8< === 8< === 8< ===

package Fringes is

    pragma Pure (Fringes);

    type Full_Index is mod 2**32;
    subtype Num is Full_Index Range 0 .. 100_000;
    subtype Index is Full_Index range 0 .. 1000;
    Len : constant Full_Index := Index'Last - Index'First + 1;

    type Matrix_1D is
      array (Index'First .. Index'First + Len * Len - 1) of Num;
    type Matrix_2 is array (Index, Index) of Num;

    procedure Compute_1D (A : in out Matrix_1D);
    procedure Compute_2  (A : in out Matrix_2);

end Fringes;

package body Fringes is

    --
    --  Each Compute procedure has A pointless loop that assigns each
    --  inner field the sum of non-diagonal neighbors modulo Num'Last
    --

    procedure NOT_USED_Compute_1D_Slow (A : in out Matrix_1D) is
       J : Full_Index;
    begin
       J := A'First + Len;
       loop
         exit when J > A'Last - Len;
         for K in J + 1 .. J + Len - 1 - 1 loop
            A (K) := (A(K + 1)
                      + A(K - Len)
                      + A(K - 1)
                      + A(K + Len)) mod Num'Last;
         end loop;
         J := J + Len;
       end loop;
    end NOT_USED_Compute_1D_Slow;

    procedure Compute_1D (A : in out Matrix_1D) is
    begin
       for K in A'First + Len + 1 .. A'Last - Len - 1 loop
          case K mod Len is
          when 0 | Len - 1 => null;
          when others =>
             A (K) := (A(K + 1)
                       + A(K - Len)
                       + A(K - 1)
                       + A(K + Len)) mod Num'Last;
          end case;
       end loop;
    end Compute_1D;

    procedure Compute_2 (A : in out Matrix_2) is
    begin
       for J in A'First(1) + 1 .. A'Last(1) - 1 loop
          for K in A'First(2) + 1 .. A'Last(2) - 1 loop
             A (J, K) := (A(J, K + 1)
                          + A(j - 1, K)
                          + A(J, K - 1)
                          + A(j + 1, K)) mod Num'Last;
          end loop;
       end loop;
    end Compute_2;

end Fringes;


with Fringes;
with Ada.Command_Line, Ada.Real_Time;
procedure Test_Fringes is

    use Ada.Real_Time;

    Runs : constant Natural :=
      Natural'Value (Ada.Command_Line.Argument(1));

    Start, Stop: Time;

    package Os_Or_Gnat_Stacksize_Nuisance is
       use Fringes;
       type P1 is access Matrix_1D;
       type P2 is access Matrix_2;
       M1P : constant P1 := new Matrix_1D'(others => 123);
       M2p : constant P2 := new Matrix_2'(Index => (Index => 123));
       M1D : Matrix_1D renames M1P.all;
       M2 : Matrix_2 renames M2P.all;
    end Os_Or_Gnat_Stacksize_Nuisance;
    use Os_Or_Gnat_Stacksize_Nuisance;

    procedure Print_Timing (Part : String; N : Fringes.Num) is separate;
    use type Fringes.Full_Index;
begin
    Start := Clock;
    for Run in 1 .. Runs loop
       Fringes.Compute_1D (M1D);
    end loop;
    Stop := Clock;
    Print_Timing ("1D", M1D ((M1D'First + 1 * Fringes.Len) + (2)));

    Start := Clock;
    for Run in 1 .. Runs loop
       Fringes.Compute_2 (M2);
    end loop;
    Stop := Clock;
    Print_Timing ("2D", M2 (1, 2));
end Test_Fringes;

with Ada.Text_IO;
separate (Test_Fringes)
procedure Print_Timing (Part : String; N : Fringes.Num) is
    use Ada.Text_IO;
begin
    Put (Part);
    Put (": ");
    Put (Duration'Image (To_Duration(Stop - Start)));
    Put (Fringes.Num'Image (N));
    New_Line;
end print_timing;


=== 8< === 8< === 8< === 8< ===

#include <stdint.h>

namespace Fringes {

     typedef  uint32_t Full_Index;
#define Num_Last 100000
     typedef Full_Index Num;
#define Index_Last 1000
     typedef Full_Index Index;
     const Full_Index Len = Index_Last + 1;

#define A_Last ((Len * Len) - 1)
     typedef Num Matrix_1D[A_Last + 1];

     void Compute_1D (Matrix_1D&  A);
     //  Each Compute procedure has a pointless loop that assigns each
     //  inner field the sum of non-diagonal neighbors modulo Num_Last
     void Compute_1D (Matrix_1D&  A) {

         for (Full_Index K = Len + 1; K < A_Last - Len - 1; ++K) {
             switch (K % Len) {
             case 0: case Len - 1: break;
             default:
                 A [K] = (A[K + 1]
                           + A[K - Len]
                           + A[K - 1]
                           + A[K + Len]) % Num_Last;
             }
         }
     }
}

#include <sys/time.h>
#include <string>

class Reporter {
     struct timeval Start, Stop;
public:
     void logStart();
     void logStop();
     void Print_Timing (const std::string Part, const Fringes::Num N);
};


#include <sstream>

int main(int argc, char* argv[]) {
     using namespace Fringes;

     int Runs;
     Matrix_1D* M1D = new Matrix_1D[Len];
     Reporter history;

     if (argc == 0) {
         throw "usage";
     } else {
         std::istringstream converter(argv[1]);

         if (! ((converter >> Runs) && (Runs >= 0))) {
             throw "argument error?";
         }
     }
     
     history.logStart();
     for (int Run = 1; Run <= Runs; ++Run) {
         Compute_1D (*M1D);
     }
     history.logStop();
     history.Print_Timing ("1D", (*M1D) [(1 * Len) + (2)]);
}


#include <iostream>
#include <sys/time.h>

void Reporter::Print_Timing (const std::string Part, const Fringes::Num N) {
     double difference = (Stop.tv_sec - Start.tv_sec) * 1000000
             + (Stop.tv_usec - Start.tv_usec);
     std::cout << Part << ": ";
     std::cout << (difference / 1000000.0) << " " << N;
     std::cout << '\n';
}

void Reporter::logStart() {
     gettimeofday(&this->Start, 0);
}

void Reporter::logStop() {
     gettimeofday(&this->Stop, 0);
}





  parent reply	other threads:[~2012-07-01 17:45 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-29  9:17 GNAT (GCC) Profile Guided Compilation Keean Schupke
2012-06-29  9:34 ` Dmitry A. Kazakov
2012-06-29 10:01   ` Keean Schupke
2012-06-29 10:24     ` Keean Schupke
2012-06-29 12:26       ` stefan-lucks
2012-06-29 12:51         ` Keean Schupke
2012-06-29 12:05     ` Dmitry A. Kazakov
2012-06-29 10:48 ` Simon Wright
2012-06-29 11:14   ` Keean Schupke
2012-06-29 12:39 ` gautier_niouzes
2012-06-29 12:52   ` Keean Schupke
2012-06-29 14:14     ` gautier_niouzes
2012-06-29 15:05       ` gautier_niouzes
2012-06-29 17:03         ` Keean Schupke
2012-07-01  9:29           ` Georg Bauhaus
2012-07-01 17:45           ` Georg Bauhaus [this message]
2012-07-01 22:57             ` Keean Schupke
2012-07-02 17:15               ` Georg Bauhaus
2012-07-02 17:26                 ` Keean Schupke
2012-07-02 23:48                   ` Keean Schupke
2012-07-04 10:38                     ` Georg Bauhaus
2012-07-04 10:57                       ` Keean Schupke
2012-07-04 12:36                         ` Mark Lorenzen
2012-07-04 12:38                         ` Georg Bauhaus
2012-07-14 20:17                           ` Keean Schupke
2012-07-14 20:33                             ` Keean Schupke
2012-07-14 20:43                             ` Niklas Holsti
2012-07-14 22:32                               ` Keean Schupke
2012-07-14 23:40                                 ` Keean Schupke
2012-07-15  7:15                                   ` Niklas Holsti
2012-07-15  8:27                                     ` Keean Schupke
2012-07-18 10:01                                       ` Georg Bauhaus
2012-07-18 17:36                                         ` Keean Schupke
2012-07-19  5:42                                           ` Georg Bauhaus
2012-07-19 10:18                                             ` Keean Schupke
2012-07-15 11:02                                     ` Niklas Holsti
2012-07-15 12:48                                       ` Keean Schupke
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox