From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=BAYES_00,FORGED_MUA_MOZILLA autolearn=no autolearn_force=no version=3.4.4 X-Google-Thread: 103376,103803355c3db607 X-Google-NewGroupId: yes X-Google-Attributes: gida07f3367d7,domainid0,public,usenet X-Google-Language: ENGLISH,ASCII-7-bit Received: by 10.68.196.232 with SMTP id ip8mr8289869pbc.6.1341164723919; Sun, 01 Jul 2012 10:45:23 -0700 (PDT) Path: l9ni6799pbj.0!nntp.google.com!news1.google.com!npeer01.iad.highwinds-media.com!feed-me.highwinds-media.com!cyclone01.ams2.highwinds-media.com!news.highwinds-media.com!voer-me.highwinds-media.com!newsfeed.eweka.nl!eweka.nl!feeder3.eweka.nl!de-l.enfer-du-nord.net!feeder1.enfer-du-nord.net!news.imp.ch!newsfeed.tiscali.ch!npeer.de.kpn-eurorings.net!npeer-ng0.de.kpn-eurorings.net!newsfeed.arcor.de!newsspool4.arcor-online.net!news.arcor.de.POSTED!not-for-mail Date: Sun, 01 Jul 2012 19:45:22 +0200 From: Georg Bauhaus User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:13.0) Gecko/20120614 Thunderbird/13.0.1 MIME-Version: 1.0 Newsgroups: comp.lang.ada Subject: Re: GNAT (GCC) Profile Guided Compilation References: <38b9c365-a2b2-4b8b-8d2a-1ea39d08ce86@googlegroups.com> <982d531a-3972-4971-b802-c7e7778b8649@googlegroups.com> <520bdc39-6004-4142-a227-facf14ebb0e8@googlegroups.com> In-Reply-To: <520bdc39-6004-4142-a227-facf14ebb0e8@googlegroups.com> Message-ID: <4ff08cb2$0$6575$9b4e6d93@newsspool3.arcor-online.net> Organization: Arcor NNTP-Posting-Date: 01 Jul 2012 19:45:22 CEST NNTP-Posting-Host: 5eb4a7fc.newsspool3.arcor-online.net X-Trace: DXC=Q`[D328SmaP<<0iRN7DLEQMcF=Q^Z^V3X4Fo<]lROoRQ8kFejVXeWj][KfSJEPRna_84DX4YZ X-Complaints-To: usenet-abuse@arcor.de X-Received-Bytes: 9876 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Date: 2012-07-01T19:45:22+02:00 List-Id: On 29.06.12 19:03, Keean Schupke wrote: > Note: this is not really answering my question about the lack of improvement due to profile-guided compilation - but perhaps that side of things is more a compiler issue. Does anyone have any ideas about that side of things? Some more observations, collected with the help of the simple two example programs appended below. For Ada and GNAT at least, any advantages to be had from profile-guided compilation seem to vary with optimization options and sizes of data. The same is true about a 1D approach vs a 2D approach. Among the various arrangements, one of the best I have got from both GNAT GPL 2012, and from an FSF GNAT as of June 2012 uses *) -O2 -funroll-loops -gnatp *) the 2D approach (If it means a anything, adding -ftree-vectorize to the -O2 set produces only the second best Ada result, the same as the -O3 times listed below. I had chosen -ftree-vectorize as this switch is in the O3 set.) Trying profile guided compilation at higher optimization levels consistently slowed the Ada program down (other Ada programs were faster after profile guided compilation, as mentioned elsewhere). The result at -O2 seems nice, though, in that the 2D approach is natural, the compiler switches are the ones that the manual recommends. and this combination produces the fastest Ada program. In short, from worst to best 1D / 2D, Runs = 100: Ada profile guided -> .88 / .80 (this program only!) Ada at -O3 -> .68 / .66 C++ at -O3 -> .66 Ada at -O2 -funr.. -> .68 / .47 C++ profile guided -> .31 Some differences seem to vary with hardware, too. On one computer I have seen a minor speedup, on another a very small slowdown, and a different ratio of 1D / 2D. For the C++ case I have tested a 1D approach only, not knowing how to write 2D arrays in C++ using C style arrays. I'm not a C++ programmer, apologies for my C++. Ada at -O2 -funroll-loops -gnatp -mtune=native 1D: 0.680750000 27303 2D: 0.469094000 27303 <-- Best for Ada! Ada at -O3 -gnatNp -fomit-frame-pointer -mtune=native 1D: 0.676616000 27303 2D: 0.664194000 27303 previous with profile-guided compilation 1D: 0.888206000 27303 2D: 0.806196000 27303 C++ at -O3 -mtune=native 1D: 0.681721 0 previous with profile-guided compilation 1D: 0.31165 0 Clang++ 1D: 0.611522 0 The GNU C++ compiler is from the GNAT GPL 2012 distribution for Mac OS X, Clang++ is Apple's v 3.1. === 8< === 8< === 8< === 8< === package Fringes is pragma Pure (Fringes); type Full_Index is mod 2**32; subtype Num is Full_Index Range 0 .. 100_000; subtype Index is Full_Index range 0 .. 1000; Len : constant Full_Index := Index'Last - Index'First + 1; type Matrix_1D is array (Index'First .. Index'First + Len * Len - 1) of Num; type Matrix_2 is array (Index, Index) of Num; procedure Compute_1D (A : in out Matrix_1D); procedure Compute_2 (A : in out Matrix_2); end Fringes; package body Fringes is -- -- Each Compute procedure has A pointless loop that assigns each -- inner field the sum of non-diagonal neighbors modulo Num'Last -- procedure NOT_USED_Compute_1D_Slow (A : in out Matrix_1D) is J : Full_Index; begin J := A'First + Len; loop exit when J > A'Last - Len; for K in J + 1 .. J + Len - 1 - 1 loop A (K) := (A(K + 1) + A(K - Len) + A(K - 1) + A(K + Len)) mod Num'Last; end loop; J := J + Len; end loop; end NOT_USED_Compute_1D_Slow; procedure Compute_1D (A : in out Matrix_1D) is begin for K in A'First + Len + 1 .. A'Last - Len - 1 loop case K mod Len is when 0 | Len - 1 => null; when others => A (K) := (A(K + 1) + A(K - Len) + A(K - 1) + A(K + Len)) mod Num'Last; end case; end loop; end Compute_1D; procedure Compute_2 (A : in out Matrix_2) is begin for J in A'First(1) + 1 .. A'Last(1) - 1 loop for K in A'First(2) + 1 .. A'Last(2) - 1 loop A (J, K) := (A(J, K + 1) + A(j - 1, K) + A(J, K - 1) + A(j + 1, K)) mod Num'Last; end loop; end loop; end Compute_2; end Fringes; with Fringes; with Ada.Command_Line, Ada.Real_Time; procedure Test_Fringes is use Ada.Real_Time; Runs : constant Natural := Natural'Value (Ada.Command_Line.Argument(1)); Start, Stop: Time; package Os_Or_Gnat_Stacksize_Nuisance is use Fringes; type P1 is access Matrix_1D; type P2 is access Matrix_2; M1P : constant P1 := new Matrix_1D'(others => 123); M2p : constant P2 := new Matrix_2'(Index => (Index => 123)); M1D : Matrix_1D renames M1P.all; M2 : Matrix_2 renames M2P.all; end Os_Or_Gnat_Stacksize_Nuisance; use Os_Or_Gnat_Stacksize_Nuisance; procedure Print_Timing (Part : String; N : Fringes.Num) is separate; use type Fringes.Full_Index; begin Start := Clock; for Run in 1 .. Runs loop Fringes.Compute_1D (M1D); end loop; Stop := Clock; Print_Timing ("1D", M1D ((M1D'First + 1 * Fringes.Len) + (2))); Start := Clock; for Run in 1 .. Runs loop Fringes.Compute_2 (M2); end loop; Stop := Clock; Print_Timing ("2D", M2 (1, 2)); end Test_Fringes; with Ada.Text_IO; separate (Test_Fringes) procedure Print_Timing (Part : String; N : Fringes.Num) is use Ada.Text_IO; begin Put (Part); Put (": "); Put (Duration'Image (To_Duration(Stop - Start))); Put (Fringes.Num'Image (N)); New_Line; end print_timing; === 8< === 8< === 8< === 8< === #include namespace Fringes { typedef uint32_t Full_Index; #define Num_Last 100000 typedef Full_Index Num; #define Index_Last 1000 typedef Full_Index Index; const Full_Index Len = Index_Last + 1; #define A_Last ((Len * Len) - 1) typedef Num Matrix_1D[A_Last + 1]; void Compute_1D (Matrix_1D& A); // Each Compute procedure has a pointless loop that assigns each // inner field the sum of non-diagonal neighbors modulo Num_Last void Compute_1D (Matrix_1D& A) { for (Full_Index K = Len + 1; K < A_Last - Len - 1; ++K) { switch (K % Len) { case 0: case Len - 1: break; default: A [K] = (A[K + 1] + A[K - Len] + A[K - 1] + A[K + Len]) % Num_Last; } } } } #include #include class Reporter { struct timeval Start, Stop; public: void logStart(); void logStop(); void Print_Timing (const std::string Part, const Fringes::Num N); }; #include int main(int argc, char* argv[]) { using namespace Fringes; int Runs; Matrix_1D* M1D = new Matrix_1D[Len]; Reporter history; if (argc == 0) { throw "usage"; } else { std::istringstream converter(argv[1]); if (! ((converter >> Runs) && (Runs >= 0))) { throw "argument error?"; } } history.logStart(); for (int Run = 1; Run <= Runs; ++Run) { Compute_1D (*M1D); } history.logStop(); history.Print_Timing ("1D", (*M1D) [(1 * Len) + (2)]); } #include #include void Reporter::Print_Timing (const std::string Part, const Fringes::Num N) { double difference = (Stop.tv_sec - Start.tv_sec) * 1000000 + (Stop.tv_usec - Start.tv_usec); std::cout << Part << ": "; std::cout << (difference / 1000000.0) << " " << N; std::cout << '\n'; } void Reporter::logStart() { gettimeofday(&this->Start, 0); } void Reporter::logStop() { gettimeofday(&this->Stop, 0); }