From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=unavailable autolearn_force=no version=3.4.4
X-Received: by 2002:a24:b343:: with SMTP id z3mr2602867iti.60.1559956444651;
        Fri, 07 Jun 2019 18:14:04 -0700 (PDT)
X-Received: by 2002:a9d:32a6:: with SMTP id u35mr22096559otb.81.1559956444499;
 Fri, 07 Jun 2019 18:14:04 -0700 (PDT)
Path: 
 eternal-september.org!reader01.eternal-september.org!feeder.eternal-september.org!news.uzoreto.com!feeder1.cambriumusenet.nl!feed.tweak.nl!209.85.166.215.MISMATCH!s188no318152itb.0!news-out.google.com!l135ni358itc.0!nntp.google.com!s188no318148itb.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Fri, 7 Jun 2019 18:14:04 -0700 (PDT)
In-Reply-To: <55b14350-e255-406c-ab11-b824da77995b@googlegroups.com>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=87.112.92.44;
 posting-account=Jzt5lQoAAAB4PhTgRLOPGuTLd_K1LY-C
NNTP-Posting-Host: 87.112.92.44
References: <55b14350-e255-406c-ab11-b824da77995b@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <10240625-5cff-4d5a-a144-f21a3b8b1a08@googlegroups.com>
Subject: Re: Toy computational "benchmark" in Ada (new blog post)
From: johnscpg@googlemail.com
Injection-Date: Sat, 08 Jun 2019 01:14:04 +0000
Content-Type: text/plain; charset="UTF-8"
Xref: reader01.eternal-september.org comp.lang.ada:56549
Date: 2019-06-07T18:14:04-07:00
List-Id: <comp.lang.ada>

>I thought the -O3 would unroll loops where appropriate. Is that not the case?

Not on gcc: -O3 does not turn on -funroll-loops, so you have to ask for it explicitly. Unrolling doesn't seem to help much here, though.

>I assume that native arch means it will generate optimal instructions for the particular architecture on which the compile is running?

Sometimes it makes things worse! Though that's rare. Sometimes it helps a little. That's my experience, which is pretty limited.

>Ah yes. I used the heap because I didn't want to use such a huge stack (and got the expected error message when I tried anyway). But I wonder why the heap should be any slower? I can't see any reason why it would be.

CPUs and compilers are so complex now that I never know
for sure what's going on. The interesting thing here is
that the array (320 million doubles) is far bigger than any
cache, so it sits almost entirely in RAM, which makes floating
point desperately slow: the loop mostly waits on memory.

If you compile the 2 programs below with the -S switch,
and read the .s file, then you find that gcc produces SSE code
for both the C and Ada programs.  In other words you
see instructions like:
   vmulsd  %xmm0, %xmm0, %xmm0
   vaddsd  %xmm0, %xmm1, %xmm1
(The leading v is the AVX encoding of the scalar SSE
instructions.) That won't help much if fetching data from RAM
is too slow to keep the multipliers busy.

If you compile with the -mfpmath=387 switch, then no SSE code
is generated, and the running time is about the same. (On my
machine.)

When you compare programs in different languages, you need to
write them the same way. See below! I get identical run times from
the two with all the compiler switches I try, as long as they
are the same compiler switches. You can try various combinations
of -O2, -O3, -mfpmath=387 etc:

  gnatmake -O3 -march=native -funroll-loops map.adb
  gcc -O3 -march=native -funroll-loops map.c

and remember to make room for the arrays on the stack. In the
bash shell, that's 'ulimit -s unlimited'. On Linux, timing
with 'time ./a.out' and 'time ./map' works ok, but run them
repeatedly, and stop any background processes (like browsers!)
first.
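#include <stdio.h>

int main(void)
{
    int Calculation_Runs = 100;
    int Data_Points = 320000000;
    int i, j;
    double s = 0.0;         /* the accumulator must start at zero */
    double v[Data_Points];  /* lives on the stack, hence ulimit -s unlimited */

    /* Fill the array. */
    for (i=0; i<Data_Points; i++){
      v[i] = 3.14159265358979323846;
    }

    /* Sum the squares, Calculation_Runs times over. */
    for (j=0; j<Calculation_Runs; j++){
        for (i=0; i<Data_Points; i++){
          s = s + v[i] * v[i];
        }
    }
    printf("Sum = %f\n",s);
    return 0;
}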

with Ada.Text_IO; use Ada.Text_IO;
procedure Map is
   Calculation_Runs : constant := 100;
   Data_Points : constant := 320_000_000;

   type Values_Index is range 1 .. Data_Points;
   type Float64 is digits 15;
   type Values_Array_Type is array (Values_Index) of Float64;
   Values_Array : Values_Array_Type;
   Sum : Float64 := 0.0;
begin
   for i in Values_Index loop
      Values_Array (i) := 3.14159265358979323846;
   end loop;

   for j in 1 .. Calculation_Runs loop
      for i in Values_Index loop
         Sum := Sum + Values_Array(i) * Values_Array(i);
      end loop;
   end loop;
   Put_Line ("Sum = " & Sum'Image);
end Map;