comp.lang.ada
From: David Trudgett <dktrudgett@gmail.com>
Subject: Re: Toy computational "benchmark" in Ada (new blog post)
Date: Sat, 8 Jun 2019 03:17:03 -0700 (PDT)
Message-ID: <64388ca5-ae30-4451-9883-b5785e96de50@googlegroups.com>
In-Reply-To: <db62a2be-6eab-4c55-96a0-4a3879f05464@googlegroups.com>

On Friday, 7 June 2019 at 15:34:22 UTC+10, David Trudgett wrote:
> On Friday, 7 June 2019 at 11:42:07 UTC+10, john...@googlemail.com wrote:
> > 
> > On my machine I get a nice improvement over -O3 when I
> > take the arrays off the heap, and then use the following 2 flags:
> > 
> >    -march=native -funroll-loops
> 
> That's interesting. Thank you. I'll try that (and your mods below) over the weekend and see what the result is for me.
> 
> I thought the -O3 would unroll loops where appropriate. Is that not the case?
> 
> I assume that native arch means it will generate optimal instructions for the particular architecture on which the compile is running?
> 
> > 
> > Modifying the programs is easy:
> > 
> >  --Values_Array : Values_Array_Access := new Values_Array_Type;
> >    Values_Array : Values_Array_Type;
> > 
> > In the parallel version, change the loop in the task body:
> > 
> > --       declare
> > --          Val : Float64 renames Values_Array (Idx);
> > --       begin
> >             My_Sum := My_Sum + Values_Array (Idx) ** 2;
> > --       end;
> > 
> > The  -funroll-loops  gave me a nice improvement on the parallel
> > program, less so on the serial version. (Makes no sense to me
> > at all!) If you are running in a Unix shell, you usually need
> > to tell the system if you're going to put giant arrays on the
> > stack. I type this on the command line: ulimit -s unlimited.
> 
> Ah yes. I used the heap because I didn't want to use such a huge stack (and got the expected error message when I tried anyway). But I wonder why the heap should be any slower? I can't see any reason why it would be.
> 
> 

Okay, I have tried (a) using stack allocation instead of heap, and (b) compiling with -march=native, and I have compared the resulting timings with the original. Each figure below is the average of the run times reported by three consecutive runs of the program. Since each program run performs 50 calculation runs, each average in effect covers 150 calculation runs.

Original program: 434.718 ms

Stack allocation: 435.667 ms

Native arch flag: 435.745 ms

As you can see, there is virtually no difference. I did verify that the -march=native build did, in fact, use AVX instructions rather than SSE (but not AVX2).

It's interesting that the AVX instructions did not improve the run time (the native build was technically about 1 ms slower, roughly 0.2%, but I expect that to be statistically insignificant).
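
For anyone who wants to repeat the comparison, here is a minimal sketch of the kind of measurement described above. It is not the actual benchmark program: the procedure name, the array size N, and the fill value are assumptions made up for illustration, and only the sum-of-squares loop and the 50 calculation runs come from this thread.

```ada
with Ada.Text_IO;   use Ada.Text_IO;
with Ada.Real_Time; use Ada.Real_Time;

procedure Time_Sum_Squares is
   --  Hypothetical type, size and names; the real program differs.
   type Float64 is digits 15;
   N    : constant := 1_000_000;
   Runs : constant := 50;

   type Values_Array_Type is array (1 .. N) of Float64;
   type Values_Array_Access is access Values_Array_Type;

   --  Heap allocation. For the stack variant, declare
   --  "Values : Values_Array_Type;" instead (a large N may then
   --  require "ulimit -s unlimited" in the shell).
   Values : constant Values_Array_Access := new Values_Array_Type;

   Sum      : Float64;
   Checksum : Float64   := 0.0;
   Start    : Time;
   Total    : Time_Span := Time_Span_Zero;
begin
   --  Fill element by element to avoid a large temporary aggregate.
   for Idx in Values'Range loop
      Values (Idx) := 1.0;
   end loop;

   for Run in 1 .. Runs loop
      Sum   := 0.0;
      Start := Clock;
      for Idx in Values'Range loop
         Sum := Sum + Values (Idx) ** 2;
      end loop;
      Total := Total + (Clock - Start);
      --  Use every result so the optimizer cannot discard a run.
      Checksum := Checksum + Sum;
   end loop;

   Put_Line ("Checksum:" & Float64'Image (Checksum));
   Put_Line ("Mean run time (ms):"
             & Duration'Image (To_Duration (Total) * 1000 / Runs));
end Time_Sum_Squares;
```

Building this once with -O3 and once with -O3 -march=native -funroll-loops, running each binary a few times, and averaging the reported means gives figures comparable in kind to those above.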

Cheers,
David

