From: David Trudgett
Newsgroups: comp.lang.ada
Subject: Re: Toy computational "benchmark" in Ada (new blog post)
Date: Thu, 6 Jun 2019 22:34:20 -0700 (PDT)
References: <55b14350-e255-406c-ab11-b824da77995b@googlegroups.com>

On Friday, 7 June 2019 at 11:42:07 UTC+10, john...@googlemail.com wrote:
>
> On my machine I get a nice improvement over -O3 when I
> take the arrays off the heap, and then use the following 2 flags:
>
>   -march=native -funroll-loops

That's interesting. Thank you. I'll try that (and your mods below) over the weekend and see what the result is for me.
I thought the -O3 would unroll loops where appropriate. Is that not the case?

I assume that native arch means it will generate optimal instructions for the particular architecture on which the compile is running?

>
> Modifying the programs is easy:
>
>   --Values_Array : Values_Array_Access := new Values_Array_Type;
>   Values_Array : Values_Array_Type;
>
> In the parallel version, change the loop in the task body:
>
>   -- declare
>   --    Val : Float64 renames Values_Array (Idx);
>   -- begin
>        My_Sum := My_Sum + Values_Array (Idx) ** 2;
>   -- end;
>
> The -funroll-loops gave me a nice improvement on the parallel
> program, less so on the serial version. (Makes no sense to me
> at all!) If you are running in a Unix shell, you usually need
> to tell the system if you're going to put giant arrays on the
> stack. I type this on the command line: ulimit -s unlimited.

Ah yes. I used the heap because I didn't want to use such a huge stack (and got the expected error message when I tried anyway). But I wonder why the heap should be any slower? I can't see any reason why it would be.

Regards,
David
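
P.S. For anyone following along, here is a minimal sketch of the two declarations being compared, so the heap and stack variants can be seen side by side. It is not the benchmark program itself. The names Values_Array_Type, Values_Array_Access and Float64 come from the quoted snippet; the procedure name, the array size N, the definition of Float64 as "digits 15", and the fill values are all assumptions made for illustration. N is kept small on purpose so the stack version should run without raising the stack limit.

```ada
with Ada.Text_IO;
with Ada.Unchecked_Deallocation;

procedure Stack_Vs_Heap is
   --  Assumption: the real benchmark uses a far larger array. This
   --  size (about 800 KB of Float64) is chosen only so the stack
   --  version fits within a typical default stack limit.
   N : constant := 100_000;

   --  Assumption: Float64 is a 15-digit floating-point type.
   type Float64 is digits 15;
   type Values_Array_Type is array (1 .. N) of Float64;
   type Values_Array_Access is access Values_Array_Type;

   procedure Free is new Ada.Unchecked_Deallocation
     (Values_Array_Type, Values_Array_Access);

   --  Heap variant: the array is allocated with "new" and reached
   --  through an access value.
   Heap_Values : Values_Array_Access := new Values_Array_Type;

   --  Stack variant: the array object lives in this procedure's frame.
   Stack_Values : Values_Array_Type;

   Heap_Sum  : Float64 := 0.0;
   Stack_Sum : Float64 := 0.0;
begin
   --  Fill both arrays with the same illustrative values.
   for Idx in Values_Array_Type'Range loop
      Heap_Values (Idx)  := Float64 (Idx);
      Stack_Values (Idx) := Float64 (Idx);
   end loop;

   --  Sum of squares, written the same way as the loop body in the
   --  quoted snippet. Heap_Values (Idx) is an implicit dereference.
   for Idx in Values_Array_Type'Range loop
      Heap_Sum  := Heap_Sum + Heap_Values (Idx) ** 2;
      Stack_Sum := Stack_Sum + Stack_Values (Idx) ** 2;
   end loop;

   Ada.Text_IO.Put_Line ("Heap sum:  " & Float64'Image (Heap_Sum));
   Ada.Text_IO.Put_Line ("Stack sum: " & Float64'Image (Stack_Sum));

   Free (Heap_Values);
end Stack_Vs_Heap;
```

Saved as stack_vs_heap.adb, this should build with GNAT using something like "gnatmake stack_vs_heap.adb -cargs -O3 -march=native -funroll-loops". With a much larger N, the stack variant would probably need "ulimit -s unlimited" first, as described in the quoted text. The sketch only shows the difference in how the array is declared and accessed; it makes no claim about which variant is faster.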