From: johnscpg@googlemail.com
Newsgroups: comp.lang.ada
Subject: Re: Toy computational "benchmark" in Ada (new blog post)
Date: Fri, 7 Jun 2019 18:14:04 -0700 (PDT)
Message-ID: <10240625-5cff-4d5a-a144-f21a3b8b1a08@googlegroups.com>
References: <55b14350-e255-406c-ab11-b824da77995b@googlegroups.com>

> I thought the -O3 would unroll loops where appropriate. Is that not the case?

Not on gcc. Unrolling doesn't seem to help much though.

> I assume that native arch means it will generate optimal instructions for the
> particular architecture on which the compile is running?

Sometimes it makes things worse! Though that's rare. Sometimes it helps a little.
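To make the unrolling point concrete, here is a minimal sketch in C of what unrolling a loop by a factor of 4 amounts to, written out by hand. The function names and the sum-of-squares kernel are assumptions chosen for illustration; they are not taken from the benchmark. The unrolled version keeps a single accumulator, so the additions happen in the same order and the result should match the plain loop exactly.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one multiply and one add per iteration. */
double sum_squares(const double *v, size_t n)
{
    double s = 0.0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s += v[i] * v[i];
    return s;
}

/* Hand-unrolled by 4 (illustrative): same additions in the same order,
   but only one loop test and one index increment per four elements. */
double sum_squares_unrolled4(const double *v, size_t n)
{
    double s = 0.0;
    size_t i = 0;
    assert(v != NULL || n == 0);
    for (; i + 4 <= n; i += 4) {
        s += v[i]     * v[i];
        s += v[i + 1] * v[i + 1];
        s += v[i + 2] * v[i + 2];
        s += v[i + 3] * v[i + 3];
    }
    /* Remainder loop for the last 0 to 3 elements. */
    for (; i < n; i++)
        s += v[i] * v[i];
    return s;
}
```

The only saving is loop overhead (fewer compares and branches). The number of loads, multiplies and adds is unchanged, which is one plausible reason unrolling often makes little difference.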
That's all just my experience with these switches, which is pretty limited.

> Ah yes. I used the heap because I didn't want to use such a huge stack (and got
> the expected error message when I tried anyway). But I wonder why the heap
> should be any slower? I can't see any reason why it would be.

CPUs and compilers are so complex now that I never know for sure what's going
on. The interesting thing here is that the array is almost entirely in RAM
(320000000 doubles is about 2.56 GB, far more than fits in cache), which makes
floating point desperately slow.

If you compile the 2 programs below with the -S switch, and read the .s file,
then you find that gcc produces SSE code for both the C and Ada programs. In
other words you see instructions like:

    vmulsd %xmm0, %xmm0, %xmm0
    vaddsd %xmm0, %xmm1, %xmm1

That won't help much if fetching memory from RAM is too slow to keep the
multipliers busy. If you compile with the -mfpmath=387 switch, then no SSE
code is generated, and the running time is about the same. (On my machine.)

When you compare programs in different languages, you need to write them the
same way. See below! I get identical run times from the two with all the
compiler switches I try, as long as both get the same switches. You can try
various combinations of -O2, -O3, -mfpmath=387 etc:

    gnatmake -O3 -march=native -funroll-loops map.adb
    gcc -O3 -march=native -funroll-loops map.c

and remember to make room for the arrays on the stack. In the bash shell, it's
ulimit -s unlimited.

On Linux, timing with 'time ./a.out' and 'time ./map' works ok, but run them
repeatedly, and close any background processes (like browsers!).

#include
double main() {
  int Calculation_Runs = 100;
  int Data_Points = 320000000;
  int i, j;
  double s;
  double v[Data_Points];
  for (i=0; i