From: "Randy Brukardt"
Newsgroups: comp.lang.ada
Subject: Re: Ada and OpenMP
Date: Thu, 7 Mar 2013 21:31:57 -0600
Organization: Jacob Sparre Andersen Research & Innovation
References: <87k3pjht79.fsf@ludovic-brenta.org>

"Peter C. Chapin" wrote in message news:za6dnTU03LGyrqTM4p2dnAA@giganews.com...

>> Isn't OpenMP aimed at SIMD-type machines (as in video processors), as
>> opposed to generalized cores as in typical Intel and ARM designs?
>> Fine-grained parallelism doesn't make much sense on the latter, because
>> cache coherence and core scheduling issues will eat up the gains in
>> almost all circumstances. Ada tasks are a much better model.
>
> Well, I used OpenMP for a program targeting x64 architectures and it
> worked well in my case. It was easy to use: my program became 8x faster
> with the addition of a single line of source text. It even computed the
> right answer. My program was very well suited to the OpenMP model of
> computation, however, so I wouldn't expect such a dramatic result in all
> cases, of course.

But (based on the rest of your note) that isn't "fine-grained
parallelism". You called a bunch of expensive library functions in the
loop, so each iteration did enough work for the mechanism to pay off. An
arrangement like Paraffin would have worked as well (with a bit more code
rearrangement).

...

>> Well, this doesn't make much sense. If the pragma doesn't change the
>> semantics of the loop, then it's not necessary at all (the compiler can
>> and ought to do the optimization when it makes sense, possibly under
>> the control of global flags). Programmers are lousy at determining
>> where and how the best use of machine resources can be made.
>
> I only used the pragma above to follow the mindset of OpenMP under C. I
> agree it might not be the best way to do it in Ada.
>
> I'm a little uncertain, though, about how well the compiler can be
> expected to find this sort of parallelization... at least with current
> technology. The compiler I was using for the program above, and it
> wasn't an ancient one, certainly had no idea how to do such things on
> its own.

Well, the problem is that if you follow Ada semantics (which specify
sequential for loops), you probably can't parallelize even if you have a
pragma. That's because the sequential semantics are observable through
exceptions: if an exception is raised during the second iteration of a
loop, nothing that the fourth iteration would have done had better have
happened yet.
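To make that concrete, here's a small sketch (the names, the array, and
the failing iteration are invented for illustration, not taken from
Peter's program) of what "observable sequential semantics" means:

   with Ada.Text_IO;

   procedure Loop_Semantics_Demo is

      A : array (1 .. 10) of Integer := (others => 0);

      --  Fails on the second iteration.
      function Process (I : Integer) return Integer is
      begin
         if I = 2 then
            raise Constraint_Error;
         end if;
         return I * I;
      end Process;

   begin
      for I in A'Range loop
         A (I) := Process (I);
      end loop;
   exception
      when Constraint_Error =>
         --  Sequential semantics guarantee that A (3 .. 10) is still all
         --  zeros here; a compiler that ran the iterations on several
         --  cores behind our backs could not promise that.
         Ada.Text_IO.Put_Line ("A (4) =" & Integer'Image (A (4)));
   end Loop_Semantics_Demo;

To honor that guarantee, a parallelizing compiler would need extra
bookkeeping (or explicit permission to drop it), which is exactly why a
pragma alone doesn't cut it.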
And if you want to make such a major change to Ada semantics, writing a
pragma is NOT the right way, IMHO. It would be far better for Ada to
simply have parallel loops:

   for I in 1 .. 10 loop in parallel
      ...
   end loop;

in which case depending upon the sequential execution wouldn't be
allowed. (There would also have to be some restrictions on what the "..."
could be.) We're exploring some such ideas for a future version of Ada,
and it would be nice if some trial implementations appeared.

> In a high performance application nested loops are common, and often
> the body of a loop calls a subprogram implemented in a library that
> itself has loops. I don't want all of these nested loops parallelized
> because that would create huge overheads.

This is my point: "fine-grained parallelism" means that *everything* is
(potentially) parallelized. (See Tucker's ParaSail for an example.)
You're essentially saying that it doesn't work.

Also, there is a strong argument that you're prematurely optimizing your
code if you're worrying about the "overheads" created. The compiler can
figure these out far better than a human can -- when you do it by hand,
you're only guessing, while the compiler has a lot more information with
which to decide. It's best to tell the compiler where you don't care
about exceptions (for instance) and let it pick the best parallelization.

> Yet without detailed semantic information about what the library
> subprograms do, I'm not sure how the compiler can know it's safe to
> parallelize the top level loop.

The compiler *has* to have such information, or it can't do *anything*
useful. By anything, I mean optimizations (both sequential and parallel),
proofs, static checking, and the like. Either it has to have access to
the bodies, or it needs *strong* contracts covering everything. We
proposed all of those for Ada 2012, but we didn't have the energy to
finish the global in/out contracts. You can get a bit of information from
pragma Pure, but that's about it.

For Janus/Ada, we always assume that a subprogram can do anything, and
that prevents about 98% of optimizations from happening across subprogram
calls. That simply isn't acceptable today, especially with Ada 2012's
assertions (you have to be able to remove redundant assertion checks to
make them cheap enough that they can be left on all the time). And
parallelization is a similar situation: the compiler needs to know about
the side effects of every function, in detail, before it can generate
code that takes advantage of modern hardware.

> I'm not an expert in writing parallelizing compilers for sure, but it
> seemed to me, when I was experimenting with OpenMP, that it did a nice
> job of taking care of the grunt work while still allowing me to apply
> my broad knowledge of the application to find good places to
> parallelize.

I suspect it works better with C, which doesn't have error semantics; in
that case, how the loop is implemented doesn't really matter. Similarly,
no one cares that you can easily introduce hard-to-find bugs if there is
any overlap between your iterations. Ada is different in both of these
regards -- new ways to introduce erroneous execution are not tolerated by
most Ada users.

In any case, these sorts of things are hacks to use until the compilers
and languages catch up.
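For what it's worth, here's another small sketch (again with invented
names) of the kind of iteration overlap I mean. The first loop's
iterations are independent; the second loop's iterations all update the
same variable:

   with Ada.Text_IO;

   procedure Overlap_Demo is
      A   : array (1 .. 1_000) of Float := (others => 1.0);
      B   : array (1 .. 1_000) of Float := (others => 0.0);
      Sum : Float := 0.0;
   begin
      --  Independent iterations: each one writes a distinct B (I), so
      --  this loop could be handed to several cores without changing
      --  the result.
      for I in A'Range loop
         B (I) := A (I) * 2.0;
      end loop;

      --  Overlapping iterations: every iteration reads and writes Sum,
      --  so running them in parallel without extra care is a race on
      --  Sum and can silently produce a wrong answer.
      for I in A'Range loop
         Sum := Sum + A (I);
      end loop;

      Ada.Text_IO.Put_Line ("Sum =" & Float'Image (Sum));
   end Overlap_Demo;

A C programmer gets to find that race the hard way; most Ada users would
expect the language (or at least the compiler) to rule it out.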
Programs shouldn't be specifying in-lining or parallelism in detail; at
most, some hints might be provided. Compilers do this sort of grunt work
a whole lot better than humans do.

                         Randy.