From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: GNAT and Tasklets
Date: Fri, 19 Dec 2014 18:28:45 +0100
Organization: cbb software GmbH
Reply-To: mailbox@dmitry-kazakov.de

On Fri, 19 Dec 2014 09:42:52 -0700, Brad Moore wrote:

> On 14-12-19 04:01 AM, Dmitry A. Kazakov wrote:
>> On Fri, 19 Dec 2014 10:40:03 +0000 (UTC), Georg Bauhaus wrote:
>>
>>> wrote:
>>>
>>>> It would be interesting to do a little survey on existing code using
>>>> tasking. I have the impression that only tasks at library level do
>>>> rendezvous and protected-object synchronisation, and that local
>>>> tasks, most of the time, are limited to a rendezvous with their
>>>> parent task at the beginning or at the end. So maybe we should put
>>>> restrictions on local tasks, so that we can map them to jobs.
>>>
>>> Won't the parallel loop feature provide for this kind of mini job:
>>
>> Parallel loops are useless for practical purposes. I wonder why
>> people waste time on this.
>
> For multicore, the idea is to make better use of the cores when doing
> so will improve performance.

I don't think multi-core would bring any advantage. Starting /
activating / reusing / feeding / re-synchronizing threads is too
expensive. Parallel loops could be useful on some massively vectorized
architectures in some very specialized form, or on architectures with a
practically infinite number of cores (e.g. molecular computers). Even
then, feeding the threads with inputs and gathering their outputs may
still mean more overhead than any gain.

> Also, the number of iterations does not need to be large to see
> parallelism benefits.
>
>    for I in parallel 1 .. 10 loop
>       Lengthy_Processing_Of_Image (I);
>    end loop;

Certainly not for image processing. When an image is processed by
segments, practically all algorithms require sewing the segments
together along their borders. That makes the parallelization far too
complicated to be handled by such a blunt instrument as a parallel
loop.

>> They could start with logical operations instead:
>>
>>    X and Y
>>
>> is already parallel by default. AFAIK nothing in the RM forbids
>> concurrent evaluation of X and Y if they are independent. Same with
>> Ada arithmetic, e.g.
>>
>>    A + B + C + D
>>
>> So far no compiler evaluates arguments concurrently or vectorizes
>> sub-expressions like:
>>
>>    A
>>    B +
>>    C +
>>    D +
>>
>> because if it did, the result would run slower than the sequential
>> code. It simply isn't worth the effort on existing machine
>> architectures.
>
> The compiler should be able to make the decision to parallelize these
> if there is any benefit to doing so. Likely the decision would be to
> *not* parallelize these, if A, B, C, and D are objects of some
> elementary type.
>
> But it depends on the data type of A, B, C, and D.
>
> Also, A, B, C, and D might be function calls rather than simple data
> references, and these calls might involve lengthy processing, in which
> case adding parallelism might make sense.

Yes, provided the language has means to describe the side effects of
such computations in a way that makes the decision safe.
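Ada's closest existing means for that is the Pure categorization. A
minimal sketch, with placeholder expression functions standing in for
the lengthy computations: a Pure package promises freedom from side
effects, which is exactly what would let a compiler safely reorder or
overlap the calls:

   package Lengthy_Math with Pure is
      --  Stand-ins for lengthy computations. A Pure package cannot
      --  touch global state, so calls to A, B, C and D could legally
      --  be evaluated in any order, or concurrently.
      function A return Integer is (1);
      function B return Integer is (2);
      function C return Integer is (3);
      function D return Integer is (4);
   end Lengthy_Math;

As far as I know, compilers use Pure today only to elide or merge
redundant calls, never to evaluate them concurrently.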
> Or, if these are objects of a big-number library with arbitrary
> precision, you might have a rational number with pages of digits each
> for the numerator and the denominator. Performing math on such values
> might very well benefit from parallelism.

It won't, because a big-number library will use the heap or a shared
part of the stack, which will require interlocking. Its operations will
therefore either be marked "impure", so that the compiler will not try
to parallelize them, or else will force the compiler to use locks,
which will effectively kill the parallelism.

> We looked at being able to explicitly state parallelism for
> subprograms (parallel subprograms), but found that the syntax was
> messy, and there were too many other problems.
>
> We are currently thinking a parallel block syntax better provides
> this capability, if the programmer wants to explicitly indicate where
> parallelism is desired.
>
> e.g.
>
>    Left, Right, Total : Integer := 0;
>
>    parallel
>       Left := A + B;
>    and
>       Right := C + D;
>    end parallel;
>
>    Total := Left + Right;
>
> or possibly allow some automatic reduction
>
>    Total : Integer with Reduce := 0;
>
>    parallel
>       Total := A + B;
>    and
>       Total := C + D;
>    end parallel;
>
> Here, two "tasklets" would get created that can execute in parallel,
> each with its own local instance of the Total result (i.e.
> thread-local storage), and at the end of the parallel block the two
> results are reduced into one and assigned to the actual Total.
>
> The reduction operation and the identity value used to initialize the
> local instances of Total could be defaulted by the compiler for
> simple data types, but could be stated explicitly if desired.
>
> e.g.
>
>    Total : Integer with Reduce => (Reducer => "+", Identity => 0) := 0;
>
>    parallel
>       Total := A + B;
>    and
>       Total := C + D;
>    end parallel;

The semantics are not clear. What happens if:

   parallel
      Total := Total + 1;
      Total := A + B;
   and
      Total := C + D;
   end parallel;

And there is, of course, the question of exceptions raised within the
concurrent paths.
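For comparison, a minimal sketch of what the reduction example would
presumably have to expand to with today's Ada tasks, assuming plain
Integer operands and the "+"/0 reducer and identity above. It also
shows why Total := Total + 1 is ambiguous: inside an arm it can only
refer to the arm's local partial, not to the outer Total:

   procedure Reduce_Demo is
      A : constant Integer := 1;   --  placeholder operands
      B : constant Integer := 2;
      C : constant Integer := 3;
      D : constant Integer := 4;
      Total : Integer;
   begin
      declare
         Partial_1 : Integer := 0;   --  each arm's local instance of
         Partial_2 : Integer := 0;   --  Total, set to the identity 0

         task First_Arm;
         task body First_Arm is
         begin
            Partial_1 := A + B;   --  first arm of the parallel block
         end First_Arm;
      begin
         Partial_2 := C + D;   --  second arm, run by the parent task
      end;   --  the parent waits here for First_Arm to terminate
      Total := Partial_1 + Partial_2;   --  the reduction step
   end Reduce_Demo;

Note also that an exception propagating out of First_Arm would silently
kill that task, while one raised in the second arm propagates after the
block has waited for First_Arm: two different behaviours for two
textually symmetric paths.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de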