From: Brad Moore
Newsgroups: comp.lang.ada
Subject: Re: GNAT and Tasklets
Date: Wed, 17 Dec 2014 14:08:31 -0700

On 14-12-17 06:06 AM, vincent.diemunsch@gmail.com wrote:
> Hello Brad,
>
>> I don't think this is accurate, as creating tasks in Ada generally serves
>> a different purpose than adding improved parallelism. Tasks are useful
>> constructs for creating independent concurrent activities. They are a way
>> of breaking an application into separate, independent logical executions
>> that separate concerns, improving the logic and understanding of a
>> program. Parallelism, on the other hand, is only about making the program
>> execute faster. If the parallelism does not do that, it fails to serve
>> its purpose.
>
> I am rather surprised that you made a distinction between creating tasks
> and parallelism. I agree that the goal of parallelism is to increase CPU
> usage and therefore make the program run faster. For me, creating tasks is
> the Ada way of implementing parallelism. And it is a sound way of doing it,
> since compilers, as far as I know, are not really able to find parallelism
> in a program automatically. Moreover, using things like state machines to
> create parallelism is too complex for a programmer and needs the use of a
> dedicated language. So tasks are fine.

I made the distinction because they are not the same. Parallelism is a form
of concurrency, but really just a subset of it. In parallelism, multiple
tasks are executing at the same time. In more general concurrency this is
not necessarily the case: time slicing might be used, for example, so that
in reality only one task is executing at any given instant.

I am not disagreeing that tasks can be useful for implementing parallelism.
The Paraffin libraries, for instance, use Ada tasks as the underlying
workers. It's just that, to make effective use of multicore and manycore
architectures, tasks are generally too coarse-grained a construct for the
application programmer to have to start from every time they want to
introduce parallelism. It's too much code to have to write each time,
especially if you also want load balancing, reductions, a variable number
of cores, prevention of oversubscribing the parallelism, the right number
of workers for the job, workers obtained from a task pool, and so on.
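To make that concrete, here is a minimal sketch of a hand-rolled parallel
sum written with plain Ada tasks. Everything in it is invented for
illustration: the worker count is hard-wired, the data is split into equal
chunks with no load balancing, and the reduction is done sequentially at
the end. A library hides all of this plumbing.

with Ada.Text_IO;

procedure Manual_Parallel_Sum is

   type Real_Array is array (Positive range <>) of Long_Float;

   Data : constant Real_Array (1 .. 100_000) := (others => 1.0);

   --  Hard-wired worker count; a library would size this from the
   --  number of available cores.
   Num_Workers : constant := 4;

   --  One result slot per worker, so the partial sums are race-free.
   Partial : array (1 .. Num_Workers) of Long_Float := (others => 0.0);

   task type Worker is
      entry Start (Id : Positive);
   end Worker;

   task body Worker is
      My_Id : Positive;
   begin
      accept Start (Id : Positive) do
         My_Id := Id;
      end Start;

      declare
         Chunk : constant Positive := Data'Length / Num_Workers;
         First : constant Positive := Data'First + (My_Id - 1) * Chunk;
         --  The last worker absorbs any leftover elements.
         Last  : constant Positive :=
           (if My_Id = Num_Workers then Data'Last else First + Chunk - 1);
         Sum   : Long_Float := 0.0;
      begin
         for I in First .. Last loop
            Sum := Sum + Data (I);
         end loop;
         Partial (My_Id) := Sum;
      end;
   end Worker;

   Total : Long_Float := 0.0;

begin
   declare
      Workers : array (1 .. Num_Workers) of Worker;
   begin
      for I in Workers'Range loop
         Workers (I).Start (I);
      end loop;
   end;  --  Leaving the block waits for all workers to terminate.

   --  Sequential reduction of the per-worker results.
   for P of Partial loop
      Total := Total + P;
   end loop;

   Ada.Text_IO.Put_Line ("Sum =" & Long_Float'Image (Total));
end Manual_Parallel_Sum;

That is roughly fifty lines to parallelize a three-line loop, and it still
does none of the adaptive things listed above.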
Having a library that does all this work for you makes sense, so that the
programmer can focus on the algorithm without having to think so much about
the parallelism.

The other alternative is to build more smarts into the compiler, so that it
can generate parallelism implicitly. Ada compilers are already able to
parallelize some things automatically, and I believe some of them (all of
them?) do. Someone posted an example a year or so ago of a loop in Ada that
GNAT could optimize to use the cores. It may be that the parallelism was
implemented as vectorization in the GCC backend, and that the compiler was
taking advantage of hardware parallelism instructions rather than a
software, thread-based approach.

But there are limits to what the compiler can safely parallelize. It needs
to ensure that it does not introduce data races, for instance, which could
cause an otherwise correct sequential program to fail disastrously. In our
HILT paper from last October, we presented the notion of a Global aspect
that can be applied to subprograms and identifies their dependencies on
global data. It is an extension of the Global aspect associated with SPARK.
We also proposed a similar aspect that identifies subprograms that are
potentially blocking. If the compiler can statically tell which subprograms
have unsafe dependencies on global data, or can tell that no such
dependencies exist, then it should be able to implicitly parallelize a loop
that includes calls to them. Without that information statically available,
the compiler cannot safely inject parallelism; it has to play it safe and
generate sequential code.
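As a rough illustration of the kind of annotation I mean, here is a sketch
that borrows the Global aspect syntax of SPARK 2014. The package and the
subprograms are invented for the example, and whether a given compiler
actually exploits the annotations for parallelization is precisely the open
question:

package Imaging is

   type Count_Array is array (0 .. 255) of Natural;

   Histogram : Count_Array := (others => 0);

   --  Global => null asserts that Filter neither reads nor writes any
   --  global state, so calls in distinct loop iterations cannot
   --  interfere through globals.
   function Filter (Pixel : Natural) return Natural
     with Global => null;

   --  This declaration admits a read-write dependence on Histogram;
   --  iterations calling Tally may conflict, so a compiler must assume
   --  the worst and keep such a loop sequential.
   procedure Tally (Pixel : Natural)
     with Global => (In_Out => Histogram);

end Imaging;

Given the Global => null on Filter, a loop such as

   for I in Image'Range loop
      Image (I) := Filter (Image (I));
   end loop;

(where Image stands for any array of pixels) writes a distinct element in
each iteration and touches no shared globals, so a compiler armed with that
information could, in principle, generate parallel code for it.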
If we get into the realm of having the compiler generate the parallelism
implicitly, then the underlying worker does not necessarily need to be a
task. It could be, but if the compiler can make use of some lighter-weight
mechanism, it is free to do so, as long as the semantic effect of the
parallelism is the same.

>> So the availability of a parallelism library shouldn't really affect the
>> way one structures their program into a collection of tasks.
>> I find such a library is useful when one wants to improve the execution
>> time of one or some of the tasks in the application where performance is
>> not adequate. Tasks and parallelism libraries can complement each other
>> to achieve the best of both worlds.
>
> I am sorry to disagree: the very existence of a parallelism library shows
> the inability of the current Ada technology to deal directly with
> parallelism inside the Ada language. I really think this is due to the
> weakness of current compilers, but if there are also problems inside the
> language, they should be addressed (like the Ravenscar restrictions that
> allowed predictable tasking, or special constructs to express parallelism,
> or "aspects" to indicate that a task should be run on a GPU...). These
> should be only a few features of Ada 202X.

I don't think we are disagreeing here. I was only mentioning that for
non-parallel concurrency, or for coarser-grained parallelism, tasks can
still be used in much the same way they are used for concurrency on a
single core.

We are hoping that some new aspects and syntax can be considered for Ada
202x, but if that doesn't get the support needed for standardization, then
one can either ask one's compiler vendor for implementation-defined,
non-portable support, or resort to libraries such as Paraffin, which
actually are portable. I have used Paraffin with both GNAT and the ICC Ada
compiler.

>> My understanding is that GNAT generally maps tasks to OS threads on a
>> one-to-one basis, but as others have pointed out, there may be
>> configurations where other mappings are also available.
>
> I could understand that a library-level task (i.e. a task declared
> immediately in a package that is at library level) be mapped to an OS
> thread, but a simple local task should definitely not.

Why not?

> And even that is a simplification since, as you pointed out, there is
> often no use in creating more kernel threads than the number of available
> CPUs.

It depends. For example, if those threads block, then while they are
blocked the cores are not being used, so for such cases it is actually
beneficial to have more threads than there are CPUs.

>> My understanding also is that at one time, GNAT had an implementation
>> built on top of FSU threads developed at Florida State University by
>> Ted Baker. This implementation ran all tasks under one OS thread.
>> [...] The FSU thread implementation gives you concurrency by allowing
>> tasks to execute independently from each other, using some preemptive
>> scheduling model to shift the processor between the multiple tasks of an
>> application.
>
> The solution of all tasks under one kernel thread is good for
> monoprocessors, and since user-level threads are lightweight compared to
> kernel threads, it was acceptable to map a task to a thread.
> But with multiple cores, we need all tasks running on a pool of kernel
> threads, one thread per core. And I suppose that when multicores came, it
> was considered easier to drop the FSU implementation and simply map one
> task to a kernel thread. But doing this is an oversimplification that
> gives poor performance for pure parallel computing, and it gave rise to
> the need for parallelism libraries! (Not to mention the problem of GPUs,
> which are commonly used for highly demanding computations and are not
> supported by GNAT...)
>
> What we need now is a new implementation of tasking in GNAT, able to
> treat local tasks as jobs.

I'm not convinced here. In Paraffin, I have several versions of the
libraries: some use task pools where worker tasks are started beforehand,
and some create local tasks on the fly. I was quite surprised at the good
performance of creating local tasks on the fly on the platforms I've tried
(Windows, Linux, and Android).

Brad

> Regards,
>
> Vincent