From: Brad Moore
Newsgroups: comp.lang.ada
Subject: Re: GNAT and Tasklets
Date: Mon, 15 Dec 2014 22:09:39 -0700

On 2014-12-14 2:29 PM, vincent.diemunsch@gmail.com wrote:
> On Sunday, December 14, 2014 at 01:18:42 UTC+1, Hubert wrote:
>
>> The result of my research was that, depending on the OS the Ada
>> program is running on, you could get several hundred OS threads, or
>> maybe 1-2K on Linux, but there is an upper limit, because every OS
>> thread that runs a task has a stack associated with it, so mostly
>> the available memory is the limit, I think.
>> [...]
>> My solution was to implement my own pre-emptive Job system on top
>> of the OS threads. I allocate as many threads (or tasks in Ada) as
>> there are processor cores and then assign a number of Jobs to each.
>> [...]
>> Depending on what your requirements are (a great number of parallel
>> Jobs), this may very well be your only reliable solution.
>
> Yes, I think you are completely right. Is your library private, or
> do you plan to release it as Open Source?

As another alternative, you could look at the Paraffin libraries,
which can be found at

   https://sourceforge.net/projects/paraffin/

These libraries are a set of open-source generics that provide several
different strategies for parallel loops, parallel recursion, and
parallel blocks. You can choose between parallelism strategies such as
static load balancing (work sharing), dynamic load balancing using a
work-stealing approach for loops, or what I call work seeking, which
is another variation of load balancing. You can also choose between
using task pools or creating worker tasks dynamically on the fly.

Generally, I found results similar to those reported by Hubert: the
optimal number of workers is typically the number of available cores
in the system. Adding more workers than that typically does not
improve performance, and eventually degrades it, since each worker
introduces some overhead. If the cores are already fully loaded with
work, adding more workers only adds overhead without adding any
performance benefit.
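To give a feel for the simplest of these strategies, here is a rough
sketch of static load balancing (work sharing), where the iteration
range is split evenly across a fixed set of worker tasks. This is only
an illustration of the idea, not Paraffin's actual API; the names
(Work_Sharing_Sketch, Process, Worker) are made up for the example.

   procedure Work_Sharing_Sketch is

      Num_Workers : constant := 4;   --  Ideally the number of cores
      First       : constant Integer := 1;
      Last        : constant Integer := 1000;

      --  Stand-in for the real per-iteration work
      procedure Process (I : Integer) is null;

      task type Worker is
         entry Start (From, To : Integer);
      end Worker;

      task body Worker is
         Lo, Hi : Integer;
      begin
         accept Start (From, To : Integer) do
            Lo := From;
            Hi := To;
         end Start;
         for I in Lo .. Hi loop   --  Each worker gets its own chunk
            Process (I);
         end loop;
      end Worker;

      Workers : array (1 .. Num_Workers) of Worker;

      Chunk : constant Integer :=
        (Last - First + Num_Workers) / Num_Workers;
      Next  : Integer := First;

   begin
      --  Hand each worker a contiguous slice of the full range
      for W in Workers'Range loop
         Workers (W).Start (Next, Integer'Min (Next + Chunk - 1, Last));
         Next := Next + Chunk;
      end loop;
   end Work_Sharing_Sketch;   --  Waits until all workers terminate

The dynamic (work-stealing or work-seeking) strategies differ in that
idle workers can take iterations from busy ones, which helps when the
per-iteration cost is uneven.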
> This shows clearly that the compiler wasn't able to produce an
> adequate solution, even though the case of a lot of little local
> tasks is quite simple and has become a standard way of using
> multicore computers (see, for instance, Grand Central Dispatch on
> Mac OS X).

The compiler is already allowed to use implicit parallelism when it
sees fit, if it can achieve the same semantic effects that would
result from sequential execution. RM 9.11 says:

   "Concurrent task execution may be implemented on multicomputers,
   multiprocessors, or with interleaved execution on a single physical
   processor. On the other hand, whenever an implementation can
   determine that the required semantic effects can be achieved when
   parts of the execution of a given task are performed by different
   physical processors acting in parallel, it may choose to perform
   them in this way."

However, there are limits to what the compiler can do implicitly. For
instance, it cannot determine whether the following loop can execute
in parallel:

   Sum : Integer := 0;
   ...
   for I in 1 .. 1000 loop
      Sum := Sum + Foo (I);
   end loop;

For one thing, there is a data race on the variable Sum. If the loop
were to be broken up into multiple tasklets executing in parallel, the
compiler would need to structure the implementation of the loop very
differently than written, and the semantics of execution would not be
the same as in the sequential case, particularly if an exception is
raised inside the loop. Secondly, if Foo is a third-party library
call, the compiler cannot know whether the Foo function itself
modifies global variables, which would make it unsafe to parallelize.

> I really hope that Ada 202X will limit new features to a few real
> improvements, and try hard to improve compilers.

In order for the compiler to generate implicit parallelism for code
such as the example above, it needs to be given additional semantic
information so that it can guarantee the parallel transformation can
be done safely. We are looking at ways of providing such information
to the compiler via new aspects that can be checked statically by the
compiler. Whether such proposals will actually become part of Ada 202x
is another question. It depends on the demand for such features, and
on how well they can be worked out without adding too much complexity
to the language, or implementation burden to the compiler vendors. I
think the general goal at this point will be to limit Ada 202x in
terms of new features, but that is the future, and the future is
unknown.

Brad

> Kind regards,
>
> Vincent
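P.S. For anyone curious what a race-free manual restructuring of the
Sum loop might look like (roughly the kind of transformation a
parallelism library, or a hypothetical future compiler, would have to
perform), here is a sketch. It is illustrative only, the names are
invented for the example, and it assumes Foo has no side effects.
Note that it also changes the exception behavior relative to the
sequential loop: an unhandled exception in one worker does not stop
the loop at the point the sequential version would.

   with Ada.Text_IO;

   procedure Parallel_Sum_Sketch is

      function Foo (I : Integer) return Integer is
      begin
         return I;   --  Stand-in for the real computation
      end Foo;

      --  Serializes the combining step, so partial sums don't race
      protected Accumulator is
         procedure Add (Partial : Integer);
         function Total return Integer;
      private
         Sum : Integer := 0;
      end Accumulator;

      protected body Accumulator is
         procedure Add (Partial : Integer) is
         begin
            Sum := Sum + Partial;
         end Add;
         function Total return Integer is
         begin
            return Sum;
         end Total;
      end Accumulator;

      task type Worker (From, To : Integer);

      task body Worker is
         Partial : Integer := 0;
      begin
         for I in From .. To loop
            Partial := Partial + Foo (I);   --  Local, so no data race
         end loop;
         Accumulator.Add (Partial);  --  One synchronized update each
      end Worker;

   begin
      declare
         W1 : Worker (1, 250);
         W2 : Worker (251, 500);
         W3 : Worker (501, 750);
         W4 : Worker (751, 1000);
      begin
         null;   --  Block exit waits for all four workers to finish
      end;
      --  Prints 500500, the same value the sequential loop computes
      Ada.Text_IO.Put_Line (Integer'Image (Accumulator.Total));
   end Parallel_Sum_Sketch;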