Newsgroups: comp.lang.ada
Subject: Re: RFC: Prototype for a user threading library in Ada
From: rieachus@comcast.net
Date: Sat, 9 Jul 2016 17:45:36 -0700 (PDT)
Message-ID: <3cdf2098-adc8-48e7-959a-2a51743c50ed@googlegroups.com>
References: <58b78af5-28d8-4029-8804-598b2b63013c@googlegroups.com>

On Friday, June 17, 2016 at 5:44:18 AM UTC-4, Hadrien Grasland wrote:
> So, a while ago, after playing with the nice user-mode threading libraries that are available these days in C++, like Intel TBB and HPX, I thought it would be nice if Ada had something similar.

I've been following this thread for a while, and I keep thinking that there is a lot of barking up the wrong tree. But then again, I am used to thinking about the size of problems that take supercomputers. Ada is a decent fit, but the real issues are language independent--and as you may have grown tired of seeing me say, the optimal program--in any language--will finish within a few seconds of the estimate you get from looking only at moving the data. Even if you have thousands of CPU hours of number crunching to do, your program looks like this:

1. Distribute the program code to the correct number of nodes.
2. Start the program. The nodes collect the data from the data storage system, then interact (with hopefully only adjacent nodes) to complete the computation.
3. Collect and assemble the finished data for presentation.

The hard part of supercomputer programming today is not writing the code that does the work; it is distributing the program and data, and then assembling the results.

How to do this? Let's forget about Ada tasking for a moment. It may be great for tying together the (relatively small) number of CPUs at a node which share memory, and in some cases even cache memory.
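To pin down what I mean by that node-local half, here is a minimal sketch (the names and the toy workload are mine, purely illustrative): two plain Ada tasks splitting an array between them on one shared-memory node. The point is only that the language gives you this part essentially for free.

with Ada.Text_IO;

procedure Node_Local is
   --  Shared data for this node.  The workers operate on disjoint
   --  slices, so no further synchronization is needed.
   Data : array (1 .. 100_000) of Long_Float := (others => 1.0);

   task type Worker (First, Last : Natural);

   task body Worker is
   begin
      for I in First .. Last loop
         Data (I) := Data (I) * 2.0;  --  stand-in for real number crunching
      end loop;
   end Worker;
begin
   declare
      W1 : Worker (First => 1, Last => 50_000);
      W2 : Worker (First => 50_001, Last => 100_000);
   begin
      null;  --  leaving this block waits for both workers to finish
   end;
   Ada.Text_IO.Put_Line ("Node-local work done.");
end Node_Local;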
What you need to start with, assuming you wrote your program as a thousand or more distributed annex programs, is either to use tools that are part of the supercomputer to distribute thousands of copies of the identical code to different nodes, or to create your program with a hierarchical tree structure. Otherwise you will spend all your time just getting the code distributed from a single node.

The other end of the program presents the same issues. Imagine you want to simulate the evolution of the universe over a space that grows to a cube a million parsecs on a side, and over a period of a billion years. After each time step you want to collect the data for that epoch for later viewing. Again, if all the nodes are hammering on a single file, your program won't run very fast.

One solution is to store result data on a per-node basis, then, when the program is finished, run a program that converts from slices through time to 3D images at a given point in time. This conversion program may run several times as long as your big number crunching program, but you can do it on your workstation over the weekend. ;-)

Another solution, tricky but workable, is to skew time in the main program. This can result in duplicate number crunching at the boundaries, but as I keep saying, that is not the problem. You still feed the data into files, but now you have several hundred files corresponding to different points in time, all open and collecting data.

Why post all this here? The Ada distributed annex is actually a good fit for the tools available on most supercomputers, and combining it with Ada tasking on the multicore nodes provided by modern CPUs works well (a rough sketch of what that could look like follows at the end of this post). Unfortunately--or we just have to learn to live with it--supercomputers are being taken over by GPUs. The CPUs are still there, but they end up relegated to moving data between nodes and doing whatever synchronization is needed.

Technically the tools are there to write code in high-level languages and run it on GPUs. But right now you end up with lots of structural artifacts in your program that make it non-portable. (The biggest of these is the number of shader processors per GPU. AMD has done a nice thing in breaking the shaders into groups of 64 on all their cards, which helps a lot when working on a single machine, but... Besides, right now most supercomputers use nVidia cards. That may change if AMD gets their GFLOPS/watt down to where nVidia's is.)
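Coming back to the distributed annex point above, here is the promised sketch of what one node's interface could look like under Annex E. Everything here is hypothetical--the package name, the subprogram, the payload type--and you would still need a partition communication subsystem such as GLADE or PolyORB to build one partition per node.

package Epoch_Collector is
   pragma Remote_Call_Interface;

   --  One identifier per node in the (hypothetical) machine.
   type Node_Id is range 1 .. 1_024;

   --  Each node pushes its share of an epoch's results here; the
   --  receiving partition can then write per-node, per-epoch files
   --  for the post-processing pass described above.  String is a
   --  stand-in for a real serialized payload.
   procedure Store_Slice
     (From  : Node_Id;
      Epoch : Natural;
      Data  : String);
end Epoch_Collector;

A call to Epoch_Collector.Store_Slice from another partition becomes a remote procedure call, which is exactly the kind of plumbing the supercomputer's own distribution tools otherwise make you write by hand.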