comp.lang.ada
From: rieachus@comcast.net
Subject: Re: RFC: Prototype for a user threading library in Ada
Date: Sat, 9 Jul 2016 17:45:36 -0700 (PDT)
Message-ID: <3cdf2098-adc8-48e7-959a-2a51743c50ed@googlegroups.com>
In-Reply-To: <58b78af5-28d8-4029-8804-598b2b63013c@googlegroups.com>

On Friday, June 17, 2016 at 5:44:18 AM UTC-4, Hadrien Grasland wrote:
> So, a while ago, after playing with the nice user-mode threading libraries that are available these days in C++, like Intel TBB and HPX, I thought it would be nice if Ada had something similar.

I've been following this thread for a while, and I keep thinking that there is a lot of barking up the wrong tree.  But then again, I am used to thinking about the size of problem that requires a supercomputer.  Ada is a decent fit, but the real issues are language independent, and as you may have grown tired of seeing me say, the optimal program, in any language, will finish within a few seconds of the estimate you get from looking only at the cost of moving the data.  Even if you have thousands of CPU hours of number crunching to do, your program looks like this:

1. Distribute the program code to the correct number of nodes.
2. Start the program.  The nodes collect their data from the storage system and interact (hopefully only with adjacent nodes) to complete the computation.
3. Collect and assemble the finished data for presentation.
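
To make the shape concrete, here is a toy, single-machine sketch in plain Ada, with tasks standing in for nodes.  The node count, the "work", and the result handling are all invented for illustration; on a real machine phase 1 is the job of the launch tools, not of the program itself.

with Ada.Text_IO; use Ada.Text_IO;

procedure Skeleton is

   Node_Count : constant := 8;   --  hypothetical node count

   Results : array (1 .. Node_Count) of Long_Float := (others => 0.0);

   task type Node is
      entry Start (Id : Positive);
   end Node;

   task body Node is
      My_Id : Positive;
      Local : Long_Float := 0.0;
   begin
      accept Start (Id : Positive) do
         My_Id := Id;
      end Start;
      --  Phase 2: each node fetches its slice of the input and crunches
      --  numbers, ideally talking only to adjacent nodes.
      for Step in 1 .. 1_000 loop
         Local := Local + Long_Float (My_Id) * Long_Float (Step);
      end loop;
      Results (My_Id) := Local;   --  stand-in for a per-node result file
   end Node;

begin
   declare
      Nodes : array (1 .. Node_Count) of Node;
   begin
      --  Phase 1: "distribute" the program -- here, just hand each task
      --  its identity and let it run.
      for I in Nodes'Range loop
         Nodes (I).Start (I);
      end loop;
      --  Leaving this block waits for every node to finish.
   end;

   --  Phase 3: collect and assemble the finished data for presentation.
   for I in Results'Range loop
      Put_Line ("Node" & Integer'Image (I)
                & " =>" & Long_Float'Image (Results (I)));
   end loop;
end Skeleton;

The point is only the structure: phase 1 hands out the code and identities, phase 2 is where the CPU hours go, and phase 3 is where the data-movement bill comes due.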

The hard part of supercomputer programming today is not writing the code that does the work; it is distributing the program and data, and then assembling the results.

How to do this?  Let's forget about Ada tasking for a moment.  It may be great for tying together the (relatively small) number of CPUs at a node, which share memory and in some cases even cache.  What you need to start with, assuming you wrote your program as a thousand or more distributed annex programs, is either to use tools that are part of the supercomputer to distribute thousands of copies of the identical code to different nodes, or to structure your program as a hierarchical tree.  Otherwise you will spend all your time just getting the code distributed from a single node.
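
The usual way to get that tree structure is a recursive-doubling (binomial) fan-out: in each round, every node that already has the code forwards it to one node that does not, so N nodes are covered in about log2(N) rounds instead of N - 1 sends from node 0.  A small sketch that only prints the schedule (the node numbering is invented):

with Ada.Text_IO; use Ada.Text_IO;

procedure Tree_Fanout is
   Node_Count : constant := 16;   --  hypothetical node count
   Stride     : Positive := 1;
   Round      : Natural  := 0;
begin
   --  Recursive doubling: after round R, nodes 0 .. 2**R - 1 have the
   --  code, and each of them forwards it to one new node in round R + 1.
   while Stride < Node_Count loop
      Round := Round + 1;
      for Sender in 0 .. Stride - 1 loop
         if Sender + Stride < Node_Count then
            Put_Line ("Round" & Natural'Image (Round)
                      & ": node" & Natural'Image (Sender)
                      & " -> node" & Natural'Image (Sender + Stride));
         end if;
      end loop;
      Stride := Stride * 2;
   end loop;
end Tree_Fanout;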

The other end of the program presents the same issues.  Imagine you want to simulate the evolution of the universe over a space that grows to a cube a million parsecs on a side, and over a period of a billion years.  After each time step you want to collect the data for that epoch for later viewing.  Again, if all the nodes are hammering on a single file, your program won't run very fast.

One solution is to store the result data on a per-node basis, and then, when the program has finished, run a second program that converts those slices through time into 3D images at a given point in time.  That conversion program may run several times as long as your big number-crunching program, but you can do it on your workstation over the weekend. ;-)
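
A rough sketch of that conversion pass, with the file names and record layout invented: each node leaves behind its own time series, and the tool regroups the records into one file per epoch.  (A real tool would use binary records and seeks rather than rescanning text files.)

with Ada.Text_IO;       use Ada.Text_IO;
with Ada.Strings;       use Ada.Strings;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;

procedure Reassemble is
   Node_Count  : constant := 4;    --  hypothetical, small for the sketch
   Epoch_Count : constant := 10;   --  hypothetical

   function Img (N : Integer) return String is
     (Trim (Integer'Image (N), Left));

   Node_File, Epoch_File : File_Type;
begin
   --  One output file per epoch; for each, sweep the per-node files and
   --  copy out the lines tagged with that epoch.
   for Epoch in 1 .. Epoch_Count loop
      Create (Epoch_File, Out_File, "epoch_" & Img (Epoch) & ".dat");
      for Node in 1 .. Node_Count loop
         Open (Node_File, In_File, "node_" & Img (Node) & ".dat");
         while not End_Of_File (Node_File) loop
            declare
               Line : constant String := Get_Line (Node_File);
            begin
               --  Assumed record layout: "<epoch> <x> <y> <z> <value>".
               if Integer'Value
                    (Line (Line'First .. Index (Line, " ") - 1)) = Epoch
               then
                  Put_Line (Epoch_File, Line);
               end if;
            end;
         end loop;
         Close (Node_File);
      end loop;
      Close (Epoch_File);
   end loop;
end Reassemble;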

Another solution, tricky but workable, is to skew time in the main program.  This can result in duplicate number crunching at the boundaries, but as I keep saying, that is not the problem.  Now you feed the data into files as you go, but you have several hundred files, corresponding to different points in time, all open and collecting data.
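
In code, the skewed-time version just keeps one file per epoch open and appends each record to the file for its epoch the moment it is produced.  A toy sketch, with the names, window size, and record format invented:

with Ada.Text_IO;       use Ada.Text_IO;
with Ada.Strings;       use Ada.Strings;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;

procedure Skewed_Output is
   Open_Epochs : constant := 200;   --  hypothetical window of open files
   Files : array (1 .. Open_Epochs) of File_Type;

   function Img (N : Integer) return String is
     (Trim (Integer'Image (N), Left));
begin
   for E in Files'Range loop
      Create (Files (E), Out_File, "epoch_" & Img (E) & ".dat");
   end loop;

   --  In the real program this would sit inside the number-crunching
   --  loop: whenever a node finishes a cell at some (skewed) epoch, the
   --  record goes straight to that epoch's file.
   Put_Line (Files (42), "x y z density ...");   --  made-up record

   for E in Files'Range loop
      Close (Files (E));
   end loop;
end Skewed_Output;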

Why post all this here?  The Ada distributed annex is actually a good fit for the tools available on most supercomputers.  Combining the distributed annex with Ada tasking across the cores provided by the modern CPUs at each node is also a good fit.  Unfortunately (or perhaps we just have to learn to live with it), supercomputers are being taken over by GPUs.  The CPUs are still there, but they end up relegated to moving data between nodes and doing whatever synchronization is needed.
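
For anyone who has not used Annex E, here is roughly what a node-facing interface could look like.  The package, procedure, and types are all invented; it compiles as ordinary Ada, and a partition communication subsystem such as GLADE or PolyORB does the actual splitting into partitions and the marshalling of the calls.

--  Remote call interface: calls to Deposit_Slice from other partitions
--  become remote procedure calls handled by the PCS.
package Node_Services is
   pragma Remote_Call_Interface;

   type Sample_Array is array (Positive range <>) of Long_Float;

   --  Called by each compute node to hand its results for one epoch to
   --  whichever partition owns the output files.
   procedure Deposit_Slice
     (Node  : Positive;
      Epoch : Positive;
      Data  : Sample_Array);
end Node_Services;

package body Node_Services is
   procedure Deposit_Slice
     (Node  : Positive;
      Epoch : Positive;
      Data  : Sample_Array) is
   begin
      --  On the partition that hosts this body, append Data to the file
      --  for Epoch (left as a stub in this sketch).
      null;
   end Deposit_Slice;
end Node_Services;

Each compute node can then use ordinary Ada tasks internally and call Deposit_Slice whenever a slice is done.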

Technically the tools are there to write code in high-level languages and run it on GPUs.  But right now you end up with lots of structural artifacts in your program that make it non-portable.  (The biggest of these is the number of shader processors per GPU.  AMD has done a nice thing in breaking the shaders into groups of 64 on all their cards, which helps a lot when working on a single machine, but...  Besides, right now most supercomputers use nVidia cards.  That may change if AMD catches up with nVidia on GFLOPS/watt.)
