Newsgroups: comp.lang.ada
Subject: Re: RFC: Prototype for a user threading library in Ada
From: rieachus@comcast.net
Date: Sat, 9 Jul 2016 17:45:36 -0700 (PDT)
Message-ID: <3cdf2098-adc8-48e7-959a-2a51743c50ed@googlegroups.com>
References: <58b78af5-28d8-4029-8804-598b2b63013c@googlegroups.com>

On Friday, June 17, 2016 at 5:44:18 AM UTC-4, Hadrien Grasland wrote:
> So, a while ago, after playing with the nice user-mode threading libraries that are available these days in C++, like Intel TBB and HPX, I thought it would be nice if Ada had something similar.

I've been following this thread for a while, and I keep thinking that there is a lot of barking up the wrong tree. But then again, I am used to thinking about the size of problems that take supercomputers. Ada is a decent fit, but the real issues are language independent--and as you may have grown tired of seeing me say, the optimal program--in any language--will finish within a few seconds of the estimate you get from looking only at moving the data. Even if you have thousands of CPU hours of number crunching to do, your program looks like this:

1. Distribute the program code to the correct number of nodes.
2. Start the program. The nodes collect the data from the data storage system, then interact (with hopefully only adjacent nodes) to complete the computation.
3. Collect and assemble the finished data for presentation.

The hard part of supercomputer programming today is not writing the code that does the work; it is distributing the program and data, and then assembling the results.

How to do this? Let's forget about Ada tasking for a moment. It may be great for tying together the (relatively small) number of CPUs at a node which share memory, and in some cases even cache memory.
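To pin down what I mean by that node-local half, here is a minimal sketch (the names and the toy workload are mine, purely illustrative): two plain Ada tasks splitting an array between them on one shared-memory node. The point is only that the language gives you this part essentially for free.

with Ada.Text_IO;

procedure Node_Local is
   --  Shared data for this node.  The workers operate on disjoint
   --  slices, so no further synchronization is needed.
   Data : array (1 .. 100_000) of Long_Float := (others => 1.0);

   task type Worker (First, Last : Natural);

   task body Worker is
   begin
      for I in First .. Last loop
         Data (I) := Data (I) * 2.0;  --  stand-in for real number crunching
      end loop;
   end Worker;
begin
   declare
      W1 : Worker (First => 1, Last => 50_000);
      W2 : Worker (First => 50_001, Last => 100_000);
   begin
      null;  --  leaving this block waits for both workers to finish
   end;
   Ada.Text_IO.Put_Line ("Node-local work done.");
end Node_Local;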
What you need to start with, assuming you wrote your program as a thousand or more distributed annex programs, is either to use tools that are part of the supercomputer to distribute thousands of copies of the identical code to different nodes, or to create your program with a hierarchical tree structure. Otherwise you will spend all your time just getting the code distributed from a single node.

The other end of the program presents the same issues. Imagine you want to simulate the evolution of the universe over a space that grows to a cube a million parsecs on a side, and over a period of a billion years. After each time step you want to collect the data for that epoch for later viewing. Again, if all the nodes are hammering on a single file, your program won't run very fast.

One solution is to store result data on a per-node basis, then, when the program is finished, run a program that converts from slices through time to 3D images at a given point in time. This conversion program may run several times as long as your big number crunching program, but you can do it on your workstation over the weekend. ;-)

Another solution, tricky but workable, is to skew time in the main program. This can result in duplicate number crunching at the boundaries, but as I keep saying, that is not the problem. You still feed the data into files, but now you have several hundred files corresponding to different points in time, all open and collecting data.

Why post all this here? The Ada distributed annex is actually a good fit for the tools available on most supercomputers, and combining it with Ada tasking on the multicore nodes provided by modern CPUs works well (a rough sketch of what that could look like follows at the end of this post). Unfortunately--or we just have to learn to live with it--supercomputers are being taken over by GPUs. The CPUs are still there, but they end up relegated to moving data between nodes and doing whatever synchronization is needed.

Technically the tools are there to write code in high-level languages and run it on GPUs. But right now you end up with lots of structural artifacts in your program that make it non-portable. (The biggest of these is the number of shader processors per GPU. AMD has done a nice thing in breaking the shaders into groups of 64 on all their cards, which helps a lot when working on a single machine, but... Besides, right now most supercomputers use nVidia cards. That may change if AMD gets their GFLOPS/watt down to where nVidia's is.)
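Coming back to the distributed annex point above, here is the promised sketch of what one node's interface could look like under Annex E. Everything here is hypothetical--the package name, the subprogram, the payload type--and you would still need a partition communication subsystem such as GLADE or PolyORB to build one partition per node.

package Epoch_Collector is
   pragma Remote_Call_Interface;

   --  One identifier per node in the (hypothetical) machine.
   type Node_Id is range 1 .. 1_024;

   --  Each node pushes its share of an epoch's results here; the
   --  receiving partition can then write per-node, per-epoch files
   --  for the post-processing pass described above.  String is a
   --  stand-in for a real serialized payload.
   procedure Store_Slice
     (From  : Node_Id;
      Epoch : Natural;
      Data  : String);
end Epoch_Collector;

A call to Epoch_Collector.Store_Slice from another partition becomes a remote procedure call, which is exactly the kind of plumbing the supercomputer's own distribution tools otherwise make you write by hand.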