From: Robert Eachus
Newsgroups: comp.lang.ada
Subject: Re: Generators/coroutines in future Ada?
Date: Sun, 16 Jul 2017 06:11:44 -0700 (PDT)

On Wednesday, July 12, 2017 at 6:47:19 PM UTC-4, Shark8 wrote:
> On Tuesday, July 11, 2017 at 11:35:19 PM UTC-6, Randy Brukardt wrote:
> >
> > The coming problem with "classic" programming languages like Ada is
> > that they don't map well to architectures with a lot of parallelism.
> > The more constructs that require sequential execution, the worse Ada
> > programs will perform.
>
> That's also why I'm a bit dubious about the proposed PARALLEL / AND / END
> blocks: it seems to me that the mistake here is to delve too far into the
> minutiae (the above "parallel assembly" idea), so as to make it difficult
> or impossible to automatically optimize parallel code because of the more
> low-level view imposed by the language... much like C's notion of arrays
> is fundamentally broken and undermines the benefits of C++.
>
> Now, I admit I could very well be wrong here, but my gut feeling is that
> this is not the route we want to go down.

In the process of going down this rabbit hole from the back entrance...
consider a very lightweight task construct like:

   for parallel I in X'Range loop ... end loop;

Obviously anything declared within the loop needs no protection (unless
there are nested tasks). Similarly, reads of external variables need no
protection as long as there are no updates during the lifetime of the
loop. Can these tasks call (heavyweight) tasks? Sure, as long as the
reverse is not permitted. Note that you don't want to do this unless it is
a single call that delivers the correct answer. (This needs some thinking
about. Maybe: "for I in X'Range loop until Solved ...", with the semantics
that no new threads will be spawned once Solved is set to True.)
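To make that concrete, here is a rough sketch of how such a loop might be
expanded onto today's Ada tasking, one chunk of the index range per worker.
The "for parallel" syntax above is only a proposal; the expansion below is
plain Ada 2012, and the array type, F, and Num_Workers are made up for
illustration:

   procedure Parallel_Map is

      Num_Workers : constant := 4;

      type Real_Array is array (Positive range <>) of Long_Float;

      X : Real_Array (1 .. 1_000) := (others => 2.0);
      Y : Real_Array (X'Range);

      function F (V : Long_Float) return Long_Float is (V * V);

      Chunk : constant Positive :=
        (X'Length + Num_Workers - 1) / Num_Workers;

      task type Worker is
         entry Start (First, Last : Positive);
      end Worker;

      task body Worker is
         Lo, Hi : Positive;
      begin
         accept Start (First, Last : Positive) do
            Lo := First;
            Hi := Last;
         end Start;
         --  Each worker writes a disjoint slice of Y, so no locking is
         --  needed; reads of X are safe because X is not updated while
         --  the workers run.
         for I in Lo .. Hi loop
            Y (I) := F (X (I));
         end loop;
      end Worker;

      Workers : array (1 .. Num_Workers) of Worker;

   begin
      for W in Workers'Range loop
         Workers (W).Start
           (First => X'First + (W - 1) * Chunk,
            Last  => Positive'Min (X'First + W * Chunk - 1, X'Last));
      end loop;
      --  The procedure waits here until all worker tasks terminate.
   end Parallel_Map;

The point of the proposed construct is that the compiler would generate
something like this (or something far cheaper than full tasks) for you.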
What about protected objects? Fine, as long as you realize that calls to
functions of a protected object are fine, but calls to procedures and
entries of a protected object from within a parallel loop are likely to
bring you down to single-threaded speed. Atomic objects can be updated,
but it seems best to have intrinsic operations which map to
read-modify-write (RMW) cycles that have hardware support. For example, a
Max intrinsic: procedure Max (X, Y) assigns Y to X if Y is greater than X,
and otherwise does nothing. Or a function Add (X, Y) that returns X + Y.
Or even an Add_One (X) that returns True if X becomes zero, if that is all
you have.

Again, the direction I am coming at this from is how to create very
lightweight threads that can use all of those CPU cores or GPU shaders out
there. When computing an array, you depend on cache coherency and write
pipes to ensure that writes to a result array don't produce garbage. If
this means that threads must be merged (by hand or by the compiler) so
that a complete cache line is written by one lightweight thread, so be it.
For example, in matrix multiplication you might have to compute 4 or 8
entries of the result array in one thread. Not a problem if it allows me
to compute the result using all the CPU cores or shaders available without
spending 90% of the cycles on synchronization.

Would doing all this "right" require a lot of compiler support? Sure. But
if that is what it takes, there will be (expensive) compilers for large
machines, and compilers which simply recognize the syntax and translate it
into standard Ada tasking.
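For what it's worth, the Max and Add operations mentioned above can be
written today with a protected object. The sketch below (the names
RMW_Sketch and Atomic_Counter are made up, not proposed features) pins
down the intended semantics, but it also shows the weakness: every caller
serializes on the protected object's lock, which is exactly the cost a
hardware-mapped RMW intrinsic would avoid.

   with Ada.Text_IO;

   procedure RMW_Sketch is

      --  Max and Add under mutual exclusion: correct, but all callers
      --  funnel through one lock.
      protected type Atomic_Counter is
         procedure Max (Y : Integer);
         --  X := Integer'Max (X, Y)
         procedure Add (Y : Integer; Result : out Integer);
         --  X := X + Y; Result := X
         function Value return Integer;
      private
         X : Integer := 0;
      end Atomic_Counter;

      protected body Atomic_Counter is

         procedure Max (Y : Integer) is
         begin
            if Y > X then
               X := Y;
            end if;
         end Max;

         procedure Add (Y : Integer; Result : out Integer) is
         begin
            X := X + Y;
            Result := X;
         end Add;

         function Value return Integer is
         begin
            return X;
         end Value;

      end Atomic_Counter;

      Counter : Atomic_Counter;
      Total   : Integer;

   begin
      Counter.Max (7);         --  Counter becomes 7
      Counter.Add (5, Total);  --  Counter becomes 12, Total = 12
      Ada.Text_IO.Put_Line ("Counter =" & Integer'Image (Counter.Value));
   end RMW_Sketch;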