From: "Alaric B. Williams"
Subject: Any compilers for parallel platforms
Date: 1996/04/29
Message-ID: <96-04-149@comp.compilers>
Sender: johnl@iecc.com
References: <96-04-091@comp.compilers> <96-04-115@comp.compilers>
Organization: Compilers Central
Keywords: Ada, parallel
Newsgroups: comp.compilers,comp.lang.ada

rmeenaks@bfm.com (Ram Meenakshisundaram) wrote:
>Samir N. Muhammad (sam818@seas.gwu.edu) wrote:
>: I am interested in Ada compilers for parallel machines.

Parallel processing is an underexplored area in compilers, IMHO, largely because so few multi-CPU systems exist to play with these days.

For a start, there are a few kinds of multi-CPU system.

The lowest level is one CPU, multithreading. This is a common case, so I won't go into it much, beyond saying that it can be treated very similarly to the next case, except that the gains from forking a lot are usually lost.

The next level is more than one CPU sharing the same address space. This can be used to implement threads, with the threads shared out evenly between the CPUs, but most languages need the threading to be explicitly managed with fork(), PAR, and the like. Some languages instead offer parallel extensions to conventional loop constructs: rather than "for each number from 1 to 10, run the code 'process element X'", we write "process elements 1..10". C's loop system cannot reliably be parallelised, since the code in the loop may depend upon the previous pass's results, and the loop variable and ending condition can be modified at will. The sketch below shows the difference.
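To make that concrete, here is a rough sketch (my own, with made-up names like process() and worker()) of what a "process elements 0..N-1" construct could compile down to on CPUs sharing an address space, written in C with POSIX threads:

    #include <pthread.h>
    #include <stdio.h>

    #define N     10
    #define NCPUS 2

    static int element[N];

    /* The loop body: each pass touches only its own element, so
       no pass depends on an earlier pass's results. */
    static void process(int i)
    {
        element[i] = i * i;
    }

    struct range { int lo, hi; };   /* half-open slice [lo, hi) */

    /* One worker per CPU; each gets a slice of the iteration space. */
    static void *worker(void *arg)
    {
        struct range *r = arg;
        for (int i = r->lo; i < r->hi; i++)
            process(i);
        return NULL;
    }

    int main(void)
    {
        pthread_t    tid[NCPUS];
        struct range r[NCPUS];

        /* "Process elements 0..N-1": deal the range out evenly. */
        for (int t = 0; t < NCPUS; t++) {
            r[t].lo = t * N / NCPUS;
            r[t].hi = (t + 1) * N / NCPUS;
            pthread_create(&tid[t], NULL, worker, &r[t]);
        }
        for (int t = 0; t < NCPUS; t++)
            pthread_join(tid[t], NULL);

        for (int i = 0; i < N; i++)
            printf("%d ", element[i]);
        printf("\n");
        return 0;
    }

The same trick is not safe on something like

    for (i = 1; i < N; i++)
        element[i] += element[i - 1];

because each pass reads the previous pass's result - which is exactly why a compiler cannot parallelise a C loop without dependence analysis, or a promise from the programmer.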
The next level up is things like systolic arrays, where many, many CPUs with independent address spaces hook up through some kind of network. The kind of code that can take advantage of this is code written as a load of subproblems which communicate through some rigid mechanism, often message passing, and which can therefore be run over the network. This usually requires extreme modification of conventional imperative code! The above-mentioned languages with cool loops can sometimes take advantage of it with non-communicating subprocesses, but they must draw the dividing line between threading on the chip and using extra CPUs carefully, since setting up a CPU can be very time-consuming (boot it up, pass it code+data, run, get back the data).

The next level is systems where loads of individual computers, with differing CPUs and hardware, link up together. In general it is nearly impossible to split one task over such a system, as the time-consuming interfaces between platforms (code must be compiled for each CPU involved before execution, for example) make it too hard to manage. This only works well where whole tasks go to individual nodes, and each node controls its entire task.

OK, now for some observations. First of all, each level can be emulated by the next level down. Multiple CPUs on one address space can be emulated by one CPU threading preemptively. Multiple CPUs with restricted IPC can be emulated by a single CPU or a cluster. Multiple computers can be emulated by a CPU array. In the extreme, a network containing Crays, Z80s, PCs, etc. can all be emulated by one PC (forgetting the processor-type boundaries, for a start!).

In my opinion, the best system at the moment would be a hierarchical one. A new task (an application) is allocated to a computer, where it is compiled for the local CPU's instruction set. Tasks can communicate with each other through a rich IPC system. The task itself is divided into Sections, which execute in parallel and communicate by message passing. Sections are divided into Threads, which execute orthogonally (in parallel). Threads are divided into Streams, which use cooperative multitasking. Streams should stay on the same CPU, to take advantage of the trust implicit in cooperative tasking.

A single computer would, of course, always allocate tasks to itself, and let them do their IPC locally just as if they were on other systems. If there are no systolic arrays in the computer, Sections are all simply multitasked preemptively. Threads are allocated to the CPUs on an address space, and preemptively threaded if there are more Threads than CPUs - perhaps they all share one. Threads can share each other's memory implicitly. And Streams happily sit switching between themselves, sharing memory and processor time as they see fit.

As far as I know, no system supports all of this :-)

Oh, yes: loops that are allowed to execute in parallel are automatically turned into separate Sections or Threads if they seem large enough, depending on system-specific cost/time heuristics.

Phew.

ABW

--
Alaric B. Williams
Internet : alaric@abwillms.demon.co.uk
http://www.hardcafe.co.uk/Alaric/
--
Send compilers articles to compilers@iecc.com,
meta-mail to compilers-request@iecc.com.