From: "Alaric B. Williams"
Subject: Any compilers for parallel platforms
Date: 1996/04/29
Message-ID: <96-04-149@comp.compilers>
Sender: johnl@iecc.com
References: <96-04-091@comp.compilers> <96-04-115@comp.compilers>
Organization: Compilers Central
Keywords: Ada, parallel
Newsgroups: comp.compilers,comp.lang.ada

rmeenaks@bfm.com (Ram Meenakshisundaram) wrote:
>Samir N. Muhammad (sam818@seas.gwu.edu) wrote:
>: I am interested in Ada compilers for parallel machines.

Parallel processing is an underexplored area in compilers, IMHO, largely because so few multi-CPU systems exist to play with these days.

For a start, there are a few kinds of multi-CPU system.

The lowest level is one CPU, multithreading. This is a common case, so I won't go into it much, beyond saying that it can be treated very similarly to the next case, except that the gains from forking a lot are usually lost.

The next level is more than one CPU sharing the same address space. This can be used to implement threads, with the threads shared out evenly between the CPUs, but most languages need the threading to be explicitly managed with fork(), PAR, and the like. Some languages instead offer parallel extensions to conventional loop constructs: rather than "for each number from 1 to 10, run the code 'process element X'", we write "process elements 1..10". C's loop system cannot reliably be parallelised, since the code in the loop may depend upon the previous pass's results, and the loop variable and ending condition can be modified at will. The sketch below shows the difference.
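To make that concrete, here is a rough sketch (my own, with made-up names like process() and worker()) of what a "process elements 0..N-1" construct could compile down to on CPUs sharing an address space, written in C with POSIX threads:

    #include <pthread.h>
    #include <stdio.h>

    #define N     10
    #define NCPUS 2

    static int element[N];

    /* The loop body: each pass touches only its own element, so
       no pass depends on an earlier pass's results. */
    static void process(int i)
    {
        element[i] = i * i;
    }

    struct range { int lo, hi; };   /* half-open slice [lo, hi) */

    /* One worker per CPU; each gets a slice of the iteration space. */
    static void *worker(void *arg)
    {
        struct range *r = arg;
        for (int i = r->lo; i < r->hi; i++)
            process(i);
        return NULL;
    }

    int main(void)
    {
        pthread_t    tid[NCPUS];
        struct range r[NCPUS];

        /* "Process elements 0..N-1": deal the range out evenly. */
        for (int t = 0; t < NCPUS; t++) {
            r[t].lo = t * N / NCPUS;
            r[t].hi = (t + 1) * N / NCPUS;
            pthread_create(&tid[t], NULL, worker, &r[t]);
        }
        for (int t = 0; t < NCPUS; t++)
            pthread_join(tid[t], NULL);

        for (int i = 0; i < N; i++)
            printf("%d ", element[i]);
        printf("\n");
        return 0;
    }

The same trick is not safe on something like

    for (i = 1; i < N; i++)
        element[i] += element[i - 1];

because each pass reads the previous pass's result - which is exactly why a compiler cannot parallelise a C loop without dependence analysis, or a promise from the programmer.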
The next level up is things like systolic arrays, where many, many CPUs with independent address spaces hook up through some kind of network. The kind of code that can take advantage of this is code written as a load of subproblems which communicate through some rigid mechanism, often message passing, and which can therefore be run over the network. This usually requires extreme modification of conventional imperative code! The above-mentioned languages with cool loops can sometimes take advantage of it with non-communicating subprocesses, but they must draw the dividing line between threading on the chip and using extra CPUs carefully, since setting up a CPU can be very time-consuming (boot it up, pass it code+data, run, get back the data).

The next level is systems where loads of individual computers, with differing CPUs and hardware, link up together. In general it is nearly impossible to split one task over such a system, as the time-consuming interfaces between platforms (code must be compiled for each CPU involved before execution, for example) make it too hard to manage. This only works well where whole tasks go to individual nodes, and each node controls its entire task.

OK, now for some observations. First of all, each level can be emulated by the next level down. Multiple CPUs on one address space can be emulated by one CPU threading preemptively. Multiple CPUs with restricted IPC can be emulated by a single CPU or a cluster. Multiple computers can be emulated by a CPU array. In the extreme, a network containing Crays, Z80s, PCs, etc. can all be emulated by one PC (forgetting the processor-type boundaries, for a start!).

In my opinion, the best system at the moment would be a hierarchical one. A new task (an application) is allocated to a computer, where it is compiled for the local CPU's instruction set. Tasks can communicate with each other through a rich IPC system. The task itself is divided into Sections, which execute in parallel and communicate by message passing. Sections are divided into Threads, which execute orthogonally (in parallel). Threads are divided into Streams, which use cooperative multitasking. Streams should stay on the same CPU, to take advantage of the trust implicit in cooperative tasking.

A single computer would, of course, always allocate tasks to itself, and let them do their IPC locally just as if they were on other systems. If there are no systolic arrays in the computer, Sections are all simply multitasked preemptively. Threads are allocated to the CPUs on an address space, and preemptively threaded if there are more Threads than CPUs - perhaps they all share one. Threads can share each other's memory implicitly. And Streams happily sit switching between themselves, sharing memory and processor time as they see fit.

As far as I know, no system supports all of this :-)

Oh, yes: loops that are allowed to execute in parallel are automatically turned into separate Sections or Threads if they seem large enough, depending on system-specific cost/time heuristics.

Phew.

ABW

--
Alaric B. Williams
Internet : alaric@abwillms.demon.co.uk
http://www.hardcafe.co.uk/Alaric/
--
Send compilers articles to compilers@iecc.com,
meta-mail to compilers-request@iecc.com.