Re: Ada lacks lighterweight-than-task parallelism

comp.lang.ada

From: Brian Drummond <brian@shapes.demon.co.uk>
Subject: Re: Ada lacks lighterweight-than-task parallelism
Date: Wed, 20 Jun 2018 12:28:22 -0000 (UTC)
Message-ID: <pgdh96$92k$1@dont-email.me>
In-Reply-To: e72534b1-17a7-40b5-92b9-01a4695e2743@googlegroups.com

On Tue, 19 Jun 2018 15:14:16 -0700, Dan'l Miller wrote:

> http://www.theregister.co.uk/2018/06/18/microsoft_e2_edge_windows_10
> 
> As discussed in the article above, Microsoft is starting to unveil its
> formerly-secret development of what could be described as “Itanium done
> right”.

wait what? ... JAN GRAY? 

(in the Further Reading section) breadcrumbs to
https://arxiv.org/abs/1803.06617

"Design productivity is still a challenge for reconfigurable
computing. It is expensive to port workloads into gates and
to endure 10**2 to 10**4 second bitstream rebuild design
iterations."

( ... no kidding, but tolerable where it improves program execution times 
below 10**6 or 10**7 seconds)
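A rough sketch of that trade-off, for illustration only. It assumes the remark means that rebuild iterations pay off when the job being accelerated runs for something like 10**6 or 10**7 seconds. The function name, the speedup factor and the iteration count are all hypothetical; only the 10**2 to 10**4 second rebuild range comes from the quoted abstract.

```python
def net_saving_s(baseline_s, speedup, rebuild_s, iterations):
    """Seconds saved overall once the design-iteration cost is paid.

    All four parameters are hypothetical inputs for illustration.
    """
    accelerated_s = baseline_s / speedup      # run time on the new hardware
    design_cost_s = rebuild_s * iterations    # total time spent on rebuilds
    return baseline_s - accelerated_s - design_cost_s


# A 10**6 s job, assumed 10x speedup, 20 rebuilds at the worst-case 10**4 s.
print(net_saving_s(1e6, 10, 1e4, 20))

# The same design effort spent on a 10**4 s job: the result is negative,
# so the rebuilds cost more than the acceleration gives back.
print(net_saving_s(1e4, 10, 1e4, 20))
```

With these made-up figures the long job comes out well ahead and the short one never recovers the rebuild time, which is the sense in which the iteration cost is "tolerable" only for long-running workloads.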

so this is primarily work that emerged from the RC shadows, where for the 
past quarter century, people like JG have exploited parallelism not at 
the task level or even the "slice" level but at the gate level where that 
helps...

and where one of the chief difficulties has been the interface between 
that (unconstrained) level and the tightly constrained level (a single 
operation stream from the compiler, reverse engineered into out-of-order 
superscalar within the CPU)

and where some other efforts to smooth the way between parallelism 
domains are still ongoing...
https://www.extremetech.com/computing/269461-intel-shows-off-xeon-scalable-gold-6138p-with-an-integrated-fpga
https://www.nextplatform.com/2018/05/24/a-peek-inside-that-intel-xeon-fpga-hybrid-chip/
(I'm imagining on-chip PCIE links working like the old Transputer 
channels here, but streaming data to/from the bespoke hardware engine 
directly, much less overhead than I used to have, doing RC with external 
FPGA boards)

... if there are Ada dimensions here, one might be compiling Ada directly 
to hardware...

https://www.cs.york.ac.uk/ftpdir/papers/rtspapers/R:Ward:2001.ps

...which paper is only slightly weakened by the fact that his published 
example procedure is also synthesisable VHDL! in fact Xilinx XST 
synthesises that example to run in a single clock cycle ...

 ... however, at an appallingly slow clock ...

vs 732 cycles for the paper's result and an estimated 44,000 for an 80486.

Thus the Ward paper's true merit is, ironically, that it allows automatic 
extraction of a degree of sequentialism from an inherently parallel 
example; opening the way to automatic generation of faster (and maybe 
smaller) pipelined dataflow hardware. 

Which is actually quite difficult, and historically a heavily manual 
step in RC design flows - another bottleneck in addition to the 
"bitstream rebuild" times JG complains about.

so why stop at the "slice" level as EDGE does?

it makes sense if there is automatic translation (compilation at usably 
fast rates) from source to that level, AND if a large and sufficiently 
generally useful structure of slices can be implemented in ASIC without 
the time and area penalty of FPGA routing. 

One way of looking at it is to see a "slice" as a higher-level or larger 
grained FPGA LUT (which is a generalisation of one or several gates).
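To make the "LUT as a generalisation of gates" point concrete, here is a minimal software model, not a description of any particular FPGA family. A k-input LUT is just a 2**k entry truth table indexed by its input bits, so the same primitive can stand in for one gate or for several gates folded together. The helper names make_lut and lut_eval are invented for this sketch.

```python
def make_lut(func, n_inputs):
    """Build the truth table: entry i holds func applied to the bits of i."""
    table = []
    for index in range(2 ** n_inputs):
        bits = [(index >> k) & 1 for k in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_eval(table, *bits):
    """Look up the output for the given input bits (bit 0 first)."""
    index = sum(bit << k for k, bit in enumerate(bits))
    return table[index]


# One gate per LUT ...
and2 = make_lut(lambda a, b: a & b, 2)
xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)

# ... or several gates folded into a single 4-input LUT.
combo = make_lut(lambda a, b, c, d: (a & b) | (c ^ d), 4)

print(lut_eval(and2, 1, 1), lut_eval(xor3, 1, 1, 0), lut_eval(combo, 0, 0, 1, 0))
```

On this view a "slice" is the same idea with a much larger table entry: a whole operation rather than a single output bit.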

FPGAs have been becoming coarser grained anyway, as well as adding RAM 
Blocks and (first multiplier, then DSP primitive) blocks by the hundreds 
- because fewer more powerful primitive blocks reduce that routing (area 
+ speed) penalty.

A few dozen BlockRams for example, configured the right way, open the 
doors to allowing stack architectures to go superscalar (eliminating huge 
problems addressing registers), though I don't know if this has ever been 
exploited.

interesting development... perhaps a logical growth from JG's involvement 
with the ultra fine grained XC6200 FPGA, where RC pretty much started.
-- Brian
