From: "Dr. Adrian Wrigley"
Subject: Re: Embedded languages based on early Ada (from "Re: Preferred OS, processor family for running embedded Ada?")
Newsgroups: comp.lang.ada,comp.lang.vhdl
References: <113ls6wugt43q$.cwaeexcj166j$.dlg@40tude.net> <1i3drcyut9aaw.isde6utlv6iq.dlg@40tude.net> <1j0a3kevqhqal.riuhe88py2tq$.dlg@40tude.net> <45E9B032.60502@obry.net>
Date: Sat, 03 Mar 2007 21:28:23 GMT

On Sat, 03 Mar 2007 18:28:18 +0100, Pascal Obry wrote:

> Dr. Adrian Wrigley wrote:
>> Numerous algorithms in simulation are "embarrassingly parallel",
>> but this fact is completely and deliberately obscured from compilers.
>
> Not a big problem. If the algorithms are "embarrassingly parallel" then
> the jobs are fully independent.
> In this case that is quite simple,

They aren't independent in terms of cache use! They may also have common
subexpressions, which independent treatments re-evaluate.

> create as many tasks as you have of processors. No big deal. Each task
> will compute a specific job. Ada has no problem with "embarrassingly
> parallel" jobs.

A problem is that it breaks the memory bandwidth budget. This approach is
tricky with large numbers of processors, and even more challenging with
hardware synthesis.

> What I have not yet understood is that people are trying to solve, in
> all cases, the parallelism at the lowest lever. Trying to parallelize an
> algorithm in an "embarrassingly parallel" context is loosing precious
> time.

You need to parallelise at the lowest level to take advantage of hardware
synthesis. For normal threads a somewhat higher level is desirable. For
multiple systems on a network, a high level is needed. What I want in a
language is the ability to specify when things must be evaluated
sequentially, and when it doesn't matter (even if the result of changing
the order may differ).

> Many real case simulations have billions of those algorithm to
> compute on multiple data, just create a set of task to compute in
> parallel multiple of those algorithm. Easier and as effective.

Reasonable for compilers and processors as they are designed now. Even
so, it can be challenging to take advantage of shared calculations under
memory capacity and bandwidth limitations. But it is useless for hardware
synthesis, for automated partitioning software, or for generating system
diagrams from code.

Manual partitioning into tasks and sequential code segments is not part
of the problem domain but part of the solution domain: it implies a
multiplicity of sequentially executing process threads. Using concurrent
statements in the source code is not the same thing as "trying to
parallelise an algorithm". It doesn't lose any precious execution time.
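The task-per-processor scheme quoted above can be sketched in a few lines. This is an illustrative Python sketch rather than Ada, and the names are invented: `compute_job` is a hypothetical stand-in for one independent simulation step, and a thread pool stands in for Ada tasks (real CPU-bound work would want a process pool or a language without a global interpreter lock):

```python
# Minimal sketch of "create as many tasks as you have processors":
# each worker computes independent jobs. A thread pool stands in for
# Ada tasks; compute_job is a hypothetical example workload.
import os
from concurrent.futures import ThreadPoolExecutor

def compute_job(i):
    # Fully independent job: the result depends only on its input.
    return i * i

def run_all(n_jobs):
    workers = os.cpu_count() or 1          # one task per processor
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compute_job, range(n_jobs)))
```

Note that this says nothing about the objections raised above: the pool schedules jobs with no knowledge of shared subexpressions, cache footprint or memory traffic.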
Concurrent statements simply inform the reader and the compiler that the
order of certain actions isn't considered relevant. The compiler can take
some parts of the source and convert them to a netlist for an ASIC or
FPGA. Other parts could be broken down into threads, or passed to
separate computer systems on a network. Much of it could be ignored. It
is the compiler which tries to parallelise the execution, unlike tasks,
where the programmer does the parallelising.

Whose job is it to parallelise operations? Traditionally, programmers try
to specify exactly what sequence of operations is to take place. Then the
compiler does its best (within limits) to shuffle things around, and the
CPU tries to overlap data fetch, calculation and address calculation by
watching the instruction sequence for concurrency opportunities. Why do
the work to force sequential operation if the compiler and hardware are
desperately trying to infer concurrency?

> In other words, what I'm saying is that in some cases ("embarrassingly
> parallel" computation is one of them) it is easier to do n computations
> in n tasks than n x (1 parallel computation in n tasks), and the overall
> performance is better.

This is definitely the case. And it helps explain why parallelisation is
not a job for the programmer or the hardware designer, but for the
synthesis tool, OS, processor, compiler or run-time.

Forcing the programmer or hardware designer to hard-code a specific
parallelism type (threads), and a particular partitioning, while denying
them the expressiveness of a concurrent language, will result in inferior
flexibility and an inability to map the problem onto certain types of
solution. If all the parallelism your hardware has is a few threads, then
all you need to code for is tasks. If you want to be able to target
FPGAs, million-thread CPUs, ASICs and loosely coupled processor networks,
the Ada task model alone serves very poorly.
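The distinction between a mandated sequence and declared order-independence can be made concrete. A hedged Python sketch (the function names are invented for illustration, not any real API): the first function has a genuine data dependence between steps, so order is essential; the second has none, so any schedule a compiler, runtime or synthesis tool picks yields the same result.

```python
# Sketch of the distinction drawn above: a sequential chain whose
# order is essential, versus order-independent evaluations that a
# compiler or runtime is free to schedule in any order (or in
# parallel). Illustrative names only.
def sequential_chain(x, steps):
    # Each step consumes the previous result: order is mandatory.
    for f in steps:
        x = f(x)
    return x

def order_independent(inputs, f):
    # No evaluation depends on another; evaluating the elements in
    # reverse (or any permutation) yields the same mapping.
    forward = [f(v) for v in inputs]
    backward = list(reversed([f(v) for v in reversed(inputs)]))
    assert forward == backward
    return forward
```

A concurrent language lets the source say which of these two situations holds, instead of forcing everything into the first form and hoping the tools can undo it.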
Perhaps mapping the execution of a program onto threads or other
concurrent structure is like mapping execution onto memory. It *is*
possible to manage a processor with a small, fast memory mapped at a
fixed address range: you use special calls to move data to and from your
main store, based on your own analysis of how the memory access patterns
will operate. But this approach has given way to automated caches with
dynamic mapping of memory cells to addresses, and to virtual memory.
Trying to manage tasks "manually", based on your hunches about task
coherence and work load, will surely give way to automatic thread
inference, creation and management based on the interaction of thread
management hardware and OS support. Building in hunches about tasking to
achieve parallelism can only be a short-term solution.
--
Adrian