comp.lang.ada
From: "Randy Brukardt" <randy@rrsoftware.com>
Subject: Re: Loops and parallel execution
Date: Fri, 28 Jan 2011 18:23:13 -0600
Message-ID: <ihvmll$6tr$1@munin.nbi.dk>
In-Reply-To: ihsth1$igr$1@speranza.aioe.org

<tmoran@acm.org> wrote in message news:ihsth1$igr$1@speranza.aioe.org...
>> Finally, like Dmitry, I'm skeptical about fine-grained parallelism buying
>> much. Unless there is specific architectural support (something that
>> doesn't exist in commonly used processors -- and especially in commonly
>> used target OSes/RTOSes), the management overhead will kill any savings
>> on "small"
>
>  What about the SIMD (vector) instructions in Intel CPUs?  Or is that
> better done by simply calling their optimized, CPU capability detecting,
> libraries?

That's a code generation problem; I don't believe that there is much, if 
any, value in the programmer cluttering their code with parallel operations 
for that purpose.

To expand on that a bit: code generation for a CISC machine is primarily a 
pattern matching problem. That is, the intermediate code is a list of very 
simple pseudo instructions, and the code generator needs to map those to 
more complex machine instructions (falling back to simple ones when the 
pattern matching fails). Matching SIMD instructions is a more complex 
problem than the simple matcher used in Janus/Ada handles (to take the 
example I'm most familiar with), but it is fundamentally the same problem. 
In this case, I would probably apply a loop unrolling optimization, then a 
series of pattern matching operations to create the SIMD instructions.
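
To make that concrete, consider a loop like this (the type and names are 
purely illustrative, not from any real test):

        type Vec is array (1 .. 1024) of Float;

        procedure Add (A, B : in Vec; C : out Vec) is
        begin
           for I in Vec'Range loop
              C (I) := A (I) + B (I);
           end loop;
        end Add;

Unroll that loop by four and each unrolled iteration contains four 
independent additions on adjacent elements, which is exactly the sort of 
pattern that can be matched to a single packed-add instruction (ADDPS on 
Intel, for single-precision Float) with no change to the source at all.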

We already do something like this for aggregates in Janus/Ada. An aggregate 
assignment like:

                My_Str := (others => Ch);

can get turned into the Intel STOSB (I think that's the right opcode) 
instruction (plus a bit of setup code), which is a lot simpler than the 
loop that would otherwise be generated.
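
For instance, with declarations along the lines of (again, purely 
illustrative):

        My_Str : String (1 .. 80);
        Ch     : Character := '*';

the straightforward code for that assignment is the equivalent of

        for I in My_Str'Range loop
           My_Str (I) := Ch;
        end loop;

with all of the per-character loop overhead that implies; the pattern 
matcher recognizes the "fill with one value" shape and emits the string 
instruction instead.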

In either case, you'll automatically get the benefit of the advanced 
instructions when they can be used, and no code changes are needed. Of 
course, if your code doesn't match the pattern, the advanced instructions 
wouldn't be used, but it's unlikely that adding a "parallel" directive to 
the loop would somehow change that.
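
For example, a loop with a carried dependence (a made-up sketch, reusing 
the A and B arrays from above):

        for I in 2 .. A'Last loop
           A (I) := A (I - 1) + B (I);  -- needs the previous result
        end loop;

isn't going to become packed operations no matter what is written on the 
loop, because the iterations aren't independent.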

I'd be surprised if GCC doesn't already do something like this. (This 
particular problem hasn't been on my radar, in part because I didn't even 
have a machine that supported most of those instructions until last year.)

                                         Randy.