* Ada and vectorization
@ 2002-06-16  9:56 Guillaume Foliard
  2002-06-16 12:50 ` Dale Stanbrough
  2002-06-17 23:47 ` Robert I. Eachus
  0 siblings, 2 replies; 15+ messages in thread
From: Guillaume Foliard @ 2002-06-16  9:56 UTC (permalink / raw)

Hello,

I am starting to learn how to use Intel's SSE instruction set in Ada
programs with inline assembly. While reading the Intel documentation (1),
I asked myself whether Ada could provide a clean way of vectorization
through its strongly typed approach. Could it be sensible, for the next
Ada revision, to create some new attributes for array types to explicitly
hint to the compiler that we want to use SIMD instructions?
Language lawyers' comments are definitely welcome. As SIMD in modern
general-purpose processors is widely available nowadays (SSE, SSE2,
AltiVec, etc.), IMHO, it would be a mistake for Ada to ignore the
performance benefit this could bring.

(1) http://www.intel.com/software/products/college/ia32/strmsimd/814down.htm

Guillaume Foliard

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: Ada and vectorization
  2002-06-16  9:56 Ada and vectorization Guillaume Foliard
@ 2002-06-16 12:50 ` Dale Stanbrough
  2002-06-16 20:07   ` Matthias Kretschmer
  2002-06-16 22:45   ` Ted Dennison
  2002-06-17 23:47 ` Robert I. Eachus
  1 sibling, 2 replies; 15+ messages in thread
From: Dale Stanbrough @ 2002-06-16 12:50 UTC (permalink / raw)

Guillaume Foliard wrote:

> I am starting to learn how to use Intel's SSE instruction set in Ada
> programs with inline assembly. While reading the Intel documentation
> (1), I asked myself whether Ada could provide a clean way of
> vectorization through its strongly typed approach. Could it be sensible,
> for the next Ada revision, to create some new attributes for array types
> to explicitly hint to the compiler that we want to use SIMD
> instructions? Language lawyers' comments are definitely welcome. As SIMD
> in modern general-purpose processors is widely available nowadays (SSE,
> SSE2, AltiVec, etc.), IMHO, it would be a mistake for Ada to ignore the
> performance benefit this could bring.

I think the best way to do this is via pragmas. There is one pragma -
Annotate - which would be perfect for the job. I think Annotate is a
GNAT-only thing - the real work would have to be done with an ASIS-like
tool.

Very much like the Fortran world, where the structured comments can be
ignored by ignorant compilers, and the program still behaves correctly
(if not as fast).

Dale
* Re: Ada and vectorization
  2002-06-16 12:50 ` Dale Stanbrough
@ 2002-06-16 20:07   ` Matthias Kretschmer
  2002-06-16 22:38     ` Robert A Duff
  2002-06-16 22:45   ` Ted Dennison
  1 sibling, 1 reply; 15+ messages in thread
From: Matthias Kretschmer @ 2002-06-16 20:07 UTC (permalink / raw)

I think this job should be done by the compiler itself, without changing
the language:

a) I do not want to specify everywhere which feature to use.

b) Vectorisation could be useful in many places - don't we want to use it
   anywhere we get a speed gain from the vector unit?

c) There are good examples that there is no need - think of the x86
   architecture and Intel's C compiler: it uses MMX/SSE/SSE2, if one lets
   it, everywhere it makes sense, and for cleanly written programs the
   performance gain is high.

d) Why bind this to variables? In one place it would be useful to use
   vectorisation for a piece of code, in another place it is not, with the
   same variables - so explicitly enabling it just for one bunch of array
   variables could be very inefficient compared to another approach. And
   why not group variables together in records (e.g. someone using, for a
   3-D representation, a record with x, y and z Cartesian coordinates)?
   OK, one could extend this feature to record constructs, but why?

e) On many architectures the vector unit is just a coprocessor, so the
   FPU could calculate one part and the vector unit the other. I think we
   want to let the optimizer decide how to use both units to get the best
   performance - so why don't we let the optimizer decide when to use the
   vector unit, too?

GNAT uses the GNU Compiler Suite as its backend - in 3.1 it is
implemented - so it uses the same code generator, which is capable of
using MMX/SSE/3DNow! or the like in some way (do not ask me how much - I
just know that povray is far faster after compiling it with the Intel C
compiler...).
Dale Stanbrough wrote:

> Guillaume Foliard wrote:
>
>> I am starting to learn how to use Intel's SSE instruction set in Ada
>> programs with inline assembly. [...]
>
> I think the best way to do this is via pragmas. There is one pragma -
> Annotate - which would be perfect for the job. I think Annotate is a
> GNAT-only thing - the real work would have to be done with an ASIS-like
> tool.
>
> Very much like the Fortran world, where the structured comments can be
> ignored by ignorant compilers, and the program still behaves correctly
> (if not as fast).
>
> Dale

--
Greetings
Matthias Kretschmer
* Re: Ada and vectorization
  2002-06-16 20:07 ` Matthias Kretschmer
@ 2002-06-16 22:38   ` Robert A Duff
  2002-06-18  8:24     ` Matthias Kretschmer
  0 siblings, 1 reply; 15+ messages in thread
From: Robert A Duff @ 2002-06-16 22:38 UTC (permalink / raw)

Various early versions of the Ada 9X proposals had some explicit support
for vectorizing and the like. I don't remember the details. You could
look up the early versions if you're interested.

These were removed, not because there was anything wrong with them
technically in and of themselves, but because there was a general feeling
amongst reviewers (especially compiler writers) that there were too many
new features.

- Bob
* Re: Ada and vectorization
  2002-06-16 22:38 ` Robert A Duff
@ 2002-06-18  8:24   ` Matthias Kretschmer
  2002-06-18 10:02     ` Dale Stanbrough
  0 siblings, 1 reply; 15+ messages in thread
From: Matthias Kretschmer @ 2002-06-18  8:24 UTC (permalink / raw)

Robert A Duff wrote:

> Various early versions of the Ada 9X proposals had some explicit support
> for vectorizing and the like. I don't remember the details. You could
> look up the early versions if you're interested.
>
> These were removed, not because there was anything wrong with them
> technically in and of themselves, but because there was a general
> feeling amongst reviewers (especially compiler writers) that there were
> too many new features.
>
> - Bob

Oh, I didn't mean to say this was wrong, but I think there is a better
solution. I do not want to care about where to use or not use this or
that feature of an architecture. And if the architectures change, what do
we do next - write some new pragmas so the compiler can optimize better
for the new architectures? I think today's compilers - the optimizers of
compilers - are capable of finding the places where to use vector units
or whatever cool feature your CPU has, so I want them to decide and leave
me alone with the really important stuff. I do not want to know how many
clock cycles instruction A takes in unit B of CPU C. Think of what you
would have to know just to get something done which other compilers do on
their own without bothering the programmer.

The logic to decide when to use or not to use vectorization of
instructions is out there; someone just needs to implement it in an Ada
compiler. There is no need to change the language itself. I do not know
how good gcc 3.1 is at vectorization of instructions, but there are some
good examples of how to do this right (though I don't know of an Ada
compiler that does :( ): Sun's C compiler and Intel's C compiler both, if
one lets them, do a lot of vectorization with cleanly written code.
--
Greetings
Matthias Kretschmer
* Re: Ada and vectorization
  2002-06-18  8:24 ` Matthias Kretschmer
@ 2002-06-18 10:02   ` Dale Stanbrough
  2002-06-18 16:21     ` Matthias Kretschmer
  2002-06-18 17:46     ` Ted Dennison
  0 siblings, 2 replies; 15+ messages in thread
From: Dale Stanbrough @ 2002-06-18 10:02 UTC (permalink / raw)

In article <aemqnr$grq$07$1@news.t-online.com>,
Matthias Kretschmer <schreib_mir_du_spacken@gmx.de> wrote:

> I think today's compilers - the optimizers of compilers - are capable of
> finding the places where to use vector units or whatever cool feature
> your CPU has, so I want them to decide and leave me alone with the
> really important stuff. I do not want to know how many clock cycles
> instruction A takes in unit B of CPU C. Think of what you would have to
> know just to get something done which other compilers do on their own
> without bothering the programmer.

It would be nice if we let the compiler discover all of the
vectorisations possible. I've got no idea what the current state of the
art is in this respect; however, I would imagine that it would still be
-cheaper- to build a simple compiler that took hints or directions from
the programmer about possible vectorisation.

Does anyone have real info instead of my speculation?

Dale
* Re: Ada and vectorization
  2002-06-18 10:02 ` Dale Stanbrough
@ 2002-06-18 16:21   ` Matthias Kretschmer
  2002-06-18 19:13     ` Robert A Duff
  2002-06-18 20:13     ` Guillaume Foliard
  2002-06-18 17:46   ` Ted Dennison
  1 sibling, 2 replies; 15+ messages in thread
From: Matthias Kretschmer @ 2002-06-18 16:21 UTC (permalink / raw)

Dale Stanbrough wrote:

> It would be nice if we let the compiler discover all of the
> vectorisations possible. I've got no idea what the current state of the
> art is in this respect; however, I would imagine that it would still be
> -cheaper- to build a simple compiler that took hints or directions from
> the programmer about possible vectorisation.

Maybe cheaper, but let me cite Dijkstra: "Are you quite sure that all
those bells and whistles, all those wonderful facilities of your
so-called powerful programming languages belong to the solution set
rather than to the problem set?"

And this is the question we have to ask here, I think - and I am quite
sure that vectorization hints belong to the problem set...
And looking at the compiler-design people, they are doing a great job:
what a compiler does today is not comparable to what was possible (or
available) twenty years ago. There are definitely nice compiler
implementations available today using all those nice features of your CPU
- well, not all of them, of course. But I don't think it is a solution to
move all this logic into the language so that the programmer has to care
about it; that just makes the programmer do all the work over again
(somehow reinventing the wheel every time he writes code). The other
advantage is that old code can gain more performance without changing one
line of code, just by using a newer version of a compiler or another
compiler.

A language that makes every feature an architecture provides accessible
should, I think, be called assembler, and has nothing to do with
abstraction over the programming of the underlying hardware. Do we really
want to implement every single feature CPU designers provide in the
language itself? Then we will have a very bloated, complex language which
will raise the difficulty of programming in it. And this is not the aim
of "higher programming languages". They should make things easy, or we
could all just use assembler. The reason why I personally use Ada is that
it is abstract, not that I have to care about the hardware, and I think
this is the way it should be.

> Does anyone have real info instead of my speculation?
>
> Dale

--
Greetings
Matthias Kretschmer
* Re: Ada and vectorization
  2002-06-18 16:21 ` Matthias Kretschmer
@ 2002-06-18 19:13   ` Robert A Duff
  2002-06-18 20:12     ` Matthias Kretschmer
  2002-06-18 20:13   ` Guillaume Foliard
  1 sibling, 1 reply; 15+ messages in thread
From: Robert A Duff @ 2002-06-18 19:13 UTC (permalink / raw)

Matthias Kretschmer <schreib_mir_du_spacken@gmx.de> writes:

> Maybe cheaper, but let me cite Dijkstra: "Are you quite sure that all
> those bells and whistles, all those wonderful facilities of your
> so-called powerful programming languages belong to the solution set
> rather than to the problem set?"

Buggy optimizers are part of my problem set, too.

You're probably right in this case, but surely in *some* cases, it is
appropriate to let the programmer give the compiler hints about how to
optimize. The compiler is still doing the error-prone part (deciding
whether the optimization is correct, and actually performing the
transformation). The programmer is merely suggesting that the
optimization is worthwhile.

- Bob
* Re: Ada and vectorization
  2002-06-18 19:13 ` Robert A Duff
@ 2002-06-18 20:12   ` Matthias Kretschmer
  2002-06-18 20:51     ` Guillaume Foliard
  0 siblings, 1 reply; 15+ messages in thread
From: Matthias Kretschmer @ 2002-06-18 20:12 UTC (permalink / raw)

Robert A Duff wrote:

> Buggy optimizers are part of my problem set, too.

Sure :) - but hopefully that won't happen. Then again, that can happen
with any piece of code in any language, if the optimizer/compiler is
written poorly.

> You're probably right in this case, but surely in *some* cases, it is
> appropriate to let the programmer give the compiler hints about how to
> optimize. The compiler is still doing the error-prone part (deciding
> whether the optimization is correct, and actually performing the
> transformation). The programmer is merely suggesting that the
> optimization is worthwhile.
>
> - Bob

Yes, but the compiler should apply an optimization even when the
programmer isn't suggesting it. Look at pragma Inline: I do not know how
GNAT, which I currently use, handles this, but there are good examples
(e.g. Intel's C++ compiler) where not only the subprograms marked for
inlining are inlined to gain speed - of course only where it is useful,
meaning more performance (or whatever the speed-versus-size trade-off
selected by the optimization flags calls for).

As suggested in this thread, using a pragma only for loops isn't enough,
I think (and it complicates things - bloating the language up), because
if you just think about something like:

   a := a1 * a2;
   b := b1 * b2;
   c := c1 * c2;
   d := d1 * d2;

wouldn't it be cool if that were vectorized?
You may say: throw everything into an array and put it in a loop. But
can't it happen that these a, b, c and d aren't related, so that putting
them together into one array wouldn't be very wise? And these situations
can be exploited even when the statements aren't all in one procedure -
think of the inter-procedural optimization features of compilers like
Sun's C compiler (sorry, but I do not know much about the available Ada
compilers, so my examples are adapted from other languages, but they
aren't dependent on C itself), which are able to optimize and vectorize
code where useful even when some fractions of the code are written in
other procedures/functions. This of course involves some inlining, and
the resulting code is quite useless if one wants to debug it - but who
really cares how the code gets faster if one needs speed? :)

Btw., are there Ada compilers available (besides gcc 3.1 - yes, the
backend is capable of using the vector units of at least x86-based CPUs,
as stated on gcc.gnu.org) which currently use vectorization and/or
inter-procedural optimization?

--
Greetings
Matthias Kretschmer
* Re: Ada and vectorization
  2002-06-18 20:12 ` Matthias Kretschmer
@ 2002-06-18 20:51   ` Guillaume Foliard
  2002-06-19  4:28     ` Matthias Kretschmer
  0 siblings, 1 reply; 15+ messages in thread
From: Guillaume Foliard @ 2002-06-18 20:51 UTC (permalink / raw)

Matthias Kretschmer wrote:

> As suggested in this thread, using a pragma only for loops isn't enough,
> I think (and it complicates things - bloating the language up), because
> if you just think about something like:
>    a := a1 * a2;
>    b := b1 * b2;
>    c := c1 * c2;
>    d := d1 * d2;
> wouldn't it be cool if that were vectorized? You may say: throw
> everything into an array and put it in a loop. But can't it happen that
> these a, b, c and d aren't related, so that putting them together into
> one array wouldn't be very wise?

Even if they are not related from a semantic point of view, they are from
a computational point of view. For the sake of performance - if
performance matters, of course - why shouldn't we lay out data in an
efficient manner? This does not break the data abstraction, just the
layout.

> Btw., are there Ada compilers available (besides gcc 3.1 - yes, the
> backend is capable of using the vector units of at least x86-based
> CPUs, as stated on gcc.gnu.org) which currently use vectorization
> and/or inter-procedural optimization?

Just a clarification here: GCC 3.1 does not vectorize, it just uses the
vector unit in a scalar manner, as a faster x87 FPU.

Have you got any links about "inter-procedural optimization"?
* Re: Ada and vectorization
  2002-06-18 20:51 ` Guillaume Foliard
@ 2002-06-19  4:28   ` Matthias Kretschmer
  0 siblings, 0 replies; 15+ messages in thread
From: Matthias Kretschmer @ 2002-06-19  4:28 UTC (permalink / raw)

Guillaume Foliard wrote:

> Even if they are not related from a semantic point of view, they are
> from a computational point of view. For the sake of performance - if
> performance matters, of course - why shouldn't we lay out data in an
> efficient manner? This does not break the data abstraction, just the
> layout.

Well, I consider that very ugly. And - just looking at some compilers - I
don't feel unwise or stupid expecting that the compiler takes care of
rearranging the stuff so that it runs fast. Do we always want to read
those nice optimization manuals for every new CPU that comes out? I do
not want to. For C, I can just wait until a new version of icc is out,
and as if by magic the same code runs much faster on the new CPU (as it
was with the P4, and before that with the P3, and so on...).

> Just a clarification here: GCC 3.1 does not vectorize, it just uses the
> vector unit in a scalar manner, as a faster x87 FPU.
> Have you got any links about "inter-procedural optimization"?
Ah, OK - then I got something wrong. But it is of course available in
other compilers...

For the last point, just look at the icc documents - afaik it uses the
same technique. I didn't find any useful abstract information about this
stuff :( The Intel C manual itself holds a short abstract of what is done
with these optimizations enabled (btw., inter-procedural optimization is
available across module borders - it would be nice to have that for Ada,
too, across package borders).

Btw., referring to your other post: I know that it isn't really trivial
to transform a sequential program into a parallel one, but why should
going one abstraction level back be the right step? If it is becoming a
problem for compiler design, maybe we should even try to find some other
solution, even if we lose Ada on the way... On the other hand, the
parallelization of code is done in CPUs today - they rearrange the code
so it can be executed in parallel; nothing else has to be done now in the
compiler.

--
Greetings
Matthias Kretschmer
* Re: Ada and vectorization
  2002-06-18 16:21 ` Matthias Kretschmer
  2002-06-18 19:13   ` Robert A Duff
@ 2002-06-18 20:13   ` Guillaume Foliard
  1 sibling, 0 replies; 15+ messages in thread
From: Guillaume Foliard @ 2002-06-18 20:13 UTC (permalink / raw)

Matthias Kretschmer wrote:

> Having all the features architectures provide accessible through a
> language should, I think, be called assembler, and has nothing to do
> with abstraction over the programming of the underlying hardware. Do we
> really want to implement every single feature CPU designers provide in
> the language itself?

SIMD concepts do not seem like a CPU designer's feature to me, but rather
a different approach to problem solving...

> Then we will have a very bloated, complex language which will raise the
> difficulty of programming in it. And this is not the aim of "higher
> programming languages". They should make things easy, or we could all
> just use assembler. The reason why I personally use Ada is that it is
> abstract, not that I have to care about the hardware, and I think this
> is the way it should be.

I agree with you on this. That's why I was initially wondering if we
could find a way to abstract "vectorization processing". But after having
read the Intel document on Pentium 4 optimizations, I don't think
compilers can automagically perform a really efficient vectorization: a
true vectorization implies algorithms different from the ones used in
sequential processing and, most importantly, a different data layout. Can
modern compiler technology deal with data layout, and automatically
choose the best one (switch between Arrays of Structs and Structs of
Arrays for instance, an operation called "data swizzling" (1))? OK, you
may get some optimized loops, but a sequential algorithm, by its very
nature, may not be easily transformed into an efficient parallel one by a
compiler.
If abstracting vectorization processing turns out to be infeasible, we
should not balk at introducing some low-level features anyway. After all,
there is some quite low-level stuff in Annex B.2 of the Ada 95 RM (2).
Bitwise operations can really be helpful sometimes. So can SIMD
instructions.

(1) http://www.google.fr/search?hl=fr&q=Data+Swizzling&btnG=Recherche+Google&meta=
(2) http://www.adahome.com/rm95/rm9x-B-02.html#6
* Re: Ada and vectorization
  2002-06-18 10:02 ` Dale Stanbrough
  2002-06-18 16:21   ` Matthias Kretschmer
@ 2002-06-18 17:46   ` Ted Dennison
  1 sibling, 0 replies; 15+ messages in thread
From: Ted Dennison @ 2002-06-18 17:46 UTC (permalink / raw)

Dale Stanbrough <dstanbro@bigpond.net.au> wrote in message
news:<dstanbro-16FC0C.20004918062002@news-server.bigpond.net.au>...

> It would be nice if we let the compiler discover all of the
> vectorisations possible. I've got no idea what the current state of the
> art is in this respect; however, I would imagine that it would still be
> -cheaper- to build a simple compiler that took hints or directions from
> the programmer about possible vectorisation.
>
> Does anyone have real info instead of my speculation?

I took a graduate-level compiler optimization course a couple of years
ago that dealt almost entirely with this. Apparently most research into
compiler optimizations of this sort is done using Fortran, as most of the
folks who need that kind of number-crunching power have Fortran code they
want it done with. Fortran's solution to this issue was to use the "hint"
approach by introducing new loop constructs (and a new dialect - HPF) for
this. So clearly they think it isn't feasible for normal Fortran.

Actually, I think the real issue may be that some operations will give
different results when done in parallel, and there has to be a way of
saying that's OK (or not OK) for your particular app. So I really think
the best way to do this in Ada would be via a pragma on the loop name.
The main drawback here is that if you put such code through a compiler
that doesn't support the pragma, then you may end up with an incorrect
calculation (in addition to a slower one).
* Re: Ada and vectorization
  2002-06-16 12:50 ` Dale Stanbrough
  2002-06-16 20:07   ` Matthias Kretschmer
@ 2002-06-16 22:45   ` Ted Dennison
  1 sibling, 0 replies; 15+ messages in thread
From: Ted Dennison @ 2002-06-16 22:45 UTC (permalink / raw)

Dale Stanbrough wrote:

>> I am starting to learn how to use Intel's SSE instruction set in Ada
>> programs with inline assembly. While reading the Intel documentation
>> (1), I asked myself whether Ada could provide a clean way of
>> vectorization through its strongly typed approach. Could it be
>> sensible, for the next Ada revision, [...]
>
> I think the best way to do this is via pragmas. There is one pragma -

When I was reading about HPF, I remember thinking that the parallel loops
could be done just as easily in Ada with custom pragmas
("pragma parallel (Loopname);"). I also remember thinking that a lot of
the optimization problems that we obsessed over in class (it was a
compiler optimization class) would be much simpler in Ada.
* Re: Ada and vectorization
  2002-06-16  9:56 Ada and vectorization Guillaume Foliard
  2002-06-16 12:50 ` Dale Stanbrough
@ 2002-06-17 23:47 ` Robert I. Eachus
  1 sibling, 0 replies; 15+ messages in thread
From: Robert I. Eachus @ 2002-06-17 23:47 UTC (permalink / raw)

Guillaume Foliard wrote:

> I am starting to learn how to use Intel's SSE instruction set in Ada
> programs with inline assembly. While reading the Intel documentation
> (1), I asked myself whether Ada could provide a clean way of
> vectorization through its strongly typed approach. Could it be sensible,
> for the next Ada revision, to create some new attributes for array types
> to explicitly hint to the compiler that we want to use SIMD
> instructions? Language lawyers' comments are definitely welcome. As SIMD
> in modern general-purpose processors is widely available nowadays (SSE,
> SSE2, AltiVec, etc.), IMHO, it would be a mistake for Ada to ignore the
> performance benefit this could bring.

Let me answer this with two different hats on.

First, language lawyer: you have to ask what restrictions imposed by the
language prevent the use of these features, then look at how to either
relax the restrictions or create language features which explicitly
bypass them. This has been done in Ada. For example, Ada allows
non-standard numeric types to allow for things like a floating-point type
with inaccurate divides, integer types that cannot be used as array
indices, etc. If you don't need accuracy, you can compile and execute
programs with the strict mode of the Numerics Annex turned off. I am not
quite that crazy, but it could make sense for 3-D display code. ;-) See
11.6, Exceptions and Optimization (and that section can lead to a real
long thread...).

So if it takes something special to use a SIMD instruction set, the
language allows it. In practice, all of the existing interesting SIMD
extensions can be mapped to standard integer, boolean, float, etc. types.
Now from a practical point of view: there are two problems with designing
language extensions to map to specific hardware.

The first is that software and language lifetimes are much greater than
hardware lifetimes. For example, you mention AMD's 3DNow! As it happens,
there are three versions of 3DNow!: the original version in the K6-2, the
extended version in the original Athlons, and the version in the Athlon
XP (and Morgan Duron) chips that is a superset of Intel's SSE. The Intel
situation is a little clearer, but even there, if you are doing a decent
(portable) programming job, you have to deal with MMX-only chips, those
with SSE, and those with SSE2. It is much nicer to use the right
architecture switch and have the compiler produce efficient code for your
target architecture. (If you are really doing a good job, you will
isolate all the SIMD-dependent code into a few DLLs, and have the
installer choose the correct version of each for the current hardware.)

The second practical issue is much nastier. Two implementations of the
same identical ISA can have very different performance behavior. Worse,
two otherwise exclusive features can have nasty interactions in an
implementation. Let me take a simple example: MMX and 3DNow! Athlons
allow integers and floating-point values to share architectural
registers. Due to the large floating-point register renaming files, this
is actually a nice feature. But if you reset mode bits, the programmer
usually cares which mode bits are used for which operations. The solution
is to generate an SFENCE instruction, which ensures that the view of
hardware registers and memory is globally consistent, even for things
which are otherwise weakly ordered. This instruction can have almost no
latency - or require thousands of clock cycles in the worst cases. (For
example, a write may cause a TLB miss, and the part of the memory table
that needs to be read may not be in the L1 or L2 cache.) So what code
should a compiler generate?
The usual solution is to consider both the average execution time and the
variance when choosing between two solutions. Would you rather that the
compiler used sequence A, with a minimum of 107 clocks and a maximum of
192, or sequence B, with a minimum of 100 clocks and a worst case of
1000? This often results in not using MMX registers or SSE code where the
potential savings is only a few percent. If the user forces the compiler
to use SSE in all cases, the horrible sequences will be in there along
with the good ones.

One last horrible problem has the innocent-sounding name of store-to-load
forwarding. On modern processors, actual stores from registers to either
cache or main memory can take place hundreds of clock cycles later than
the beginning of the move instruction. Out-of-order processors get around
this by keeping track of pending writes to renaming registers, and if a
load instruction for that data is encountered, the load is turned into a
no-op and the register is renamed as the target of the load. But what if
only part of the load data is coming from the store and the rest is being
read from cache or main memory? Most chips throw up their hands and make
the load instruction dependent on the store instruction being retired.
This is a nasty cost you don't want to run into. (There are also other
ways to run into store-to-load forwarding problems, but that is another
topic.) What if you have a 32-bit integer in an integer register and want
to combine it into a 64-bit or 128-bit SSE operand? Uh-oh! Much better to
avoid the store-to-load restrictions and the SSE operations. Again, this
is something where you expect (hope?) the compiler will get it right, and
forcing the use of SSE can result in very suboptimal code.