* Ada and vectorization
@ 2002-06-16  9:56 Guillaume Foliard
  2002-06-16 12:50 ` Dale Stanbrough
  2002-06-17 23:47 ` Robert I. Eachus
  0 siblings, 2 replies; 15+ messages in thread
From: Guillaume Foliard @ 2002-06-16  9:56 UTC (permalink / raw)

Hello,

I am starting to learn how to use Intel's SSE instruction set in Ada
programs with inline assembly. While reading the Intel documentation (1),
I asked myself whether Ada could provide a clean way of vectorization
through its strongly typed approach. Could it be sensible, for the next
Ada revision, to create some new attributes for array types to explicitly
hint to the compiler that we want to use SIMD instructions?
Language lawyers' comments are definitely welcome. As SIMD in modern
general-purpose processors is widely available nowadays (SSE, SSE2,
AltiVec, etc.), IMHO, it would be a mistake for Ada to ignore the
performance benefit this could bring.

(1) http://www.intel.com/software/products/college/ia32/strmsimd/814down.htm

Guillaume Foliard

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: Ada and vectorization
  2002-06-16  9:56 Ada and vectorization Guillaume Foliard
@ 2002-06-16 12:50 ` Dale Stanbrough
  2002-06-16 20:07   ` Matthias Kretschmer
  2002-06-16 22:45   ` Ted Dennison
  2002-06-17 23:47 ` Robert I. Eachus
  1 sibling, 2 replies; 15+ messages in thread
From: Dale Stanbrough @ 2002-06-16 12:50 UTC (permalink / raw)

Guillaume Foliard wrote:

> I am starting to learn how to use Intel's SSE instruction set in Ada
> programs with inline assembly. While reading the Intel documentation
> (1), I asked myself whether Ada could provide a clean way of
> vectorization through its strongly typed approach. Could it be sensible,
> for the next Ada revision, to create some new attributes for array types
> to explicitly hint to the compiler that we want to use SIMD
> instructions? Language lawyers' comments are definitely welcome. As SIMD
> in modern general-purpose processors is widely available nowadays (SSE,
> SSE2, AltiVec, etc.), IMHO, it would be a mistake for Ada to ignore the
> performance benefit this could bring.

I think the best way to do this is via pragmas. There is one pragma -
Annotate - which would be perfect for the job. I think Annotate is a
GNAT-only thing - the real work would have to be done with an ASIS-like
tool.

Very much like the Fortran world, where the structured comments can be
ignored by ignorant compilers, and the program still behaves correctly
(if not as fast).

Dale
* Re: Ada and vectorization
  2002-06-16 12:50 ` Dale Stanbrough
@ 2002-06-16 20:07   ` Matthias Kretschmer
  2002-06-16 22:38     ` Robert A Duff
  2002-06-16 22:45   ` Ted Dennison
  1 sibling, 1 reply; 15+ messages in thread
From: Matthias Kretschmer @ 2002-06-16 20:07 UTC (permalink / raw)

I think this job should be done by the compiler itself, without changing
the language:

a) I do not want to specify everywhere which feature to use.

b) Vectorisation could be useful in many places - don't we want to use it
   anywhere we get a speed gain from the vector unit?

c) There are good examples that there is no need - think of the x86
   architecture and Intel's C compiler: it uses MMX/SSE/SSE2, if one lets
   it, everywhere it makes sense, and for cleanly written programs the
   performance gain is high.

d) Why bind this to variables? In one place it would be useful to use
   vectorisation for a piece of code, in another place it is not, with the
   same variables - so explicitly enabling it just for one bunch of array
   variables could be very inefficient compared to another approach. And
   why not group variables together in records (e.g. someone using, for a
   3-D representation, a record with x, y and z Cartesian coordinates)?
   OK, one could extend this feature to record constructs, but why?

e) On many architectures the vector unit is just a coprocessor, so the
   FPU could calculate one part and the vector unit the other. I think we
   want to let the optimizer decide how to use both units to get the best
   performance - so why don't we let the optimizer decide when to use the
   vector unit, too?

GNAT uses the GNU Compiler Suite as its backend - in 3.1 it is
implemented - so it uses the same code generator, which is capable of
using MMX/SSE/3DNow! or the like in some way (do not ask me how much - I
just know that povray is far faster after compiling it with the Intel C
compiler...).
Dale Stanbrough wrote:

> Guillaume Foliard wrote:
>
>> I am starting to learn how to use Intel's SSE instruction set in Ada
>> programs with inline assembly. [...]
>
> I think the best way to do this is via pragmas. There is one pragma -
> Annotate - which would be perfect for the job. I think Annotate is a
> GNAT-only thing - the real work would have to be done with an ASIS-like
> tool.
>
> Very much like the Fortran world, where the structured comments can be
> ignored by ignorant compilers, and the program still behaves correctly
> (if not as fast).
>
> Dale

--
Greetings
Matthias Kretschmer
* Re: Ada and vectorization
  2002-06-16 20:07 ` Matthias Kretschmer
@ 2002-06-16 22:38   ` Robert A Duff
  2002-06-18  8:24     ` Matthias Kretschmer
  0 siblings, 1 reply; 15+ messages in thread
From: Robert A Duff @ 2002-06-16 22:38 UTC (permalink / raw)

Various early versions of the Ada 9X proposals had some explicit support
for vectorizing and the like. I don't remember the details. You could
look up the early versions if you're interested.

These were removed, not because there was anything wrong with them
technically in and of themselves, but because there was a general feeling
amongst reviewers (especially compiler writers) that there were too many
new features.

- Bob
* Re: Ada and vectorization
  2002-06-16 22:38 ` Robert A Duff
@ 2002-06-18  8:24   ` Matthias Kretschmer
  2002-06-18 10:02     ` Dale Stanbrough
  0 siblings, 1 reply; 15+ messages in thread
From: Matthias Kretschmer @ 2002-06-18  8:24 UTC (permalink / raw)

Robert A Duff wrote:

> Various early versions of the Ada 9X proposals had some explicit support
> for vectorizing and the like. I don't remember the details. You could
> look up the early versions if you're interested.
>
> These were removed, not because there was anything wrong with them
> technically in and of themselves, but because there was a general
> feeling amongst reviewers (especially compiler writers) that there were
> too many new features.
>
> - Bob

Oh, I didn't mean to say this was wrong, but I think there is a better
solution. I do not want to care about where to use or not use this or
that feature of an architecture. And if the architectures change, what do
we do next - write some new pragmas so the compiler can optimize better
for the new architectures? I think today's compilers - the optimizers of
compilers - are capable of finding the places where to use vector units
or whatever cool feature your CPU has, so I want them to decide and leave
me alone with the really important stuff. I do not want to know how many
clock cycles instruction A takes in unit B of CPU C. Think of what you
would have to know just to get something done which other compilers do on
their own without bothering the programmer.

The logic to decide when to use or not to use vectorization of
instructions is out there; someone just needs to implement it in an Ada
compiler. There is no need to change the language itself. I do not know
how good gcc 3.1 is at vectorization of instructions, but there are some
good examples of how to do this right (though I don't know of an Ada
compiler that does :( ): Sun's C compiler and Intel's C compiler both, if
one lets them, do a lot of vectorization with cleanly written code.
--
Greetings
Matthias Kretschmer
* Re: Ada and vectorization
  2002-06-18  8:24 ` Matthias Kretschmer
@ 2002-06-18 10:02   ` Dale Stanbrough
  2002-06-18 16:21     ` Matthias Kretschmer
  2002-06-18 17:46     ` Ted Dennison
  0 siblings, 2 replies; 15+ messages in thread
From: Dale Stanbrough @ 2002-06-18 10:02 UTC (permalink / raw)

In article <aemqnr$grq$07$1@news.t-online.com>,
Matthias Kretschmer <schreib_mir_du_spacken@gmx.de> wrote:

> I think today's compilers - the optimizers of compilers - are capable of
> finding the places where to use vector units or whatever cool feature
> your CPU has, so I want them to decide and leave me alone with the
> really important stuff. I do not want to know how many clock cycles
> instruction A takes in unit B of CPU C. Think of what you would have to
> know just to get something done which other compilers do on their own
> without bothering the programmer.

It would be nice if we let the compiler discover all of the
vectorisations possible. I've got no idea what the current state of the
art is in this respect; however, I would imagine that it would still be
-cheaper- to build a simple compiler that took hints or directions from
the programmer about possible vectorisation.

Does anyone have real info instead of my speculation?

Dale
* Re: Ada and vectorization
  2002-06-18 10:02 ` Dale Stanbrough
@ 2002-06-18 16:21   ` Matthias Kretschmer
  2002-06-18 19:13     ` Robert A Duff
  2002-06-18 20:13     ` Guillaume Foliard
  2002-06-18 17:46   ` Ted Dennison
  1 sibling, 2 replies; 15+ messages in thread
From: Matthias Kretschmer @ 2002-06-18 16:21 UTC (permalink / raw)

Dale Stanbrough wrote:

> It would be nice if we let the compiler discover all of the
> vectorisations possible. I've got no idea what the current state of the
> art is in this respect; however, I would imagine that it would still be
> -cheaper- to build a simple compiler that took hints or directions from
> the programmer about possible vectorisation.

Maybe cheaper, but let me cite Dijkstra: "Are you quite sure that all
those bells and whistles, all those wonderful facilities of your
so-called powerful programming languages belong to the solution set
rather than to the problem set?"

And this is the question we have to ask here, I think - and I am quite
sure that vectorization hints belong to the problem set...
And looking at the compiler-design people, they are doing a great job:
what a compiler does today is not comparable to what was possible (or
available) twenty years ago. There are definitely nice compiler
implementations available today using all those nice features of your CPU
- well, not all of them, of course. But I don't think it is a solution to
move all this logic into the language so that the programmer has to care
about it; that just makes the programmer do all the work over again
(somehow reinventing the wheel every time he writes code). The other
advantage is that old code can gain more performance without changing one
line of code, just by using a newer version of a compiler or another
compiler.

A language that makes every feature an architecture provides accessible
should, I think, be called assembler, and has nothing to do with
abstraction over the programming of the underlying hardware. Do we really
want to implement every single feature CPU designers provide in the
language itself? Then we will have a very bloated, complex language which
will raise the difficulty of programming in it. And this is not the aim
of "higher programming languages". They should make things easy, or we
could all just use assembler. The reason why I personally use Ada is that
it is abstract, not that I have to care about the hardware, and I think
this is the way it should be.

> Does anyone have real info instead of my speculation?
>
> Dale

--
Greetings
Matthias Kretschmer
* Re: Ada and vectorization
  2002-06-18 16:21 ` Matthias Kretschmer
@ 2002-06-18 19:13   ` Robert A Duff
  2002-06-18 20:12     ` Matthias Kretschmer
  2002-06-18 20:13   ` Guillaume Foliard
  1 sibling, 1 reply; 15+ messages in thread
From: Robert A Duff @ 2002-06-18 19:13 UTC (permalink / raw)

Matthias Kretschmer <schreib_mir_du_spacken@gmx.de> writes:

> Maybe cheaper, but let me cite Dijkstra: "Are you quite sure that all
> those bells and whistles, all those wonderful facilities of your
> so-called powerful programming languages belong to the solution set
> rather than to the problem set?"

Buggy optimizers are part of my problem set, too.

You're probably right in this case, but surely in *some* cases, it is
appropriate to let the programmer give the compiler hints about how to
optimize. The compiler is still doing the error-prone part (deciding
whether the optimization is correct, and actually performing the
transformation). The programmer is merely suggesting that the
optimization is worthwhile.

- Bob
* Re: Ada and vectorization
  2002-06-18 19:13 ` Robert A Duff
@ 2002-06-18 20:12   ` Matthias Kretschmer
  2002-06-18 20:51     ` Guillaume Foliard
  0 siblings, 1 reply; 15+ messages in thread
From: Matthias Kretschmer @ 2002-06-18 20:12 UTC (permalink / raw)

Robert A Duff wrote:

> Buggy optimizers are part of my problem set, too.

Sure :) - but hopefully that won't happen. Then again, that can happen
with any piece of code in any language, if the optimizer/compiler is
written poorly.

> You're probably right in this case, but surely in *some* cases, it is
> appropriate to let the programmer give the compiler hints about how to
> optimize. The compiler is still doing the error-prone part (deciding
> whether the optimization is correct, and actually performing the
> transformation). The programmer is merely suggesting that the
> optimization is worthwhile.
>
> - Bob

Yes, but the compiler should apply an optimization even when the
programmer isn't suggesting it. Look at pragma Inline: I do not know how
GNAT, which I currently use, handles this, but there are good examples
(e.g. Intel's C++ compiler) where not only the subprograms marked for
inlining are inlined to gain speed - of course only where it is useful,
meaning more performance (or whatever the speed-versus-size trade-off
selected by the optimization flags calls for).

As suggested in this thread, using a pragma only for loops isn't enough,
I think (and it complicates things - bloating the language up), because
if you just think about something like:

   a := a1 * a2;
   b := b1 * b2;
   c := c1 * c2;
   d := d1 * d2;

wouldn't it be cool if that were vectorized?
You may say: throw everything into an array and put it in a loop. But
can't it happen that these a, b, c and d aren't related, so that putting
them together into one array wouldn't be very wise? And these situations
can be exploited even when the statements aren't all in one procedure -
think of the inter-procedural optimization features of compilers like
Sun's C compiler (sorry, but I do not know much about the available Ada
compilers, so my examples are adapted from other languages, but they
aren't dependent on C itself), which are able to optimize and vectorize
code where useful even when some fractions of the code are written in
other procedures/functions. This of course involves some inlining, and
the resulting code is quite useless if one wants to debug it - but who
really cares how the code gets faster if one needs speed? :)

Btw., are there Ada compilers available (besides gcc 3.1 - yes, the
backend is capable of using the vector units of at least x86-based CPUs,
as stated on gcc.gnu.org) which currently use vectorization and/or
inter-procedural optimization?

--
Greetings
Matthias Kretschmer
* Re: Ada and vectorization
  2002-06-18 20:12 ` Matthias Kretschmer
@ 2002-06-18 20:51   ` Guillaume Foliard
  2002-06-19  4:28     ` Matthias Kretschmer
  0 siblings, 1 reply; 15+ messages in thread
From: Guillaume Foliard @ 2002-06-18 20:51 UTC (permalink / raw)

Matthias Kretschmer wrote:

> As suggested in this thread, using a pragma only for loops isn't enough,
> I think (and it complicates things - bloating the language up), because
> if you just think about something like:
>    a := a1 * a2;
>    b := b1 * b2;
>    c := c1 * c2;
>    d := d1 * d2;
> wouldn't it be cool if that were vectorized? You may say: throw
> everything into an array and put it in a loop. But can't it happen that
> these a, b, c and d aren't related, so that putting them together into
> one array wouldn't be very wise?

Even if they are not related from a semantic point of view, they are from
a computational point of view. For the sake of performance - if
performance matters, of course - why shouldn't we lay out data in an
efficient manner? This does not break the data abstraction, just the
layout.

> Btw., are there Ada compilers available (besides gcc 3.1 - yes, the
> backend is capable of using the vector units of at least x86-based
> CPUs, as stated on gcc.gnu.org) which currently use vectorization
> and/or inter-procedural optimization?

Just a clarification here: GCC 3.1 does not vectorize, it just uses the
vector unit in a scalar manner, as a faster x87 FPU.

Have you got any links about "inter-procedural optimization"?
* Re: Ada and vectorization
  2002-06-18 20:51 ` Guillaume Foliard
@ 2002-06-19  4:28   ` Matthias Kretschmer
  0 siblings, 0 replies; 15+ messages in thread
From: Matthias Kretschmer @ 2002-06-19  4:28 UTC (permalink / raw)

Guillaume Foliard wrote:

> Even if they are not related from a semantic point of view, they are
> from a computational point of view. For the sake of performance - if
> performance matters, of course - why shouldn't we lay out data in an
> efficient manner? This does not break the data abstraction, just the
> layout.

Well, I consider that very ugly. And - just looking at some compilers - I
don't feel unwise or stupid expecting that the compiler takes care of
rearranging the stuff so that it runs fast. Do we always want to read
those nice optimization manuals for every new CPU that comes out? I do
not want to. For C, I can just wait until a new version of icc is out,
and as if by magic the same code runs much faster on the new CPU (as it
was with the P4, and before that with the P3, and so on...).

> Just a clarification here: GCC 3.1 does not vectorize, it just uses the
> vector unit in a scalar manner, as a faster x87 FPU.
> Have you got any links about "inter-procedural optimization"?
Ah, OK - then I got something wrong. But it is of course available in
other compilers...

For the last point, just look at the icc documents - afaik it uses the
same technique. I didn't find any useful abstract information about this
stuff :( The Intel C manual itself holds a short abstract of what is done
with these optimizations enabled (btw., inter-procedural optimization is
available across module borders - it would be nice to have that for Ada,
too, across package borders).

Btw., referring to your other post: I know that it isn't really trivial
to transform a sequential program into a parallel one, but why should
going one abstraction level back be the right step? If it is becoming a
problem for compiler design, maybe we should even try to find some other
solution, even if we lose Ada on the way... On the other hand, the
parallelization of code is done in CPUs today - they rearrange the code
so it can be executed in parallel; nothing else has to be done now in the
compiler.

--
Greetings
Matthias Kretschmer
* Re: Ada and vectorization
  2002-06-18 16:21 ` Matthias Kretschmer
  2002-06-18 19:13   ` Robert A Duff
@ 2002-06-18 20:13   ` Guillaume Foliard
  1 sibling, 0 replies; 15+ messages in thread
From: Guillaume Foliard @ 2002-06-18 20:13 UTC (permalink / raw)

Matthias Kretschmer wrote:

> Having all the features architectures provide accessible through a
> language should, I think, be called assembler, and has nothing to do
> with abstraction over the programming of the underlying hardware. Do we
> really want to implement every single feature CPU designers provide in
> the language itself?

SIMD concepts do not seem like a CPU designer's feature to me, but rather
a different approach to problem solving...

> Then we will have a very bloated, complex language which will raise the
> difficulty of programming in it. And this is not the aim of "higher
> programming languages". They should make things easy, or we could all
> just use assembler. The reason why I personally use Ada is that it is
> abstract, not that I have to care about the hardware, and I think this
> is the way it should be.

I agree with you on this. That's why I was initially wondering if we
could find a way to abstract "vectorization processing". But after having
read the Intel document on Pentium 4 optimizations, I don't think
compilers can automagically perform a really efficient vectorization: a
true vectorization implies algorithms different from the ones used in
sequential processing and, most importantly, a different data layout. Can
modern compiler technology deal with data layout, and automatically
choose the best one (switch between Arrays of Structs and Structs of
Arrays for instance, an operation called "data swizzling" (1))? OK, you
may get some optimized loops, but a sequential algorithm, by its very
nature, may not be easily transformed into an efficient parallel one by a
compiler.
If abstracting vectorization processing turns out to be infeasible, we
should not balk at introducing some low-level features anyway. After all,
there is some quite low-level stuff in Annex B.2 of the Ada 95 RM (2).
Bitwise operations can really be helpful sometimes. So can SIMD
instructions.

(1) http://www.google.fr/search?hl=fr&q=Data+Swizzling&btnG=Recherche+Google&meta=
(2) http://www.adahome.com/rm95/rm9x-B-02.html#6
* Re: Ada and vectorization
  2002-06-18 10:02 ` Dale Stanbrough
  2002-06-18 16:21   ` Matthias Kretschmer
@ 2002-06-18 17:46   ` Ted Dennison
  1 sibling, 0 replies; 15+ messages in thread
From: Ted Dennison @ 2002-06-18 17:46 UTC (permalink / raw)

Dale Stanbrough <dstanbro@bigpond.net.au> wrote in message
news:<dstanbro-16FC0C.20004918062002@news-server.bigpond.net.au>...

> It would be nice if we let the compiler discover all of the
> vectorisations possible. I've got no idea what the current state of the
> art is in this respect; however, I would imagine that it would still be
> -cheaper- to build a simple compiler that took hints or directions from
> the programmer about possible vectorisation.
>
> Does anyone have real info instead of my speculation?

I took a graduate-level compiler optimization course a couple of years
ago that dealt almost entirely with this. Apparently most research into
compiler optimizations of this sort is done using Fortran, as most of the
folks who need that kind of number-crunching power have Fortran code they
want it done with. Fortran's solution to this issue was to use the "hint"
approach by introducing new loop constructs (and a new dialect - HPF) for
this. So clearly they think it isn't feasible for normal Fortran.

Actually, I think the real issue may be that some operations will give
different results when done in parallel, and there has to be a way of
saying that's OK (or not OK) for your particular app. So I really think
the best way to do this in Ada would be via a pragma on the loop name.
The main drawback here is that if you put such code through a compiler
that doesn't support the pragma, then you may end up with an incorrect
calculation (in addition to a slower one).
* Re: Ada and vectorization
  2002-06-16 12:50 ` Dale Stanbrough
  2002-06-16 20:07   ` Matthias Kretschmer
@ 2002-06-16 22:45   ` Ted Dennison
  1 sibling, 0 replies; 15+ messages in thread
From: Ted Dennison @ 2002-06-16 22:45 UTC (permalink / raw)

Dale Stanbrough wrote:

>> I am starting to learn how to use Intel's SSE instruction set in Ada
>> programs with inline assembly. While reading the Intel documentation
>> (1), I asked myself whether Ada could provide a clean way of
>> vectorization through its strongly typed approach. Could it be
>> sensible, for the next Ada revision, [...]
>
> I think the best way to do this is via pragmas. There is one pragma -

When I was reading about HPF, I remember thinking that the parallel loops
could be done just as easily in Ada with custom pragmas
("pragma parallel (Loopname);"). I also remember thinking that a lot of
the optimization problems that we obsessed over in class (it was a
compiler optimization class) would be much simpler in Ada.
* Re: Ada and vectorization
  2002-06-16  9:56 Ada and vectorization Guillaume Foliard
  2002-06-16 12:50 ` Dale Stanbrough
@ 2002-06-17 23:47 ` Robert I. Eachus
  1 sibling, 0 replies; 15+ messages in thread
From: Robert I. Eachus @ 2002-06-17 23:47 UTC (permalink / raw)

Guillaume Foliard wrote:

> I am starting to learn how to use Intel's SSE instruction set in Ada
> programs with inline assembly. While reading the Intel documentation
> (1), I asked myself whether Ada could provide a clean way of
> vectorization through its strongly typed approach. Could it be sensible,
> for the next Ada revision, to create some new attributes for array types
> to explicitly hint to the compiler that we want to use SIMD
> instructions? Language lawyers' comments are definitely welcome. As SIMD
> in modern general-purpose processors is widely available nowadays (SSE,
> SSE2, AltiVec, etc.), IMHO, it would be a mistake for Ada to ignore the
> performance benefit this could bring.

Let me answer this with two different hats on.

First, language lawyer: you have to ask what restrictions imposed by the
language prevent the use of these features, then look at how to either
relax the restrictions or create language features which explicitly
bypass them. This has been done in Ada. For example, Ada allows
non-standard numeric types to allow for things like a floating-point type
with inaccurate divides, integer types that cannot be used as array
indices, etc. If you don't need accuracy, you can compile and execute
programs with the strict mode of the Numerics Annex turned off. I am not
quite that crazy, but it could make sense for 3-D display code. ;-) See
11.6, Exceptions and Optimization (and that section can lead to a real
long thread...).

So if it takes something special to use a SIMD instruction set, the
language allows it. In practice, all of the existing interesting SIMD
extensions can be mapped to standard integer, boolean, float, etc. types.
Now from a practical point of view: there are two problems with designing
language extensions to map to specific hardware.

The first is that software and language lifetimes are much greater than
hardware lifetimes. For example, you mention AMD's 3DNow! As it happens,
there are three versions of 3DNow!: the original version in the K6-2, the
extended version in the original Athlons, and the version in the Athlon
XP (and Morgan Duron) chips that is a superset of Intel's SSE. The Intel
situation is a little clearer, but even there, if you are doing a decent
(portable) programming job, you have to deal with MMX-only chips, those
with SSE, and those with SSE2. It is much nicer to use the right
architecture switch and have the compiler produce efficient code for your
target architecture. (If you are really doing a good job, you will
isolate all the SIMD-dependent code into a few DLLs, and have the
installer choose the correct version of each for the current hardware.)

The second practical issue is much nastier. Two implementations of the
same identical ISA can have very different performance behavior. Worse,
two otherwise exclusive features can have nasty interactions in an
implementation. Let me take a simple example: MMX and 3DNow! Athlons
allow integers and floating-point values to share architectural
registers. Due to the large floating-point register renaming files, this
is actually a nice feature. But if you reset mode bits, the programmer
usually cares which mode bits are used for which operations. The solution
is to generate an SFENCE instruction, which ensures that the view of
hardware registers and memory is globally consistent, even for things
which are otherwise weakly ordered. This instruction can have almost no
latency - or require thousands of clock cycles in the worst cases. (For
example, a write may cause a TLB miss, and the part of the memory table
that needs to be read may not be in the L1 or L2 cache.) So what code
should a compiler generate?
The usual solution is to consider both the average execution time and the
variance when choosing between two solutions. Would you rather that the
compiler used sequence A, with a minimum of 107 clocks and a maximum of
192, or sequence B, with a minimum of 100 clocks and a worst case of
1000? This often results in not using MMX registers or SSE code where the
potential savings is only a few percent. If the user forces the compiler
to use SSE in all cases, the horrible sequences will be in there along
with the good ones.

One last horrible problem has the innocent-sounding name of store-to-load
forwarding. On modern processors, actual stores from registers to either
cache or main memory can take place hundreds of clock cycles later than
the beginning of the move instruction. Out-of-order processors get around
this by keeping track of pending writes to renaming registers, and if a
load instruction for that data is encountered, the load is turned into a
no-op and the register is renamed as the target of the load. But what if
only part of the load data is coming from the store and the rest is being
read from cache or main memory? Most chips throw up their hands and make
the load instruction dependent on the store instruction being retired.
This is a nasty cost you don't want to run into. (There are also other
ways to run into store-to-load forwarding problems, but that is another
topic.) What if you have a 32-bit integer in an integer register and want
to combine it into a 64-bit or 128-bit SSE operand? Uh-oh! Much better to
avoid the store-to-load restrictions and the SSE operations. Again, this
is something where you expect (hope?) the compiler will get it right, and
forcing the use of SSE can result in very suboptimal code.