From: "Robert I. Eachus" <rieachus@attbi.com>
Subject: Re: Ada and vectorization
Date: Mon, 17 Jun 2002 23:47:54 GMT
Message-ID: <3D0E7575.1070709@attbi.com>
In-Reply-To: aehnbn$9ea$1@wanadoo.fr
Guillaume Foliard wrote:
> I start to learn how to use the Intel's SSE instruction set in Ada
> programs with inline assembly. And while reading Intel
> documentation (1) I was asking myself if Ada could provide a clean
> way of vectorization through its strong-typed approach. Could it
> be sensible, for the next Ada revision, to create some new
> attributes for array types to explicitly hint the compiler that we
> want to use SIMD instructions ? Language lawyers comments are
> definitely welcome. As SIMD in modern general purpose processors is
> largely available nowadays (SSE, SSE2, Altivec, etc...), IMHO, it
> would be a mistake for Ada to ignore the performance benefit this
> could bring.
Let me answer this with two different hats on.
First language lawyer: You have to ask what restrictions imposed by the
language prevent the use of these features, then look at how to either
relax the restrictions or create language features which explicitly
bypass the restrictions. This has been done in Ada. For example:
Ada allows non-standard numeric types to allow for things like a
floating-point type with inaccurate divides, integer types that cannot
be used as array indices, etc.
If you don't need accuracy, you can compile and execute programs with
the strict mode of the Numeric Annex turned off. I am not quite that
crazy, but it could make sense for 3d display code. ;-)
See RM 11.6, Exceptions and Optimization (and that section can lead to a
real long thread...)
So if it takes something special to use an SIMD instruction set, the
language allows it. In practice all of the existing interesting SIMD
extensions can be mapped to standard integer, boolean, float, etc. types.
Now from a practical point of view: There are two problems with
designing language extensions to map to specific hardware. The first is
that software and language lifetimes are much greater than hardware
lifetimes. For example, you mention AMD's 3dNow! As it happens, there
are three versions of 3dNow! The original version in the K6/2, the
extended version in the original Athlons, and the version in the Athlon
XP (and Morgan Duron) chips that is a superset of Intel's SSE.
The Intel situation is a little clearer, but even there if you are doing
a decent (portable) programming job you have to deal with MMX only
chips, those with SSE, and those with SSE2. It is much nicer to use the
right architecture switch and have the compiler produce efficient code
for your target architecture. (If you are really doing a good job, you
will isolate all the SIMD-dependent code into a few DLLs, and have the
installer choose the correct version of each for the current hardware.)
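As a rough illustration of keeping the hardware-dependent code behind one
seam, here is a minimal C sketch. It swaps the installer-time DLL choice
described above for a run-time choice inside a single program, which is a
different mechanism with the same intent. The names dot_scalar,
dot_unrolled4, cpu_has_sse and select_dot are hypothetical;
__builtin_cpu_supports is a GCC/Clang builtin for x86, and on any other
compiler or target the sketch simply falls back to the baseline.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by every implementation of the kernel. */
typedef float (*dot_fn)(const float *a, const float *b, size_t n);

/* Portable baseline: correct on any target. */
static float dot_scalar(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Same kernel with four independent accumulators. This is plain C; the
 * idea is that this variant would be compiled with an architecture
 * switch so the compiler can map the four lanes onto SIMD registers. */
static float dot_unrolled4(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}

/* Hypothetical feature probe: uses the GCC/Clang x86 builtin where it
 * is available, and reports "no SSE" everywhere else. */
static int cpu_has_sse(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse") != 0;
#else
    return 0;
#endif
}

/* Pick one implementation; callers never see the hardware test. */
dot_fn select_dot(void)
{
    return cpu_has_sse() ? dot_unrolled4 : dot_scalar;
}
```

A caller would fetch the pointer once at startup (dot_fn dot =
select_dot();) and use it everywhere. Note that the two variants add the
products in a different order, so their floating-point results can differ
in the last bits for general inputs.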
The second practical issue is much nastier. Two implementations of the
same identical ISA can have very different performance behavior. Worse,
two otherwise exclusive features can have nasty interactions in an
implementation. Let me take a simple example, MMX and 3dNow! Athlons
allow integers and floating-point values to share architectural
registers. Due to the large floating-point register renaming files this
is actually a nice feature. But if you reset mode bits, the programmer
usually cares which mode bits are used for which operations. The
solution is to generate an SFENCE instruction which ensures that the
view of hardware registers and memory is globally consistent, even for
things which are otherwise weakly ordered. This instruction can have
almost no latency--or require thousands of clock cycles in the worst
cases. (For example a write may cause a TLB miss, and the part of the
memory table that needs to be read may not be in L1 or L2 cache.)
So what code should a compiler generate? The usual solution is to
consider both the average execution time and the variance when choosing
between two solutions. Would you rather that the compiler used sequence
A, with a minimum of 107 clocks and a maximum of 192, or sequence B with
a minimum of 100 clocks and a worst case of 1000? This often results in
not using MMX registers or SSE code where the potential savings is only
a few percent. If the user forces the compiler to use SSE in all cases,
the horrible sequences will be in there along with the good ones.
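To put numbers on the A-versus-B question, here is a small C sketch of
one possible cost model. It is an assumption made for illustration, not
what any particular compiler does: each sequence is treated as taking
either its best-case or its worst-case cycle count, with the worst case
occurring with probability p_worst. The struct and function names are
hypothetical.

```c
#include <assert.h>

/* Hypothetical two-point model of an instruction sequence. */
struct sequence {
    double best_cycles;
    double worst_cycles;
};

/* Mean cycle count if the worst case occurs with probability p_worst. */
double expected_cycles(struct sequence s, double p_worst)
{
    assert(p_worst >= 0.0 && p_worst <= 1.0);
    return (1.0 - p_worst) * s.best_cycles + p_worst * s.worst_cycles;
}

/* Variance of the same two-point distribution. */
double cycle_variance(struct sequence s, double p_worst)
{
    assert(p_worst >= 0.0 && p_worst <= 1.0);
    double spread = s.worst_cycles - s.best_cycles;
    return p_worst * (1.0 - p_worst) * spread * spread;
}

/* Worst-case probability at which both sequences have the same mean.
 * Assumes both see the same p_worst and have different spreads. */
double break_even_probability(struct sequence a, struct sequence b)
{
    double spread_a = a.worst_cycles - a.best_cycles;
    double spread_b = b.worst_cycles - b.best_cycles;
    assert(spread_a != spread_b);
    return (a.best_cycles - b.best_cycles) / (spread_b - spread_a);
}
```

With A = {107, 192} and B = {100, 1000}, the break-even probability is
7/815, or roughly 0.86%. Under this assumed model, B is faster on average
only if its worst case shows up in fewer than about one execution in 116,
and its variance is far larger at any given probability.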
One last horrible problem goes by the innocent-sounding name of
store-to-load forwarding. On modern processors, actual stores from
registers to
either cache or main memory can take place hundreds of clock cycles
later than the beginning of the move instruction. Out of order
processors get around this by keeping track of pending writes of
renaming registers and if a load instruction for that data is
encountered, the load is turned into a no-op, and the register is
renamed as the target of the load.
But what if only part of the load data is coming from the store and the
rest is being read from cache or main memory? Most chips throw up their
hands and make the load instruction dependent on the store instruction
being retired. This is a nasty cost you don't want to run into. (There
are also other ways to run into store to load forwarding problems, but
that is another topic.) What if you have a 32-bit integer in an integer
register and want to combine it into a 64-bit or 128-bit SSE operand?
Uh-oh! Much better to avoid both the store-to-load restrictions and the
SSE operations. Again, this is something where you expect (hope?)
the compiler will get it right, and forcing the use of SSE can result in
very suboptimal code.
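The access pattern in question can be sketched in C. To stay portable,
this sketch uses integer values only rather than SSE registers, so it
shows the shape of the problem (two narrow stores followed by one wide
load that spans both), not the exact integer-to-SSE case above. The
function names are hypothetical, the memory version assumes a
little-endian target, and an optimizing compiler may well turn it into
the register version, so treat it as an illustration rather than a
benchmark.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero on a little-endian target (x86 always is). */
static int is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Pattern to avoid: the 64-bit load needs bytes from two separate
 * 32-bit stores, so no single pending store can supply all of it. */
uint64_t combine_through_memory(uint32_t lo, uint32_t hi)
{
    unsigned char buf[8];
    uint64_t wide;
    assert(is_little_endian());
    memcpy(buf, &lo, sizeof lo);        /* narrow store, bytes 0..3 */
    memcpy(buf + 4, &hi, sizeof hi);    /* narrow store, bytes 4..7 */
    memcpy(&wide, buf, sizeof wide);    /* wide load over both stores */
    return wide;
}

/* Alternative: build the wide value in registers, no memory round trip. */
uint64_t combine_in_registers(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```

Both functions return the same value on a little-endian machine; only
the route the data takes differs.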