SIMD w/ MMX or 3dnow!

comp.lang.ada

From: Bobby D. Bryant @ 2003-05-30  4:55 UTC


Has anyone tried speeding up fp multiplication by calling MMX or 3dnow!
instructions from within an Ada program?

I work with artificial neural networks, and much of the signal propagation
can be reduced to multiplying a single fp number by every fp number in an
array and adding each product to its own accumulator variable, one per
array element.  (I.e., an input value is multiplied against a slice of the
weight matrix and the results are added to the values in the hidden layer.)
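
For reference, the operation described above can be sketched as a plain
scalar loop in C. This is only an illustration under stated assumptions: the
function and parameter names are hypothetical, and "fp" is taken to mean
single-precision floating point, which the post does not actually say.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference version: acc[i] += input * weights[i] for every i.
   'weights' stands for one slice of the weight matrix and 'acc' for the
   hidden-layer accumulators. All names are illustrative assumptions. */
void propagate_scalar(float input, const float *weights, float *acc, size_t n)
{
    assert(weights != NULL && acc != NULL);
    for (size_t i = 0; i < n; ++i) {
        acc[i] += input * weights[i];
    }
}
```

Each iteration is independent of the others, which is what makes the loop a
natural candidate for SIMD.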

Any benchmark results and/or example code would be appreciated.  Thanks.

-- 
Bobby Bryant
Austin, Texas





* Re: SIMD w/ MMX or 3dnow!
From: Jeffrey Creem @ 2003-05-30 10:54 UTC



"Bobby D. Bryant" <bdbryant@mail.utexas.edu> wrote in message
news:pan.2003.05.30.04.55.14.186980@mail.utexas.edu...
> Has anyone tried speeding up fp multiplication by calling MMX or 3dnow!
> instructions from within an Ada program?
>

Hmm... fp multiplication... As in fixed point (which would apply to MMX) or
floating point (which would really require 3DNow! or SSE, since MMX handles
only integers)?

Either way it does not really matter, since there is no good native support
for this in GNAT. Of course, you did not mention which compiler you are
using, and that would be key information, since SIMD support in any language
tends to be very compiler dependent.

You might be able to get some speedup with a recent version of gcc/ada (like
3.3) by compiling with
-mcpu=pentium4 -march=pentium4 -msse -mfpmath=sse

(from the gcc manual)
-mfpmath=sse
Use scalar floating point instructions present in the SSE instruction set.
This instruction set is supported by Pentium3 and newer chips, in the AMD
line by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of SSE
instruction set supports only single precision arithmetics, thus the double
and extended precision arithmetics is still done using 387. Later version,
present only in Pentium4 and the future AMD x86-64 chips supports double
precision arithmetics too.
For i387 you need to use -march=cpu-type, -msse or -msse2 switches to enable
SSE extensions and make this option effective. For x86-64 compiler, these
extensions are enabled by default.

The resulting code should be considerably faster in majority of cases and
avoid the numerical instability problems of 387 code, but may break some
existing code that expects temporaries to be 80bit.

So you might get some speedup this way, although not a true SIMD speedup,
since -mfpmath=sse uses only the scalar SSE instructions.



If you want true SIMD speedup you will need either to use inline assembly
in Ada or to write some simple vector routines with the C vector intrinsics
and then import them with pragma Import (or the older pragma Interface).
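
A minimal sketch of the second option, assuming single-precision data and
the SSE intrinsics from <xmmintrin.h>. The routine name and its parameters
are hypothetical. It processes four floats per step with unaligned loads and
stores, finishes any leftover elements with a scalar loop, and falls back to
the scalar loop entirely when the compiler is not targeting SSE.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_mul_ps, _mm_add_ps */
#endif

/* acc[i] += input * weights[i], four lanes at a time where SSE is available.
   Names are illustrative assumptions, not from the thread. */
void propagate_sse(float input, const float *weights, float *acc, size_t n)
{
    size_t i = 0;
    assert(weights != NULL && acc != NULL);
#if defined(__SSE__)
    /* Broadcast the single input value into all four lanes. */
    const __m128 in4 = _mm_set1_ps(input);
    for (; i + 4 <= n; i += 4) {
        __m128 w = _mm_loadu_ps(weights + i);   /* unaligned load */
        __m128 a = _mm_loadu_ps(acc + i);
        a = _mm_add_ps(a, _mm_mul_ps(in4, w));  /* a += input * w */
        _mm_storeu_ps(acc + i, a);              /* unaligned store */
    }
#endif
    /* Scalar tail (or the whole array when SSE is not enabled). */
    for (; i < n; ++i) {
        acc[i] += input * weights[i];
    }
}
```

On 32-bit x86 this would be built with something like gcc -O2 -msse; the Ada
side would then bind to it with pragma Import (C, ...) and pass the arrays by
address. That binding is not shown here and would need to be checked against
the compiler in use.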



It would be nice to have something like

  function ia32_addps (A, B : in V4SF_Type) return V4SF_Type;

  pragma Import (Intrinsic, ia32_addps);

available in GNAT but there is nothing like that available yet.
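
For comparison, the C side of gcc already offers something of this shape
through its vector extensions. The sketch below is an illustration only: the
v4sf typedef uses the vector_size attribute, v4sf_add is a hypothetical
wrapper name, and whether the addition compiles to a single addps depends on
the target and the flags in effect.

```c
#include <assert.h>

/* gcc vector extension: four packed single-precision floats (16 bytes). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical wrapper: element-wise addition of two 4-float vectors. */
v4sf v4sf_add(v4sf a, v4sf b)
{
    /* Sanity check on the assumed layout: four floats, no padding. */
    assert(sizeof(v4sf) == 4 * sizeof(float));
    return a + b;
}
```

A call such as v4sf_add(a, b) plays the role that ia32_addps plays in the
Ada declaration above.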







