Re: SIMD w/ MMX or 3dnow!

comp.lang.ada

From: "Jeffrey Creem" <jeff@thecreems.com>
Subject: Re: SIMD w/ MMX or 3dnow!
Date: Fri, 30 May 2003 10:54:38 GMT
Message-ID: <OLGBa.776636$OV.711798@rwcrnsc54>
In-Reply-To: pan.2003.05.30.04.55.14.186980@mail.utexas.edu

"Bobby D. Bryant" <bdbryant@mail.utexas.edu> wrote in message
news:pan.2003.05.30.04.55.14.186980@mail.utexas.edu...
> Has anyone tried speeding up fp multiplication by calling MMX or 3dnow!
> instructions from within an Ada program?
>

Hmm... fp multiplication... As in fixed point (which would apply to MMX) or
floating point (which would really require SSE, or 3DNow! on AMD chips)?

Either way it does not really matter, since there is no good native support
for this in GNAT. Of course, you did not mention which compiler you are
using, and that is key information, since SIMD support in any language tends
to be very compiler dependent.

You might be able to get some speedup with a recent version of gcc/ada (like
3.3) by compiling with
-mcpu=pentium4 -march=pentium4 -msse -mfpmath=sse

(from the gcc manual)
-mfpmath=sse
Use scalar floating point instructions present in the SSE instruction set.
This instruction set is supported by Pentium3 and newer chips, in the AMD
line by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of SSE
instruction set supports only single precision arithmetics, thus the double
and extended precision arithmetics is still done using 387. Later version,
present only in Pentium4 and the future AMD x86-64 chips supports double
precision arithmetics too.
For i387 you need to use -march=cpu-type, -msse or -msse2 switches to enable
SSE extensions and make this option effective. For x86-64 compiler, these
extensions are enabled by default.

The resulting code should be considerably faster in majority of cases and
avoid the numerical instability problems of 387 code, but may break some
existing code that expects temporaries to be 80bit.

So you might get some speedup this way, although not a true SIMD speedup,
since -mfpmath=sse only uses the scalar SSE instructions.

If you want a true SIMD speedup, you will need to either use inline assembly
in Ada or write some simple vector routines with the C vector intrinsics and
then pragma Import those (pragma Interface in older code).
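
As a rough sketch of the second option, here is what one such C routine might
look like, assuming the SSE intrinsics from <xmmintrin.h> are available. The
name vec_mul_f32 and its parameter list are made up for illustration, not
taken from any existing library. When the compiler does not define __SSE__,
the routine falls back to a plain scalar loop.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps */
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] for i in 0 .. n-1. */
void vec_mul_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

#if defined(__SSE__)
    /* Four single-precision multiplies per iteration. Unaligned loads and
       stores are used so the caller does not have to align the arrays. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_mul_ps(va, vb));
    }
#endif

    /* Scalar loop: handles the tail, or everything when SSE is not enabled. */
    for (; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}
```

Compiled with something like gcc -O2 -msse -c, this could then be imported on
the Ada side with pragma Import (C, ...) on a procedure whose parameters
match the C prototype. The exact Ada declaration depends on how your arrays
are declared, so treat the binding as something to work out for your own
types.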

It would be nice to have something like

  function ia32_addps (A, B : in V4SF_Type) return V4SF_Type;

  pragma Import (Intrinsic, ia32_addps);

available in GNAT but there is nothing like that available yet.
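
For comparison, a sketch of what the C side of GCC offers for the same idea,
assuming GCC's vector extensions (the vector_size attribute). The names v4sf
and add_v4sf are chosen here for illustration only. V4SF stands for a vector
of four single-precision floats.

```c
/* GCC vector extension: four packed floats in one 16-byte value.
   This is the kind of type the V4SF name refers to. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical wrapper: element-wise addition of two 4-float vectors.
   With SSE enabled, GCC can compile the + into a single packed add. */
v4sf add_v4sf(v4sf a, v4sf b)
{
    return a + b;
}
```

An Ada intrinsic like the one wished for above would presumably map onto the
same packed add, but that is speculation about a feature that does not exist.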
