From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,20e94ebefeef23df
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2003-05-30 03:54:46 PST
Path: 
 archiver1.google.com!news1.google.com!newsfeed.stanford.edu!logbridge.uoregon.edu!arclight.uoregon.edu!wn13feed!wn12feed!wn14feed!worldnet.att.net!204.127.198.203!attbi_feed3!attbi.com!rwcrnsc54.POSTED!not-for-mail
From: "Jeffrey Creem" <jeff@thecreems.com>
Newsgroups: comp.lang.ada
References: <pan.2003.05.30.04.55.14.186980@mail.utexas.edu>
Subject: Re: SIMD w/ MMX or 3dnow!
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2800.1158
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
Message-ID: <OLGBa.776636$OV.711798@rwcrnsc54>
NNTP-Posting-Host: 66.31.5.146
X-Complaints-To: abuse@attbi.com
X-Trace: rwcrnsc54 1054292078 66.31.5.146 (Fri, 30 May 2003 10:54:38 GMT)
NNTP-Posting-Date: Fri, 30 May 2003 10:54:38 GMT
Organization: AT&T Broadband
Date: Fri, 30 May 2003 10:54:38 GMT
Xref: archiver1.google.com comp.lang.ada:38073
Date: 2003-05-30T10:54:38+00:00
List-Id: <comp.lang.ada>


"Bobby D. Bryant" <bdbryant@mail.utexas.edu> wrote in message
news:pan.2003.05.30.04.55.14.186980@mail.utexas.edu...
> Has anyone tried speeding up fp multiplication by calling MMX or 3dnow!
> instructions from within an Ada program?
>

Hmm... fp multiplication. As in fixed point (where MMX would apply) or
floating point (which would really require SSE)?

Either way, it does not really matter, since there is no good native support
for this in GNAT. Of course, you did not mention which compiler you are
using, and that is key information, since SIMD support in any language tends
to be very compiler dependent.

You might be able to get some speedup with a recent version of gcc/ada (like
3.3) by compiling with
-mcpu=pentium4 -march=pentium4 -msse -mfpmath=sse

(from the gcc manual)
-mfpmath=sse
Use scalar floating point instructions present in the SSE instruction set.
This instruction set is supported by Pentium3 and newer chips, in the AMD
line by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of SSE
instruction set supports only single precision arithmetics, thus the double
and extended precision arithmetics is still done using 387. Later version,
present only in Pentium4 and the future AMD x86-64 chips supports double
precision arithmetics too.
For i387 you need to use -march=cpu-type, -msse or -msse2 switches to enable
SSE extensions and make this option effective. For x86-64 compiler, these
extensions are enabled by default.

The resulting code should be considerably faster in majority of cases and
avoid the numerical instability problems of 387 code, but may break some
existing code that expects temporaries to be 80bit.

So, you might get some speedup although not true SIMD speedup this way.


If you want true SIMD speedup, you will need to either use inline assembly
in Ada or write some simple vector routines with the C vector intrinsics and
then bind to those with pragma Import (or the older pragma Interface).
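
As a rough sketch of the C side of that second approach (not tested against
GNAT, and vec4_mul is a made-up name), something like the following could
work. It assumes a GCC or Clang compiler that supports the vector_size
attribute, and it relies on the compiler to turn the vector multiply into a
packed SIMD instruction when the target allows it:

```c
#include <string.h>

/* Four packed single-precision floats, using the GCC/Clang vector
   extension (an assumption: other compilers may not accept this). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical wrapper: out[i] = a[i] * b[i] for i = 0 .. 3.
   memcpy is used for the loads and stores so the caller's arrays
   do not have to be 16-byte aligned. */
void vec4_mul(const float *a, const float *b, float *out)
{
    v4sf va, vb, vr;

    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = va * vb;               /* element-wise multiply of all 4 lanes */
    memcpy(out, &vr, sizeof vr);
}
```

The Ada side would then presumably be a procedure taking three arrays of
four Floats, bound with pragma Import (C, ...) to vec4_mul. Passing the data
by address keeps the interface simple at the cost of a call per four
multiplies, so it only pays off if the routine is extended to loop over
longer arrays.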


It would be nice to have something like

  function ia32_addps (A, B : in V4SF_Type) return V4SF_Type;

  pragma Import (Intrinsic, ia32_addps);

available in GNAT, but there is nothing like that yet.