From: "Jeffrey Creem"
Newsgroups: comp.lang.ada
Subject: Re: SIMD w/ MMX or 3dnow!
Date: Fri, 30 May 2003 10:54:38 GMT

"Bobby D. Bryant" wrote in message news:pan.2003.05.30.04.55.14.186980@mail.utexas.edu...
> Has anyone tried speeding up fp multiplication by calling MMX or 3dnow!
> instructions from within an Ada program?
>

Hmm... "fp multiplication": do you mean fixed point (which would apply to MMX) or floating point (which would really require SSE)? Either way it does not matter much, since there is no good native support for this in GNAT. Of course, you did not mention which compiler you are using, and that is key information, because any sort of SIMD support in any language tends to be very compiler-dependent.
You might be able to get some speedup with a recent version of gcc/Ada (such as 3.3) by compiling with

   -mcpu=pentium4 -march=pentium4 -msse -mfpmath=sse

From the gcc manual:

   -mfpmath=sse
      Use scalar floating point instructions present in the SSE instruction set. This instruction set is supported by Pentium3 and newer chips, in the AMD line by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of SSE instruction set supports only single precision arithmetics, thus the double and extended precision arithmetics is still done using 387. Later version, present only in Pentium4 and the future AMD x86-64 chips supports double precision arithmetics too. For i387 you need to use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For x86-64 compiler, these extensions are enabled by default.

      The resulting code should be considerably faster in majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80bit.

So you might get some speedup this way, although not true SIMD speedup: these are scalar SSE instructions, each operating on a single value. If you want true SIMD speedup, you will need either to use inline assembly in Ada or to write some simple vector routines with the C vector intrinsics and then pragma Import them into Ada.

It would be nice to have something like

   function ia32_addps (A, B : in V4SF_Type) return V4SF_Type;
   pragma Import (Intrinsic, ia32_addps);

available in GNAT, but there is nothing like that yet.
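The "simple vector routine in C" route might look roughly like the sketch below. It assumes gcc's SSE intrinsics from <xmmintrin.h>, which needs -msse on 32-bit x86; x86-64 compilers enable SSE by default. The function name vec_mul_f32 and its signature are hypothetical. The Ada binding in the comment is also an untested assumption. It relies on the Ada interfacing rules in RM Annex B.3, under which an array parameter with a C-compatible component type is passed to C as a pointer to its first element.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps */

/* Hypothetical helper: r[i] = a[i] * b[i] for i in 0 .. n-1.
 * The main loop multiplies four single-precision floats per SSE
 * instruction (mulps); any leftover elements are done with scalar code.
 * Unaligned loads/stores are used so callers need not 16-byte-align
 * their arrays.
 *
 * Possible Ada-side binding (hypothetical, not tested):
 *
 *   type Float_Array is array (Natural range <>) of Interfaces.C.C_float;
 *   procedure Vec_Mul (A, B : in  Float_Array;
 *                      R    : out Float_Array;
 *                      N    : in  Interfaces.C.int);
 *   pragma Import (C, Vec_Mul, "vec_mul_f32");
 */
void vec_mul_f32(const float *a, const float *b, float *r, int n)
{
    int i = 0;

    /* Packed part: four lanes at a time. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(r + i, _mm_mul_ps(va, vb));
    }

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; i++)
        r[i] = a[i] * b[i];
}
```

This version uses unaligned loads so that the Ada arrays can be passed as they are. The aligned forms (_mm_load_ps, _mm_store_ps) can be faster, but they would require controlling the alignment of the Ada objects as well.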