From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=unavailable autolearn_force=no version=3.4.4
X-Received: by 10.99.108.8 with SMTP id h8mr2676568pgc.50.1481930401385;
        Fri, 16 Dec 2016 15:20:01 -0800 (PST)
X-Received: by 10.157.4.119 with SMTP id 110mr355298otc.11.1481930401335; Fri,
 16 Dec 2016 15:20:01 -0800 (PST)
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!feeder.eternal-september.org!news.glorb.com!b123no874886itb.0!news-out.google.com!u18ni15570ita.0!nntp.google.com!b123no874883itb.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Fri, 16 Dec 2016 15:20:00 -0800 (PST)
In-Reply-To: <o31i2o$k1l$1@franka.jacob-sparre.dk>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com;
 posting-host=2601:191:8303:2100:5985:2c17:9409:aa9c;
 posting-account=fdRd8woAAADTIlxCu9FgvDrUK4wPzvy3
NNTP-Posting-Host: 2601:191:8303:2100:5985:2c17:9409:aa9c
References: <8d0f7f03-9324-4702-9100-d6b8a1f16fc5@googlegroups.com>
 <o31i2o$k1l$1@franka.jacob-sparre.dk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ce9bddec-64ba-40e9-8fc9-c70adc0555c1@googlegroups.com>
Subject: Re: Trigonometric operations on x86 and x64 CPUs
From: Robert Eachus <rieachus@comcast.net>
Injection-Date: Fri, 16 Dec 2016 23:20:01 +0000
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Xref: news.eternal-september.org comp.lang.ada:32894
Date: 2016-12-16T15:20:00-08:00
List-Id: <comp.lang.ada>

On Friday, December 16, 2016 at 3:16:25 PM UTC-5, Randy Brukardt wrote:

> (1) We don't use it for Generic_Elementary_Functions because it's impossible
> to be sure that the built-in instructions meet that Annex G accuracy
> requirements.
>
Correct.
>
> (2) Intel's documentation way back when made it clear that you had to do
> argument reduction first (yourself). The instructions were not intended to
> be accurate for values outside of +/- 2*PI (or something like that, I'm
> writing this from memory).
>
Actually, more like +/- Pi/4 for cosine and +/- Pi/8 for tangent.
>
> (3) Argument reduction is always going to lose a lot of precision for large
> values, when you start with a 64 bit value there isn't going to be much left
> if the value is large. Hard to blame that mathematical fact on the hardware.
>
The problem is the 66-bit value of Pi in the hardware. Look at the sine of a number close to Pi; call it X. The sine will be very close to Pi - X. Assuming 64-bit (double) precision for X, the mantissa of the result will take a couple of bits, perhaps none, from X, and the rest of the bits will come from bits 49-96 of Pi. Use 80-bit extended, which the x87 instructions support, and you will be taking bits 65 to 128 from the value of Pi.

Could Intel have done the range reduction right?  Sure.  It would add a few instructions to the microcode, and require a longer value for Pi.
>
> In any case, in general, I'd trust the Ada implementer to have looked at the
> issues and having come up with the best possible implementation on the
> hardware. They have a lot more at stake than any individual user (and a lot
> more tools as well). If they're not using something, most likely it's
> because of a good reason or two or six. :-)
>
Creating a package which does the range reduction right and passes small values through to the hardware instructions is not all that hard.  However, FXSAVE and FXRSTOR do not save (and restore) the ST(x)/MMx registers unless they have been used. Other threads running at the same time are unlikely to be using these registers, but the OS will need to save and restore these registers when moving to and from your thread.
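As a sketch of the kind of package described above, here is the idea in Python rather than Ada: reduce the argument against a long value of Pi, then hand the small remainder to the fast routine. math.sin and math.cos merely stand in for the hardware instructions, and reduce_arg, sin_reduced and the 50-digit constant are hypothetical names and values chosen for illustration. A constant of this length is only enough for arguments of moderate size; covering the full double range would need a far longer Pi.

```python
from fractions import Fraction
import math

# Pi to 50 decimal places; an assumed "long enough" constant for this sketch.
PI = Fraction("3.14159265358979323846264338327950288419716939937510")
HALF_PI = PI / 2

def reduce_arg(x):
    """Return (quadrant, r) with x = k*(Pi/2) + r, |r| <= Pi/4, quadrant = k mod 4."""
    xf = Fraction(x)               # exact value of the double
    k = round(xf / HALF_PI)        # nearest multiple of Pi/2
    r = xf - k * HALF_PI           # exact remainder against the long Pi
    return k % 4, float(r)         # round only once, at the very end

def sin_reduced(x):
    """Sine via exact reduction; the calls below stand in for FSIN/FCOS."""
    quadrant, r = reduce_arg(x)
    if quadrant == 0:
        return math.sin(r)
    if quadrant == 1:
        return math.cos(r)
    if quadrant == 2:
        return -math.sin(r)
    return -math.cos(r)

print(sin_reduced(math.pi))
print(sin_reduced(-2.0), math.sin(-2.0))
```

The point of the design is that the only rounding in the reduction happens when the exact remainder is converted back to a double, so the small-argument routine never sees an argument that has already lost most of its bits.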

In other words, the actual user instructions executed for an x87 trig function may be fewer and faster than doing it all in 64/128 bit XMM/YMM registers, but the overhead on thread switches and interrupts will more than make up for it.  The Elementary_Functions package will have to run in a 32-bit thread, so unless your entire program is in a 32-bit mode, you will pay this cost on every call.