From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=unavailable autolearn_force=no version=3.4.4
X-Received: by 10.129.74.66 with SMTP id x63mr2977913ywa.140.1482055794962;
        Sun, 18 Dec 2016 02:09:54 -0800 (PST)
X-Received: by 10.157.17.167 with SMTP id v36mr579370otf.12.1482055794917;
 Sun, 18 Dec 2016 02:09:54 -0800 (PST)
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!feeder.eternal-september.org!feeds.phibee-telecom.net!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!newspeer1.nac.net!border2.nntp.dca1.giganews.com!nntp.giganews.com!n6no1102434qtd.0!news-out.google.com!u18ni7246ita.0!nntp.google.com!75no1270495ite.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Sun, 18 Dec 2016 02:09:54 -0800 (PST)
In-Reply-To: <ce9bddec-64ba-40e9-8fc9-c70adc0555c1@googlegroups.com>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com;
 posting-host=199.203.251.52;
 posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <8d0f7f03-9324-4702-9100-d6b8a1f16fc5@googlegroups.com>
 <o31i2o$k1l$1@franka.jacob-sparre.dk>
 <ce9bddec-64ba-40e9-8fc9-c70adc0555c1@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a6734b29-5ffc-4938-bbc2-453f7ae92325@googlegroups.com>
Subject: Re: Trigonometric operations on x86 and x64 CPUs
From: already5chosen@yahoo.com
Injection-Date: Sun, 18 Dec 2016 10:09:54 +0000
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Xref: news.eternal-september.org comp.lang.ada:32909
Date: 2016-12-18T02:09:54-08:00
List-Id: <comp.lang.ada>

On Saturday, December 17, 2016 at 1:20:03 AM UTC+2, Robert Eachus wrote:
> On Friday, December 16, 2016 at 3:16:25 PM UTC-5, Randy Brukardt wrote:
>
> > (1) We don't use it for Generic_Elementary_Functions because it's impossible
> > to be sure that the built-in instructions meet that Annex G accuracy
> > requirements.
> >
> Correct
> >
> > (2) Intel's documentation way back when made it clear that you had to do
> > argument reduction first (yourself). The instructions were not intended to
> > be accurate for values outside of +/- 2*PI (or something like that, I'm
> > writing this from memory).
> >
> Actually more like +/- Pi/4 for Cosine and +/- Pi/8 for tangent.
> >
> > (3) Argument reduction is always going to lose a lot of precision for large
> > values, when you start with a 64 bit value there isn't going to be much left
> > if the value is large. Hard to blame that mathematical fact on the hardware.
> >
> The problem is the 66-bit value of Pi in the hardware. Look at the sine of a number close to Pi call it X.  The sine will be very close to X - Pi.  Assuming 64 bit (double) precision for X, the mantissa will be a couple bits, perhaps none, from X and the rest of the bits will come from bits 49-96 of Pi.  Use 80-bit extended which the x87 instructions support, and you will be taking bits 65 to 128 from the value of Pi.
>

That part ignores the problem of GIGO (garbage in, garbage out).
When we say that the argument of sin is x, then 99.9999% of the time it does not mean that the argument is *exactly* x, but that it most probably lies in the range [x-0.5*ULP..x+0.5*ULP]. Or worse.
The x87 implementation of sin(x) for abs(x) > 2*pi always returns a value in the range [sin(x-0.5*ULP)..sin(x+0.5*ULP)] (actually, 2 times better than that for extended precision or 4096 times better for double precision), so for 99.9999% of us it is more than good enough.
The remaining 0.0001% of us are either mathematicians, who hopefully know what they are doing, or sensationalists, who should be ignored by any sane designer of an RTL.
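
The comparison above can be checked with exact rational arithmetic. The following is only a rough sketch, not a model of the real FSIN microcode: it assumes the reduction behaves like an exact "x mod 2*pi" carried out with the 66-bit constant from Intel's documentation, and it ignores every other error source. The names PI66, PI_REF, reduction_error and half_ulp are made up for this illustration. It needs Python 3.9 or later for math.ulp.

```python
from fractions import Fraction
import math

# 66-bit approximation of pi described in Intel's x87 documentation:
# pi ~= 0.C90FDAA22168C234C (hex) * 2**2
PI66 = Fraction(0xC90FDAA22168C234C, 2**66)

# Reference value of pi to 50 decimal places (ample for this comparison).
PI_REF = Fraction("3.14159265358979323846264338327950288419716939937510")


def reduction_error(x: float) -> Fraction:
    """Error in the reduced argument caused only by using PI66 instead of pi.

    Assumption: the reduction removes k whole periods of 2*PI66, exactly.
    """
    k = Fraction(x) // (2 * PI66)          # whole periods removed
    return abs(k * 2 * (PI_REF - PI66))


def half_ulp(x: float) -> Fraction:
    """Input uncertainty of a double that is only known to 0.5 ULP."""
    return Fraction(math.ulp(x)) / 2


# All samples satisfy abs(x) > 2*pi, the case discussed above.
for x in (10.0, 1.0e3, 1.0e10, 2.0**62):
    ratio = half_ulp(x) / reduction_error(x)
    print(f"x = {x:.3e}: input uncertainty / reduction error ~ {float(ratio):.0f}")
```

Under these assumptions the error introduced by the short value of pi stays well below the 0.5 ULP uncertainty of a double precision argument for every sample, which is the point being made about GIGO. The exact ratio printed depends on where x falls within its binade.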

> Could Intel have done the range reduction right?  Sure.  It would add a few instructions to the micro code, and require a longer value for Pi.
> >
> > In any case, in general, I'd trust the Ada implementer to have looked at the
> > issues and having come up with the best possible implementation on the
> > hardware. They have a lot more at stake than any individual user (and a lot
> > more tools as well). If they're not using something, most likely it's
> > because of a good reason or two or six. :-)
> >
> Creating a package which does the range reduction right, and passes small values through to the hardware instructions is not all that hard.

Such implementations are common, due to the hype created by sensationalists.
But, for the reasons stated above, I personally consider them a disservice to the overwhelming majority of users, who would be served much better by using the x87 implementation "as is".

> However, FXSAVE and FXRSTOR do not save (and restore) the ST(x)/MMx registers unless they have been used. Other threads running at the same time are unlikely to be using these registers, but the OS will need to save and restore these registers when moving to and from your thread.
>
> In other words, the actual user instructions executed for a x87 trig function may be fewer and faster than doing it all in 64/128 bit XMM/YMM registers, but the overhead on thread switches and interrupts will more than make up for it.

It depends on the number of x87 instructions executed between task switches.
If there are more than a few dozen ordinary arithmetic instructions, or more than a couple of transcendental instructions, then the overhead per instruction will be negligible.

Also, as far as I am concerned, the best thing about the x87 sin instruction is not that it is faster than a competent AVX implementation (for vectorizable cases it is likely non-trivially slower than a competent AVX implementation, like the one in IPP), but that, for small arguments (range [-pi/2..pi/2]), it is more precise. I.e., from a competent AVX implementation one would expect correctly rounded results for 85-90% of small double precision arguments. On the other hand, the x87 implementation will produce correctly rounded results for something like 99.98% of small double precision arguments.
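
For anyone who wants to measure the "correctly rounded" percentage of their own math library, here is a small sketch of one way to do it. It tests whatever sin the Python interpreter happens to link against (via math.sin), not the x87 instruction and not IPP, so it does not reproduce the figures quoted above. The reference value comes from a Taylor series evaluated in exact rational arithmetic and rounded once to double; the function names are invented for this example.

```python
from fractions import Fraction
import math
import random


def sin_reference(x: float) -> float:
    """sin(x) from an exact-rational Taylor series, rounded once to double.

    Intended for abs(x) <= pi/2. The series is cut off far below double
    precision, so the result is the correctly rounded value except in
    vanishingly rare near-tie cases.
    """
    xf = Fraction(x)
    term = xf                      # current series term, sign included
    total = xf
    n = 0
    while abs(term) > abs(xf) / 2**120:
        n += 1
        term = -term * xf * xf / ((2 * n) * (2 * n + 1))
        total += term
    return float(total)            # int/int division rounds correctly


def correctly_rounded_fraction(samples: int = 2000, seed: int = 1) -> float:
    """Share of random arguments in [-pi/2, pi/2] where math.sin is exact."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = rng.uniform(-math.pi / 2, math.pi / 2)
        hits += math.sin(x) == sin_reference(x)
    return hits / samples


print(f"correctly rounded: {correctly_rounded_fraction():.2%}")
```

The result depends entirely on the platform's libm, so no expected percentage is given here. A few thousand uniformly distributed samples also give only a coarse estimate; telling 99.98% apart from 100% needs far more.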

> The Elementary_Functions package will have to run in a 32-bit thread, so unless your entire program is in a 32-bit mode, you will pay this cost on every call.

That part makes no sense to me whatsoever.
I think that you have some misconception about x87 in long mode.
FYI, as far as the hardware and the popular x64 OSes (Windows/Linux/BSD, I suppose Mac OS X too, although I don't know that for sure) go, x87 in long mode just works.