Trigonometric operations on x86 and x64 CPUs

comp.lang.ada
 help / color / mirror / Atom feed

* Trigonometric operations on x86 and x64 CPUs
@ 2016-12-16  0:38 Robert Eachus
  2016-12-16 14:00 ` Luke A. Guest
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Robert Eachus @ 2016-12-16  0:38 UTC (permalink / raw)


A while back I expressed surprise that there were no versions of the Elementary_Functions package for x86/x64 packages which used the built-in trig functions of the x86 family.  It looked like an easy project but...

It turns out that the only trig functions available are the former x87 instructions which use a special stack.  Not a problem--except that these instructions are not reachable in (64-bit) "Long_Mode".  You have to have a separate compilation unit that switches to one of the 32-bit modes.  Okay... But you also need to support saving and restoring the x87 register stack.  There is a protocol for doing that, but 64-bit code can't do it.  You would think "No problem 64-bit programs can't execute x87 instructions anyway."  Except, that is exactly what you are trying to do!  So you need to call a compatibility mode proceedure every time the CPU does a 64-bit context switch. 

At this point, I was ready to give up on the project as a bad job.  I could modify the stack save/restore code on Linux, but even then I would have a distributed cost.  Then I ran across a page which explains that the Intel instructions use a 66-bit representation of Pi, and that results from the range reduction folding done in the software, is some values having only four accurate significant bits!  It looks better when expressed relative to the argument in radians--but often you want to find sine or cosine of an angle in degrees...

Anyway, turns out you don't want a package that uses the built-in trig functions.  What about square root, logs, and e^x?  They should be all right but you still have the nasty distributed overhead. Oh! And the stored values for e, and the logs to convert to base 2 or 10 are all 66 bit.  Maybe some future instruction set extension will provide correct versions of the trig functions without the 32-bit mode requirement.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-16  0:38 Trigonometric operations on x86 and x64 CPUs Robert Eachus
@ 2016-12-16 14:00 ` Luke A. Guest
  2016-12-16 20:16 ` Randy Brukardt
  2016-12-16 20:50 ` Vadim Godunko
  2 siblings, 0 replies; 17+ messages in thread
From: Luke A. Guest @ 2016-12-16 14:00 UTC (permalink / raw)


Robert Eachus <rieachus@comcast.net> wrote:
values for e, and the logs to convert to base 2 or 10 are all 66 bit. 
Maybe some future instruction set extension will provide correct versions
of the trig functions without the 32-bit mode requirement.
> 

Surely long mode has fpu instructions that can be used?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-16  0:38 Trigonometric operations on x86 and x64 CPUs Robert Eachus
  2016-12-16 14:00 ` Luke A. Guest
@ 2016-12-16 20:16 ` Randy Brukardt
  2016-12-16 23:20   ` Robert Eachus
  2016-12-16 20:50 ` Vadim Godunko
  2 siblings, 1 reply; 17+ messages in thread
From: Randy Brukardt @ 2016-12-16 20:16 UTC (permalink / raw)


"Robert Eachus" <rieachus@comcast.net> wrote in message 
news:8d0f7f03-9324-4702-9100-d6b8a1f16fc5@googlegroups.com...
>A while back I expressed surprise that there were no versions of the
>Elementary_Functions package for x86/x64 packages which used the
>built-in trig functions of the x86 family.  It looked like an easy project 
>but...

Why? Janus/Ada has always (*) had such a package. It's no problem with a 
32-bit compiler. (Still don't know why anyone would need more than that much 
address space anyway.)

(*) OK, not in the earliest integer-only compilers.

But:

(1) We don't use it for Generic_Elementary_Functions because it's impossible 
to be sure that the built-in instructions meet that Annex G accuracy 
requirements.

(2) Intel's documentation way back when made it clear that you had to do 
argument reduction first (yourself). The instructions were not intended to 
be accurate for values outside of +/- 2*PI (or something like that, I'm 
writing this from memory).

(3) Argument reduction is always going to lose a lot of precision for large 
values, when you start with a 64 bit value there isn't going to be much left 
if the value is large. Hard to blame that mathematical fact on the hardware.

(4) Intel doesn't seem to have done anything performance-wise for these 
instructions. Last time I checked, they were only a few times faster than 
the usual software approximation. Not enough to make them useful outside of 
the most time-critical application. (Perhaps some preconditions would help 
here, as a lot of the cost is checking that the values are in range before 
applying the hardware operation. Moving that to the call-site would allow it 
to be eliminated.)

In any case, in general, I'd trust the Ada implementer to have looked at the 
issues and having come up with the best possible implementation on the 
hardware. They have a lot more at stake than any individual user (and a lot 
more tools as well). If they're not using something, most likely it's 
because of a good reason or two or six. :-)

                                Randy.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-16  0:38 Trigonometric operations on x86 and x64 CPUs Robert Eachus
  2016-12-16 14:00 ` Luke A. Guest
  2016-12-16 20:16 ` Randy Brukardt
@ 2016-12-16 20:50 ` Vadim Godunko
  2 siblings, 0 replies; 17+ messages in thread
From: Vadim Godunko @ 2016-12-16 20:50 UTC (permalink / raw)


Don't understand what are you worry about.

I suppose all (too lazy to check ALL instructions) floating point instructions are supported in 64-bit code. GNAT's RTL do all necessary software preprocessing to obtain good precision. OS do all necessary context store/restore operations on task/process switching. You need just to instantiate Ada.Numerics.Generic_Elementary_Functions and use it. ;)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-16 20:16 ` Randy Brukardt
@ 2016-12-16 23:20   ` Robert Eachus
  2016-12-18 10:09     ` already5chosen
  0 siblings, 1 reply; 17+ messages in thread
From: Robert Eachus @ 2016-12-16 23:20 UTC (permalink / raw)


On Friday, December 16, 2016 at 3:16:25 PM UTC-5, Randy Brukardt wrote:
 
> (1) We don't use it for Generic_Elementary_Functions because it's impossible 
> to be sure that the built-in instructions meet that Annex G accuracy 
> requirements.
>
Correct
> 
> (2) Intel's documentation way back when made it clear that you had to do 
> argument reduction first (yourself). The instructions were not intended to 
> be accurate for values outside of +/- 2*PI (or something like that, I'm 
> writing this from memory)
>.
Actually more like +/- Pi/4 for Cosine and +/- Pi/8 for tangent.
> 
> (3) Argument reduction is always going to lose a lot of precision for large 
> values, when you start with a 64 bit value there isn't going to be much left 
> if the value is large. Hard to blame that mathematical fact on the hardware.
>
The problem is the 66-bit value of Pi in the hardware. Look at the sine of a number close to Pi call it X.  The sine will be very close to X - Pi.  Assuming 64 bit (double) precision for X, the mantissa will be a couple bits, perhaps none, from X and the rest of the bits will come from bits 49-96 of Pi.  Use 80-bit extended which the x87 instructions support, and you will be taking bits 65 to 128 from the value of Pi.

Could Intel have done the range reduction right?  Sure.  It would add a few instructions to the micro code, and require a longer value for Pi. 
> 
> In any case, in general, I'd trust the Ada implementer to have looked at the 
> issues and having come up with the best possible implementation on the 
> hardware. They have a lot more at stake than any individual user (and a lot 
> more tools as well). If they're not using something, most likely it's 
> because of a good reason or two or six. :-)
>
Creating a package which does the range reduction right, and passes small values through to the hardware instructions is not all that hard.  However, FXSAVE and FXRSTOR do not save (and restore) the ST(x)/MMx registers unless they have been used. Other threads running at the same time are unlikely to be using these registers, but the OS will need to save and restore these registers when moving to and from your thread.

In other words, the actual user instructions executed for a x87 trig function may be fewer and faster than doing it all in 64/128 bit XMM/YMM registers, but the overhead on thread switches and interrupts will more than make up for it.  The Elementary_Functions package will have to run in a 32-bit thread, so unless your entire program is in a 32-bit mode, you will pay this cost on every call.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-16 23:20   ` Robert Eachus
@ 2016-12-18 10:09     ` already5chosen
  2016-12-18 14:19       ` Robert Eachus
  2016-12-19 23:11       ` Randy Brukardt
  0 siblings, 2 replies; 17+ messages in thread
From: already5chosen @ 2016-12-18 10:09 UTC (permalink / raw)

On Saturday, December 17, 2016 at 1:20:03 AM UTC+2, Robert Eachus wrote:
> On Friday, December 16, 2016 at 3:16:25 PM UTC-5, Randy Brukardt wrote:
>  
> > (1) We don't use it for Generic_Elementary_Functions because it's impossible 
> > to be sure that the built-in instructions meet that Annex G accuracy 
> > requirements.
> >
> Correct
> > 
> > (2) Intel's documentation way back when made it clear that you had to do 
> > argument reduction first (yourself). The instructions were not intended to 
> > be accurate for values outside of +/- 2*PI (or something like that, I'm 
> > writing this from memory)
> >.
> Actually more like +/- Pi/4 for Cosine and +/- Pi/8 for tangent.
> > 
> > (3) Argument reduction is always going to lose a lot of precision for large 
> > values, when you start with a 64 bit value there isn't going to be much left 
> > if the value is large. Hard to blame that mathematical fact on the hardware.
> >
> The problem is the 66-bit value of Pi in the hardware. Look at the sine of a number close to Pi call it X.  The sine will be very close to X - Pi.  Assuming 64 bit (double) precision for X, the mantissa will be a couple bits, perhaps none, from X and the rest of the bits will come from bits 49-96 of Pi.  Use 80-bit extended which the x87 instructions support, and you will be taking bits 65 to 128 from the value of Pi.
> 

That part ignores the problem of GIGO.
When we say that argument of sin is x then 99.9999% of the time it does not mean that argument of sin is *exactly* x, but that it is most probably in range [x-0.5*ULP..x+0.5*ULP]. Or worse.
x87 implementation of sin(x) for abs(x) > 2*pi always returns the value in range [sin(x-0.5*ULP)..sin(x+0.5*ULP)] (actually, 2 times better than that for extended precision or 4096 times better for double precision) so  for 99.9999% of us it is more than good enough.
Remaining 0.0001% of us are either mathematicians that hopefully know what they are doing of sensationalist that should be ignored by any sane designer of RTL.

> Could Intel have done the range reduction right?  Sure.  It would add a few instructions to the micro code, and require a longer value for Pi. 
> > 
> > In any case, in general, I'd trust the Ada implementer to have looked at the 
> > issues and having come up with the best possible implementation on the 
> > hardware. They have a lot more at stake than any individual user (and a lot 
> > more tools as well). If they're not using something, most likely it's 
> > because of a good reason or two or six. :-)
> >
> Creating a package which does the range reduction right, and passes small values through to the hardware instructions is not all that hard.  

Such implementations are common, due the hype created by sensationalists.
But, for reasons stated above, I personally consider them as disservice for overwhelming majority of users that would be served much better by using x87 implementation "as is".

> However, FXSAVE and FXRSTOR do not save (and restore) the ST(x)/MMx registers >unless they have been used. Other threads running at the same time are >unlikely to be using these registers, but the OS will need to save and restore < these registers when moving to and from your thread.
> 
> In other words, the actual user instructions executed for a x87 trig function may be fewer and faster than doing it all in 64/128 bit XMM/YMM registers, but the overhead on thread switches and interrupts will more than make up for it. 

It depends on number of x87 instructions used between task switches.
If there are more than few dozens of normal arithmetic instructions or more than couple of transcendental instructions then an overhead per instruction will be negligible.

Also, as far as I am concerned, the best thing about x87 sin instruction is not that it is faster than the competent AVX implementation (for vectorizable cases it is likely non-trivially slower than competent AVX implementation, like one in IPP), but that, for small arguments (range [-pi/2..pi/2]), it is more precise. I.e. in competent AVX implementation one would expect correctly rounded results for 85-90% of small double precision arguments. On the other hand, x87 implementation will produce correctly rounded results for something like 99.98% of small double precision arguments.

>The Elementary_Functions package will have to run in a 32-bit thread, so >unless your entire program is in a 32-bit mode, you will pay this cost on >every call.

That part makes no sense to me whatsoever.
I think that you have some misconception about x87 in long mode.
FYI, as far as hardware and popular x64 OSes (Windows/Linux/BSD, I suppose MAC OS/X too, although I don't know it for sure) goes, x87 in long mode just works.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-18 10:09     ` already5chosen
@ 2016-12-18 14:19       ` Robert Eachus
  2016-12-18 15:45         ` hreba
  2016-12-18 15:47         ` already5chosen
  2016-12-19 23:11       ` Randy Brukardt
  1 sibling, 2 replies; 17+ messages in thread
From: Robert Eachus @ 2016-12-18 14:19 UTC (permalink / raw)

On Sunday, December 18, 2016 at 5:09:55 AM UTC-5, already...@yahoo.com wrote:

> Such implementations are common, due the hype created by sensationalists. 
> But, for reasons stated above, I personally consider them as disservice for
> overwhelming majority of users that would be served much better by using x87
> implementation "as is".

I am one of those other users.  In paricular tangents near Pi/4 (90 degrees) have very large swings for small errors.  A bad value of Pi can result in a very large negative value for the tangent instead of a very large positive value. 

> I think that you have some misconception about x87 in long mode.
> FYI, as far as hardware and popular x64 OSes (Windows/Linux/BSD, I suppose MAC OS/X too, although I don't know it for sure) goes, x87 in long mode just works.

For now, maybe:

"Early reports claimed that the operating system scheduler would not save and restore the x87 FPU machine state across thread context switches. Observed behavior shows that this is not the case: the x87 state is saved and restored, except for kernel mode-only threads (a limitation that exists in the 32-bit version as well). The most recent documentation available from Microsoft states that the x87/MMX/3DNow! instructions may be used in long mode, but that they are deprecated and may cause compatibility problems in the future."  (From Wikipedia)

Obviously, a package which results in garbage if some other thread running on the same CPU core is using MMX or x87 instructions is not something I want my name attached to.  That means that I have to either figure out a test* which guarantees that the x87 registers are saved and restored correctly, or mark the package as 32-bit compatibility code.  

By the way, you may want to check which mode of code your compiler puts out.  For (AMD and Intel) x64 CPUs there are three: Legacy, Compatibility, and Long modes.  Long mode uses 64-bit addressing throughout. Compatibility mode allows users a 32-bit (4 Gigbyte) virtual address space per program/application.  A program can mix long and compatibility mode sections, and in practice most libraries are compatibility mode.

* The test can be in the body of the package and executed only once per program execution.  If there were a bit I could check... 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-18 14:19       ` Robert Eachus
@ 2016-12-18 15:45         ` hreba
  2016-12-18 15:47         ` already5chosen
  1 sibling, 0 replies; 17+ messages in thread
From: hreba @ 2016-12-18 15:45 UTC (permalink / raw)


On 12/18/2016 03:19 PM, Robert Eachus wrote:
> On Sunday, December 18, 2016 at 5:09:55 AM UTC-5, already...@yahoo.com wrote:
>

> I am one of those other users.  In paricular tangents near Pi/4 (90 degrees) have very large swings for small errors.  A bad value of Pi can result in a very large negative value for the tangent instead of a very large positive value.

This seems to be a bad mathematical formulation of some problem, as it 
is often the case when a very high numerical precision of a numerical 
function is demanded. But tell me if I am wrong. Close to Pi/4 the 
following relation holds:

    tan (Pi/4 + phi) -> -1/phi, for phi -> 0,

so an approximation like this should be used for values close to Pi/4. 
Is the angle really known with a precision which justifies a special 
implementation of the trigonometric functions?
-- 
Frank Hrebabetzky		+49 / 6355 / 989 5070

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-18 14:19       ` Robert Eachus
  2016-12-18 15:45         ` hreba
@ 2016-12-18 15:47         ` already5chosen
  1 sibling, 0 replies; 17+ messages in thread
From: already5chosen @ 2016-12-18 15:47 UTC (permalink / raw)

On Sunday, December 18, 2016 at 4:19:12 PM UTC+2, Robert Eachus wrote:
> On Sunday, December 18, 2016 at 5:09:55 AM UTC-5, already...@yahoo.com wrote:
> 
> > Such implementations are common, due the hype created by sensationalists. 
> > But, for reasons stated above, I personally consider them as disservice for
> > overwhelming majority of users that would be served much better by using x87
> > implementation "as is".
> 
> I am one of those other users. 

What do you mean?
Are you one of 99.9999% that will be better served by fptan instructions 'as is' or one of 0.0001% that can benefit from more precise range reduction?

> In paricular tangents near Pi/4 (90 degrees)

pi/2

> have very large swings for small errors.  A bad value of Pi can result in a very large negative value for the tangent instead of a very large positive value. 

How do you calculate argument of tangent?
Is it result of addition/subtraction? In this case you are crying over spilled milk, because addition/subtraction already killed the precision of the argument and precision of evaluation of tangent is already irrelevant.

Or is it a result of multiplication by PI? Then you are using wrong function. You should use tanPi(). Well I know, easier said than done. tanPi() is a part of IEEE standard for more than decade, but it is still not part of standard libraries for majority of popular languages.

So, what should you do if you want to calculate tan(x*pi) for value of x very close to 0.5 and tanPi() is not available?
The answer is obvious - calculate 1/tan((0.5-x)*pi).

Anyway, more precise argument reduction for tan() is not something that can help your case.

>   
> > I think that you have some misconception about x87 in long mode.
> > FYI, as far as hardware and popular x64 OSes (Windows/Linux/BSD, I suppose MAC OS/X too, although I don't know it for sure) goes, x87 in long mode just works.
> 
> For now, maybe:
> 
> "Early reports claimed that the operating system scheduler would not save and restore the x87 FPU machine state across thread context switches. Observed behavior shows that this is not the case: the x87 state is saved and restored, except for kernel mode-only threads (a limitation that exists in the 32-bit version as well). The most recent documentation available from Microsoft states that the x87/MMX/3DNow! instructions may be used in long mode, but that they are deprecated and may cause compatibility problems in the future."  (From Wikipedia)

Sounds like political bullshit.
Neither Microsoft nor other major OS vendors will ever stop saving/restoring x87 state on task switch on x64 platform. Because disadvantages of doing so are too big and *technical* (as opposed to political) advantages are almost non-existing.
They (Microsoft) surely want to maximize similarity between x64 and other platforms, esp aarch64, and they surely would like to kill x87 because of that, but they simply can't do it.

> 
> Obviously, a package which results in garbage if some other thread running on the same CPU core is using MMX or x87 instructions is not something I want my name attached to.  That means that I have to either figure out a test* which guarantees that the x87 registers are saved and restored correctly, or mark the package as 32-bit compatibility code.

It is always saved/restored. It will continue to be saved/restored for as long as x64 platform is supported.

> 
> By the way, you may want to check which mode of code your compiler puts out.  For (AMD and Intel) x64 CPUs there are three: Legacy, Compatibility, and Long modes.  Long mode uses 64-bit addressing throughout. Compatibility mode allows users a 32-bit (4 Gigbyte) virtual address space per program/application.  A program can mix long and compatibility mode sections, and in practice most libraries are compatibility mode.

I don't understand what are you talking about.
In-process libraries, like those that we discuss, always run in the same mode as the rest of the process. At least that's how it works on Windows/Linux/*BSD on x64.
I heard that IBM z/OS works differently, but in context of x87 that particular bit trivia does not appear relevant.

> 
> * The test can be in the body of the package and executed only once per program execution.  If there were a bit I could check...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-18 10:09     ` already5chosen
  2016-12-18 14:19       ` Robert Eachus
@ 2016-12-19 23:11       ` Randy Brukardt
  2016-12-19 23:49         ` already5chosen
  1 sibling, 1 reply; 17+ messages in thread
From: Randy Brukardt @ 2016-12-19 23:11 UTC (permalink / raw)


<already5chosen@yahoo.com> wrote in message 
news:a6734b29-5ffc-4938-bbc2-453f7ae92325@googlegroups.com...
...
>> Creating a package which does the range reduction right, and passes
>> small values through to the hardware instructions is not all that hard.
>
>Such implementations are common, due the hype created by sensationalists.

Yeah, like the people at Intel who document how to use these instructions. 
:-) They recommend (at least in the documents I have seen) to do argument 
reduction before using the instructions.

There's also the little matter of meeting the Ada language requirements. The 
people who put accuracy requirements on Ada numeric libraries might have 
been "sensationalists"", but those of us implementing Ada have to abide by 
those requirements. (Argubly, one of the advantages of Ada is that there are 
requirements, so you have some asssurance out of the box that the numerics 
will work predicably, no matter what target you use.)

                                 Randy.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-19 23:11       ` Randy Brukardt
@ 2016-12-19 23:49         ` already5chosen
  2016-12-20  5:27           ` Niklas Holsti
  0 siblings, 1 reply; 17+ messages in thread
From: already5chosen @ 2016-12-19 23:49 UTC (permalink / raw)

On Tuesday, December 20, 2016 at 1:11:26 AM UTC+2, Randy Brukardt wrote:
> <already5chosen@yahoo.com> wrote in message 
> news:a6734b29-5ffc-4938-bbc2-453f7ae92325@googlegroups.com...
> ...
> >> Creating a package which does the range reduction right, and passes
> >> small values through to the hardware instructions is not all that hard.
> >
> >Such implementations are common, due the hype created by sensationalists.
> 
> Yeah, like the people at Intel who document how to use these instructions. 
> :-) 
> They recommend (at least in the documents I have seen) to do argument 
> reduction before using the instructions.
> 

I don't know what documents you had seen.
In "Intel 64 and IA-32 Architectures Software Developer’s Manual" Intel recommend to use argument reduction when the argument is out of supported range [-2**63..+2**63].
Of course trigs with double precision argument outside of range [-2**54..+2**54] are no more than equivalents of bad PRNG, so Intel's advice does not have a lot of practical significance.
IMHO, for x outside of [-2**63..+2**63] it would be o.k. for sin(x) and cos(x) to return any value in range [-1..1]. They are all equally meaningless.

> There's also the little matter of meeting the Ada language requirements. The 
> people who put accuracy requirements on Ada numeric libraries might have 
> been "sensationalists"", but those of us implementing Ada have to abide by 
> those requirements. 

Unfortunately, I don't know where to look for Ada language requirement.
Can you tell me what are requirements for sin/cos in Ada numeric library?

> (Argubly, one of the advantages of Ada is that there are 
> requirements, so you have some asssurance out of the box that the numerics 
> will work predicably, no matter what target you use.)
> 
>                                  Randy.

Yes, before IEEE-754 that was a significant advantage*.
Today - less so.

----------
* - I wonder what happened in real world on machines with bad arithmetic, like CRAY-1? Did not it lead to situation where significant parts of Ada numeric library were so slow that users concerned with speed were rolling their own alternatives.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-19 23:49         ` already5chosen
@ 2016-12-20  5:27           ` Niklas Holsti
  2016-12-20  8:37             ` Simon Wright
  2016-12-20 18:01             ` already5chosen
  0 siblings, 2 replies; 17+ messages in thread
From: Niklas Holsti @ 2016-12-20  5:27 UTC (permalink / raw)


On 16-12-20 01:49 , already5chosen@yahoo.com wrote:
> On Tuesday, December 20, 2016 at 1:11:26 AM UTC+2, Randy Brukardt wrote:
 >> ...
>> There's also the little matter of meeting the Ada language requirements. The
>> people who put accuracy requirements on Ada numeric libraries might have
>> been "sensationalists"", but those of us implementing Ada have to abide by
>> those requirements.
>
> Unfortunately, I don't know where to look for Ada language requirement.
> Can you tell me what are requirements for sin/cos in Ada numeric library?

RM section G.2.4, Accuracy Requirements for the Elementary Functions.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-20  5:27           ` Niklas Holsti
@ 2016-12-20  8:37             ` Simon Wright
  2016-12-20  9:12               ` G.B.
  2016-12-20 18:01             ` already5chosen
  1 sibling, 1 reply; 17+ messages in thread
From: Simon Wright @ 2016-12-20  8:37 UTC (permalink / raw)


Niklas Holsti <niklas.holsti@tidorum.invalid> writes:

> On 16-12-20 01:49 , already5chosen@yahoo.com wrote:
>> On Tuesday, December 20, 2016 at 1:11:26 AM UTC+2, Randy Brukardt wrote:
>>> ...
>>> There's also the little matter of meeting the Ada language
>>> requirements. The people who put accuracy requirements on Ada
>>> numeric libraries might have been "sensationalists"", but those of
>>> us implementing Ada have to abide by those requirements.
>>
>> Unfortunately, I don't know where to look for Ada language
>> requirement.  Can you tell me what are requirements for sin/cos in
>> Ada numeric library?
>
> RM section G.2.4, Accuracy Requirements for the Elementary Functions.

http://www.ada-auth.org/standards/rm12_w_tc1/html/RM-G-2-4.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-20  8:37             ` Simon Wright
@ 2016-12-20  9:12               ` G.B.
  0 siblings, 0 replies; 17+ messages in thread
From: G.B. @ 2016-12-20  9:12 UTC (permalink / raw)


On 20/12/2016 09:37, Simon Wright wrote:
> Niklas Holsti <niklas.holsti@tidorum.invalid> writes:

>> RM section G.2.4, Accuracy Requirements for the Elementary Functions.
>
> http://www.ada-auth.org/standards/rm12_w_tc1/html/RM-G-2-4.html
>

Also in

file:///GNAT_INSTALL_PATH/share/doc/gnat/txt/arm-12.txt

Similarly played in other installations of Ada. The standard
is freely available and will be found on your disk, and
should be accessible from within your IDE.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-20  5:27           ` Niklas Holsti
  2016-12-20  8:37             ` Simon Wright
@ 2016-12-20 18:01             ` already5chosen
  2016-12-21  1:20               ` Randy Brukardt
  1 sibling, 1 reply; 17+ messages in thread
From: already5chosen @ 2016-12-20 18:01 UTC (permalink / raw)

On Tuesday, December 20, 2016 at 7:27:32 AM UTC+2, Niklas Holsti wrote:
> On 16-12-20 01:49 , already5chosen@yahoo.com wrote:
> > On Tuesday, December 20, 2016 at 1:11:26 AM UTC+2, Randy Brukardt wrote:
>  >> ...
> >> There's also the little matter of meeting the Ada language requirements. The
> >> people who put accuracy requirements on Ada numeric libraries might have
> >> been "sensationalists"", but those of us implementing Ada have to abide by
> >> those requirements.
> >
> > Unfortunately, I don't know where to look for Ada language requirement.
> > Can you tell me what are requirements for sin/cos in Ada numeric library?
> 
> RM section G.2.4, Accuracy Requirements for the Elementary Functions.
> 
> -- 
> Niklas Holsti
> Tidorum Ltd
> niklas holsti tidorum fi
>        .      @       .

Thank you.

At first glance it seems that Randy Brukardt is correct.
Assuming IEEE binary64, for any x in range [-2**26.5..2**26.5] requirements want Sin(x) to return the number in range [exact_sin(x)*(1-d*2**53)..exact_sin(x)*(1+d*2**(-53))] where d=2.
x87 SIN instruction by itself achieves specified precision in smaller range (~ up to abs(x) < 2^14).

It means that conforming implementations of  Ada libraries forced to spends a significant effort doing reduction of stupidly big arguments of sin/cos/tan. On the other hand The Requirements allow rather poor precision for small arguments (d=2) where better precision (d=1.25 or d=1.125) is both desirable and not especially hard to achieve.

There is a consolation, too - the range reduction in the minimal range required by the standard can be done without big tables. And it can be implemented relatively quickly if the hardware features fused multiply-add.

On somewhat related note, 
It seems to me that forward trigonometric functions with 'Cycle' argument are underspecified. I'd like the requirements for extremely useful special cases of 'Cycle' == exact power of 2 to be more pronounced.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-20 18:01             ` already5chosen
@ 2016-12-21  1:20               ` Randy Brukardt
  2016-12-21  9:29                 ` already5chosen
  0 siblings, 1 reply; 17+ messages in thread
From: Randy Brukardt @ 2016-12-21  1:20 UTC (permalink / raw)


<already5chosen@yahoo.com> wrote in message 
news:c3725fa2-7eaa-4020-b0ae-6ddcfc2a3d1d@googlegroups.com...
...
>At first glance it seems that Randy Brukardt is correct.

It seems likely, even though today I'd be hard pressed to explain why in any 
detail. We're talking about work done in the mid-1990s.

>Assuming IEEE binary64, for any x in range [-2**26.5..2**26.5] requirements
>want Sin(x) to return the number in range [exact_sin(x)*(1-d*2**53) ..
>exact_sin(x)*(1+d*2**(-53))] where d=2.
>x87 SIN instruction by itself achieves specified precision in smaller range
> (~ up to abs(x) < 2^14).
>
>It means that conforming implementations of  Ada libraries forced to spends
>a significant effort doing reduction of stupidly big arguments of 
>sin/cos/tan.

Also on doing sanity checks of operands (but of course that's generally a 
strength of Ada). A version of GEF (Generic_Elementary_Functions) that used 
Ada 2012 preconditions could avoid much of that overhead, and would be an 
advantage for really speed-critical operations.

>On the other hand The Requirements allow rather poor precision for small
>arguments (d=2) where better precision (d=1.25 or d=1.125) is both
>desirable and not especially hard to achieve.

Keep in mind that the requirements were written by a team of numerical 
analysis experts in 1992-4 based on the state of the art at that time. 
(Probably the Cody/Waite algorithms.) Those of us who maintain the Standard 
today don't really have the expertise to make any informed changes to these 
rules, so for the most part we keep our hands off (the worst thing we could 
do would be to mess up carefully considered rules - but we have fixed a 
number of obvious errors).

>There is a consolation, too - the range reduction in the minimal range 
>required
>by the standard can be done without big tables. And it can be implemented
>relatively quickly if the hardware features fused multiply-add.

My recollection is that it isn't even that bad if one writes the entire 
algorithm in Ada. (Since our compiler generally keeps intermediate results 
in the 80-bit extended format, we tended to write as much as possible as a 
single large expression. Probably could do that even better today with Ada 
2012 conditional expressions.)

>On somewhat related note, it seems to me that forward trigonometric 
>functions
>with 'Cycle' argument are underspecified. I'd like the requirements for 
>extremely
>useful special cases of 'Cycle' == exact power of 2 to be more pronounced.

It's certainly possible. We'd welcome someone with substantial numeric 
experience contributing to the ARG for Ada Standards maintenance. Most of us 
know enough to understand the Ada numeric model and when something is 
incorrect for it, but we don't have anyone very good at the subtle details.

                                           Randy.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Trigonometric operations on x86 and x64 CPUs
  2016-12-21  1:20               ` Randy Brukardt
@ 2016-12-21  9:29                 ` already5chosen
  0 siblings, 0 replies; 17+ messages in thread
From: already5chosen @ 2016-12-21  9:29 UTC (permalink / raw)

On Wednesday, December 21, 2016 at 3:20:52 AM UTC+2, Randy Brukardt wrote:
> <already5chosen@yahoo.com> wrote in message 
> news:c3725fa2-7eaa-4020-b0ae-6ddcfc2a3d1d@googlegroups.com...
> ...
> >At first glance it seems that Randy Brukardt is correct.
> 
> It seems likely, even though today I'd be hard pressed to explain why in any 
> detail. We're talking about work done in the mid-1990s.
> 
> >Assuming IEEE binary64, for any x in range [-2**26.5..2**26.5] requirements
> >want Sin(x) to return the number in range [exact_sin(x)*(1-d*2**53) ..
> >exact_sin(x)*(1+d*2**(-53))] where d=2.
> >x87 SIN instruction by itself achieves specified precision in smaller range
> > (~ up to abs(x) < 2^14).
> >
> >It means that conforming implementations of  Ada libraries forced to spends
> >a significant effort doing reduction of stupidly big arguments of 
> >sin/cos/tan.
> 
> Also on doing sanity checks of operands 

That's not the same.
Unlike argument reduction, range/Inf/NaN checks are computationally cheap.

> (but of course that's generally a 
> strength of Ada).

In specific case of numeric libraries, sanity checks are not unique to Ada. Right now I can't recollect the language that does *not* do sanity checks in its numeric lib. Except, of course, for functions like sin/cos where some values of input arguments do not make physical sense, but nevertheless all inputs are legal.

> A version of GEF (Generic_Elementary_Functions) that used 
> Ada 2012 preconditions could avoid much of that overhead, and would be an 
> advantage for really speed-critical operations.
> 
> >On the other hand The Requirements allow rather poor precision for small
> >arguments (d=2) where better precision (d=1.25 or d=1.125) is both
> >desirable and not especially hard to achieve.
> 
> Keep in mind that the requirements were written by a team of numerical 
> analysis experts in 1992-4 based on the state of the art at that time. 
> (Probably the Cody/Waite algorithms.) Those of us who maintain the Standard 
> today don't really have the expertise to make any informed changes to these 
> rules, so for the most part we keep our hands off (the worst thing we could 
> do would be to mess up carefully considered rules - but we have fixed a 
> number of obvious errors).
> 
> >There is a consolation, too - the range reduction in the minimal range 
> >required
> >by the standard can be done without big tables. And it can be implemented
> >relatively quickly if the hardware features fused multiply-add.
> 
> My recollection is that it isn't even that bad if one writes the entire 
> algorithm in Ada. (Since our compiler generally keeps intermediate results 
> in the 80-bit extended format, we tended to write as much as possible as a 
> single large expression. Probably could do that even better today with Ada 
> 2012 conditional expressions.)

80-bit format helps when underlying machine has 80-bit format. Many machines have not.
Also, I suppose that even on x386/AMD64 machines that do support 80-bit format,  modern code generators by default use SSE/AVX registers rather than x87 registers.
FMA makes argument reduction easier even when extended precision is not available.

Besides, it seems to me that at upper edge of our problematic range 80-bit precision of intermediate results is insufficient for really simple range reduction. To make things really simple, one needs intermediates with 53+26.5=80 bits of mantissa. 80-bit extended precision has only 64 bits, so you'll still need to do the reduction by several carefully measured steps. FMA, on the other hand, when used smartly, gives you equivalent of intermediate with 106-bit mantissa.

> 
> >On somewhat related note, it seems to me that forward trigonometric 
> >functions
> >with 'Cycle' argument are underspecified. I'd like the requirements for 
> >extremely
> >useful special cases of 'Cycle' == exact power of 2 to be more pronounced.
> 
> It's certainly possible. We'd welcome someone with substantial numeric 
> experience contributing to the ARG for Ada Standards maintenance. Most of us 
> know enough to understand the Ada numeric model and when something is 
> incorrect for it, but we don't have anyone very good at the subtle details.
> 
>                                            Randy.

I am certainly not enough of an expert.
All I can say is that when Cycle==1 then Sin(x, 1) becomes an equivalent of IEEE-754 sinPi(x), so, may be, you can copy IEEE requirements for sinPi() if not in general case than, at least for IEEE-754 based platforms.
On more general note, asking IEEE-754 committee for help sounds like a good idea.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-12-21  9:29 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-16  0:38 Trigonometric operations on x86 and x64 CPUs Robert Eachus
2016-12-16 14:00 ` Luke A. Guest
2016-12-16 20:16 ` Randy Brukardt
2016-12-16 23:20   ` Robert Eachus
2016-12-18 10:09     ` already5chosen
2016-12-18 14:19       ` Robert Eachus
2016-12-18 15:45         ` hreba
2016-12-18 15:47         ` already5chosen
2016-12-19 23:11       ` Randy Brukardt
2016-12-19 23:49         ` already5chosen
2016-12-20  5:27           ` Niklas Holsti
2016-12-20  8:37             ` Simon Wright
2016-12-20  9:12               ` G.B.
2016-12-20 18:01             ` already5chosen
2016-12-21  1:20               ` Randy Brukardt
2016-12-21  9:29                 ` already5chosen
2016-12-16 20:50 ` Vadim Godunko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox