From: Bill Findlay <yaldnif.w@blueyonder.co.uk>
Subject: Slow? Ada??
Date: Fri, 12 Jul 2013 02:01:29 +0100
Date: 2013-07-12T02:01:29+01:00 [thread overview]
Message-ID: <CE0513F9.32436%yaldnif.w@blueyonder.co.uk> (raw)
In-Reply-To: krkq9u$fo9$1@loke.gir.dk
I was amused to read this ...
On 11/07/2013 00:21, in article krkq9u$fo9$1@loke.gir.dk, "Randy Brukardt"
<randy@rrsoftware.com> wrote:
...
> For things like Sine, the best one can do is copy an implementation from a
> cookbook -- it makes no sense to even imagine trying to create a new one.
...for reasons nothing to do with that controversy.
When debugging Whetstone Algol under my KDF9 emulator I looked at the KDF9
assembly code programming of the arctan function and was completely baffled
by it, so I asked my former colleague, Michael Jamieson to investigate.
He successfully reconstructed the mathemetics behind it, which are very
non-obvious (to put it mildly).
A couple of days ago I idly wondered how well this 50 years old algorithm
would compare with modern implementations, so I wrote a test program to race
it against GNAT GPL 2013's arctan. I expected to find that 2013 would spank
1963's botty. To my astonishment, the old algorithm was faster.
Digging down, I found that Ada.Numerics.Long_Elementary_Functions.arctan
does quite a bit of argument range reduction and result error checking, but
finally invokes the "fpatan" opcode of the x86_64 CPU.
The 1963 algorithm, expressed in Ada 2012 but compiled with aggressive
optimization, is only 17% slower than that single CPU opcode!
Note also that, although it was designed for a machine with 48-bit floating
point, it gets the same accuracy as the hardware method on a machine with
64-bit floating point.
Here is the code, with the observed results included as commentary:
with Ada.Numerics.Long_Elementary_Functions;
with Ada.Text_IO;
with CPU_Timing;
with System.Machine_Code;
use Ada.Numerics.Long_Elementary_Functions;
use Ada.Text_IO;
use CPU_Timing;
use System.Machine_Code;
procedure arctan_test64 is
-- typical output:
-- R = 100000000 evaluations
-- checksum = 5.00000005000000E+07, loop time per repetition = 6 ns
-- checksum = 4.38824577044418E+07, P51V15 time per evaluation = 49 ns
-- checksum = 4.38824577044418E+07, arctan time per evaluation = 53 ns
-- checksum = 4.38824577044418E+07, fpatan time per evaluation = 41 ns
-- P51V15 is the KDF9 algorithm used in Whetstone Algol, ca. 1963
-- See http://www.findlayw.plus.com/KDF9/Arctan%20Paper.pdf
type vector is array (0..5) of Long_Float;
V : constant vector := ( 28165298.0 / 1479104550.0,
28165300.0 / 1479104550.0,
56327872.0 / 1479104550.0,
113397760.0 / 1479104550.0,
179306496.0 / 1479104550.0,
1073741824.0 / 1479104550.0);
function P51V15 (x : Long_Float) return Long_Float with Inline;
function P51V15 (x : Long_Float)
return Long_Float is
A : Long_Float := 1.0;
S : Long_Float := V(0);
B : vector;
begin
B(0) := sqrt(x*x + 1.0);
-- 4 AGM (Arithmetic-Geometric Mean) cycles give a set of values ...
for i in 0..3 loop
A := (A+B(i)) * 0.5;
B(i+1) := sqrt(A*B(i));
end loop;
-- ... that is subjected to a convergence acceleration process:
for i in 1..5 loop
S := S + V(i)*B(i-1);
end loop;
return x / S;
end P51V15;
-- this is the hardware arctan function of the x86_64 Core i7 CPU
function fpatan (X : Long_Float) return Long_Float with Inline;
function fpatan (X : Long_Float)
return Long_Float is
Result : Long_Float;
begin
Asm(Template => "fld1"
& Character'Val(10) -- LF
& Character'Val(9) -- HT
& "fpatan",
Outputs => Long_Float'Asm_Output ("=t", Result),
Inputs => Long_Float'Asm_Input ("0", X));
return Result;
end fpatan;
R : constant := 1e8; -- number of loop repetitions
function ns_per_rep (c : CPU_Usage_in_Microseconds)
return Natural is
begin
return Natural(c * 1e3 / R);
end ns_per_rep;
x : Long_Float;
c : CPU_Timer;
l : CPU_Usage_in_Microseconds;
t : CPU_Usage_in_Microseconds;
begin
Put_Line("R =" & Integer'Image(R) & " evaluations");
-- determine the fixed overhead time
x := 0.0;
Reset_Timer(c);
for i in 1..R loop
x := x + Long_Float(i)/Long_Float(R);
end loop;
l := User_CPU_Time_Since(c);
Put_Line("checksum =" & x'Img & ", "
& "loop time per repetition ="
& Natural'Image(ns_per_rep(l)) & " ns");
x := 0.0;
Reset_Timer(c);
for i in 1..R loop
x := x + P51V15(Long_Float(i)/Long_Float(R));
end loop;
t := User_CPU_Time_Since(c) - l;
Put_Line("checksum =" & x'Img & ", "
& "P51V15 time per evaluation ="
& Natural'Image(ns_per_rep(t)) & " ns");
x := 0.0;
Reset_Timer(c);
for i in 1..R loop
x := x + arctan(Long_Float(i)/Long_Float(R));
end loop;
t := User_CPU_Time_Since(c) - l;
Put_Line("checksum =" & x'Img & ", "
& "arctan time per evaluation ="
& Natural'Image(ns_per_rep(t)) & " ns");
x := 0.0;
Reset_Timer(c);
for i in 1..R loop
x := x + fpatan(Long_Float(i)/Long_Float(R));
end loop;
t := User_CPU_Time_Since(c) - l;
Put_Line("checksum =" & x'Img & ", "
& "fpatan time per evaluation ="
& Natural'Image(ns_per_rep(t)) & " ns");
end arctan_test64;
The difference in time between Ada.Numerics.Long_Elementary_Functions.arctan
and the fpatan function above is due to the afore-mentioned range reduction
and checking. The KDF9 programmers in 1963 were less punctillious about
such matters than we rightly expect Ada to be nowadays.
--
Bill Findlay
with blueyonder.co.uk;
use surname & forename;
next prev parent reply other threads:[~2013-07-12 1:01 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-08 17:56 The future of Spark . Spark 2014 : a wreckage vincent.diemunsch
2013-07-08 19:24 ` Simon Wright
2013-07-08 20:59 ` Florian Weimer
2013-07-09 7:40 ` Dmitry A. Kazakov
2013-07-09 8:30 ` Georg Bauhaus
2013-07-09 8:58 ` Dmitry A. Kazakov
2013-07-09 10:27 ` G.B.
2013-07-09 11:13 ` G.B.
2013-07-09 15:14 ` Adam Beneschan
2013-07-09 22:51 ` Robert A Duff
2013-07-10 7:49 ` Dmitry A. Kazakov
2013-07-10 8:21 ` Georg Bauhaus
2013-07-08 21:32 ` Randy Brukardt
2013-07-09 7:28 ` Dmitry A. Kazakov
2013-07-09 11:29 ` Simon Wright
2013-07-09 12:22 ` Dmitry A. Kazakov
2013-07-09 20:37 ` Randy Brukardt
2013-07-10 10:03 ` Dmitry A. Kazakov
2013-07-10 23:21 ` Randy Brukardt
2013-07-11 7:44 ` Dmitry A. Kazakov
2013-07-11 22:28 ` Randy Brukardt
2013-07-12 8:02 ` Dmitry A. Kazakov
2013-07-12 21:16 ` Randy Brukardt
2013-07-14 10:19 ` Dmitry A. Kazakov
2013-07-14 15:57 ` Georg Bauhaus
2013-07-16 0:16 ` Randy Brukardt
2013-07-17 1:30 ` Shark8
2013-07-17 23:08 ` Randy Brukardt
2013-07-18 7:19 ` Dmitry A. Kazakov
2013-07-19 4:36 ` Randy Brukardt
2013-07-16 0:13 ` Randy Brukardt
2013-07-16 8:37 ` Dmitry A. Kazakov
2013-07-16 22:13 ` Randy Brukardt
2013-07-17 7:59 ` Dmitry A. Kazakov
2013-07-17 23:26 ` Randy Brukardt
2013-07-18 7:37 ` Dmitry A. Kazakov
2013-07-19 4:32 ` Randy Brukardt
2013-07-19 7:16 ` Dmitry A. Kazakov
2013-07-20 5:32 ` Randy Brukardt
2013-07-20 9:06 ` Dmitry A. Kazakov
2013-07-12 1:01 ` Bill Findlay [this message]
2013-07-09 7:57 ` Stefan.Lucks
2013-07-09 20:46 ` Randy Brukardt
2013-07-10 8:03 ` Stefan.Lucks
2013-07-10 12:46 ` Simon Wright
2013-07-10 23:10 ` Randy Brukardt
2013-07-11 19:23 ` Stefan.Lucks
2013-07-12 0:21 ` Randy Brukardt
2013-07-12 9:12 ` Stefan.Lucks
2013-07-12 20:47 ` Randy Brukardt
2013-08-05 5:43 ` Paul Rubin
2013-07-10 12:19 ` vincent.diemunsch
2013-07-10 16:02 ` Simon Wright
2013-07-11 0:36 ` Randy Brukardt
2013-07-11 6:19 ` Simon Wright
2013-07-11 23:11 ` Randy Brukardt
2013-07-11 10:10 ` vincent.diemunsch
2013-07-11 14:23 ` Dmitry A. Kazakov
2013-07-12 0:07 ` Randy Brukardt
2013-07-12 0:00 ` Randy Brukardt
2013-07-12 7:25 ` Simon Wright
2013-07-12 20:07 ` Randy Brukardt
2013-07-12 14:23 ` Robert A Duff
2013-07-12 20:14 ` Randy Brukardt
2013-07-11 23:50 ` Shark8
2013-07-08 23:18 ` Peter C. Chapin
2013-07-09 7:34 ` Stefan.Lucks
2013-07-09 15:21 ` Adam Beneschan
replies disabled
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox