comp.lang.ada
 help / color / mirror / Atom feed
From: Bill Findlay <yaldnif.w@blueyonder.co.uk>
Subject: Slow?  Ada??
Date: Fri, 12 Jul 2013 02:01:29 +0100
Date: 2013-07-12T02:01:29+01:00	[thread overview]
Message-ID: <CE0513F9.32436%yaldnif.w@blueyonder.co.uk> (raw)
In-Reply-To: krkq9u$fo9$1@loke.gir.dk

I was amused to read this ...

On 11/07/2013 00:21, in article krkq9u$fo9$1@loke.gir.dk, "Randy Brukardt"
<randy@rrsoftware.com> wrote:
...
> For things like Sine, the best one can do is copy an implementation from a
> cookbook -- it makes no sense to even imagine trying to create a new one.

...for reasons nothing to do with that controversy.

When debugging Whetstone Algol under my KDF9 emulator I looked at the KDF9
assembly code programming of the arctan function and was completely baffled
by it, so I asked my former colleague, Michael Jamieson to investigate.
He successfully reconstructed the mathemetics behind it, which are very
non-obvious (to put it mildly).

A couple of days ago I idly wondered how well this 50 years old algorithm
would compare with modern implementations, so I wrote a test program to race
it against GNAT GPL 2013's arctan.  I expected to find that 2013 would spank
1963's botty.  To my astonishment, the old algorithm was faster.

Digging down, I found that Ada.Numerics.Long_Elementary_Functions.arctan
does quite a bit of argument range reduction and result error checking, but
finally invokes the "fpatan" opcode of the x86_64 CPU.

The 1963 algorithm, expressed in Ada 2012 but compiled with aggressive
optimization, is only 17% slower than that single CPU opcode!

Note also that, although it was designed for a machine with 48-bit floating
point, it gets the same accuracy as the hardware method on a machine with
64-bit floating point.

Here is the code, with the observed results included as commentary:

with Ada.Numerics.Long_Elementary_Functions;
with Ada.Text_IO;
with CPU_Timing;
with System.Machine_Code;

use  Ada.Numerics.Long_Elementary_Functions;
use  Ada.Text_IO;
use  CPU_Timing;
use  System.Machine_Code;

procedure arctan_test64 is

   -- typical output:
   -- R = 100000000 evaluations
   -- checksum = 5.00000005000000E+07, loop   time per repetition = 6 ns
   -- checksum = 4.38824577044418E+07, P51V15 time per evaluation = 49 ns
   -- checksum = 4.38824577044418E+07, arctan time per evaluation = 53 ns
   -- checksum = 4.38824577044418E+07, fpatan time per evaluation = 41 ns

   -- P51V15 is the KDF9 algorithm used in Whetstone Algol, ca. 1963
   -- See http://www.findlayw.plus.com/KDF9/Arctan%20Paper.pdf

   type vector is array (0..5) of Long_Float;

   V : constant vector := (  28165298.0 / 1479104550.0,
                             28165300.0 / 1479104550.0,
                             56327872.0 / 1479104550.0,
                            113397760.0 / 1479104550.0,
                            179306496.0 / 1479104550.0,
                           1073741824.0 / 1479104550.0);

   function P51V15 (x : Long_Float) return Long_Float with Inline;

   function P51V15 (x : Long_Float)
   return Long_Float is
      A : Long_Float := 1.0;
      S : Long_Float := V(0);
      B : vector;
   begin
       B(0) := sqrt(x*x + 1.0);
       -- 4 AGM (Arithmetic-Geometric Mean) cycles give a set of values ...
       for i in 0..3 loop
          A := (A+B(i)) * 0.5;
          B(i+1) := sqrt(A*B(i));
       end loop;
       -- ... that is subjected to a convergence acceleration process:
       for i in 1..5 loop
          S := S + V(i)*B(i-1);
       end loop;
       return x / S;
   end P51V15;

   -- this is the hardware arctan function of the x86_64 Core i7 CPU
   function fpatan (X : Long_Float) return Long_Float with Inline;

   function fpatan (X : Long_Float)
   return Long_Float is
      Result  : Long_Float;
   begin
      Asm(Template  => "fld1"
                     & Character'Val(10) -- LF
                     & Character'Val(9)  -- HT
                     & "fpatan",
          Outputs   => Long_Float'Asm_Output ("=t", Result),
          Inputs    => Long_Float'Asm_Input  ("0", X));
      return Result;
   end fpatan;

   R : constant := 1e8; -- number of loop repetitions

   function ns_per_rep (c : CPU_Usage_in_Microseconds)
   return Natural is
   begin
      return Natural(c * 1e3 / R);
   end ns_per_rep;

   x : Long_Float;
   c : CPU_Timer;
   l : CPU_Usage_in_Microseconds;
   t : CPU_Usage_in_Microseconds;

begin
   Put_Line("R =" & Integer'Image(R) & " evaluations");

   -- determine the fixed overhead time
   x := 0.0;
   Reset_Timer(c);
   for i in 1..R loop
      x := x + Long_Float(i)/Long_Float(R);
   end loop;
   l := User_CPU_Time_Since(c);
   Put_Line("checksum =" & x'Img & ", "
          & "loop   time per repetition ="
          & Natural'Image(ns_per_rep(l)) & " ns");

   x := 0.0;
   Reset_Timer(c);
   for i in 1..R loop
      x := x + P51V15(Long_Float(i)/Long_Float(R));
   end loop;
   t := User_CPU_Time_Since(c) - l;
   Put_Line("checksum =" & x'Img & ", "
          & "P51V15 time per evaluation ="
          & Natural'Image(ns_per_rep(t)) & " ns");

   x := 0.0;
   Reset_Timer(c);
   for i in 1..R loop
      x := x + arctan(Long_Float(i)/Long_Float(R));
   end loop;
   t := User_CPU_Time_Since(c) - l;
   Put_Line("checksum =" & x'Img & ", "
          & "arctan time per evaluation ="
          & Natural'Image(ns_per_rep(t)) & " ns");

   x := 0.0;
   Reset_Timer(c);
   for i in 1..R loop
      x := x + fpatan(Long_Float(i)/Long_Float(R));
   end loop;
   t := User_CPU_Time_Since(c) - l;
   Put_Line("checksum =" & x'Img & ", "
          & "fpatan time per evaluation ="
          & Natural'Image(ns_per_rep(t)) & " ns");

end arctan_test64;

The difference in time between Ada.Numerics.Long_Elementary_Functions.arctan
and  the fpatan function above is due to the afore-mentioned range reduction
and checking.  The KDF9 programmers in 1963 were less punctillious about
such matters than we rightly expect Ada to be nowadays.

-- 
Bill Findlay
with blueyonder.co.uk;
use  surname & forename;



  parent reply	other threads:[~2013-07-12  1:01 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-08 17:56 The future of Spark . Spark 2014 : a wreckage vincent.diemunsch
2013-07-08 19:24 ` Simon Wright
2013-07-08 20:59 ` Florian Weimer
2013-07-09  7:40   ` Dmitry A. Kazakov
2013-07-09  8:30     ` Georg Bauhaus
2013-07-09  8:58       ` Dmitry A. Kazakov
2013-07-09 10:27         ` G.B.
2013-07-09 11:13         ` G.B.
2013-07-09 15:14     ` Adam Beneschan
2013-07-09 22:51     ` Robert A Duff
2013-07-10  7:49       ` Dmitry A. Kazakov
2013-07-10  8:21         ` Georg Bauhaus
2013-07-08 21:32 ` Randy Brukardt
2013-07-09  7:28   ` Dmitry A. Kazakov
2013-07-09 11:29     ` Simon Wright
2013-07-09 12:22       ` Dmitry A. Kazakov
2013-07-09 20:37     ` Randy Brukardt
2013-07-10 10:03       ` Dmitry A. Kazakov
2013-07-10 23:21         ` Randy Brukardt
2013-07-11  7:44           ` Dmitry A. Kazakov
2013-07-11 22:28             ` Randy Brukardt
2013-07-12  8:02               ` Dmitry A. Kazakov
2013-07-12 21:16                 ` Randy Brukardt
2013-07-14 10:19                   ` Dmitry A. Kazakov
2013-07-14 15:57                     ` Georg Bauhaus
2013-07-16  0:16                       ` Randy Brukardt
2013-07-17  1:30                         ` Shark8
2013-07-17 23:08                           ` Randy Brukardt
2013-07-18  7:19                             ` Dmitry A. Kazakov
2013-07-19  4:36                               ` Randy Brukardt
2013-07-16  0:13                     ` Randy Brukardt
2013-07-16  8:37                       ` Dmitry A. Kazakov
2013-07-16 22:13                         ` Randy Brukardt
2013-07-17  7:59                           ` Dmitry A. Kazakov
2013-07-17 23:26                             ` Randy Brukardt
2013-07-18  7:37                               ` Dmitry A. Kazakov
2013-07-19  4:32                                 ` Randy Brukardt
2013-07-19  7:16                                   ` Dmitry A. Kazakov
2013-07-20  5:32                                     ` Randy Brukardt
2013-07-20  9:06                                       ` Dmitry A. Kazakov
2013-07-12  1:01           ` Bill Findlay [this message]
2013-07-09  7:57   ` Stefan.Lucks
2013-07-09 20:46     ` Randy Brukardt
2013-07-10  8:03       ` Stefan.Lucks
2013-07-10 12:46         ` Simon Wright
2013-07-10 23:10         ` Randy Brukardt
2013-07-11 19:23           ` Stefan.Lucks
2013-07-12  0:21             ` Randy Brukardt
2013-07-12  9:12               ` Stefan.Lucks
2013-07-12 20:47                 ` Randy Brukardt
2013-08-05  5:43                 ` Paul Rubin
2013-07-10 12:19   ` vincent.diemunsch
2013-07-10 16:02     ` Simon Wright
2013-07-11  0:36     ` Randy Brukardt
2013-07-11  6:19       ` Simon Wright
2013-07-11 23:11         ` Randy Brukardt
2013-07-11 10:10       ` vincent.diemunsch
2013-07-11 14:23         ` Dmitry A. Kazakov
2013-07-12  0:07           ` Randy Brukardt
2013-07-12  0:00         ` Randy Brukardt
2013-07-12  7:25           ` Simon Wright
2013-07-12 20:07             ` Randy Brukardt
2013-07-12 14:23           ` Robert A Duff
2013-07-12 20:14             ` Randy Brukardt
2013-07-11 23:50       ` Shark8
2013-07-08 23:18 ` Peter C. Chapin
2013-07-09  7:34   ` Stefan.Lucks
2013-07-09 15:21     ` Adam Beneschan
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox