comp.lang.ada
From: gwinn@res.ray.com (Joe Gwinn)
Subject: Re: fixed point vs floating point
Date: 1997/11/25
Message-ID: <gwinn-2511971025020001@dh5055083.res.ray.com>
In-Reply-To: dewar.880417337@merv


In article <dewar.880417337@merv>, dewar@merv.cs.nyu.edu (Robert Dewar) wrote:

> There is no problem in implementing fast integer or fractional integer
> multiply and divide, using the same techniques as are used for floating-point.
> However, apart from a few people on soapboxes, there is no demand for such
> instructions, let alone fast implementations of them (the Transputer is
> one of the very few machines that provides a fractional binary scaled
> multiply instruction).

Robert also said in some other message that floating point is faster than
integer simply because the hardware is designed that way.  This is
certainly true in general, and was true for us.  Performance follows
resources.

I don't know what came over me.  I never agree with Robert.  It won't
happen again.


Until perhaps 5 years ago, we always used fixed point (scaled binary) in
the code, regardless of language (except Ada83, actually; see below),
because we couldn't afford to design good enough floating point hardware
into the computers.  In those days, we built our own purpose-built
single-board computers (SBCs) for major realtime projects.  What we
couldn't afford was SBC real estate, not money for components.  The FP
hardware would always be in the original plan, but soon got pushed off the
board by some other more important hardware components, most often added
on-board memory.  Push come to shove, more memory always won.

Now, the issue is gone, for two reasons.  First, the floating point
hardware is generally built into the processor chips, so there is no real
estate to save.  Second, we no longer build our own SBCs for standard
buses such as VME; we buy them, unless cornered.

The fixed point arithmetic in the Ada83 compilers that we used and tested
in those days didn't work well or even correctly in many cases, so we
never used it.  I don't offhand recall doing an Ada job that didn't use
floating point; Ada83 was mostly used on the larger computers, such as
VAXes and Silicon Graphics boxes, so the issue never came up.   The
exception may have been XD Ada from System Designers (where are they now?)
on the 68020.

So, why do we prefer floating point, speed aside?  Because it's easier to
program with, as scaling issues go away, and so it's much easier to get
right.  Remember, the traditional comparison was with manually-implemented
scaled binary.  The fact that floating point arithmetic is approximate
isn't much of an issue for most physics-based mathematical code, and those
places that are exceptions are still *much* less trouble to deal with than
the traditional scaled binary.
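
For anyone who hasn't had the pleasure, here is roughly what manually
scaled binary looks like in Ada.  This is an illustrative sketch of my
own; the names, values, and scaling are invented for the example:

    with Ada.Text_IO;
    procedure Scaled_Demo is
       --  Values held as 16-bit integers in units of 2**(-8); the
       --  programmer has decided on 8 fraction bits and must remember it.
       type Raw16 is range -2**15 .. 2**15 - 1;
       Scale : constant := 2**8;
       A : constant Raw16 := 3 * Scale;        -- represents 3.0
       B : constant Raw16 := 3 * Scale / 2;    -- represents 1.5
       P : Raw16;
    begin
       --  Multiply, then rescale by hand so the product is back in
       --  2**(-8) units; forget the rescale and the answer is silently
       --  off by a factor of 256.
       P := Raw16 ((Integer (A) * Integer (B)) / Scale);
       Ada.Text_IO.Put_Line (Float'Image (Float (P) / Float (Scale)));  -- 4.5
    end Scaled_Demo;

With floating point, the rescaling step and the bookkeeping behind it
simply disappear.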

A random note:  When declaring an angle in Ada, it's necessary to declare
the range as something like -0.1 to 360.1 degrees (not 0 to 360 degrees),
to prevent the fuzziness of floating point arithmetic from causing
random-appearing constraint errors.  The fuzz can extend ever so slightly
over the line, causing unnecessary and unexpected constraint errors if one
tries to set the range too tightly.
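
For example (a minimal sketch; the type name and precision are my own
choices):

    package Angles is
       --  Widened just past 0 and 360 so endpoint fuzz cannot raise
       --  Constraint_Error.
       type Degrees is digits 6 range -0.1 .. 360.1;
    end Angles;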

With Ada95 (which I assume has fixed the Ada83 problems with fixed-point
arithmetic) on modern machines, use of fixed point has never been
proposed, simply because use of floating point is still simpler to
program, because scaling is handled automatically, and because more
programmers are familiar with floating point than with scaled fixed
point.  One less thing to worry about, and to get wrong.

As for speed, float versus fixed, in practice it seems to depend more on
the compiler (its level of excessive generalization, its level of code
optimization, and its level of overly paranoid and/or redundant range
checking) than on the raw hardware speed of arithmetic.  This is especially
true of math library elements such as Sine and Cosine and Square Root,
which, being precompiled, don't usually respond to changes in the
optimization settings.  Again, performance follows engineering resources.

In one recent example, Square Root was taking 60 microseconds on a 50-MHz
68060; this should take no more than 2 us.  Where did the time go?  To
range check after range check.  To internal use of 96-bit extended floats
(which must be done in the 68060 FP instruction emulation library rather
than the CPU hardware), all to implement a 32-bit float function.  To
functions within functions, all checking everything.  Etc.  Same kind of
story with the transcendental functions, only worse.  One needs to read
the generated assembly code to know what a given compiler is up to.

We only need a few functions, specifically Sine, Cosine, Arc Tangent, and
Square Root, speed is very much of the essence, and we need only 10^-4 to
10^-5 (16-bit) accuracy anyway.  So, we will write our own versions of
these functions, in Ada, C, or even assembly, as determined by performance
tests.

Someone asked how to compute the Sine in fixed point; nobody answered. 
There are three traditional methods: polynomial approximations,
brute-force table lookup, and table lookup with interpolation.  These
methods are approximately equal in speed when all is said and done,
although brute-force table lookup can trade memory for speed better than
polynomials, while polynomials can be faster than table lookup on some
machines because polynomials can avoid float-fix conversions and index
arithmetic and range checks (to prevent accessing beyond the ends of the
table).  Any of these methods can outperform the standard mathlib
functions precisely because they do exactly what's asked, and nothing
more.
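
To make table lookup with interpolation concrete, here is a small sketch
of mine (not from any particular project; the table size, type names, and
scaling are illustrative).  The table is built once with floating point at
startup; each call then uses only integer arithmetic, with the angle given
as a 16-bit "binary angle" in which 65536 units make one full turn:

    with Ada.Text_IO;
    with Ada.Numerics;                        use Ada.Numerics;
    with Ada.Numerics.Elementary_Functions;   use Ada.Numerics.Elementary_Functions;

    procedure Sine_Demo is

       N : constant := 1024;                  -- table entries per full turn

       --  Sine values scaled to 16-bit signed integers (Q1.15).
       type Q15 is range -32768 .. 32767;
       Table : array (0 .. N) of Q15;         -- extra entry simplifies wraparound

       --  Angle is a 16-bit binary angle: 0 .. 65535 covers one full turn.
       function Fast_Sine (Angle : Natural) return Q15 is
          Step  : constant := 65536 / N;      -- angle units per table step
          A     : constant Natural := Angle mod 65536;
          Index : constant Natural := A / Step;
          Frac  : constant Natural := A mod Step;
          S0    : constant Integer := Integer (Table (Index));
          S1    : constant Integer := Integer (Table (Index + 1));
       begin
          --  Linear interpolation between adjacent table entries.
          return Q15 (S0 + ((S1 - S0) * Integer (Frac)) / Step);
       end Fast_Sine;

    begin
       --  Build the table once with floating point; runtime calls above
       --  use only integer arithmetic, range checks aside.
       for I in Table'Range loop
          Table (I) :=
            Q15 (32767.0 * Sin (2.0 * Pi * Float (I) / Float (N)));
       end loop;

       Ada.Text_IO.Put_Line (Q15'Image (Fast_Sine (16384)));  -- quarter turn: 32767
    end Sine_Demo;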

As for the polynomial approximations, the bible to this day is
"Approximations for Digital Computers"; Cecil Hastings, Jr.; Princeton
University Press; 1955.  This has been a classic since its publication. 
In those days, the computers were small, so the programmers had to be very
focused on performance.  For computation of polynomials in real systems,
convert the published polynomials into Horner's form.  This is discussed
in many texts on numerical methods.  In short,  a + bX + cX^2 becomes a +
X(b+cX), which is faster to compute, and has better numerical properties
than the standard power-series form.
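
For instance, carried one degree further, a cubic in Horner form looks
like this (the coefficients here are placeholders, not Hastings'
published values):

    function Poly (X : Float) return Float is
       A : constant Float := 1.0;
       B : constant Float := 0.5;
       C : constant Float := 0.25;
       D : constant Float := 0.125;
    begin
       --  a + bX + cX**2 + dX**3, computed as a + X(b + X(c + dX)):
       --  three multiplies and three adds, no explicit powers.
       return A + X * (B + X * (C + D * X));
    end Poly;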


Joe Gwinn



