Re: Cannot summate small float values

comp.lang.ada

From: Adam Beneschan <adam@irvine.com>
Subject: Re: Cannot summate small float values
Date: Mon, 22 Nov 2010 08:30:48 -0800 (PST)
Message-ID: <c2017d52-1cdd-4c12-aa84-c3d80ea227f3@k11g2000vbf.googlegroups.com>
In-Reply-To: d21c568c-fddd-4da6-8131-43d18610397a@y23g2000yqd.googlegroups.com

On Nov 21, 1:06 pm, tolkamp <f.tolk...@gmail.com> wrote:
> On 20 nov, 14:49, Niklas Holsti <niklas.hol...@tidorum.invalid> wrote:
>
> > tolkamp wrote:
> > > When I summate Float values smaller then 1.0E-6 then the summation is
> > > not done.
>
> > > Code Example:
>
> > > X, Dx : Float;
> > > X := 0.0;
> > > Dx := 1.0E-7;
> > > lwhile X <  1.0 loop
> > >     X = X + Dx;
> > >     Float_Io.Put(X, 3,9,0); New_Line;
> > > end loop;
>
> > Certainly the addition is done. Your program (after some small syntactic
> > corrections) prints:
>
> >    0.000000100
> >    0.000000200
> >    0.000000300
> >    0.000000400
> >    0.000000500
> >    0.000000600
> >    0.000000700
>
> > and so on. If your program prints out something else, please show the
> > source code of your whole program, exactly as you compile and run it.
> > Don't re-type it into your message.
>
> > However, when X approaches 1.0, at some point the addition of 1.0E-7 may
> > be lost in round-off, since it is close to the precision limit of the
> > Float type, relative to 1.0. On my system (Debian, Gnat) the X variable
> > does reach 1.0 and the program stops.
>
> > What are you really trying to do? There are probably safer and more
> > accurate ways of doing it.
>
> > Here is the program that I used:
>
> > with Ada.Text_IO;
> > with Ada.Float_Text_IO;
>
> > procedure Sums
> > is
> >     use Ada.Text_IO, Ada.Float_Text_IO;
> >     X, Dx : Float;
> > begin
> >     X := 0.0;
> >     Dx := 1.0E-7;
> >     while X <  1.0 loop
> >         X := X + Dx;
> >         Put(X, 3,9,0); New_Line;
> >     end loop;
> > end Sums;
>
> Thank you your reaction.
> Using your procedure Sums I found out that when the start value of X <
> 0.24 the summation works correct with Dx = 1.0E-8
> When start X > 0.25 the summation remains 0.250000000.

You seem to lack a fundamental understanding of how floating-point
works.

In a 32-bit float, 23 of those bits are the "mantissa"; the rest are
used for the sign and exponent.  If the 23 bits are bbbb---bbbb, then
the value represented by the 32-bit float is

  [possibly negative] 1.bbbb---bbbb * (2**exp)

where exp is the exponent (represented by the other bits of the
float).  The 1.bbbb---bbbb is in binary notation, so that the first
"b" represents 2**-1, the second is 2**-2, etc. and the last is
2**-23.

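Since 2**-23 is about 1.192E-7, the ratio between 1.0000----0000 and
1.0000----0001 is 1 + 1.192E-7.  So if you start with the number 0.25,
the smallest number greater than 0.25 that can be represented is
0.25 + (0.25 * 1.192E-7).  That increment is about 2.98E-8, nearly
three times the 1E-8 that you're trying to add.  Since 1E-8 is less
than half of that spacing, 0.25 + 1E-8 rounds back to 0.25 (assuming
the usual round-to-nearest), so X never changes.

Just below 0.25 the exponent is one smaller, so the spacing is half as
large (about 1.49E-8).  There 1E-8 is more than half the spacing, so
each addition rounds up to the next representable value and X keeps
advancing, though not by exactly 1E-8.  That would explain why the
summation works for start values below 0.25 and stalls from 0.25 on.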
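
If you want to see the spacing for yourself, here is a minimal sketch.
It assumes that Float is a 32-bit IEEE float (as it is with GNAT on
common platforms) and that results are rounded to Float precision.
The names Spacing and Try are just ones I made up for illustration.
Float'Succ is the standard attribute that returns the next
representable value above its argument.

```ada
with Ada.Text_IO;
with Ada.Float_Text_IO;

procedure Spacing is
   use Ada.Text_IO, Ada.Float_Text_IO;

   Dx : constant Float := 1.0E-8;

   --  Report the gap between Start and the next representable Float
   --  above it, and whether adding Dx changes the value at all.
   procedure Try (Start : Float) is
      Gap : constant Float := Float'Succ (Start) - Start;
      Sum : constant Float := Start + Dx;
   begin
      Put ("Start = ");
      Put (Start, Fore => 1, Aft => 9, Exp => 0);
      Put ("   gap to next Float = ");
      Put (Gap, Fore => 1, Aft => 3, Exp => 3);
      if Sum = Start then
         Put_Line ("   Start + Dx = Start (Dx lost in rounding)");
      else
         Put_Line ("   Start + Dx /= Start");
      end if;
   end Try;
begin
   Try (0.24);  --  just below 0.25: smaller gap
   Try (0.25);  --  gap doubles here
end Spacing;
```

Under those assumptions I would expect the gap reported at 0.25 to be
twice the gap at 0.24, and only the 0.25 case to report that Dx was
lost.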

                               -- Adam


Thread overview: 7+ messages
2010-11-20 12:47 Cannot summate small float values tolkamp
2010-11-20 13:49 ` Niklas Holsti
2010-11-21 21:06   ` tolkamp
2010-11-21 21:18     ` Niklas Holsti
2010-11-22  1:23     ` Gautier write-only
2010-11-22  8:35     ` Julian Leyh
2010-11-22 16:30     ` Adam Beneschan [this message]