From: Adam Beneschan <adam@irvine.com>
Subject: Re: Cannot summate small float values
Date: Mon, 22 Nov 2010 08:30:48 -0800 (PST)
Message-ID: <c2017d52-1cdd-4c12-aa84-c3d80ea227f3@k11g2000vbf.googlegroups.com>
In-Reply-To: <d21c568c-fddd-4da6-8131-43d18610397a@y23g2000yqd.googlegroups.com>

On Nov 21, 1:06 pm, tolkamp <f.tolk...@gmail.com> wrote:
> On 20 nov, 14:49, Niklas Holsti <niklas.hol...@tidorum.invalid> wrote:
>
> > tolkamp wrote:
> > > When I summate Float values smaller than 1.0E-6 then the summation is
> > > not done.
>
> > > Code Example:
>
> > > X, Dx : Float;
> > > X := 0.0;
> > > Dx := 1.0E-7;
> > > lwhile X < 1.0 loop
> > > X = X + Dx;
> > > Float_Io.Put(X, 3,9,0); New_Line;
> > > end loop;
>
> > Certainly the addition is done. Your program (after some small syntactic
> > corrections) prints:
>
> > 0.000000100
> > 0.000000200
> > 0.000000300
> > 0.000000400
> > 0.000000500
> > 0.000000600
> > 0.000000700
>
> > and so on. If your program prints out something else, please show the
> > source code of your whole program, exactly as you compile and run it.
> > Don't re-type it into your message.
>
> > However, when X approaches 1.0, at some point the addition of 1.0E-7 may
> > be lost in round-off, since it is close to the precision limit of the
> > Float type, relative to 1.0. On my system (Debian, Gnat) the X variable
> > does reach 1.0 and the program stops.
>
> > What are you really trying to do? There are probably safer and more
> > accurate ways of doing it.
>
> > Here is the program that I used:
>
> > with Ada.Text_IO;
> > with Ada.Float_Text_IO;
>
> > procedure Sums
> > is
> > use Ada.Text_IO, Ada.Float_Text_IO;
> > X, Dx : Float;
> > begin
> > X := 0.0;
> > Dx := 1.0E-7;
> > while X < 1.0 loop
> > X := X + Dx;
> > Put(X, 3,9,0); New_Line;
> > end loop;
> > end Sums;
>
> Thank you for your reaction.
> Using your procedure Sums I found out that when the start value of X <
> 0.24 the summation works correctly with Dx = 1.0E-8.
> When the start value of X > 0.25 the sum remains 0.250000000.

You seem to lack a fundamental understanding of how floating-point
works.

In a 32-bit (IEEE 754 single-precision) float, 23 of those bits are
the "mantissa"; of the remaining 9 bits, one is the sign and 8 are the
exponent. If the 23 bits are bbbb---bbbb, then the value represented
by the 32-bit float (for normalized, nonzero values) is

    [possibly negative] 1.bbbb---bbbb * (2**exp)

where exp is the exponent (stored, with a bias of 127, in the other
bits of the float). The 1.bbbb---bbbb is in binary notation: the
leading "1" is implicit and not stored, the first "b" represents
2**-1, the second 2**-2, and so on, and the last 2**-23.
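To see this layout from Ada itself, here is a minimal sketch (the procedure name Show_Layout is hypothetical) that decomposes 0.25 using the standard attributes Float'Fraction, Float'Exponent, Float'Succ and Float'Machine_Mantissa. It assumes Float is IEEE 754 single precision, which is typical for GNAT but not required by the language. Note that Ada normalizes the fraction to the range [0.5, 1.0), so the significand and exponent of the 1.bbbb---bbbb * (2**exp) form are 2 * Float'Fraction and Float'Exponent - 1; likewise Machine_Mantissa counts the implicit leading bit, so under this assumption it should report 24 rather than 23.

```ada
with Ada.Text_IO;       use Ada.Text_IO;
with Ada.Float_Text_IO; use Ada.Float_Text_IO;

--  Hypothetical demo: recover the "1.bbbb---bbbb * 2**exp" pieces of a
--  Float.  Assumes Float is IEEE 754 single precision (typical for GNAT,
--  but not guaranteed by the Ada standard).
procedure Show_Layout is
   X : constant Float := 0.25;

   --  Ada's 'Fraction is in [0.5, 1.0), so the "1.bbbb" significand is
   --  2 * Fraction and the matching binary exponent is Exponent - 1.
   Significand : constant Float   := 2.0 * Float'Fraction (X);
   Bin_Exp     : constant Integer := Float'Exponent (X) - 1;

   --  Distance from 1.0 to the next machine number: the weight of the
   --  last stored mantissa bit, 2**-23 for single precision.
   Gap_At_One : constant Float := Float'Succ (1.0) - 1.0;
begin
   --  Counts the implicit leading bit as well as the 23 stored bits.
   Put_Line ("Machine_Mantissa =" & Integer'Image (Float'Machine_Mantissa));
   Put ("Significand of 0.25 = "); Put (Significand, 1, 1, 0); New_Line;
   Put_Line ("exp of 0.25 =" & Integer'Image (Bin_Exp));
   Put ("Gap at 1.0 = "); Put (Gap_At_One, 1, 4, 3); New_Line;
end Show_Layout;
```

With GNAT, save it as show_layout.adb and build it with gnatmake show_layout.adb; the gap printed at 1.0 should be 2**-23, about 1.192E-07.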

Since 2**-23 is about 1.192E-7, the ratio of 1.0000----0001 to
1.0000----0000 is 1 + 1.192E-7. So if you start with the number 0.25,
which is 1.0 * 2**-2, the smallest number greater than 0.25 that can
be represented is 0.25 + (0.25 * 1.192E-7). That last term is about
2.98E-8, which is a lot more than the 1E-8 that you're trying to add.
With the usual round-to-nearest mode, 0.25 + 1E-8 is rounded to the
closer of 0.25 and that next number, and since 1E-8 is less than half
the gap (about 1.49E-8), the result is 0.25 again. That is why 1E-8 is
too small to make a difference when added.
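The same attributes can illustrate both halves of the observation quoted above. The sketch below (Show_Absorption is again a hypothetical name; it assumes IEEE 754 single-precision Float, round-to-nearest, and no extra-precision intermediates, as with GNAT on x86-64) prints the gaps on either side of 0.25, then adds 1.0E-8 repeatedly starting from 0.2499. Just below 0.25 the gap is only about 1.49E-8, so 1.0E-8 is more than half of it and each sum rounds up to the next machine number: X does advance, but by about 1.49E-8 per step rather than 1.0E-8. Once X reaches 0.25, every further addition rounds back down. The values are deliberately held in variables rather than constants, because Ada evaluates static expressions exactly, which would hide the rounding.

```ada
with Ada.Text_IO;       use Ada.Text_IO;
with Ada.Float_Text_IO; use Ada.Float_Text_IO;

--  Hypothetical demo of why 1.0E-8 is absorbed at 0.25.  Assumes Float is
--  IEEE 754 single precision with round-to-nearest and no extra
--  intermediate precision (e.g. GNAT on x86-64).
procedure Show_Absorption is
   --  Variables, not constants, so the sums are computed at run time in
   --  Float instead of exactly as Ada static expressions.
   Dx      : Float   := 1.0E-8;
   Quarter : Float   := 0.25;
   X       : Float   := 0.2499;
   Steps   : Natural := 0;
begin
   --  Distance to the neighbouring machine numbers on each side of 0.25.
   Put ("Gap above 0.25 = ");
   Put (Float'Succ (Quarter) - Quarter, 1, 4, 3); New_Line;
   Put ("Gap below 0.25 = ");
   Put (Quarter - Float'Pred (Quarter), 1, 4, 3); New_Line;

   --  Below 0.25, Dx is more than half the local gap, so each sum rounds
   --  up to the next machine number (a step of about 1.49E-8, not 1.0E-8).
   --  The Steps limit is only a safety net against an endless loop.
   while X < Quarter and Steps < 100_000 loop
      X := X + Dx;
      Steps := Steps + 1;
   end loop;
   Put ("Reached "); Put (X, 1, 9, 0);
   Put_Line (" after" & Natural'Image (Steps) & " additions");

   --  At 0.25, Dx is less than half the gap above, so every sum rounds
   --  back down and X never moves again.
   for I in 1 .. 1_000 loop
      X := X + Dx;
   end loop;
   Put ("After 1000 more additions X = "); Put (X, 1, 9, 0); New_Line;
end Show_Absorption;
```

Under those assumptions it should report gaps of about 2.98E-8 above and 1.49E-8 below 0.25, and roughly 6,700 additions (1.0E-4 / 1.49E-8) to get from 0.2499 to 0.25 instead of the 10,000 that exact steps of 1.0E-8 would need. So just below 0.25 the sum is not accumulating 1.0E-8 per step either, even when it appears to work.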
-- Adam