From mboxrd@z Thu Jan  1 00:00:00 1970
X-Google-Thread: 103376,86d4e48d5a9b02a1
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII
Path: 
 g2news1.google.com!postnews.google.com!k11g2000vbf.googlegroups.com!not-for-mail
From: Adam Beneschan <adam@irvine.com>
Newsgroups: comp.lang.ada
Subject: Re: Cannot summate small float values
Date: Mon, 22 Nov 2010 08:30:48 -0800 (PST)
Organization: http://groups.google.com
Message-ID: 
 <c2017d52-1cdd-4c12-aa84-c3d80ea227f3@k11g2000vbf.googlegroups.com>
References: 
 <b93fb676-b1f7-4b1c-966a-ef5076ad173f@k22g2000yqh.googlegroups.com>
 <8kq1usFojgU1@mid.individual.net>
 <d21c568c-fddd-4da6-8131-43d18610397a@y23g2000yqd.googlegroups.com>
NNTP-Posting-Host: 66.126.103.122
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-Trace: posting.google.com 1290443448 15156 127.0.0.1 (22 Nov 2010 16:30:48
 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Mon, 22 Nov 2010 16:30:48 +0000 (UTC)
Complaints-To: groups-abuse@google.com
Injection-Info: k11g2000vbf.googlegroups.com; posting-host=66.126.103.122;
 posting-account=duW0ogkAAABjRdnxgLGXDfna0Gc6XqmQ
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64;
 SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.21022; .NET CLR
 3.5.30729; .NET CLR 3.0.30618; .NET4.0C),gzip(gfe)
Xref: g2news1.google.com comp.lang.ada:15636
Date: 2010-11-22T08:30:48-08:00
List-Id: <comp.lang.ada>

On Nov 21, 1:06 pm, tolkamp <f.tolk...@gmail.com> wrote:
> On 20 nov, 14:49, Niklas Holsti <niklas.hol...@tidorum.invalid> wrote:
>
> > tolkamp wrote:
> > > When I summate Float values smaller then 1.0E-6 then the summation is
> > > not done.
>
> > > Code Example:
>
> > > X, Dx : Float;
> > > X := 0.0;
> > > Dx := 1.0E-7;
> > > lwhile X < 1.0 loop
> > >     X = X + Dx;
> > >     Float_Io.Put(X, 3,9,0); New_Line;
> > > end loop;
>
> > Certainly the addition is done. Your program (after some small syntactic
> > corrections) prints:
>
> >    0.000000100
> >    0.000000200
> >    0.000000300
> >    0.000000400
> >    0.000000500
> >    0.000000600
> >    0.000000700
>
> > and so on. If your program prints out something else, please show the
> > source code of your whole program, exactly as you compile and run it.
> > Don't re-type it into your message.
>
> > However, when X approaches 1.0, at some point the addition of 1.0E-7 may
> > be lost in round-off, since it is close to the precision limit of the
> > Float type, relative to 1.0. On my system (Debian, Gnat) the X variable
> > does reach 1.0 and the program stops.
>
> > What are you really trying to do? There are probably safer and more
> > accurate ways of doing it.
>
> > Here is the program that I used:
>
> > with Ada.Text_IO;
> > with Ada.Float_Text_IO;
>
> > procedure Sums
> > is
> >     use Ada.Text_IO, Ada.Float_Text_IO;
> >     X, Dx : Float;
> > begin
> >     X := 0.0;
> >     Dx := 1.0E-7;
> >     while X < 1.0 loop
> >         X := X + Dx;
> >         Put(X, 3,9,0); New_Line;
> >     end loop;
> > end Sums;
>
> Thank you your reaction.
> Using your procedure Sums I found out that when the start value of X <
> 0.24 the summation works correct with Dx = 1.0E-8
> When start X > 0.25 the summation remains 0.250000000.

You seem to lack a fundamental understanding of how floating-point
works.

In a 32-bit float, 23 of those bits are the "mantissa"; the rest are
used for the sign and exponent.  If the 23 bits are bbbb---bbbb, then
the value represented by the 32-bit float is

  [possibly negative] 1.bbbb---bbbb * (2**exp)

where exp is the exponent (represented by the other bits of the
float).  The 1.bbbb---bbbb is in binary notation, so that the first
"b" represents 2**-1, the second is 2**-2, etc. and the last is
2**-23.
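
If you want to check this layout on your own system, here is a small
sketch.  It assumes that Float is the usual 32-bit IEEE 754 type (true
for GNAT on common platforms, but the language does not guarantee it);
the procedure name Float_Layout is just something I made up for
illustration.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Float_Layout is
begin
   --  Machine_Mantissa counts the implicit leading 1 as well as the
   --  stored bits, so a 32-bit IEEE Float should report 24, not 23.
   Put_Line ("Mantissa digits (incl. leading 1):"
             & Integer'Image (Float'Machine_Mantissa));

   --  Distance from 1.0 to the next representable Float, i.e. the
   --  value of the last "b" when the exponent is 0 (2**-23 for IEEE).
   Put_Line ("Last mantissa bit at 1.0:"
             & Float'Image (Float'Succ (1.0) - 1.0));
end Float_Layout;
```

On an implementation with a different Float the numbers will differ,
but the same attributes still describe the format.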

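Since 2**-23 is about 1.192E-7, the ratio between 1.0000----0000 and
1.0000----0001 is 1 + 1.192E-7.  So if you start with the number 0.25,
the smallest number greater than 0.25 that can be represented is
0.25 + (0.25 * 1.192E-7).  This last part is about 2.98E-8.  The 1E-8
that you're trying to add is less than half of that, so (with the
usual round-to-nearest arithmetic) the exact sum is closer to 0.25
than to the next representable number, and the result is rounded back
to 0.25.  That is why adding 1E-8 makes no difference.  Below 0.25 the
spacing is only half as large (about 1.49E-8), so there 1E-8 is more
than half the spacing and the sum rounds up to the next number; that
is why your loop keeps advancing until X reaches 0.25.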
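
Here is a minimal sketch that shows the effect directly.  Again it
assumes a 32-bit IEEE 754 Float with round-to-nearest; the names
Lost_Addition, Starts and Sum are mine, not from your program.  On
hardware that keeps intermediate results in extended precision the
comparison could come out differently.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Lost_Addition is
   Dx     : constant Float := 1.0E-8;
   --  One starting value below 0.25, then 0.25 itself, then one above.
   Starts : constant array (1 .. 3) of Float := (0.125, 0.25, 0.5);
   Sum    : Float;
begin
   for I in Starts'Range loop
      Sum := Starts (I) + Dx;
      --  Float'Succ gives the next representable value, so the
      --  difference is the spacing between Floats at this magnitude.
      Put_Line
        ("X =" & Float'Image (Starts (I))
         & "  spacing =" & Float'Image (Float'Succ (Starts (I)) - Starts (I))
         & "  changed by Dx: " & Boolean'Image (Sum /= Starts (I)));
   end loop;
end Lost_Addition;
```

Under those assumptions I would expect the addition to change X only
for the 0.125 case, where Dx is more than half the spacing, and to
leave X unchanged at 0.25 and 0.5.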

                               -- Adam