From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4 X-Google-Language: ENGLISH,ASCII-7-bit X-Google-Thread: 103376,608f4b25931220fc X-Google-Attributes: gid103376,public X-Google-ArrivalTime: 2004-01-27 16:22:34 PST Path: archiver1.google.com!news2.google.com!news.maxwell.syr.edu!elnk-pas-nf1!elnk-nf2-pas!newsfeed.earthlink.net!attbi_feed3!attbi.com!attbi_s52.POSTED!not-for-mail From: tmoran@acm.org Newsgroups: comp.lang.ada Subject: Re: OT: Incremental statistics functions References: X-Newsreader: Tom's custom newsreader Message-ID: NNTP-Posting-Host: 67.161.24.134 X-Complaints-To: abuse@comcast.net X-Trace: attbi_s52 1075249353 67.161.24.134 (Wed, 28 Jan 2004 00:22:33 GMT) NNTP-Posting-Date: Wed, 28 Jan 2004 00:22:33 GMT Organization: Comcast Online Date: Wed, 28 Jan 2004 00:22:33 GMT Xref: archiver1.google.com comp.lang.ada:4958 Date: 2004-01-28T00:22:33+00:00 List-Id: > ... eventually the difference >between the sum of the squares and n times x-bar squared will be much >smaller than those two numbers. And if you are using floating point, >that ratio determines how much significance you have lost. But if you divide by N before the subtraction, the *average* of the squares does not grow, nor does the square of the average, so subtracting one from the other does not lead to increasing inaccuracy. The problem arises first in adding a new term to a sum (x**2 to the sum of squares, or x to the running sum of x). In the least demanding (non-trivial) case where the x's are a non-zero constant, this is like adding 1.0 to float'(N), for a loss of log2(N) bits. With 4 byte IEEE float, N = N+1.0, ie, *all* is lost, at N=2**24 or 16 million, so the sum becomes stuck, though the divisor increases, and the average thus heads toward zero. Even at N = one million you have only 4 good bits or one solid decimal digit. An iterative formula ((count-1)*sum+x)/count of course has exactly the same problem (plus it's more operations). But the iterative estimate at least is stuck at something close to the correct value.