comp.lang.ada
From: "Robert I. Eachus" <rieachus@comcast.net>
Subject: Re: OT: Incremental statistics functions
Date: Mon, 26 Jan 2004 22:37:08 -0500
Message-ID: <MMednVmf58F6QYjd4p2dnA@comcast.com>
In-Reply-To: <86k73e9uzk.fsf@lucretia.kaos>

Mats Karlssohn wrote:

> Basically I don't want to keep a (limited) buffer of samples but would like
> to add the values one at a time when they are calculated.
> 
> Probably I googled bad, since I didn't find anything I could understand.
> 
> Any suggestions please?

Yes, be very, very careful.  The issue is that, in the general case,
computing the mean first and then the standard deviation in a second
pass has very good numerical properties.  Computing them incrementally
from running sums does not.  If you have an 'expected' value for the
mean (call it u), you can accumulate (Xi-u)**2 instead, and if u is
'close' to the mean, the numerical characteristics will be fine.  (To
put that technically, if your estimate u is less than half a standard
deviation from the sample mean, you should have nothing to worry
about.)
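
A minimal sketch of accumulating around an expected mean u, in Python
for brevity.  The class name ShiftedStats, its method names, and the
sample data are illustrative assumptions, not something from this
thread.  It keeps running sums of (Xi-u) and (Xi-u)**2, so no buffer of
samples is needed.

```python
class ShiftedStats:
    """Running mean and variance accumulated around a fixed guess u.

    Hypothetical helper for illustration only.
    """

    def __init__(self, u):
        self.u = u       # caller's guess at the mean
        self.n = 0       # number of samples seen so far
        self.s1 = 0.0    # running sum of (x - u)
        self.s2 = 0.0    # running sum of (x - u)**2

    def add(self, x):
        d = x - self.u
        self.n += 1
        self.s1 += d
        self.s2 += d * d

    def mean(self):
        return self.u + self.s1 / self.n

    def variance(self):
        # Sample variance (n - 1 divisor); requires at least two samples.
        return (self.s2 - self.s1 * self.s1 / self.n) / (self.n - 1)


stats = ShiftedStats(u=1.0e9)
for x in (1.0e9 + 4, 1.0e9 + 7, 1.0e9 + 13, 1.0e9 + 16):
    stats.add(x)

print(stats.mean(), stats.variance())
```

Because the sums are taken over the small differences Xi-u rather than
over the raw values, the final subtraction works on numbers of about
the same size as the variance itself.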

To put all this in perspective, say you are monitoring daily low
temperatures in New Hampshire in January.  There is no problem: the
day-to-day differences are larger than the difference between zero and
the average.  Try the same thing in Iraq in July, and you won't do as
well.  Even if you are monitoring low temperatures, eventually the
difference between the sum of the squares and n times x-bar squared
will be much smaller than either of those two numbers.  And if you are
using floating point, that ratio determines how much significance you
have lost.  Your estimate of the variance or standard deviation may
have just a few significant bits (one significant digit) or, worse, no
significant bits or digits at all.
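
The loss of significance can be seen on a small made-up data set.  The
sketch below (Python, with hypothetical function names) compares the
sum-of-squares formula against the two-pass computation on four values
near 1.0e9 whose true sample variance is 30.

```python
def naive_variance(xs):
    # Sum of the squares minus n times x-bar squared, over n - 1.
    n = len(xs)
    mean = sum(xs) / n
    total_sq = sum(x * x for x in xs)
    return (total_sq - n * mean * mean) / (n - 1)


def two_pass_variance(xs):
    # First pass computes the mean, second pass sums squared deviations.
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Illustrative data: large mean, small spread.
data = [1.0e9 + 4, 1.0e9 + 7, 1.0e9 + 13, 1.0e9 + 16]

naive = naive_variance(data)
two_pass = two_pass_variance(data)
print("sum-of-squares formula:", naive)
print("two-pass formula:      ", two_pass)
```

Here the two quantities being subtracted are each around 4.0e18, while
their true difference is only 90, so in IEEE double the subtraction has
no correct digits left.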

You can guard against this to some extent by using IEEE double or
extended for computing the sum of the squares, but that only postpones
the point at which you run out of significance; it doesn't prevent it.
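
As a rough illustration of "postpones, not prevents", the sketch below
runs the sum-of-squares formula entirely in IEEE double (Python's
float) on the same four deviations shifted by larger and larger
offsets.  The data, offsets, and names are assumptions chosen for the
example.

```python
def naive_variance(xs):
    # Sum of the squares minus n times x-bar squared, over n - 1.
    n = len(xs)
    mean = sum(xs) / n
    total_sq = sum(x * x for x in xs)
    return (total_sq - n * mean * mean) / (n - 1)


deviations = (4.0, 7.0, 13.0, 16.0)   # true sample variance is 30

results = {}
for offset in (1.0e3, 1.0e6, 1.0e9, 1.0e12):
    data = [offset + d for d in deviations]
    results[offset] = naive_variance(data)
    print(offset, results[offset])
```

With these values, double precision carries the smaller offsets without
trouble and then fails once the squares outgrow the 53-bit significand.
A wider format would move that threshold further out, but a large
enough mean relative to the spread would still reach it.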


-- 
                                           Robert I. Eachus

"The war on terror is a different kind of war, waged capture by capture, 
cell by cell, and victory by victory. Our security is assured by our 
perseverance and by our sure belief in the success of liberty." -- 
George W. Bush



