From: "Robert I. Eachus" <rieachus@comcast.net>
Subject: Re: OT: Incremental statistics functions
Date: Mon, 26 Jan 2004 22:37:08 -0500
Message-ID: <MMednVmf58F6QYjd4p2dnA@comcast.com>
In-Reply-To: <86k73e9uzk.fsf@lucretia.kaos>
Mats Karlssohn wrote:
> Basically I don't want to keep a (limited) buffer of samples but would like
> to add the values one at a time when they are calculated.
>
> Probably I googled bad, since I didn't find anything I could understand.
>
> Any suggestions please?
Yes, be very, very careful. The issue is that, in the general case,
computing the mean first and then the standard deviation in a second
pass has very good numerical properties. Computing them incrementally
does not. If you have an 'expected' value for the mean (call it u), you
can use it to accumulate (Xi-u)**2, and if u is 'close' to the mean the
numerical characteristics will be fine. (To put that technically, if
your estimate u is less than half a standard deviation from the sample
mean, you should have nothing to worry about.)
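A minimal sketch of that idea in Python (not Ada, just to keep it short).
The class name ShiftedStats and its method names are made up for this
illustration; u is the 'expected' mean, fixed before the first sample
arrives. It keeps only three numbers, so no buffer of samples is needed.

```python
import math


class ShiftedStats:
    """Running mean and standard deviation around a fixed guess u."""

    def __init__(self, u):
        self.u = u          # the 'expected' mean, chosen up front (assumption)
        self.n = 0          # number of samples seen so far
        self.sum_d = 0.0    # running sum of (x - u)
        self.sum_d2 = 0.0   # running sum of (x - u)**2

    def add(self, x):
        d = x - self.u
        self.n += 1
        self.sum_d += d
        self.sum_d2 += d * d

    def mean(self):
        return self.u + self.sum_d / self.n

    def variance(self):
        # Sample variance (n - 1 in the denominator); needs n >= 2.
        return (self.sum_d2 - self.sum_d * self.sum_d / self.n) / (self.n - 1)

    def stddev(self):
        # Clamp at zero in case rounding pushes the variance slightly negative.
        return math.sqrt(max(self.variance(), 0.0))


stats = ShiftedStats(u=4.5)
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(x)
print(stats.mean(), stats.stddev())
```

The subtraction in variance() is still there, but as long as u is near
the true mean both terms stay small, so little significance is lost.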
To put all this in perspective, say you are monitoring daily low
temperatures in New Hampshire in January. There is no problem: the
day-to-day differences are larger than the difference between zero and
the average. Try the same thing in Iraq in July, and you won't do as well.
Even if you are monitoring low temperatures, eventually the difference
between the sum of the squares and n times x-bar squared will be much
smaller than those two numbers. And if you are using floating point,
that ratio determines how much significance you have lost. Your
estimate of the variance or standard deviation may have just a few
significant bits (one significant digit) or, worse, no significant bits
or digits at all.
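Here is a small Python illustration of that loss, using made-up data:
four values near 1e9 whose true sample variance is 30. The function names
naive_variance and shifted_variance are just for this sketch. Python
floats are IEEE doubles; the sum of the squares is around 4e18, where
adjacent doubles are hundreds apart, so a difference of 90 cannot survive
the subtraction.

```python
def naive_variance(xs):
    """Sample variance from the sum of squares minus n * x-bar squared."""
    n = len(xs)
    sum_x = 0.0
    sum_x2 = 0.0
    for x in xs:
        sum_x += x
        sum_x2 += x * x
    mean = sum_x / n
    return (sum_x2 - n * mean * mean) / (n - 1)


def shifted_variance(xs, u):
    """Same formula, but accumulated on (x - u) for a guess u near the mean."""
    n = len(xs)
    sum_d = 0.0
    sum_d2 = 0.0
    for x in xs:
        d = x - u
        sum_d += d
        sum_d2 += d * d
    return (sum_d2 - sum_d * sum_d / n) / (n - 1)


# Made-up samples: large mean, small spread. Floats on purpose; Python
# integers are exact and would hide the problem.
samples = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]

print("naive:  ", naive_variance(samples))
print("shifted:", shifted_variance(samples, u=1e9))
```

With these inputs the shifted version recovers 30, while the naive one
cannot, because every intermediate value it subtracts has already been
rounded to a multiple of 128.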
You can guard against this to some extent by using IEEE double or
extended for computing the sum of the squares, but that only postpones
the point where you run out of significance; it doesn't prevent it.
--
Robert I. Eachus
"The war on terror is a different kind of war, waged capture by capture,
cell by cell, and victory by victory. Our security is assured by our
perseverance and by our sure belief in the success of liberty." --
George W. Bush