comp.lang.ada
 help / color / mirror / Atom feed
* OT: Incremental statistics functions
@ 2004-01-26 22:59 Mats Karlssohn
  2004-01-27  1:50 ` tmoran
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Mats Karlssohn @ 2004-01-26 22:59 UTC (permalink / raw)


Hi

I'm trying to find algorithms to calculate standard-deviation and variance
_without_ keeping a buffer of the samples. I'm not really shure if I'm right
to call them incremental algorithms but I couldn't find a better expression.
Basically I don't want to keep a (limited) buffer of samples but would like
to add the values one at a time when they are calculated.

Probably I googled bad, since I didn't find anything I could understand.

Any suggestions please?

-- 
Mats Karlssohn (SM5TFX), sm5tfx@telia.com
"Without mistakes there is no forgiving, without forgiving there is no love"



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: OT: Incremental statistics functions
  2004-01-26 22:59 OT: Incremental statistics functions Mats Karlssohn
@ 2004-01-27  1:50 ` tmoran
  2004-01-27  2:13 ` Stephen Leake
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: tmoran @ 2004-01-27  1:50 UTC (permalink / raw)


>I'm trying to find algorithms to calculate standard-deviation and variance
>_without_ keeping a buffer of the samples. I'm not really shure if I'm right
  I don't think I understand what you want.  You need only keep the
current count, sum, and sum of squares, as long as N is small and loss of
accuracy when adding a small number to a large sum isn't a problem.  If
that's all you need, I can send you my q&d regression package with spec:

package Regress is
  type Accumulator_Type is private;

  procedure Reset(Acc : in out Accumulator_Type);

  procedure Datum(X, Y : in     Float;
                  Acc  : in out Accumulator_Type);

  function N(Acc : Accumulator_Type) return Natural;

  function Avg_X(Acc : Accumulator_Type) return Float;

  function Avg_Y(Acc : Accumulator_Type) return Float;

  function Sd_X(Acc : Accumulator_Type) return Float;

  function Sd_Y(Acc : Accumulator_Type) return Float;

  function R(Acc : Accumulator_Type) return Float;

  function A(Acc : Accumulator_Type) return Float;

  function B(Acc : Accumulator_Type) return Float;
private
  type Accumulator_Type is record
    N : Natural := 0;
    Sumx, Sumy, Sumxy, Sumx2, Sumy2 : Float := 0.0;
  end record;
end Regress;



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: OT: Incremental statistics functions
  2004-01-26 22:59 OT: Incremental statistics functions Mats Karlssohn
  2004-01-27  1:50 ` tmoran
@ 2004-01-27  2:13 ` Stephen Leake
  2004-01-27  3:37 ` Robert I. Eachus
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Stephen Leake @ 2004-01-27  2:13 UTC (permalink / raw)
  To: comp.lang.ada

Mats Karlssohn <sm5tfx@telia.com> writes:

> I'm trying to find algorithms to calculate standard-deviation and variance
> _without_ keeping a buffer of the samples. I'm not really shure if I'm right
> to call them incremental algorithms but I couldn't find a better expression.
> Basically I don't want to keep a (limited) buffer of samples but would like
> to add the values one at a time when they are calculated.

I'm not sure what you mean by "variance", but give SAL a try:

http://www.toadmail.com/~ada_wizard/ada/sal.html

The package sal.gen_math.gen_stats accumulates mean, min, max, and
standard deviation one item at a time.

-- 
-- Stephe




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: OT: Incremental statistics functions
  2004-01-26 22:59 OT: Incremental statistics functions Mats Karlssohn
  2004-01-27  1:50 ` tmoran
  2004-01-27  2:13 ` Stephen Leake
@ 2004-01-27  3:37 ` Robert I. Eachus
  2004-01-27  4:56   ` tmoran
  2004-01-28  0:22   ` tmoran
  2004-01-27  3:39 ` Steve
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 12+ messages in thread
From: Robert I. Eachus @ 2004-01-27  3:37 UTC (permalink / raw)


Mats Karlssohn wrote:

> Basically I don't want to keep a (limited) buffer of samples but would like
> to add the values one at a time when they are calculated.
> 
> Probably I googled bad, since I didn't find anything I could understand.
> 
> Any suggestions please?

Yes, be very, very careful.  The problem/issue is that for the general 
case computing the mean and then the standard deviation has very good 
mathematical properties.  Computing them incrementally does not.  If you 
have an 'expected' value for the mean (call it u) you can use it to 
accumulate (Xi-u)**2 and if u is 'close' to the mean, the numerical 
characteristics will be fine.  (To put that technically, if your 
estimate u, is less than half a standard deviation from the sample mean 
you should have nothing to worry about.)

To put all this in perspective, say you are monitoring daily low 
temperatures in New Hampshire in January.  There is no problem, the day 
to day differences are larger than difference between zero and the 
average.  Try the same thing in Iraq in July, and you won't do as well. 
  Even if you are monitoring low temperatures, eventually the difference 
between the sum of the squares and n times x-bar squared will be much 
smaller than those two numbers. And if you are using floating point, 
that ratio determines how much significance you have lost.  Your 
estimate of the variance or standard deviation may have just a few 
significant bits (one significant digit) or worse, no significant bits 
or digits.

You can guard against this to some extent by using IEEE double or 
extended for computing the sum of the squares, but that only postpones 
when you run out of significance, it doesn't prevent it.


-- 
                                           Robert I. Eachus

"The war on terror is a different kind of war, waged capture by capture, 
cell by cell, and victory by victory. Our security is assured by our 
perseverance and by our sure belief in the success of liberty." -- 
George W. Bush




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Incremental statistics functions
  2004-01-26 22:59 OT: Incremental statistics functions Mats Karlssohn
                   ` (2 preceding siblings ...)
  2004-01-27  3:37 ` Robert I. Eachus
@ 2004-01-27  3:39 ` Steve
  2004-01-27 16:22   ` Robert I. Eachus
  2004-01-27 15:48 ` Joachim Schr�er
  2004-01-27 23:44 ` OT: " Mats Karlssohn
  5 siblings, 1 reply; 12+ messages in thread
From: Steve @ 2004-01-27  3:39 UTC (permalink / raw)


An incremental algorithm for standard deviation is easy.  An incremental
algorithm for variance (as far as I know) is not possibile.  It is my
understanding that standard deviation is really an approximation of variance
that is easier to calculate.

As I recall (it's been a while):
  Variance is the average deviation from the average value of a set of
values.

To find variance you must first find the average of all values.  Then you
must take the absolute value of the average of the difference between the
average and each individual value.

  avg = sum of values / number of values
  variance = (sum over i ( abs( value(i) - avg ) )) / number of values

That "abs" in the formula makes things tough for incremental algorithms.

The standard deviation approximates variance as

  std dev = sqrt( sum of i(( value(i) - avg)^2) )

Which transforms to:

  std dev = sqrt( (avg of value^2) - (avg of value)^2 )

To calculate an incremental standard deviation you keep track of:
   the sum of all values
   the sum of the squares of values
   the number of values
Then do a straight calculation after each new value is added.

I hope this helps,
Steve
(The Duck)


"Mats Karlssohn" <sm5tfx@telia.com> wrote in message
news:86k73e9uzk.fsf@lucretia.kaos...
> Hi
>
> I'm trying to find algorithms to calculate standard-deviation and variance
> _without_ keeping a buffer of the samples. I'm not really shure if I'm
right
> to call them incremental algorithms but I couldn't find a better
expression.
> Basically I don't want to keep a (limited) buffer of samples but would
like
> to add the values one at a time when they are calculated.
>
> Probably I googled bad, since I didn't find anything I could understand.
>
> Any suggestions please?
>
> -- 
> Mats Karlssohn (SM5TFX), sm5tfx@telia.com
> "Without mistakes there is no forgiving, without forgiving there is no
love"





^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: OT: Incremental statistics functions
  2004-01-27  3:37 ` Robert I. Eachus
@ 2004-01-27  4:56   ` tmoran
  2004-01-28  0:22   ` tmoran
  1 sibling, 0 replies; 12+ messages in thread
From: tmoran @ 2004-01-27  4:56 UTC (permalink / raw)


>Yes, be very, very careful.  The problem/issue is that for the general
>case computing the mean and then the standard deviation has very good
>mathematical properties.  Computing them incrementally does not.  If you
>have an 'expected' value for the mean (call it u) you can use it to
>accumulate (Xi-u)**2 and if u is 'close' to the mean, the numerical
  If your incoming data is random, you could use the average of the first
bunch of values as u.  Or if N is large and the incoming data is really
random, you might want to act like a pollster and just take a subsample.
If the data has a pattern, it might be worth using other Monte Carlo
techniques.
  Even the average can be a problem if N is large.  Say you use 6 decimal
digit floating point for the running sum and the first million values are
all 10.0, followed by a million 9.0s so the correct average is 9.5 After
adding the million 10.0s your floating sum will lose its units position
and all the additions of 9.0s will change nothing.  So after all the data
is in, the sum is ten million, divided by N of 2 million, gives you a
value of 5.0, not the correct 9.5
  If you are just calculating the average and variance (standard deviation
squared) of the heights of people in your office, all these considerations
are irrelevant and you probably don't need a statistician. :)



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Incremental statistics functions
  2004-01-26 22:59 OT: Incremental statistics functions Mats Karlssohn
                   ` (3 preceding siblings ...)
  2004-01-27  3:39 ` Steve
@ 2004-01-27 15:48 ` Joachim Schr�er
  2004-01-28  0:22   ` tmoran
  2004-01-27 23:44 ` OT: " Mats Karlssohn
  5 siblings, 1 reply; 12+ messages in thread
From: Joachim Schr�er @ 2004-01-27 15:48 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 29558 bytes --]

Maybe the following package is of some help:

--$Id: math-generic_numerics-generic_statistics.ads,v 1.2 2003/07/11
08:50:32 schroeer Exp $
--==========================================================================
===
--             Copyright (C) EADS Dornier GmbH, Friedrichshafen
--==========================================================================
===
--
-- Filename         : math-generic_numerics-generic_statistics.ads
-- Type             : package spec
-- Language         : Ada 95
-- Author           : J. Schr�er
-- Created On       : 28-Mai-2003 <13:50>
-- Last Modified By : $Author: schroeer $
-- Last Modified On : $Date: 2003/07/11 08:50:32 $
-- Update Count     : $Revision: 1.2 $
-- Status           :
--==========================================================================
===
-- Description      : Package for simple statistics and error calculation.

with Ada.Strings.Unbounded;
with Ada.Numerics.Generic_Elementary_Functions;
with Ada.Text_Io;
with Interfaces;
with Math.Generic_Arrays;

generic
  with package Text_Io is new Ada.Text_Io.Float_Io(Float_Type);

  with package Elementary_Functions is new
    Ada.Numerics.Generic_Elementary_Functions(Float_Type);

  with package Arrays is new Math.Generic_Arrays
              (Element_Type => Float_Type, Zero => 0.0);

package Math.Generic_Numerics.Generic_Statistics is

  -- Types and procedures for calculation of some statistic parameters.
  --
  -- See e.g.: J.S. Bendat and A.G. Piersol,
  --           Engineering Applications of Correlation and Spectral
Analysis.
  --           Paragraph 2.2.3 Data Averaging Operations.
  --
  -- x(t)     : "signal" to calculate statistics parameters of.
  -- x(ti)    : sample at "time" ti, (ti in general an independent
variable).
  -- Mean     = E[x(t)] = lim(N->inf) 1/N Sum(i=1..N) x(ti)
  -- Variance = E[(x(ti)-Mean)**2] = lim(N->inf) 1/N Sum(i=1..N)
(x(ti)-Mean)**2
  -- Rms      = Sqrt(Variance)
  --
  -- In general t must not be the time. You may also analyze one or two
dimensional
  -- arrays of data. In the latter case you have x(t1,t2).

  type Index_Array_Type is array(1 .. 2) of Integer;
  subtype Delta_Array_Type is Arrays.Vector(Index_Array_Type'range);

  type Statistics_Type is record
    Name        : Ada.Strings.Unbounded.String_Access;
    -- For documentation in put.
    Min         : Float_type := Float_type'Last;
    Max         : Float_type := Float_type'First;
    Min_Index   : Interfaces.Unsigned_64 := 0;
    Max_Index   : Interfaces.Unsigned_64 := 0;
    Difference  : Float_type := 0.0; -- Max - Min
    -- Indices incremented in each version of Update below, starting with 0.
    Mean        : Float_type := 0.0;
    Rms         : Float_Type := 0.0;
    Count       : Interfaces.Unsigned_64 := 0;  -- Number of samples.
    Float_Count : Float_type := 0.0; -- used if count overflows.
    Min_Indices,
    Max_Indices : Index_Array_Type := (others => Integer'First);
    -- See different Update procedures below.
    -- If Update with parameter Values : vector is called then
    -- Min_Indices(1) is the real index of the minimum value in values,
    -- dito. Max_Indices(1). If Update with parameter Values : matrix is
called
    -- then Min_Indices(1..2) and Max_Indices(1..2) are the row/col indices
of the
    -- min and max values.
    Deltas      : Delta_Array_Type := (others => 0.0);
    -- Used in put to calculate values of t (or t1, t2 in case of 2-dim data
x)
    -- for min_index/max_index. Minimum of x(t) at t=Deltas(1)*Min_Index(1).
    -- Minimum of x(t1,t2) at t1=Deltas(1)*Min_Index(1),
t2=Deltas(2)*Min_Index(2).
    -- Dito Maximum.
  end record;

  --------------------------------------------------------------------------
---
  -- Data types for simple error analysis.
  -- Used to statistically investigate the precision of algorithms.
  -- A "correct" Value is compared to the algorithm output reference Ref
  -- See update procedure below.
  -- Absolute: Statistic parameters of "signal"   x = |Value - Ref|
  -- Relative: Statistic parameters of "signal"   x = |Value - Ref| / Value
  --           (Only used when Value /= 0)

  type String_Array is array(Positive range <>) of
Ada.Strings.Unbounded.String_Access;

  No_String_Array : constant String_Array(1 .. 0) := (others => null);

  type Statistics_Array is array(Positive range <>) of Statistics_Type;

  type Error_Type is record
    Absolute : Statistics_Type;
    Relative : Statistics_Type;
  end record;

  type Error_Array  is array(Positive range <>) of Error_Type;

  --------------------------------------------------------------------------
---

  procedure Initialize(Item   : in out Statistics_Type;
                       Name   : in     String := "";
                       Deltas : in     Delta_Array_Type := (others => 0.0));

  -- Resets Item components to values as given in type decl.
  -- Item.Name is only altered if Name /= ""

  function Statistics_With(Name   : in String := "";
                           Deltas : in Delta_Array_Type := (others => 0.0))
    return Statistics_Type;

  -- Returns initialized Statistics_Type.

  No_Index   : constant Integer := Integer'First;


  procedure Update(Item    : in out Statistics_Type;
                   Value   : in     Float_Type;
                   Index1,
                   Index2  : in     Integer := No_Index;
                   Rms     : in     Boolean := True);

  -- Calculate statistic parameters only from the current value xi without
storing
  -- the whole data array.
  -- Increments count, calculates min,max,mean,rms if Rms=True.
  --
  -- Iteration formulas:
  -- Mean = (Mean * (count - 1) + Value) / count
  -- Rms  = (Rms**2 * (count - 1) + (value - mean) ** 2) / count
  --
  -- Rms only approximated because it is calculated with the actual Mean.
  -- (The Mean calculated from x1 .. xi, xi = value.
  --  The correct Mean value is known after the last call).
  --
  -- If you want the correct values of t for min(x(t)), max(x(t)) give
Deltas in
  -- Initialize and/or Statistics_With and give Index1 ( and Index2 in case
x(t1,t2) ).


  procedure Update(Item    : in out Statistics_Type;
                   Values  : in     Arrays.Vector;
                   Rms     : in     Boolean := True);

  -- Increments count, calculates
min,max,mean,rms,min_indices(1),max_indices(1).
  -- Rms correct.

  procedure Update(Item   : in out Statistics_Type;
                   Values : in     Arrays.Matrix;
                   Rms    : in     Boolean := True);

  -- Increments count, calculates
min,max,mean,rms,min_indices(1..2),max_indices(1..2).
  -- Rms correct.

  procedure Put(File : in Ada.Text_Io.File_Type;
                Item : in Statistics_Type;
                Aft  : in Ada.Text_Io.Field := Float_type'Digits - 1;
                Exp  : in Ada.Text_Io.Field := 3;
                Rms  : in Boolean           := True);

  -- Writes item to file.

  --------------------------------------------------------------------------
---
  -- Versions for vector "signals" x.

  procedure Initialize(Item   : in out Statistics_Array;
                       Names  : in     String_Array     := No_String_Array;
                       Deltas : in     Delta_Array_Type := (others => 0.0));

  -- Resets Item components to values as given in type decl.
  -- If Names = No_String_Array:
  -- Generated Names: Item(I).Name = Name_I.
  -- Deltas used for all components of array.

  function Statistics_With(Names  : in String_Array;
                           Deltas : in Delta_Array_Type := (others => 0.0))
    return Statistics_Array;

  -- Returns initialized Statistics_array.

  procedure Update(Item    : in out Statistics_Array;
                   Value   : in     Arrays.Vector;
                   Index1,
                   Index2  : in     Integer := No_Index;
                   Rms     : in     Boolean := True);

  -- Increments count, calculates min,max,mean,rms.

  procedure Put(File : in Ada.Text_Io.File_Type;
                Item : in Statistics_Array;
                Aft  : in Ada.Text_Io.Field := Float_type'Digits - 1;
                Exp  : in Ada.Text_Io.Field := 3;
                Rms  : in Boolean           := True);

  -- Writes item to file.

  --------------------------------------------------------------------------
---
  -- Procedures for simple error analysis.

  procedure Initialize(Item   : in out Error_Type;
                       Name   : in     String := "";
                       Deltas : in     Delta_Array_Type := (others => 0.0));

  -- Resets Item components to values as given in type decl.

  function Error_With(Name   : in String := "";
                      Deltas : in Delta_Array_Type := (others => 0.0))
    return Error_Type;

  -- Returns initialized error_type.

  procedure Update(Item    : in out Error_Type;
                   Value,
                   Ref     : in     Float_type;
                   Index1,
                   Index2  : in     Integer := No_Index);

  -- Increments count, calculates min,max,mean,rms of abs and relative
deviation
  -- of Value and Ref.

  procedure Put(File : in Ada.Text_Io.File_Type;
                Item : in Error_Type;
                Aft  : in Ada.Text_Io.Field := Float_type'Digits - 1;
                Exp  : in Ada.Text_Io.Field := 3);

  -- Writes item to file.

  --------------------------------------------------------------------------
---
  -- Versions for vector "signals" x.

  procedure Initialize(Item   : in out Error_Array;
                       Names  : in     String_Array     := No_String_Array;
                       Deltas : in     Delta_Array_Type := (others => 0.0));

  -- Resets Item components to values as given in type decl.

  function Error_With(Names  : in String_Array;
                      Deltas : in Delta_Array_Type := (others => 0.0))
    return Error_Array;

  -- Returns initialized error_array.

  procedure Update(Item    : in out Error_Array;
                   Value,
                   Ref     : in     Arrays.Vector;
                   Index1,
                   Index2  : in     Integer := No_Index);

  -- Increments count, calculates min,max,mean,rms.

  procedure Put(File : in Ada.Text_Io.File_Type;
                Item : in Error_Array;
                Aft  : in Ada.Text_Io.Field := Float_type'Digits - 1;
                Exp  : in Ada.Text_Io.Field := 3);

  -- Writes item to file.

  --------------------------------------------------------------------------
---
  -- For testing: To generate adjacent model-numbers around a critical
number.

  type Adjacent_Numbers_Array is array(Positive range <>) of
Arrays.Vector_Access;

  function Adjacent_Numbers(Number : in Float_type;
                            Count  : in Positive := 1) return Arrays.Vector;

  -- Returns an array with 2*count+1 values.
  -- (.., float_type'pred(number), number, float_type'succ(number), ..).

  function Adjacent_Numbers(Numbers : in Arrays.Vector;
                            Count   : in Positive := 1)
    return Adjacent_Numbers_Array;

  -- Returns (.., float_type'pred(number), number, float_type'succ(number),
..) for all numbers.

end Math.Generic_Numerics.Generic_Statistics;

--======================================================================
--                              -*- Mode: Ada -*-
-- Filename        : math-generic_numerics-generic_statistics.ads
-- Description     : Package for simple statistics calculation
-- Author          : J. Schr�er
-- Created On      : 23.01.2003
-- Last Modified By:
-- Last Modified On:
-- Update Count    : 1
-- Status          :

with Utilities.String_Utilities;
with Utilities.Logging;

package body Math.Generic_Numerics.Generic_Statistics is

  package String_Utilities renames Utilities.String_Utilities;
  package Logging          renames Utilities.Logging;

  use type Interfaces.Unsigned_64;
  use type Arrays.Vector;
  use type Ada.Strings.Unbounded.String_Access;
  use Elementary_Functions;

  --========================================================================
===

  procedure Put(File : in Ada.Text_IO.File_Type;
                Text : in String;
                Item : in Float_Type;
                Fore,
                Aft,
                Exp  : in Ada.Text_io.Field) is
  begin
    Ada.Text_Io.Put(File, Text);
    Ada.Text_Io.Set_Col(File, 10);
    Ada.Text_Io.Put(File, ":");
    Text_Io.Put(File, Item, Fore, Aft, Exp);
    Ada.Text_Io.New_Line(File);
  end Put;

  --========================================================================
===

  procedure Initialize(Item   : in out Statistics_Type;
                       Name   : in     String := "";
                       Deltas : in     Delta_Array_Type := (others => 0.0))
is
  begin
    Item := (Name        => null,
             Min         => Float_Type'Last,
             Max         => Float_Type'First,
             Difference  => 0.0,
             Min_Index   => 0,
             Max_Index   => 0,
             Mean        => 0.0,
             Rms         => 0.0,
             Count       => 0,
             Float_Count => 0.0,
             Min_Indices |
             Max_Indices => (others => Integer'First),
             Deltas      => Item.Deltas);

    if Name /= "" then
      String_Utilities.Allocate(Item.Name, 1, Name'Length);
      Item.Name.All := Name;
    end if;
    if Deltas /= (Deltas'range => 0.0) then
      Item.Deltas := Deltas;
    end if;
  end Initialize;

  --========================================================================
===

  function Statistics_With(Name   : in String := "";
                           Deltas : in Delta_Array_Type := (others => 0.0))
    return Statistics_Type is

    Item : Statistics_Type;
  begin
    Initialize(Item, Name, Deltas);
    return Item;
  end Statistics_With;

  --========================================================================
===

  procedure Update(Item   : in out Statistics_Type;
                   Value  : in     Float_Type;
                   Index1,
                   Index2 : in     Integer := No_Index;
                   Rms    : in     Boolean := True) is
    ------------------------------------------------------------------------
---
    function Average_Of(Average,
                        Value : in Float_Type;
                        Count : in Interfaces.Unsigned_64) return Float_Type
is
    begin
      return (Average * Float_Type(Count - 1) + Value) / Float_Type(Count);
    end Average_Of;
    ------------------------------------------------------------------------
---
--    function Average_Of(Average, Value, Count : in Float_Type) return
Float_Type is
--    begin
--      return (Average * (Count - 1.0) + Value) / Count;
--    end Average_Of;
    ------------------------------------------------------------------------
---
    Min_Max_Altered : Boolean := False;
  begin
--    if Item.Count = Interfaces.Unsigned_64'Last then
--      Item.Float_Count := Item.Float_Count + 1.0;
--    else
    Item.Count := Item.Count + 1;
    --    end if;

    -- Assign also No_Index values. Check done in Put below.
    if Value < Item.Min then
      Item.Min            := Value;
      Item.Min_Index      := Item.Count;
      Item.Min_Indices(1) := Index1;
      Item.Min_Indices(2) := Index2;
      Min_Max_Altered     := True;
    end if;

    if Value > Item.Max then
      Item.Max            := Value;
      Item.Max_Index      := Item.Count;
      Item.Max_Indices(1) := Index1;
      Item.Max_Indices(2) := Index2;
      Min_Max_Altered     := True;
    end if;

    if Min_Max_Altered then
      Item.Difference := Item.Max - Item.Min;
    end if;

    Item.Mean  := Average_Of(Average => Item.Mean,
                             Value   => Value,
                             Count   => Item.Count);

    if Rms then
      Item.Rms := Sqrt(Average_Of(Average => Item.Rms ** 2,
                                  Value   => (Item.Mean - Value) ** 2,
                                  Count   => Item.Count));
    end if;
  end Update;

  --========================================================================
===

  procedure Update(Item   : in out Statistics_Type;
                   Values : in     Arrays.Vector;
                   Rms    : in     Boolean := True) is
  begin
    if Values'Length = 0 then
      return;
    end if;

    for I in Values'range loop
      Item.Count := Item.Count + 1;
      Item.Mean  := Item.Mean + Values(I);

      if Values(I) < Item.Min then
        Item.Min            := Values(I);
        Item.Min_Index      := Item.Count;
        Item.Min_Indices(1) := I;
      end if;

      if Values(I) > Item.Max then
        Item.Max            := Values(I);
        Item.Max_Index      := Item.Count;
        Item.Max_Indices(1) := I;
      end if;
    end loop;

    Item.Difference := Item.Max - Item.Min;
    Item.Mean       := Item.Mean / Float_Type(Item.Count);

    if Rms then
      for I in Values'range loop
        Item.Rms := Item.Rms + (Item.Mean - Values(I)) ** 2;
      end loop;

      Item.Rms := Sqrt(Item.Rms / Float_Type(Item.Count));
    end if;
  end Update;

  --========================================================================
===

  procedure Update(Item    : in out Statistics_Type;
                   Values  : in     Arrays.Matrix;
                   Rms     : in     Boolean := True) is
  begin
    if Values'Length(1) = 0 or else Values'Length(2) = 0 then
      return;
    end if;

    for I in Values'range(1) loop
      for J in Values'range(2) loop
        Item.Count := Item.Count + 1;
        Item.Mean  := Item.Mean + Values(I, J);

        if Values(I, J) < Item.Min then
          Item.Min            := Values(I, J);
          Item.Min_Index      := Item.Count;
          Item.Min_Indices(1) := I;
          Item.Min_Indices(2) := J;
        end if;

        if Values(I, J) > Item.Max then
          Item.Max            := Values(I, J);
          Item.Max_Index      := Item.Count;
          Item.Max_Indices(1) := I;
          Item.Max_Indices(2) := J;
        end if;
      end loop;
    end loop;

    Item.Difference := Item.Max - Item.Min;
    Item.Mean       := Item.Mean / Float_Type(Item.Count);

    if Rms then
      for I in Values'range(1) loop
        for J in Values'range(2) loop
          Item.Rms := Item.Rms + (Item.Mean - Values(I, J)) ** 2;
        end loop;
      end loop;

      Item.Rms := Sqrt(Item.Rms / Float_Type(Item.Count));
    end if;
  end Update;

  --========================================================================
===

  procedure Put(File : in Ada.Text_Io.File_Type;
                Item : in Statistics_Type;
                Aft  : in Ada.Text_Io.Field := Float_Type'Digits - 1;
                Exp  : in Ada.Text_Io.Field := 3;
                Rms  : in Boolean           := True) is

    Fore : Ada.Text_Io.Field := Text_Io.Default_Fore;
    Dim  : Natural := 0;
    ------------------------------------------------------------------------
---
    procedure Put(Index : in Interfaces.Unsigned_64;
                  Text  : in String) is
    begin
      Ada.Text_Io.Put(File, Text);
      Ada.Text_Io.Put(File, Interfaces.Unsigned_64'Image(Index));
      if Item.Deltas(1) /= 0.0 then
        Ada.Text_Io.Put(File, " -> ");
        Text_Io.Put(File, Float_Type(Index) * Item.Deltas(1),
                    Fore, Aft, Exp);
      end if;
      Ada.Text_Io.New_Line(File);
    end Put;
    ------------------------------------------------------------------------
---
    procedure Put(Indices : in Index_Array_Type;
                  Text    : in String) is
    begin
      Ada.Text_Io.Put(File, Text);
      for I in 1 .. Dim loop
        Ada.Text_Io.Put(File, Integer'Image(Indices(I)));

        if Item.Deltas(I) /= 0.0 then
          Ada.Text_Io.Put(File, " -> ");
          Text_Io.Put(File, Float_Type(Indices(I)) * Item.Deltas(I),
                      Fore, Aft, Exp);
        end if;

        if I < Dim then
          Ada.Text_Io.Put(File, ",");
        end if;
      end loop;
      Ada.Text_Io.New_Line(File);
    end Put;
    ------------------------------------------------------------------------
---
  begin
    if Exp = 0 then
      Fore := Float_Type'digits - 1;
    end if;

    if Item.Name /= null then
      Ada.Text_Io.Put_Line(File,  Item.Name.All);
    end if;

    Put(File, "Min", Item.Min,  Fore, Aft, Exp);
    Put(File, "Max", Item.Max,  Fore, Aft, Exp);

    Put(Item.Min_Index, "Min_Index:");
    Put(Item.Max_Index, "Max_Index:");

    Put(File, "Diff",Item.Difference, Fore, Aft, Exp);
    Put(File, "Mean",Item.Mean,       Fore, Aft, Exp);

    if Rms then
      Put(File, "Rms", Item.Rms, Fore, Aft, Exp);
    end if;

    Ada.Text_Io.Put_Line(File, "Count    :" &
      Interfaces.Unsigned_64'Image(Item.Count));

    if Item.Float_Count > 0.0 then
      Put(File, "Float_Count", Item.Float_Count, Fore, Aft, Exp);
    end if;

    for I in reverse Item.Min_Indices'range loop
      if Item.Min_Indices(1) /= No_Index then
        Dim := I;
        exit;
      end if;
    end loop;

    if Dim > 0 then
      Ada.Text_Io.Put_Line(File, "Min-Max-Indices in array");
      Put(Item.Min_Indices, "Min_Index:");
      Put(Item.Max_Indices, "Max_Index:");
    end if;
  end Put;

  --========================================================================
===

  procedure Initialize(Item   : in out Statistics_Array;
                       Names  : in     String_Array     := No_String_Array;
                       Deltas : in     Delta_Array_Type := (others => 0.0))
is
  begin
    for I in Item'range loop
      if I in Names'range and then Names(I) /= null then
        Initialize(Item(I), Names(I).all, Deltas);
      else
        Initialize(Item(I), Natural'Image(I), Deltas);
      end if;
    end loop;
  end Initialize;

  --========================================================================
===

  function Statistics_With(Names  : in String_Array;
                           Deltas : in Delta_Array_Type := (others => 0.0))
    return Statistics_Array is

    Item : Statistics_Array(Names'range);
  begin
    Initialize(Item, Names, Deltas);
    return Item;
  end Statistics_With;

  --========================================================================
===

  procedure Update(Item    : in out Statistics_Array;
                   Value   : in     Arrays.Vector;
                   Index1,
                   Index2  : in     Integer := No_Index;
                   Rms     : in     Boolean := True) is
  begin
    for I in Item'range loop
      Update(Item(I), Value(I), Index1, Index2, Rms);
    end loop;
  end Update;

  --========================================================================
===

  procedure Put(File : in Ada.Text_Io.File_Type;
                Item : in Statistics_Array;
                Aft  : in Ada.Text_Io.Field := Float_Type'Digits - 1;
                Exp  : in Ada.Text_Io.Field := 3;
                Rms  : in Boolean           := True) is
  begin
    for I in Item'range loop
      Put(File, Item(I), Aft, Exp, Rms);
      Ada.Text_Io.Put_Line(File,
"-------------------------------------------");
    end loop;
  end Put;

  --========================================================================
===

  procedure Initialize(Item   : in out Error_Type;
                       Name   : in     String := "";
                       Deltas : in     Delta_Array_Type := (others => 0.0))
is
  begin
    Initialize(Item.Absolute, "abs-Error " & Name, Deltas);
    Initialize(Item.Relative, "rel-Error " & Name, Deltas);
  end Initialize;

  --========================================================================
===

  function Error_With(Name   : in String := "";
                      Deltas : in Delta_Array_Type := (others => 0.0))
    return Error_Type is

    Item : Error_Type;
  begin
    Initialize(Item, Name, Deltas);
    return Item;
  end Error_With;

  --========================================================================
===

  procedure Update(Item    : in out Error_Type;
                   Value,
                   Ref     : in     Float_Type;
                   Index1,
                   Index2  : in     Integer := No_Index) is

    Err : constant Float_Type := abs(Value - Ref);
    Rel : Float_Type;
    ------------------------------------------------------------------------
---
    procedure Put(File : in Ada.Text_Io.File_Type) is
    begin
      Ada.Text_Io.New_Line(File);
      Ada.Text_Io.Put_Line(File,
        "*** Statistics.Update, Abs_Err=|Value-Ref|,
Rel_Err=Abs_Err/|Value|, Value = 0");
      Ada.Text_Io.Put(File, "Ref = " & Float_Type'Image(Ref));
      if Index1 /= No_Index then
        Ada.Text_Io.Put(File, ", Index1=" & Integer'Image(Index1));
        if Index2 /= No_Index then
          Ada.Text_Io.Put(File, ", Index2=" & Integer'Image(Index2));
        end if;
      end if;
      Ada.Text_Io.New_Line(File);
    end Put;
    ------------------------------------------------------------------------
---
  begin
    Update(Item.Absolute, Err, Index1, Index2);

    if Value /= 0.0 then
      Rel := Err / abs Value;
      Update(Item.Relative, Rel, Index1, Index2);
    elsif Ada.Text_Io.Is_Open(Logging.Log_File) then
      Put(Logging.Log_File);
    else
      Put(Ada.Text_Io.Standard_Output);
    end if;
  end Update;

  --========================================================================
===

  procedure Put(File : in Ada.Text_Io.File_Type;
                Item : in Error_Type;
                Aft  : in Ada.Text_Io.Field := Float_Type'Digits - 1;
                Exp  : in Ada.Text_Io.Field := 3) is

    Fore : Ada.Text_Io.Field := Text_Io.Default_Fore;
  begin
    if Exp = 0 then
      Fore := Float_Type'digits - 1;
    end if;
    Put(File, Item.Absolute, Aft, Exp);
    Ada.Text_Io.New_Line(File);
    Put(File, Item.Relative, Aft, Exp);
  end Put;

  --========================================================================
===

  procedure Initialize(Item   : in out Error_Array;
                       Names  : in     String_Array     := No_String_Array;
                       Deltas : in     Delta_Array_Type := (others => 0.0))
is
  begin
    for I in Item'range loop
      if I in Names'range and then Names(I) /= null then
        Initialize(Item(I), Names(I).all, Deltas);
      else
        Initialize(Item(I), Natural'Image(I), Deltas);
      end if;
    end loop;
  end Initialize;

  --========================================================================
===

  function Error_With(Names  : in String_Array;
                      Deltas : in Delta_Array_Type := (others => 0.0))
    return Error_Array is

    Item : Error_Array(Names'Range);
  begin
    Initialize(Item, Names, Deltas);
    return Item;
  end Error_With;

  --========================================================================
===

  procedure Update(Item    : in out Error_Array;
                   Value,
                   Ref     : in     Arrays.Vector;
                   Index1,
                   Index2  : in     Integer := No_Index) is
  begin
    for I in Item'range loop
      Update(Item(I), Value(I), Ref(I), Index1, Index2);
    end loop;
  end Update;

  --========================================================================
===

  procedure Put(File : in Ada.Text_Io.File_Type;
                Item : in Error_Array;
                Aft  : in Ada.Text_Io.Field := Float_Type'Digits - 1;
                Exp  : in Ada.Text_Io.Field := 3) is
  begin
    for I in Item'range loop
      Ada.Text_Io.Put_Line(File,
"-------------------------------------------");
      Put(File, Item(I), Aft, Exp);
    end loop;
    Ada.Text_Io.Put_Line(File,
"-------------------------------------------");
  end Put;

  --========================================================================
===

  function Adjacent_Numbers(Number : in Float_Type;
                            Count  : in Positive := 1) return Arrays.Vector
is

    A : Arrays.Vector(-Count .. Count);
  begin
    A(0) := Number;
    for I in reverse -Count .. -1 loop
      A(I) := Float_Type'Pred(A(I + 1));
    end loop;
    for I in 1 .. Count  loop
      A(I) := Float_Type'Succ(A(I - 1));
    end loop;

    return A;
  end Adjacent_Numbers;

  --========================================================================
===

  function Adjacent_Numbers(Numbers : in Arrays.Vector;
                            Count   : in Positive := 1)
    return Adjacent_Numbers_Array is

    A : Adjacent_Numbers_Array(1 .. Numbers'Length);
  begin
    for I in A'range loop
      A(I) := new Arrays.Vector'
              (Adjacent_Numbers(Number => Numbers(Numbers'First + I - 1),
                                Count  => Count));
    end loop;

    return A;
  end Adjacent_Numbers;

  --========================================================================
===

end Math.Generic_Numerics.Generic_Statistics;

"Mats Karlssohn" <sm5tfx@telia.com> schrieb im Newsbeitrag
news:86k73e9uzk.fsf@lucretia.kaos...
> Hi
>
> I'm trying to find algorithms to calculate standard-deviation and variance
> _without_ keeping a buffer of the samples. I'm not really shure if I'm
right
> to call them incremental algorithms but I couldn't find a better
expression.
> Basically I don't want to keep a (limited) buffer of samples but would
like
> to add the values one at a time when they are calculated.
>
> Probably I googled bad, since I didn't find anything I could understand.
>
> Any suggestions please?
>
> --
> Mats Karlssohn (SM5TFX), sm5tfx@telia.com
> "Without mistakes there is no forgiving, without forgiving there is no
love"





^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Incremental statistics functions
  2004-01-27  3:39 ` Steve
@ 2004-01-27 16:22   ` Robert I. Eachus
  0 siblings, 0 replies; 12+ messages in thread
From: Robert I. Eachus @ 2004-01-27 16:22 UTC (permalink / raw)


Steve wrote:

> An incremental algorithm for standard deviation is easy.  An incremental
> algorithm for variance (as far as I know) is not possibile.  It is my
> understanding that standard deviation is really an approximation of variance
> that is easier to calculate.

No the variance is the square of the standard deviation, and the 
standard deviation is the (positive) square root of the variance. 
However, note that there are two ways to calculate the variance/standard 
deviation and only one is right.  If you are calculating the variance of 
a complete set of values, or using an externally defined mean, you 
divide by the number of values.  But if you are using a sample, and 
estimating the mean from the sample you divide by n-1. (This says you 
can't estimate the variance from one sample, which is as it should be.)


-- 
                                           Robert I. Eachus

"The war on terror is a different kind of war, waged capture by capture, 
cell by cell, and victory by victory. Our security is assured by our 
perseverance and by our sure belief in the success of liberty." -- 
George W. Bush




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: OT: Incremental statistics functions
  2004-01-26 22:59 OT: Incremental statistics functions Mats Karlssohn
                   ` (4 preceding siblings ...)
  2004-01-27 15:48 ` Joachim Schr�er
@ 2004-01-27 23:44 ` Mats Karlssohn
  5 siblings, 0 replies; 12+ messages in thread
From: Mats Karlssohn @ 2004-01-27 23:44 UTC (permalink / raw)


OK, thanks a lot, guys. I'll have to digest all that info for a few hours,
then I'll try to implement it for the 8051-target at hand. I have quite
well behaved data-set with most samples very near the average, and the
nuber of samples is also bound to not become very large. The main problem
is the lack of computational resources, but that I think I'll be able to
handle now.

Thanks again.

-- 
Mats Karlssohn (SM5TFX), sm5tfx@telia.com
"Without mistakes there is no forgiving, without forgiving there is no love"



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: OT: Incremental statistics functions
  2004-01-27  3:37 ` Robert I. Eachus
  2004-01-27  4:56   ` tmoran
@ 2004-01-28  0:22   ` tmoran
  2004-01-28 19:56     ` OT: large sums; was " tmoran
  1 sibling, 1 reply; 12+ messages in thread
From: tmoran @ 2004-01-28  0:22 UTC (permalink / raw)


>  ... eventually the difference
>between the sum of the squares and n times x-bar squared will be much
>smaller than those two numbers. And if you are using floating point,
>that ratio determines how much significance you have lost.
  But if you divide by N before the subtraction, the *average* of the
squares does not grow, nor does the square of the average, so subtracting
one from the other does not lead to increasing inaccuracy.  The problem
arises first in adding a new term to a sum (x**2 to the sum of squares, or
x to the running sum of x).  In the least demanding (non-trivial) case
where the x's are a non-zero constant, this is like adding 1.0 to
float'(N), for a loss of log2(N) bits.  With 4 byte IEEE float, N = N+1.0,
ie, *all* is lost, at N=2**24 or 16 million, so the sum becomes stuck,
though the divisor increases, and the average thus heads toward zero.
Even at N = one million you have only 4 good bits or one solid decimal digit.
  An iterative formula ((count-1)*sum+x)/count of course has exactly the
same problem (plus it's more operations).  But the iterative estimate
at least is stuck at something close to the correct value.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Incremental statistics functions
  2004-01-27 15:48 ` Joachim Schr�er
@ 2004-01-28  0:22   ` tmoran
  0 siblings, 0 replies; 12+ messages in thread
From: tmoran @ 2004-01-28  0:22 UTC (permalink / raw)


I'm curious why the Rms boolean is changeable on each data point, rather
than being set at Initialize?

> -- Rms  = (Rms**2 * (count - 1) + (value - mean) ** 2) / count
  Not very good for small N.  For data 1,0,1 actual std.dev.=0.4714,
this Rms estimate is 0.3469



^ permalink raw reply	[flat|nested] 12+ messages in thread

* OT: large sums; was Incremental statistics functions
  2004-01-28  0:22   ` tmoran
@ 2004-01-28 19:56     ` tmoran
  0 siblings, 0 replies; 12+ messages in thread
From: tmoran @ 2004-01-28 19:56 UTC (permalink / raw)


  If you add many similar numbers with a non-zero average, say in
calulating an average or the sum of squares for a variance, you soon find
that a value is being added to about N* that value, which loses log2(N)
bits in floating point.  For instance, using 4 byte IEEE floating point,
summing 8000 times the number  111111.0, then dividing by 8000.0,
yields an incorrect average of 111105.0

  If you take the sum of the first two values, then the sum of the
next two, and then add those two sums, you get a more accurate
value for the sum of all four.  Building on that, here's an algorithm
for calculating a large sum.  This algorithm is new to me; I would be
interested in seeing references to it.

  type Registers is record
    Full    : Boolean := False;
    Content : Float;
  end record;

  -- To add N values we need log2(N) registers.
  subtype Register_Indices is Integer range 1 .. 64; -- allow 2**64 values

  type Register_Sets is array (Register_Indices range <>) of Registers;

  type Accumulators(N : Register_Indices) is record
    Reg : Register_Sets(1 .. N);
  end record;

  procedure Reset(Acc : in out Accumulators) is
  begin
    Acc.Reg := (others => (Full => False, Content => 0.0));
  end Reset;

  procedure Add(Acc : in out Accumulators;
                X   : in     Float) is
    Partial_Sum: Float := X;
  begin
    for I in Acc.Reg'range loop
      if not Acc.Reg(I).Full then
        Acc.Reg(I).Content := Partial_Sum;
        Acc.Reg(I).Full := True;
        return;
      end if;
      Partial_Sum := Partial_Sum + Acc.Reg(I).Content;
      Acc.Reg(I).Full := False;
    end loop;
    raise Constraint_Error; -- overflow
  end Add;

  function Sum(Acc : Accumulators) return Float is
  -- Return current sum in Acc.
  -- Best performance when N is a power of two, worst when it's a bit larger.
    Result  : Float := 0.0;
  begin
    for I in Acc.Reg'range loop -- start by adding the small values
      if Acc.Reg(I).Full then
        Result := Result + Acc.Reg(I).Content;
      end if;
    end loop;
    return Result;
  end Sum;



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2004-01-28 19:56 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-26 22:59 OT: Incremental statistics functions Mats Karlssohn
2004-01-27  1:50 ` tmoran
2004-01-27  2:13 ` Stephen Leake
2004-01-27  3:37 ` Robert I. Eachus
2004-01-27  4:56   ` tmoran
2004-01-28  0:22   ` tmoran
2004-01-28 19:56     ` OT: large sums; was " tmoran
2004-01-27  3:39 ` Steve
2004-01-27 16:22   ` Robert I. Eachus
2004-01-27 15:48 ` Joachim Schr�er
2004-01-28  0:22   ` tmoran
2004-01-27 23:44 ` OT: " Mats Karlssohn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox