From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,2bfbf78d65c8859c,start
X-Google-Attributes: gid103376,public
From: Mok-kong Shen <mok-kong.shen@stud.uni-muenchen.de>
Subject: random number generation
Date: 1997/12/19
Message-ID: <349AC365.45277E06@stud.uni-muenchen.de>
X-Deja-AN: 308639370
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
Distribution: world
Content-Type: text/plain; charset=us-ascii
Organization: Studenten; LRZ; Universitaet Muenchen
Newsgroups: comp.lang.ada
Date: 1997-12-19T00:00:00+00:00
List-Id: <comp.lang.ada>


Hello,

If you are interested in random number sequences, I like to invite you
to look at a combined pseudo-random number generator designed by me.
The underlying algorithm, which to my knowledge has not appeared
previously in the literature, is in fact extremely simple:

   j:=0; do output(G(j)); j:=trunc(n*G(j)) od;

where G(j) (0<=j<n) are n pseudo-random number generators (with output
in [0,1)) that can be arbitrarily given by the user, i.e. no particular
requirements of statistical qualities are imposed on them. These n
constituent generators can be of any kind; their parameters can be,
in particular, randomly chosen with the help of one additional pseudo-
random number generator.

The basic idea is the following: Each of the n constituent generators
produces an output stream that is, depending on its quality, more or
less random. If one randomly interleaves these streams, then the
result should be more random. In other words, the auto-correlations
that exist in the n individual output streams get lost through the
mixing process since elements that are at distance d (d>=1) apart
from one another in any particular one of the n output streams are no
longer at any fixed distance, say d', apart on the combined output.
Besides, half of the output of each generator is consumed internally
and does not appear as the output of the combined generator. From
the algorithm it can be seen that auto-correlations of elements at
distance d=1 apart of the n individual output streams are totally
lost on the combined output stream.

I found through experiments that the combined generator thus built
behaves, as expected, indeed very satisfactorily in both the frequency
and the serial test. Its qualities very quickly asymptotically improve
with larger n. Since the computing cost pro each output element of the
combined generator is independent of n, meaning that n can be chosen
to be fairly large without penality, I suppose that this scheme could
be very useful for practical applications.

Please give your comments and critiques. A well-documented Fortran 90 
implementation is available on

http://www.stud.uni-muenchen.de/~mok-kong.shen/

See also the attachment below.

------------------------------------------------------------
The following is a fragment taken from an old unpublished report of
mine. The experiment (referred to as 'a small-scale test run' in my Web
document) was done at a time when the PCs were of 20MHZ, hence the test
sequences were short. Later experiments confirmed the superior
statistical behaviour previously found of my algorithm. (Note that the
generation of the constituent generators in the computer program
provided in the Web document is done somewhat differently from what is
described below.)


   In order to quantitatively verify the performance of our algorithm as
predicted from the theoretical considerations above, an experiment was
done on a set of generators that were each internally based on a LCPRNG
of the mixed type. The coefficients of these LCPRNG's were themselves
generated by a random number generator named RANDOM provided by the UNIX
operating system. In order to obtain large periods of the sequences
generated, we posed the requirement that the modulus of each LCPRNG be a
prime lying between 2**18 and 2**30. Otherwise, all coefficients of the
congruence relations involved were randomly chosen, subject to the
single necessary constraint that no arithmetic overflow should ever
occur on a computer of 32 bit word length. The sequence of integers
obtained from each LCPRNG were divided by the corresponding modulus so
that the output of each generator was real numbers in the range [0,1).
We obtained in this manner 12 generators to serve as constituent
generators of our scheme. Then we applied our algorithm for n=1 to the
first generator obtained. The system output was multiplied by 32 and
then truncated so that a sequence of random integers lying between 0 and
31 was obtained. On this sequence, of length 10000, the serial test
(non-overlapping) was applied, leading to the values of the test
statistic given for n=1 in the table below for three different values of
the distance u. Next we applied our algorithm for n=2 to the set
consisting of the first and the second generator, leading to the values
of the test statistic given for n=2 in the table. The other entries of
the table were obtained in analogous fashion.

   Note that for n=1 and u=1 the value of the test statistic obtained is
very large. In fact, even larger values were obtained with the serial
test applied directly to the output of any of the 12 randomly chosen
random number generators (test done on integer sequences in [0,31] as
above). This is in conformance to the well-known fact that it is
extremely difficult to find coefficients for a LCPRNG such that its
output is satisfactory in respect of statistical correlations. For n>=9
the test statistic of our combined generator has apparently attained
values that cannot be further improved, i.e. values that would also have
been obtained even from truly random sequences.

             n     u=1      u=2      u=3
             1   31251.6   5740.9   1104.2
             2    8641.3   3402.1   1706.0
             3    3158.0   1674.8   1354.6
             4    1917.3   1264.4   1198.1
             5    1610.9   1252.1   1114.1
             6    1489.7   1139.5   1096.0
             7    1461.9   1144.0   1090.7
             8    1362.7   1044.9   1106.3
             9    1185.0   1162.9   1043.2
            10    1155.9    975.2   1022.7
            11    1155.5   1016.6   1001.4
            12    1085.8   1067.0   1070.6

   For comparison purposes, we also ran the serial test (in the same
manner as above) directly on the output of the generator RANDOM of the
UNIX operating system, the generator G05CAF of the NAG library and the
generator RNUNF of the IMSL library. The values of the test statistic
obtained are given below. It can be clearly seen by comparing the two
tables that through combining some 10 fairly bad random number
generators according to our algorithm we have succeeded to obtain output
sequences whose quality is comparable to that of the three well-known
good random number generators examined.

                    u=1      u=2      u=3
          RANDOM   1001.1    978.9    979.7
          G05CAF   1078.5   1033.4   1036.2
          RNUNF    1028.1   1016.2    947.3