Non-Random Randomness - Part 2: Innocent Until Proven Guilty

Last month we established the generally accepted form of a random number generator (RNG). It is a recursive function defined by a pair of equations as follows:

(R1) a₀ = seed

(R2) a_n₊₁ = f (a_n)

We noted that it is essential that the number "seed" on the right hand side of (R1), the first number in the sequence of random numbers, be chosen in a completely random fashion. This is generally accomplished by using a real time clock and some command to set the seed by noting the current time as this command is executed. Once the seed is set, the rest of the sequence is generated recursively using (2) and the process is completely deterministic. For this reason, as noted in last month's article, the infinite sequence a₀, a₁, a₂, a₃, ... is referred to as a sequence of pseudo random numbers. Finally, as I briefly mentioned last month, the trick here is to choose the function (algorithm, formula, procedure, instruction) f to be complicated enough that the resulting sequence of pseudo random numbers "looks and acts" like a truly random sequence.

It is this last sentence in the previous paragraph that is at the nub of a truly thorny issue. What does a truly random sequence look like? How does it act? What are its properties? The more you think about these issues, the worse it gets. To be specific, let us suppose that our sequence is intended to randomly produce 1000 integers 1 through 10. Probably the first thing we would think of is that each integer should occur approximately 1/10 of the time; that is, each number should occur approximately 100 times. Suppose that our function f produced each integer exactly 100 times? Does this seem random? No, it does not. What if the number 2 only occurred 65 times and the number 7 occurred 135 times? Does this seem like proper behavior for a truly random sequence? How do we decide? Frustrating, isn't it? It gets worse.

In a truly random sequence, we would expect streaks to occur, that is, we might expect to see the same number appearing three times in a row every now and then. What if the number frequencies look reasonable but no triples occur or a whole bunch of triples occur. Is our sequence random? In the former case probably not and in the latter it depends upon how large a "whole bunch" is and how do we decide? The same type of frustrating questions occur when it comes to gaps, that is, sequences of numbers wherein a particular number does not occur.

As you can see, the question of what comprises a random sequence is not an easy one. It is not a totally mathematical or probabilistic question but a question that involves philosophy and judgment. Difficult as it is, the field of Statistics has provided us with some remarkably useful tools in helping us to make these judgments. One of them, called the Chi Square distribution, will be described below and I think it will make sense to you. First though, let's return to the question of choosing the function f.

Let me first make it clear that nowhere is it written in stone that f has to have a particular form. If you have a candidate for f that looks random to you it could well produce a viable RNG. As we will see, however, your opinion on the matter is not evidence enough nor does it settle the issue. Last month I mentioned the pioneering work of Donald Knuth [3] in the field of RNGs. Knuth not only made specific proposals about how f could be constructed but developed many statistical tests for evaluating his (and others') choices. Since then others have proposed many more RNGs, many of which are unfamiliar to me. One of Knuth's main ideas, however, was called a linear congruential generator and generates integers. The general form looks like this:

a_n₊₁ = (ra_n + c) mod m (1)

The RNGs designer chooses the integers r, c, and m. The formula on the right of (1) is shorthand for the following procedure. The algorithm multiplies the nth integer in the sequence, a_n, by the integer r and adds the integer c to the result. This integer is then divided by the integer m and the integer remainder after the division is completed is the integer a_n₊₁. An example would be helpful here. Suppose that r = 7, c = 5, m = 13. If seed = 4 then the second number in the sequence would be (7 x 4 + 5) mod 13 or 33 mod 13. 33 divided by 13 is 2 with a remainder of 7 so a₁ = 7. The next number is (7 x 7 + 5) mod 13 or 54 mod 13. 54 divided by 13 is 4 with a remainder of 2 so a₂ = 2. Continuing in this fashion we obtain the sequence 4, 7, 2, 6, 8, 9, 3, 0, 5, 1, 12, 11, 4, 7, 2, 6, ... .You see what has happened; the sequence has started to repeat. In fact, if you think about it, a recursion of the form in (1) can only produce the integers 0 through 12 so the sequence has to repeat in 13 terms or less. Worse, suppose that we had started with seed = 10. The next term would also be 10. And the next. The number of terms before a repetition is called the period of the RNG. In our example the seed 4 produced a period of 12 whereas the seed 10 produced a period of 1. The point here is that the period depends not only upon the choices of r, c, and m but very much upon the choice of the seed. Put another way, these numbers should depend upon the choice of the seed.

Notice that any time it happens in (R2) that a_n₊₁ = seed the sequence will repeat. For some choices of f this may never happen. For others, as we have just seen, this always happens. Since exact repetition is certainly not a property of a truly random sequence, you may ask why anyone would use a function f that is sure to repeat? A very good question. The answer is that by using very large numbers in (1) and making use of some theorems in number theory one can produce sequences that we know for sure will not repeat for a billion or more terms. Slight modifications of (1) can help in this regard. In the excellent article [4] my editor John Robison presents an RNG by Park and MIller that adjusts things according to the choice of seed and produces over 2 billion numbers before repeating. According to Kilby and Fox [2] the RNG used by IGT generates over 4 billion numbers before repeating. Recalling the Corriveau story from last month, such assurances are very important.

Okay, we know we have to consider the problem of repeating sequences but we have yet to address the phrase "looks and acts like a random sequence." I think it is easiest to address this by using an example. I am going to use an RNG that is included with one of my programming languages. It is called, not surprisingly, RND. I have no idea how RND is constructed. It might be a linear congruential generator of some kind or could easily be something completely different. I am assuming that the problem of repeating sequences has been addressed by the creator of RND. What we want to address here is the question: Does RND act like a random sequence?

Suppose that I want to use RND to drive a 20-character reel on a slot machine. This particular reel has cherries at stops 1, 3, 5, 7, 9, 11, 15, and 19; it has oranges at stops 2, 4, 12, 17, and 20; it has plums at stops, 6, 8, 13, and 18; bars at stops 14 and 16, and the familiar red 7 at stop 7. This gives us five classes:

Class	Symbol	Probability
1	Cherry	8/20
2	Orange	5/20
3	Plum	4/20
4	Bar	2/20
5	7	1/20

Table 1
Symbol Probabilities

Note that the probabilities in Table 1 are those one would get if RND does indeed generate equiprobable numbers. The first thing we want to do is a frequency check to see if RND does indeed approximate these probabilities.

Suppose that we used RND to select the above five symbols 2000 times. If p_s represents the theoretical probability for class s and n represents the number of trials, then np_s represents the theoretical number of times that we would expect to see the symbol in class s in n trials. So for 2000 trials, if RND is working right, we would expect to see cherries (8/20)x 2000 or 800 times. Of course, we don't expect to see exactly 800 cherries but some figure close to 800. If we let Z_s represent the actual number of times the symbol in class s occurs in n trials, then the number

dev = (Z₁ - np₁)² + (Z₂ - np₂)² + (Z₃ - np₃)² + ... + (Z_v - np_v)² (2)

would be one way to measure of the deviation of the actual frequency of the symbol in each class from its theoretical frequency without regard to sign. v represents the number of classes; in our example v = 5. There is one disturbing thing about the number dev though. Referring to our example, notice that the cherry symbol is expected to appear 8 times more often than the 7, so deviations from the expected number of cherries would probably have a larger impact on dev than would deviations from the expected number of 7s. That doesn't seem quite right. Since the 7 represents a jackpot symbol with big payouts we would like dev to adequately reflect the impact that this symbol has on deviations from the norm. We can remedy the situation by weighting the terms in (2) so that term 1, for example, has only one eighth of the effect on dev that term 5 has. This is easy to do. We simply divide the sth term in (2) by np_s. The weighted deviation is therefore

wt. dev = (Z₁ - np₁)²/np₁+ (Z₂ - np₂)²/np₂+ ... + (Z_v - np_v)²/np_v (3)

Notice that if wt. dev is small (near zero) it means that it is not exhibiting enough deviation from the theoretical to be truly random and if it is large then the frequencies are grossly skewed from being uniformly distributed. But how small is too small and how large is too large? Let's look at our example. I actually ran 2000 trials and here are the results:

Class	Z_v	np_v	(Z_v - np_v)²	(Z_v -np_v)²/np_v
1	776	800	576	0.7200
2	473	500	729	1.4580
3	429	400	841	2.1205
4	212	200	144	0.7200
5	110	100	100	1.0000
Totals -	2000	2000	---	6.0005

Table 2
Calculation of Wt. Dev

So what do we do with the number 6.0005? Is it too big or too small? Here is where some mathematics help us out.

Notice that so far we have not really done any mathematics save for a little arithmetic. The formula in (3) was not derived from any prior theorems or formulas; it was simply a matter of opinion. It was an attempt to condense into a single number (called a statistic, by the way) the deviations from the expected number of outcomes using, what we hope, is good judgment. Now we have to interpret what we have done. Here comes the good news.

There is a distribution in probability theory known as the Chi Squared distribution and we know exactly what it looks like. It's shape depends upon a single parameter u called the degrees of freedom. Any book of mathematical tables will contain a table of these numbers and their corresponding probabilities. Here is what a typical pair of entries looks like:

Prob. -> Deg of Fr	0.01	0.10	0.90	0.99
4	0.297	1.06	7.78	13.3
24	10.9	15.7	33.2	43.0

Table 3
Chi Squared Examples

The table is easy to read; let me show you how it works for 4 degrees of freedom. It says that 1% of the time the Chi Square statistic will be less than 0.297, 10% of the time it will be less than 1.06, 90% of the time it will be less than 7.78, and 99% of the time it will be less than 13.3%. Okay, so what does this have to do with our statistic in (3)? Simply this. One can prove that as the number of trials becomes large, the statistic in (3) becomes closer and closer to the Chi Squared statistic with v - 1 degrees of freedom. This means that our statistic in (3) is approximated by the Chi Squared statistic. How good is the approximation? The rule of thumb is that if every np_v is 5 or greater the approximation is a good one.

Back to our example. The smallest np_v here is 100 so we're fine on the approximation. Our statistic of 6.0005 falls right between 1.06 and 7.78 or in the central 80% range. Had it been larger than 13.3 or smaller than 0.297, each of which should happen only 1% of the time, RND would be highly suspect. Had it been between 0.297 and 1.06 or between 7.78 and 13.3 it would have been suspect but not highly so since we would expect such results 18% of the time. Do we rest on this single calculation of 6.0005? No, it is prudent to do more tests and see if we generally fall within the central 80% range. Here are the results of 20 tests: 2.18325, 1.18125, 3.8925, 0.777, 1.8555, 2.4605, 6.00325, 3.6725, 6.3805, 4.37175, 8.822, 5.4758, 1.8255, 8.8588, 3.72575, 1.0055, 1.08675, 4.13175, 10.74675, 3.48675. Notice that four of them are outside of the 80% range and this is exactly consistent with the theory. So far RND is doing very well.

The above frequency test is not the only test one should perform. For example, one should check the frequency of Z_sZ_k where Z_k is the result immediately following Z_s. This test checks to see if the occurrence of any symbol influences the occurrence of the next symbol. Here there are 5² or 25 classes so the degrees of freedom would be 24. I'm not going to drag you through another test but do want you to note the following from Table 3. This time our statistic (calculated as in Table 2 but with 25 rows) should fall predominantly between 15.7 and 33.2 with occasional forays between 10.9 and 15.7 or 33.2 and 43.0. If any statistic is smaller than 10.9 or larger than 43.0 then RND would be highly suspect and only extensive further testing could prevent rejection.

The test you have just seen is called a Two Tailed Chi Square Test. There are many, many test formats that are used to evaluate RNGs. For example, one might use the RNG to simulate a 52-card deck of playing card and then deal out Poker hands (with replacement) to see if the frequencies match those of real Poker hands. This is actually done and is called (not surprisingly) the Poker Test. Not all tests are evaluated using the Chi Squared Test although it is the one that is most often employed. It is interesting to note that when state gaming commissions describe what constitutes acceptable randomness, some seem to leave it up to the experts and others (Colorado, for example) specifically mention the use of the Chi Squared statistic [1]. Here are the names of some of the other tests that are used: Gap Test, Coupon Collector's Test, Permutation Test, Spectral Test, Collision Test, and Correlation Test.

Notice that we never specifically said what "looks and acts like a truly random sequence" meant. This is not at all surprising for it is a deep and puzzling question that has many philosophers and mathematicians scratching their heads. In the January 2002 issue of the American Mathematical Monthly there was an article by Sérgio B. Volchan entitled What is a Random Sequence? [5] The article was 18 pages long and was very deep reading, at least for me. Near the end of the article he admits that even with all of the theory that has been developed on this question that one "cannot algorithmically decide whether a certain string is random." Rather, he says, one settles for a more pragmatic principle: "if it acts randomly, then it is random." Indeed, that is exactly the best we can do at this point in time.

So if an RNG candidate passes m statistical tests for randomness isn't it always possible that it could fail on the m+1st test? Yes, indeed. The hope is that the candidate RNG passes enough tests that we feel comfortable that its pseudo random behavior really does act like a random sequence. At this time in history that's how it works; it is innocent until proven guilty.

See you next month.

References:

[1]    Hannum, Robert & Cabot, Anthony, Practical Casino Math, Institute
        for the Study of Gambling and Commercial Gaming, Reno, NV, 2001
        pp 213-225

[2] Kilby, Jim & Fox, Jim, Casino Operations Management, John Wiley
& Sons, 1998, p 128

[3]    Knuth, Donald, The Art of Computer Programming, Vol. II,
        Seminumerical Algorithms, 3rd Edition, Addison Wesley, Reading
        MA, 1998

[4]    Robison, John, Everything You Ever Wanted to Know About the
        RNG, The New Chance and Circumstance, Issue #22, Winter 2000
        pp 27-30

[5] Volchan, Sérgio B., What is a Random Sequence?, American
Mathematical Monthly, Vol. 109, No. 1 January 2002, pp 46-63

Non-Random Randomness - Part 2: Innocent Until Proven Guilty

Books by Donald Catlin:

Books by Donald Catlin: