Sample size

Hi Frank,

I recently re-read your book Beat the Craps Out of the Casinos before I took a trip to Tunica for some entertainment, gambling, and visiting with family. As always, the 5 count eliminates the rollers who won't make any money for me anyway and greatly improves my chances of having enough bankroll to wait at the table until it warms up. After reading your chapter about the mathematical analysis performed by Dr. Catlin, and based on my own observations at the craps table, I began to wonder whether the following information might be available. Hopefully this might turn into an idea for an upcoming book you may be writing.

Given that Dr. Catlin performed an analysis of 200 million random shooters, which is great for ensuring the research is thorough, how many rolls of the dice would be required to reproduce his results with 95% and 99% confidence? I am guessing that a sample of somewhere between 30 and 50 rolls of the dice would come relatively close to the larger population analyzed, but I don't know, and I'm not a statistical wizard.

Within the 200 million random shooters, did Dr. Catlin perform an analysis of the average peak or decline that could be condensed into a 2- or 3-hour session at the table? What I think would be valuable is to know whether a shooter who stands at the table for 3 hours can expect their initial bankroll to reach a peak of X% and, at some point, a valley of Y%. This would help the average shooter know what to expect and when would be a good time to walk away from the table.

Additionally, by employing the 5 count (and let's assume a shooter bets on the pass line with full odds and one come bet with full odds as the betting style), the likelihood grows of having enough bankroll to stay at the table and experience a peak of X% over the initial bankroll.

My observation over the last 2 to 3 years, since reading Beat the Craps Out of the Casinos and The Craps Underground, is that I'm winning much more often and the wins are larger. Although I play the same style, I believe this is due to the 5 count and to having the bankroll to weather a downturn and take advantage when a hot roll shows up. Research by Dr. Catlin may be able to back this up.

Your thoughts, experience, and comments would be appreciated.

Best of luck,
Jim Christa

Wow, Jim, that's quite a plateful. Let me first address the topics with which I can't help you. I don't see how to define mathematical quantities that correspond to your X% and Y%. There is, of course, the classical Gambler's Ruin problem, which determines the probabilities of crossing various bankroll thresholds, but that doesn't seem to be what you are asking. I'll keep thinking about it, but I'm not making any promises.

I'm happy that you have had such good luck with the 5 count. You should realize, however, that as I showed and Frank has openly reiterated, unless you are using a controlled shot, or some other player at the table is using a controlled shot, your expected result will be a 1.414...% loss on your Pass Line wagers (the odds bets themselves carry no house edge). The 5 count simply reduces the number of such wagers, and hence the total expected loss, in a given time period.

Now, about the sample size, let's look at a problem for which we already know the answer, namely, a Pass Line wager with a random shooter. The question is how many hands (not rolls) we must play to have (say) a 95% chance that our empirical estimate of the correct -1.414...% return is within error e. The sample space is {Pass, Don't Pass}. We define a sequence of random variables X_i, where i = 1, 2, ..., n, as follows: X_i = 1 if the outcome is Pass and 0 if the outcome is Don't Pass. Similarly, we define a sequence Y_i such that Y_i = 1 if the outcome is Don't Pass and 0 if the outcome is Pass. If the expected value E(X_i) is p and the expected value E(Y_i) is q, then we know from direct calculation (see my article The Pass Line in the archives) that p = 244/495 and q = 251/495. The expected return for the Pass Line is p – q = -7/495, which is approximately -1.414%.
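For readers who want to check p and q, here is a minimal Python sketch (an illustration, not the calculation from The Pass Line article) that recomputes them exactly with fractions, assuming fair dice. A Pass Line bet wins on a come-out 7 or 11, loses on 2, 3, or 12, and otherwise wins only if the point repeats before a 7. The names WAYS and pass_line_probability are hypothetical.

```python
from fractions import Fraction

# Number of ways to roll each total with two fair six-sided dice.
WAYS = {total: 6 - abs(total - 7) for total in range(2, 13)}

def pass_line_probability() -> Fraction:
    """Exact probability that a Pass Line wager wins with a random shooter."""
    p = Fraction(WAYS[7] + WAYS[11], 36)  # natural on the come-out roll
    for point in (4, 5, 6, 8, 9, 10):
        # Establish the point, then roll it again before a 7.
        p += Fraction(WAYS[point], 36) * Fraction(WAYS[point], WAYS[point] + WAYS[7])
    return p

p = pass_line_probability()
q = 1 - p
print(p, q, p - q)  # 244/495 251/495 -7/495
```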

Clearly, if we bet one unit per hand and form the sum X₁ + X₂ + ... + X_n, this represents the number of units won in our n hands. Similarly, Y₁ + Y₂ + ... + Y_n represents the number of units lost. The difference of these two sums is the net won (actually lost), and this number divided by n is an estimate of p – q. Now X₁ + X₂ + ... + X_n divided by n is the average, or mean, of the wins and will be denoted by X (the sample mean, not to be confused with an individual X_i); similarly, Y will denote the average of the losses. Thus X – Y is an estimator for p – q. For each i it is easy to see that X_i + Y_i = 1, so if we add all of these terms up and divide by n we have X + Y = 1, and hence Y = 1 – X. Replacing Y by 1 – X in the expression X – Y gives 2X – 1. The expected value of this random variable is clearly 2p – 1.
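The bookkeeping above can be sketched as a small Monte Carlo simulation in Python, assuming fair dice, one-unit wagers, and Python's random module as a stand-in for a random shooter. The function names are hypothetical. The sketch records each X_i, derives Y_i = 1 – X_i, and shows that the net result per hand equals 2X – 1.

```python
import random

def roll(rng: random.Random) -> int:
    """Total of two fair six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)

def pass_line_hand(rng: random.Random) -> int:
    """Play one Pass Line decision: return 1 for Pass, 0 for Don't Pass."""
    come_out = roll(rng)
    if come_out in (7, 11):
        return 1
    if come_out in (2, 3, 12):
        return 0
    while True:  # a point is established; roll until the point or a 7
        r = roll(rng)
        if r == come_out:
            return 1
        if r == 7:
            return 0

def estimate(n: int, seed: int = 1) -> tuple[float, float]:
    """Return (X, net units per hand) for n simulated one-unit Pass Line wagers."""
    rng = random.Random(seed)
    xs = [pass_line_hand(rng) for _ in range(n)]  # the X_i
    ys = [1 - x for x in xs]                      # the Y_i, using X_i + Y_i = 1
    x_bar = sum(xs) / n
    net_per_hand = (sum(xs) - sum(ys)) / n        # units won minus units lost, per hand
    return x_bar, net_per_hand

x_bar, net = estimate(100_000)
print(f"X = {x_bar:.4f}, net per hand = {net:+.4f}, 2X - 1 = {2 * x_bar - 1:+.4f}")
```

With a fixed seed the run is repeatable. At 100,000 hands the estimate will typically wander a few tenths of a percent from -1.414%.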

Using the above facts we can now state our objective. We want 2X – 1 to differ from 2p – 1 by less than error e. In symbols

2p – 1 – e < 2X – 1 < 2p – 1 + e (1)

If we add 1 to the three terms in (1) and then divide by 2 we obtain

p – e/2 < X < p + e/2 (2)

or subtracting p throughout (2)

-e/2 < X – p < e/2 (3)

The random variables X_i are independent, so the variance of their sum is the sum of their variances. Since E(X_i²) = p·1² + q·0² = p, we have

Var(X_i) = E( (X_i – p)²) = E(X_i²) – 2pE(X_i) + p² (4)

Var(X_i) = p -2p² + p² = p - p² = p(1- p) (5)
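As a sanity check on (4) and (5), the same algebra can be carried out with exact fractions in Python, using the value of p given earlier. This is only an illustration of the arithmetic.

```python
from fractions import Fraction

p = Fraction(244, 495)  # Pass Line win probability from the text
q = 1 - p

# X_i is 1 with probability p and 0 with probability q.
e_x = p * 1 + q * 0              # E(X_i) = p
e_x2 = p * 1**2 + q * 0**2       # E(X_i²) = p, since 1² = 1 and 0² = 0
var = e_x2 - 2 * p * e_x + p**2  # right-hand side of (4)

print(var, var == p * (1 - p))  # 61244/245025 True
```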

It follows from (5) that

Var( (1/n)(X₁ + X₂ + ... + X_n) ) = (1/n)²np(1 – p) = p(1 – p)/n (6)

Thus the variance of X is the expression in (6) so the standard deviation of X is the square root of that expression.
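Expression (6) can also be checked empirically. The Python sketch below assumes independent Bernoulli draws with p = 244/495 as a stand-in for Pass Line decisions; the batch size, batch count, seed, and the function name are arbitrary choices. It compares the observed variance of X across many batches with p(1 – p)/n.

```python
import random
import statistics

def variance_of_mean(p: float, n: int, batches: int, seed: int = 7) -> tuple[float, float]:
    """Compare the observed variance of X over many batches with p(1 - p)/n from (6)."""
    rng = random.Random(seed)
    means = []
    for _ in range(batches):
        wins = sum(rng.random() < p for _ in range(n))  # n Bernoulli(p) draws; True counts as 1
        means.append(wins / n)
    observed = statistics.variance(means)  # sample variance of the batch means
    predicted = p * (1 - p) / n            # expression (6)
    return observed, predicted

observed, predicted = variance_of_mean(244 / 495, n=100, batches=2_000)
print(f"observed {observed:.6f}, predicted {predicted:.6f}, std dev {predicted ** 0.5:.4f}")
```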

The X_i we have been dealing with are called Bernoulli random variables; their sum X₁ + X₂ + ... + X_n is a binomial random variable, and for large n its distribution is approximated very accurately by the normal distribution. Consequently, the expression (X – p)/sqr(p(1 – p)/n), where sqr represents the square root, is approximately a standard normal random variable. From a standard normal table we can look up the value that encloses the middle 95% of that distribution; it is 1.96. Hence we have the (approximate) assertion that

P( - 1.96 < (X – p)/sqr(p(1 – p)/n) < + 1.96) = 0.95 (7)

where P represents probability. Multiplying expression (7) through by sqr(p(1 – p)/n) we obtain

P( -1.96 sqr(p(1 – p)/n) < X – p < 1.96 sqr(p(1 – p)/n)) = 0.95 (8)
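To see what (8) promises in practice, here is a rough Python simulation. It assumes independent Bernoulli(p) outcomes in place of real Pass Line decisions; the session length, session count, seed, and function name are arbitrary. It counts how often the observed X lands inside the interval in (8). The fraction should come out near 0.95, though not exactly, because the normal curve is only an approximation to the binomial.

```python
import math
import random

def coverage(p: float, n: int, sessions: int, z: float = 1.96, seed: int = 3) -> float:
    """Fraction of simulated n-hand sessions whose mean X falls strictly inside interval (8)."""
    rng = random.Random(seed)
    half_width = z * math.sqrt(p * (1 - p) / n)
    hits = 0
    for _ in range(sessions):
        x_bar = sum(rng.random() < p for _ in range(n)) / n  # True counts as 1 win
        if abs(x_bar - p) < half_width:
            hits += 1
    return hits / sessions

cov = coverage(244 / 495, n=400, sessions=2_000)
print(f"observed coverage: {cov:.3f}")
```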

Comparing expression (8) with (3), we see that if we want the difference between X and p to be within e/2 with probability 0.95, then we had better set

e/2 = 1.96 sqr(p(1 – p)/n) (9)

Solving (9) for n we obtain

n = 4(1.96)²p(1 – p)/e² (10)
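Expression (10) translates directly into a function. In this Python sketch (the name sample_size is hypothetical) the result is rounded up, since a fractional hand cannot be played and rounding down would fall slightly short of the target accuracy.

```python
import math

def sample_size(p: float, e: float, z: float = 1.96) -> int:
    """Hands needed so that 2X - 1 is within e of 2p - 1, per expression (10).

    z = 1.96 gives about 95% confidence; z = 2.576 gives about 99%.
    """
    return math.ceil(4 * z**2 * p * (1 - p) / e**2)

p = 244 / 495
print(sample_size(p, 0.001))           # 3840832
print(sample_size(p, 0.001, z=2.576))  # 99% confidence
```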

So, we know p and can calculate p(1 – p); it is approximately 0.24995. If we want our estimate to be within a tenth of a percent, then e = 0.001. Substituting these into expression (10) and rounding up to a whole hand, we get 3,840,832 hands. I guess my 200 million was sufficient, but I don't think 30 to 50 trials will do the job. In fact, 200 million gives me an accuracy of better than 0.0003. Incidentally, if you want 99% confidence, replace 1.96 in (10) by 2.576.
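Running (10) in reverse gives the error achievable with n hands, e = 2z sqr(p(1 – p)/n). The Python sketch below also obtains z from the standard library's statistics.NormalDist instead of a printed table (about 1.960 for 95% and 2.576 for 99%). It assumes each of the 200 million shooters contributes just one Pass Line decision; since every shooter produces at least one, the real sample is at least that large and the true error no bigger. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence: float) -> float:
    """Two-sided critical value z with P(-z < Z < z) = confidence for a standard normal Z."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def achievable_error(p: float, n: int, z: float) -> float:
    """Expression (10) solved for e: error in the estimated Pass Line return after n hands."""
    return 2 * z * math.sqrt(p * (1 - p) / n)

p = 244 / 495
for confidence in (0.95, 0.99):
    z = z_for_confidence(confidence)
    for n in (50, 200_000_000):  # 200 million taken as one decision per shooter (assumption)
        print(f"{confidence:.0%} (z = {z:.3f}), {n:>11,} hands: e = {achievable_error(p, n, z):.5f}")
```

At 50 hands the error bound is more than 25 percentage points, which is why a few dozen trials cannot pin down a return of about -1.4%.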

If any of you have questions I can be reached at 711cat@comcast.net. See you next month.

This article is provided by the Frank Scoblete Network. Melissa A. Kaplan is the network's managing editor. If you would like to use this article on your website, please contact Casino City Press, the exclusive web syndication outlet for the Frank Scoblete Network. To contact Frank, please e-mail him at fscobe@optonline.net.

Books by Donald Catlin: