# Cardano's Gaff Lives On

26 May 1999

In the Sunday December 27, 1998 issue of Parade Magazine, the following letter appeared in Marilyn vos Savant's very popular column 'Ask Marilyn':

"At a monthly 'casino night,' there is a game called Chuck-a-Luck: Three dice are rolled in a wire cage.  You place a bet on any number from 1 to 6.  If any one of the three dice comes up with your number, you win the amount of your bet.  (You also get your original stake back.)  If more than one die come up with your number, you win the amount of your bet for each match.  For example, if you had a \$1 bet on number 5, and each of the dice came up with 5, you would win \$3.
It appears that the odds of winning are 1 in 6 for each of the three dice, for a total of 3 out of 6 - or 50%.  Adding the possibility of having more than one die come up with your number, the odds would seem to be in the gambler's favor.  What are the odds of winning this game?  I can't believe that a casino game would favor the gambler."

Indeed!  The 'Ask Marilyn' writer (let's call him 'Chuck' for Chuck-a-Luck) made a mistake that was first made over 400 years ago by an Italian by the name of Gerolamo Cardano.  Cardano (1501 - 1576) was a 'jack of all trades': philosopher, physician, scientist, and so on, and one of his favorite avocations was gambling.  In 1663 a book written by Cardano, Liber de Ludo Aleae, Book on Games of Chance, was published posthumously.  His book is the first known attempt by man to formalize the notion of chance events.  In his book he originated the idea of thinking of probability as a frequency ratio and he also conceived of a rather vague notion of expected value (ideas which will be discussed in subsequent articles at this site).  He was very interested in dice and cards (not surprisingly) and posed the problem of determining the number of throws of two dice necessary to roll a total of two on at least one roll with an even chance.  His answer was 18 which is, unfortunately, wrong.  We'll see why this is so later in this article and we'll also see that Chuck made the very same error that Cardano made.

It is interesting to note that roughly a century after Cardano worked on his dice question, and a few years before his book finally appeared in print, a similar dice question was posed to the French mathematician Blaise Pascal (1623 - 1662) by a gambler and roué by the name of Antoine Gombaud, the Chevalier de Méré.  Pascal consulted with a contemporary, Pierre de Fermat, and correctly answered de Méré's question.  For this reason, and the fact that Cardano's work was largely unknown until the 20th century, Pascal is credited with being the founder of probability theory (and rightly so, I believe).  Cardano was, nevertheless, a colorful character and we can use his mistake to help us to understand how to do things correctly.  Thanks, Cardano, and thanks to Chuck as well.

It would be hard to explain cheese to someone who had never heard of milk.  In the 16th century, Cardano was in a situation analogous to this; no one had ever heard of probability theory.  Since both Chuck's and Cardano's questions involve probability theory, I guess we had better dredge up some of these ideas which, fortunately for us, do exist today.  The basic building block in probability theory is the sample space.  Whenever someone carries out an experiment, trial, throw, draw, random pick, and so on, the set of all possible outcomes is called the sample space.  For example, if we throw a pair of dice, the set of all outcomes S can be listed in an array such as:

 (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
 (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
 (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
 (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
 (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
 (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)        (1)
It might help you to think of the dice as being different colors, say red and white.  For the outcome (x, y), the x represents the number on the red die and the y represents the number on the white die.  It is standard mathematical notation when specifying a set to list its elements in curly brackets such as {a, b, c, d, e}.  Sometimes, however, the sets involved are so large that we use a shorthand notation known as set builder notation.  If I were to write S in (1) using set builder notation, I would write something like
 S  = { (x, y) | x and y are each integers 1 through 6} (2)
The line in (2) is read as follows: "S is the set of all pairs of numbers (x, y) such that x and y are each integers 1 through 6."  In other words the general format is
 S = { generic set element | condition for generic element to be in the set} (3)
We'll find this notation handy to have around.  For example, if we think of the three dice in the Chuck-a-Luck game as being identified as die 1, die 2, and die 3, the set of outcomes can be described as follows:
 S = { (x1, x2, x3) | each xi is an integer 1 through 6} (4)
Here, of course, xi represents the outcome on die i.  How big is this sample space?  Well, there are 6 choices for x1 and for each of those there are 6 choices for x2, for a total of 6 x 6 or 36 ways to choose the first two; see the sample space in (1).  Now, for each of these 36 choices, there are 6 ways to choose x3, so there are 36 x 6 or 216 elements in S.  Now you can see why (4) is so convenient; we wouldn't really want to list all 216 of these outcomes.
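
For readers who like to check such counts by machine, here is a minimal sketch in Python. The names `faces`, `two_dice`, and `three_dice` are my own labels for illustration and do not come from the article; the code simply lists the ordered pairs in (1) and the ordered triples in (4).

```python
from itertools import product

faces = range(1, 7)

# Sample space (1): ordered pairs (x, y) for a red die and a white die.
two_dice = list(product(faces, repeat=2))

# Sample space (4): ordered triples (x1, x2, x3) for the three Chuck-a-Luck dice.
three_dice = list(product(faces, repeat=3))

print(len(two_dice))    # 36
print(len(three_dice))  # 216
```

Because each outcome is an ordered tuple, (2, 5) and (5, 2) are counted separately, which is exactly the point of imagining the dice as having different colors.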

What about probability?  We'll talk more about this in my next article, but for now suffice it to say that the probability of an outcome is a number between 0 and 1 intended to represent the chance or frequency of that outcome occurring.  How do we determine such a number?  Well, in very symmetric situations such as dice or coins, the assignment of a probability to each outcome goes all the way back to 1812 and a treatise entitled Théorie Analytique des Probabilités written by Pierre Simon de Laplace (1749 - 1827).  In this work Laplace put forth his Principle of Sufficient Reason.

Essentially, Laplace said that when faced with alternative outcomes, one should assign them the same probability unless there is sufficient reason to do otherwise.  Nowadays, with fancy subjects like Information Theory available, this idea has been upgraded to say that probability assignments should contain no more information than is inherent in the situation under study.  As far as we're concerned, (phrasing it in the current jargon) we're not going there; Laplace will suffice for us.  So, for the roll of a pair of dice, for example, we would assign the probability 1/36 to each of the outcomes.  For the Chuck-a-Luck game, we would use 1/216.  This assignment is called a probability density and for any outcome z, it is denoted by p(z).  In the dice example (1), therefore, we would write p((x, y)) = 1/36 for each outcome (x, y).

There is a story about a little girl who was asked to write a book report about a book on penguins.  Her report consisted of one sentence: "This book told me more about penguins than I ever really cared to know."  Often the individual outcomes in a sample space tell us more than we really care to know about the result.  For example, if rolling for a point on the Pass Line at Craps, we really don't care if a particular outcome is (2,5), (5,2), or (4,3).  All we care about is that the total is seven.  In fact, as I think you will see, I can define a subset of the set S described in (1) that represents the event of rolling a seven.  It is

 E7 = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)} (5)
For finite sample spaces like we are studying, any subset of the sample space is called an event.  This terminology is standard.  As we move along, the idea of identifying events with subsets of S will seem quite natural to you.  We can also easily talk about the probability of  E7 occurring.  Since there are 6 outcomes in E7 out of 36 possible outcomes in S, we can say that the probability of E7 occurring is 6/36 or 1/6.  In general, we define the following:  If E is an event in S and E = {e1, e2, e3, ..., em}, then we define the probability measure of E, written P(E), as follows:
 P(E) = p(e1) + p(e2) + p(e3) + ... + p(em) (6)
(If E is empty, we define P(E) to be zero.)  In other words, to calculate the probability of event E, we simply add up the probabilities of all of the outcomes in E.  For probability densities that are uniform (that is, are equal on each outcome), (6) simply amounts to counting the outcomes in E and dividing by the number of outcomes in S.  P(E) is read "the probability of event E occurring."
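
As a rough illustration of definition (6), the following Python sketch builds the two-dice sample space, assigns the uniform density, and adds up the density over the event E7. The function names `p` and `P` mirror the article's notation; everything else (`S`, `E7` as Python objects, the use of exact fractions) is an assumption made for the example.

```python
from fractions import Fraction
from itertools import product

# Sample space (1) for a pair of dice.
S = list(product(range(1, 7), repeat=2))

def p(outcome):
    """Uniform probability density on S: every outcome gets 1/36."""
    return Fraction(1, len(S))

def P(event):
    """Probability measure (6): add up p over the outcomes in the event."""
    return sum((p(e) for e in event), Fraction(0))

# The event (5): the two dice total seven.
E7 = {(x, y) for (x, y) in S if x + y == 7}

print(sorted(E7))
print(P(E7))  # 1/6
```

Starting the sum at `Fraction(0)` means an empty event gets probability zero, matching the convention stated above.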

Well, what does all of this have to do with Chuck?  Believe it or not, I think Chuck's problem has to do with the little word 'or.'  You probably never thought about it, but there are two different 'ors' in the English language and only one word to describe both.  This can cause some confusion.  One is the 'exclusive or' as in "Marry me or I'll kill myself."  Here the intent is that one or the other of the events described will occur, but not both.  Computer programmers often use the symbol 'eor' to specify this type of  'or'.  The other is the 'inclusive or' as in "Either I'm crazy or you're crazy."  Hey, maybe we're both crazy!  This latter 'or' is often written, much to the annoyance of editors, as 'and/or' for the purpose of making clear to the reader that the writer is using the inclusive 'or'; that is, either or both of the events described are true.  In probability theory, it is this latter 'or' that is meant when we write 'or', and it is this 'or' that both Chuck and Cardano implicitly used in formulating their questions.  I suspect, however, that when calculating they were both thinking 'eor'.

Suppose in our dice example (1) we define F to be the event of rolling a total of 4 and H to be the event of rolling a hardway total, that is, both dice show the same number.  Then

 F  = {(1,3), (2,2), (3,1)}
 H  = {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)} (7)
What about the event F or H, or being the probabilistic 'or'?  This is the event
 F or H = {(1,1), (1,3), (2,2), (3,1), (3,3), (4,4), (5,5), (6,6)} (8)
since this set contains all outcomes in either or both sets.  According to (6), we would have P(F or H) = 8/36 since there are 8 outcomes in the set F or H.  But again by (6), P(F) = 3/36 and P(H) = 6/36 so one thing we know for sure is that the inequality P(F or H) < P(F) + P(H) holds ('<' means 'less than').  The reason is clear.  When we calculate the right hand side of this last inequality, the number p((2,2)) gets counted twice, whereas on the left side it gets counted only once.  What if the two sets had no points in common?  Clearly in this case we would get an equality.  Let me summarize this observation.
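
The double counting is easy to see if the events are treated as Python sets. This is only a sketch under my own naming (`inclusive_or`, `exclusive_or`, `overlap` are illustrative labels); the set union `|` plays the role of the probabilistic 'or', while `^` is the exclusive version.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))

def P(event):
    """Uniform case of (6): count the outcomes in the event and divide by |S|."""
    return Fraction(len(event), len(S))

F = {(x, y) for (x, y) in S if x + y == 4}   # total of 4
H = {(x, y) for (x, y) in S if x == y}       # hardway: both dice match

inclusive_or = F | H   # outcomes in F, in H, or in both
exclusive_or = F ^ H   # outcomes in exactly one of F and H
overlap = F & H        # outcomes counted twice by P(F) + P(H)

print(P(inclusive_or))   # 2/9, that is, 8/36
print(P(F) + P(H))       # 1/4, that is, 9/36
print(overlap)           # {(2, 2)}
```

The gap between 9/36 and 8/36 is exactly p((2,2)), the one outcome the two events share.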

A collection of events E1, E2, E3, ... , En is said to be pairwise disjoint provided no pair of them have any points in common.  The two sets F and H in (7) do not form a pairwise disjoint collection.  Given that a collection  of events E1, E2, E3, ... , En is pairwise disjoint, then we can always conclude that

 P(E1 or E2 or E3 or ... or En) = P(E1) + P(E2) + P(E3) + ... + P(En) (9)
The hypothesis of pairwise disjointness is essential.  If any pair of sets have even one point in common, the right side of (9) will be larger than the left.

Okay, let's get back to Chuck.  Let me define three sets in the sample space S given by (4).  The event Di will mean that die i shows a five.  Chuck correctly noted that P(Di) = 1/6 for each i = 1, 2, or 3.  Chuck also wanted to calculate the probability of winning, that is, of getting at least one five when rolling.  In other words, he wanted to calculate P(D1 or D2 or D3), the probability that either the first die shows a 5 or the second shows a 5 or the third shows a 5.  The trouble was he used (9), which gave him 1/6 + 1/6 + 1/6 = 3/6, and the collection D1, D2, D3 is not a pairwise disjoint collection.  For example, the outcome (5, 5, 3) is in both D1 and D2.
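
One way to test the hypothesis of (9) before using it is to check every pair of events for shared outcomes. The sketch below does this for Chuck's three events; the dictionary `D` and the helper `is_pairwise_disjoint` are hypothetical names introduced here, not anything from the article.

```python
from itertools import combinations, product

# Sample space (4) for three dice.
S = set(product(range(1, 7), repeat=3))

# D[i] is the event "die i shows a five" (i = 1, 2, 3).
D = {i: {s for s in S if s[i - 1] == 5} for i in (1, 2, 3)}

def is_pairwise_disjoint(events):
    """True when no two of the given events share an outcome."""
    return all(a.isdisjoint(b) for a, b in combinations(events, 2))

print([len(D[i]) for i in (1, 2, 3)])     # [36, 36, 36], so each P(Di) = 36/216 = 1/6
print(is_pairwise_disjoint(D.values()))   # False
print((5, 5, 3) in D[1] & D[2])           # True
```

Since the check fails, adding the three probabilities counts outcomes such as (5, 5, 3) more than once.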

Cardano made the same error.  To see this, define Ei to mean that the dice shooter rolls a total of 2 on the ith roll.  Cardano wanted to find a formula for P(E1 or E2 or E3 or ... or En) and then determine n so that this expression is 1/2.  Unfortunately, Cardano (mentally) used (9) and decided that

 P(E1 or E2 or E3 or ... or En) = n/36 (10)
Since 18/36 = 1/2, we see how Cardano got his answer.  Unfortunately, equality (10) is wrong because the events involved are not pairwise disjoint events; one could roll a 2 on several rolls.
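
Without solving Cardano's problem, one can at least see (10) fail in the smallest interesting case. The sketch below takes n = 2 rolls (my choice, purely for illustration), lists all sequences of two rolls of a pair of dice, and compares the exact probability of at least one total of 2 with the value n/36 from (10). The variable names are assumptions of this example.

```python
from fractions import Fraction
from itertools import product

one_roll = list(product(range(1, 7), repeat=2))   # sample space (1)
n = 2
S = list(product(one_roll, repeat=n))             # all sequences of n rolls

# Sequences in which at least one of the n rolls totals 2, i.e. shows (1, 1).
at_least_one = [seq for seq in S if any(x + y == 2 for (x, y) in seq)]

exact = Fraction(len(at_least_one), len(S))
claimed = Fraction(n, 36)    # what formula (10) would give

print(exact)     # 71/1296
print(claimed)   # 1/18, that is, 72/1296
```

The claimed value is too large by 1/1296, which is the probability of the single sequence in which both rolls total 2 and therefore gets counted twice.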

Now, Cardano's problem is a lot harder than Chuck's.  We'll eventually solve Cardano's problem in these articles, as well as several more dice problems, but not until we develop some more tools.  Chuck's we can handle with ease.

Let me define three events in the sample space given in (4) by saying that Ei is the event that exactly i of the three dice show a 5, i = 1, 2, or 3.  Now, there is one way to roll a 5 on a die and five ways to roll a non-five.  So there are 1 x 5 x 5 or 25 ways to roll an outcome of the form (5, no 5, no 5), 5 x 1 x 5 or 25 ways to roll an outcome of the form (no 5, 5, no 5), and 5 x 5 x 1 or 25 ways to roll an outcome of the form (no 5, no 5, 5).  Altogether, then, there are 25 + 25 + 25 or 75 ways of obtaining exactly one 5.  In other words, we have shown that P(E1) = 75/216.  Similarly, it is easy to argue that P(E2) = 15/216 and P(E3) = 1/216.  The probability of a win at Chuck-a-Luck is just P(E1 or E2 or E3), that is, the probability we either get one five or two fives or three fives.  Note, however, that the collection E1, E2, E3 is a pairwise disjoint collection.  "Why?", you say.  Well, I'll bet you can't roll exactly one five and exactly two fives on the same roll without the aid of several beers and really poor eyesight.  So we can use (9):

 P(E1 or E2 or E3) =  P(E1) + P(E2) + P(E3) = 75/216 + 15/216 + 1/216 = 91/216
Aha, the probability of winning is considerably less than one half.  The irony here is that after all of this work, this number isn't really what we want to know nor is it what Chuck should have wanted to know.  Next month I'll continue with this game and show you that to ask for the probability of winning a gambling game is to ask the wrong question.  Then, in the same article we'll figure out what the right question is.  What is more, we'll see how Ms. vos Savant answered Chuck.
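
The counts 75, 15, and 1 used in the calculation above can be double-checked by brute force. This is a minimal sketch, with `counts`, `P_E`, and `P_win` as illustrative names of my own: it tallies all 216 outcomes in (4) by how many dice show a five and then adds the three probabilities as in (9).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space (4) for three dice.
S = list(product(range(1, 7), repeat=3))

# Tally outcomes by how many of the three dice show a five.
counts = Counter(outcome.count(5) for outcome in S)

# P(Ei) for i = 1, 2, 3; these events are pairwise disjoint, so (9) applies.
P_E = {i: Fraction(counts[i], len(S)) for i in (1, 2, 3)}
P_win = sum(P_E.values())

print(counts[1], counts[2], counts[3])  # 75 15 1
print(P_win)                            # 91/216
```

Each outcome falls into exactly one tally, which is the pairwise disjointness that makes the addition legitimate.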

Cardano, rest his soul, was a bright man.  He simply did not have all of the modern concepts with which to formulate his problem.  We, on the other hand, have over 400 years of concepts and ideas on which to draw.

It has been said that an expert in a field is a person who knows most of the mistakes one can make and how to avoid making them.  As I tell my students, since we know about it, let's not make Cardano's error again.  Do they listen?  Some of them.  I hope the rest stay out of the casinos.  See you next month.

Don Catlin is a retired professor of mathematics and statistics from the University of Massachusetts. His original research area was Stochastic Estimation applied to submarine navigation problems, but he has spent the last several years doing gaming analysis for gaming developers and writing about gaming. He is the author of The Lottery Book, The Truth Behind the Numbers, published by Bonus Books.

#### Books by Donald Catlin:

Lottery Book: The Truth Behind the Numbers