Thursday, June 12, 2008

Let’s make a deal problem

Those of us who grew up with the show Let’s Make a Deal can understand the gist of the Let’s Make a Deal problem right away. For those of you too young (or old) to remember, here is a summary.

Monty Hall, the host, allows you to choose one of three curtains. Behind one of the curtains is a new car or another big prize, while behind the other two is a year’s supply of shampoo or the equivalent. You choose Curtain 1. Monty opens Curtain 2 and shows you it has a year’s supply of the shampoo. Then he gives you a choice:

a) stick with your original decision, or

b) switch to Curtain 3.

The intuitive conclusion is that it doesn’t matter: there are two curtains remaining, and they are equally likely to contain the prize. However, in this case, the intuition is wrong. If we assume Monty

1) always shows a curtain with the shampoo behind it;

2) never reveals the curtain you chose; and

3) randomly decides which of the remaining two curtains to reveal if the curtain you chose contains the car,

then switching to Curtain 3 gives you a 2/3 chance of winning while sticking to Curtain 1 gives you a 1/3 chance of winning.
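If the 2/3 figure is hard to believe, one way to check it is to play the game many times on a computer. The following is only a rough sketch under the three assumptions above; the function name simulate and the parameters trials and seed are my own choices for illustration. It keeps only the games in which Monty opens Curtain 2, to match the situation described.

```python
import random

def simulate(trials=200_000, seed=0):
    """Estimate the win rates for sticking and switching, given that you
    chose Curtain 1 and Monty opened Curtain 2."""
    rng = random.Random(seed)
    revealed_2 = stick_wins = switch_wins = 0
    for _ in range(trials):
        car = rng.randint(1, 3)  # the car is equally likely behind each curtain
        if car == 1:
            opened = rng.choice([2, 3])  # assumption 3: Monty picks at random
        elif car == 2:
            opened = 3  # assumptions 1 and 2 leave only Curtain 3
        else:
            opened = 2  # assumptions 1 and 2 leave only Curtain 2
        if opened != 2:
            continue  # keep only the games where Monty opened Curtain 2
        revealed_2 += 1
        stick_wins += (car == 1)   # sticking wins if the car is behind 1
        switch_wins += (car == 3)  # switching wins if the car is behind 3
    return stick_wins / revealed_2, switch_wins / revealed_2

stick, switch = simulate()
print("stick with Curtain 1:", round(stick, 3))
print("switch to Curtain 3: ", round(switch, 3))
```

With a couple of hundred thousand games, the two printed rates should land close to 1/3 and 2/3, though the exact digits depend on the seed.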

Why?

This problem, like many probability problems, is one of information. Initially, you have no information about any of the three curtains, so each choice gives you a 1/3 chance of winning. When Monty shows you the curtain with the shampoo, you learn nothing new about the curtain you originally chose, because there was no way, whether your curtain had the car or the shampoo, that Monty was going to show you what was behind your curtain. Your curtain had, and still has (as far as you know), a 1/3 chance of containing the car. However, you did get information about Curtain 3: Monty did not choose to reveal it. This could mean one of two things:

A. Curtain 3 has the car, and therefore Monty had to show you Curtain 2, as he would never reveal the curtain with the car (1/3 chance, calculated by taking the 1/3 chance that Curtain 3 has the car and multiplying by the 100% chance that he reveals Curtain 2 when the car is behind Curtain 3); or

B. Curtain 1 has the car, and Monty chose to reveal Curtain 2 (1/6 chance, calculated by taking the 1/3 chance that Curtain 1 has the car and multiplying by the 1/2 chance that Monty reveals Curtain 2 when the car is behind Curtain 1).

These probabilities do not sum to 1, because we are excluding the outcomes, now impossible, where Monty reveals Curtain 3. To revise the probabilities to take into account what Monty revealed, we divide the probabilities in A (1/3) and B (1/6) above by the total probability of the two remaining outcomes (1/3 + 1/6 = 1/2). Thus, outcome A (car is behind Curtain 3) has a probability of (1/3) / (1/2) = 2/3, while outcome B (car is behind Curtain 1) has a probability of (1/6) / (1/2) = 1/3.
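The arithmetic above can be checked with exact fractions. This is just an illustrative sketch; the variable names are my own, and Python's fractions module is used so that nothing gets rounded.

```python
from fractions import Fraction

prior = Fraction(1, 3)  # chance the car is behind any one curtain

# Outcome A: car behind Curtain 3, so Monty must reveal Curtain 2.
joint_a = prior * 1
# Outcome B: car behind Curtain 1, so Monty reveals Curtain 2 half the time.
joint_b = prior * Fraction(1, 2)

# Total chance that Monty reveals Curtain 2 at all.
total = joint_a + joint_b

# Divide by the total so the two remaining outcomes sum to 1.
posterior_a = joint_a / total
posterior_b = joint_b / total

print(total, posterior_a, posterior_b)  # 1/2 2/3 1/3
```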

The intuition is as follows: Monty always reveals Curtain 2 or 3 when you choose 1, so his revelation gives you no more information about whether the car is behind Curtain 1. But you do gain information about Curtains 2 and 3, since he never reveals 3 if the car is behind it but does sometimes reveal 3 if the car is not behind it. Thus, the fact that Monty did not reveal Curtain 3 tells you something.

[Note: this problem has been around for a while, but was made famous by Marilyn vos Savant’s discussion of it and the subsequent outcry by those who insisted her answer, the correct one, was wrong. See, for example: http://www.letsmakeadeal.com/problem.htm]

Technical Explanation

There is a whole class of problems in probability that involve updating the chances based on new information. These problems are solved with Bayes’ Rule, a law of probability that specifies how to update probabilities with new information (for a full discussion, including discussion of whether the Reverend Bayes was actually the first to discover this theorem, see the Wikipedia entry: http://en.wikipedia.org/wiki/Bayes'_theorem).

To understand Bayes’ Rule, we first need to know the notation used for conditional probability. We use the vertical line ( | ) to denote a condition and, as in prior posts, P(A) is the probability that event A occurs. Thus, P(A|R2) is the probability that A occurs, given that R2 has already occurred. Bayes' Rule is:

P(A|R2) = P(R2|A)*P(A) / P(R2)

So let:

A = event that the prize is behind Curtain 3

R2 = event that Monty reveals the contents of Curtain 2

C = event that the prize is behind Curtain 1

Now we can figure out the right side of the Bayes’ Rule equation, in order to figure out P(A| R2).

We know P(R2|A) = 1, because Monty won’t reveal Curtain 3 when it contains the prize and he won’t reveal Curtain 1 because you chose Curtain 1.

P(A) = P(C) = 1/3. Remember, these are unconditional: given three curtains, there’s a 1/3 chance of the prize being behind each.

To figure out P(R2), it is useful to note that for any events R2 and A, P(R2 and A) = P(A) * P(R2|A)

In our case, P(R2) is the sum of the probabilities of 2 exclusive events:

1) prize is behind Curtain 3 (event A) and Monty reveals Curtain 2 (event R2): 1/3 * 1 = 1/3

2) prize is behind Curtain 1 (event C) and Monty reveals Curtain 2 (event R2): 1/3 * 1/2 = 1/6

This sum, 1/3 plus 1/6, is 1/2 = P(R2).

Thus, by Bayes’ Rule, P(A|R2) = (1 * 1/3) / (1/2) = 2/3

Just for fun, now you can compute P(C|R2) = P(prize is under Curtain 1 given that Curtain 2 is revealed) = 1/3 using Bayes’ Rule.
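For those who would rather let a computer do the exercise, here is a small sketch of Bayes' Rule as a function. The name bayes_posterior and the dictionary layout are hypothetical choices of mine, not anything standard. It also includes the case where the prize is behind Curtain 2, for which the chance of Monty revealing Curtain 2 is zero.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return P(hypothesis | evidence) for each hypothesis.

    priors[h] is P(h); likelihoods[h] is P(evidence | h).
    """
    # P(evidence) is the sum of P(evidence | h) * P(h) over exclusive hypotheses.
    p_evidence = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / p_evidence for h in priors}

# You chose Curtain 1; the evidence is R2, "Monty reveals Curtain 2".
priors = {
    "curtain 1": Fraction(1, 3),
    "curtain 2": Fraction(1, 3),
    "curtain 3": Fraction(1, 3),
}
likelihoods = {
    "curtain 1": Fraction(1, 2),  # P(R2|C): Monty picks 2 or 3 at random
    "curtain 2": Fraction(0),     # Monty never reveals the prize
    "curtain 3": Fraction(1),     # P(R2|A): Monty is forced to reveal 2
}

for curtain, p in bayes_posterior(priors, likelihoods).items():
    print(curtain, p)
```

The printed values are 1/3 for Curtain 1, 0 for Curtain 2 and 2/3 for Curtain 3, which matches P(C|R2) and P(A|R2) above.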


False Positives in Cancer Diagnoses


The outcome of Bayes’ Rule can be very confusing, and it is worth keeping in mind for problems more important than the Let’s Make a Deal problem. For example, suppose an MRI for breast cancer has a false negative rate of 1/100, meaning that 1 time in 100 the test will incorrectly indicate that you do not have cancer when in fact you do. Similarly, the test might also have a false positive rate of 1 in 100, meaning that 1 time in 100 the test will incorrectly indicate that you do have cancer when in fact you do not (false positive rates for MRIs over time can be much higher, because they are frequently done once or twice a year: see the recent article about a study of false positives in MRIs for breast cancer screening, which were around 25% over time).

Suppose your MRI result just came out positive for breast cancer. What are the chances you actually have breast cancer?

First, it’s useful to know that around 250,000 women a year get breast cancer (see this site) and there are about 60 million women above the age of 40 (see census site), when most cases occur. This represents an annual incidence rate of about 1 in 240, or roughly 1 in 200 in round numbers.

Let’s define the probabilities:

P(C) = Probability of breast cancer in a given year = 1/200 = 0.005

P(D|not C) = Probability that the MRI diagnosed cancer given that you do NOT have cancer = false positive = 1/100 = 0.01

P(N|C) = Probability that the MRI did not diagnose cancer given that you have cancer = false negative = 1/100 = 0.01

P(D|C) = 1-P(N|C) = Probability that MRI diagnosed cancer given that you have cancer = 99/100 =0.99

We want P(C|D) = Probability of cancer, given a cancer diagnosis by MRI.

Before using Bayes’ Rule, we can first define P(D) as the sum of the probabilities of all exclusive events that include D. In English, the chance of diagnosis is the sum of 1) the chance that you have cancer and are diagnosed and 2) the chance that you do not have cancer and are diagnosed. Thus, P(D) = P(D|C)*P(C) + P(D|not C)*P(not C) = 0.99*0.005 + 0.01*0.995 = 0.0149

Using Bayes’ Rule:

P(have cancer given the MRI result shows cancer) = P(C|D) = P(D|C)*P(C)/ P(D) = 0.99 * .005 / .0149 = .33 or about 1/3.
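The same calculation can be written as a short function, which makes it easy to try other rates. This is a sketch only; the function and parameter names are my own, and the numbers plugged in are the illustrative ones from this post.

```python
def posterior_given_positive(prevalence, false_positive_rate, false_negative_rate):
    """Return P(C|D): the chance of cancer given a positive test result."""
    p_d_given_c = 1 - false_negative_rate  # P(D|C)
    # P(D) = P(D|C)*P(C) + P(D|not C)*P(not C)
    p_d = p_d_given_c * prevalence + false_positive_rate * (1 - prevalence)
    return p_d_given_c * prevalence / p_d

p = posterior_given_positive(
    prevalence=0.005,          # P(C) = 1/200
    false_positive_rate=0.01,  # P(D|not C)
    false_negative_rate=0.01,  # P(N|C)
)
print(round(p, 3))  # 0.332
```

Raising the prevalence argument shows how quickly the answer changes: the rarer the disease, the more the false positives swamp the true ones.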

Thus, a very effective MRI test for cancer, which gives the wrong result only 1% of the time, is still suspect when it gives a result of cancer. In fact, an MRI diagnosis of cancer indicates only a 1/3 probability of actually having cancer (keep in mind that the false positive rates I used here for the MRI are made up, though they do appear to be at least in the 1% range).

It’s easy to understand what happens when you imagine that 200 women come in for screening. Probably only 1 will have cancer, since the cancer rate is about 1 in 200. The MRI will almost surely diagnose her (99% chance). For the other 199, the MRI will indicate no cancer for all but about 1%, which means it will indicate cancer for about 2 of them. Thus, of the 3 cases where the MRI indicates cancer, 2 of them will be false indications.
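The same head count can be done with whole numbers if we scale the group up. The sketch below assumes 20,000 women instead of 200 (my choice, so that every count comes out as a whole number) and uses the same made-up 1-in-200 and 1-in-100 rates.

```python
women = 20_000  # hypothetical screening group, scaled up from 200

with_cancer = women // 200                # 1 in 200 have cancer: 100
true_positives = with_cancer * 99 // 100  # 99% of them are diagnosed: 99

without_cancer = women - with_cancer      # 19,900
false_positives = without_cancer // 100   # 1% are wrongly diagnosed: 199

# Of all the women the MRI flags, what share actually have cancer?
share = true_positives / (true_positives + false_positives)
print(true_positives, false_positives, round(share, 3))  # 99 199 0.332
```

So out of 298 positive results, only 99 are real, which is the same 1/3 as before.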