What’s the point of Bayes’ rule? This web page by Eliezer S. Yudkowsky gives a long intuitive explanation (thanks to Keith Frankish for pointing to it). This blog post is an attempt at a slightly shorter version with a bit more maths, and a bit of rambling about interpretation.
The information in the example problem given there is as follows:
- 1% of women at age forty who participate in routine screening have breast cancer.
- 80% of women with breast cancer will get positive mammographies.
- 9.6% of women without breast cancer will also get positive mammographies.
The task: A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?
The general problem solved by Bayes’ rule is this: if you know the probability of “if A, then B”, how do you work out the probability of “if B, then A”? More precisely, if you know P(B|A), what is P(A|B)?
Here B|A denotes the conditional event, a simultaneously easy and difficult concept. One way to think of it is as follows.
Consider a fair die with six sides. It’s thrown. What’s the probability of a six given that a side showing an even number lands upwards? (Van Fraassen (1976) used an example like this to explain the conditional event interpretation of the natural language if-then.) This is P(lands six|lands even). The idea is that you only consider cases where it’s showing an even number (2, 4, or 6). Assuming they’re all equally probable, P(lands six|lands even) = 1/3.
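The counting argument for the die can be sketched in a few lines of Python. This is only an illustration: the helper name `conditional` and the predicates `is_six` and `is_even` are made up here, and the approach assumes all six outcomes are equally probable.

```python
from fractions import Fraction

# Sample space for one throw of a fair six-sided die.
outcomes = range(1, 7)

def conditional(event, given):
    """P(event | given), by counting equally probable outcomes."""
    # Restrict attention to the cases where the condition holds...
    given_cases = [o for o in outcomes if given(o)]
    # ...and count how many of those also satisfy the event.
    both_cases = [o for o in given_cases if event(o)]
    return Fraction(len(both_cases), len(given_cases))

def is_six(o):
    return o == 6

def is_even(o):
    return o % 2 == 0

p_six_given_even = conditional(is_six, is_even)
p_even_given_six = conditional(is_even, is_six)
print(p_six_given_even)  # 1/3
print(p_even_given_six)  # 1
```

Swapping the two events gives a different answer (1 rather than 1/3), which is exactly why moving from P(B|A) to P(A|B) needs some work.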
The first stage of solving problems like that above is interpreting the problem in the language of the mathematical theory you want to use.
Let $\text{cancer}$ denote “has cancer”, $\neg\text{cancer}$ denote “does not have cancer”, $\text{pos}$ denote “shows a positive test result”, and $\text{neg}$ denote “shows a negative test result”.
Let’s take each item of information individually.
1% of women at age forty who participate in routine screening have breast cancer.
There’s a mix of information here: a percentage of people (1%), from a particular sub-population (women, aged 40, who participate in routine screening), and a property they have. From the problem it is clear that the interpretation is supposed to be:
$$P(\text{cancer}) = 0.01$$
But one can imagine a more complicated formalisation, for instance if the population of interest contains women of many different ages, some, but not all, of whom were screened because they had some worry about their health.
80% of women with breast cancer will get positive mammographies.
This is an instance of
X% of people with property A have property B
The intended interpretation is P(B|A) = X%, but this might not be obvious to all readers. Take the quantifier “some”:
Some people with property A have property B
If this is interpreted as an existential quantifier, then it also follows that some people with property B have property A. The conditional event, B|A, is in general not reversible in this way, so would not be suitable for the interpretation of an existential “some”. Consider the following statement:
All people with property A have property B
This is not (in general) reversible. The percentage quantifier (used in the problem description) is also not reversible. So there’s quite a lot of trickiness involved in interpreting this innocent-looking statement. Given some background knowledge (we know the article is about Bayes’ rule, and about conditional probabilities), the intended interpretation of the original information is:
$$P(\text{pos} \mid \text{cancer}) = 0.8$$
The idea is that if we choose at random a person from the population of interest who has cancer (i.e., we know for sure she has cancer), then the probability of her having a positive test result is 0.8.
Then similarly for the last sentence:
9.6% of women without breast cancer will also get positive mammographies.
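The formalisation is:
$$P(\text{pos} \mid \neg\text{cancer}) = 0.096$$
Here is the summary:
- $P(\text{cancer}) = 0.01$
- $P(\text{pos} \mid \text{cancer}) = 0.8$
- $P(\text{pos} \mid \neg\text{cancer}) = 0.096$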
Now the problem statement:
A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?
We have to infer $P(\text{cancer} \mid \text{pos})$. Note how this is a reversal of the conditional statements we encountered in the information given about the test.
Now comes the calculation. A good place to start when thinking about conditional probability is the ratio formula for the probability of a conditional event:
$$P(B \mid A) = \frac{P(A \wedge B)}{P(A)}$$
Take an interpretation of “If it is raining, then I have an umbrella” as the conditional event expression:
I have an umbrella | it is raining
The probability of this is the probability that I have an umbrella and it is raining, divided by the probability that it is raining.
This can easily be rewritten to
$$P(A \wedge B) = P(B \mid A)\,P(A)$$
So if you know the probability of rain, and the probability that I have an umbrella when it rains, then you can multiply them to infer the probability that it is raining and I have an umbrella.
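As a small sketch of that multiplication, here it is with made-up numbers. The values 3/10 for rain and 9/10 for having an umbrella when it rains are hypothetical, chosen only for illustration, and the variable names are not from the original.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
p_rain = Fraction(3, 10)                 # P(it is raining)
p_umbrella_given_rain = Fraction(9, 10)  # P(I have an umbrella | it is raining)

# Rewritten ratio formula: multiply the conditional by the probability of the condition.
p_umbrella_and_rain = p_umbrella_given_rain * p_rain
print(p_umbrella_and_rain)  # 27/100

# Dividing by P(rain) again recovers the conditional probability (the ratio formula).
recovered = p_umbrella_and_rain / p_rain
print(recovered)  # 9/10
```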
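One step towards Bayes’ rule begins with:
- (1) $P(A \mid B) = \frac{P(A \wedge B)}{P(B)}$
- (2) $P(B \mid A) = \frac{P(B \wedge A)}{P(A)}$ [$P(A \wedge B) = P(B \wedge A)$ in (this) probability theory, so it does not matter what order you write them]
From (2) we can infer $P(A \wedge B) = P(B \mid A)\,P(A)$, which slots into (1) to give
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$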
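Now use the same variables as in the original problem:
$$P(\text{cancer} \mid \text{pos}) = \frac{P(\text{pos} \mid \text{cancer})\,P(\text{cancer})}{P(\text{pos})}$$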
We can already fill in the numerator (top row) with $P(\text{pos} \mid \text{cancer}) = 0.8$ and $P(\text{cancer}) = 0.01$, but not yet the denominator (bottom row).
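Let’s work a bit further then. We can infer $P(\text{pos})$ as follows:
$$P(\text{pos}) = P(\text{pos} \wedge \text{cancer}) + P(\text{pos} \wedge \neg\text{cancer})$$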
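Which is easily calculated from the rewrite of the conditional probability above:
$$P(\text{pos}) = P(\text{pos} \mid \text{cancer})\,P(\text{cancer}) + P(\text{pos} \mid \neg\text{cancer})\,P(\neg\text{cancer})$$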
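One more thing: $P(\neg\text{cancer}) = 1 - P(\text{cancer}) = 0.99$. So this gives:
$$P(\text{pos}) = 0.8 \times 0.01 + 0.096 \times 0.99 = 0.10304$$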
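Now we have everything we need:
$$P(\text{cancer} \mid \text{pos}) = \frac{0.8 \times 0.01}{0.10304} = \frac{0.008}{0.10304} \approx 0.078$$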
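The whole calculation can be checked numerically. Here is a minimal sketch in Python. The function name `posterior` and its parameter names are illustrative choices rather than anything from the original article, and the three inputs are the figures given in the problem.

```python
def posterior(prior, p_pos_given_cancer, p_pos_given_no_cancer):
    """P(cancer | pos) via Bayes' rule.

    The denominator P(pos) is obtained by summing over the two cases,
    cancer and no cancer, each weighted by its probability.
    """
    p_no_cancer = 1 - prior
    p_pos = p_pos_given_cancer * prior + p_pos_given_no_cancer * p_no_cancer
    return p_pos_given_cancer * prior / p_pos

p = posterior(prior=0.01, p_pos_given_cancer=0.8, p_pos_given_no_cancer=0.096)
print(round(p, 4))  # 0.0776
```

So a positive mammography raises the probability of breast cancer from 1% to a little under 8%. Put in terms of counts: of 10,000 women screened, 100 have cancer and 80 of them test positive, while 9,900 do not have cancer and about 950 of them also test positive, so only about 80 of roughly 1,030 positive results come from women with cancer.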