
Can probability theory be used to model epistemology in mathematics?

Chater and Oaksford (2009) write:

“… logic, and indeed the whole of mathematics, concern determining what beliefs can consistently be held – a mathematical proof shows, given a set of axioms, some theorem must follow; i.e., that the axioms and the negation of the theorem are not consistent. But it seems rather unnatural to see the whole of mathematics as a special case of probability theory!”

First problem: mathematics is bigger than any single formal system, as Gödel showed with his incompleteness (really incompletability) theorems. Mathematical proofs as constructed by real mathematicians are not formal, and philosophers still debate what a proof is.

Leaving Gödel aside, yes, classical logic can be used to characterise relations between premises and a conclusion in mathematical proof (a model of epistemology in mathematics which probably isn’t terribly accurate). Even today in the post-Gödel world many researchers are working on formalising segments of mathematics, e.g., in the Isabelle and Coq proof assistants. A big example is the attempt to formalise a proof of Kepler’s conjecture.

So far so good for (non-probabilistic) logic as a model of epistemology.

Consider Fermat’s last conjecture. Before Wiles proved the conjecture, I bet many mathematicians were pretty sure, but not certain, that it held. Or consider the more general case of how mathematicians decide what conjectures to try to prove in the first place. They must use some cues to decide whether the probability of the conjecture is high before spending a lot of time embarking on a proof. Why not use probability to model epistemology in mathematics? I have no idea if such a probability logic of mathematical conjectures (quantifiers, useful structures, etc, plugged into probability) is feasible in practice.

I see no problem in viewing classical logic as a special case of probability theory without a conditional event, and suppositional logic with its “defective” truth-tabled conditional as a special case of probability with the conditional event. A strong reason for this belief is that (as far as I understand) there are proofs of these relationships between logic and probability (or, perhaps better phrased, between certain and uncertain logic).
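As a rough illustration of the special-case idea (not one of the proofs mentioned above), the sketch below restricts probabilities to 0 and 1 by putting all the mass on a single truth assignment. The names `point_mass` and `conditional` are made up for this example, and treating P(B|A) as undefined when P(A) = 0 is an assumption intended to mirror the “defective” truth table.

```python
from fractions import Fraction
from itertools import product

def point_mass(world):
    """Probability function that puts all its mass on one truth assignment (a, b)."""
    def prob(event):
        return Fraction(1) if event(*world) else Fraction(0)
    return prob

def conditional(prob, consequent, antecedent):
    """P(consequent | antecedent) by the ratio formula; None (void) if P(antecedent) = 0."""
    p_antecedent = prob(antecedent)
    if p_antecedent == 0:
        return None
    return prob(lambda a, b: antecedent(a, b) and consequent(a, b)) / p_antecedent

A = lambda a, b: a
B = lambda a, b: b

rows = []
for world in product([True, False], repeat=2):
    prob = point_mass(world)
    rows.append((
        world,
        prob(lambda a, b: a and b),  # behaves like the truth table for "A and B"
        prob(lambda a, b: not a),    # behaves like the truth table for "not A"
        conditional(prob, B, A),     # 1, 0, or void when A is false
    ))

for row in rows:
    print(row)
```

With all the mass on one world, conjunction and negation come out as the classical truth tables, and the conditional event is void in exactly the rows where A is false.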

Complication: the following is a deductive relationship:

P(B_1|A_1) = p_1,
P(B_2|A_2) = p_2,
…
P(B_{n-1}|A_{n-1}) = p_{n-1},
P(B_n|A_n) = p_n

Another complication: the logic used in the mathematical vernacular looks a bit like informal classical logic, even when used at the meta-level to reason about non-classical logics.


Chater, N., & Oaksford, M. (2009). Local and global inferential relations: Response to Over (2009). Thinking and Reasoning, 15, 439–446.


Reasoning to an interpretation before applying Bayes’ rule

What’s the point of Bayes’ rule?  This web page by Eliezer S. Yudkowsky gives a long intuitive explanation (thanks to Keith Frankish for pointing to it).  This blog post is an attempt at a slightly shorter version with a bit more maths, and a bit of rambling about interpretation.

The information in the example problem given there is as follows:

  1. 1% of women at age forty who participate in routine screening have breast cancer.
  2. 80% of women with breast cancer will get positive mammographies.
  3. 9.6% of women without breast cancer will also get positive mammographies.

The task: A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

The general problem solved by Bayes’ rule is that if you know the probability of if A, then B, how do you work out the probability of if B, then A?  More precisely if you know P(B|A), what is P(A|B)?

Here B|A denotes the conditional event, a simultaneously easy and difficult concept.  One way to think of it is as follows.

Consider a fair die with six sides.  It’s thrown.  What’s the probability of a six given that a side showing an even number lands upwards? (Van Fraassen, 1976, used an example like this to explain the conditional event interpretation of the natural language if-then.)  This is P(lands six|lands even).  The idea is that you only consider cases where it’s showing an even number (2, 4, or 6). Assuming they’re all equally probable, P(lands six|lands even) = 1/3.
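The die example can be checked by brute-force counting. This is only a sketch for equally likely outcomes; the helper name `conditional_probability` is made up for illustration.

```python
from fractions import Fraction

def conditional_probability(outcomes, event, given):
    """P(event | given) for equally likely outcomes, by counting cases."""
    given_cases = [o for o in outcomes if given(o)]
    if not given_cases:
        raise ValueError("the conditioning event never happens")
    favourable = [o for o in given_cases if event(o)]
    return Fraction(len(favourable), len(given_cases))

die = range(1, 7)
p_six_given_even = conditional_probability(
    die,
    event=lambda o: o == 6,
    given=lambda o: o % 2 == 0,
)
print(p_six_given_even)  # 1/3
```

Only the three even faces are kept, and one of them is a six.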


The first stage of solving problems like that above is interpreting the problem in the language of the mathematical theory you want to use.

Let C denote “has cancer”, \neg C denote “does not have cancer”, T denote “shows a positive test result”, and \neg T denote “shows a negative test result”.

Let’s take each item of information individually.

1% of women at age forty who participate in routine screening have breast cancer.

There’s a mix of information here: a percentage of people (1%), from a particular sub-population (women, aged 40, who participate in routine screening), and a property they have.  From the problem it is clear that the interpretation is supposed to be:

P(C) = .01

But one can imagine a more complicated formalisation, for instance if the population of interest contains women of many different ages, some, but not all, of whom were screened because they had some worry about their health.

Next sentence:

80% of women with breast cancer will get positive mammographies.

This is an instance of

X% of people with property A have property B

The intended interpretation is P(B|A) = X%, but this might not be obvious to all readers.  Compare the quantifier “some”:

Some people with property A have property B

If this is interpreted as an existential quantifier, then it also follows that some people with property B have property A.  The conditional event, B|A, is in general not reversible in this way, so would not be suitable for the interpretation of an existential “some”.  Consider the following statement:

All people with property A have property B

This is not (in general) reversible. The percentage quantifier (used in the problem description) is also not reversible.  So there’s quite a lot of trickiness involved in interpreting this innocent-looking statement. Given some background knowledge (we know the article is about Bayes’ rule, and about conditional probabilities), the intended interpretation of the original information is:

P(T|C) = .8

The idea is that if we choose a person at random from the population of interest, who has cancer (i.e., we know for sure she has cancer), then the probability of her having a positive test result is .8.
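The point about which quantifiers can be reversed can be checked on a toy population. The population below is entirely hypothetical, and the function names are made up for this sketch.

```python
from fractions import Fraction

# Hypothetical toy population; each person is a pair (has_A, has_B).
population = (
    [(True, True)] * 2      # A and B
    + [(False, True)] * 6   # B only
    + [(False, False)] * 2  # neither
)

def has_a(person):
    return person[0]

def has_b(person):
    return person[1]

def some(pop, p, q):
    """'Some people with p have q', read as an existential quantifier."""
    return any(p(x) and q(x) for x in pop)

def every(pop, p, q):
    """'All people with p have q'."""
    return all(q(x) for x in pop if p(x))

def proportion(pop, p, q):
    """'X% of people with p have q', read as the conditional probability P(q|p)."""
    with_p = [x for x in pop if p(x)]
    return Fraction(sum(1 for x in with_p if q(x)), len(with_p))

print(some(population, has_a, has_b), some(population, has_b, has_a))              # True True
print(every(population, has_a, has_b), every(population, has_b, has_a))            # True False
print(proportion(population, has_a, has_b), proportion(population, has_b, has_a))  # 1 1/4
```

In this population “some” gives the same answer in both directions, while “all” and the percentage reading do not.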

Then similarly for the last sentence:

9.6% of women without breast cancer will also get positive mammographies.

The formalisation is:

P(T|\neg C) = .096

Here is the summary:

P(C) = .01
P(T|C) = .8
P(T|\neg C) = .096

Now the problem statement:

A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

We have to infer P(C|T). Note how this is a reversal of the conditional statements we encountered in the information given about the test.


Now comes the calculation. A good place to start when thinking about conditional probability is the ratio formula for the probability of a conditional event:

P(B|A) = \frac{P(A \& B) }{P(A)}

Take an interpretation of “If it is raining, then I have an umbrella” as the conditional event expression:

I have an umbrella  |  it is raining

The probability of this is the probability that I have an umbrella and it is raining, divided by the probability that it is raining.

This can easily be rewritten to

P(A \& B) = P(B|A) P(A)

So if you know the probability of rain, and the probability that I have an umbrella when it rains, then you can multiply them to infer the probability that it is raining and I have an umbrella.
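To make the umbrella example concrete, here is a small sketch with invented joint probabilities (the numbers are purely hypothetical). It computes the conditional probability with the ratio formula and then multiplies back to recover the joint probability.

```python
from fractions import Fraction

# Hypothetical joint probabilities over (raining, have_umbrella); they sum to 1.
joint = {
    (True, True): Fraction(27, 100),
    (True, False): Fraction(3, 100),
    (False, True): Fraction(14, 100),
    (False, False): Fraction(56, 100),
}

p_rain = sum(p for (raining, _), p in joint.items() if raining)
p_rain_and_umbrella = joint[(True, True)]

# Ratio formula: P(umbrella | rain) = P(rain & umbrella) / P(rain)
p_umbrella_given_rain = p_rain_and_umbrella / p_rain

# Product rule: multiplying back recovers the joint probability.
recovered = p_umbrella_given_rain * p_rain

print(p_rain, p_umbrella_given_rain, recovered)  # 3/10 9/10 27/100
```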

One step towards Bayes’ rule begins with:

  1. P(B|A) = P(A \& B) / P(A)
  2. P(A|B) = P(A \& B) / P(B) [A \& B = B \& A in (this) probability theory, so it does not matter what order you write them]

From 2 we can infer P(A \& B) = P(A|B)P(B), which slots into 1 to give

P(B|A) = \frac{P(A|B) P(B)}{P(A)}
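As a sanity check, the rule can be tried on the fair die from earlier, with B = “lands six” and A = “lands even”. This is only a sketch; the helper `prob` is made up for illustration.

```python
from fractions import Fraction

sides = range(1, 7)

def prob(event):
    """Probability of an event on a fair six-sided die, by counting."""
    return Fraction(sum(1 for s in sides if event(s)), 6)

def is_six(s):
    return s == 6

def is_even(s):
    return s % 2 == 0

p_six = prob(is_six)    # P(B) = 1/6
p_even = prob(is_even)  # P(A) = 1/2
# P(A|B): a six is always even, so this is 1.
p_even_given_six = prob(lambda s: is_even(s) and is_six(s)) / p_six

# Bayes' rule: P(B|A) = P(A|B) P(B) / P(A)
p_six_given_even = p_even_given_six * p_six / p_even
print(p_six_given_even)  # 1/3
```

The result agrees with the value obtained by counting the even faces directly.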

Now use the same variables as in the original problem

P(C|T) = \frac{P(T|C) P(C)}{P(T)}

We can already fill in the numerator (top row) with P(T|C) = .8 and P(C) = .01, but not yet the denominator (bottom row).

Let’s work a bit further then. We can infer P(T) as follows:

P(T) = P(T \& C) + P(T \& \neg C)

Which is easily calculated from the rewrite of the conditional probability above:

P(T) = P(T|C) P(C) + P(T|\neg C) P(\neg C)

One more thing: P(\neg A) = 1 - P(A).  So this gives:

P(T) = P(T|C) P(C) + P(T|\neg C) P(\neg C)
= .8 \times .01 + .096 \times (1 - .01) = .10304

Now we have everything we need:

P(C|T) = \frac{.8 \times .01}{.10304} \approx .078.
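The whole calculation fits in a few lines of code. The function name `posterior` and its parameter names are made up for this sketch; the three inputs are the values P(C), P(T|C) and P(T|\neg C) from the summary above.

```python
from fractions import Fraction

def posterior(prior, p_pos_given_cond, p_pos_given_not_cond):
    """Return (P(T), P(C|T)) from P(C), P(T|C) and P(T|not C)."""
    # Total probability of a positive test, split by cancer status.
    p_pos = p_pos_given_cond * prior + p_pos_given_not_cond * (1 - prior)
    # Bayes' rule.
    return p_pos, p_pos_given_cond * prior / p_pos

p_t, p_c_given_t = posterior(
    prior=Fraction(1, 100),                  # P(C) = .01
    p_pos_given_cond=Fraction(80, 100),      # P(T|C) = .8
    p_pos_given_not_cond=Fraction(96, 1000)  # P(T|not C) = .096
)
print(float(p_t))                    # 0.10304
print(round(float(p_c_given_t), 3))  # 0.078
```

Exact fractions are used so that rounding only happens when printing.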