Recognizing textual entailment with natural logic

February 4, 2010 by Andy

How do you work out whether a segment of natural language prose entails a sentence?

There are two extreme positions on how to model what’s going on.  One is to translate the natural language into a logic of some kind, then apply a theorem prover to draw conclusions.  The other is to use algorithms which work directly on the original text, using no knowledge of logic, for instance applying lexical or syntactic matching between premises and putative conclusion.

The main problem with the translation approach is that it’s very hard, as anyone who has tried to formalise prose manually will agree. The main problem with approaches that process the text in a shallow fashion is that they are easily tricked, e.g., by negation or by systematically replacing quantifiers.

Bill MacCartney and Christopher D. Manning (2009) report work from the space in between, using so-called natural logics, which annotate the lexical elements of the original text in a way that supports inference. One example of such a logic, familiar to those in the psychology of reasoning community, is described by Geurts (2003).

The general idea is to find a sequence of edits, guided by the logic, that transforms the premises into the conclusion. The edits are driven solely by the lexical items and require no context.
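To make the idea of an edit sequence concrete, here is a toy Python sketch. It is not MacCartney and Manning’s system: it assumes a simplified set of four relations (equivalence, forward entailment, reverse entailment, unknown) instead of their richer inventory, and it assumes each edit arrives already labelled with its lexical relation and with the monotonicity (upward or downward) of the context it sits in. Each edit’s relation is projected through its context and then joined with the relations of the earlier edits.

```python
# Toy natural-logic inference over a sequence of edits (simplified, hypothetical sketch).
# Relation between the text before and after an edit:
#   "=" equivalence, "<" forward entailment, ">" reverse entailment, "#" unknown
FLIP = {"=": "=", "<": ">", ">": "<", "#": "#"}
VERDICT = {"=": "equivalent", "<": "entails", ">": "reverse entails", "#": "unknown"}


def project(rel, polarity):
    """Upward-monotone contexts keep a lexical relation; downward ones swap < and >."""
    return rel if polarity == "up" else FLIP[rel]


def join(r1, r2):
    """Compose the relations of two successive edits (a coarse join table)."""
    if r1 == "=":
        return r2
    if r2 == "=":
        return r1
    if r1 == r2:
        return r1  # < then < gives <, > then > gives >, # then # gives #
    return "#"     # mixed directions or anything with "#": no conclusion


def infer(edits):
    """edits: list of (lexical_relation, polarity) pairs, applied premise -> hypothesis."""
    rel = "="
    for lexical_rel, polarity in edits:
        rel = join(rel, project(lexical_rel, polarity))
    return VERDICT[rel]


# "Every dog barks loudly" -> "Every poodle barks":
#   dog -> poodle is ">" (dog is more general), in the downward first argument of "every";
#   deleting "loudly" is "<", in the upward second argument.
print(infer([(">", "down"), ("<", "up")]))  # entails
# "Every dog barks" -> "Every animal barks": dog -> animal is "<", in a downward context.
print(infer([("<", "down")]))  # reverse entails, so no entailment
```

The second example is the kind of quantifier swap that fools purely lexical matching: the words overlap almost completely, but the downward-monotone context reverses the direction of the inference.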

It seems promising for many cases, easily beating both naive lexical comparison and attempts to formalise and prove properties automatically in first-order logic.

References

MacCartney, B., & Manning, C. D. (2009). An extended model of natural logic. The Eighth International Conference on Computational Semantics (IWCS-8), Tilburg, Netherlands, January 2009.

Geurts, B. (2003). Reasoning with quantifiers. Cognition, 86, 223–251.

Embracing individual differences

January 31, 2010 by Andy

This week I finished teaching/facilitating a course entitled Embracing individual differences in thinking and reasoning.  I was asked to give the gist of what this was about.

There’s a bunch of individual differences in how people solve reasoning problems. One way of thinking about this is that some people are very good at reasoning problems and others are not so good, with a continuum in between. But there’s evidence that people are interpreting the tasks in different (and reasonable) ways, and succeeding in reasoning from their interpretation. We examined these sorts of issues on the course.

A simple example is the meaning of “some”, discussed by J. S. Mill in his 1867 book An Examination of Sir William Hamilton’s Philosophy, and of the Principal Philosophical Questions Discussed in His Writings. “I saw some of your children today” implicates that I didn’t see all of them (if I had, I’d have said so), even though “all” is logically compatible with “some”.
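The two readings of “some” can be contrasted in a few lines of Python by treating the children, and the children I saw, as sets. This is only an illustrative sketch; the names are hypothetical.

```python
# Two readings of "some A are B", treating A and B as sets (illustrative sketch).

def some_logical(a, b):
    """Classical existential reading: at least one A is a B."""
    return len(a & b) > 0


def some_but_not_all(a, b):
    """Reading with the scalar implicature: at least one A is a B, but not every A is."""
    return some_logical(a, b) and not a <= b


children = {"Ann", "Ben", "Cat"}    # hypothetical names
seen_today = {"Ann", "Ben", "Cat"}  # in fact I saw all of them

print(some_logical(children, seen_today))      # True
print(some_but_not_all(children, seen_today))  # False
```

A reasoner using the first reading accepts “some” here; one drawing the implicature rejects it. Both are reasoning correctly from their own interpretation.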

There are many other examples: the degree to which people are affected by how information is presented; whether people can suspend their beliefs and reason from premises which are obviously false; whether people are sensitive to alternative causes of effects, or factors which can disable a relationship between a cause and effect.

The waters are muddied somewhat by complicated relationships with intelligence. For instance, people with higher intelligence (on several of the standard psychological ways of operationalising the concept, such as IQ) are more likely to give the normative answer on some (but not all) tasks. But then one can wonder what exactly the IQ tests are measuring.

Things get particularly interesting when people with clinical conditions such as autism are actually more likely to give the normative answer on some tasks. There’s a nice example where their ability to do so was used as an argument that the normative answer was somehow the wrong one. One researcher blogged:

Autistics were shown to perform with enhanced logical consistency, avoiding irrational and irrelevant biases that distorted decision-making in their nonautistic controls. However, autistics’ enhanced performance in this study was interpreted by the authors as a litany of autistic failures, imbalances, impairments, deficits, reduced capacities, weaknesses, and impoverishments (several invocations of some of these), none of which were actually found. [...] In years to come, we can look forward to interventions designed to overcome this core autistic deficit and to ensure that autistics become as irrational as nonautistics.

There were plenty of issues to debate…

Thinking when you think you’re not thinking—again

October 30, 2009 by Andy

I really enjoyed the 2006 Science paper by Dijksterhuis, Bos, Nordgren and van Baaren on deliberation without attention. Then came Acker (2008) with a meta-analysis and the suggestion that there is “little evidence” of an advantage of deliberation without attention.

Today, from the latest issue of Judgment and Decision Making: “Are complex decisions better left to the unconscious? Further failed replications of the deliberation-without-attention effect” by Dustin P. Calvillo and Alan Penaloza.

The summary seems to be that deciding immediately after the stimulus is presented, without deliberation, might sometimes be better than deliberating. But deciding after a period of distraction following the stimulus is not.

The moderators of the effect—there seems to be something going on in a few studies!—are still not well understood.

rggobi

October 23, 2009 by Andy

I’m playing around with rggobi at the moment, the R package that interfaces with the interactive graphics package GGobi. Installation was very easy; it just needed the command:

source("http://www.ggobi.org/downloads/install.r")

Followed by a restart of R.

To get the first session going:

require(rggobi)
g = ggobi(mtcars)

Reasoning to an interpretation before applying Bayes’ rule

October 12, 2009 by Andy

What’s the point of Bayes’ rule? This web page by Eliezer S. Yudkowsky gives a long intuitive explanation (thanks to Keith Frankish for pointing me to it). This blog post is an attempt at a slightly shorter version with a bit more maths, and a bit of rambling about interpretation.

The information in the example problem given there is as follows:

  1. 1% of women at age forty who participate in routine screening have breast cancer.
  2. 80% of women with breast cancer will get positive mammographies.
  3. 9.6% of women without breast cancer will also get positive mammographies.

The task: A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

The general problem solved by Bayes’ rule is this: if you know the probability of “if A, then B”, how do you work out the probability of “if B, then A”? More precisely, if you know P(B|A), what is P(A|B)?

Here B|A denotes the conditional event, a concept that is simultaneously easy and difficult. One way to think of it is as follows.

Consider a fair six-sided die. It’s thrown. What’s the probability of a six, given that an even number lands face up? (Van Fraassen, 1976, used an example like this to explain the conditional event interpretation of the natural language if-then.) This is P(lands six|lands even). The idea is that you only consider cases where it shows an even number (2, 4, or 6). Assuming these are all equally probable, P(lands six|lands even) = 1/3.
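The “only consider the even cases” idea can be written out by brute-force enumeration. In this small Python sketch (the function names are my own), the conditional probability is computed by keeping only the outcomes where the condition holds and renormalising; exact fractions keep the answer as 1/3 rather than a rounded decimal.

```python
from fractions import Fraction

# A fair six-sided die: each outcome has probability 1/6.
OUTCOMES = range(1, 7)
P = {o: Fraction(1, 6) for o in OUTCOMES}


def prob(event):
    """Probability of the set of outcomes where `event` holds."""
    return sum(P[o] for o in OUTCOMES if event(o))


def cond_prob(event, given):
    """P(event | given): restrict attention to outcomes where `given` holds."""
    return prob(lambda o: event(o) and given(o)) / prob(given)


def is_six(o):
    return o == 6


def is_even(o):
    return o % 2 == 0


print(cond_prob(is_six, is_even))  # 1/3
```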

Interpretation

The first stage of solving problems like the one above is interpreting the problem in the language of the mathematical theory you want to use.

Let C denote “has cancer”, \neg C denote “does not have cancer”, T denote “shows a positive test result”, and \neg T denote “shows a negative test result”.

Let’s take each item of information individually.

1% of women at age forty who participate in routine screening have breast cancer.

There’s a mix of information here: a percentage of people (1%), from a particular sub-population (women, aged 40, who participate in routine screening), and a property they have.  From the problem it is clear that the interpretation is supposed to be:

P(C) = .01

But one can imagine a more complicated formalisation, for instance if the population of interest contains women of many different ages, some, but not all, of whom were screened because they had some worry about their health.

Next sentence:

80% of women with breast cancer will get positive mammographies.

This is an instance of

X% of people with property A have property B

The intended interpretation is P(B|A) = X%, but this might not be obvious to all readers. Take “some”:

Some people with property A have property B

If this is interpreted as an existential quantifier, then it also follows that some people with property B have property A. The conditional event, B|A, is in general not reversible in this way (P(B|A) and P(A|B) can differ), so it would not be suitable for interpreting an existential “some”. Consider the following statement:

All people with property A have property B

This is not (in general) reversible. The percentage quantifier used in the problem description is also not reversible. So there’s quite a lot of trickiness involved in interpreting this innocent-looking statement. Given some background knowledge (we know the article is about Bayes’ rule and about conditional probabilities), the intended interpretation of the original information is:
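The reversibility claims can be checked by brute force. This Python sketch tries every pair of subsets A, B of a small hypothetical population of four people. It assumes “some” is the existential quantifier, “all” is set inclusion, and the percentage quantifier is the percentage of A that are B (the quantity behind P(B|A)).

```python
from itertools import combinations

DOMAIN = range(4)  # a small hypothetical population of four people


def subsets(xs):
    xs = list(xs)
    for r in range(len(xs) + 1):
        for c in combinations(xs, r):
            yield frozenset(c)


def some(a, b):
    """Existential 'some A are B'."""
    return len(a & b) > 0


def every(a, b):
    """'All A are B' (vacuously true when A is empty)."""
    return a <= b


def percentage(a, b):
    """Percentage of A that are B; None if A is empty."""
    return 100 * len(a & b) / len(a) if a else None


def reversible(q):
    """Does q(A, B) always agree with q(B, A) on this domain?"""
    sets = list(subsets(DOMAIN))
    return all(q(a, b) == q(b, a) for a in sets for b in sets)


for name, q in [("some", some), ("all", every), ("percentage", percentage)]:
    print(name, reversible(q))
# prints: some True, all False, percentage False (one per line)
```

A single counterexample is enough to break reversibility: with A = {0} and B = {0, 1}, all A are B but not all B are A, and 100% of A are B while only 50% of B are A.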

P(T|C) = .8

The idea is that if we choose at random a woman from the population of interest who has cancer (i.e., we know for sure she has cancer), then the probability that she has a positive test result is .8.

Then similarly for the last sentence:

9.6% of women without breast cancer will also get positive mammographies.

The formalisation is:

P(T|\neg C) = .096

Here is the summary:

P(C) = .01
P(T|C) = .8
P(T|\neg C) = .096

Now the problem statement:

A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

We have to infer P(C|T). Note how this reverses the conditional statements we encountered in the information given about the test.

Calculation

Now comes the calculation. A good place to start when thinking about conditional probability is the ratio formula for the probability of a conditional event:

P(B|A) = \frac{P(A \& B) }{P(A)}

Take an interpretation of “If it is raining, then I have an umbrella” as the conditional event expression:

I have an umbrella  |  it is raining

The probability of this is the probability that I have an umbrella and it is raining, divided by the probability that it is raining.

This can easily be rewritten as

P(A \& B) = P(B|A) P(A)

So if you know the probability of rain, and the probability that I have an umbrella when it rains, then you can multiply them to infer the probability that it is raining and I have an umbrella.

The derivation of Bayes’ rule starts from two instances of the ratio formula:

  1. P(B|A) = P(A \& B) / P(A)
  2. P(A|B) = P(A \& B) / P(B) [A \& B = B \& A in (this) probability theory, so it does not matter what order you write them]

From 2 we can infer P(A \& B) = P(A|B)P(B), which slots into 1 to give

P(B|A) = \frac{P(A|B) P(B)}{P(A)}
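The derivation can be checked numerically. The Python sketch below uses a made-up joint distribution for two binary events A and B (the four probabilities are invented for illustration), computes both conditional probabilities with the ratio formula, and confirms that Bayes’ rule links them. It also shows that P(B|A) and P(A|B) are generally different numbers. Exact fractions avoid floating-point noise.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over (A, B); the four cells sum to 1.
joint = {
    (True, True): F(1, 10),
    (True, False): F(2, 10),
    (False, True): F(3, 10),
    (False, False): F(4, 10),
}

p_a = sum(v for (a, _), v in joint.items() if a)  # P(A)
p_b = sum(v for (_, b), v in joint.items() if b)  # P(B)
p_ab = joint[(True, True)]                        # P(A & B)

p_b_given_a = p_ab / p_a  # ratio formula, step 1
p_a_given_b = p_ab / p_b  # ratio formula, step 2

# Bayes' rule: P(B|A) = P(A|B) P(B) / P(A)
assert p_b_given_a == p_a_given_b * p_b / p_a
print(p_b_given_a, p_a_given_b)  # 1/3 1/4
```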

Now substitute the variables from the original problem, with A as T and B as C:

P(C|T) = \frac{P(T|C) P(C)}{P(T)}

We can already fill in the numerator (top row) with P(T|C) = .8 and P(C) = .01, but not yet the denominator (bottom row).

Let’s work a bit further, then. Since every woman either has cancer or does not, P(T) splits into two mutually exclusive cases:

P(T) = P(T \& C) + P(T \& \neg C)

Each term can be calculated from the rewritten ratio formula above, P(A \& B) = P(B|A) P(A):

P(T) = P(T|C) P(C) + P(T|\neg C) P(\neg C)

One more thing: P(\neg A) = 1 - P(A).  So this gives:

P(T) = P(T|C) P(C) + P(T|\neg C) P(\neg C)
= .8 \times .01 + .096 \times (1 - .01) = .10304

Now we have everything we need:

P(C|T) = \frac{.8 \times .01}{.10304} = .078.
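Putting the pieces together, here is the whole calculation as a short Python function, followed by a cross-check in natural frequencies for an imaginary group of 10,000 women (the group size is arbitrary). The parameter names sensitivity and false_positive_rate are my labels for P(T|C) and P(T|\neg C); they do not appear in the original problem.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(C|T) from P(C), P(T|C) and P(T|not C), via Bayes' rule and total probability."""
    p_t = sensitivity * prior + false_positive_rate * (1 - prior)  # P(T)
    return sensitivity * prior / p_t


p = posterior(prior=0.01, sensitivity=0.8, false_positive_rate=0.096)
print(round(p, 3))  # 0.078

# The same answer as natural frequencies, for a hypothetical group of 10,000 women.
n = 10_000
with_cancer = n * 0.01                       # about 100 women
true_positives = with_cancer * 0.8           # about 80 of them test positive
false_positives = (n - with_cancer) * 0.096  # about 950.4 of the 9,900 without cancer
print(round(true_positives / (true_positives + false_positives), 3))  # 0.078
```

The frequency version makes the result less surprising: the roughly 950 false positives from the large cancer-free group swamp the 80 true positives.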

Reporting standardised/simple effect size

September 18, 2009 by Andy

I’ve moaned a bit about (what felt at the time like a religion of) “effect size”. Recently Thom Baguley published a paper on the topic, comparing standardised effect measures, which involve scaling by the variability in the sample, with simple effect measures, which are expressed in the original units of measurement.

Baguley reviews some of the problems with standardised measures, all related to factors affecting sample variance. In general he advises reporting simple effect sizes, preferably with confidence intervals. If you really want to use standardised measures, for instance to compare conceptually similar measures on different scales, then he advises against absolute, “canned” judgements like “small”, “medium”, and “large”, arguing instead for descriptions of the relative size of effects.
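The sample-variance problem can be illustrated with two hypothetical studies (the data are invented). Both find the same 5-unit difference in means in the original units, but the scores in the second study are more spread out, so Cohen’s d, one common standardised measure that divides by the pooled standard deviation, shrinks considerably. This is a sketch of the general point, not Baguley’s own analysis.

```python
import statistics


def mean_difference(x, y):
    """Simple effect size: difference in means, in the original units."""
    return statistics.mean(x) - statistics.mean(y)


def cohens_d(x, y):
    """Standardised effect size: the mean difference divided by the pooled SD."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return mean_difference(x, y) / pooled_var ** 0.5


# Invented data: (treated, control) scores for two studies with the same
# 5-unit difference in means, but much more spread in study 2.
study1 = ([52, 54, 55, 56, 58], [47, 49, 50, 51, 53])
study2 = ([45, 50, 55, 60, 65], [40, 45, 50, 55, 60])

for treated, control in (study1, study2):
    print(f"{mean_difference(treated, control):.1f}", round(cohens_d(treated, control), 2))
# 5.0 2.24
# 5.0 0.63
```

Nothing about the underlying effect changed between the two studies, only the spread of the sample. Baguley also recommends confidence intervals for the simple effect; they are left out here to keep the sketch short.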

I like his Tukey quote:

“… being so disinterested in our variables that we do not care about their units can hardly be desirable.”

It does seem odd to focus on how much variance is explained rather than actually characterising the nature of relationships between variables.

Reference

Baguley, T. (2009). Standardized or simple effect size: What should be reported? British Journal of Psychology, 100, 603–617.

Working on your quirk

September 16, 2009 by Andy

Another brief interlude from the stats and cog psych, but related to individual differences in reasoning, I think! There’s an interesting aside in a book by Robert Sutton on dealing with assholes (technical term here) in the workplace. He clarifies his definition of the particular breed of asshole (technical term) in whom he is interested (pp. 16–17):

My focus is squarely on screening, reforming, and getting rid of people who demean and damage others, especially others with relatively little power. [...] I am a firm believer in the virtues of conflict, even noisy arguments.

Here’s a special case he describes (pp. 18–19) of people who can occasionally appear to be assholes (technical term), but are not:

I also want to put in a good word for socially awkward people [...]. I was struck by how many successful leaders of high-tech companies and creative organizations like advertising agencies, graphic design firms, and Hollywood production companies, had learned to ignore job candidates’ quirks and strange mannerisms, to downplay socially inappropriate remarks, and instead, to focus on what the people could actually do.

Examples he gives include autistic people and those with Tourette’s syndrome.

Reference

Sutton, R. (2007). The No Asshole Rule. New York: Business Plus.

Walk with Vaughan Bell between the Maudsley and the Tavistock clinic

September 15, 2009 by Andy

The important details:

“11am, near the Maudsley Hospital in Denmark Hill, Saturday 19th September, to walk to the Tavistock Clinic in the leafy suburb of Belsize Park for about 4-5ish.”

And:

The walk is about 8 miles but I’m planning for a few minor detours for interesting sites (grounds of the old Bedlam Hospital, now the Imperial War Museum, St Thomas’ Hospital and the like) and with stops for lunch and maybe the occasional pint.

More info over here.

Wish I was in London to join this!

Bozo Sapiens by Michael and Ellen Kaplan

September 11, 2009 by Andy

(Conflict of interest: I received my copy as a freebie from the publisher.)

Michael and Ellen Kaplan’s book, Bozo Sapiens, begins with the observation that (always other…) people tend to make stupid mistakes, by their own measures of stupidity.  PopSci books related to your research topic can be painful to read (the combination of results not being reported in detail with the realisation that they can’t be reported in detail), but the reward tends to be a reminder of what initially attracted you to the field and the occasional anecdote for teaching. So, off I went.

Where the book really shines is in its many examples of reasoning and decision making in the wild. For instance, how a pilot with too much (but unfortunately just recently out of date) knowledge of the air conditioning system on a 737 contributed to the deaths of 47 passengers (p. 117). Examples of the way pilots and air traffic control communicate successfully with each other in times of crisis (p. 142). A transcript of the events leading up to the mistaken shelling of friendly forces in Iraq (p. 91), revealing the conflicting decision processes and the realisation of the error. The book is packed with great examples.

There were a couple of annoyances. The introduction of (classical) logical validity on p. 7 is wrong. They confuse it with consistency. Take this example.

All dogs have five legs.
Rex is a dog.
Therefore Rex has five legs.

The argument is valid because if the premises were true, then the conclusion would have to be true too, even though the first premise is in fact false. Now take a description of people at a party:

Some of the cute girls were tipsy
Some of the tipsy people were German
Therefore some of the cute girls were German

The conclusion is consistent with the premises: it is possible for the premises and the conclusion all to be true together. For instance, there might have been 50 people at the party, 7 of them cute German girls who were tipsy. The argument is, however, not valid: the conclusion need not be true when the premises are. Suppose all the cute girls were Austrian, and a few of them were tipsy. There were also some tipsy Germans at the party, but none of them were female.
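The difference between validity and consistency can be checked mechanically. The Python sketch below is my own illustration, not something from the book. For statements built only from “some” and “all” over three properties, the truth of each statement depends only on which of the eight combinations of properties are present at the party, so enumerating all 256 such models is enough. An argument is consistent if some model makes premises and conclusion true together, and valid if no model makes the premises true and the conclusion false. The second argument checked is an added textbook contrast, not from the book.

```python
from itertools import product

# Each individual is described by a "type": (cute_girl, tipsy, german) as booleans.
TYPES = list(product([False, True], repeat=3))
CUTE, TIPSY, GERMAN = 0, 1, 2


def models():
    """Yield every model, i.e. every choice of which of the 8 types are present."""
    for mask in product([False, True], repeat=len(TYPES)):
        yield [t for t, present in zip(TYPES, mask) if present]


def some(m, i, j):
    return any(x[i] and x[j] for x in m)


def every(m, i, j):
    return all(x[j] for x in m if x[i])


def check(premises, conclusion):
    """Return (consistent, valid) for an argument given as predicates on models."""
    ms = list(models())
    consistent = any(premises(m) and conclusion(m) for m in ms)
    valid = all(conclusion(m) for m in ms if premises(m))
    return consistent, valid


# Some cute girls were tipsy; some tipsy people were German; so some cute girls were German.
party = check(lambda m: some(m, CUTE, TIPSY) and some(m, TIPSY, GERMAN),
              lambda m: some(m, CUTE, GERMAN))
# For contrast: all A are B; all B are C; so all A are C.
chain = check(lambda m: every(m, 0, 1) and every(m, 1, 2),
              lambda m: every(m, 0, 2))
print(party, chain)  # (True, False) (True, True)
```

The counterexample the search finds for the party argument has the same shape as the Austrian scenario: tipsy cute girls who are not German, plus tipsy Germans who are not cute girls.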

There are some fun psychological things going on with these kinds of sentences related to assumptions of cooperativeness and sensitivity to information ordering (one of which is called… the figural effect).

Wason’s 1966 selection task is introduced very briefly (pp. 115–116). There’s a mass of literature studying the task, and much of it was declared a waste of space by Sperber and Girotto (2002) [Use or misuse of the selection task? Rejoinder to Fiddick, Cosmides and Tooby. Cognition, 85, 277–290] and others. Still, it would have been nice to have a few words on the different interpretations people have, on how some of these may be reasonable, on how people with high g tend to give the answer Wason expected, on the effect that deontic content (e.g., about drinking laws) has on people’s performance, and so forth. But I guess the point, as so often when the task is used, was to demonstrate that the reader is stupid.

There are a few similar glosses on lab tasks which don’t really do them justice.  However the endnotes are very detailed so you can follow up references and see what the original papers said.  There are many good choices in there, e.g., a paper by Kemp and Tenenbaum on structure learning.

So overall, the book is great as a collection of examples and anecdotes, and might encourage people to learn more about the details.

SEM again

September 10, 2009 by Andy

As yet I haven’t convinced myself that SEM is a good idea, or at least not in the examples I’ve seen in psychology. Two reasons for starters: (i) fit statistic madness and (ii) weird analyses driven almost entirely by correlations, or at best by vague theorising based on very trivial analyses of the tasks (it has a bit of this and a bit of that…).

Anyway, recently Andrew Gelman noted:

“… there’s a research paradigm in which you fit a model—maybe a regression, maybe a structural equations model, maybe a multilevel model, whatever—and then you read off the coefficients, with each coefficient telling you something. You gather these together and those are your conclusions.

“My paradigm is a bit different. I sometimes say that each causal inference requires its own analysis and maybe its own experiment. I find it difficult to causally interpret several different coefficients from the same model.”

He goes on in the discussion to add:

“I have no problem with multiple-equation models, including measurement-error models, multilevel models, and instrumental variables. But I’m skeptical of trying to answer several casual questions by fitting a single model to a dataset.”

Interesting discussion starting over here.