Chi-square in SEM

Playing around again with SEM. Just where does that \chi^2 come from? Here’s a brain dump of the gist.

You start with the sample covariance matrix (S) and a model description (quantitative boxology; CFA tied together with regression). The fit machinery iterates over candidate parameter values until a fit criterion measuring the discrepancy between S and the “implied” covariance matrix (i.e., the one predicted by the model, C) is minimised, and out pops the final set of estimates. Then you multiply that minimised discrepancy by (N - 1) to get something with a \chi^2 distribution.
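In symbols, writing F_{\min} for the minimised value of the fit criterion (notation added here for convenience, not Loehlin's):

\chi^2 = (N - 1) \cdot F_{\min}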

Marvellous.

First, how do we get C? Loehlin (2004, p. 41) to the rescue:

C = F \cdot (I-A)^{-1} \cdot S \cdot (I-A)^{-1'} \cdot F'

Here A and S are square matrices with a row and column for every variable in the model, observed and latent alike, so for a path model with no latent variables they have the same dimensions as the sample covariance matrix. (This is a different S to the one I mentioned above; don't get confused yet.)

A contains the (asymmetric) directed path estimates, S contains the (symmetric) covariances and residual variances (the latter seem to be squared; why?), and F is the so-called filter matrix, which picks out the measured variables. (I is the identity matrix and M' is the transpose of M.)

I don’t quite get WHY the implied matrix is plugged together this way, but onwards…
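For what it's worth, one standard way to read the sandwich (not from Loehlin's derivation, just a useful identity): for recursive models A is nilpotent, so the series

(I - A)^{-1} = I + A + A^2 + A^3 + \cdots

terminates, and the inverse accumulates effects along directed paths of every length. S then supplies the variances and covariances of the source variables and residuals, and F filters the result down to the rows and columns of the measured variables.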

So now we have a C. Take S again, the sample covariance matrix. Loehlin gives a number of different criterion measures which tell you how far off C is. I'm playing with SEM in R, so let's see what John Fox's sem package uses… it SEEMS to be this one:

\mbox{tr}(SC^{-1}) + \mbox{log}(|C|) - \mbox{log}(|S|) - n

where \mbox{tr}(M) is the trace of a matrix M, i.e., the sum of its diagonal elements, and |M| is the determinant of M. Oh, and n is the number of observed variables.

The R code for this (pulled and edited from the null \chi^2 calculation in the sem fit function) is

sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n

Here you can see the trace is implemented as a sum after a diag. The solve function, when given only one matrix (as here), returns the inverse of that matrix.

Let’s have a quick poke around with the sem package using a simple linear regression:

require(sem)

# simulate data for a simple linear regression
N = 100

x1 = rnorm(N, 20, 20)
x2 = rnorm(N, 50, 10)
x3 = rnorm(N, 100, 15)
e  = rnorm(N, 0, 100)

y = 2*x1 - 1.2*x2 + 1.5*x3 + 40 + e

thedata = data.frame(x1, x2, x3, y)

mod1 = specify.model()
y <-> y, e.y, NA
x1 <-> x1, e.x1, NA
x2 <-> x2, e.x2, NA
x3 <-> x3, e.x3, NA
y <- x1, bx1, NA
y <- x2, bx2, NA
y <- x3, bx3, NA

sem1 = sem(mod1, cov(thedata), N=dim(thedata)[1], debug=T)
summary(sem1)

When I ran this, the model \chi^2 = 4.6454.

The S and C matrices can be extracted using

sem1$S
sem1$C

Then plugging these into the formula …

N = 100
n = 4

S = sem1$S
C = sem1$C

(N - 1) *
(sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n)

… gives… 4.645429.

One other thing: to get the null \chi^2 you just set C as the diagonal of S, i.e., a diagonal matrix keeping the variances from S and setting all the covariances to zero (the implied matrix of the independence model).
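For example, reusing S, N, and n from above (a quick sanity check; this should match, or come very close to, the null model \chi^2 reported by summary(sem1)):

# implied covariance matrix for the null (independence) model:
# keep the variances from S, set every covariance to zero
C0 = diag(diag(S))

(N - 1) *
(sum(diag(S %*% solve(C0))) + log(det(C0)) - log(det(S)) - n)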

Next up: it would be nice to build C by hand for a particular model and its parameter estimates…
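As a starting point, here's a rough sketch for the regression model above, plugging the fitted estimates straight into the RAM formula. It assumes coef(sem1) returns the estimates under the names used in the model spec (bx1, bx2, bx3, e.x1, …, e.y); if your version of the package lacks a coef method, sem1$coeff should hold the same values.

cf = coef(sem1)                       # or sem1$coeff, depending on the package version

vars = rownames(sem1$S)               # use the same variable order as sem1$S

# A: the asymmetric (directed) paths, indexed as A[to, from]
A = matrix(0, 4, 4, dimnames = list(vars, vars))
A["y", "x1"] = cf["bx1"]
A["y", "x2"] = cf["bx2"]
A["y", "x3"] = cf["bx3"]

# S (the RAM version): the symmetric paths. Only variances and residual
# variances appear here, because no covariances among x1, x2, x3 were
# specified in the model (unspecified parameters are fixed at zero).
Sram = matrix(0, 4, 4, dimnames = list(vars, vars))
Sram["x1", "x1"] = cf["e.x1"]
Sram["x2", "x2"] = cf["e.x2"]
Sram["x3", "x3"] = cf["e.x3"]
Sram["y", "y"]   = cf["e.y"]

# F: the filter matrix; every variable is observed here, so it's the identity
Fmat = diag(4)

I4 = diag(4)
C.byhand = Fmat %*% solve(I4 - A) %*% Sram %*% t(solve(I4 - A)) %*% t(Fmat)

round(C.byhand - sem1$C, 6)           # should be (close to) all zeros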

Reference

Loehlin, J. C. (2004). Latent Variable Models (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates.


One comment

  1. kamakshaiah

    Dear Andy, this post is a really nice description of the chi-squared statistic. However, I read "to get the null \chi^2 you just set C as the diagonal of S." Do you mean we need to interchange S and C in the expression?
