Chi-square in SEM

Playing around again with SEM. Just where does that $\chi^2$ come from? Here’s a brain dump of the gist.

You start with the sample covariance matrix ($S$) and a model description (quantitative boxology: CFA tied together with regression). The fitting machinery iteratively adjusts the parameter estimates until the discrepancy between $S$ and the “implied” covariance matrix (i.e., the one predicted by the model, $C$) is minimised, and out pops the final set of estimates. You then multiply that minimised discrepancy between $S$ and $C$ by $(N - 1)$ to get something with a $\chi^2$ distribution.
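In symbols: if $F_{\min}$ is the value of the fit criterion at the minimum, then

$\chi^2 = (N - 1) \cdot F_{\min}$

with degrees of freedom equal to the number of distinct elements of $S$ minus the number of free parameters.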

Marvellous.

First, how do we get $C$? Loehlin (2004, p. 41) to the rescue:

$C = F \cdot (I-A)^{-1} \cdot S \cdot (I-A)^{-1'} \cdot F'$

Here $A$ and $S$ have the same dimensions as the sample covariance matrix. (This is a different $S$ to the one I mentioned above—don’t be confused yet.)

$A$ contains the (asymmetric) path estimates, $S$ contains the (symmetric) covariances and residual variances (the latter seem to be squared; why?), and $F$ is the so-called filter matrix, which picks out the measured (observed) variables. ($I$ is the identity matrix and $M'$ is the transpose of $M$.)

I don’t quite get WHY the implied matrix is plugged together this way, but onwards…
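Still, the mechanics can be checked with a tiny made-up example: a hypothetical one-predictor regression, $y = bx + e$, with both variables observed (so $F$ is just the identity). None of these numbers come from a fitted model:

```r
# RAM sketch for y = b*x + e, variables ordered (x, y); made-up values
b <- 2; var_x <- 4; var_e <- 9

A <- matrix(c(0, 0,         # asymmetric paths: A[i, j] is the path from j to i
              b, 0), 2, 2, byrow = TRUE)
S <- diag(c(var_x, var_e))  # symmetric: variance of x, residual variance of y
F <- diag(2)                # filter: both variables are observed

IAinv <- solve(diag(2) - A)
C <- F %*% IAinv %*% S %*% t(IAinv) %*% t(F)
C
# Implied: var(x) = 4, cov(x, y) = b*var_x = 8, var(y) = b^2*var_x + var_e = 25
```

Note the confusing double duty of $S$ again: the $S$ here is the RAM one, not the sample covariance matrix.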

So now we have a $C$. Take $S$ again—the sample covariance matrix. Loehlin gives a number of different criterion measures which tell you how far off $C$ is. I’m playing with SEM in R so let’s see what John Fox’s package does… SEEMS to be this one:

$\mbox{tr}(SC^{-1}) + \mbox{log}(|C|) - \mbox{log}(|S|) - n$

where $\mbox{tr}(M)$ is the trace of $M$, i.e., the sum of its diagonal, and $|M|$ is the determinant of $M$. Oh, and $n$ is the number of observed variables.

The R code for this (pulled and edited from the null $\chi^2$ calculation in the sem fit function) is

sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n

Here you can see that the trace is implemented as a sum after a diag. The solve function applied to a single matrix (as here) returns its inverse.
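A quick sanity check, with a made-up $2 \times 2$ matrix (nothing to do with any fitted model): if $C$ equals $S$, then $SC^{-1}$ is the identity, whose trace is $n$, and the two log-determinants cancel, so the criterion should be zero:

```r
# Criterion should be zero when the fit is perfect (C = S); made-up S
S <- matrix(c(4, 2, 2, 3), 2, 2)
C <- S
n <- nrow(S)
crit <- sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n
crit  # 0 (up to floating point)
```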

Let’s have a quick poke around with the sem package using a simple linear regression:

require(sem)

N=100

x1 = rnorm(N, 20, 20)
x2 = rnorm(N, 50, 10)
x3 = rnorm(N, 100, 15)
e = rnorm(N, 0, 100)

y = 2*x1 - 1.2*x2 + 1.5*x3 + 40 + e

thedata = data.frame(x1,x2,x3,y)

mod1 = specify.model()
y <-> y, e.y, NA
x1 <-> x1, e.x1, NA
x2 <-> x2, e.x2, NA
x3 <-> x3, e.x3, NA
y <- x1, bx1, NA
y <- x2, bx2, NA
y <- x3, bx3, NA

sem1 = sem(mod1, cov(thedata), N=dim(thedata)[1], debug=T)
summary(sem1)

When I ran this, the model $\chi^2 = 4.6454$.

The $S$ and $C$ matrices can be extracted using

sem1$S
sem1$C

Then plugging these into the formula …

N = 100
n = 4

S = sem1$S
C = sem1$C

(N - 1) *
(sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n)

… gives… 4.645429.

One other thing: to get the null $\chi^2$ you set $C$ to a diagonal matrix holding just the variances from $S$, with zeros elsewhere (diag(diag(S)) in R).
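A sketch of that, using a made-up $2 \times 2$ $S$ rather than the one from sem1 above:

```r
# Null model chi-square: C is S with its covariances zeroed out (made-up S)
S <- matrix(c(4, 2, 2, 3), 2, 2)
N <- 100
n <- nrow(S)
C0 <- diag(diag(S))   # keep the variances on the diagonal, zeros elsewhere
null.chisq <- (N - 1) *
  (sum(diag(S %*% solve(C0))) + log(det(C0)) - log(det(S)) - n)
null.chisq
```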

Next up, it would be nice to build $C$ by hand for a particular model and its parameter estimates…

Reference

Loehlin, J. C. (2004). Latent Variable Models (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates.