Chi-square in SEM
May 11, 2008 by AndyPlaying around again with SEM. Just where does that come from? Here’s a brain dump of the gist.
You start with the sample covariance matrix () and a model description (quantitative boxology; CFA tied together with regression). The fit machinery gives you estimates for the various parameters over several iterations until the difference between
and the “implied” covariance matrix (i.e., the one predicted by the model,
) is minimised and out pops the final set of estimates. Then you multiply that difference between
and
by (
) to get something out with a
distribution.
Marvellous.
First how do we get ? Loehlin (2004, p. 41) to the rescue:
Here and
have the same dimensions as the sample covariance matrix. (This is a different
to the one I mentioned above—don’t be confused yet.)
contains the (assymetric) path estimates,
contains the (symmetric) covariances and residual variances (the latter seem to be squared—why?), and
is the so called filter matrix which marks which variables are measured variables. (
is the identity matrix and
is the transpose of
.)
I don’t quite get WHY the implied matrix is plugged together this way, but onwards…
So now we have a . Take
again—the sample covariance matrix. Loehlin gives a number of different criterion measures which tell you how far off
is. I’m playing with SEM in R so let’s see what John Fox’s package does… SEEMS to be this one:
where is the trace of a matrix and is the sum of the diagonal, and
is the determinant of
. Oh and
is the number of observed variables.
The R code for this (pulled and edited from the null calculation in the sem fit function) is
sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n
Here you can see trace is implemented as a sum after a diag. The solve function applied to only one matrix (as here) gives you the inverse of the matrix.
Let’s have a quick poke around with the sem package using a simple linear regression:
require(sem)
N=100
x1 = rnorm(N, 20, 20)
x2 = rnorm(N, 50, 10)
x3 = rnorm(N, 100, 15)
e = rnorm(N,0,100)y = 2*x1 - 1.2*x2 + 1.5*x3 + 40 + e
thedata = data.frame(x1,x2,x3,y)
mod1 = specify.model()
y <->y, e.y, NA
x1 <->x1, e.x1, NA
x2 <->x2, e.x2, NA
x3 <->x3, e.x3, NA
y <- x1, bx1, NA
y <- x2, bx2, NA
y <- x3, bx3, NAsem1 = sem(mod1, cov(thedata), N=dim(thedata)[1], debug=T)
summary(sem1)
When I ran this, the model .
The and
matrices can be extracted using
sem1$S
sem1$C
Then plugging these into the formula …
N = 100
n = 4S = sem1$S
C = sem1$C(N - 1) *
(sum(diag(S %*% solve(C))) + log(det(C))-log(det(S)) - n)
… gives… 4.645429.
One other thing: to get the null you just set
as the diagonal of
.
Next up, would be nice to build by hand for particular model and its parameter estimates…
Reference
Loehlin, J. C. (2004). Latent Variable Models (4th ed). LEA, NJ, USA.