# Chi-square in SEM

Playing around again with SEM. Just where does that come from? Here’s a brain dump of the gist.

You start with the sample covariance matrix () and a model description (quantitative boxology; CFA tied together with regression). The fit machinery gives you estimates for the various parameters over several iterations until the difference between and the “implied” covariance matrix (i.e., the one predicted by the model, ) is minimised and out pops the final set of estimates. Then you multiply that difference between and by () to get something out with a distribution.

Marvellous.

First how do we get ? Loehlin (2004, p. 41) to the rescue:

Here and have the same dimensions as the sample covariance matrix. (This is a different to the one I mentioned above—don’t be confused yet.)

contains the (assymetric) path estimates, contains the (symmetric) covariances and residual variances (the latter seem to be squared—why?), and is the so called filter matrix which marks which variables are measured variables. ( is the identity matrix and is the transpose of .)

I don’t quite get WHY the implied matrix is plugged together this way, but onwards…

So now we have a . Take again—the sample covariance matrix. Loehlin gives a number of different criterion measures which tell you how far off is. I’m playing with SEM in R so let’s see what John Fox’s package does… SEEMS to be this one:

where is the trace of a matrix and is the sum of the diagonal, and is the determinant of . Oh and is the number of observed variables.

The R code for this (pulled and edited from the null calculation in the sem fit function) is

sum(diag(S %*% solve(C))) + log(det(C)) – log(det(S)) – n

Here you can see trace is implemented as a *sum* after a *diag*. The *solve* function applied to only one matrix (as here) gives you the inverse of the matrix.

Let’s have a quick poke around with the sem package using a simple linear regression:

require(sem)

N=100

x1 = rnorm(N, 20, 20)

x2 = rnorm(N, 50, 10)

x3 = rnorm(N, 100, 15)

e = rnorm(N,0,100)y = 2*x1 – 1.2*x2 + 1.5*x3 + 40 + e

thedata = data.frame(x1,x2,x3,y)

mod1 = specify.model()

y <->y, e.y, NA

x1 <->x1, e.x1, NA

x2 <->x2, e.x2, NA

x3 <->x3, e.x3, NA

y <- x1, bx1, NA

y <- x2, bx2, NA

y <- x3, bx3, NAsem1 = sem(mod1, cov(thedata), N=dim(thedata)[1], debug=T)

summary(sem1)

When I ran this, the model .

The and matrices can be extracted using

sem1$S

sem1$C

Then plugging these into the formula …

N = 100

n = 4S = sem1$S

C = sem1$C(N – 1) *

(sum(diag(S %*% solve(C))) + log(det(C))-log(det(S)) – n)

… gives… 4.645429.

One other thing: to get the null you just set as the diagonal of .

Next up, would be nice to build by hand for particular model and its parameter estimates…

**Reference**

Loehlin, J. C. (2004). *Latent Variable Models* (4th ed). LEA, NJ, USA.