Chi-square in SEM

Playing around again with SEM. Just where does that \chi^2 come from? Here’s a brain dump of the gist.

You start with the sample covariance matrix (S) and a model description (quantitative boxology: CFA tied together with regression). The fitting machinery iteratively tweaks the parameter estimates until the difference between S and the “implied” covariance matrix, C (i.e., the one predicted by the model), is minimised, and out pops the final set of estimates. Multiplying that minimised difference by (N - 1) then gives something with a \chi^2 distribution.


First, how do we get C? Loehlin (2004, p. 41) to the rescue:

C = F \cdot (I-A)^{-1} \cdot S \cdot (I-A)^{-1'} \cdot F'

Here A and S have the same dimensions as the sample covariance matrix. (This S is a different one from the sample covariance matrix I mentioned above; don’t be confused yet.)

A contains the (asymmetric) path estimates, S contains the (symmetric) covariances and residual variances (the latter seem to be squared; why?), and F is the so-called filter matrix which marks which variables are measured variables. (I is the identity matrix and M' is the transpose of M.)

I don’t quite get WHY the implied matrix is plugged together this way, but onwards…
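Still, the recipe can be checked on a toy model (all numbers here are made up): one observed predictor x with variance 4 and a single path b = 2 into an observed y with residual variance 1. Both variables are observed, so F is just the identity, and the symmetric matrix is called S_ram below to keep it apart from the sample covariance matrix:

```r
b <- 2; vx <- 4; ve <- 1

A <- matrix(c(0, 0,       # asymmetric paths: A[i, j] is the path from j to i
              b, 0), 2, 2, byrow = TRUE)
S_ram <- diag(c(vx, ve))  # symmetric: variance of x, residual variance of y
F <- diag(2)              # filter: both variables observed

IAinv <- solve(diag(2) - A)
C <- F %*% IAinv %*% S_ram %*% t(IAinv) %*% t(F)
C  # var(x) = 4, cov(x, y) = b*vx = 8, var(y) = b^2*vx + ve = 17
```

which is exactly what the regression algebra says: cov(x, y) = b * var(x) and var(y) = b^2 * var(x) + var(e).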

So now we have a C. Take S again, the sample covariance matrix. Loehlin gives a number of different criterion measures which tell you how far off C is. I’m playing with SEM in R, so let’s see what John Fox’s sem package does… SEEMS to be this one:

\mbox{tr}(SC^{-1}) + \mbox{log}(|C|) - \mbox{log}(|S|) - n

where \mbox{tr}(M) is the trace of M (the sum of its diagonal elements), |M| is the determinant of M, and n is the number of observed variables.

The R code for this (pulled and edited from the null \chi^2 calculation in the sem fit function) is

sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n

Here you can see the trace is implemented as a sum after a diag. The solve function applied to a single matrix (as here) returns its inverse.
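A quick sanity check on this criterion, using a made-up 2 × 2 covariance matrix: when C equals S, the trace term is exactly n and the log-determinants cancel, so the criterion is zero; any other C gives something positive.

```r
# Hypothetical S, for illustration only
S <- matrix(c(4, 2,
              2, 3), 2, 2)
n <- 2

fml <- function(S, C, n)
  sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n

fml(S, S, n)              # perfect fit: exactly 0
fml(S, diag(diag(S)), n)  # off-diagonals zeroed: positive
```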

Let’s have a quick poke around with the sem package using a simple linear regression:



N = 100

x1 = rnorm(N, 20, 20)
x2 = rnorm(N, 50, 10)
x3 = rnorm(N, 100, 15)
e = rnorm(N, 0, 100)

y = 2*x1 - 1.2*x2 + 1.5*x3 + 40 + e

thedata = data.frame(x1, x2, x3, y)

mod1 = specify.model()
y <-> y, e.y, NA
x1 <-> x1, e.x1, NA
x2 <-> x2, e.x2, NA
x3 <-> x3, e.x3, NA
y <- x1, bx1, NA
y <- x2, bx2, NA
y <- x3, bx3, NA

sem1 = sem(mod1, cov(thedata), N=dim(thedata)[1], debug=T)

When I ran this, the model \chi^2 = 4.6454.

The S and C matrices can be extracted from the fitted object (sem1$S and sem1$C). Plugging these into the formula …

N = 100
n = 4

S = sem1$S
C = sem1$C

(N - 1) *
(sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n)

… gives… 4.645429.
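The iterative fitting alluded to at the start can also be sketched by hand: pick a parameterisation of C and let a general-purpose optimiser grind the criterion down. This is only an illustration using optim (the sem package uses its own optimiser internally); the one-predictor implied matrix below comes from standard regression algebra, and all the names are made up:

```r
# Sketch: fit a one-predictor model y = b*x + e by minimising the ML
# criterion directly. Implied matrix: C = [[vx, b*vx], [b*vx, b^2*vx + ve]].
set.seed(42)
N <- 200
x <- rnorm(N, 0, 2)
y <- 2 * x + rnorm(N)
S <- cov(cbind(x, y))

crit <- function(par) {
  b <- par[1]; vx <- par[2]; ve <- par[3]
  if (vx <= 0 || ve <= 0) return(Inf)   # keep variances positive
  C <- matrix(c(vx, b * vx, b * vx, b^2 * vx + ve), 2, 2)
  sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - 2
}

fit <- optim(c(0, 1, 1), crit)
round(fit$par, 2)  # b, vx, ve should land near 2, 4, 1
```

Since this toy model is saturated (three free parameters, three unique elements of S), the criterion can be driven essentially to zero.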

One other thing: to get the null \chi^2 you just set C as the diagonal of S (i.e., S with its off-diagonal elements zeroed).
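In code, with a made-up 2 × 2 S (the S, N and n from the regression example above would work the same way). Since the diagonal of the null C matches that of S, the trace term is exactly n, so only the log-determinants contribute:

```r
S <- matrix(c(4, 2,
              2, 3), 2, 2)
N <- 100; n <- 2

C0 <- diag(diag(S))  # null model: variables assumed uncorrelated
chi0 <- (N - 1) *
  (sum(diag(S %*% solve(C0))) + log(det(C0)) - log(det(S)) - n)
chi0  # = (N - 1) * (log(12) - log(8)), about 40.1
```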

Next up: it would be nice to build C by hand for a particular model and its parameter estimates…


Loehlin, J. C. (2004). Latent Variable Models (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates.


One comment

  1. kamakshaiah

    dear Andy, this post is really a very nice description of the chi-squared statistic. However, I read “to get the null \chi^2 you just set C as the diagonal of S.” Do you mean we need to interchange S with C and vice versa in the expression?
