“Models of data have a deep influence on the kinds of theorising that researchers do. A structural equation model with latent variables named Shifting, Updating, and Inhibition (Miyake et al. 2000) might suggest a view of the mind as inter-connected Gaussian distributed variables. These statistical constructs are driven by correlations between variables, rather than by the underlying cognitive processes […]. Davelaar and Cooper (2010) argued, using a more cognitive-process-based mathematical model of the Stop Signal task and the Stroop task, that the inhibition part of the statistical model does not actually model inhibition, but rather models the strength of the pre-potent response channel. Returning to the older example introduced earlier of g (Spearman 1904), although the scores from a variety of tasks are positively correlated, this need not imply that the correlations are generated by a single cognitive (or social, or genetic, or whatever) process. The dynamical model proposed by van der Maas et al. (2006) shows that correlations can emerge due to mutually beneficial interactions between quite distinct processes.”
Fugard, A. J. B., & Stenning, K. (2013). Statistical models as cognitive models of individual differences in reasoning. Argument & Computation, 4(1), 89–102.
Here’s a picture from van der Linden, D., et al. (in press) [The General Factor of Personality: A meta-analysis of Big Five intercorrelations and a criterion-related validity study. Journal of Research in Personality.]:
(α is also more memorably called “Stability” and β is “Plasticity”; GFP is the General Factor in Personality.)
This suggests that, on self-reported questionnaire responses, openness and extraversion tend to go together, and that conscientiousness, agreeableness, and emotional stability tend to go together. Furthermore, all five tend to go together, e.g., pick a bunch of random extraverted people and, more likely than not, they’ll be agreeable and emotionally stable. Or at least they’ll report that they are so.
The authors allude to possible statistical artefacts causing the correlations, or factors of general social desirability.
The evidence for a “substantive” interpretation of the correlations comes from studies showing heritability of the general factor and correlations with other measures. The latter tend to be low, e.g., on average rs around .15, peaking at .3, between the factor scores and boss-reports of various factors of job performance.
What’s missing: explanations from the perspective of social and neural process theories.
As yet I haven’t convinced myself that SEM is a good idea, or at least not in the examples I’ve seen in psychology. Two reasons for starters: (i) fit statistic madness and (ii) weird analyses driven almost entirely by correlations, or at best by vague theorizing based on very trivial analyses of the tasks (it has a bit of this and a bit of that…).
Anyway, recently Andrew Gelman noted:
“… there’s a research paradigm in which you fit a model—maybe a regression, maybe a structural equations model, maybe a multilevel model, whatever—and then you read off the coefficients, with each coefficient telling you something. You gather these together and those are your conclusions.
“My paradigm is a bit different. I sometimes say that each causal inference requires its own analysis and maybe its own experiment. I find it difficult to causally interpret several different coefficients from the same model.”
He goes on in the discussion to add:
“I have no problem with multiple-equation models, including measurement-error models, multilevel models, and instrumental variables. But I’m skeptical of trying to answer several causal questions by fitting a single model to a dataset.”
Interesting discussion starting over here.
What makes a good SEM fit? Thought I’d take a peek at Bentler (2007).
He gives a range of fairly general suggestions of things to report about model fits, including (a) which bits of the model were and weren’t decided a priori; (b) tests of assumptions, e.g., multivariate normality; (c) descriptives and a correlation matrix; and (d) a summary of parameters before and after paths have been added or removed.
Regarding fit, he suggests that
“At least one statistical test of model fit … should be reported for each major SEM model, unless the author verifies that no appropriate test exists for their design and model.”
In addition, he suggests reporting SRMR or the average absolute standardized residual, as well as the largest several residuals in a correlation metric. He also suggests reporting CFI or RMSEA.
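As a rough illustration of the residual-based summaries mentioned here, below is a minimal R sketch. It assumes one common definition of SRMR (residuals between the sample and implied covariance matrices, each divided by the product of the sample standard deviations, then root-mean-squared over the lower triangle including the diagonal); software packages differ in the details. The function names and the toy matrices are made up for illustration and are not from Bentler's paper.

```r
# Standardised residuals: (s_ij - c_ij) / (sd_i * sd_j), using sample SDs.
std_residuals <- function(S, C) {
  sds <- sqrt(diag(S))
  (S - C) / outer(sds, sds)
}

# Root mean square of the standardised residuals (lower triangle + diagonal).
srmr_sketch <- function(S, C) {
  r <- std_residuals(S, C)
  sqrt(mean(r[lower.tri(r, diag = TRUE)]^2))
}

# The k largest residuals (in absolute value) in a correlation metric.
largest_residuals <- function(S, C, k = 3) {
  r <- std_residuals(S, C)
  idx <- which(lower.tri(r, diag = TRUE), arr.ind = TRUE)
  vals <- r[idx]
  ord <- order(abs(vals), decreasing = TRUE)[seq_len(min(k, length(vals)))]
  data.frame(row = idx[ord, "row"], col = idx[ord, "col"], residual = vals[ord])
}

# Toy example: the "model" wrongly implies the two variables are uncorrelated.
S_toy <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
C_toy <- diag(2)
srmr_toy <- srmr_sketch(S_toy, C_toy)
top_toy <- largest_residuals(S_toy, C_toy, k = 1)
```

With a fitted model, the sample and implied covariance matrices would take the place of the toy matrices.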
Then there is the all-important small-sample problem. He gives a cutoff for what counts as a “small” sample and suggests that in such cases at least one alternative model should be provided which is successfully rejected even with the small sample size.
Now the interesting thing is what he makes of χ². He argues that any test is unlikely to have an exact χ² distribution, and thus it would be ill-advised to rely too much on an exact test p-value, though importantly he adds that
“The toolkit of possible tests has recently vastly expanded … and it does not make sense to talk about “the” test. Including F-tests, EQS 6 provides more than a dozen model tests… I certainly favor the use of a carefully chosen model test, but even the best of these can fail in applications…”
One of the articles he cites to justify suspicion of test distributions is a simulation study by Bentler and a colleague (Yuan & Bentler, 2004). They summarise the problem faced by applied researchers:
“In practice, many reported chi-square statistics are significant even when sample sizes are not large, and in the context of nested models, the chi-square difference test is often not significant.”
Elaborating on this, they mean that when you don’t want to have a significant test result, you often do get it; when you do want significance, you don’t. (NHST—different issue for another day.) They also summarise another important problem:
“There are many model fit indices in the literature of SEM. For example, SAS CALIS provides about 20 fit indices in its default output. Consequently, there is no unique criterion for judging whether a model fits the data.”
So the gist from their simulations: often you can’t trust “the” tests; nor can you trust the Wald zs testing the individual parameters. Hurray.
Wald tests have been attacked elsewhere, e.g., by Hauck and Donner (1977) for logistic regression; further they demonstrated the problem by an analysis of what predicts the presence of the T. vaginalis organism in college students. The gist: the further your estimate is away from the null value (usually zero; recall your null hypothesis is often that the slope is zero), the lower the power of the Wald test.
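To make that gist concrete, here is a small R sketch with made-up counts (not Hauck and Donner's data or their analysis). It assumes two groups of 50, a single binary predictor, and uses the closed-form log odds ratio and its Wald standard error, alongside the likelihood-ratio statistic for the same 2×2 table. The function name wald_vs_lr is hypothetical. As the second group's success count becomes more extreme, the slope and the likelihood-ratio statistic keep growing, but the Wald z eventually turns around and shrinks.

```r
# Two groups of n. Group 0 has k0 "successes"; group 1 has k successes.
# For one binary predictor, the logistic regression slope is the log odds
# ratio, and its Wald standard error is sqrt(sum of 1 / cell counts).
wald_vs_lr <- function(k, n = 50, k0 = 25) {
  obs <- matrix(c(k0, n - k0, k, n - k), nrow = 2, byrow = TRUE)
  beta <- log((k / (n - k)) / (k0 / (n - k0)))
  se <- sqrt(sum(1 / obs))
  # Likelihood-ratio statistic against the "no group difference" model
  expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
  lr <- 2 * sum(obs * log(obs / expected))
  c(k = k, beta = beta, wald_z = beta / se, lr_chisq = lr)
}

results <- t(sapply(c(40, 45, 48, 49), wald_vs_lr))
print(round(results, 2))
```

The comparison of the last two rows is the interesting one: the estimate is further from zero, yet the Wald z is smaller.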
Ah, the joys of stats! Give me a nice graph or table any day.
Bentler, P. M. (2007). On tests and indices for evaluating structural models. Personality and Individual Differences, 42, 825-829.
Hauck, W. W., Jr., & Donner, A. (1977). Wald’s test as applied to hypotheses in logit analysis. Journal of the American Statistical Association, 72, 851-853.
Yuan, K.-H., & Bentler, P. M. (2004). On chi-square difference and z-tests in mean and covariance structure analysis when the base model is misspecified. Educational and Psychological Measurement, 64, 737-757.
Playing around again with SEM. Just where does that χ² come from? Here’s a brain dump of the gist.
You start with the sample covariance matrix (S) and a model description (quantitative boxology; CFA tied together with regression). The fit machinery gives you estimates for the various parameters over several iterations until the difference between S and the “implied” covariance matrix (i.e., the one predicted by the model, C) is minimised and out pops the final set of estimates. Then you multiply that difference between S and C by (N − 1) to get something out with a χ² distribution.
First how do we get C? Loehlin (2004, p. 41) to the rescue:
$$C = F (I - A)^{-1} S \left[(I - A)^{-1}\right]' F'$$
Here A and S have the same dimensions as the sample covariance matrix. (This is a different S to the one I mentioned above; don’t be confused yet.)
A contains the (asymmetric) path estimates, S contains the (symmetric) covariances and residual variances (the latter seem to be squared; why?), and F is the so called filter matrix which marks which variables are measured variables. (I is the identity matrix and M' denotes the transpose of a matrix M.)
I don’t quite get WHY the implied matrix is plugged together this way, but onwards…
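One standard way to see why the pieces fit together like this is sketched below. It follows the usual RAM-style argument rather than Loehlin's own exposition, and the vectors v (all variables, measured and latent) and u (exogenous variables and residuals) are names introduced here. A is the asymmetric path matrix, S is the symmetric matrix (the covariance matrix of u), and F is the filter matrix.

```latex
\begin{align*}
v &= A v + u
  && \text{each variable is a weighted sum of others plus its own input} \\
(I - A)\, v &= u
  \;\Rightarrow\; v = (I - A)^{-1} u \\
\operatorname{Cov}(v) &= (I - A)^{-1} \operatorname{Cov}(u) \left[(I - A)^{-1}\right]'
  = (I - A)^{-1} S \left[(I - A)^{-1}\right]' \\
C &= \operatorname{Cov}(F v) = F \operatorname{Cov}(v) F'
  = F (I - A)^{-1} S \left[(I - A)^{-1}\right]' F'
\end{align*}
```

The only facts used are that Cov(Mx) = M Cov(x) M' for any fixed matrix M, and that F simply picks out the measured variables.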
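So now we have a C. Take S again, the sample covariance matrix. Loehlin gives a number of different criterion measures which tell you how far off C is. I’m playing with SEM in R so let’s see what John Fox’s package does… SEEMS to be this one:
$$F_{\mathrm{ML}} = \mathrm{tr}(S C^{-1}) + \ln|C| - \ln|S| - n$$
(Here tr is the trace and n is the number of observed variables.)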
The R code for this (pulled and edited from the null calculation in the sem fit function) is
sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n
Here you can see trace is implemented as a sum after a diag. The solve function applied to only one matrix (as here) gives you the inverse of the matrix.
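For playing around, the expression can be wrapped in a small function. This is a minimal sketch: the name f_ml and the simulated data are made up here, and determinant(..., logarithm = TRUE) is swapped in for log(det(...)) only because it tends to behave better numerically for larger matrices.

```r
# Discrepancy between a sample covariance matrix S and an implied matrix C:
# trace(S C^-1) + log|C| - log|S| - n, with n the number of observed variables.
f_ml <- function(S, C) {
  n <- nrow(S)
  sum(diag(S %*% solve(C))) +
    as.numeric(determinant(C, logarithm = TRUE)$modulus) -
    as.numeric(determinant(S, logarithm = TRUE)$modulus) - n
}

set.seed(1)
X <- matrix(rnorm(200 * 3), ncol = 3)
S_demo <- cov(X)

f_same <- f_ml(S_demo, S_demo)               # implied matrix equals S
f_diag <- f_ml(S_demo, diag(diag(S_demo)))   # implied matrix ignores covariances
```

When the implied matrix reproduces S exactly, the trace term equals n and the two log-determinants cancel, so the value is zero (up to rounding); any mismatch pushes it above zero.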
Let’s have a quick poke around with the sem package using a simple linear regression:
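library(sem)   # John Fox's sem package
N = 100        # sample size
x1 = rnorm(N, 20, 20)
x2 = rnorm(N, 50, 10)
x3 = rnorm(N, 100, 15)
e = rnorm(N, 0, 100)
y = 2*x1 - 1.2*x2 + 1.5*x3 + 40 + e
thedata = data.frame(x1, x2, x3, y)
# specify.model() reads the model from the console: type the lines below,
# then finish with a blank line
mod1 = specify.model()
y <-> y, e.y, NA
x1 <-> x1, e.x1, NA
x2 <-> x2, e.x2, NA
x3 <-> x3, e.x3, NA
y <- x1, bx1, NA
y <- x2, bx2, NA
y <- x3, bx3, NA

# N should be the number of rows, not the whole dim() vector
sem1 = sem(mod1, cov(thedata), N=nrow(thedata), debug=T)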
When I ran this, the model χ² came out at about 4.65.
The S and C matrices can be extracted using sem1$S and sem1$C.
Then plugging these into the formula …
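N = 100        # sample size
n = 4          # number of observed variables
S = sem1$S     # sample covariance matrix
C = sem1$C     # model-implied covariance matrix
(N - 1) *
(sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n)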
… gives… 4.645429.
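To turn that number into a p-value you also need the degrees of freedom. A small sketch, assuming (as I read the model specification above) that the covariances among x1, x2, and x3 are left out and therefore fixed at zero, which leaves seven free parameters: four variances and three regression paths.

```r
n_obs <- 4                              # observed variables: x1, x2, x3, y
n_moments <- n_obs * (n_obs + 1) / 2    # unique variances and covariances
n_free <- 7                             # 4 variances + 3 regression paths
df <- n_moments - n_free                # degrees of freedom of the model test

chisq <- 4.645429                       # value from the hand calculation
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
```

The degrees of freedom here come entirely from the three x covariances that the model forces to zero, which is sensible since the x's were simulated independently.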
One other thing: to get the null χ² you just set C as the diagonal of S.
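A minimal sketch of the null calculation, using simulated data rather than the fitted sem object (all names ending in _demo are made up here). The implied matrix keeps the sample variances and sets every covariance to zero, and the same fit expression as before is applied.

```r
set.seed(2)
N_demo <- 100
X <- matrix(rnorm(N_demo * 4), ncol = 4)
X[, 4] <- X[, 4] + 0.5 * X[, 1]     # make two of the variables correlated
S_demo <- cov(X)
n_demo <- ncol(S_demo)

# Null model: variances kept, all covariances set to zero
C_null <- diag(diag(S_demo))
chisq_null <- (N_demo - 1) *
  (sum(diag(S_demo %*% solve(C_null))) + log(det(C_null)) -
     log(det(S_demo)) - n_demo)

# Shortcut: the trace term equals n here, so the statistic reduces to
# -(N - 1) * log(det(R)), where R is the sample correlation matrix.
chisq_null_short <- -(N_demo - 1) * log(det(cov2cor(S_demo)))
```

The shortcut shows what the null χ² is measuring: how far the sample correlation matrix is from an identity matrix.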
Next up, would be nice to build C by hand for a particular model and its parameter estimates…
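Here is a first stab at that for the regression example, as a sketch under stated assumptions. Instead of fitted estimates it plugs in the population values used to simulate the data (slopes 2, -1.2, 1.5; standard deviations 20, 10, 15 for the x's and 100 for the error), so the result is the covariance matrix the model would imply at those values, not what the sem fit returned. The names S_sym, Fm, and C_hand are made up here; S_sym is the symmetric matrix from Loehlin's formula, renamed to avoid a clash with the sample covariance matrix.

```r
vars <- c("x1", "x2", "x3", "y")

# A: asymmetric paths, indexed as A[to, from]. Only y is regressed on the x's.
A <- matrix(0, 4, 4, dimnames = list(vars, vars))
A["y", c("x1", "x2", "x3")] <- c(2, -1.2, 1.5)

# S_sym: variances of the x's and the residual variance of y, no covariances
S_sym <- diag(c(20^2, 10^2, 15^2, 100^2))
dimnames(S_sym) <- list(vars, vars)

# Filter matrix: every variable is measured, so it is just the identity.
Fm <- diag(4)

# Implied covariance matrix: F (I - A)^-1 S [(I - A)^-1]' F'
IA_inv <- solve(diag(4) - A)
C_hand <- Fm %*% IA_inv %*% S_sym %*% t(IA_inv) %*% t(Fm)
dimnames(C_hand) <- list(vars, vars)
print(C_hand)
```

The y row is where the action is: each covariance with an x is that x's slope times its variance, and the variance of y is the sum of the squared slopes times the x variances plus the residual variance. With latent variables, Fm would drop the unmeasured rows instead of being an identity matrix.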
Loehlin, J. C. (2004). Latent Variable Models (4th ed). LEA, NJ, USA.
From an Appendix to An R and S-PLUS Companion to Applied Regression:
A cynical view of SEMs is that their popularity in the social sciences reflects the legitimacy that the models appear to lend to causal interpretation of observational data, when in fact such interpretation is no less problematic than for other kinds of regression models applied to observational data. A more charitable interpretation is that SEMs are close to the kind of informal thinking about causal relationships that is common in social-science theorizing, and that, therefore, these models facilitate translating such theories into data analysis.