A colleague argued that there is often no theory behind latent variables: they’re just constructed by a “statistical rigmarole”. I think the folk who use SEM would argue that there is theory there. Even in exploratory factor analysis, the factors have to make sense, otherwise they’re rejected. Here is an example from a paper on the Systemizing questionnaire:
“Following the initial principal component analysis, 11 factors had an eigenvalue of greater than one, and were retained. The data were then subjected to a varimax rotation. An examination of the factors generated suggested that these did not correspond to factors with any psychological significance. Thus, total SQ score was the only measure analysed.”
Okay, in this case the “theory” is a vague notion of what questions sound similar. That’s very weak.
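For concreteness, the procedure in the quoted passage (keep principal components with an eigenvalue greater than one, then apply a varimax rotation) can be sketched roughly as below. This is a minimal illustration on synthetic data, not the paper’s analysis: the function names, the six simulated items, and the loading of 0.8 are all assumptions made up for the example.

```python
import numpy as np


def kaiser_retained_loadings(data):
    """Loadings of the principal components whose eigenvalue exceeds one."""
    corr = np.corrcoef(data, rowvar=False)       # item-by-item correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                         # the "eigenvalue greater than one" rule
    return eigvecs[:, keep] * np.sqrt(eigvals[keep]), eigvals


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation via the usual SVD-based iteration."""
    n_items, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - rotated @ col_ss / n_items)
        )
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):  # stop once improvement is negligible
            break
        criterion = new_criterion
    return loadings @ rotation


# Synthetic questionnaire (assumed): items 0-2 share one latent factor, items 3-5 another.
rng = np.random.default_rng(0)
n_people = 500
factors = rng.normal(size=(n_people, 2))
pattern = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
data = factors @ pattern.T + 0.6 * rng.normal(size=(n_people, 6))

loadings, eigvals = kaiser_retained_loadings(data)
rotated = varimax(loadings)
print(np.round(eigvals, 2))
print(np.round(rotated, 2))
```

The rotation only redistributes variance among the retained components; it does not change how much of each item’s variance they jointly explain. Nothing in the code decides whether the rotated factors have “psychological significance”. That judgement is left to whoever reads the loadings.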
You can see traces of theory most easily in huge studies of the hierarchical structure of operationalisations of “intelligence”. There’ll be a visual ability latent variable, a verbal ability latent variable, and g at the top. Why not just have g? Well, because the other components predict things that g doesn’t.
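The claim that a lower-level component predicts something g doesn’t is a claim about incremental prediction. A rough sketch of that check, on synthetic data, is below. Everything here is assumed for illustration: the scores are treated as directly observed rather than latent, and the coefficients 0.5 and 0.4 are invented.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Synthetic scores (assumed, not real data).
g = rng.normal(size=n)                 # general factor
verbal_specific = rng.normal(size=n)   # part of verbal ability not shared with g

# Hypothetical outcome that depends on both g and the verbal-specific part.
outcome = 0.5 * g + 0.4 * verbal_specific + rng.normal(size=n)


def r_squared(predictors, y):
    """Proportion of variance in y explained by an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return 1.0 - residuals.var() / y.var()


r2_g_only = r_squared([g], outcome)
r2_g_plus_verbal = r_squared([g, verbal_specific], outcome)
print(f"g only:     R^2 = {r2_g_only:.3f}")
print(f"g + verbal: R^2 = {r2_g_plus_verbal:.3f}")
```

If adding the verbal component raises the explained variance noticeably, that is the statistical sense in which it “predicts things that g doesn’t”. Why it does so is the part that needs theory.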
But the theory is vague. The BigMetric crowd never seem to have a notion of, e.g., how the form of the stimuli affects features of response other than correctness, and they dismiss attempts to develop one as trivial. Presumably the folk who do the work behind the individual tests have theories, however implicit, of what’s going on.
Latent variables make more sense when applied to personality questionnaires. My grasp of the philosophy (absorbed implicitly; I’d be delighted to find a reference) is that you want to find groups of sentences which have similar semantics in a particular population. A factor represents a fuzzy conjunction of propositions. An individual’s factor score then gives the degree to which they endorse that fuzzy proposition.
To make sense of the factor scores you need to examine their correlates and the theory associated with those correlates. The understanding emerges from networks of these correlates and theories. Hopefully a few of the correlates have some causal story behind them.
Importantly, theory here does not mean “A predicts B” at the level of operationalisations. It must be something that goes beyond the stats.