# Factor analysis digression

Set up some factor scores

```r
f = rnorm(100,0,15)
```

Some noise for the five manifest variables

```r
e1 = rnorm(100,0,5)
e2 = rnorm(100,0,5)
e3 = rnorm(100,0,20)
e4 = rnorm(100,0,50)
e5 = rnorm(100,0,5)
```

Then the variables themselves, each generated from the latent variable. There’s lots to play with here, e.g. the slopes and the noise terms, to see what affects the factor loadings…

```r
x1 = 200 * f + e1
x2 = 5 * f + e2
x3 = 2 * f + e3
x4 = 2 * f + e4
x5 = 2 * f + e5
```
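Because we know the generating model, we can work out roughly what the loadings ought to be. The sketch below assumes that the standardized loading of each variable is its population correlation with *f*, which is `b * sd_f / sqrt((b * sd_f)^2 + sd_e^2)`. It also assumes that the uniqueness is the share of variance due to noise, `sd_e^2 / ((b * sd_f)^2 + sd_e^2)`. The variable names are my own.

```r
# Population (theoretical) standardized loadings implied by the simulation:
# x_i = b_i * f + e_i, with f ~ N(0, sd_f^2) and e_i ~ N(0, sd_e_i^2)
sd_f <- 15
b    <- c(x1 = 200, x2 = 5, x3 = 2,  x4 = 2,  x5 = 2)  # slopes
sd_e <- c(x1 = 5,   x2 = 5, x3 = 20, x4 = 50, x5 = 5)  # noise SDs

# Variance due to the factor, and total variance, of each manifest variable
signal_var <- (b * sd_f)^2
total_var  <- signal_var + sd_e^2

# Standardized loading = correlation between x_i and f
loading    <- sqrt(signal_var / total_var)
# Uniqueness = proportion of variance not explained by the factor
uniqueness <- sd_e^2 / total_var

round(cbind(loading, uniqueness), 3)
```

This gives the following theoretical values:

- Loadings of about 1.000, 0.998, 0.832, 0.514 and 0.986.
- Uniquenesses of about 0.000, 0.004, 0.308, 0.735 and 0.027.

These are in the same ballpark as the fitted values below. For x1 and x2, the fitted uniqueness of 0.005 is probably `factanal`'s default lower bound on uniquenesses (`lower` in its `control` list) and not a real estimate.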

Now let’s use factor analysis to get back *f*:

```r
# Straight quotes: R does not accept curly quotes as string delimiters
fa1 = factanal(~ x1 + x2 + x3 + x4 + x5, factors = 1, scores = "Bartlett")
```
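For reference, Bartlett scores are weighted least-squares estimates of the factor given the fitted loadings. Take Λ as the column of loadings, Ψ as the diagonal matrix of uniquenesses, and z as one observation's standardized manifest variables. The usual textbook form is shown below; I am assuming this is what `factanal` computes from the standardized data.

$$
\hat{f} = (\Lambda^\top \Psi^{-1} \Lambda)^{-1} \Lambda^\top \Psi^{-1} z
        = \frac{\sum_{i=1}^{5} \lambda_i z_i / \psi_i}{\sum_{i=1}^{5} \lambda_i^2 / \psi_i}
$$

With one factor, this is a weighted average of the per-variable estimates z_i / λ_i, with weights proportional to λ_i² / ψ_i. Variables with tiny uniquenesses (x1 and x2 here) therefore dominate the score.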

The output:

```
> fa1

Call:
factanal(x = ~x1 + x2 + x3 + x4 + x5, factors = 1, scores = "Bartlett")

Uniquenesses:
   x1    x2    x3    x4    x5
0.005 0.005 0.348 0.677 0.029

Loadings:
   Factor1
x1 0.998
x2 0.998
x3 0.808
x4 0.569
x5 0.986

               Factor1
SS loadings      3.940
Proportion Var   0.788

Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 45.24 on 5 degrees of freedom.
The p-value is 1.3e-08
```
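To play with the slopes and the noise as suggested above, it helps to wrap the simulation in a function. `sim_loadings` is a hypothetical helper of my own, not part of any package. It passes a numeric matrix to `factanal` in place of a formula and returns the fitted loadings.

```r
# Hypothetical helper: simulate the one-factor model with chosen slopes (b)
# and noise SDs (sd_e), fit a one-factor model, and return the loadings
sim_loadings <- function(b, sd_e, n = 100, sd_f = 15) {
  f <- rnorm(n, 0, sd_f)
  # Column i is b[i] * f plus independent normal noise with SD sd_e[i]
  x <- sapply(seq_along(b), function(i) b[i] * f + rnorm(n, 0, sd_e[i]))
  colnames(x) <- paste0("x", seq_along(b))
  fa <- factanal(x, factors = 1)
  # unclass() drops the "loadings" class; keep the single factor column
  unclass(fa$loadings)[, 1]
}

set.seed(42)  # arbitrary seed, for reproducibility only
# Same slopes as in the post, but x5 is now much noisier (SD 50 instead of 5)
l <- sim_loadings(b = c(200, 5, 2, 2, 2), sd_e = c(5, 5, 20, 50, 50))
round(l, 3)
```

With this change, the x5 loading should drop towards the x4 value, since both now have slope 2 and noise SD 50.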

Compare the Bartlett scores from the FA (these should have mean 0 and variance close to 1) with a standardized *f*:

```
> as.vector(fa1$scores)
[1] 0.46229282 -0.60935524 1.68547486 0.32820617 -0.31099996 …
> as.vector(scale(f))
[1] 0.45208732 -0.66695823 1.63456194 0.33728772 -0.35131222 …
```

Looks good, and hopefully the theoretical model in my head isn’t too far off.
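"Looks good" can be put into a number by correlating the scores with *f*. The sketch below reruns the simulation with a fixed seed (an arbitrary choice; the original did not set one) and keeps the variables in a data frame `dat`, a name I chose.

```r
set.seed(1)  # arbitrary seed so the run is reproducible
n <- 100
f <- rnorm(n, 0, 15)
dat <- data.frame(
  x1 = 200 * f + rnorm(n, 0, 5),
  x2 = 5 * f + rnorm(n, 0, 5),
  x3 = 2 * f + rnorm(n, 0, 20),
  x4 = 2 * f + rnorm(n, 0, 50),
  x5 = 2 * f + rnorm(n, 0, 5)
)

fa1 <- factanal(~ x1 + x2 + x3 + x4 + x5, data = dat,
                factors = 1, scores = "Bartlett")
scores <- as.vector(fa1$scores)

mean(scores)  # scores come from standardized data, so this is ~0
var(scores)   # close to, but not exactly, 1
# A factor's sign is arbitrary in general, so look at the magnitude
r <- cor(scores, f)
abs(r)
```

Because x1 is an almost noise-free indicator, |r| should come out very close to 1.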

Next stop, SEM with a latent variable in…