本章介绍R中的因子分析。
# 读取数据
#P146,例题7.1 52名学生的6科目成绩
library(readr)
library(tidyverse)
eg7_1 <- read_csv("eg6.1.csv")
eg7_1 <- eg7_1 %>% rename(数学 = x1,
物理 = x2,
化学 = x3,
语文 = x4,
历史 = x5,
英语 = x6)
1 第1步:评估数据是否适合做因子分析
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = eg7_1)
Overall MSA = 0.8
MSA for each item =
数学 物理 化学 语文 历史 英语
0.80 0.83 0.76 0.84 0.81 0.78
✔ The Bartlett's test of sphericity was significant at an alpha level of .05.
These data are probably suitable for factor analysis.
𝜒²(15) = 205.97, p < .001
2 第2步:确定提取的因子个数
eigen() decomposition
$values
[1] 3.7099044 1.2624812 0.4408365 0.2705018 0.1695159 0.1467602
$vectors
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] -0.4120520 -0.3759773 0.21582978 0.78801362 -0.0205822 0.14450829
[2,] -0.3811779 -0.3567060 -0.80555264 -0.11755209 0.2120360 -0.14061074
[3,] -0.3321347 -0.5626165 0.46743533 -0.58763655 -0.0333622 0.09068468
[4,] 0.4611846 -0.2785231 -0.04426879 0.02783261 0.5990449 0.59003773
[5,] 0.4205876 -0.4147836 -0.25039004 0.03376008 -0.7384344 0.20479353
[6,] 0.4301372 -0.4065022 0.14612244 0.13410793 0.2221800 -0.74902427
#绘制碎石图
#install.packages("nFactors")
library(nFactors)
eg7_1.ev$values %>% nScree() %>%
plotnScree(legend = F)
3 第3步:估计因子载荷
3.1 主成分法 principal component method
Principal Components Analysis
Call: principal(r = eg7_1, nfactors = 2, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
PC1 PC2 h2 u2 com
数学 -0.79 0.42 0.81 0.19 1.5
物理 -0.73 0.40 0.70 0.30 1.5
化学 -0.64 0.63 0.81 0.19 2.0
语文 0.89 0.31 0.89 0.11 1.2
历史 0.81 0.47 0.87 0.13 1.6
英语 0.83 0.46 0.90 0.10 1.6
PC1 PC2
SS loadings 3.71 1.26
Proportion Var 0.62 0.21
Cumulative Var 0.62 0.83
Proportion Explained 0.75 0.25
Cumulative Proportion 0.75 1.00
Mean item complexity = 1.6
Test of the hypothesis that 2 components are sufficient.
The root mean square of the residuals (RMSR) is 0.06
with the empirical chi square 5.96 with prob < 0.2
Fit based upon off diagonal values = 0.99
# com的含义
# com : Hoffman's index of complexity for each item,
#该值越接近于1,代表该原始变量主要由某个因子代表。
#varimax正交旋转
fa.pc.varimax <- principal(eg7_1, nfactors = 2,
rotate = "varimax")
fa.pc.varimax
Principal Components Analysis
Call: principal(r = eg7_1, nfactors = 2, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
RC1 RC2 h2 u2 com
数学 -0.32 0.84 0.81 0.19 1.3
物理 -0.29 0.78 0.70 0.30 1.3
化学 -0.07 0.90 0.81 0.19 1.0
语文 0.88 -0.35 0.89 0.11 1.3
历史 0.92 -0.18 0.87 0.13 1.1
英语 0.93 -0.20 0.90 0.10 1.1
RC1 RC2
SS loadings 2.66 2.31
Proportion Var 0.44 0.39
Cumulative Var 0.44 0.83
Proportion Explained 0.54 0.46
Cumulative Proportion 0.54 1.00
Mean item complexity = 1.2
Test of the hypothesis that 2 components are sufficient.
The root mean square of the residuals (RMSR) is 0.06
with the empirical chi square 5.96 with prob < 0.2
Fit based upon off diagonal values = 0.99
Loadings:
RC1 RC2
语文 0.876
历史 0.917
英语 0.925
数学 0.839
物理 0.784
化学 0.897
RC1 RC2
SS loadings 2.661 2.312
Proportion Var 0.443 0.385
Cumulative Var 0.443 0.829
Principal Components Analysis
Call: principal(r = eg7_1, nfactors = 2, rotate = "promax")
Standardized loadings (pattern matrix) based upon correlation matrix
RC1 RC2 h2 u2 com
数学 -0.12 0.83 0.81 0.19 1.0
物理 -0.10 0.78 0.70 0.30 1.0
化学 0.18 0.97 0.81 0.19 1.1
语文 0.87 -0.13 0.89 0.11 1.0
历史 0.97 0.07 0.87 0.13 1.0
英语 0.97 0.05 0.90 0.10 1.0
RC1 RC2
SS loadings 2.70 2.27
Proportion Var 0.45 0.38
Cumulative Var 0.45 0.83
Proportion Explained 0.54 0.46
Cumulative Proportion 0.54 1.00
With component correlations of
RC1 RC2
RC1 1.00 -0.49
RC2 -0.49 1.00
Mean item complexity = 1
Test of the hypothesis that 2 components are sufficient.
The root mean square of the residuals (RMSR) is 0.06
with the empirical chi square 5.96 with prob < 0.2
Fit based upon off diagonal values = 0.99
Loadings:
RC1 RC2
语文 0.874
历史 0.967
英语 0.971
数学 0.834
物理 0.782
化学 0.972
RC1 RC2
SS loadings 2.699 2.274
Proportion Var 0.450 0.379
Cumulative Var 0.450 0.829
3.2 极大似然法 Maximum Likelihood
Factor Analysis using method = ml
Call: fa(r = eg7_1, nfactors = 2, rotate = "none", fm = "ml")
Standardized loadings (pattern matrix) based upon correlation matrix
ML1 ML2 h2 u2 com
数学 -0.68 0.56 0.77 0.23 1.9
物理 -0.60 0.43 0.54 0.46 1.8
化学 -0.49 0.66 0.67 0.33 1.8
语文 0.92 0.10 0.85 0.15 1.0
历史 0.86 0.24 0.79 0.21 1.2
英语 0.88 0.27 0.85 0.15 1.2
ML1 ML2
SS loadings 3.40 1.07
Proportion Var 0.57 0.18
Cumulative Var 0.57 0.75
Proportion Explained 0.76 0.24
Cumulative Proportion 0.76 1.00
Mean item complexity = 1.5
Test of the hypothesis that 2 factors are sufficient.
df null model = 15 with the objective function = 4.28 with Chi Square = 205.97
df of the model are 4 and the objective function was 0.08
The root mean square of the residuals (RMSR) is 0.02
The df corrected root mean square of the residuals is 0.04
The harmonic n.obs is 52 with the empirical chi square 0.62 with prob < 0.96
The total n.obs was 52 with Likelihood Chi Square = 3.64 with prob < 0.46
Tucker Lewis Index of factoring reliability = 1.007
RMSEA index = 0 and the 90 % confidence intervals are 0 0.203
BIC = -12.17
Fit based upon off diagonal values = 1
Measures of factor score adequacy
ML1 ML2
Correlation of (regression) scores with factors 0.97 0.89
Multiple R square of scores with factors 0.95 0.80
Minimum correlation of possible factor scores 0.89 0.59
Call:
factanal(x = eg7_1, factors = 2, rotation = "none")
Uniquenesses:
数学 物理 化学 语文 历史 英语
0.228 0.459 0.333 0.148 0.210 0.150
Loadings:
Factor1 Factor2
数学 -0.676 0.562
物理 -0.599 0.427
化学 -0.487 0.656
语文 0.917 0.104
历史 0.856 0.239
英语 0.883 0.266
Factor1 Factor2
SS loadings 3.404 1.068
Proportion Var 0.567 0.178
Cumulative Var 0.567 0.745
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 3.64 on 4 degrees of freedom.
The p-value is 0.457
3.3 主因子法 principal factor
#函数psych::fa,varimax旋转
library(psych)
fa.pa.varimax <- fa(eg7_1,
nfactors = 2,
fm = "pa",
rotate = "varimax")
fa.pa.varimax
Factor Analysis using method = pa
Call: fa(r = eg7_1, nfactors = 2, rotate = "varimax", fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA2 h2 u2 com
数学 -0.32 0.82 0.77 0.23 1.3
物理 -0.31 0.67 0.54 0.46 1.4
化学 -0.11 0.81 0.66 0.34 1.0
语文 0.85 -0.36 0.85 0.15 1.3
历史 0.86 -0.20 0.78 0.22 1.1
英语 0.90 -0.20 0.86 0.14 1.1
PA1 PA2
SS loadings 2.49 1.98
Proportion Var 0.41 0.33
Cumulative Var 0.41 0.74
Proportion Explained 0.56 0.44
Cumulative Proportion 0.56 1.00
Mean item complexity = 1.2
Test of the hypothesis that 2 factors are sufficient.
df null model = 15 with the objective function = 4.28 with Chi Square = 205.97
df of the model are 4 and the objective function was 0.08
The root mean square of the residuals (RMSR) is 0.02
The df corrected root mean square of the residuals is 0.04
The harmonic n.obs is 52 with the empirical chi square 0.58 with prob < 0.96
The total n.obs was 52 with Likelihood Chi Square = 3.7 with prob < 0.45
Tucker Lewis Index of factoring reliability = 1.006
RMSEA index = 0 and the 90 % confidence intervals are 0 0.204
BIC = -12.1
Fit based upon off diagonal values = 1
Measures of factor score adequacy
PA1 PA2
Correlation of (regression) scores with factors 0.95 0.91
Multiple R square of scores with factors 0.91 0.83
Minimum correlation of possible factor scores 0.82 0.66
#函数psych::fa,quartimax旋转
library(psych)
fa.pa.quartimax <- fa(eg7_1,
nfactors = 2,
fm = "pa",
rotate = "quartimax")
Loading required namespace: GPArotation
Factor Analysis using method = pa
Call: fa(r = eg7_1, nfactors = 2, rotate = "quartimax", fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA2 h2 u2 com
数学 -0.42 0.77 0.77 0.23 1.5
物理 -0.39 0.63 0.54 0.46 1.7
化学 -0.21 0.79 0.66 0.34 1.1
语文 0.89 -0.25 0.85 0.15 1.2
历史 0.88 -0.09 0.78 0.22 1.0
英语 0.92 -0.09 0.86 0.14 1.0
PA1 PA2
SS loadings 2.79 1.68
Proportion Var 0.46 0.28
Cumulative Var 0.46 0.74
Proportion Explained 0.62 0.38
Cumulative Proportion 0.62 1.00
Mean item complexity = 1.3
Test of the hypothesis that 2 factors are sufficient.
df null model = 15 with the objective function = 4.28 with Chi Square = 205.97
df of the model are 4 and the objective function was 0.08
The root mean square of the residuals (RMSR) is 0.02
The df corrected root mean square of the residuals is 0.04
The harmonic n.obs is 52 with the empirical chi square 0.58 with prob < 0.96
The total n.obs was 52 with Likelihood Chi Square = 3.7 with prob < 0.45
Tucker Lewis Index of factoring reliability = 1.006
RMSEA index = 0 and the 90 % confidence intervals are 0 0.204
BIC = -12.1
Fit based upon off diagonal values = 1
Measures of factor score adequacy
PA1 PA2
Correlation of (regression) scores with factors 0.96 0.90
Multiple R square of scores with factors 0.93 0.82
Minimum correlation of possible factor scores 0.85 0.63
3.4 主成分分析中的主成分载荷与因子分析中的主成分估计方法下的因子载荷的区别
PC1 PC2 PC3 PC4 PC5 PC6
数学 0.4120520 0.3759773 0.21582978 -0.78801362 -0.0205822 0.14450829
物理 0.3811779 0.3567060 -0.80555264 0.11755209 0.2120360 -0.14061074
化学 0.3321347 0.5626165 0.46743533 0.58763655 -0.0333622 0.09068468
语文 -0.4611846 0.2785231 -0.04426879 -0.02783261 0.5990449 0.59003773
历史 -0.4205876 0.4147836 -0.25039004 -0.03376008 -0.7384344 0.20479353
英语 -0.4301372 0.4065022 0.14612244 -0.13410793 0.2221800 -0.74902427
#求相关系数矩阵的特征值和特征向量
pc <- eg7_1 %>% cor %>% eigen() %>% .$vectors # 查看主成分载荷矩阵
eigen <- eg7_1 %>% cor %>% eigen() %>% .$values # 查看特征值
sqrt(eigen[1]) * pc[,1] # 计算第1个公共因子载荷
[1] -0.7936579 -0.7341911 -0.6397283 0.8882928 0.8100985 0.8284920
数学 物理 化学 语文 历史 英语
0.4120520 0.3811779 0.3321347 -0.4611846 -0.4205876 -0.4301372
#因子分析中的主成分法
fa.pc.none <- psych::principal(eg7_1, nfactors = 2,
rotate = "none")
fa.pc.none$loadings[,1] # 查看第1个因子公共因子载荷
数学 物理 化学 语文 历史 英语
-0.7936579 -0.7341911 -0.6397283 0.8882928 0.8100985 0.8284920
3.5 因子分析面临的决策:提取几个因子?因子载荷估计方法?旋转方法?
提取的因子个数 特征值大于1 因子的累积贡献达到70%-80%以上
因子载荷估计方法 主成分法 principal() 极大似然法 fa() 主轴因子法 fa()
因子旋转方法 正交旋转:varimax, quartimax 因子相互独立的 斜交旋转:oblimin, promax 因子之间是相关的
#主成分法
#正交旋转varimax(因子之间保持独立)
fa.pc.varimax <- principal(eg7_1,
nfactors = 2,
rotate = "varimax")
fa.pc.varimax$loadings %>% print(digits = 3,
cut = 0.5,
sort = TRUE)
Loadings:
RC1 RC2
语文 0.876
历史 0.917
英语 0.925
数学 0.839
物理 0.784
化学 0.897
RC1 RC2
SS loadings 2.661 2.312
Proportion Var 0.443 0.385
Cumulative Var 0.443 0.829
Principal Components Analysis
Call: principal(r = eg7_1, nfactors = 2, rotate = "promax")
Standardized loadings (pattern matrix) based upon correlation matrix
RC1 RC2 h2 u2 com
数学 -0.12 0.83 0.81 0.19 1.0
物理 -0.10 0.78 0.70 0.30 1.0
化学 0.18 0.97 0.81 0.19 1.1
语文 0.87 -0.13 0.89 0.11 1.0
历史 0.97 0.07 0.87 0.13 1.0
英语 0.97 0.05 0.90 0.10 1.0
RC1 RC2
SS loadings 2.70 2.27
Proportion Var 0.45 0.38
Cumulative Var 0.45 0.83
Proportion Explained 0.54 0.46
Cumulative Proportion 0.54 1.00
With component correlations of
RC1 RC2
RC1 1.00 -0.49
RC2 -0.49 1.00
Mean item complexity = 1
Test of the hypothesis that 2 components are sufficient.
The root mean square of the residuals (RMSR) is 0.06
with the empirical chi square 5.96 with prob < 0.2
Fit based upon off diagonal values = 0.99
Loadings:
RC1 RC2
语文 0.874
历史 0.967
英语 0.971
数学 0.834
物理 0.782
化学 0.972
RC1 RC2
SS loadings 2.699 2.274
Proportion Var 0.450 0.379
Cumulative Var 0.450 0.829
#极大似然法,正交varimax旋转
fa.ml.varimax <- fa(eg7_1,
nfactors = 2,
fm = "ml",
rotate = "varimax")
fa.ml.varimax$loadings %>% print(digits =3,
cut = 0.5,
sort = TRUE)
Loadings:
ML1 ML2
语文 0.848
历史 0.862
英语 0.899
数学 0.823
物理 0.668
化学 0.811
ML1 ML2
SS loadings 2.471 2.001
Proportion Var 0.412 0.333
Cumulative Var 0.412 0.745
#极大似然法, 斜交promax旋转
fa.ml.promax <- fa(eg7_1,
nfactors = 2,
fm = "ml",
rotate = "promax")
fa.ml.promax$loadings %>% print(digits =3,
cut = 0.5,
sort = TRUE)
Loadings:
ML1 ML2
语文 0.847
历史 0.911
英语 0.956
数学 0.833
物理 0.656
化学 0.890
ML1 ML2
SS loadings 2.510 1.940
Proportion Var 0.418 0.323
Cumulative Var 0.418 0.742
#主轴因子法,正交varimax旋转
fa.pa.varimax <- fa(eg7_1,
nfactors = 2,
fm = "pa",
rotate = "varimax")
fa.pa.varimax$loadings %>% print(digits =3,
cut = 0.5,
sort = TRUE)
Loadings:
PA1 PA2
语文 0.852
历史 0.861
英语 0.903
数学 0.821
物理 0.670
化学 0.805
PA1 PA2
SS loadings 2.487 1.982
Proportion Var 0.415 0.330
Cumulative Var 0.415 0.745
#主轴因子法, 斜交varimax旋转
fa.pa.promax <- fa(eg7_1,
nfactors = 2,
fm = "pa",
rotate = "promax")
fa.pa.varimax$loadings %>% print(digits =3,
cut = 0.5,
sort = TRUE)
Loadings:
PA1 PA2
语文 0.852
历史 0.861
英语 0.903
数学 0.821
物理 0.670
化学 0.805
PA1 PA2
SS loadings 2.487 1.982
Proportion Var 0.415 0.330
Cumulative Var 0.415 0.745
4 第4步 可视化
4.1 因子载荷系数图
#主成分法 Principal Component Method无旋转
fa.pc.none <- principal(eg7_1,
nfactors = 2,
rotate = "none")
fa.pc.none$loadings
Loadings:
PC1 PC2
数学 -0.794 0.422
物理 -0.734 0.401
化学 -0.640 0.632
语文 0.888 0.313
历史 0.810 0.466
英语 0.828 0.457
PC1 PC2
SS loadings 3.710 1.262
Proportion Var 0.618 0.210
Cumulative Var 0.618 0.829
#主成分法 Principal Component Method 正交旋转varimax
fa.pc.varimax <- principal(eg7_1,
nfactors = 2,
rotate = "varimax")
fa.pc.varimax$loadings %>% print(digits = 3,
cut = 0.5,
sort = TRUE)
Loadings:
RC1 RC2
语文 0.876
历史 0.917
英语 0.925
数学 0.839
物理 0.784
化学 0.897
RC1 RC2
SS loadings 2.661 2.312
Proportion Var 0.443 0.385
Cumulative Var 0.443 0.829
#maximum likelihood 极大似然 斜交旋转promax
fa.ml.promax <- fa(eg7_1,
nfactors = 2,
fm = "ml",
rotate = "promax")
fa.ml.promax$loadings %>% print(digits = 3,
cut = 0.5,
sort = TRUE)
Loadings:
ML1 ML2
语文 0.847
历史 0.911
英语 0.956
数学 0.833
物理 0.656
化学 0.890
ML1 ML2
SS loadings 2.510 1.940
Proportion Var 0.418 0.323
Cumulative Var 0.418 0.742
colnames(fa.ml.promax$loadings) <- c("文科", "理科")
fa.ml.promax %>% fa.diagram(rsize = 0.5,
digits = 3)
4.2 因子载荷和因子得分图
#主成分法 Principal Component Method无旋转
fa.pc.none <- principal(eg7_1,
nfactors = 2,
rotate = "none")
biplot(fa.pc.none,
main = "Principal Component, no rotation")
#主成分法 Principal Component 正交旋转varimax
fa.pc.varimax <- principal(eg7_1,
nfactors = 2,
rotate = "varimax")
biplot(fa.pc.varimax,
col = c(5,6),
main = "Principal Component, Varimax")
#极大似然法, 斜交promax旋转
fa.ml.promax <- fa(eg7_1,
nfactors = 2,
fm = "ml",
rotate = "promax")
biplot(fa.ml.promax,
col = c(3,4),
main = "Maximum Likelihood, Promax")
5 本章作业
答题要求:将R的命令和输出结果转成图片,上传至91速课平台。
习题1: 教材P146,例题7.1
完成下列要求:
1.1 采用主成分(principal component)法估计因子载荷,对因子载荷进行旋转,绘制因子得分和因子载荷图。你提取了几个因子?各个因子的方差贡献率是多少?各个因子主要代表哪些原始变量?
1.2 采用极大似然法估计因子载荷,对因子载荷进行旋转,绘制因子得分和因子载荷图。
1.3 采用主因子(principal factor)法估计因子载荷,对因子载荷进行旋转,绘制因子得分和因子载荷图。
习题2: 教材P151, 案例7.1
要求:
2.1实现教材P154的表7-3、表7-4中的估计。
2.2 绘制P157的图7-5.
2.3 实现P158-159的表7-5,表7-6,表7-7的因子得分的计算,无需报告完整的表7-5,表7-6,表7-7,截取上述三张表格的前6家公司即可。
习题3: 教材P160, 习题7.7
要求:
3.1 对习题7.7的进行因子分析,你使用的因子载荷的估计方法是什么?是否对因子载荷进行了旋转?旋转方法是什么?
3.2 你提取了几个因子?各个因子的方差贡献率是多少?各个因子主要代表哪些原始变量的信息?
3.3 绘制因子得分和因子载荷图。