Statistics (8th edition), p. 181, Exercise 9.2
ob.freq <- c(28,56,48,36,32)
# p = expected probabilities; if omitted, all categories are assumed equally likely
chisq.test(ob.freq, p = c(0.1,0.2,0.3,0.2,0.2))
Chi-squared test for given probabilities
data: ob.freq
X-squared = 14, df = 4, p-value = 0.007295
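As a check on the output above, the statistic can be reproduced by hand. The following is a minimal sketch in base R; the names `exp.prob`, `exp.freq`, `chi.sq`, `dof`, and `p.value` are introduced here for illustration and are not part of the original solution.

```r
# Observed counts and hypothesised probabilities from Exercise 9.2
ob.freq  <- c(28, 56, 48, 36, 32)
exp.prob <- c(0.1, 0.2, 0.3, 0.2, 0.2)

# Expected counts under the null hypothesis: n * p
exp.freq <- sum(ob.freq) * exp.prob

# Pearson statistic: sum of (observed - expected)^2 / expected
chi.sq <- sum((ob.freq - exp.freq)^2 / exp.freq)

# Degrees of freedom: number of categories minus 1
dof <- length(ob.freq) - 1

# Upper-tail probability of the chi-squared distribution
p.value <- pchisq(chi.sq, df = dof, lower.tail = FALSE)

chi.sq
dof
p.value
```

The three printed values should agree with the `X-squared`, `df`, and `p-value` fields reported by `chisq.test()`.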
Statistics (8th edition), p. 181, Exercise 9.3
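ob.freq <- c(6,12,38,21,13,16,40,22,
             14,8,11,9,17,8,6,13)
# matrix() fills column by column by default; pass byrow = TRUE to fill row by row
table <- matrix(ob.freq,4,4,
                dimnames = list(c("Morning","Noon","Evening","When free"),
                                c("Above university","University or junior college",
                                  "High school","Below high school"))
                )
table
          Above university University or junior college High school Below high school
Morning                  6                           13          14                17
Noon                    12                           16           8                 8
Evening                 38                           40          11                 6
When free               21                           22           9                13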
chisq.test(table)
Pearson's Chi-squared test
data: table
X-squared = 31.861, df = 9, p-value = 0.0002104
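The independence test compares each observed count with the count expected if rows and columns were unrelated. A minimal sketch of that calculation in base R follows; the names `ob`, `expected`, `chi.sq`, `dof`, and `p.value` are illustrative and not part of the original solution.

```r
# Observed counts from Exercise 9.3, filled column by column
ob <- matrix(c(6, 12, 38, 21,
               13, 16, 40, 22,
               14, 8, 11, 9,
               17, 8, 6, 13),
             nrow = 4, ncol = 4)

# Expected count in each cell: row total * column total / grand total
expected <- outer(rowSums(ob), colSums(ob)) / sum(ob)

# Pearson statistic summed over all 16 cells
chi.sq <- sum((ob - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
dof <- (nrow(ob) - 1) * (ncol(ob) - 1)

# Upper-tail probability of the chi-squared distribution
p.value <- pchisq(chi.sq, df = dof, lower.tail = FALSE)

round(chi.sq, 3)
dof
p.value
```

The degrees of freedom follow from the table shape alone: a 4 x 4 table gives (4 - 1) * (4 - 1) = 9, as reported above.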
Data set: titanic {ggmosaic}.
Use R to complete the following tasks; include the R code and output.
2.1 Test Class and Survived for independence.
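library(ggmosaic)  # provides the titanic data set
library(dplyr)     # provides %>%, group_by(), summarise(), mutate()
library(ggplot2)   # provides ggplot() and the diamonds data set
data(titanic)
table(titanic$Class, titanic$Survived) %>%
  chisq.test()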
Pearson's Chi-squared test
data: .
X-squared = 190.4, df = 3, p-value < 2.2e-16
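The object returned by `chisq.test()` carries more than the printed summary. The sketch below pulls out the statistic, the degrees of freedom, and the expected counts. It uses base R's built-in `Titanic` table rather than `titanic` {ggmosaic}, on the assumption that both describe the same 2201 people; `tab` and `res` are illustrative names.

```r
# Assumption: base R's Titanic table matches the titanic data set from ggmosaic
# Collapse the 4-way table to Class x Survived
tab <- margin.table(Titanic, c(1, 4))

res <- chisq.test(tab)

res$statistic   # Pearson X-squared
res$parameter   # degrees of freedom
res$expected    # expected counts under independence
```

Inspecting `res$expected` is a quick way to confirm that no cell has a small expected count, which is the usual condition for relying on the chi-squared approximation.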
2.2 Build the contingency table of Class and Survived, and compute the row and column percentages.
crosstab <- table(titanic$Class,
                  titanic$Survived)
# percentages of the grand total
crosstab %>%
  prop.table() %>%
  round(3)*100
No Yes
1st 5.5 9.2
2nd 7.6 5.4
3rd 24.0 8.1
Crew 30.6 9.6
# row percentages (each Class sums to 100)
crosstab %>%
  prop.table(margin = 1) %>%
  round(3)*100
No Yes
1st 37.5 62.5
2nd 58.6 41.4
3rd 74.8 25.2
Crew 76.0 24.0
# column percentages (each Survived level sums to 100)
crosstab %>%
  prop.table(margin = 2) %>%
  round(3)*100
No Yes
1st 8.2 28.6
2nd 11.2 16.6
3rd 35.4 25.0
Crew 45.2 29.8
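The three tables above differ only in the denominator chosen by `margin`. A minimal sketch that makes this visible by checking which sums equal 100 follows. It uses base R's built-in `Titanic` table as a stand-in, on the assumption that it matches `titanic` {ggmosaic}; `tab`, `total.pct`, `row.pct`, and `col.pct` are illustrative names.

```r
# Assumption: base R's Titanic table matches the titanic data set from ggmosaic
tab <- margin.table(Titanic, c(1, 4))   # Class x Survived counts

total.pct <- prop.table(tab) * 100               # divide by the grand total
row.pct   <- prop.table(tab, margin = 1) * 100   # divide by each row total
col.pct   <- prop.table(tab, margin = 2) * 100   # divide by each column total

sum(total.pct)     # all cells together sum to 100
rowSums(row.pct)   # each Class sums to 100
colSums(col.pct)   # each Survived level sums to 100
```

Row percentages answer "what share of each class survived", while column percentages answer "what share of survivors came from each class".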
2.3 Draw a stacked percentage bar chart showing the distribution of Class and Survived, with the percentages labelled on the chart.
percent.table <- titanic %>%
  group_by(Class, Survived) %>%
  summarise(count = n()) %>%
  # summarise() drops the last grouping level (Survived),
  # so sum(count) below is the total within each Class
  mutate(percent = count/sum(count)*100)
percent.table %>%
  ggplot(aes(Class, percent,fill = Survived))+
  geom_col()+
  geom_text(aes(label = paste0(round(percent),"%")),
            position = position_stack(vjust = 0.5),
            size = 4)+
  labs(title = "Class and Survival on Titanic")+
  theme_bw()+
  theme(text = element_text(size = 12))
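The `percent` column plotted above is the row percentage from 2.2, reshaped to one row per Class and Survived combination. A minimal base R sketch of the same reshaping follows. It uses base R's built-in `Titanic` table as a stand-in, on the assumption that it matches `titanic` {ggmosaic}; `tab` and `long` are illustrative names.

```r
# Assumption: base R's Titanic table matches the titanic data set from ggmosaic
tab <- margin.table(Titanic, c(1, 4))   # Class x Survived counts

# Row percentages, converted to long format: one row per cell
long <- as.data.frame(prop.table(tab, margin = 1) * 100,
                      responseName = "percent")
long

# Within each Class the two percentages add up to 100,
# which is why every stacked bar has the same height
tapply(long$percent, long$Class, sum)
```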
2.4 Draw a mosaic plot showing the distribution of Class and Survived.
table(titanic$Class, titanic$Survived) %>%
mosaicplot(main = "Class and Survival on Titanic",
xlab = "Class",
ylab = "Survived",
dir = c("h","v"),
off = 3,
color = c(4,5),
las = 1)
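2.5 Briefly summarise the relationship between Class and Survived.
The survival rate is 62% in first class and 41% in second class; third class and crew are close to each other at about 25%. The higher the cabin class, the higher the survival rate.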
2.6 Test Class and Sex for independence.
table(titanic$Class, titanic$Sex) %>%
chisq.test()
Pearson's Chi-squared test
data: .
X-squared = 349.91, df = 3, p-value < 2.2e-16
2.7 Build the contingency table of Class and Sex, and compute the row and column percentages.
crosstab <- table(titanic$Class,
                  titanic$Sex)
# percentages of the grand total
crosstab %>%
  prop.table() %>%
  round(3)*100
Male Female
1st 8.2 6.6
2nd 8.1 4.8
3rd 23.2 8.9
Crew 39.2 1.0
# row percentages (each Class sums to 100)
crosstab %>%
  prop.table(margin = 1) %>%
  round(3)*100
Male Female
1st 55.4 44.6
2nd 62.8 37.2
3rd 72.2 27.8
Crew 97.4 2.6
# column percentages (each Sex sums to 100)
crosstab %>%
  prop.table(margin = 2) %>%
  round(3)*100
Male Female
1st 10.4 30.9
2nd 10.3 22.6
3rd 29.5 41.7
Crew 49.8 4.9
2.8 Draw a stacked percentage bar chart showing the distribution of Class and Sex, with the percentages labelled on the chart.
percent.table <- titanic %>%
  group_by(Class, Sex) %>%
  summarise(count = n()) %>%
  # summarise() drops the last grouping level (Sex),
  # so sum(count) below is the total within each Class
  mutate(percent = count/sum(count)*100)
percent.table %>%
  ggplot(aes(Class, percent,fill = Sex))+
  geom_col()+
  geom_text(aes(label = paste0(round(percent),"%")),
            position = position_stack(vjust = 0.5),
            size = 4)+
  labs(title = "Class and Sex on Titanic")+
  theme_bw()+
  theme(text = element_text(size = 12))
2.9 Draw a mosaic plot showing the distribution of Class and Sex.
table(titanic$Class, titanic$Sex) %>%
  mosaicplot(main = "Class and Sex on Titanic",
             xlab = "Class",
             ylab = "Sex",
             dir = c("h","v"),
             off = 3,
             color = c(4,5),
             las = 1)
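2.10 Briefly summarise the relationship between Class and Sex.
Women make up 45% of first class, 37% of second class, 28% of third class, and 3% of the crew. The higher the cabin class, the larger the share of women.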
Data set: diamonds {ggplot2}
Use R to complete the following tasks; include the R code and output.
3.1 Test clarity and cut for independence.
table(diamonds$clarity, diamonds$cut) %>%
chisq.test()
Pearson's Chi-squared test
data: .
X-squared = 4391.4, df = 28, p-value < 2.2e-16
3.2 Draw a mosaic plot showing the distribution of clarity and cut.
table(diamonds$clarity, diamonds$cut) %>%
mosaicplot(color = c(2:6),
dir = c("h","v"),
las = 1,
main = "Clarity and Cut of Diamonds")
3.3 Test clarity and color for independence.
table(diamonds$color, diamonds$clarity) %>%
chisq.test()
Pearson's Chi-squared test
data: .
X-squared = 2047.1, df = 42, p-value < 2.2e-16
3.4 Draw a mosaic plot showing the distribution of clarity and color.
table(diamonds$color, diamonds$clarity) %>%
mosaicplot(color = c(1:8),
dir = c("h","v"),
las = 1,
main = "Color and Clarity of Diamonds")