Statistics (8th ed.), p. 181, Exercise 9.2

ob.freq <- c(28,56,48,36,32)

# p = expected probabilities; if omitted, all categories are assumed equally likely
chisq.test(ob.freq, p = c(0.1,0.2,0.3,0.2,0.2))

    Chi-squared test for given probabilities

data:  ob.freq
X-squared = 14, df = 4, p-value = 0.007295
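
The statistic above can be reproduced by hand, which makes the role of `p` explicit. This is a minimal sketch using the same observed counts; the names `p0`, `expected`, `x2`, `df`, and `p.value` are illustrative and not part of the original exercise.

```r
# Observed counts and hypothesized category probabilities (from the exercise above)
ob.freq <- c(28, 56, 48, 36, 32)
p0 <- c(0.1, 0.2, 0.3, 0.2, 0.2)

# Expected counts under H0: n * p
expected <- sum(ob.freq) * p0

# Pearson statistic: sum of (O - E)^2 / E
x2 <- sum((ob.freq - expected)^2 / expected)

# Degrees of freedom: number of categories minus 1
df <- length(ob.freq) - 1

# Upper-tail p-value from the chi-squared distribution
p.value <- pchisq(x2, df = df, lower.tail = FALSE)

c(statistic = x2, df = df, p.value = p.value)
```

The three values should agree with the `chisq.test()` output above (X-squared = 14, df = 4, p-value of about 0.0073).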

Statistics (8th ed.), p. 181, Exercise 9.3

ob.freq <- c(6,12,38,21,13,16,40,22,
             14,8,11,9,17,8,6,13)

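# matrix() fills by column by default; pass byrow = TRUE to fill by row
table <- matrix(ob.freq,4,4,
       dimnames = list(c("Morning","Noon","Evening","When free"),
                       c("Above university","University or college","High school","Below high school"))
)

table
          Above university University or college High school Below high school
Morning                  6                    13          14                17
Noon                    12                    16           8                 8
Evening                 38                    40          11                 6
When free               21                    22           9                13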
chisq.test(table)

    Pearson's Chi-squared test

data:  table
X-squared = 31.861, df = 9, p-value = 0.0002104
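
For a contingency table, the expected counts come from the row and column totals rather than from a supplied `p`. The sketch below recomputes the test by hand on the same 16 counts; `tab`, `expected`, `x2`, `df`, and `p.value` are illustrative names, and the "all expected counts at least 5" check is a common rule of thumb, not something the exercise states.

```r
# Same 4x4 table as above, filled column by column (labels omitted for brevity)
ob.freq <- c(6, 12, 38, 21, 13, 16, 40, 22,
             14, 8, 11, 9, 17, 8, 6, 13)
tab <- matrix(ob.freq, nrow = 4, ncol = 4)

# Expected count in each cell under independence: row total * column total / n
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic and its degrees of freedom, (rows - 1) * (cols - 1)
x2 <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
p.value <- pchisq(x2, df = df, lower.tail = FALSE)

# Rule of thumb: the chi-squared approximation is usually considered
# adequate when every expected count is at least 5
min(expected)

c(statistic = x2, df = df, p.value = p.value)
```

The same expected counts are also available directly as `chisq.test(tab)$expected`.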

Dataset: titanic {ggmosaic}.

Complete the following tasks in R; include the R code and its output.

2.1 Test Class and Survived for independence.

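library(ggmosaic)  # provides the titanic data set (and attaches ggplot2)
library(dplyr)     # provides %>%, group_by(), summarise(), mutate()

data(titanic)
table(titanic$Class, titanic$Survived) %>% 
  chisq.test()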

    Pearson's Chi-squared test

data:  .
X-squared = 190.4, df = 3, p-value < 2.2e-16

2.2 Build the contingency table of Class and Survived, and compute the row and column percentages.

crosstab <- table(titanic$Class,
                  titanic$Survived)

# Overall percentages: each cell as a share of the grand total
crosstab %>% 
  prop.table() %>% 
  round(3)*100
      
         No  Yes
  1st   5.5  9.2
  2nd   7.6  5.4
  3rd  24.0  8.1
  Crew 30.6  9.6

# Row percentages: each row (Class) sums to 100
crosstab %>% 
  prop.table(margin = 1) %>% 
  round(3)*100
      
         No  Yes
  1st  37.5 62.5
  2nd  58.6 41.4
  3rd  74.8 25.2
  Crew 76.0 24.0

# Column percentages: each column (Survived) sums to 100
crosstab %>% 
  prop.table(margin = 2) %>% 
  round(3)*100
      
         No  Yes
  1st   8.2 28.6
  2nd  11.2 16.6
  3rd  35.4 25.0
  Crew 45.2 29.8

2.3 Draw a percent stacked bar chart of Class and Survived, with the percentages labeled on the chart.

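# Count each Class x Survived cell, then convert to percentages within each Class
# (.groups = "drop_last" keeps the grouping by Class, which is also the default)
percent.table <- titanic %>% 
  group_by(Class, Survived) %>% 
  summarise(count = n(), .groups = "drop_last") %>% 
  mutate(percent = count/sum(count)*100)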

percent.table %>% 
  ggplot(aes(Class, percent,fill  = Survived))+
  geom_col()+
  geom_text(aes(label = paste0(round(percent),"%")),
            position = position_stack(vjust = 0.5),
            size = 4)+
  labs(title = "Class and Survival on Titanic")+
  theme_bw()+
  theme(text = element_text(size = 12))

2.4 Draw a mosaic plot of Class and Survived.

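# dir = c("h","v") splits Class horizontally (levels along the y-axis)
# and Survived vertically (levels along the x-axis), so the axis labels follow that
table(titanic$Class, titanic$Survived) %>% 
  mosaicplot(main = "Class and Survival on Titanic",
             xlab = "Survived",
             ylab = "Class",
             dir = c("h","v"),
             off = 3,
             color = c(4,5),
             las = 1)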

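2.5 Briefly summarize the relationship between Class and Survived.

The survival rate was 62.5% in first class and 41.4% in second class; third class (25.2%) and the crew (24.0%) were close, at about 25%. Among passengers, the higher the class, the higher the survival rate.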
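
One way to see which cells drive this pattern is to look at the standardized residuals that `chisq.test()` returns. The sketch below is only an illustration: it assumes the built-in table `datasets::Titanic` holds the same 2201 people as ggmosaic's `titanic`, and `class.surv` and `test` are names introduced here.

```r
# Assumption: the built-in 4-way table datasets::Titanic matches ggmosaic's
# titanic data; collapse it to a Class x Survived table
class.surv <- margin.table(Titanic, margin = c(1, 4))

test <- chisq.test(class.surv)

# Standardized residuals: values far from 0 (roughly beyond +/-2) mark cells
# whose counts differ most from what independence would predict
round(test$stdres, 1)

# Survival rate within each class (row percentages, "Yes" column)
round(prop.table(class.surv, margin = 1)[, "Yes"] * 100, 1)
```

Positive residuals in the "Yes" column indicate more survivors than independence would predict, negative ones fewer.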

2.6 Test Class and Sex for independence.

table(titanic$Class, titanic$Sex) %>% 
  chisq.test()

    Pearson's Chi-squared test

data:  .
X-squared = 349.91, df = 3, p-value < 2.2e-16

2.7 Build the contingency table of Class and Sex, and compute the row and column percentages.

crosstab <- table(titanic$Class,
                  titanic$Sex)

# Overall percentages: each cell as a share of the grand total
crosstab %>% 
  prop.table() %>% 
  round(3)*100
      
       Male Female
  1st   8.2    6.6
  2nd   8.1    4.8
  3rd  23.2    8.9
  Crew 39.2    1.0

# Row percentages: each row (Class) sums to 100
crosstab %>% 
  prop.table(margin = 1) %>% 
  round(3)*100
      
       Male Female
  1st  55.4   44.6
  2nd  62.8   37.2
  3rd  72.2   27.8
  Crew 97.4    2.6

# Column percentages: each column (Sex) sums to 100
crosstab %>% 
  prop.table(margin = 2) %>% 
  round(3)*100
      
       Male Female
  1st  10.4   30.9
  2nd  10.3   22.6
  3rd  29.5   41.7
  Crew 49.8    4.9

2.8 Draw a percent stacked bar chart of Class and Sex, with the percentages labeled on the chart.

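# Count each Class x Sex cell, then convert to percentages within each Class
# (.groups = "drop_last" keeps the grouping by Class, which is also the default)
percent.table <- titanic %>% 
  group_by(Class, Sex) %>% 
  summarise(count = n(), .groups = "drop_last") %>% 
  mutate(percent = count/sum(count)*100)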

percent.table %>% 
  ggplot(aes(Class, percent,fill  = Sex))+
  geom_col()+
  geom_text(aes(label = paste0(round(percent),"%")),
            position = position_stack(vjust = 0.5),
            size = 4)+
  labs(title = "Class and Sex on Titanic")+
  theme_bw()+
  theme(text = element_text(size = 12))

2.9 Draw a mosaic plot of Class and Sex.

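# dir = c("h","v") splits Class horizontally (levels along the y-axis)
# and Sex vertically (levels along the x-axis), so the axis labels follow that
table(titanic$Class, titanic$Sex) %>% 
  mosaicplot(main = "Class and Sex on Titanic",
             xlab = "Sex",
             ylab = "Class",
             dir = c("h","v"),
             off = 3,
             color = c(4,5),
             las = 1)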

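2.10 Briefly summarize the relationship between Class and Sex.

Women made up 44.6% of first class, 37.2% of second class, 27.8% of third class, and 2.6% of the crew. The higher the class, the larger the share of women.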

Dataset: diamonds {ggplot2}

Complete the following tasks in R; include the R code and its output.

3.1 Test clarity and cut for independence.

table(diamonds$clarity, diamonds$cut) %>% 
  chisq.test()

    Pearson's Chi-squared test

data:  .
X-squared = 4391.4, df = 28, p-value < 2.2e-16
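
With a sample this large, even a weak association produces a tiny p-value, so it can help to put the statistic on a 0 to 1 scale. The sketch below computes Cramér's V from the numbers reported above; it is an optional addition, not part of the original task, and `x2`, `n`, `r`, `k`, and `v` are illustrative names. It relies on `diamonds` having 53,940 rows, with 8 clarity levels and 5 cut levels.

```r
# Values taken from the test output above
x2 <- 4391.4   # Pearson chi-squared statistic
n  <- 53940    # number of rows in ggplot2::diamonds
r  <- 8        # levels of clarity
k  <- 5        # levels of cut

# Cramér's V = sqrt(X^2 / (n * (min(r, k) - 1))), ranges from 0 to 1
v <- sqrt(x2 / (n * (min(r, k) - 1)))
round(v, 3)
```

The result is roughly 0.14, which suggests that the association, although clearly non-zero, is fairly weak.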

3.2 Draw a mosaic plot of clarity and cut.

table(diamonds$clarity, diamonds$cut) %>% 
  mosaicplot(color = c(2:6),
             dir = c("h","v"),
             las = 1,
             main = "Clarity and Cut of Diamonds")

3.3 Test clarity and color for independence.

table(diamonds$color, diamonds$clarity) %>% 
  chisq.test()

    Pearson's Chi-squared test

data:  .
X-squared = 2047.1, df = 42, p-value < 2.2e-16

3.4 Draw a mosaic plot of clarity and color.

table(diamonds$color, diamonds$clarity) %>% 
  mosaicplot(color = c(1:8),
             dir = c("h","v"),
             las = 1,
             main = "Color and Clarity of Diamonds")