6.1 Reporting Descriptive Statistics: Homework

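Problem 1: Data: ggplot2::mpg

Requirement: convert the code and its output to images before submitting.

1.1 Report the mean, median, maximum, minimum, quartiles, variance, standard deviation, skewness, and kurtosis of all quantitative variables in this dataset.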

library(tidyverse)
library(psych) # provides describe() for computing descriptive statistics
library(gt)

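mpg %>% 
  select(where(is.numeric)) %>% # keep only the numeric (quantitative) columns
  describe(quant = c(.25, .75)) %>% # descriptive statistics, including the first and third quartiles
  select(mean, median, max, min, Q0.25, Q0.75, skew, kurtosis, sd) %>%
  mutate(var = sd^2) %>% # variance = standard deviation squared
  as.data.frame() %>% # convert to a plain data frame
  round(3) # round to three decimal places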
          mean median  max    min  Q0.25  Q0.75  skew kurtosis    sd    var
displ    3.472    3.3    7    1.6    2.4    4.6 0.439   -0.911 1.292  1.669
year  2003.500 2003.5 2008 1999.0 1999.0 2008.0 0.000   -2.009 4.510 20.337
cyl      5.889    6.0    8    4.0    4.0    8.0 0.112   -1.464 1.612  2.597
cty     16.859   17.0   35    9.0   14.0   19.0 0.786    1.431 4.256 18.113
hwy     23.440   24.0   44   12.0   18.0   27.0 0.365    0.137 5.955 35.458
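The skewness and kurtosis columns above depend on which moment convention is used. The following is a minimal base-R sketch of one common convention (third and fourth central moments divided by powers of the sample standard deviation). `moment_stats` is a hypothetical helper written for illustration, not a psych function, and whether it reproduces the defaults of `describe()` to the last digit is an assumption worth checking against the table.

```r
library(ggplot2) # only needed for the mpg dataset

# Hypothetical helper: moment-based summary of a numeric vector
moment_stats <- function(x) {
  d <- x - mean(x)                 # deviations from the mean
  s <- sd(x)                       # sample standard deviation (n - 1 denominator)
  c(mean     = mean(x),
    var      = var(x),             # equals s^2
    sd       = s,
    skew     = mean(d^3) / s^3,    # third central moment over s^3
    kurtosis = mean(d^4) / s^4 - 3) # excess kurtosis (0 for a normal distribution)
}

round(moment_stats(mpg$hwy), 3)
```

Applying the helper to each numeric column, for example with `sapply(mpg[sapply(mpg, is.numeric)], moment_stats)`, gives a table laid out like the one above but transposed.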

1.2 Split the cars in mpg into three groups by drv (front-wheel, four-wheel, rear-wheel) and compute the mean, median, maximum, minimum, and standard deviation of cty and hwy for each group.

library(tidyverse)
library(psych)

mpg %>% 
  select(cty, hwy) %>% 
  describeBy(group = mpg$drv, mat = TRUE, digits = 3) %>% # group by drv; mat = TRUE returns a single table instead of a list; digits = 3 rounds to three decimals
  select(group1, n, mean, median, max, min, sd) # keep only the required columns
     group1   n   mean median max min    sd
cty1      4 103 14.330     14  21   9 2.874
cty2      f 106 19.972     19  35  11 3.627
cty3      r  25 14.080     15  18  11 2.216
hwy1      4 103 19.175     18  28  12 4.079
hwy2      f 106 28.160     28  44  17 4.207
hwy3      r  25 21.000     21  26  15 3.663
# Alternative with dplyr (summarizes cty only)
mpg %>% 
  select(drv, cty, hwy) %>%
  group_by(drv) %>% # group by drv
  summarize(mean = mean(cty) %>% round(3), # mean
            median = median(cty),
            max = max(cty),
            min = min(cty), 
            sd = sd(cty) %>% round(3)) # standard deviation
# A tibble: 3 × 6
  drv    mean median   max   min    sd
  <chr> <dbl>  <dbl> <int> <int> <dbl>
1 4      14.3     14    21     9  2.87
2 f      20.0     19    35    11  3.63
3 r      14.1     15    18    11  2.22
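The dplyr summary above covers cty only, while the task asks for both cty and hwy. One possible way to handle both columns in a single call is `across()`; this is a sketch rather than the original solution, and the result name `by_drv` and the `{.col}_{.fn}` column naming are choices made here for illustration.

```r
library(dplyr)
library(ggplot2) # only needed for the mpg dataset

by_drv <- mpg %>%
  group_by(drv) %>%
  summarize(across(c(cty, hwy),
                   list(mean = mean, median = median,
                        max = max, min = min, sd = sd),
                   .names = "{.col}_{.fn}")) %>% # e.g. cty_mean, hwy_sd
  mutate(across(where(is.numeric), ~ round(.x, 3))) # round to three decimals

# Print through a plain data frame so all decimals are shown
as.data.frame(by_drv)
```

The result has one row per drv value and one column per variable-statistic pair, so it can be compared directly with the describeBy() output.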

6.2 Plotting Probability Density Curves: Homework

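Problem 2: Plot probability density curves.

2.1 Plot the probability density curves of N(0,1), N(-1,0.25), and N(2,4) in the same coordinate system.

2.2 Plot the probability density curves of N(0,1), t(2), t(5), and t(30) in the same coordinate system.

2.3 Plot the probability density curves of chisq(5), chisq(10), and chisq(30) in the same coordinate system.

2.4 Plot the probability density curves of F(2, 5) and F(5, 10) in the same coordinate system.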

library(tidyverse)

# The second parameter of N(mu, sigma^2) is the variance, while dnorm() takes the standard deviation
norm <- data.frame(x = seq(-7,7,0.01),
                   y1 = dnorm(seq(-7,7,0.01),0,1),
                   y2 = dnorm(seq(-7,7,0.01),-1,0.5), # N(-1,0.25): sd = sqrt(0.25)
                   y3 = dnorm(seq(-7,7,0.01),2,2))    # N(2,4): mean 2, sd = sqrt(4)

norm %>% 
  ggplot(aes(x)) +
  geom_line(aes(y = y1, color = "norm(0,1)")) +
  geom_line(aes(y = y2, color = "norm(-1,0.25)")) +
  geom_line(aes(y = y3, color = "norm(2,4)")) +
  scale_color_manual(values = c("norm(0,1)" = "blue", 
                                "norm(-1,0.25)" = "cyan4", 
                                "norm(2,4)" = "darkorchid")) +
  theme(legend.position = "bottom") 
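As an alternative to precomputing every density column, the curves for task 2.1 could also be drawn with `stat_function()`, which evaluates a function over the x-range of the plot. This is a sketch under the assumption that N(mu, sigma^2) is parameterized by its variance, so the `sd` passed to `dnorm()` is the square root of the second parameter; the object name `p_norm` and the grid size `n = 501` are choices made here.

```r
library(ggplot2)

# Only the x-range is supplied; stat_function() evaluates each density on its own grid
p_norm <- ggplot(data.frame(x = c(-7, 7)), aes(x)) +
  stat_function(aes(color = "N(0,1)"), fun = dnorm, n = 501,
                args = list(mean = 0, sd = 1)) +
  stat_function(aes(color = "N(-1,0.25)"), fun = dnorm, n = 501,
                args = list(mean = -1, sd = 0.5)) + # sd = sqrt(0.25)
  stat_function(aes(color = "N(2,4)"), fun = dnorm, n = 501,
                args = list(mean = 2, sd = 2)) +    # sd = sqrt(4)
  labs(y = "density", color = NULL) +
  theme_bw() +
  theme(legend.position = "bottom")

p_norm
```

The same pattern carries over to the other tasks by swapping `dnorm` for `dt`, `dchisq`, or `df` and adjusting `args` and the x-range.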

2. Plot the probability density curves of N(0,1), t(2), t(5), and t(30) in the same coordinate system.

t <- data.frame(x = seq(-4,4,0.01),
                   y1 = dnorm(seq(-4,4,0.01),0,1),
                   y2 = dt(seq(-4,4,0.01),2),
                   y3 = dt(seq(-4,4,0.01),5),
                   y4 = dt(seq(-4,4,0.01),30))



t %>% 
  ggplot(aes(x)) +
  geom_line(aes(y = y1, color = "norm(0,1)")) +
  geom_line(aes(y = y2, color = "t(2)")) +
  geom_line(aes(y = y3, color = "t(5)")) +
  geom_line(aes(y = y4, color = "t(30)")) +
  scale_color_manual(values = c("norm(0,1)" = "blue", 
                                "t(2)" = "cyan4", 
                                "t(5)" = "darkorchid",
                                "t(30)" = "green")) +
  theme_bw() +
  theme(legend.position = "bottom") # set after theme_bw(), which would otherwise reset the legend position

3. Plot the probability density curves of chisq(5), chisq(10), and chisq(30) in the same coordinate system.

chisq <- data.frame(x = seq(0,60,0.01),
                   y1 = dchisq(seq(0,60,0.01),5),
                   y2 = dchisq(seq(0,60,0.01),10),
                   y3 = dchisq(seq(0,60,0.01),30))

chisq %>% 
  ggplot(aes(x)) +
  geom_line(aes(y = y1, color = "chisq(5)")) +
  geom_line(aes(y = y2, color = "chisq(10)")) +
  geom_line(aes(y = y3, color = "chisq(30)")) +
  scale_color_manual(values = c("chisq(5)" = "blue", 
                                "chisq(10)" = "cyan4", 
                                "chisq(30)" = "darkorchid"))+
  theme_bw() +
  theme(legend.position = "bottom") # set after theme_bw(), which would otherwise reset the legend position

4. Plot the probability density curves of F(2, 5) and F(5, 10) in the same coordinate system.

f <- data.frame(x = seq(0,20,0.01),
                   y1 = df(seq(0,20,0.01),2,5),
                   y2 = df(seq(0,20,0.01),5,10))

f %>% 
  ggplot(aes(x)) +
  geom_line(aes(y = y1, color = "F(2,5)")) +
  geom_line(aes(y = y2, color = "F(5,10)")) +
  scale_color_manual(values = c("F(2,5)" = "blue", 
                                "F(5,10)" = "cyan4"))+
  theme_bw() +
  theme(legend.position = "bottom") # set after theme_bw(), which would otherwise reset the legend position
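The wide layout used in these solutions needs one `geom_line()` per curve. A long-format layout, with one row per (distribution, x) pair, may scale better when more curves are added. The sketch below applies that idea to the F distributions; `f_params`, `f_long`, and `p_f` are hypothetical names introduced here.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical parameter table: one row per F distribution to draw
f_params <- tibble(df1 = c(2, 5), df2 = c(5, 10))

f_long <- f_params %>%
  mutate(label = paste0("F(", df1, ",", df2, ")")) %>% # legend label per curve
  expand_grid(x = seq(0, 20, by = 0.01)) %>%           # cross every curve with the x grid
  mutate(density = df(x, df1, df2))                    # stats::df() is the F density

p_f <- ggplot(f_long, aes(x, density, color = label)) +
  geom_line() +
  theme_bw() +
  theme(legend.position = "bottom")

p_f
```

Adding another curve then only requires another row in `f_params`.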

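Problem 3: Plot histograms. Use a loop to draw the following figures:

3.1 Generate 1000 random numbers from a chi-square distribution with 4 degrees of freedom and plot their histogram.

3.2 Generate 1000 random numbers from a chi-square distribution with 8 degrees of freedom and plot their histogram.

3.3 Generate 1000 random numbers from a chi-square distribution with 12 degrees of freedom and plot their histogram.

3.4 Generate 1000 random numbers from a chi-square distribution with 16 degrees of freedom and plot their histogram.

3.5 Generate 1000 random numbers from a chi-square distribution with 20 degrees of freedom and plot their histogram.

3.6 Generate 1000 random numbers from a chi-square distribution with 24 degrees of freedom and plot their histogram.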
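No solution is given for Problem 3. One possible approach is a `for` loop over the six degrees of freedom, sketched below. The seed, the bin count, the fill colors, and the names `dfs` and `plots` are assumptions made for illustration, not requirements of the task.

```r
library(ggplot2)

set.seed(123) # arbitrary seed (an assumption) so the random draws are reproducible

dfs <- seq(4, 24, by = 4)             # degrees of freedom: 4, 8, 12, 16, 20, 24
plots <- vector("list", length(dfs))  # one histogram per degrees-of-freedom value

for (i in seq_along(dfs)) {
  draws <- data.frame(x = rchisq(1000, df = dfs[i])) # 1000 chi-square random numbers
  plots[[i]] <- ggplot(draws, aes(x)) +
    geom_histogram(bins = 30, fill = "grey70", color = "white") +
    labs(title = paste0("chisq(", dfs[i], "), n = 1000"), y = "count") +
    theme_bw()
  print(plots[[i]]) # inside a loop, ggplot objects must be printed explicitly
}
```

Storing the plots in a list keeps them available afterwards, for example to arrange all six on one page.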