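6.1 Reporting Descriptive Statistics: Homework
Question 1. Data: ggplot2::mpg
Requirement: submit the code and its output as images.
1.1 For every quantitative variable in the dataset, report the mean, median, maximum, minimum, quartiles, variance, standard deviation, skewness, and kurtosis.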
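library(tidyverse)
library(psych) # provides describe() for descriptive statistics
library(gt)
mpg %>%
  select(where(is.numeric)) %>% # keep the numeric columns only
  describe(quant = c(.25, .75)) %>% # descriptive statistics, including the quartiles
  select(mean, median, max, min, Q0.25, Q0.75, skew, kurtosis, sd) %>%
  mutate(var = sd^2) %>% # variance is the squared standard deviation
  as.data.frame() %>% # convert to a plain data frame
  round(3) # round to three decimal places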
          mean median  max    min  Q0.25  Q0.75  skew kurtosis    sd    var
displ    3.472    3.3    7    1.6    2.4    4.6 0.439   -0.911 1.292  1.669
year  2003.500 2003.5 2008 1999.0 1999.0 2008.0 0.000   -2.009 4.510 20.337
cyl      5.889    6.0    8    4.0    4.0    8.0 0.112   -1.464 1.612  2.597
cty     16.859   17.0   35    9.0   14.0   19.0 0.786    1.431 4.256 18.113
hwy     23.440   24.0   44   12.0   18.0   27.0 0.365    0.137 5.955 35.458
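As a cross-check on the table above, the same statistics can be recomputed for a single column with base R. This is a minimal sketch for displ only; the skewness and kurtosis formulas are written out under the assumption that describe() is using its default type = 3 definitions, so treat them as an illustration and not as the package's source code.

```r
x <- ggplot2::mpg$displ
m <- mean(x)
s <- sd(x) # sample standard deviation (n - 1 denominator)

check <- c(
  mean     = m,
  median   = median(x),
  max      = max(x),
  min      = min(x),
  Q0.25    = unname(quantile(x, 0.25)),
  Q0.75    = unname(quantile(x, 0.75)),
  # assumed to correspond to describe()'s default type = 3
  skew     = mean((x - m)^3) / s^3,
  kurtosis = mean((x - m)^4) / s^4 - 3,
  sd       = s,
  var      = var(x)
)
round(check, 3)
```

The values should line up with the displ row of the table; var(x) and sd(x)^2 are the same quantity, which is why the solution can derive the variance from the sd column.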
1.2 Split the cars in mpg into three groups by drv (front-wheel, four-wheel, rear-wheel), and compute the mean, median, maximum, minimum, and standard deviation of cty and hwy within each group.
library(tidyverse)
library(psych)
mpg %>%
  select(cty, hwy) %>%
  # describe by drv; mat = TRUE returns a single table instead of a list,
  # digits = 3 rounds to three decimal places
  describeBy(group = mpg$drv, mat = TRUE, digits = 3) %>%
  select(group1, n, mean, median, max, min, sd) # keep the required columns
     group1   n   mean median max min    sd
cty1      4 103 14.330     14  21   9 2.874
cty2      f 106 19.972     19  35  11 3.627
cty3      r  25 14.080     15  18  11 2.216
hwy1      4 103 19.175     18  28  12 4.079
hwy2      f 106 28.160     28  44  17 4.207
hwy3      r  25 21.000     21  26  15 3.663
# The same statistics with dplyr; this version summarizes cty only
mpg %>%
  select(drv, cty, hwy) %>%
  group_by(drv) %>% # group by drv
  summarize(mean = mean(cty) %>% round(3), # mean of cty
            median = median(cty),
            max = max(cty),
            min = min(cty),
            sd = sd(cty) %>% round(3)) # standard deviation of cty
# A tibble: 3 × 6
  drv    mean median   max   min    sd
  <chr> <dbl>  <dbl> <int> <int> <dbl>
1 4      14.3     14    21     9  2.87
2 f      20.0     19    35    11  3.63
3 r      14.1     15    18    11  2.22
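The task asks for both cty and hwy, while the group_by() version above covers cty only. One way to handle both columns in a single call is dplyr's across(); the sketch below is an illustration, and the object name by_drv and the "{.col}_{.fn}" column naming are choices made here, not part of the original solution.

```r
library(dplyr)

by_drv <- ggplot2::mpg %>%
  group_by(drv) %>%
  summarise(
    across(c(cty, hwy),
           list(mean = mean, median = median, max = max, min = min, sd = sd),
           .names = "{.col}_{.fn}") # e.g. cty_mean, hwy_sd
  ) %>%
  mutate(across(where(is.double), ~ round(.x, 3))) # round to three decimals

as.data.frame(by_drv) # print every column instead of the truncated tibble view
```

Each row is one drv group, and the values should agree with the describeBy() table, for example cty_mean for group f and hwy_sd for group r.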
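6.2 Plotting Probability Density Curves: Homework
Question 2. Plot probability density curves.
2.1 Plot the density curves of N(0,1), N(-1,0.25), and N(2,4) on the same axes.
2.2 Plot the density curves of N(0,1), t(2), t(5), and t(30) on the same axes.
2.3 Plot the density curves of chisq(5), chisq(10), and chisq(30) on the same axes.
2.4 Plot the density curves of F(2, 5) and F(5, 10) on the same axes.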
library(tidyverse)
# dnorm() takes the standard deviation, not the variance:
# N(-1,0.25) uses sd = 0.5 and N(2,4) uses sd = 2
norm <- data.frame(x = seq(-7, 7, 0.01),
                   y1 = dnorm(seq(-7, 7, 0.01), 0, 1),
                   y2 = dnorm(seq(-7, 7, 0.01), -1, 0.5),
                   y3 = dnorm(seq(-7, 7, 0.01), 2, 2))
norm %>%
  ggplot(aes(x)) +
  geom_line(aes(y = y1, color = "norm(0,1)")) +
  geom_line(aes(y = y2, color = "norm(-1,0.25)")) +
  geom_line(aes(y = y3, color = "norm(2,4)")) +
  scale_color_manual(values = c("norm(0,1)" = "blue",
                                "norm(-1,0.25)" = "cyan4",
                                "norm(2,4)" = "darkorchid")) +
  theme(legend.position = "bottom")
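An alternative that avoids precomputing the y columns is ggplot2's geom_function(), which evaluates a density function over the x range of the plot. The sketch below is one possible way to draw the three normal curves requested in 2.1; writing the standard deviation as sqrt(variance) is a choice made here to keep the link to the N(mean, variance) notation visible.

```r
library(ggplot2)

p <- ggplot() +
  xlim(-7, 7) +
  geom_function(aes(color = "N(0,1)"), fun = dnorm,
                args = list(mean = 0, sd = sqrt(1))) +
  geom_function(aes(color = "N(-1,0.25)"), fun = dnorm,
                args = list(mean = -1, sd = sqrt(0.25))) +
  geom_function(aes(color = "N(2,4)"), fun = dnorm,
                args = list(mean = 2, sd = sqrt(4))) +
  labs(x = "x", y = "density", color = NULL) +
  theme_bw() +
  theme(legend.position = "bottom") # set after theme_bw() so it is not overridden
p
```

The same pattern should carry over to the t, chi-square, and F curves by swapping in dt, dchisq, or df and the matching args.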
2.2 Density curves of N(0,1), t(2), t(5), and t(30) on the same axes.
t <- data.frame(x = seq(-4, 4, 0.01),
                y1 = dnorm(seq(-4, 4, 0.01), 0, 1),
                y2 = dt(seq(-4, 4, 0.01), 2),
                y3 = dt(seq(-4, 4, 0.01), 5),
                y4 = dt(seq(-4, 4, 0.01), 30))
t %>%
  ggplot(aes(x)) +
  geom_line(aes(y = y1, color = "norm(0,1)")) +
  geom_line(aes(y = y2, color = "t(2)")) +
  geom_line(aes(y = y3, color = "t(5)")) +
  geom_line(aes(y = y4, color = "t(30)")) +
  scale_color_manual(values = c("norm(0,1)" = "blue",
                                "t(2)" = "cyan4",
                                "t(5)" = "darkorchid",
                                "t(30)" = "green")) +
  theme_bw() + # apply the complete theme first
  theme(legend.position = "bottom") # then move the legend, so theme_bw() does not reset it
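The plot shows the t curves approaching the standard normal curve as the degrees of freedom grow. A small numeric check makes the same point; this is a sketch, and the names dfs and gap are introduced here for illustration only.

```r
x <- seq(-4, 4, by = 0.01)
dfs <- c(2, 5, 30)

# largest vertical distance between each t density and the N(0,1) density on the grid
gap <- sapply(dfs, function(nu) max(abs(dt(x, df = nu) - dnorm(x))))
names(gap) <- paste0("t(", dfs, ")")
gap
```

The gap shrinks as the degrees of freedom increase, which is why t(30) is hard to tell apart from N(0,1) in the figure.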
2.3 Density curves of chisq(5), chisq(10), and chisq(30) on the same axes.
chisq <- data.frame(x = seq(0, 60, 0.01),
                    y1 = dchisq(seq(0, 60, 0.01), 5),
                    y2 = dchisq(seq(0, 60, 0.01), 10),
                    y3 = dchisq(seq(0, 60, 0.01), 30))
chisq %>%
  ggplot(aes(x)) +
  geom_line(aes(y = y1, color = "chisq(5)")) +
  geom_line(aes(y = y2, color = "chisq(10)")) +
  geom_line(aes(y = y3, color = "chisq(30)")) +
  scale_color_manual(values = c("chisq(5)" = "blue",
                                "chisq(10)" = "cyan4",
                                "chisq(30)" = "darkorchid")) +
  theme_bw() + # apply the complete theme first
  theme(legend.position = "bottom") # then move the legend, so theme_bw() does not reset it
2.4 Density curves of F(2, 5) and F(5, 10) on the same axes.
f <- data.frame(x = seq(0, 20, 0.01),
                y1 = df(seq(0, 20, 0.01), 2, 5),
                y2 = df(seq(0, 20, 0.01), 5, 10))
f %>%
  ggplot(aes(x)) +
  geom_line(aes(y = y1, color = "F(2,5)")) +
  geom_line(aes(y = y2, color = "F(5,10)")) +
  scale_color_manual(values = c("F(2,5)" = "blue",
                                "F(5,10)" = "cyan4")) +
  theme_bw() + # apply the complete theme first
  theme(legend.position = "bottom") # then move the legend, so theme_bw() does not reset it
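Question 3. Draw histograms. Use a loop to produce the following plots:
3.1 Generate 1000 random numbers from a chi-square distribution with 4 degrees of freedom and draw their histogram.
3.2 Generate 1000 random numbers from a chi-square distribution with 8 degrees of freedom and draw their histogram.
3.3 Generate 1000 random numbers from a chi-square distribution with 12 degrees of freedom and draw their histogram.
3.4 Generate 1000 random numbers from a chi-square distribution with 16 degrees of freedom and draw their histogram.
3.5 Generate 1000 random numbers from a chi-square distribution with 20 degrees of freedom and draw their histogram.
3.6 Generate 1000 random numbers from a chi-square distribution with 24 degrees of freedom and draw their histogram.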
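No solution is given for Question 3 in this section. One possible approach is a for loop over the six degrees of freedom, sketched below; the seed, the bin count of 30, the fill color, and the names dfs, draws, and plots are all assumptions made for illustration.

```r
library(ggplot2)

set.seed(2024) # hypothetical seed, only so the draws are reproducible
dfs <- seq(4, 24, by = 4) # degrees of freedom for 3.1 to 3.6
plots <- vector("list", length(dfs))

for (i in seq_along(dfs)) {
  # 1000 random draws from the chi-square distribution with dfs[i] degrees of freedom
  draws <- data.frame(x = rchisq(1000, df = dfs[i]))
  plots[[i]] <- ggplot(draws, aes(x)) +
    geom_histogram(bins = 30, fill = "cyan4", color = "white") +
    labs(title = paste0("chisq(", dfs[i], "), n = 1000"), y = "count") +
    theme_bw()
  print(plots[[i]]) # print() is needed to render a ggplot inside a loop
}
```

Storing the plots in a list keeps them available afterwards, for example to arrange them in a grid with a layout package of your choice.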