第3章

第3章 图表工具

讲义讲义

课堂练习

题目1 饼图

数据来源:大众点评网,广州六大品牌奶茶店

样本容量:336

数据来源:大众点评网,广州6大奶茶品牌

Show the code
library(tidyverse)
library(readxl)
data  <- read_excel("data/top6.xlsx")

par(family  = 'STKaiti')
library(showtext)
showtext_auto()

data %>% 
  count(brand) %>% 
  mutate(percent = round(n/sum(n)*100, 1))
# A tibble: 6 × 3
  brand                n percent
  <chr>            <int>   <dbl>
1 CoCo都可            64    19  
2 TANING手挞柠檬茶    44    13.1
3 书亦烧仙草          82    24.4
4 喜茶                43    12.8
5 益禾堂              45    13.4
6 茶百道              58    17.3
Show the code
table(data$brand) %>% 
  pie()

Show the code
pie_table <- data %>% 
  count(brand) %>% 
  mutate(percent = round(n/sum(n)*100, 1),
         per_label = paste0(brand, " ", percent, "%")) %>% 
  arrange(desc(percent))
  
pie(pie_table$percent, 
    labels = pie_table$per_label, 
    clockwise = TRUE,
    init.angle = 90)

题目2 条形图

Show the code
data %>% ggplot(aes(brand, fill = brand)) + 
  geom_bar() +
  labs(x = "品牌", fill = "品牌") +
  theme_bw() +
  theme(text = element_text(size = 15),
        legend.position = "bottom",
        legend.text = element_text(size = 10)) 

Show the code
library(forcats)
data %>% ggplot(aes(fct_infreq(brand), fill = brand)) + 
  geom_bar() +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -1)+ 
  scale_y_continuous(limits = c(0,100))+
  guides(x = guide_axis(angle = 45)) +
  labs(x = "品牌", fill = "品牌") +
  theme_bw() +
  theme(text = element_text(size = 15),
        legend.position = "none",
        legend.text = element_text(size = 10)) 

题目3 直方图

Show the code
library(ggplot2)

# 难看的直方图
ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 5, color = "black", alpha = 0.9) +
  labs(title = "Ugly Histogram", x = "Highway Mileage", y = "Count") +
  theme(
    plot.title = element_text(size = 20, hjust = 0.5, color = "purple"),
    axis.title.x = element_text(size = 15, angle = 45, vjust = 1, color = "red"),
    axis.title.y = element_text(size = 15, angle = 90, hjust = 1, color = "green"),
    legend.position = "top",
    panel.background = element_rect(fill = "pink"),
    panel.grid.major = element_line(color = "yellow", size = 1)
  )

Show the code
# 好看的直方图
ggplot(mpg, aes(x = hwy)) +
  geom_histogram(breaks = seq(12, 46, 2), 
                 fill = "#0073C2FF", 
                 color = "white", 
                 alpha = 0.8) +
  scale_x_continuous(breaks = seq(12, 46, 2),
                     labels = seq(12, 46, 2)) +
  scale_y_continuous(limits = c(0,50))+
  labs(title = "Highway Mileage Distribution", x = "Highway Mileage", y = "Frequency") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 20, hjust = 0.5),
    axis.title.x = element_text(size = 18),
    axis.title.y = element_text(size = 18),
    panel.grid.major = element_line(color = "grey90"),
    panel.grid.minor = element_blank()
  )

题目4 二维变量的作图

分组条形图

Show the code
data %>%
  group_by(area, brand) %>%
  summarise(count = n(), .groups = "drop") %>% 
  mutate(brand_order = paste(brand, area, rank(-count), sep = "_"))  %>% 
  ggplot(aes(reorder(brand_order, -count),
             count, fill = brand)) +
  geom_col()+
  facet_wrap(~ area, scales = "free_x", ncol = 3) +
  geom_text(aes(label = count), vjust = -0.5) +
  scale_x_discrete(labels = function(x) gsub("_.+$", "", x)) + # 移除排序编号
  scale_y_continuous(limits = c(0, 42)) +
  labs(x = "品牌", y = "数量") +
  theme(axis.text.x = element_text(angle = 42, hjust = 1), 
        legend.position = "none")

分组直方图

Show the code
#按单个定性变量分组
data %>% 
  ggplot(aes(retail.price, fill = brand))+
  geom_histogram(breaks = seq(6, 38, 2))+
  facet_wrap(~brand, ncol = 2) +
  scale_y_continuous(limits = c(0,60))+
  scale_fill_brewer(palette  = "Set1") +
  labs(title = "零售价直方图",
       x = "零售价",
       y = "频数") +
  scale_x_continuous(breaks = seq(6, 36, 4),
                     labels = seq(6, 36, 4)) +
  theme_bw() +
  theme(text = element_text(size = 15),
        legend.position = "bottom",
        legend.text = element_text(size = 10))

分组直方图

Show the code
data %>% 
  ggplot(aes(retail.price, reorder(brand, retail.price, FUN = median),
             col = brand))+
  geom_boxplot()+
  labs(title ="六大奶茶品牌零售价格",
       x = "零售价格",
       y = "品牌", fill = "品牌")+
  theme_bw() +
  theme(text = element_text(size = 15),
        legend.position = "bottom",
        legend.text = element_text(size = 10))

Excel教学视频

讲义Excel图表工具合集

讲义Excel图表工具综合运用——基金经理

讲义EXCEL数据清洗合集

讲义EXCEL如何美化箱线图?

Excel习题

习题3 毕业生.xlsx

数据文件:毕业生.xlsx(数据文件在QQ 群文件夹中),用Excel完成以下任务:

3.1 绘制毕业生性别分布的频数分布表,在表中列出男性和女性的人数及比重。

3.2 绘制毕业生性别分布的饼图和条形图。

3.3 绘制毕业生的专业人数分布的频数分布表,在表中列出各个专业的人数和百分比。

3.4 绘制毕业生各专业人数分布的帕累托图。

3.5 绘制毕业生的政治面貌的频数分布表。

3.6 绘制毕业生的政治面貌的瀑布图。

3.7 绘制毕业生的就业单位类型的频数分布表。

3.8 对毕业生先按性别、再按就业单位类型进行层级分组,绘制树状图。

3.9 对毕业生先按性别、再按政治面貌进行层级分组,绘制树状图。

习题4 毕业生.xlsx

数据文件:毕业生.xlsx(数据文件在QQ 群文件夹中),用Excel完成以下任务:

4.1 绘制毕业生月薪的频数分布表,采用适宜的组矩,在表中列出各个组别的人数和百分比。

4.2 绘制毕业生月薪的直方图。

4.3 绘制毕业生月薪的频数折线图、累积百分比折线图,在图中标注出频数或累积百分比。在870位毕业生中,月薪小于等于4000、小于等于6000的各占比多少?。

4.4 绘制毕业生月薪的箱线图,在箱线图中标注出第1个四分位数、中位数和第3个四分位数。

习题讲评视频

讲义Excel习题3讲评

讲义Excel习题4讲评

讲义教材第3章习题讲评

学生习作讲评

教材第3章习题学生习作点评

拓展资源

在线词云图工具 https://www.wordclouds.com/