本节使用数据集starwars。

#加载包dplyr
library(dplyr)
starwars
# A tibble: 87 × 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
 4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
 6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
 9 Biggs D…    183    84 black      light      brown           24   male  mascu…
10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# ℹ 77 more rows
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>

1 dplyr函数概况

1.1 dplyr函数的分类

所有的dplyr函数,第1项参数都是数据框(data frame 或者 tibble)

dplyr提供了一系列管理数据的函数,根据管理的对象不同,可以将这些函数分为对列的管理、对行的管理、对行组的管理三类。

  • 列的管理 Columns:
    • select() changes whether or not a column is included.
    • rename() changes the name of columns.
    • mutate() changes the values of columns and creates new columns.
    • relocate() changes the order of the columns.
  • 行的管理 Rows:
    • filter() chooses rows based on column values.
    • slice() chooses rows based on location.
    • arrange() changes the order of the rows.
  • 行组的管理Groups of rows:
    • summarise() collapses a group into a single row.
    • count()
    • top_n()

1.2 dplyr函数的共性 Commonalities

  • 第1项参数是data frame.
  • 第2项参数描述动作。
  • 输出结果是一个新的data frame.

2 列的管理Column

2.1 select()

  • contains()
  • starts_with()
  • ends_with()
  • last_col()
  • ?select_helpers
#names()列出数据框所有变量名
names(starwars)
 [1] "name"       "height"     "mass"       "hair_color" "skin_color"
 [6] "eye_color"  "birth_year" "sex"        "gender"     "homeworld" 
[11] "species"    "films"      "vehicles"   "starships" 
#选择需要的变量,创建新的数据框
starwars_selected <- starwars %>% select(
  name, height, mass, sex, species, films)
starwars_selected
# A tibble: 87 × 6
   name               height  mass sex    species films    
   <chr>               <int> <dbl> <chr>  <chr>   <list>   
 1 Luke Skywalker        172    77 male   Human   <chr [5]>
 2 C-3PO                 167    75 none   Droid   <chr [6]>
 3 R2-D2                  96    32 none   Droid   <chr [7]>
 4 Darth Vader           202   136 male   Human   <chr [4]>
 5 Leia Organa           150    49 female Human   <chr [5]>
 6 Owen Lars             178   120 male   Human   <chr [3]>
 7 Beru Whitesun lars    165    75 female Human   <chr [3]>
 8 R5-D4                  97    32 none   Droid   <chr [1]>
 9 Biggs Darklighter     183    84 male   Human   <chr [1]>
10 Obi-Wan Kenobi        182    77 male   Human   <chr [6]>
# ℹ 77 more rows
#选择一个范围,变量名1:变量2
data1 <- starwars %>% 
       select(name:mass)
data1
# A tibble: 87 × 3
   name               height  mass
   <chr>               <int> <dbl>
 1 Luke Skywalker        172    77
 2 C-3PO                 167    75
 3 R2-D2                  96    32
 4 Darth Vader           202   136
 5 Leia Organa           150    49
 6 Owen Lars             178   120
 7 Beru Whitesun lars    165    75
 8 R5-D4                  97    32
 9 Biggs Darklighter     183    84
10 Obi-Wan Kenobi        182    77
# ℹ 77 more rows
#设置条件 contains()
data2 <- starwars %>% 
       select(name, species, contains("color"))
data2
# A tibble: 87 × 5
   name               species hair_color    skin_color  eye_color
   <chr>              <chr>   <chr>         <chr>       <chr>    
 1 Luke Skywalker     Human   blond         fair        blue     
 2 C-3PO              Droid   <NA>          gold        yellow   
 3 R2-D2              Droid   <NA>          white, blue red      
 4 Darth Vader        Human   none          white       yellow   
 5 Leia Organa        Human   brown         light       brown    
 6 Owen Lars          Human   brown, grey   light       blue     
 7 Beru Whitesun lars Human   brown         light       blue     
 8 R5-D4              Droid   <NA>          white, red  red      
 9 Biggs Darklighter  Human   black         light       brown    
10 Obi-Wan Kenobi     Human   auburn, white fair        blue-gray
# ℹ 77 more rows

2.2 mutate()

在数据集中追加新的变量。

#生成新变量
starwars_selected %>% 
  mutate(BMI = mass/((height/100)^2)) %>% 
  arrange(desc(BMI))
# A tibble: 87 × 7
   name                  height  mass sex            species        films    BMI
   <chr>                  <int> <dbl> <chr>          <chr>          <list> <dbl>
 1 Jabba Desilijic Tiure    175  1358 hermaphroditic Hutt           <chr>  443. 
 2 Dud Bolt                  94    45 male           Vulptereen     <chr>   50.9
 3 Yoda                      66    17 male           Yoda's species <chr>   39.0
 4 Owen Lars                178   120 male           Human          <chr>   37.9
 5 IG-88                    200   140 none           Droid          <chr>   35  
 6 R2-D2                     96    32 none           Droid          <chr>   34.7
 7 Grievous                 216   159 male           Kaleesh        <chr>   34.1
 8 R5-D4                     97    32 none           Droid          <chr>   34.0
 9 Jek Tono Porkins         180   110 male           Human          <chr>   34.0
10 Darth Vader              202   136 male           Human          <chr>   33.3
# ℹ 77 more rows
starwars_selected
# A tibble: 87 × 6
   name               height  mass sex    species films    
   <chr>               <int> <dbl> <chr>  <chr>   <list>   
 1 Luke Skywalker        172    77 male   Human   <chr [5]>
 2 C-3PO                 167    75 none   Droid   <chr [6]>
 3 R2-D2                  96    32 none   Droid   <chr [7]>
 4 Darth Vader           202   136 male   Human   <chr [4]>
 5 Leia Organa           150    49 female Human   <chr [5]>
 6 Owen Lars             178   120 male   Human   <chr [3]>
 7 Beru Whitesun lars    165    75 female Human   <chr [3]>
 8 R5-D4                  97    32 none   Droid   <chr [1]>
 9 Biggs Darklighter     183    84 male   Human   <chr [1]>
10 Obi-Wan Kenobi        182    77 male   Human   <chr [6]>
# ℹ 77 more rows
starwars_selected %>% 
  mutate(wt_rank = min_rank(desc(mass))) %>% 
  arrange((wt_rank)) 
# A tibble: 87 × 7
   name                  height  mass sex            species    films  wt_rank
   <chr>                  <int> <dbl> <chr>          <chr>      <list>   <int>
 1 Jabba Desilijic Tiure    175  1358 hermaphroditic Hutt       <chr>        1
 2 Grievous                 216   159 male           Kaleesh    <chr>        2
 3 IG-88                    200   140 none           Droid      <chr>        3
 4 Darth Vader              202   136 male           Human      <chr>        4
 5 Tarfful                  234   136 male           Wookiee    <chr>        4
 6 Owen Lars                178   120 male           Human      <chr>        6
 7 Bossk                    190   113 male           Trandoshan <chr>        7
 8 Chewbacca                228   112 male           Wookiee    <chr>        8
 9 Jek Tono Porkins         180   110 male           Human      <chr>        9
10 Dexter Jettster          198   102 male           Besalisk   <chr>       10
# ℹ 77 more rows
#overwrite the variable
starwars_selected %>% 
  mutate(height = height/100) %>% 
  arrange(desc(height))
# A tibble: 87 × 6
   name         height  mass sex    species  films    
   <chr>         <dbl> <dbl> <chr>  <chr>    <list>   
 1 Yarael Poof    2.64    NA male   Quermian <chr [1]>
 2 Tarfful        2.34   136 male   Wookiee  <chr [1]>
 3 Lama Su        2.29    88 male   Kaminoan <chr [1]>
 4 Chewbacca      2.28   112 male   Wookiee  <chr [5]>
 5 Roos Tarpals   2.24    82 male   Gungan   <chr [1]>
 6 Grievous       2.16   159 male   Kaleesh  <chr [1]>
 7 Taun We        2.13    NA female Kaminoan <chr [1]>
 8 Rugor Nass     2.06    NA male   Gungan   <chr [1]>
 9 Tion Medon     2.06    80 male   Pau'an   <chr [1]>
10 Darth Vader    2.02   136 male   Human    <chr [4]>
# ℹ 77 more rows
#everything() selects all variable.
starwars %>%
  mutate(height_m = height / 100) %>%
  select(height_m, height, everything())
# A tibble: 87 × 15
   height_m height name    mass hair_color skin_color eye_color birth_year sex  
      <dbl>  <int> <chr>  <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
 1     1.72    172 Luke …    77 blond      fair       blue            19   male 
 2     1.67    167 C-3PO     75 <NA>       gold       yellow         112   none 
 3     0.96     96 R2-D2     32 <NA>       white, bl… red             33   none 
 4     2.02    202 Darth…   136 none       white      yellow          41.9 male 
 5     1.5     150 Leia …    49 brown      light      brown           19   fema…
 6     1.78    178 Owen …   120 brown, gr… light      blue            52   male 
 7     1.65    165 Beru …    75 brown      light      blue            47   fema…
 8     0.97     97 R5-D4     32 <NA>       white, red red             NA   none 
 9     1.83    183 Biggs…    84 black      light      brown           24   male 
10     1.82    182 Obi-W…    77 auburn, w… fair       blue-gray       57   male 
# ℹ 77 more rows
# ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#   films <list>, vehicles <list>, starships <list>

2.3 rename()

#重命名变量,rename(新名=旧名)
starwars_selected %>% 
  rename(weight = mass)
# A tibble: 87 × 6
   name               height weight sex    species films    
   <chr>               <int>  <dbl> <chr>  <chr>   <list>   
 1 Luke Skywalker        172     77 male   Human   <chr [5]>
 2 C-3PO                 167     75 none   Droid   <chr [6]>
 3 R2-D2                  96     32 none   Droid   <chr [7]>
 4 Darth Vader           202    136 male   Human   <chr [4]>
 5 Leia Organa           150     49 female Human   <chr [5]>
 6 Owen Lars             178    120 male   Human   <chr [3]>
 7 Beru Whitesun lars    165     75 female Human   <chr [3]>
 8 R5-D4                  97     32 none   Droid   <chr [1]>
 9 Biggs Darklighter     183     84 male   Human   <chr [1]>
10 Obi-Wan Kenobi        182     77 male   Human   <chr [6]>
# ℹ 77 more rows
#在select 的时候重命名
data3 <- starwars %>% 
  select(name, height, weight = mass)
data3
# A tibble: 87 × 3
   name               height weight
   <chr>               <int>  <dbl>
 1 Luke Skywalker        172     77
 2 C-3PO                 167     75
 3 R2-D2                  96     32
 4 Darth Vader           202    136
 5 Leia Organa           150     49
 6 Owen Lars             178    120
 7 Beru Whitesun lars    165     75
 8 R5-D4                  97     32
 9 Biggs Darklighter     183     84
10 Obi-Wan Kenobi        182     77
# ℹ 77 more rows

2.4 transmute()

transmute()是select()和mutate()的结合。

Transmute is like a combination of select and mutate: you are getting back just a subset of the columns, but you are transforming and changing them at the same time.

data4 <- starwars %>% 
  filter(height != "NA") %>% 
  transmute(name, height, mass, bmi = mass/((height/100)^2))
data4
# A tibble: 81 × 4
   name               height  mass   bmi
   <chr>               <int> <dbl> <dbl>
 1 Luke Skywalker        172    77  26.0
 2 C-3PO                 167    75  26.9
 3 R2-D2                  96    32  34.7
 4 Darth Vader           202   136  33.3
 5 Leia Organa           150    49  21.8
 6 Owen Lars             178   120  37.9
 7 Beru Whitesun lars    165    75  27.5
 8 R5-D4                  97    32  34.0
 9 Biggs Darklighter     183    84  25.1
10 Obi-Wan Kenobi        182    77  23.2
# ℹ 71 more rows

2.5 relocate()

调整列的排列顺序。

starwars %>% relocate(sex:homeworld, .before = height)
# A tibble: 87 × 14
   name      sex   gender homeworld height  mass hair_color skin_color eye_color
   <chr>     <chr> <chr>  <chr>      <int> <dbl> <chr>      <chr>      <chr>    
 1 Luke Sky… male  mascu… Tatooine     172    77 blond      fair       blue     
 2 C-3PO     none  mascu… Tatooine     167    75 <NA>       gold       yellow   
 3 R2-D2     none  mascu… Naboo         96    32 <NA>       white, bl… red      
 4 Darth Va… male  mascu… Tatooine     202   136 none       white      yellow   
 5 Leia Org… fema… femin… Alderaan     150    49 brown      light      brown    
 6 Owen Lars male  mascu… Tatooine     178   120 brown, gr… light      blue     
 7 Beru Whi… fema… femin… Tatooine     165    75 brown      light      blue     
 8 R5-D4     none  mascu… Tatooine      97    32 <NA>       white, red red      
 9 Biggs Da… male  mascu… Tatooine     183    84 black      light      brown    
10 Obi-Wan … male  mascu… Stewjon      182    77 auburn, w… fair       blue-gray
# ℹ 77 more rows
# ℹ 5 more variables: birth_year <dbl>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
starwars
# A tibble: 87 × 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
 4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
 6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
 9 Biggs D…    183    84 black      light      brown           24   male  mascu…
10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# ℹ 77 more rows
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
starwars %>% 
  select(name, species, everything())
# A tibble: 87 × 14
   name    species height  mass hair_color skin_color eye_color birth_year sex  
   <chr>   <chr>    <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
 1 Luke S… Human      172    77 blond      fair       blue            19   male 
 2 C-3PO   Droid      167    75 <NA>       gold       yellow         112   none 
 3 R2-D2   Droid       96    32 <NA>       white, bl… red             33   none 
 4 Darth … Human      202   136 none       white      yellow          41.9 male 
 5 Leia O… Human      150    49 brown      light      brown           19   fema…
 6 Owen L… Human      178   120 brown, gr… light      blue            52   male 
 7 Beru W… Human      165    75 brown      light      blue            47   fema…
 8 R5-D4   Droid       97    32 <NA>       white, red red             NA   none 
 9 Biggs … Human      183    84 black      light      brown           24   male 
10 Obi-Wa… Human      182    77 auburn, w… fair       blue-gray       57   male 
# ℹ 77 more rows
# ℹ 5 more variables: gender <chr>, homeworld <chr>, films <list>,
#   vehicles <list>, starships <list>
starwars
# A tibble: 87 × 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
 4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
 6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
 9 Biggs D…    183    84 black      light      brown           24   male  mascu…
10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# ℹ 77 more rows
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>

2.6 if_else()

对变量值进行条件转换。

#加载包tidyr,drop_na()属于包tidyr
library(tidyr)
size_of_weight <- starwars_selected %>% 
  drop_na(mass) %>% 
  mutate(weight_size = if_else(mass >100, 
                               "large",
                               "small"))
size_of_weight
# A tibble: 59 × 7
   name               height  mass sex    species films     weight_size
   <chr>               <int> <dbl> <chr>  <chr>   <list>    <chr>      
 1 Luke Skywalker        172    77 male   Human   <chr [5]> small      
 2 C-3PO                 167    75 none   Droid   <chr [6]> small      
 3 R2-D2                  96    32 none   Droid   <chr [7]> small      
 4 Darth Vader           202   136 male   Human   <chr [4]> large      
 5 Leia Organa           150    49 female Human   <chr [5]> small      
 6 Owen Lars             178   120 male   Human   <chr [3]> large      
 7 Beru Whitesun lars    165    75 female Human   <chr [3]> small      
 8 R5-D4                  97    32 none   Droid   <chr [1]> small      
 9 Biggs Darklighter     183    84 male   Human   <chr [1]> small      
10 Obi-Wan Kenobi        182    77 male   Human   <chr [6]> small      
# ℹ 49 more rows

2.7 recode()

对变量值进行重新编码。

# overwrite the original variable.
size_of_weight %>% 
  mutate(weight_size = recode(weight_size,
                              "large" =1,
                              "small" =2))
# A tibble: 59 × 7
   name               height  mass sex    species films     weight_size
   <chr>               <int> <dbl> <chr>  <chr>   <list>          <dbl>
 1 Luke Skywalker        172    77 male   Human   <chr [5]>           2
 2 C-3PO                 167    75 none   Droid   <chr [6]>           2
 3 R2-D2                  96    32 none   Droid   <chr [7]>           2
 4 Darth Vader           202   136 male   Human   <chr [4]>           1
 5 Leia Organa           150    49 female Human   <chr [5]>           2
 6 Owen Lars             178   120 male   Human   <chr [3]>           1
 7 Beru Whitesun lars    165    75 female Human   <chr [3]>           2
 8 R5-D4                  97    32 none   Droid   <chr [1]>           2
 9 Biggs Darklighter     183    84 male   Human   <chr [1]>           2
10 Obi-Wan Kenobi        182    77 male   Human   <chr [6]>           2
# ℹ 49 more rows
size_of_weight
# A tibble: 59 × 7
   name               height  mass sex    species films     weight_size
   <chr>               <int> <dbl> <chr>  <chr>   <list>    <chr>      
 1 Luke Skywalker        172    77 male   Human   <chr [5]> small      
 2 C-3PO                 167    75 none   Droid   <chr [6]> small      
 3 R2-D2                  96    32 none   Droid   <chr [7]> small      
 4 Darth Vader           202   136 male   Human   <chr [4]> large      
 5 Leia Organa           150    49 female Human   <chr [5]> small      
 6 Owen Lars             178   120 male   Human   <chr [3]> large      
 7 Beru Whitesun lars    165    75 female Human   <chr [3]> small      
 8 R5-D4                  97    32 none   Droid   <chr [1]> small      
 9 Biggs Darklighter     183    84 male   Human   <chr [1]> small      
10 Obi-Wan Kenobi        182    77 male   Human   <chr [6]> small      
# ℹ 49 more rows

2.8 duplicated()

# Create a Data Frame
df6 <- data.frame(x1 = c(1:5),
                    x2 = letters[1:5],
                    x3 = seq(2,10,2),
                    x1 = LETTERS[22:26],
                    check.names = FALSE)
df6
  x1 x2 x3 x1
1  1  a  2  V
2  2  b  4  W
3  3  c  6  X
4  4  d  8  Y
5  5  e 10  Z
# Find Duplicate Column Names
duplicated_names <- duplicated(colnames(df6))

# Remove Duplicate Column Names
df6[!duplicated_names]
  x1 x2 x3
1  1  a  2
2  2  b  4
3  3  c  6
4  4  d  8
5  5  e 10

2.9 as.factor()

将字符串转换为因子。

starwars_selected %>% 
  mutate(vore = as.factor(species)) %>% 
  glimpse()
Rows: 87
Columns: 7
$ name    <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organ…
$ height  <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 228,…
$ mass    <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.0, …
$ sex     <chr> "male", "none", "none", "male", "female", "male", "female", "n…
$ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Human",…
$ films   <list> <"The Empire Strikes Back", "Revenge of the Sith", "Return of…
$ vore    <fct> Human, Droid, Droid, Human, Human, Human, Human, Droid, Human,…

3 行的管理Rows

3.1 arrange()

将个案按某个变量排序。

#升序
starwars_selected %>% arrange(height)
# A tibble: 87 × 6
   name                  height  mass sex   species        films    
   <chr>                  <int> <dbl> <chr> <chr>          <list>   
 1 Yoda                      66    17 male  Yoda's species <chr [5]>
 2 Ratts Tyerell             79    15 male  Aleena         <chr [1]>
 3 Wicket Systri Warrick     88    20 male  Ewok           <chr [1]>
 4 Dud Bolt                  94    45 male  Vulptereen     <chr [1]>
 5 R2-D2                     96    32 none  Droid          <chr [7]>
 6 R4-P17                    96    NA none  Droid          <chr [2]>
 7 R5-D4                     97    32 none  Droid          <chr [1]>
 8 Sebulba                  112    40 male  Dug            <chr [1]>
 9 Gasgano                  122    NA male  Xexto          <chr [1]>
10 Watto                    137    NA male  Toydarian      <chr [2]>
# ℹ 77 more rows
#降序 desc()
starwars_selected %>% 
  arrange(desc(height))
# A tibble: 87 × 6
   name         height  mass sex    species  films    
   <chr>         <int> <dbl> <chr>  <chr>    <list>   
 1 Yarael Poof     264    NA male   Quermian <chr [1]>
 2 Tarfful         234   136 male   Wookiee  <chr [1]>
 3 Lama Su         229    88 male   Kaminoan <chr [1]>
 4 Chewbacca       228   112 male   Wookiee  <chr [5]>
 5 Roos Tarpals    224    82 male   Gungan   <chr [1]>
 6 Grievous        216   159 male   Kaleesh  <chr [1]>
 7 Taun We         213    NA female Kaminoan <chr [1]>
 8 Rugor Nass      206    NA male   Gungan   <chr [1]>
 9 Tion Medon      206    80 male   Pau'an   <chr [1]>
10 Darth Vader     202   136 male   Human    <chr [4]>
# ℹ 77 more rows
# 查看前6行
head(starwars_selected)
# A tibble: 6 × 6
  name           height  mass sex    species films    
  <chr>           <int> <dbl> <chr>  <chr>   <list>   
1 Luke Skywalker    172    77 male   Human   <chr [5]>
2 C-3PO             167    75 none   Droid   <chr [6]>
3 R2-D2              96    32 none   Droid   <chr [7]>
4 Darth Vader       202   136 male   Human   <chr [4]>
5 Leia Organa       150    49 female Human   <chr [5]>
6 Owen Lars         178   120 male   Human   <chr [3]>
# see a few values from all the columns
glimpse(starwars_selected)
Rows: 87
Columns: 6
$ name    <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organ…
$ height  <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 228,…
$ mass    <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.0, …
$ sex     <chr> "male", "none", "none", "male", "female", "male", "female", "n…
$ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Human",…
$ films   <list> <"The Empire Strikes Back", "Revenge of the Sith", "Return of…

3.2 filter()

提取满足条件的个案。

#列出变量的取值种类
unique(starwars$species)
 [1] "Human"          "Droid"          "Wookiee"        "Rodian"        
 [5] "Hutt"           "Yoda's species" "Trandoshan"     "Mon Calamari"  
 [9] "Ewok"           "Sullustan"      "Neimodian"      "Gungan"        
[13] NA               "Toydarian"      "Dug"            "Zabrak"        
[17] "Twi'lek"        "Vulptereen"     "Xexto"          "Toong"         
[21] "Cerean"         "Nautolan"       "Tholothian"     "Iktotchi"      
[25] "Quermian"       "Kel Dor"        "Chagrian"       "Geonosian"     
[29] "Mirialan"       "Clawdite"       "Besalisk"       "Kaminoan"      
[33] "Aleena"         "Skakoan"        "Muun"           "Togruta"       
[37] "Kaleesh"        "Pau'an"        
#提取species == "Droid"的个案
starwars_selected %>% 
  filter(species == "Droid") %>% 
  arrange(height) 
# A tibble: 6 × 6
  name   height  mass sex   species films    
  <chr>   <int> <dbl> <chr> <chr>   <list>   
1 R2-D2      96    32 none  Droid   <chr [7]>
2 R4-P17     96    NA none  Droid   <chr [2]>
3 R5-D4      97    32 none  Droid   <chr [1]>
4 C-3PO     167    75 none  Droid   <chr [6]>
5 IG-88     200   140 none  Droid   <chr [1]>
6 BB8        NA    NA none  Droid   <chr [1]>
#提取height < 200的个案
starwars_selected %>% 
  arrange(desc(height)) %>% 
  filter(height < 200)
# A tibble: 70 × 6
   name                height  mass sex   species   films    
   <chr>                <int> <dbl> <chr> <chr>     <list>   
 1 Ki-Adi-Mundi           198    82 male  Cerean    <chr [3]>
 2 Dexter Jettster        198   102 male  Besalisk  <chr [1]>
 3 Jar Jar Binks          196    66 male  Gungan    <chr [2]>
 4 Kit Fisto              196    87 male  Nautolan  <chr [3]>
 5 Mas Amedda             196    NA male  Chagrian  <chr [2]>
 6 Qui-Gon Jinn           193    89 male  Human     <chr [1]>
 7 Dooku                  193    80 male  Human     <chr [2]>
 8 Wat Tambor             193    48 male  Skakoan   <chr [1]>
 9 Nute Gunray            191    90 male  Neimodian <chr [3]>
10 Bail Prestor Organa    191    NA male  Human     <chr [2]>
# ℹ 60 more rows
#用逗号分隔多个提取条件
starwars_selected %>% 
    filter(height < 200, species == "Human")  %>% 
    arrange(desc(height))
# A tibble: 30 × 6
   name                height  mass sex   species films    
   <chr>                <int> <dbl> <chr> <chr>   <list>   
 1 Qui-Gon Jinn           193  89   male  Human   <chr [1]>
 2 Dooku                  193  80   male  Human   <chr [2]>
 3 Bail Prestor Organa    191  NA   male  Human   <chr [2]>
 4 Anakin Skywalker       188  84   male  Human   <chr [3]>
 5 Mace Windu             188  84   male  Human   <chr [3]>
 6 Raymus Antilles        188  79   male  Human   <chr [2]>
 7 Gregar Typho           185  85   male  Human   <chr [1]>
 8 Biggs Darklighter      183  84   male  Human   <chr [1]>
 9 Boba Fett              183  78.2 male  Human   <chr [3]>
10 Cliegg Lars            183  NA   male  Human   <chr [1]>
# ℹ 20 more rows
starwars_selected
# A tibble: 87 × 6
   name               height  mass sex    species films    
   <chr>               <int> <dbl> <chr>  <chr>   <list>   
 1 Luke Skywalker        172    77 male   Human   <chr [5]>
 2 C-3PO                 167    75 none   Droid   <chr [6]>
 3 R2-D2                  96    32 none   Droid   <chr [7]>
 4 Darth Vader           202   136 male   Human   <chr [4]>
 5 Leia Organa           150    49 female Human   <chr [5]>
 6 Owen Lars             178   120 male   Human   <chr [3]>
 7 Beru Whitesun lars    165    75 female Human   <chr [3]>
 8 R5-D4                  97    32 none   Droid   <chr [1]>
 9 Biggs Darklighter     183    84 male   Human   <chr [1]>
10 Obi-Wan Kenobi        182    77 male   Human   <chr [6]>
# ℹ 77 more rows
starwars_selected %>% 
    filter((species == "Human"|species == "Droid") & 
         height < 200 ) %>% 
    arrange(desc(height))
# A tibble: 34 × 6
   name                height  mass sex   species films    
   <chr>                <int> <dbl> <chr> <chr>   <list>   
 1 Qui-Gon Jinn           193  89   male  Human   <chr [1]>
 2 Dooku                  193  80   male  Human   <chr [2]>
 3 Bail Prestor Organa    191  NA   male  Human   <chr [2]>
 4 Anakin Skywalker       188  84   male  Human   <chr [3]>
 5 Mace Windu             188  84   male  Human   <chr [3]>
 6 Raymus Antilles        188  79   male  Human   <chr [2]>
 7 Gregar Typho           185  85   male  Human   <chr [1]>
 8 Biggs Darklighter      183  84   male  Human   <chr [1]>
 9 Boba Fett              183  78.2 male  Human   <chr [3]>
10 Cliegg Lars            183  NA   male  Human   <chr [1]>
# ℹ 24 more rows
starwars_selected
# A tibble: 87 × 6
   name               height  mass sex    species films    
   <chr>               <int> <dbl> <chr>  <chr>   <list>   
 1 Luke Skywalker        172    77 male   Human   <chr [5]>
 2 C-3PO                 167    75 none   Droid   <chr [6]>
 3 R2-D2                  96    32 none   Droid   <chr [7]>
 4 Darth Vader           202   136 male   Human   <chr [4]>
 5 Leia Organa           150    49 female Human   <chr [5]>
 6 Owen Lars             178   120 male   Human   <chr [3]>
 7 Beru Whitesun lars    165    75 female Human   <chr [3]>
 8 R5-D4                  97    32 none   Droid   <chr [1]>
 9 Biggs Darklighter     183    84 male   Human   <chr [1]>
10 Obi-Wan Kenobi        182    77 male   Human   <chr [6]>
# ℹ 77 more rows
#%in% 属于
starwars_selected %>% 
    filter((species %in% c("Human","Droid")) & 
         height < 200 ) %>% 
    arrange(desc(height)) 
# A tibble: 34 × 6
   name                height  mass sex   species films    
   <chr>                <int> <dbl> <chr> <chr>   <list>   
 1 Qui-Gon Jinn           193  89   male  Human   <chr [1]>
 2 Dooku                  193  80   male  Human   <chr [2]>
 3 Bail Prestor Organa    191  NA   male  Human   <chr [2]>
 4 Anakin Skywalker       188  84   male  Human   <chr [3]>
 5 Mace Windu             188  84   male  Human   <chr [3]>
 6 Raymus Antilles        188  79   male  Human   <chr [2]>
 7 Gregar Typho           185  85   male  Human   <chr [1]>
 8 Biggs Darklighter      183  84   male  Human   <chr [1]>
 9 Boba Fett              183  78.2 male  Human   <chr [3]>
10 Cliegg Lars            183  NA   male  Human   <chr [1]>
# ℹ 24 more rows
starwars_selected
# A tibble: 87 × 6
   name               height  mass sex    species films    
   <chr>               <int> <dbl> <chr>  <chr>   <list>   
 1 Luke Skywalker        172    77 male   Human   <chr [5]>
 2 C-3PO                 167    75 none   Droid   <chr [6]>
 3 R2-D2                  96    32 none   Droid   <chr [7]>
 4 Darth Vader           202   136 male   Human   <chr [4]>
 5 Leia Organa           150    49 female Human   <chr [5]>
 6 Owen Lars             178   120 male   Human   <chr [3]>
 7 Beru Whitesun lars    165    75 female Human   <chr [3]>
 8 R5-D4                  97    32 none   Droid   <chr [1]>
 9 Biggs Darklighter     183    84 male   Human   <chr [1]>
10 Obi-Wan Kenobi        182    77 male   Human   <chr [6]>
# ℹ 77 more rows
#filter(!is.na(height))保留height没有缺失值的个案
starwars_selected %>% 
  arrange(desc(height)) %>% 
  filter(!is.na(height))
# A tibble: 81 × 6
   name         height  mass sex    species  films    
   <chr>         <int> <dbl> <chr>  <chr>    <list>   
 1 Yarael Poof     264    NA male   Quermian <chr [1]>
 2 Tarfful         234   136 male   Wookiee  <chr [1]>
 3 Lama Su         229    88 male   Kaminoan <chr [1]>
 4 Chewbacca       228   112 male   Wookiee  <chr [5]>
 5 Roos Tarpals    224    82 male   Gungan   <chr [1]>
 6 Grievous        216   159 male   Kaleesh  <chr [1]>
 7 Taun We         213    NA female Kaminoan <chr [1]>
 8 Rugor Nass      206    NA male   Gungan   <chr [1]>
 9 Tion Medon      206    80 male   Pau'an   <chr [1]>
10 Darth Vader     202   136 male   Human    <chr [4]>
# ℹ 71 more rows

3.3 slice()

slice() lets you index rows by their (integer) locations. It allows you to select, remove, and duplicate rows.

slice_min() and slice_max() select rows with highest or lowest values of a variable.

#提取第5到第10行
starwars %>% slice(5:10)
# A tibble: 6 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Leia Org…    150    49 brown      light      brown             19 fema… femin…
2 Owen Lars    178   120 brown, gr… light      blue              52 male  mascu…
3 Beru Whi…    165    75 brown      light      blue              47 fema… femin…
4 R5-D4         97    32 <NA>       white, red red               NA none  mascu…
5 Biggs Da…    183    84 black      light      brown             24 male  mascu…
6 Obi-Wan …    182    77 auburn, w… fair       blue-gray         57 male  mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
#提取前3行
starwars %>% slice_head(n = 3)
# A tibble: 3 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Luke Sky…    172    77 blond      fair       blue              19 male  mascu…
2 C-3PO        167    75 <NA>       gold       yellow           112 none  mascu…
3 R2-D2         96    32 <NA>       white, bl… red               33 none  mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
#提取后3行
starwars %>% slice_tail(n = 3)
# A tibble: 3 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 BB8           NA    NA none       none       black             NA none  mascu…
2 Captain …     NA    NA unknown    unknown    unknown           NA <NA>  <NA>  
3 Padmé Am…    165    45 brown      light      brown             46 fema… femin…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
#随机抽样3行
starwars %>% slice_sample(n = 5)
# A tibble: 5 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Cordé        157    NA brown      light      brown             NA fema… femin…
2 Dud Bolt      94    45 none       blue, grey yellow            NA male  mascu…
3 Dormé        165    NA brown      light      brown             NA fema… femin…
4 Ayla Sec…    178    55 none       blue       hazel             48 fema… femin…
5 Captain …     NA    NA unknown    unknown    unknown           NA <NA>  <NA>  
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
#随机抽样比例0.1
starwars %>% slice_sample(prop = 0.1)
# A tibble: 8 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Tion Med…    206    80 none       grey       black             NA male  mascu…
2 Finis Va…    170    NA blond      fair       blue              91 male  mascu…
3 Nute Gun…    191    90 none       mottled g… red               NA male  mascu…
4 San Hill     191    NA none       grey       gold              NA male  mascu…
5 Yoda          66    17 white      green      brown            896 male  mascu…
6 Gregar T…    185    85 black      dark       brown             NA male  mascu…
7 Finn          NA    NA black      dark       dark              NA male  mascu…
8 Dooku        193    80 white      fair       brown            102 male  mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
#切取某个变量最大或最小的几个个案
starwars %>%
  filter(!is.na(height)) %>%
  slice_max(height, n = 3)
# A tibble: 3 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Yarael P…    264    NA none       white      yellow            NA male  mascu…
2 Tarfful      234   136 brown      brown      blue              NA male  mascu…
3 Lama Su      229    88 none       grey       black             NA male  mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>

3.4 distinct()

剔除重复的行。

## 剔除所有列的值都相同的行
df4 <- data.frame(ID =c(1,2,3,4,2),
                  name = c("John","Bill","Jim","Kate", "Bill"))
df4
  ID name
1  1 John
2  2 Bill
3  3  Jim
4  4 Kate
5  2 Bill
df4 <- distinct(df4)
df4
  ID name
1  1 John
2  2 Bill
3  3  Jim
4  4 Kate
## 剔除某一列的值相同的行
df5 <- data.frame(ID =c(1,2,3,4,2),
                  name = c("John","Bill","Jim","Kate", "Alex"))
df5
  ID name
1  1 John
2  2 Bill
3  3  Jim
4  4 Kate
5  2 Alex
df5 <- distinct(df5, ID,.keep_all = TRUE)
df5
  ID name
1  1 John
2  2 Bill
3  3  Jim
4  4 Kate
unique(df4)
  ID name
1  1 John
2  2 Bill
3  3  Jim
4  4 Kate

4 分组和概要 groupby and summarize

4.1 count()

The simple way you can aggregate data is to count it: to find out the number of observations.

starwars %>% 
  count()
# A tibble: 1 × 1
      n
  <int>
1    87
#按性别计数
starwars %>% 
  count(sex)
# A tibble: 5 × 2
  sex                n
  <chr>          <int>
1 female            16
2 hermaphroditic     1
3 male              60
4 none               6
5 <NA>               4
#计数和排序
starwars %>% 
  count(sex, sort = TRUE)
# A tibble: 5 × 2
  sex                n
  <chr>          <int>
1 male              60
2 female            16
3 none               6
4 <NA>               4
5 hermaphroditic     1
#按性别分组,各组个案的体重之和
starwars %>% 
  count(sex, wt = mass, sort = TRUE)
# A tibble: 5 × 2
  sex                n
  <chr>          <dbl>
1 male           3564.
2 hermaphroditic 1358 
3 female          492.
4 none            279 
5 <NA>             48 

4.2 group_by()

对个案分组。

4.3 summarize()

summarise() and summarize() are synonyms.

summary functions

  • sum()
  • mean()
  • median()
  • min()
  • max()
  • n()
starwars_selected %>% 
  filter(height != "NA", mass != "NA", sex != "NA") %>% 
  group_by(sex) %>% 
  summarize(number = n(), median_height = median(height), 
            median_weight = median(mass)) %>% 
  arrange(desc(number))
# A tibble: 4 × 4
  sex            number median_height median_weight
  <chr>           <int>         <dbl>         <dbl>
1 male               44           183          80  
2 female              9           168          55  
3 none                4           132          53.5
4 hermaphroditic      1           175        1358  

4.4 top_n()

starwars_selected %>% 
  filter(height != "NA", mass != "NA", sex != "NA") %>% 
  group_by(sex) %>% 
  top_n(1,height)
# A tibble: 4 × 6
# Groups:   sex [4]
  name                  height  mass sex            species    films    
  <chr>                  <int> <dbl> <chr>          <chr>      <list>   
1 Jabba Desilijic Tiure    175  1358 hermaphroditic Hutt       <chr [3]>
2 IG-88                    200   140 none           Droid      <chr [1]>
3 Adi Gallia               184    50 female         Tholothian <chr [2]>
4 Tarfful                  234   136 male           Wookiee    <chr [1]>
starwars_selected %>% 
  filter(height != "NA", mass != "NA", sex != "NA") %>% 
  group_by(sex) %>% 
  top_n(2,mass) %>% 
  arrange(sex)
# A tibble: 8 × 6
# Groups:   sex [4]
  name                  height  mass sex            species films    
  <chr>                  <int> <dbl> <chr>          <chr>   <list>   
1 Beru Whitesun lars       165    75 female         Human   <chr [3]>
2 Shaak Ti                 178    57 female         Togruta <chr [2]>
3 Jabba Desilijic Tiure    175  1358 hermaphroditic Hutt    <chr [3]>
4 Darth Vader              202   136 male           Human   <chr [4]>
5 Grievous                 216   159 male           Kaleesh <chr [1]>
6 Tarfful                  234   136 male           Wookiee <chr [1]>
7 C-3PO                    167    75 none           Droid   <chr [6]>
8 IG-88                    200   140 none           Droid   <chr [1]>

4.5 缺失值 Missing data

#有missing data, 无法计算均值
mean(starwars$height)
[1] NA
mean(starwars$height,na.rm = TRUE)
[1] 174.358
starwars %>% 
  select(name, gender, hair_color, height) %>% 
  na.omit()
# A tibble: 73 × 4
   name               gender    hair_color    height
   <chr>              <chr>     <chr>          <int>
 1 Luke Skywalker     masculine blond            172
 2 Darth Vader        masculine none             202
 3 Leia Organa        feminine  brown            150
 4 Owen Lars          masculine brown, grey      178
 5 Beru Whitesun lars feminine  brown            165
 6 Biggs Darklighter  masculine black            183
 7 Obi-Wan Kenobi     masculine auburn, white    182
 8 Anakin Skywalker   masculine blond            188
 9 Wilhuff Tarkin     masculine auburn, grey     180
10 Chewbacca          masculine brown            228
# ℹ 63 more rows
starwars
# A tibble: 87 × 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
 4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
 6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
 9 Biggs D…    183    84 black      light      brown           24   male  mascu…
10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# ℹ 77 more rows
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
# 查看有缺失值的个案
starwars %>% 
  select(name, gender, hair_color, height) %>% 
  filter(!complete.cases(.)) 
# A tibble: 14 × 4
   name                  gender    hair_color height
   <chr>                 <chr>     <chr>       <int>
 1 C-3PO                 masculine <NA>          167
 2 R2-D2                 masculine <NA>           96
 3 R5-D4                 masculine <NA>           97
 4 Greedo                masculine <NA>          173
 5 Jabba Desilijic Tiure masculine <NA>          175
 6 Arvel Crynyd          masculine brown          NA
 7 Ric Olié              <NA>      brown         183
 8 Quarsh Panaka         <NA>      black         183
 9 Sly Moore             <NA>      none          178
10 Finn                  masculine black          NA
11 Rey                   feminine  brown          NA
12 Poe Dameron           masculine brown          NA
13 BB8                   masculine none           NA
14 Captain Phasma        <NA>      unknown        NA
starwars
# A tibble: 87 × 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
 4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
 6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
 9 Biggs D…    183    84 black      light      brown           24   male  mascu…
10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# ℹ 77 more rows
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
starwars %>% 
  select(name, gender, hair_color, height) %>% 
  filter(!complete.cases(.)) %>% 
  drop_na(height)
# A tibble: 8 × 4
  name                  gender    hair_color height
  <chr>                 <chr>     <chr>       <int>
1 C-3PO                 masculine <NA>          167
2 R2-D2                 masculine <NA>           96
3 R5-D4                 masculine <NA>           97
4 Greedo                masculine <NA>          173
5 Jabba Desilijic Tiure masculine <NA>          175
6 Ric Olié              <NA>      brown         183
7 Quarsh Panaka         <NA>      black         183
8 Sly Moore             <NA>      none          178
starwars
# A tibble: 87 × 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
 4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
 6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
 9 Biggs D…    183    84 black      light      brown           24   male  mascu…
10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# ℹ 77 more rows
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
starwars %>% 
  select(name, gender, hair_color, height) %>% 
  filter(complete.cases(.)) %>% 
  mutate(hair_color2 = replace_na(hair_color, "none"))
# A tibble: 73 × 5
   name               gender    hair_color    height hair_color2  
   <chr>              <chr>     <chr>          <int> <chr>        
 1 Luke Skywalker     masculine blond            172 blond        
 2 Darth Vader        masculine none             202 none         
 3 Leia Organa        feminine  brown            150 brown        
 4 Owen Lars          masculine brown, grey      178 brown, grey  
 5 Beru Whitesun lars feminine  brown            165 brown        
 6 Biggs Darklighter  masculine black            183 black        
 7 Obi-Wan Kenobi     masculine auburn, white    182 auburn, white
 8 Anakin Skywalker   masculine blond            188 blond        
 9 Wilhuff Tarkin     masculine auburn, grey     180 auburn, grey 
10 Chewbacca          masculine brown            228 brown        
# ℹ 63 more rows
starwars         
# A tibble: 87 × 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
 4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
 6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
 9 Biggs D…    183    84 black      light      brown           24   male  mascu…
10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# ℹ 77 more rows
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>

5 总结

These five functions provide the basis of a language of data manipulation.

5.1 本章作业

数据集:mpg {ggplot2}

  1. 用select()函数从mpg提取5个变量⽣成新的数据框。

  2. 用mutate()函数在数据框中追加新的变量,将油耗变量cty和hwy(miles per gallon)转换成转公里/升(kilometers per liter)的油耗指标。

  3. 任选mpg中的某个变量,用if_else()函数对该变量的数值进⾏条件转换。

  4. 设置三个筛选条件,用filter()函数从mpg筛选个案⽣成新的数据框。

  5. 提交要求:R script