Selected Solutions to Exercises in R for Data Science

R for Data Science (R4DS) is an excellent book about doing data science with R. Here are some solutions I came up with while reading the book.

library(tidyverse)
library(nycflights13)

3.6.1.6 “Recreate the R code necessary to generate the following graphs”: the last graph

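ggplot(data = mpg, aes(x = displ, y = hwy)) +
  # Larger white points are drawn first and act as a halo
  geom_point(color = "white", size = 4) +
  # Smaller points, colored by drive type, are drawn on top
  geom_point(aes(color = drv))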

5.3.1.1 “How could you use arrange() to sort all missing values to the start? (Hint: use is.na())”:

df <- tibble(x = c(5, 2, NA))
arrange(df, !is.na(x), x)
## # A tibble: 3 x 1
##       x
##   <dbl>
## 1    NA
## 2     2
## 3     5
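
The same ordering can be spelled with desc() instead of negation. This is only a sketch: df2 is made-up example data, not from the book. It relies on is.na(x) being TRUE for missing values and on desc() sorting TRUE ahead of FALSE.

```r
library(dplyr)

# Hypothetical example data with more than one missing value
df2 <- tibble(x = c(5, NA, 2, NA, 9))

# desc(is.na(x)) puts the rows where x is missing first;
# the second key, x, sorts the remaining rows in ascending order
sorted <- arrange(df2, desc(is.na(x)), x)
sorted$x
```

Whether !is.na(x) or desc(is.na(x)) reads better is a matter of taste; both use a logical sort key ahead of x itself.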

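5.6.7.2 Reproduce the output of not_cancelled %>% count(tailnum, wt = distance) without using count(): group by tailnum and sum distance.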

not_cancelled <- flights %>% 
  filter(!is.na(dep_delay), !is.na(arr_delay))
not_cancelled %>% count(tailnum, wt = distance)
## # A tibble: 4,037 x 2
##    tailnum      n
##      <chr>  <dbl>
##  1  D942DN   3418
##  2  N0EGMQ 239143
##  3  N10156 109664
##  4  N102UW  25722
##  5  N103US  24619
##  6  N104UW  24616
##  7  N10575 139903
##  8  N105UW  23618
##  9  N107US  21677
## 10  N108UW  32070
## # ... with 4,027 more rows
not_cancelled %>% group_by(tailnum) %>% summarize(n = sum(distance))
## # A tibble: 4,037 x 2
##    tailnum      n
##      <chr>  <dbl>
##  1  D942DN   3418
##  2  N0EGMQ 239143
##  3  N10156 109664
##  4  N102UW  25722
##  5  N103US  24619
##  6  N104UW  24616
##  7  N10575 139903
##  8  N105UW  23618
##  9  N107US  21677
## 10  N108UW  32070
## # ... with 4,027 more rows
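
Rather than comparing the two printouts by eye, the results can be compared programmatically. The sketch below assumes the same not_cancelled definition as above; the names with_count, with_summarize, dest_count, and dest_summarize are my own. It also applies the same pattern to the unweighted count(dest), where n() takes the place of sum(distance).

```r
library(dplyr)
library(nycflights13)

not_cancelled <- flights %>%
  filter(!is.na(dep_delay), !is.na(arr_delay))

# Total distance per tail number, computed two ways
with_count <- not_cancelled %>% count(tailnum, wt = distance)
with_summarize <- not_cancelled %>%
  group_by(tailnum) %>%
  summarize(n = sum(distance))

# Compare as plain data frames to sidestep any tibble-specific attributes
isTRUE(all.equal(as.data.frame(with_count), as.data.frame(with_summarize)))

# Unweighted case: count(dest) counts rows per destination
dest_count <- not_cancelled %>% count(dest)
dest_summarize <- not_cancelled %>%
  group_by(dest) %>%
  summarize(n = n())

isTRUE(all.equal(as.data.frame(dest_count), as.data.frame(dest_summarize)))
```

Both comparisons should come back TRUE if the grouped summaries really do reproduce count().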