Introduction

This part of the exercise explains the basics of ggplot2. See the first part for how the data was prepared; here we start directly from the resulting data set.

travel_weather %>%
    as_tibble()  # tbl_df() is deprecated in current dplyr; as_tibble() is its replacement
## # A tibble: 731 x 7
##     year month   day Amsterdam London   NYC Venice
##  * <dbl> <dbl> <dbl>     <dbl>  <dbl> <dbl>  <dbl>
##  1  2015    11     1         8      8    16     13
##  2  2015    11     2        10     11    15     10
##  3  2015    11     3         9     11    16      9
##  4  2015    11     4        12     11    17     10
##  5  2015    11     5        13     13    18     12
##  6  2015    11     6        16     14    21     13
##  7  2015    11     7        16     14    17     14
##  8  2015    11     8        12     12    11     13
##  9  2015    11     9        13     12    11     11
## 10  2015    11    10        14     14    12     11
## # ... with 721 more rows

You are used to the pipe operator (%>%) from dplyr. dplyr chains a sequence of operations, each one passing its result to the next. ggplot2 works differently: you build a plot by stacking layers on top of each other, starting with an empty canvas. That is why ggplot2 uses the + operator, not %>%, to connect its functions.
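The sketch below illustrates the layering idea. It assumes a hypothetical `sample_weather` data frame containing the first ten Venice and London values printed above, not the full data set. It also relies on the ggplot object exposing its layers as `$layers`, which holds in current ggplot2 releases. A ggplot object can be stored in a variable, and each `+` returns a new plot object with one more layer added.

```r
library(ggplot2)

# Hypothetical sample: the first ten Venice/London values from the output above
sample_weather <- data.frame(
  Venice = c(13, 10, 9, 10, 12, 13, 14, 13, 11, 11),
  London = c(8, 11, 11, 11, 13, 14, 14, 12, 12, 14)
)

# The canvas: data and aesthetic mappings, but no layers yet
canvas <- ggplot(sample_weather, aes(x = Venice, y = London))
n_layers_canvas <- length(canvas$layers)  # 0

# Adding a geom with + returns a new ggplot object with one layer
with_points <- canvas + geom_point()
n_layers_points <- length(with_points$layers)  # 1

# The original canvas is unchanged; + does not modify it in place
n_layers_canvas_after <- length(canvas$layers)  # still 0
```

Because `canvas` is left untouched, you can reuse one base canvas for several different charts.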

Scatterplot

The scatterplot is the first chart we are going to learn, and we use it to introduce the basic anatomy of a ggplot2 call. We start with the canvas function ggplot, which takes the data. Inside it, aes (short for aesthetics) maps columns of the data to visual properties: the x and y axes, plus optional grouping properties such as color, fill, alpha, shape and size. Once the data and aesthetics are set, we add a layer that declares the type of plot to draw. For a scatterplot this is geom_point.

ggplot(data = travel_weather, aes(x = Venice, y = London)) + 
    geom_point()
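A column placed inside aes() is *mapped*, so the property varies with the data. A value given directly to the geom is *set*, so it is the same for every point. The sketch below shows the difference. It uses a hypothetical five-row `sample_weather` built from the first rows printed above and inspects the colours ggplot2 computed with `layer_data()`. The choice of NYC for colour and "steelblue" for the fixed colour is only for illustration.

```r
library(ggplot2)

# Hypothetical sample: first five rows of travel_weather from the output above
sample_weather <- data.frame(
  Venice = c(13, 10, 9, 10, 12),
  London = c(8, 11, 11, 11, 13),
  NYC    = c(16, 15, 16, 17, 18)
)

# Mapping: inside aes(), colour follows the values of the NYC column
mapped <- ggplot(sample_weather, aes(x = Venice, y = London, color = NYC)) +
  geom_point()

# Setting: outside aes(), one fixed colour for every point
fixed <- ggplot(sample_weather, aes(x = Venice, y = London)) +
  geom_point(color = "steelblue")

# layer_data() returns the per-point data ggplot2 computed for a layer
mapped_cols <- layer_data(mapped)$colour  # one colour per distinct NYC value
fixed_cols  <- layer_data(fixed)$colour   # "steelblue" repeated five times
```

NYC takes four distinct values in these five rows, so the mapped plot uses four different colours. The fixed plot uses a single colour.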

ggplot2 is also flexible about where each element is placed, but use that flexibility responsibly so your code stays readable. Below is the same chart as above, written differently: the data is piped in with dplyr's %>%, and the aesthetics are declared inside geom_point instead of ggplot.

travel_weather %>% ggplot() + geom_point(aes(x = Venice, y = London))
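To check that these placements really produce the same chart, the sketch below builds three variants and compares the x and y positions ggplot2 computes for each. It assumes a hypothetical `sample_weather` holding the first ten Venice and London values printed above. Note that `%>%` binds more tightly than `+` in R, so `sample_weather %>% ggplot()` is evaluated first and the layer is then added to its result.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample: the first ten Venice/London values from the output above
sample_weather <- data.frame(
  Venice = c(13, 10, 9, 10, 12, 13, 14, 13, 11, 11),
  London = c(8, 11, 11, 11, 13, 14, 14, 12, 12, 14)
)

# 1. Data and aesthetics declared once, on the canvas
p1 <- ggplot(data = sample_weather, aes(x = Venice, y = London)) + geom_point()

# 2. Data piped in with %>%, aesthetics declared on the layer
p2 <- sample_weather %>% ggplot() + geom_point(aes(x = Venice, y = London))

# 3. Empty canvas; data and aesthetics both given to the layer
p3 <- ggplot() + geom_point(data = sample_weather, aes(x = Venice, y = London))

# Compare the computed point positions of the first (only) layer
positions <- function(p) layer_data(p)[, c("x", "y")]
same_12 <- isTRUE(all.equal(positions(p1), positions(p2)))
same_13 <- isTRUE(all.equal(positions(p1), positions(p3)))
```

The variants differ in scope. Aesthetics set in ggplot() are inherited by every later layer, while aesthetics set in a geom apply only to that layer. With a single layer the result is identical, but once you add more layers the choice matters.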

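Let's add some color to this chart. We want to highlight the days when both Venice and London are at least as warm as Amsterdam. pmin takes the colder of the two cities on each day, and comparing it with >= against Amsterdam gives TRUE exactly when neither city is colder than Amsterdam (ties count as TRUE). First we add this flag as a new column: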

travel_weather %>%
    mutate(VL_warmer_A = pmin(Venice, London) >= Amsterdam)
## # A tibble: 731 x 8
##     year month   day Amsterdam London   NYC Venice VL_warmer_A
##    <dbl> <dbl> <dbl>     <dbl>  <dbl> <dbl>  <dbl> <lgl>      
##  1  2015    11     1         8      8    16     13 TRUE       
##  2  2015    11     2        10     11    15     10 TRUE       
##  3  2015    11     3         9     11    16      9 TRUE       
##  4  2015    11     4        12     11    17     10 FALSE      
##  5  2015    11     5        13     13    18     12 FALSE      
##  6  2015    11     6        16     14    21     13 FALSE      
##  7  2015    11     7        16     14    17     14 FALSE      
##  8  2015    11     8        12     12    11     13 TRUE       
##  9  2015    11     9        13     12    11     11 FALSE      
## 10  2015    11    10        14     14    12     11 FALSE      
## # ... with 721 more rows
travel_weather %>%
    mutate(VL_warmer_A = pmin(Venice, London) >= Amsterdam) %>%
    ggplot() +
    geom_point(aes(x = Venice, y = London, color = VL_warmer_A))