Introduction

This part of the exercise explains the basics of ggplot2. See the first part for preparations. We start directly from the data set.

travel_weather %>%
    as_tibble()
## # A tibble: 731 x 7
##     year month   day Amsterdam London   NYC Venice
##  * <dbl> <dbl> <dbl>     <dbl>  <dbl> <dbl>  <dbl>
##  1  2015    11     1         8      8    16     13
##  2  2015    11     2        10     11    15     10
##  3  2015    11     3         9     11    16      9
##  4  2015    11     4        12     11    17     10
##  5  2015    11     5        13     13    18     12
##  6  2015    11     6        16     14    21     13
##  7  2015    11     7        16     14    17     14
##  8  2015    11     8        12     12    11     13
##  9  2015    11     9        13     12    11     11
## 10  2015    11    10        14     14    12     11
## # ... with 721 more rows

You are already used to the pipe operator (%>%) from dplyr. While dplyr chains a sequence of data operations, ggplot2 works differently: you start with a canvas and put layers on top of each other. Therefore we use the + operator, not %>%, to connect ggplot2 functions.
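To make the layering idea concrete, here is a minimal sketch that builds a plot step by step and counts the layers after each +. It uses a small hand-made data frame, sample_weather (a hypothetical name; the values are the first five rows printed above), and it inspects the plot object's layers element, which is an internal detail of ggplot2 objects rather than a documented interface.

```r
library(ggplot2)

# Hypothetical stand-in for travel_weather: first five rows printed above
sample_weather <- data.frame(
  Amsterdam = c(8, 10, 9, 12, 13),
  London    = c(8, 11, 11, 11, 13),
  Venice    = c(13, 10, 9, 10, 12)
)

# The canvas: data and aesthetic mappings, but no layers yet
canvas <- ggplot(sample_weather, aes(x = Venice, y = London))

# Each `+` returns a new plot object with one more layer stacked on top
with_points <- canvas + geom_point()
with_points_and_line <- with_points + geom_line()

length(canvas$layers)                # 0
length(with_points$layers)           # 1
length(with_points_and_line$layers)  # 2
```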

Scatterplot

The scatterplot is the first chart we are going to learn, along with the basic anatomy of a ggplot2 call. We start with the canvas function ggplot. aes is the aesthetics part, where we map variables to the x and y axes along with other grouping aesthetics (e.g. color, fill, alpha, shape, size). Once the data and aesthetics are set, we declare the plot type (the geom) we would like to show. For a scatterplot the function is geom_point.

ggplot(data = travel_weather, aes(x = Venice, y = London)) + 
    geom_point()
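The three parts (data, aesthetics, geom) can be checked directly. The sketch below, again on a hypothetical five-row sample_weather data frame copied from the printed rows, uses ggplot2's layer_data() to show the data as the point layer sees it after the aesthetic mappings are applied: Venice has become x and London has become y.

```r
library(ggplot2)

# Hypothetical sample: first five rows of travel_weather as printed above
sample_weather <- data.frame(
  London = c(8, 11, 11, 11, 13),
  Venice = c(13, 10, 9, 10, 12)
)

p <- ggplot(data = sample_weather,          # data
            aes(x = Venice, y = London)) +  # aesthetic mappings
  geom_point()                              # geom: draw each row as a point

# layer_data() returns the first layer's data after the mappings are applied
points <- layer_data(p, 1)
head(points[, c("x", "y")])
```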

By the way, ggplot2 is quite flexible about where elements are placed, but use that flexibility responsibly. Below is the same chart as above, written differently: the data is piped in with dplyr's %>% and the aesthetics are defined inside geom_point instead of ggplot.

travel_weather %>% ggplot() + geom_point(aes(x = Venice, y = London))
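One way to convince yourself that the two versions really draw the same chart is to compare what each point layer receives. This sketch assumes a hypothetical five-row sample_weather data frame and compares the x and y columns returned by layer_data() for both placements.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample: first five rows of travel_weather as printed above
sample_weather <- data.frame(
  London = c(8, 11, 11, 11, 13),
  Venice = c(13, 10, 9, 10, 12)
)

# Aesthetics declared globally in ggplot() ...
p_global <- ggplot(data = sample_weather, aes(x = Venice, y = London)) +
  geom_point()

# ... or locally in the geom, with the data piped in from dplyr
p_local <- sample_weather %>% ggplot() + geom_point(aes(x = Venice, y = London))

d_global <- layer_data(p_global)
d_local  <- layer_data(p_local)

# Same x and y values reach the point layer in both cases
identical(d_global$x, d_local$x) && identical(d_global$y, d_local$y)
```

The practical difference appears once a second layer is added: mappings given in ggplot are inherited by every layer, while mappings given inside geom_point apply to that layer only.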

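Let’s color this chart a bit. If both Venice and London are at least as warm as Amsterdam on a given day, let’s show it with a different color. First we compute this flag with mutate; pmin takes the element-wise (per-day) minimum of the two cities.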

travel_weather %>%
    mutate(VL_warmer_A = pmin(Venice, London) >= Amsterdam)
## # A tibble: 731 x 8
##     year month   day Amsterdam London   NYC Venice VL_warmer_A
##    <dbl> <dbl> <dbl>     <dbl>  <dbl> <dbl>  <dbl> <lgl>      
##  1  2015    11     1         8      8    16     13 TRUE       
##  2  2015    11     2        10     11    15     10 TRUE       
##  3  2015    11     3         9     11    16      9 TRUE       
##  4  2015    11     4        12     11    17     10 FALSE      
##  5  2015    11     5        13     13    18     12 FALSE      
##  6  2015    11     6        16     14    21     13 FALSE      
##  7  2015    11     7        16     14    17     14 FALSE      
##  8  2015    11     8        12     12    11     13 TRUE       
##  9  2015    11     9        13     12    11     11 FALSE      
## 10  2015    11    10        14     14    12     11 FALSE      
## # ... with 721 more rows
travel_weather %>%
    mutate(VL_warmer_A = pmin(Venice, London) >= Amsterdam) %>%
    ggplot() +
    geom_point(aes(x = Venice, y = London, color = VL_warmer_A))
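Why pmin and not min? min collapses all of its arguments into a single number, while pmin returns one minimum per position, i.e. one per day. The sketch below shows the difference on a hypothetical five-row sample_weather data frame (values copied from the printed rows) and then checks that the logical flag is mapped to two distinct colors.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample: first five rows of travel_weather as printed above
sample_weather <- data.frame(
  Amsterdam = c(8, 10, 9, 12, 13),
  London    = c(8, 11, 11, 11, 13),
  Venice    = c(13, 10, 9, 10, 12)
)

# pmin() compares element by element: one minimum per row (day)
pmin(sample_weather$Venice, sample_weather$London)  # 8 10 9 10 12
# min() would collapse both columns into a single number instead
min(sample_weather$Venice, sample_weather$London)   # 8

flagged <- sample_weather %>%
  mutate(VL_warmer_A = pmin(Venice, London) >= Amsterdam)

p <- ggplot(flagged) +
  geom_point(aes(x = Venice, y = London, color = VL_warmer_A))

# A logical column is treated as discrete: one colour for TRUE, one for FALSE
unique(layer_data(p)$colour)
```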

Refer to ggplot2 tutorials and cheat sheets for more “tricks”.

Line Chart

Unsurprisingly, we are going to use geom_line here. Since there is no date column yet, the row number serves as the x axis.

ggplot(data = travel_weather, aes(x = 1:nrow(travel_weather), y = Venice)) +
    geom_line()
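Note that geom_line connects points in order of the x variable, not in row order; geom_path is the variant that keeps the row order. The row index works above because it is already sorted. The sketch below makes the difference visible on a small hypothetical data frame, shuffled, whose rows are deliberately out of order, by inspecting the x values each layer ends up with.

```r
library(ggplot2)

# Hypothetical sample (Venice, days 1 to 5) with rows deliberately shuffled
shuffled <- data.frame(
  day    = c(3, 1, 5, 2, 4),
  Venice = c(9, 13, 12, 10, 10)
)

line_plot <- ggplot(shuffled, aes(x = day, y = Venice)) + geom_line()
path_plot <- ggplot(shuffled, aes(x = day, y = Venice)) + geom_path()

# geom_line() sorts the points by x before connecting them ...
layer_data(line_plot)$x
# ... while geom_path() keeps the row order of the data
layer_data(path_plot)$x
```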

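Let’s make it more beautiful (kinda): build a proper date from the year, month and day columns, reshape the data to long format, and draw one line per city.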

travel_weather %>%
    # make_date() is vectorised, so rowwise() and paste() are not needed
    mutate(date = lubridate::make_date(year, month, day)) %>%
    select(-(year:day)) %>%
    gather(key = City, value = Temperature, -date) %>%
    ggplot(aes(x = date, y = Temperature, color = City)) +
    geom_line()
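The data preparation step can be inspected on its own before plotting. This sketch runs it on a hypothetical three-row sample_weather data frame (values copied from the printed rows) and uses tidyr's pivot_longer, the newer replacement for gather, to produce the same long format: one row per date and city.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample: first three rows of travel_weather as printed above
sample_weather <- data.frame(
  year = 2015, month = 11, day = 1:3,
  Amsterdam = c(8, 10, 9), London = c(8, 11, 11),
  NYC = c(16, 15, 16), Venice = c(13, 10, 9)
)

long_weather <- sample_weather %>%
  # One Date per row, built from the year, month and day columns
  mutate(date = lubridate::make_date(year, month, day)) %>%
  select(-(year:day)) %>%
  # Every city column becomes rows: 3 dates x 4 cities = 12 rows
  pivot_longer(-date, names_to = "City", values_to = "Temperature")

long_weather
```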

Bar Chart

Suppose you want to compare the average temperature of each city. We reshape the city columns into long format, average them per city, and draw one bar per city.

travel_weather %>%
    select(Amsterdam:Venice) %>%
    gather(key = City, value = Temperature) %>%
    group_by(City) %>%
    summarise(avg_temp = mean(Temperature, na.rm = TRUE)) %>%
    ggplot(aes(x = City, y = avg_temp, fill = City)) +
    geom_bar(stat = "identity")
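stat = "identity" tells geom_bar to use the y values as bar heights instead of counting rows; geom_col is the shorthand for the same thing. The sketch below checks this on a hypothetical five-row sample_weather data frame: the per-city averages become exactly the tops of the bars (the ymax column of layer_data).

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical sample: first five rows of travel_weather as printed above
sample_weather <- data.frame(
  Amsterdam = c(8, 10, 9, 12, 13),
  London    = c(8, 11, 11, 11, 13),
  NYC       = c(16, 15, 16, 17, 18),
  Venice    = c(13, 10, 9, 10, 12)
)

averages <- sample_weather %>%
  pivot_longer(everything(), names_to = "City", values_to = "Temperature") %>%
  group_by(City) %>%
  summarise(avg_temp = mean(Temperature, na.rm = TRUE))

averages

# geom_col() is equivalent to geom_bar(stat = "identity")
p <- ggplot(averages, aes(x = City, y = avg_temp, fill = City)) + geom_col()

# ymax holds the top of each bar: the averages, unchanged
layer_data(p)$ymax
```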

Modifications

A ggplot2 chart can be stored in an object and modified later; changing the axes, labels, theme and other elements is easy. We continue from the last example.

my_plot <- travel_weather %>%
    select(Amsterdam:Venice) %>%
    gather(key = City, value = Temperature) %>%
    group_by(City) %>%
    summarise(avg_temp = mean(Temperature, na.rm = TRUE)) %>%
    ggplot(aes(x = City, y = avg_temp, fill = City)) +
    geom_bar(stat = "identity")

We store the whole plot in my_plot. Now, let’s change its labels and theme a bit.

my_plot +
    labs(x = "", y = "Average Temperature (Celsius)",
         title = "Average Temperature of Cities Between 2015-2017") +
    theme_bw() +
    theme(legend.position = "none",
          axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5, size = 12))
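Adding labels or a theme with + does not change the stored object; it returns a modified copy. That makes a stored plot usable as a template for several variants. The sketch below checks this with a small hypothetical averages data frame (illustrative values) and reads the title through the plot's labels element, an internal field of ggplot2 objects rather than a documented interface.

```r
library(ggplot2)

# Hypothetical per-city averages, for illustration only
averages <- data.frame(
  City = c("Amsterdam", "London", "NYC", "Venice"),
  avg_temp = c(10.4, 10.8, 16.4, 10.8)
)

my_plot <- ggplot(averages, aes(x = City, y = avg_temp, fill = City)) +
  geom_col()

# `+` returns a new, modified plot; my_plot itself stays as it was
titled <- my_plot +
  labs(title = "Average Temperature of Cities") +
  theme_bw() +
  theme(legend.position = "none")

is.null(my_plot$labels$title)  # the stored plot still has no title
titled$labels$title            # the copy carries the new title
```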

There is much more you can do with ggplot2; the rest is left to your imagination and expertise.