ggplot2

Quick Introduction

Berk Orbay

MEF BDA 503

What is ggplot2?

ggplot2 is an R package specialized in data visualization. It is both a companion to the dplyr data manipulation package and a complete library on its own. Within a well-defined framework, you can

  • easily draw a large number of plot types with the same methodology
  • draw sophisticated plots
  • combine multiple well-adjusted plots (e.g., a 2x2 grid)
  • do advanced theming and custom theming
  • create parametric plots


Official page: https://ggplot2.tidyverse.org/

Why ggplot2?

ggplot2 follows the same UX principles as dplyr and is developed by the same person (Hadley Wickham) and his team.

  • Based on the Grammar of Graphics (hence the “gg”).
  • Start with a “canvas” (ggplot()), map data to aesthetics (aes(x, y, color)) and add layers (geom_*) on top of it.
  • Easily add labels, annotations, titles and themes.

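A minimal sketch of this layering, assuming the built-in mtcars data (not used elsewhere in these slides) so it runs on its own: each + adds one more piece on top of the canvas.

```r
library(ggplot2)

# 1. The "canvas": data plus aesthetic mappings (which column goes to x and y)
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# 2. A geometric layer that draws the mapped data as points
p <- p + geom_point()

# 3. Labels and a theme are added the same way, with +
p <- p + labs(x = "Weight (1000 lbs)", y = "Miles per gallon") + theme_minimal()

p
```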
Fundamentals

Important: Full notes are in the Book of EDA.


##  If you have never installed ggplot2 before
##  install.packages("tidyverse")

##  First load the library
##  We will also use dplyr, so loading the whole tidyverse is more convenient
library(tidyverse)

## We are going to use the starwars dataset (shipped with dplyr) for the examples
starwars
# A tibble: 87 × 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
 4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
 6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
 9 Biggs D…    183    84 black      light      brown           24   male  mascu…
10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# ℹ 77 more rows
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>

Let’s recall the starwars tibble from dplyr.

Scatter plot geom_point

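select keeps only the columns we need; the filter steps drop characters with a mass of 1000 kg or more (an extreme outlier) and rows with missing values.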

ggplot(starwars %>%
           select(name, height, mass) %>%
           filter(mass < 1000) %>%
           filter(complete.cases(.)),
       aes(x = height, y = mass)) +
    geom_point()

We use + instead of the pipe operator (%>%) to connect layers.
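The pipe and + can also live in one chain: prepare the data with %>%, pipe it into ggplot(), and switch to + from there. A sketch equivalent to the plot above; for these three columns, !is.na(height) does the same job as complete.cases(.), because filter() already drops rows with missing mass.

```r
library(ggplot2)
library(dplyr)

sw_plot <- starwars %>%
    select(name, height, mass) %>%
    filter(mass < 1000, !is.na(height)) %>%
    # the data is piped into ggplot(); from here on, layers are joined with +
    ggplot(aes(x = height, y = mass)) +
    geom_point()

sw_plot
```

This works because %>% binds more tightly than +, so the whole pipe is evaluated first and its result becomes the canvas.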

Scatter plot (cont.) geom_point

Let’s add some categorization with color.

ggplot(starwars %>%
           select(name, species, height, mass) %>%
           filter(mass < 1000) %>%
           filter(complete.cases(.)),
       aes(x = height, y = mass)) +
    geom_point(aes(color = species))

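Whether color goes inside or outside aes() matters. A small comparison sketch; the constant "steelblue" is an arbitrary choice for illustration.

```r
library(ggplot2)
library(dplyr)

sw <- starwars %>%
    filter(mass < 1000, !is.na(height), !is.na(species))

# Mapping: color inside aes() is linked to a column, one color per species
p_mapped <- ggplot(sw, aes(x = height, y = mass)) +
    geom_point(aes(color = species))

# Setting: a constant color outside aes() applies to every point, no legend
p_fixed <- ggplot(sw, aes(x = height, y = mass)) +
    geom_point(color = "steelblue")
```

Putting a constant inside aes(), e.g. aes(color = "steelblue"), would instead be treated as a one-level category and get its own legend entry.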
Scatter plot (cont.) geom_point

Let’s also add some labels and themes.

ggplot(starwars %>%
           select(name, species, height, mass) %>%
           filter(mass < 1000) %>%
           filter(complete.cases(.)),
       aes(x = height, y = mass)) +
    geom_point(aes(color = species)) +
    labs(title = "Star Wars Characters by Height and Mass", subtitle = "Colored by species",
         x = "Height (in cm)", y = "Mass (in kg)", color = "Species") +
    theme_minimal() +
    theme(legend.position = "bottom")

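If several plots should share the same look, the theme pieces can be stored in an object and added like any other layer. A sketch; my_theme is a hypothetical name, and mtcars is used only to keep the example self-contained.

```r
library(ggplot2)

# Hypothetical house style: theme_minimal() with the legend moved below the plot
my_theme <- theme_minimal() +
    theme(legend.position = "bottom")

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
    geom_point() +
    labs(color = "Cylinders") +
    my_theme

p
```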
Line plot geom_line

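Let’s use another dataset for the line plot: EuStockMarkets, from the datasets package that ships with R. It has no date column, so we attach consecutive calendar dates starting from 1991-01-02.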

stock_df <- as_tibble(EuStockMarkets) %>% mutate(date = lubridate::as_date("1991-01-01") + 1:nrow(.), .before = everything())
print(stock_df)
# A tibble: 1,860 × 5
   date         DAX   SMI   CAC  FTSE
   <date>     <dbl> <dbl> <dbl> <dbl>
 1 1991-01-02 1629. 1678. 1773. 2444.
 2 1991-01-03 1614. 1688. 1750. 2460.
 3 1991-01-04 1607. 1679. 1718  2448.
 4 1991-01-05 1621. 1684. 1708. 2470.
 5 1991-01-06 1618. 1687. 1723. 2485.
 6 1991-01-07 1611. 1672. 1714. 2467.
 7 1991-01-08 1631. 1683. 1734. 2488.
 8 1991-01-09 1640. 1704. 1757. 2508.
 9 1991-01-10 1635. 1698. 1754  2510.
10 1991-01-11 1646. 1716. 1754. 2497.
# ℹ 1,850 more rows
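Consecutive calendar dates are a simplification: EuStockMarkets is sampled in business time (weekends and holidays are omitted), and the series actually starts in mid-1991. A sketch that derives dates from the series' own time index instead, assuming lubridate is installed (it ships with the tidyverse). The result is still approximate, because the index spreads 260 trading days evenly over each year.

```r
library(dplyr)
library(lubridate)

# time() returns the series' index in decimal years (frequency 260 per year)
decimal_years <- as.numeric(time(EuStockMarkets))

stock_df_real <- as_tibble(as.data.frame(EuStockMarkets)) %>%
    mutate(date = as_date(date_decimal(decimal_years)), .before = everything())

stock_df_real
```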

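Let’s also reshape it into long format (one row per date and index), which is more “plotable”: a single symbol column can then be mapped to color.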

stock_df_long <- stock_df %>% pivot_longer(-date,names_to="symbol",values_to="close")
print(stock_df_long)
# A tibble: 7,440 × 3
   date       symbol close
   <date>     <chr>  <dbl>
 1 1991-01-02 DAX    1629.
 2 1991-01-02 SMI    1678.
 3 1991-01-02 CAC    1773.
 4 1991-01-02 FTSE   2444.
 5 1991-01-03 DAX    1614.
 6 1991-01-03 SMI    1688.
 7 1991-01-03 CAC    1750.
 8 1991-01-03 FTSE   2460.
 9 1991-01-04 DAX    1607.
10 1991-01-04 SMI    1679.
# ℹ 7,430 more rows
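To see exactly what pivot_longer() does, here is the same reshaping on a two-row toy table (the values are illustrative, not taken from the data).

```r
library(tidyr)
library(tibble)

# A tiny wide table: one row per date, one column per index
wide <- tibble(
    date = as.Date(c("1991-01-02", "1991-01-03")),
    DAX  = c(1629, 1614),
    FTSE = c(2444, 2460)
)

# Every column except date becomes a (symbol, close) pair, row by row
long <- pivot_longer(wide, -date, names_to = "symbol", values_to = "close")
long
```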

Line plot (cont.) geom_line

Let’s generate a plot to compare all four stock market indices.

ggplot(stock_df_long,aes(x=date,y=close,color=symbol)) + geom_line()
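Mapping color = symbol does more than color the lines: it also splits the data into one group per index, and geom_line() draws one line per group. A sketch comparing three variants; to stay self-contained it uses the trading-day number (day, a name introduced here) as x instead of dates.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

stock_long <- as.data.frame(EuStockMarkets) %>%
    mutate(day = row_number()) %>%
    pivot_longer(-day, names_to = "symbol", values_to = "close")

# color = symbol: four colored lines, one group per index
p_color <- ggplot(stock_long, aes(x = day, y = close, color = symbol)) + geom_line()

# No grouping aesthetic: all four series are joined into a single zigzag line
p_single <- ggplot(stock_long, aes(x = day, y = close)) + geom_line()

# group = symbol: four separate lines, all in the same color
p_group <- ggplot(stock_long, aes(x = day, y = close, group = symbol)) + geom_line()
```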

Line plot (cont.) geom_line

Let’s theme it up a bit with the dark theme.

ggplot(stock_df_long,aes(x=date,y=close,color=symbol)) + 
    geom_line() + 
    labs(x="Date",y="Index Level at Close",
    title="Comparison of Stock Market Indices",subtitle="Period between 1991-1996",color="Index") + 
    theme_dark()

Bar plot geom_bar

Back to starwars. Let’s count the characters by eye color and draw a bar plot.

eye_color_df <- starwars %>% count(eye_color)
eye_color_df
# A tibble: 15 × 2
   eye_color         n
   <chr>         <int>
 1 black            10
 2 blue             19
 3 blue-gray         1
 4 brown            21
 5 dark              1
 6 gold              1
 7 green, yellow     1
 8 hazel             3
 9 orange            8
10 pink              1
11 red               5
12 red, blue         1
13 unknown           3
14 white             1
15 yellow           11
ggplot(eye_color_df,aes(x=eye_color,y=n)) + geom_bar(stat="identity")
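Because the counts are precomputed, geom_bar() needs stat = "identity" here. Two equivalent alternatives are sketched below: let geom_bar() count the raw rows itself (its default stat_count()), or use geom_col(), which is shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)
library(dplyr)

# geom_bar() with only x: stat_count() does the counting
p_auto <- ggplot(starwars, aes(x = eye_color)) + geom_bar()

# geom_col() on pre-counted data, same as geom_bar(stat = "identity")
eye_counts <- starwars %>% count(eye_color)
p_col <- ggplot(eye_counts, aes(x = eye_color, y = n)) + geom_col()
```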

Bar plot (cont.) geom_bar

Let’s add more detail: split each eye color by whether the character is human, and sort the bars by frequency. Notice how we use reorder() with sum, so the bars are ordered by the total count across both groups.

eye_color_species_df <- starwars %>%
    mutate(is_human = (species == "Human" & !is.na(species))) %>%
    group_by(is_human) %>%
    count(eye_color) %>%
    ungroup()
ggplot(eye_color_species_df,
    aes(x = reorder(eye_color, -n, function(x){sum(x)}), y = n, fill = is_human)) +
    geom_bar(stat = "identity", position = "stack") +
    labs(title = "Star Wars Characters by Eye Color", x = "Eye Color", y = "Number of Characters", fill = "is human?")

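Why function(x){sum(x)}? reorder() sorts factor levels by a per-level summary, and its default summary is the mean. With stacked bars, an eye color split into a human and a non-human row should be ranked by its total, i.e. the sum. A toy sketch (values made up for illustration):

```r
# Toy data: "red" appears in two rows (e.g. human and non-human) with n = 2 each
eye <- c("red", "red", "blue", "brown")
n   <- c(2, 2, 3, 1)

# reorder(x, X, FUN) sorts the levels of x by FUN(X) per level, ascending;
# using -n turns that into descending order of n
by_sum  <- reorder(eye, -n, FUN = sum)  # totals: red 4, blue 3, brown 1
by_mean <- reorder(eye, -n)             # default FUN = mean: red 2, blue 3, brown 1

levels(by_sum)   # "red" "blue" "brown"
levels(by_mean)  # "blue" "red" "brown"
```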
Pie chart geom_bar + coord_polar

Make some minor modifications (move eye color from x to fill, use a constant x) and add coord_polar; now you get a pie chart.

ggplot(eye_color_species_df,
    aes(x="",y=n,fill=reorder(eye_color,n,function(x){sum(x)}))) + 
    geom_bar(stat="identity",position="stack") + 
    labs(title="Star Wars Characters by Eye Color",x=NULL,y="Number of Characters",fill="Eye Color") + 
    coord_polar("y") + theme_minimal()
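In eye_color_species_df, some eye colors occupy two rows (human and non-human), so they become two slices of the same color. A sketch that starts from one row per eye color and adds percentage labels; the share column and the use of scales::percent() and theme_void() are choices made for this example.

```r
library(ggplot2)
library(dplyr)

# One row per eye color, plus its share of all characters
eye_share <- starwars %>%
    count(eye_color) %>%
    mutate(share = n / sum(n))

p <- ggplot(eye_share, aes(x = "", y = n, fill = reorder(eye_color, n))) +
    geom_col() +
    # Labels are stacked like the slices (same fill groups), centered in each one
    geom_text(aes(label = scales::percent(share, accuracy = 1)),
              position = position_stack(vjust = 0.5), size = 3) +
    coord_polar(theta = "y") +
    labs(title = "Star Wars Characters by Eye Color", fill = "Eye Color", x = NULL, y = NULL) +
    theme_void()

p
```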

Advanced ggplot2

There is always more to learn. Here are some examples.

  • There are many more plot types; see the cheat sheet.
  • facet_grid for multiple plots (see).
  • scale_* functions for fine-tuning labels, axes, etc.
  • Custom themes are also possible (see the ggthemes library).

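A brief sketch of the first two bullets, reusing EuStockMarkets: facet_grid() puts each index in its own panel, and a scale_* function adjusts the y-axis labels. The day column and the comma labels (scales::label_comma()) are illustrative choices.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

stock_long <- as.data.frame(EuStockMarkets) %>%
    mutate(day = row_number()) %>%
    pivot_longer(-day, names_to = "symbol", values_to = "close")

# facet_grid(rows ~ cols): one panel row per index, each with its own y range
p <- ggplot(stock_long, aes(x = day, y = close)) +
    geom_line() +
    facet_grid(symbol ~ ., scales = "free_y") +
    # scale_* functions control axes, e.g. thousands separators on y
    scale_y_continuous(labels = scales::label_comma()) +
    labs(x = "Trading day", y = "Index level at close")

p
```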
See the extra resources section in the Book of EDA.

Thanks!

Course webpage https://mef-bda503.github.io/.