ODD Retail Sales - 2016 -February

Ozgur Ozdemir - November 2018

I studied the February 2016 ODD data. Here is the link that i get data.

Downloading Raw Data

First, I downloaded the tidyverse package.

library(tidyverse)
## -- Attaching packages ---------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Create a temporary file

tmp<-tempfile(fileext="2016subat.xlsx")

Download file from repository to the temp file. In addition, to avoid getting error when loading URL, I added "mode = ‘wb’ to the code.

download.file("https://github.com/MEF-BDA503/pj18-ozdemiroz/blob/master/2016subat.xlsx?raw=true", mode = 'wb',destfile=tmp)

Read that excel file using readxl package’s read_excel function.

raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)

Remove the temp file

file.remove(tmp)
## [1] TRUE

Remove the last two rows because they are irrelevant (total and empty rows).

raw_data <- raw_data %>% slice(-c(49,50))

Let’s see our raw data

head(raw_data)
## # A tibble: 6 x 10
##   X__1          X__2  X__3  X__4  X__5  X__6  X__7  X__8  X__9 X__10
##   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO      NA    75    75    NA    NA     0     0    75    75
## 2 ASTON MARTIN    NA     1     1    NA    NA     0     0     1     1
## 3 AUDI            NA  1200  1200    NA    NA     0     0  1200  1200
## 4 BENTLEY         NA     1     1    NA    NA     0     0     1     1
## 5 BMW             NA  2042  2042    NA    NA     0     0  2042  2042
## 6 CHERY           NA    23    23    NA    NA     0     0    23    23

Making Data Civilized

In order to make the data standardized and workable we need to define column names and remove NA values for this example.

Use the same column names in data.

colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")

Now we replace NA values with 0 and label the time period with year and month, so when we merge the data we won’t be confused.

car_data_feb_16 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2016,month=2)
print(car_data_feb_16,width=Inf)
## # A tibble: 48 x 12
##    brand_name   auto_dom auto_imp auto_total comm_dom comm_imp comm_total
##    <chr>           <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>
##  1 ALFA ROMEO          0       75         75        0        0          0
##  2 ASTON MARTIN        0        1          1        0        0          0
##  3 AUDI                0     1200       1200        0        0          0
##  4 BENTLEY             0        1          1        0        0          0
##  5 BMW                 0     2042       2042        0        0          0
##  6 CHERY               0       23         23        0        0          0
##  7 CITROEN             0      366        366       40      213        253
##  8 DACIA               0     1963       1963        0      228        228
##  9 DS                  0       34         34        0        0          0
## 10 FERRARI             0        1          1        0        0          0
##    total_dom total_imp total_total  year month
##        <dbl>     <dbl>       <dbl> <dbl> <dbl>
##  1         0        75          75  2016     2
##  2         0         1           1  2016     2
##  3         0      1200        1200  2016     2
##  4         0         1           1  2016     2
##  5         0      2042        2042  2016     2
##  6         0        23          23  2016     2
##  7        40       579         619  2016     2
##  8         0      2191        2191  2016     2
##  9         0        34          34  2016     2
## 10         0         1           1  2016     2
## # ... with 38 more rows

Saving Civilized Data

saveRDS(car_data_feb_16,file="~/Desktop/odd_car_sales_data_feb_16.rds")

Finishing With Some Analysis

I wanted to see a list of total sales of brands with domestic and imported vehicle sales ordered in decreasing total sales.

car_data_feb_16 %>% 
  select(brand_name,total_dom,total_imp,total_total) %>%
  filter(total_dom > 0 & total_imp > 0) %>%
  arrange(desc(total_total))
## # A tibble: 10 x 4
##    brand_name total_dom total_imp total_total
##    <chr>          <dbl>     <dbl>       <dbl>
##  1 FORD            4018      2922        6940
##  2 RENAULT         3553      2609        6162
##  3 FIAT            4881       621        5502
##  4 HYUNDAI         1485      1616        3101
##  5 TOYOTA          1735       766        2501
##  6 PEUGEOT           91      2014        2105
##  7 HONDA            510       669        1179
##  8 CITROEN           40       579         619
##  9 MITSUBISHI        35       222         257
## 10 ISUZU            204        34         238