Car Sales Data Import and Analyze

In week-2 we studied with tidyverse package and some functions in this package. This homework is about it. After we downloaded data from ODD official website, we changed the name to ODD_Retail_Sales_201701.xlsx. We will make some example from raw data to final analysis below.

Download Raw Data

Our raw excel file is in our repository. We can automatically download that file and put it in a temporary file. Then we can read that excel document into R and remove the temp file.

# Download file from repository to the temp file
# Remove the temp file
tmp<-tempfile(fileext=".xlsx")
download.file("https://github.com/MEF-BDA503/pj18-ucarsal/blob/master/week2/odd_retail_sales_2017_01.xlsx?raw=true",destfile=tmp,mode = 'wb')
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
file.remove(tmp)
## [1] TRUE
head(raw_data)
## # A tibble: 6 x 10
##   X__1          X__2  X__3  X__4  X__5  X__6  X__7  X__8  X__9 X__10
##   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO      NA    17    17    NA    NA     0     0    17    17
## 2 ASTON MARTIN    NA     1     1    NA    NA     0     0     1     1
## 3 AUDI            NA   625   625    NA    NA     0     0   625   625
## 4 BENTLEY         NA     1     1    NA    NA     0     0     1     1
## 5 BMW             NA   570   570    NA    NA     0     0   570   570
## 6 CHERY           NA     0     0    NA    NA     0     0     0     0

It’s ok but needs some work.

Make Data Civilized

In order to make the data standardized and workable we need to define column names and remove NA values for this example. Please use the same column names in your examples also.

#Firstly we should specify the library to use.
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------------------------------------------ tidyverse 1.2.1 --
## <U+221A> ggplot2 3.0.0     <U+221A> purrr   0.2.5
## <U+221A> tibble  1.4.2     <U+221A> dplyr   0.7.6
## <U+221A> tidyr   0.8.1     <U+221A> stringr 1.3.1
## <U+221A> readr   1.1.1     <U+221A> forcats 0.3.0
## -- Conflicts --------------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# Use the same column names in your data.
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now we replace NA values with 0 and label the time period with year and month, so when we merge the data we won't be confused.
car_data_jan_17 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2017,month=1)

print(car_data_jan_17,width=Inf)
## # A tibble: 45 x 12
##    brand_name   auto_dom auto_imp auto_total comm_dom comm_imp comm_total
##    <chr>           <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>
##  1 ALFA ROMEO          0       17         17        0        0          0
##  2 ASTON MARTIN        0        1          1        0        0          0
##  3 AUDI                0      625        625        0        0          0
##  4 BENTLEY             0        1          1        0        0          0
##  5 BMW                 0      570        570        0        0          0
##  6 CHERY               0        0          0        0        0          0
##  7 CITROEN             0      515        515        3      438        441
##  8 DACIA               0     1471       1471        0      235        235
##  9 DS                  0        9          9        0        0          0
## 10 FERRARI             0        0          0        0        0          0
##    total_dom total_imp total_total  year month
##        <dbl>     <dbl>       <dbl> <dbl> <dbl>
##  1         0        17          17  2017     1
##  2         0         1           1  2017     1
##  3         0       625         625  2017     1
##  4         0         1           1  2017     1
##  5         0       570         570  2017     1
##  6         0         0           0  2017     1
##  7         3       953         956  2017     1
##  8         0      1706        1706  2017     1
##  9         0         9           9  2017     1
## 10         0         0           0  2017     1
## # ... with 35 more rows

Save Your Civilized Data

One of the best methods is to save your data to an RDS or RData file. The difference is RDS can hold only one object but RData can hold many. Since we have only one data frame here we will go with RDS.

saveRDS(car_data_jan_17,file="odd_car_sales_data_jan_17.rds")
# You can read that file by readRDS and assigning to an object

Finish With Some Analysis

I just have a look the percentage of commercial vehicle sales over total sales. And I ordered the dataframe according to total commercial sales brand by brand.

# A new column is added named as perc_comm. That is the percentage of commercial sales to total sales.
car_data_jan_17<-car_data_jan_17%>%mutate(perc_comm=(comm_total/total_total)*100)

#How to select columns.
car_data_jan_17%>%
  filter(comm_total>0)%>%
  select(year,month,brand_name,comm_total,total_total,perc_comm)%>%
  arrange(desc(comm_total))
## # A tibble: 17 x 6
##     year month brand_name    comm_total total_total perc_comm
##    <dbl> <dbl> <chr>              <dbl>       <dbl>     <dbl>
##  1  2017     1 FORD                2978        4511     66.0 
##  2  2017     1 FIAT                2245        3866     58.1 
##  3  2017     1 VOLKSWAGEN          1255        4314     29.1 
##  4  2017     1 RENAULT              519        4874     10.6 
##  5  2017     1 CITROEN              441         956     46.1 
##  6  2017     1 TOYOTA               356        2283     15.6 
##  7  2017     1 MERCEDES-BENZ        343         842     40.7 
##  8  2017     1 DACIA                235        1706     13.8 
##  9  2017     1 KIA                  193         619     31.2 
## 10  2017     1 MITSUBISHI           190         218     87.2 
## 11  2017     1 ISUZU                173         173    100   
## 12  2017     1 PEUGEOT              168         805     20.9 
## 13  2017     1 NISSAN               167        1321     12.6 
## 14  2017     1 IVECO                146         146    100   
## 15  2017     1 HYUNDAI              134        2357      5.69
## 16  2017     1 KARSAN                82          82    100   
## 17  2017     1 SSANGYONG              9          19     47.4