Car Sales Data Import and Analyze

In week-2 we studied with tidyverse package and some functions in this package. This homework is about it. After we downloaded data from ODD official website, we changed the name to ODD_Retail_Sales_201806.xlsx. We will make some example from raw data to final analysis below.

Download Data

At first, we load the data in our R workspace via the code below

install.packages("tidyverse", repos = "https://cran.r-project.org")
## package 'tidyverse' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\STUDENT\AppData\Local\Temp\Rtmp8gNkaG\downloaded_packages
library(tidyverse)
## -- Attaching packages -------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ----------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(readxl)
tmp<-tempfile(fileext=".xlsx")
download.file("https://github.com/MEF-BDA503/pj18-muharremcakir81/blob/master/Week2/Odd_Retail_Sales_2018_06.xlsx?raw=true",destfile=tmp,mode = 'wb')
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
# Let's see our raw data
head(raw_data)
## # A tibble: 6 x 10
##   X__1          X__2  X__3  X__4  X__5  X__6  X__7  X__8  X__9 X__10
##   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO      NA    10    10    NA    NA     0     0    10    10
## 2 ASTON MARTIN    NA     1     1    NA    NA     0     0     1     1
## 3 AUDI            NA   668   668    NA    NA     0     0   668   668
## 4 BENTLEY         NA     1     1    NA    NA     0     0     1     1
## 5 BMW             NA  1386  1386    NA    NA     0     0  1386  1386
## 6 CITROEN         NA   499   499    NA   520   520     0  1019  1019
file.remove(tmp)
## [1] TRUE

Make Data Civilized

Now we need to remove NA values and define column names to make data standardized.

colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now we replace NA values with 0 and label the time period with year and month, so when we merge the data we won't be confused.
car_data_201806 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2018,month=6)
                   
#TO save our data;
saveRDS(car_data_201806,file="Odd_Retail_Sales_201806.rds")
                 
print(car_data_201806,width=Inf)
## # A tibble: 44 x 12
##    brand_name   auto_dom auto_imp auto_total comm_dom comm_imp comm_total
##    <chr>           <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>
##  1 ALFA ROMEO          0       10         10        0        0          0
##  2 ASTON MARTIN        0        1          1        0        0          0
##  3 AUDI                0      668        668        0        0          0
##  4 BENTLEY             0        1          1        0        0          0
##  5 BMW                 0     1386       1386        0        0          0
##  6 CITROEN             0      499        499        0      520        520
##  7 DACIA               0     2254       2254        0      293        293
##  8 DS                  0        9          9        0        0          0
##  9 FERRARI             0        1          1        0        0          0
## 10 FIAT             3106      139       3245     1689      223       1912
##    total_dom total_imp total_total  year month
##        <dbl>     <dbl>       <dbl> <dbl> <dbl>
##  1         0        10          10  2018     6
##  2         0         1           1  2018     6
##  3         0       668         668  2018     6
##  4         0         1           1  2018     6
##  5         0      1386        1386  2018     6
##  6         0      1019        1019  2018     6
##  7         0      2547        2547  2018     6
##  8         0         9           9  2018     6
##  9         0         1           1  2018     6
## 10      4795       362        5157  2018     6
## # ... with 34 more rows

Let’s make Some Analysis

Our data is ready to make analysis.

In first analyse , we can see the percentage of total_dom and percentage of total_imp for each brand ordered in decreasing Percentage_dom.

car_data_201806 %>% select(year, month, brand_name, total_total,total_dom,total_imp) %>% 
  mutate(Percentage_dom=total_dom/total_total * 100,Percentage_imp=total_imp/total_total * 100)  %>%
  select(year, month, brand_name, total_total,Percentage_dom,Percentage_imp) %>% 
  arrange(desc(Percentage_dom)) 
## # A tibble: 44 x 6
##     year month brand_name total_total Percentage_dom Percentage_imp
##    <dbl> <dbl> <chr>            <dbl>          <dbl>          <dbl>
##  1  2018     6 KARSAN              47         100              0   
##  2  2018     6 FIAT              5157          93.0            7.02
##  3  2018     6 ISUZU              398          91.5            8.54
##  4  2018     6 TOYOTA            2275          82.5           17.5 
##  5  2018     6 HONDA             2515          82.1           17.9 
##  6  2018     6 RENAULT           8132          69.6           30.4 
##  7  2018     6 FORD              5928          62.4           37.6 
##  8  2018     6 HYUNDAI           3356          39.5           60.5 
##  9  2018     6 TOPLAM:          51037          38.9           61.1 
## 10  2018     6 MITSUBISHI         160           8.12          91.9 
## # ... with 34 more rows

In second analyse , we can see the percentage of Auto_total and percentage of comm_total for each brands their percentage of total_dom greater than %50 ordered in decreasing Total Sales..

car_data_201806 %>% select(year, month, brand_name, total_total,total_dom,total_imp,comm_total,auto_total) %>% 
  mutate(Percentage_dom=total_dom/total_total * 100,Percentage_Auto=auto_total/total_total * 100,Percentage_comm=comm_total/total_total * 100) %>%  filter(Percentage_dom > 50 ) %>%
  select(year, month, brand_name,total_total,Percentage_Auto,Percentage_comm) %>% 
  arrange(desc(total_total)) 
## # A tibble: 7 x 6
##    year month brand_name total_total Percentage_Auto Percentage_comm
##   <dbl> <dbl> <chr>            <dbl>           <dbl>           <dbl>
## 1  2018     6 RENAULT           8132            91.3            8.67
## 2  2018     6 FORD              5928            43.1           56.9 
## 3  2018     6 FIAT              5157            62.9           37.1 
## 4  2018     6 HONDA             2515           100              0   
## 5  2018     6 TOYOTA            2275            92.1            7.91
## 6  2018     6 ISUZU              398             0            100   
## 7  2018     6 KARSAN              47             0            100

In third and last analyse , we can see the 14 brands theirs total sales is above the average total sales of all brands.

car_data_201806 %>% select(year, month, brand_name, total_total) %>% 
  mutate(average_Total=mean(total_total) ) %>%  
  filter(total_total > average_Total)  %>%
  arrange(desc(total_total)) 
## # A tibble: 8 x 5
##    year month brand_name total_total average_Total
##   <dbl> <dbl> <chr>            <dbl>         <dbl>
## 1  2018     6 TOPLAM:          51037         2320.
## 2  2018     6 RENAULT           8132         2320.
## 3  2018     6 FORD              5928         2320.
## 4  2018     6 VOLKSWAGEN        5521         2320.
## 5  2018     6 FIAT              5157         2320.
## 6  2018     6 HYUNDAI           3356         2320.
## 7  2018     6 DACIA             2547         2320.
## 8  2018     6 HONDA             2515         2320.