Car Sales Data Import and Analysis

From Raw to Civilized Data

First we need to get our data from the website of Otomotiv Distrubutorleri Dernegi. My choosen data period is May 2017.

Download Raw Data

I had some problems with downloading data but I solved it by adding mode = “wb” statement after download files command. I found this solution from github_forums. In any case, original excel output still available in my repository and here is the link.

# Create a temporary file
tmp<-tempfile(fileext=".xlsx")
# Download file from repository to the temp file with mode statement
download.file("https://github.com/MEF-BDA503/pj18-kkyucel/blob/master/odd_retail_sales_2017_05.xlsx?raw=true",mode = "wb",destfile=tmp)
# Read that excel file using readxl package's read_excel function. You might need to adjust the parameters (skip, col_names) according to your raw file's format.
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
# Remove the temp file
file.remove(tmp)

## [1] TRUE

# I will remove the last two rows since they are empty by using slice
raw_data <- raw_data %>% slice(-c(43,44))
# Let's see how it is look like now
head(raw_data)

## # A tibble: 6 x 10
##   X__1          X__2  X__3  X__4  X__5  X__6  X__7  X__8  X__9 X__10
##   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO      NA    24    24    NA    NA     0     0    24    24
## 2 ASTON MARTIN    NA     2     2    NA    NA     0     0     2     2
## 3 AUDI            NA  2109  2109    NA    NA     0     0  2109  2109
## 4 BENTLEY         NA     1     1    NA    NA     0     0     1     1
## 5 BMW             NA  1814  1814    NA    NA     0     0  1814  1814
## 6 CHERY           NA    NA     0    NA    NA     0     0     0     0

Make Data Civilized

In order to make my data more civilized I will re-define column names suitable to work and I will remov N/A values. This way the file would be easier to work with.

# Standart column names will make the file easier to work with 
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now it is time to remove N/A values from data and putting additional info as data date
car_data_may_17 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2017,month=5)
# Let's see if it is ok
print(car_data_may_17, width = Inf)

## # A tibble: 45 x 12
##    brand_name   auto_dom auto_imp auto_total comm_dom comm_imp comm_total
##    <chr>           <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>
##  1 ALFA ROMEO          0       24         24        0        0          0
##  2 ASTON MARTIN        0        2          2        0        0          0
##  3 AUDI                0     2109       2109        0        0          0
##  4 BENTLEY             0        1          1        0        0          0
##  5 BMW                 0     1814       1814        0        0          0
##  6 CHERY               0        0          0        0        0          0
##  7 CITROEN             0     1504       1504        0     1147       1147
##  8 DACIA               0     3356       3356        0      658        658
##  9 DS                  0        5          5        0        0          0
## 10 FERRARI             0        0          0        0        0          0
##    total_dom total_imp total_total  year month
##        <dbl>     <dbl>       <dbl> <dbl> <dbl>
##  1         0        24          24  2017     5
##  2         0         2           2  2017     5
##  3         0      2109        2109  2017     5
##  4         0         1           1  2017     5
##  5         0      1814        1814  2017     5
##  6         0         0           0  2017     5
##  7         0      2651        2651  2017     5
##  8         0      4014        4014  2017     5
##  9         0         5           5  2017     5
## 10         0         0           0  2017     5
## # ... with 35 more rows

Time to Save Civilized Data

Now it is a good time save my civilized data. There are two different ways to save it, RDS or RData files.Difference between these two files is RDS can hold only one object while RData can hold more than one. Since I have only one data frame RDS would be enough.

saveRDS(car_data_may_17,file="D:/Data Analytics Esentials/week3/odd_assignment/odd_car_sales_data_may_17.rds")
# I can read this saved file anytime by using readRDS
# e.g
# rds_data <- readRDS("D:/Data Analytics Esentials/week3/odd_assignment/odd_car_sales_data_may_17.rds")

My Analysis on the Data

First I want to see which brands are selling import and domestic products in the same time and share of domestic product sales over their total sales. In order to serve the cause let’s list the brands with larger domestic sales first.

car_data_may_17 %>% 
  filter(total_dom > 0 & total_imp > 0) %>%
  select(brand_name,total_dom,total_imp,total_total) %>%
  mutate(Dom_Share = round(total_dom/total_total,2)) %>%
  arrange(desc(Dom_Share))

## # A tibble: 9 x 5
##   brand_name total_dom total_imp total_total Dom_Share
##   <chr>          <dbl>     <dbl>       <dbl>     <dbl>
## 1 ISUZU            190        11         201     0.95 
## 2 FIAT            9718       583       10301     0.94 
## 3 HONDA           1630       660        2290     0.71 
## 4 FORD            6546      4262       10808     0.61 
## 5 RENAULT         6690      4898       11588     0.580
## 6 HYUNDAI         2588      2261        4849     0.53 
## 7 TOPLAM:        30440     54982       85422     0.36 
## 8 MITSUBISHI        10       462         472     0.02 
## 9 PEUGEOT            2      3616        3618     0

Now I am corious about if there are any brands selling automobiles and commercial vehicles in the same time and what is the volume of commercial vehicles in their total sales. Propably it would make more sense the see higger selling brands first.

car_data_may_17 %>% 
  filter(auto_total > 0 & comm_total > 0) %>%
  select(brand_name,auto_total,comm_total,total_total) %>%
  mutate(Auto_Volume = round(auto_total/total_total,2), Comm_Volume = round(comm_total/total_total,2)) %>%
  arrange(desc(total_total))

## # A tibble: 13 x 6
##    brand_name    auto_total comm_total total_total Auto_Volume Comm_Volume
##    <chr>              <dbl>      <dbl>       <dbl>       <dbl>       <dbl>
##  1 TOPLAM:            65799      19623       85422       0.77        0.23 
##  2 RENAULT            10320       1268       11588       0.89        0.11 
##  3 FORD                4516       6292       10808       0.42        0.580
##  4 FIAT                6512       3789       10301       0.63        0.37 
##  5 HYUNDAI             4492        357        4849       0.93        0.07 
##  6 DACIA               3356        658        4014       0.84        0.16 
##  7 PEUGEOT             2651        967        3618       0.73        0.27 
##  8 NISSAN              2835        170        3005       0.94        0.06 
##  9 MERCEDES-BENZ       2213        781        2994       0.74        0.26 
## 10 CITROEN             1504       1147        2651       0.570       0.43 
## 11 KIA                 1110        386        1496       0.74        0.26 
## 12 MITSUBISHI            24        448         472       0.05        0.95 
## 13 SSANGYONG             32         15          47       0.68        0.32