From Raw to Civilized Data

We were assumed to download data from website of Otomotiv Distribütörleri Derneği. I prefered to work with December 2017 data.

Download Raw Data

After downloading and renaimng the file, i created temporary file and put the data there. When reading part finished, it’s easy to remove temp file.

# Create a temporary file
tmp<-tempfile(fileext=".xlsx")
# Download file from repository to the temp file with mode statement
download.file("https://github.com/MEF-BDA503/pj18-ukalender/blob/master/odd_retail_sales_2017_12.xlsx?raw=true",mode = "wb",destfile=tmp)
# Read that excel file using readxl package's read_excel function. You might need to adjust the parameters (skip, col_names) according to your raw file's format.
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
library(tidyverse)
# Remove the last two rows because they are irrelevant (total and empty rows)
raw_data <- raw_data %>% slice(-c(46, 47))

# Let's see our raw data
head(raw_data)
## # A tibble: 6 x 10
##   X__1          X__2  X__3  X__4  X__5  X__6  X__7  X__8  X__9 X__10
##   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO      NA    27    27    NA    NA     0     0    27    27
## 2 ASTON MARTIN    NA     7     7    NA    NA     0     0     7     7
## 3 AUDI            NA  3124  3124    NA    NA     0     0  3124  3124
## 4 BENTLEY         NA     4     4    NA    NA     0     0     4     4
## 5 BMW             NA  3498  3498    NA    NA     0     0  3498  3498
## 6 CHERY           NA    NA     0    NA    NA     0     0     0     0

Make Data Civilized

Naming columns and fixing NA values are required steps to have more civilized data.

# Standart column names will make the file easier to work with 
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now it is time to remove N/A values from data and putting additional info as data date
car_data_dec_17 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2017,month=12)
# Let's see if it is ok
print(car_data_dec_17, width = Inf)
## # A tibble: 45 x 12
##    brand_name   auto_dom auto_imp auto_total comm_dom comm_imp comm_total
##    <chr>           <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>
##  1 ALFA ROMEO          0       27         27        0        0          0
##  2 ASTON MARTIN        0        7          7        0        0          0
##  3 AUDI                0     3124       3124        0        0          0
##  4 BENTLEY             0        4          4        0        0          0
##  5 BMW                 0     3498       3498        0        0          0
##  6 CHERY               0        0          0        0        0          0
##  7 CITROEN             0      655        655        0      694        694
##  8 DACIA               0     6685       6685        0      764        764
##  9 DS                  0       30         30        0        0          0
## 10 FERRARI             0        3          3        0        0          0
##    total_dom total_imp total_total  year month
##        <dbl>     <dbl>       <dbl> <dbl> <dbl>
##  1         0        27          27  2017    12
##  2         0         7           7  2017    12
##  3         0      3124        3124  2017    12
##  4         0         4           4  2017    12
##  5         0      3498        3498  2017    12
##  6         0         0           0  2017    12
##  7         0      1349        1349  2017    12
##  8         0      7449        7449  2017    12
##  9         0        30          30  2017    12
## 10         0         3           3  2017    12
## # ... with 35 more rows
saveRDS(car_data_dec_17,file="C:/Users/kalenderoglu/Desktop/BDA_assignments/odd_car_sales_data_dec_17.rds")

My Analysis on the Data

First of all, i want to calculate brand shares (in quantity) in total and then rank brands in descending order.

car_data_dec_17 %>% 
  select(brand_name,total_total) %>%
  mutate_if(is.numeric, funs(round(./sum(.), 2))) %>%
arrange(desc(total_total))
## # A tibble: 45 x 2
##    brand_name total_total
##    <chr>            <dbl>
##  1 RENAULT           0.14
##  2 FIAT              0.13
##  3 FORD              0.12
##  4 VOLKSWAGEN        0.12
##  5 DACIA             0.05
##  6 HYUNDAI           0.05
##  7 PEUGEOT           0.05
##  8 TOYOTA            0.05
##  9 NISSAN            0.04
## 10 BMW               0.03
## # ... with 35 more rows

As we’ve seen, top 4 brands -Renault, Fiat, Ford and Walkswagen- have so close market shares in terms of quantity and they make almost half of the total auto and comm market.

As a last step, it would be nice to get split of domestic sales for brands which have more than 1000 sales in total.

car_data_dec_17 %>% 
  filter(total_total > 1000 ) %>%
  select(brand_name,total_dom,total_imp,total_total) %>%
  mutate(dom_share = round(total_dom/total_total,2), imp_share = round(1 - dom_share,2)) %>%
  arrange(desc(dom_share))
## # A tibble: 18 x 6
##    brand_name    total_dom total_imp total_total dom_share imp_share
##    <chr>             <dbl>     <dbl>       <dbl>     <dbl>     <dbl>
##  1 FIAT              16768      1424       18192      0.92      0.08
##  2 TOYOTA             5851      1609        7460      0.78      0.22
##  3 FORD              12348      4516       16864      0.73      0.27
##  4 HONDA              2638      1145        3783      0.7       0.3 
##  5 RENAULT           11420      7634       19054      0.6       0.4 
##  6 HYUNDAI            3068      3586        6654      0.46      0.54
##  7 AUDI                  0      3124        3124      0         1   
##  8 BMW                   0      3498        3498      0         1   
##  9 CITROEN               0      1349        1349      0         1   
## 10 DACIA                 0      7449        7449      0         1   
## 11 KIA                   0      1642        1642      0         1   
## 12 MERCEDES-BENZ         0      4321        4321      0         1   
## 13 NISSAN                0      5642        5642      0         1   
## 14 OPEL                  0      4594        4594      0         1   
## 15 PEUGEOT               0      6243        6243      0         1   
## 16 SEAT                  0      1363        1363      0         1   
## 17 SKODA                 0      3626        3626      0         1   
## 18 VOLKSWAGEN            0     16293       16293      0         1

It seems that sales of Fiat is driven by domestic production while brands like Audi, BMW, Citroen and Dacia are fully import dependent.