We were assumed to download data from website of Otomotiv Distribütörleri Derneği. I prefered to work with December 2017 data.
After downloading and renaimng the file, i created temporary file and put the data there. When reading part finished, it’s easy to remove temp file.
# Create a temporary file
tmp<-tempfile(fileext=".xlsx")
# Download file from repository to the temp file with mode statement
download.file("https://github.com/MEF-BDA503/pj18-ukalender/blob/master/odd_retail_sales_2017_12.xlsx?raw=true",mode = "wb",destfile=tmp)
# Read that excel file using readxl package's read_excel function. You might need to adjust the parameters (skip, col_names) according to your raw file's format.
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
library(tidyverse)
# Remove the last two rows because they are irrelevant (total and empty rows)
raw_data <- raw_data %>% slice(-c(46, 47))
# Let's see our raw data
head(raw_data)
## # A tibble: 6 x 10
## X__1 X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 X__10
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO NA 27 27 NA NA 0 0 27 27
## 2 ASTON MARTIN NA 7 7 NA NA 0 0 7 7
## 3 AUDI NA 3124 3124 NA NA 0 0 3124 3124
## 4 BENTLEY NA 4 4 NA NA 0 0 4 4
## 5 BMW NA 3498 3498 NA NA 0 0 3498 3498
## 6 CHERY NA NA 0 NA NA 0 0 0 0
Naming columns and fixing NA values are required steps to have more civilized data.
# Standart column names will make the file easier to work with
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now it is time to remove N/A values from data and putting additional info as data date
car_data_dec_17 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2017,month=12)
# Let's see if it is ok
print(car_data_dec_17, width = Inf)
## # A tibble: 45 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 27 27 0 0 0
## 2 ASTON MARTIN 0 7 7 0 0 0
## 3 AUDI 0 3124 3124 0 0 0
## 4 BENTLEY 0 4 4 0 0 0
## 5 BMW 0 3498 3498 0 0 0
## 6 CHERY 0 0 0 0 0 0
## 7 CITROEN 0 655 655 0 694 694
## 8 DACIA 0 6685 6685 0 764 764
## 9 DS 0 30 30 0 0 0
## 10 FERRARI 0 3 3 0 0 0
## total_dom total_imp total_total year month
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 27 27 2017 12
## 2 0 7 7 2017 12
## 3 0 3124 3124 2017 12
## 4 0 4 4 2017 12
## 5 0 3498 3498 2017 12
## 6 0 0 0 2017 12
## 7 0 1349 1349 2017 12
## 8 0 7449 7449 2017 12
## 9 0 30 30 2017 12
## 10 0 3 3 2017 12
## # ... with 35 more rows
saveRDS(car_data_dec_17,file="C:/Users/kalenderoglu/Desktop/BDA_assignments/odd_car_sales_data_dec_17.rds")
First of all, i want to calculate brand shares (in quantity) in total and then rank brands in descending order.
car_data_dec_17 %>%
select(brand_name,total_total) %>%
mutate_if(is.numeric, funs(round(./sum(.), 2))) %>%
arrange(desc(total_total))
## # A tibble: 45 x 2
## brand_name total_total
## <chr> <dbl>
## 1 RENAULT 0.14
## 2 FIAT 0.13
## 3 FORD 0.12
## 4 VOLKSWAGEN 0.12
## 5 DACIA 0.05
## 6 HYUNDAI 0.05
## 7 PEUGEOT 0.05
## 8 TOYOTA 0.05
## 9 NISSAN 0.04
## 10 BMW 0.03
## # ... with 35 more rows
As we’ve seen, top 4 brands -Renault, Fiat, Ford and Walkswagen- have so close market shares in terms of quantity and they make almost half of the total auto and comm market.
As a last step, it would be nice to get split of domestic sales for brands which have more than 1000 sales in total.
car_data_dec_17 %>%
filter(total_total > 1000 ) %>%
select(brand_name,total_dom,total_imp,total_total) %>%
mutate(dom_share = round(total_dom/total_total,2), imp_share = round(1 - dom_share,2)) %>%
arrange(desc(dom_share))
## # A tibble: 18 x 6
## brand_name total_dom total_imp total_total dom_share imp_share
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FIAT 16768 1424 18192 0.92 0.08
## 2 TOYOTA 5851 1609 7460 0.78 0.22
## 3 FORD 12348 4516 16864 0.73 0.27
## 4 HONDA 2638 1145 3783 0.7 0.3
## 5 RENAULT 11420 7634 19054 0.6 0.4
## 6 HYUNDAI 3068 3586 6654 0.46 0.54
## 7 AUDI 0 3124 3124 0 1
## 8 BMW 0 3498 3498 0 1
## 9 CITROEN 0 1349 1349 0 1
## 10 DACIA 0 7449 7449 0 1
## 11 KIA 0 1642 1642 0 1
## 12 MERCEDES-BENZ 0 4321 4321 0 1
## 13 NISSAN 0 5642 5642 0 1
## 14 OPEL 0 4594 4594 0 1
## 15 PEUGEOT 0 6243 6243 0 1
## 16 SEAT 0 1363 1363 0 1
## 17 SKODA 0 3626 3626 0 1
## 18 VOLKSWAGEN 0 16293 16293 0 1
It seems that sales of Fiat is driven by domestic production while brands like Audi, BMW, Citroen and Dacia are fully import dependent.