First we need to get our data from the website of Otomotiv Distrubutorleri Dernegi. My choosen data period is May 2017.
I had some problems with downloading data but I solved it by adding mode = “wb” statement after download files command. I found this solution from github_forums. In any case, original excel output still available in my repository and here is the link.
# Create a temporary file
tmp<-tempfile(fileext=".xlsx")
# Download file from repository to the temp file with mode statement
download.file("https://github.com/MEF-BDA503/pj18-kkyucel/blob/master/odd_retail_sales_2017_05.xlsx?raw=true",mode = "wb",destfile=tmp)
# Read that excel file using readxl package's read_excel function. You might need to adjust the parameters (skip, col_names) according to your raw file's format.
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
# I will remove the last two rows since they are empty by using slice
raw_data <- raw_data %>% slice(-c(43,44))
# Let's see how it is look like now
head(raw_data)
## # A tibble: 6 x 10
## X__1 X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 X__10
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO NA 24 24 NA NA 0 0 24 24
## 2 ASTON MARTIN NA 2 2 NA NA 0 0 2 2
## 3 AUDI NA 2109 2109 NA NA 0 0 2109 2109
## 4 BENTLEY NA 1 1 NA NA 0 0 1 1
## 5 BMW NA 1814 1814 NA NA 0 0 1814 1814
## 6 CHERY NA NA 0 NA NA 0 0 0 0
In order to make my data more civilized I will re-define column names suitable to work and I will remov N/A values. This way the file would be easier to work with.
# Standart column names will make the file easier to work with
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now it is time to remove N/A values from data and putting additional info as data date
car_data_may_17 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2017,month=5)
# Let's see if it is ok
print(car_data_may_17, width = Inf)
## # A tibble: 45 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 24 24 0 0 0
## 2 ASTON MARTIN 0 2 2 0 0 0
## 3 AUDI 0 2109 2109 0 0 0
## 4 BENTLEY 0 1 1 0 0 0
## 5 BMW 0 1814 1814 0 0 0
## 6 CHERY 0 0 0 0 0 0
## 7 CITROEN 0 1504 1504 0 1147 1147
## 8 DACIA 0 3356 3356 0 658 658
## 9 DS 0 5 5 0 0 0
## 10 FERRARI 0 0 0 0 0 0
## total_dom total_imp total_total year month
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 24 24 2017 5
## 2 0 2 2 2017 5
## 3 0 2109 2109 2017 5
## 4 0 1 1 2017 5
## 5 0 1814 1814 2017 5
## 6 0 0 0 2017 5
## 7 0 2651 2651 2017 5
## 8 0 4014 4014 2017 5
## 9 0 5 5 2017 5
## 10 0 0 0 2017 5
## # ... with 35 more rows
Now it is a good time save my civilized data. There are two different ways to save it, RDS or RData files.Difference between these two files is RDS can hold only one object while RData can hold more than one. Since I have only one data frame RDS would be enough.
saveRDS(car_data_may_17,file="D:/Data Analytics Esentials/week3/odd_assignment/odd_car_sales_data_may_17.rds")
# I can read this saved file anytime by using readRDS
# e.g
# rds_data <- readRDS("D:/Data Analytics Esentials/week3/odd_assignment/odd_car_sales_data_may_17.rds")
First I want to see which brands are selling import and domestic products in the same time and share of domestic product sales over their total sales. In order to serve the cause let’s list the brands with larger domestic sales first.
car_data_may_17 %>%
filter(total_dom > 0 & total_imp > 0) %>%
select(brand_name,total_dom,total_imp,total_total) %>%
mutate(Dom_Share = round(total_dom/total_total,2)) %>%
arrange(desc(Dom_Share))
## # A tibble: 9 x 5
## brand_name total_dom total_imp total_total Dom_Share
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 ISUZU 190 11 201 0.95
## 2 FIAT 9718 583 10301 0.94
## 3 HONDA 1630 660 2290 0.71
## 4 FORD 6546 4262 10808 0.61
## 5 RENAULT 6690 4898 11588 0.580
## 6 HYUNDAI 2588 2261 4849 0.53
## 7 TOPLAM: 30440 54982 85422 0.36
## 8 MITSUBISHI 10 462 472 0.02
## 9 PEUGEOT 2 3616 3618 0
Now I am corious about if there are any brands selling automobiles and commercial vehicles in the same time and what is the volume of commercial vehicles in their total sales. Propably it would make more sense the see higger selling brands first.
car_data_may_17 %>%
filter(auto_total > 0 & comm_total > 0) %>%
select(brand_name,auto_total,comm_total,total_total) %>%
mutate(Auto_Volume = round(auto_total/total_total,2), Comm_Volume = round(comm_total/total_total,2)) %>%
arrange(desc(total_total))
## # A tibble: 13 x 6
## brand_name auto_total comm_total total_total Auto_Volume Comm_Volume
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 TOPLAM: 65799 19623 85422 0.77 0.23
## 2 RENAULT 10320 1268 11588 0.89 0.11
## 3 FORD 4516 6292 10808 0.42 0.580
## 4 FIAT 6512 3789 10301 0.63 0.37
## 5 HYUNDAI 4492 357 4849 0.93 0.07
## 6 DACIA 3356 658 4014 0.84 0.16
## 7 PEUGEOT 2651 967 3618 0.73 0.27
## 8 NISSAN 2835 170 3005 0.94 0.06
## 9 MERCEDES-BENZ 2213 781 2994 0.74 0.26
## 10 CITROEN 1504 1147 2651 0.570 0.43
## 11 KIA 1110 386 1496 0.74 0.26
## 12 MITSUBISHI 24 448 472 0.05 0.95
## 13 SSANGYONG 32 15 47 0.68 0.32