Introduction

The dataset subject to our analysis includes retail sales of automobiles and commercial vehicles in Turkey based on brand in monthly basis. The dataset includes data from Jan’16 to Sep’19. The dataset incorporates 12 columns. Details of the column names & description is are represented below:

In order to perform a healthy analysis, the dataset has been adjusted in three aspects:

  1. Delete the invalid row that contains a disclaimer
  2. Delete the rows brand_name = “TOPLAM”
  3. Change of “ASTON MARTİN” to “ASTON MARTIN” in order to merge the two different brand names for the same brand.
setwd('/Users/serhansuer/Desktop/MEF/Data_Analytics_Essentials/ODD')
raw_data = read_rds('car_data_aggregate.rds')
raw_data = subset(raw_data,subset=brand_name != 'TOPLAM:')
raw_data$brand_name = str_replace(raw_data$brand_name,'ASTON MARTİN','ASTON MARTIN')
raw_data = subset(raw_data, subset=(!startsWith(brand_name, 'ODD')))
asd = select(raw_data, contains('name')) %>%
    arrange(brand_name)
distinct(asd)
## # A tibble: 49 x 1
##    brand_name         
##    <chr>              
##  1 ALFA ROMEO         
##  2 "ASTON MART\u0130N"
##  3 ASTON MARTIN       
##  4 AUDI               
##  5 BENTLEY            
##  6 BMW                
##  7 CHERY              
##  8 CITROEN            
##  9 DACIA              
## 10 DS                 
## # ... with 39 more rows

First Analysis

First analysis shows that the total sales of 2016, 2017 and 2018 by month.

raw_data %>% 
  mutate(month=with(raw_data,sprintf("%02d", month)))%>%
  filter(auto_total > 0 & comm_total > 0) %>%
  select(auto_total,comm_total,total_total,year,month) %>%
  group_by(month, year) %>%
  summarize(total = sum(total_total)) %>%
  arrange(desc(total))%>%
  ggplot(aes(x = month, y = total, 
  fill = as.character(year))) + geom_bar(stat = "identity", position = position_stack(reverse = TRUE)) + theme_minimal() + theme(legend.position = "right", legend.direction = "vertical",
  axis.text.x = element_text(vjust = 0.5, hjust = 0.5, size = 12)) +
  guides(fill=guide_legend(title="Year", reverse=TRUE)) + scale_y_continuous(labels = scales::comma) 

Second Analysis

Second analysis illustrates that the total car sales per year in percantage with a pie chart.

   raw_data %>% 
   filter(auto_total > 0 & comm_total > 0) %>%
   select(brand_name,auto_total,comm_total,total_total,year) %>%
   arrange(desc(year),desc(total_total)) %>%
   group_by(brand_name, year) %>%
   summarize(year_total = sum(total_total)) %>%
   arrange(desc(year_total)) %>%
   group_by(year) %>%
   summarize(year_total2 = sum(year_total)) %>%
   mutate(prop=percent(year_total2 / sum(year_total2))) %>%
   ggplot(data = ., aes(x="", y=prop, fill=as.character(year)))+ labs(x = "", y = "", title = "Total Car Sales per Year in Percantage") +
   geom_bar(width = 1, stat = "identity") + coord_polar("y", start=0) +
   theme(plot.title=element_text(hjust=0.5,face='bold',size=16)) +  
   theme_classic() + theme(axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank()) +
   geom_text(aes(label=prop),position=position_stack(vjust = 0.5)) +  guides(fill=guide_legend(title="Year"))