Brief & Preparation

The dataset subject to our analysis includes retail sales of automobiles and commercial vehicles in Turkey based on brand in monthly basis. The dataset includes data from Jan’16 to Sep’19. The dataset incorporates 12 columns. Details of the column names & description is are represented below:

In order to perform a healthy analysis, the dataset has been adjusted in three aspects:

1.  Delete the invalid row that contains a disclaimer
2.  Delete the rows brand_name = "TOPLAM" 
3.  Change of "ASTON MARTIN" to "ASTON MARTIN" in order to merge the two different brand names for the same brand.

`

setwd('C:/Users/win7pro/Desktop/MEF - Big Data/Fall 1/503 - Data Analytics Essentials/Assignments/A4/ODD')
raw_data = read_rds("car_data_aggregate.rds")
raw_data = subset(raw_data,subset=brand_name != 'TOPLAM:' & !startsWith(brand_name, "ODD"))
raw_data$brand_name = str_replace(raw_data$brand_name, "MARTIN", "MARTIN")
#raw_data = raw_data %>% slice(-46)
raw_data[is.na(raw_data)] <- 0
head(raw_data)
## # A tibble: 6 x 12
##   brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
##   <chr>         <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>
## 1 ALFA ROMEO        0       13         13        0        0          0
## 2 ASTON MAR~        0        2          2        0        0          0
## 3 AUDI              0      350        350        0        0          0
## 4 BENTLEY           0        0          0        0        0          0
## 5 BMW               0      158        158        0        0          0
## 6 CITROEN           0      134        134        0      197        197
## # ... with 5 more variables: total_dom <dbl>, total_imp <dbl>,
## #   total_total <dbl>, year <dbl>, month <dbl>

ANALYSIS I:

To start with, let’s check the yearly total sales of the total market:

raw_data %>%
  mutate(month_year=with(data=raw_data,sprintf("%d-%02d", year, month)))%>%
  group_by(brand_name,year)%>%
  summarize(total=sum(total_total))%>%
  ggplot(data = ., aes(x = year, y = total, fill = as.character(year))) + geom_bar(stat = "identity")+
  labs(x = "", y = "", title = "Car Sales Analysis") + theme_minimal() + theme( axis.text.x = element_text(angle = 90, 
  vjust = 0.5, hjust = 0.5, size = 12))+ scale_y_continuous(labels = scales::comma) +
  guides(fill=guide_legend(title="Year"))

It can be concluded that total car sales in Turkey has been decreasing yearly between the years 2016 & 2018. Although we have only 9 months’ data of 2018, it can be said that the trend is still decreasing for 2018 if we make a linear forecast of the whole year based on the 9 months’ actual trend.

ANALYSIS II:

Then, let’s look at the top 5 brands based on total sales performance in 2017:

raw_data %>% 
    filter(auto_total > 0 & comm_total > 0) %>%
    select(brand_name,auto_total,comm_total,total_total,year) %>%
    arrange(desc(year),desc(total_total)) %>%
    filter(brand_name != "TOPLAM:") %>% 
    filter(brand_name != "ODD*") %>%
    filter(year == "2017") %>%  
    group_by(brand_name) %>%
    summarize(year_total = sum(total_total)) %>%
    arrange(desc(year_total)) %>%
    top_n(5) %>% 
    mutate(prop=percent(year_total / sum(year_total))) %>%
    ggplot(data = ., aes(x="", y=prop, fill=brand_name))+ labs(x = "", y = "", title = "Top 5 Car Sales per Brand in 2017") +  
    geom_bar(width = 1, stat = "identity") + coord_polar("y", start=0) +
    theme(plot.title=element_text(hjust=0.5,face='bold',size=16)) +  
    theme_classic() + theme(axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank()) +
    geom_text(aes(label=prop),position=position_stack(vjust = 0.5)) +         guides(fill=guide_legend(title="Brand Name")) 
## Selecting by year_total