The dataset subject to our analysis includes retail sales of automobiles and commercial vehicles in Turkey based on brand in monthly basis. The dataset includes data from Jan’16 to Sep’19. The dataset incorporates 12 columns. Details of the column names & description is are represented below:
• brand_name: Name of the brand
• auto_dom : Sales amount of domestic automobiles
• auto_imp : Sales amount of imported automobiles
• auto_total : Total sales amount of automobiles
• comm_dom : Sales amount of domestic commercial vehicles
• comm_imp : Sales amount of imported commercial vehicles
• comm_total : Total sales amount of commercial vehicles
• total_dom : Total sales of all domestic product vehicles
• total_imp : Total sales of all imported vehicles
• total_total: Total sales amount of all vehicles
• year : Year
• month : Month
In order to perform a healthy analysis, the dataset has been adjusted in three aspects:
1. Delete the invalid row that contains a disclaimer
2. Delete the rows brand_name = “TOPLAM”
3. Change of “ASTON MARTİN” to “ASTON MARTIN” in order to merge the two different brand names for the same brand.
setwd('/home/mustafa/R/odd')
raw_data = read_rds("car_data_aggregate.rds")
raw_data = subset(raw_data,subset=brand_name != 'TOPLAM:' &!startsWith(brand_name,"ODD"))
raw_data$brand_name=str_replace(raw_data$brand_name, "ASTON MARTİN", "ASTON MARTIN")
#raw_data = raw_data %>% slice(-46)
raw_data[is.na(raw_data)] <- 0
head(raw_data)
## # A tibble: 6 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 13 13 0 0 0
## 2 ASTON MAR… 0 2 2 0 0 0
## 3 AUDI 0 350 350 0 0 0
## 4 BENTLEY 0 0 0 0 0 0
## 5 BMW 0 158 158 0 0 0
## 6 CITROEN 0 134 134 0 197 197
## # ... with 5 more variables: total_dom <dbl>, total_imp <dbl>,
## # total_total <dbl>, year <dbl>, month <dbl>
When we check the monthly seasonality of top 12 brands we can say that car sales increase at spring and at last two months. As the amount of sales increases the fluctuations increase. But fluctuations for the last two months looks different, for further analysis we should check brands pricing policies.
tops= raw_data%>% select(total_total,brand_name) %>%
group_by(brand_name) %>%
summarize(total = sum(total_total)) %>%
arrange(desc(total))%>%
top_n(12)%>%
select(brand_name)
## Selecting by total
raw_data%>%
mutate(month=with(raw_data,sprintf("%02d", month)))%>%
#filter(brand_name %in% c("MERCEDES-BENZ", "FIAT", "OPEL","FORD", "HYUNDAI","VOLKSWAGEN","DACIA","TOYOTA","RENAULT")) %>%
select(total_total,month,brand_name) %>%
filter(brand_name %in% tops$brand_name) %>%
group_by(brand_name, month) %>%
summarize(total=sum(total_total))%>%
ggplot(aes(x = month, y = total,
fill = month)) + geom_bar(stat = "identity") + theme_minimal() + theme(legend.position = "right", legend.direction = "vertical" )+facet_wrap(facets=. ~ brand_name)+xlab("")
When we look at the top 5 brands yearly share, compared to 2016 in 2017 Volkswagen sold 60K less cars, while FIAT selling 15k more and RENAULT selling 9K more and the others staying same. For 2018 FIAT will possibly sell a little more then Volkswagen and all of them will be selling less then 2017 and 2016.
tops2= raw_data%>% select(total_total,brand_name) %>%
group_by(brand_name) %>%
summarize(total = sum(total_total)) %>%
arrange(desc(total))%>%
top_n(5)%>%
select(brand_name)
## Selecting by total
raw_data %>%
filter(brand_name %in% tops2$brand_name) %>%
#filter(year=='2018') %>%
group_by(brand_name,year) %>%
summarize(total = sum(total_total)/1000) %>%
ggplot(data = ., aes(x="", y=total, fill=brand_name))+ labs(x = "", y = "", title = "Top 5 Brand Car Sales Share Per Brand Per Year (1k Cars)") +
theme(axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank()) +
geom_bar(width = 1, stat = "identity") + geom_text(aes(label=as.integer(total)),position=position_stack(vjust = 0.5))+coord_polar("y", start=0) + facet_grid(facets=. ~ year)+guides(fill=guide_legend(title="Brand Name"))