The dataset subject to our analysis includes retail sales of automobiles and commercial vehicles in Turkey based on brand in monthly basis. The dataset includes data from Jan’16 to Sep’19. The dataset incorporates 12 columns. Details of the column names & description is are represented below: . brand_name: Name of the brand . auto_dom : Sales amount of domestic automobiles . auto_imp : Sales amount of imported automobiles . auto_total : Total sales amount of automobiles . comm_dom : Sales amount of domestic commercial vehicles . comm_imp : Sales amount of imported commercial vehicles . comm_total : Total sales amount of commercial vehicles . total_dom : Total sales of all domestic product vehicles . total_imp : Total sales of all imported vehicles . total_total: Total sales amount of all vehicles . year : Year . month : Month
In order to perform a healthy analysis, the dataset has been adjusted in three aspects: 1. Delete the invalid row that contains a disclaimer 2. Delete the rows brand_name = “TOPLAM” 3. Change of “ASTON MARTIN” to “ASTON MARTIN” in order to merge the two different brand names for the same brand.
setwd('C:/Ufuk/BDA_503_Data_Analytics_Essentials/Homeworks/ODD')
raw_data = read_rds("car_data_aggregate.rds")
raw_data = subset(raw_data,subset=brand_name != 'TOPLAM:' & !startsWith(brand_name, 'ODD'))
raw_data$brand_name=str_replace(raw_data$brand_name, "MARTIN", "MARTIN")
raw_data[is.na(raw_data)] <- 0
head(raw_data)
## # A tibble: 6 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 13 13 0 0 0
## 2 ASTON MAR~ 0 2 2 0 0 0
## 3 AUDI 0 350 350 0 0 0
## 4 BENTLEY 0 0 0 0 0 0
## 5 BMW 0 158 158 0 0 0
## 6 CITROEN 0 134 134 0 197 197
## # ... with 5 more variables: total_dom <dbl>, total_imp <dbl>,
## # total_total <dbl>, year <dbl>, month <dbl>
raw_data %>%
filter(auto_total > 0 & comm_total > 0) %>%
select(brand_name,auto_total,comm_total,total_total,year) %>%
arrange(desc(year),desc(total_total)) %>%
group_by(brand_name, year) %>%
summarize(year_total = sum(total_total)) %>%
arrange(desc(year_total))%>%
ggplot(data = ., aes(x = brand_name, y = year_total,
fill = as.character(year))) + geom_bar(stat = "identity") + aes(x = reorder(brand_name, -year_total),
y = year_total) + labs(x = "", y = "", title = "Total Sales per Brand") + theme_bw() + theme( axis.text.x = element_text(angle = 45,
vjust = 0.5, hjust = 0.5, size = 12)) + scale_y_continuous(labels = scales::comma) + guides(fill=guide_legend(title="Year", reverse=TRUE))
Renault has the highest sales in overall, followed by Fiat, Ford and Volkswagen. With a quick look, 2018 sales has a decrease in general which might be as a result of current economic crisis and lack of special consumption tax in prior years.
raw_data %>%
filter(auto_total > 0 & comm_total > 0) %>%
select(brand_name,auto_total,comm_total,total_total,year) %>%
arrange(desc(year),desc(total_total)) %>%
filter(year == "2018") %>%
group_by(brand_name) %>%
summarize(year_total = sum(total_total)) %>%
arrange(desc(year_total)) %>%
top_n(5) %>%
mutate(prop=percent(year_total / sum(year_total))) %>%
ggplot(data = ., aes(x="", y=prop, fill=brand_name))+ labs(x = "", y = "", title = "Top 5 Car Sales per Brand in 2018") +
geom_bar(width = 1, stat = "identity") + coord_polar("y", start=0) +
theme(plot.title=element_text(hjust=0.5,face='bold',size=16)) +
theme_classic() + theme(axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank()) + geom_text(aes(label=prop),position=position_stack(vjust = 0.5)) + guides(fill=guide_legend(title="Brand Name"))
## Selecting by year_total
When we focus on top 5 companies of total sales 2018, we can conclude that Volkswagen has a catch up in 2018. While it has the 4th ranking in 2016-2018 total sales, shares the 2nd place with Ford in 2018.