First part of this assignment was to create a sub-analysis and a part of the whole data consist of one month per person. After finishing the this process data merged and you can find it here.
I will use readRDS command to start my analysis by reading the merged data
# Since this data is in my local drive I provided a local path to it
cars <- readRDS("D:/Data Analytics Esentials/week3/odd_assignment/car_data_aggregate.rds")
Since there are a lot of car brands existing in the market analyzing all of them might complicate things. So let’s start with identifying most important ones. I will take top 5 largest import car selling brands as of last year end.
##Top Import Car 5 Sellers in last year end##
top_sellers <- cars %>%
filter(auto_total > 0 & brand_name != "TOPLAM:" & year == 2017 & month == 12) %>%
select(brand_name, auto_imp) %>%
mutate(market_share = percent(auto_imp / sum(auto_imp))) %>%
arrange(desc(auto_imp)) %>%
top_n(n = 5, wt = auto_imp)
## Let's see if it worked
head(top_sellers)
## # A tibble: 6 x 3
## brand_name auto_imp market_share
## <chr> <dbl> <chr>
## 1 VOLKSWAGEN 13003 19.1%
## 2 DACIA 6685 9.8%
## 3 NISSAN 5225 7.7%
## 4 RENAULT 4820 7.1%
## 5 OPEL 4594 6.7%
## 6 PEUGEOT 4594 6.7%
ggplot(data = top_sellers, aes(x = "", y = auto_imp, fill = brand_name)) + geom_bar(width = 1, stat = "identity") +
theme(legend.position="bottom", axis.title.x=element_blank(), axis.text.x=element_blank(), axis.ticks.x=element_blank()) +
coord_polar("y", start=0)
As you already saw I started by identifying top 5 import car saler because who doesn’t like import car? Anyway let’s continue our analysis by shaping main data in a way that we can use easier.
##If a company did not sell any cars we don't need it, also total lines are not necessary since our analysis is based on individual brands##
top_auto <- cars %>%
filter(auto_total > 0 & brand_name != "TOPLAM:") %>%
select(brand_name, auto_dom, auto_imp, auto_total, month, year) %>%
## Our data has a monthly frequency but I am adding a day column for formatting puposes
mutate(day = 01, imp_ratio = round(auto_imp/auto_total,2)) %>%
arrange(desc(year, month))
## Is it working?
head(top_auto)
## # A tibble: 6 x 8
## brand_name auto_dom auto_imp auto_total month year day imp_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 13 13 9 2018 1 1
## 2 ASTON MARTIN 0 2 2 9 2018 1 1
## 3 AUDI 0 350 350 9 2018 1 1
## 4 BMW 0 158 158 9 2018 1 1
## 5 CITROEN 0 134 134 9 2018 1 1
## 6 DACIA 0 1141 1141 9 2018 1 1
Now I have two tables. First one is telling me which brands should I check in the second one. I could just put that brands in the filtering line but for the sake of laziness let’s join them togather. Maybe we need to change the criteria for our analysis.
##using inner_join will allow me to alter my analysis by only changing a simple filter in the first table##
mydata <- inner_join(top_sellers, top_auto, by =c("brand_name"))
## Time to check
head(mydata)
## # A tibble: 6 x 10
## brand_name auto_imp.x market_share auto_dom auto_imp.y auto_total month
## <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 VOLKSWAGEN 13003 19.1% 0 1763 1763 9
## 2 VOLKSWAGEN 13003 19.1% 0 2567 2567 8
## 3 VOLKSWAGEN 13003 19.1% 0 4217 4217 7
## 4 VOLKSWAGEN 13003 19.1% 0 4428 4428 6
## 5 VOLKSWAGEN 13003 19.1% 0 5913 5913 5
## 6 VOLKSWAGEN 13003 19.1% 0 5801 5801 4
## # ... with 3 more variables: year <dbl>, day <dbl>, imp_ratio <dbl>
Did you remember I added an fictional column for days? It is time to use it because I will use historic information in my analysis and it needs to be in data format. We will use lubridate library for this one.
plot <- mydata %>% rowwise() %>%
mutate(date = lubridate::as_date(paste(year, as.integer(month), as.integer(day), sep = "-")))
We have everything that we needed to analyze this data. Best way to do it is to convert it a line graph.
ggplot(data = plot, aes(x = date, y = auto_imp.y, color = brand_name)) + geom_line(size = 1) +
ggtitle("Sales trend of top 5 import seller") + scale_x_date(breaks = pretty_breaks(10)) +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) + theme(legend.position="bottom")
We started with a month of information and now have almost 3 years of data in a compact graph. We can conclude following results from above graph:
1- It is clear that all 5 of the companies reach their peak in each December. I think it is safe to say these are campaign impacts in order to get rid of the old stock.
2- Altough there are some visible differences in the monthly pattern we can say these brands are performing very similar MoM
3- Volkswagen is selling a lot! Of course this is an expected situation due to current quality perception of the brand. On the other hand, in the campaign months, they able to increase the gap and widen their market share hugely