From pieces to whole

First part of this assignment was to create a sub-analysis and a part of the whole data consist of one month per person. After finishing the this process data merged and you can find it here.

Let’s start

I will use readRDS command to start my analysis by reading the merged data

# Since this data is in my local drive I provided a local path to it
cars <- readRDS("D:/Data Analytics Esentials/week3/odd_assignment/car_data_aggregate.rds")

Aspect of my analysis

Since there are a lot of car brands existing in the market analyzing all of them might complicate things. So let’s start with identifying most important ones. I will take top 5 largest import car selling brands as of last year end.

##Top Import Car 5 Sellers in last year end##
top_sellers <- cars %>%
  filter(auto_total > 0 & brand_name != "TOPLAM:" & year == 2017 & month == 12) %>%
  select(brand_name, auto_imp) %>%
  mutate(market_share = percent(auto_imp / sum(auto_imp))) %>%
  arrange(desc(auto_imp)) %>%
  top_n(n = 5, wt = auto_imp)
## Let's see if it worked
head(top_sellers)
## # A tibble: 6 x 3
##   brand_name auto_imp market_share
##   <chr>         <dbl> <chr>       
## 1 VOLKSWAGEN    13003 19.1%       
## 2 DACIA          6685 9.8%        
## 3 NISSAN         5225 7.7%        
## 4 RENAULT        4820 7.1%        
## 5 OPEL           4594 6.7%        
## 6 PEUGEOT        4594 6.7%

Let’s try a pie chart for top 5 import seller

ggplot(data = top_sellers, aes(x = "", y = auto_imp, fill = brand_name)) + geom_bar(width = 1, stat = "identity") +
  theme(legend.position="bottom", axis.title.x=element_blank(), axis.text.x=element_blank(), axis.ticks.x=element_blank()) +
  coord_polar("y", start=0)

Import car sales

As you already saw I started by identifying top 5 import car saler because who doesn’t like import car? Anyway let’s continue our analysis by shaping main data in a way that we can use easier.

##If a company did not sell any cars we don't need it, also total lines are not necessary since our analysis is based on individual brands##
top_auto <- cars %>%
  filter(auto_total > 0 & brand_name != "TOPLAM:") %>%
  select(brand_name, auto_dom, auto_imp, auto_total, month, year) %>%
## Our data has a monthly frequency but I am adding a day column for formatting puposes
  mutate(day = 01, imp_ratio = round(auto_imp/auto_total,2)) %>%
  arrange(desc(year, month))
## Is it working?
head(top_auto)
## # A tibble: 6 x 8
##   brand_name   auto_dom auto_imp auto_total month  year   day imp_ratio
##   <chr>           <dbl>    <dbl>      <dbl> <dbl> <dbl> <dbl>     <dbl>
## 1 ALFA ROMEO          0       13         13     9  2018     1         1
## 2 ASTON MARTIN        0        2          2     9  2018     1         1
## 3 AUDI                0      350        350     9  2018     1         1
## 4 BMW                 0      158        158     9  2018     1         1
## 5 CITROEN             0      134        134     9  2018     1         1
## 6 DACIA               0     1141       1141     9  2018     1         1

Joining forces

Now I have two tables. First one is telling me which brands should I check in the second one. I could just put that brands in the filtering line but for the sake of laziness let’s join them togather. Maybe we need to change the criteria for our analysis.

##using inner_join will allow me to alter my analysis by only changing a simple filter in the first table##
mydata <- inner_join(top_sellers, top_auto, by =c("brand_name"))
## Time to check
head(mydata)
## # A tibble: 6 x 10
##   brand_name auto_imp.x market_share auto_dom auto_imp.y auto_total month
##   <chr>           <dbl> <chr>           <dbl>      <dbl>      <dbl> <dbl>
## 1 VOLKSWAGEN      13003 19.1%               0       1763       1763     9
## 2 VOLKSWAGEN      13003 19.1%               0       2567       2567     8
## 3 VOLKSWAGEN      13003 19.1%               0       4217       4217     7
## 4 VOLKSWAGEN      13003 19.1%               0       4428       4428     6
## 5 VOLKSWAGEN      13003 19.1%               0       5913       5913     5
## 6 VOLKSWAGEN      13003 19.1%               0       5801       5801     4
## # ... with 3 more variables: year <dbl>, day <dbl>, imp_ratio <dbl>

Final preparations

Did you remember I added an fictional column for days? It is time to use it because I will use historic information in my analysis and it needs to be in data format. We will use lubridate library for this one.

plot <- mydata %>% rowwise() %>%
  mutate(date = lubridate::as_date(paste(year, as.integer(month), as.integer(day), sep = "-")))

Let’s make a line graph

We have everything that we needed to analyze this data. Best way to do it is to convert it a line graph.

ggplot(data = plot, aes(x = date, y = auto_imp.y, color = brand_name)) + geom_line(size = 1) +
  ggtitle("Sales trend of top 5 import seller") + scale_x_date(breaks = pretty_breaks(10)) +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) + theme(legend.position="bottom")

Everything is set

We started with a month of information and now have almost 3 years of data in a compact graph. We can conclude following results from above graph:

1- It is clear that all 5 of the companies reach their peak in each December. I think it is safe to say these are campaign impacts in order to get rid of the old stock.

2- Altough there are some visible differences in the monthly pattern we can say these brands are performing very similar MoM

3- Volkswagen is selling a lot! Of course this is an expected situation due to current quality perception of the brand. On the other hand, in the campaign months, they able to increase the gap and widen their market share hugely