First I created a sub-analysis and a 2016 March part of the whole data set. After finishing the this process whole data,between 2016-2018, merged and I did some analysis on the whole data.
I used readRDS command to start my analysis by reading the merged data which is in my local drive
# My local path
explanatory <- readRDS("home/murat/Masaüstü/pj18-mrtgocer/files/car_data_aggregate.rds")
I started with top 5 largest import car selling brands as of last year end.
##Top Import Car 5 Sellers in last year end##
top_sellers <- cars %>%
filter(auto_total > 0 & brand_name != "TOPLAM:" & year == 2017 & month == 12) %>%
select(brand_name, auto_imp) %>%
mutate(market_share = percent(auto_imp / sum(auto_imp))) %>%
arrange(desc(auto_imp)) %>%
top_n(n = 5, wt = auto_imp)
## Let's see if it worked
head(top_sellers)
## # A tibble: 6 x 3
## brand_name auto_imp market_share
## <chr> <dbl> <chr>
## 1 VOLKSWAGEN 13003 19.1%
## 2 DACIA 6685 9.8%
## 3 NISSAN 5225 7.7%
## 4 RENAULT 4820 7.1%
## 5 OPEL 4594 6.7%
## 6 PEUGEOT 4594 6.7%
I continued my analysis by shaping main data in a way that we can use easier.
##I filtered the companies whose sells are more than 0##
top_auto <- cars %>%
filter(auto_total > 0 & brand_name != "TOPLAM:") %>%
select(brand_name, auto_dom, auto_imp, auto_total, month, year) %>%
## The data has a monthly frequency
mutate(day = 01, imp_ratio = round(auto_imp/auto_total,2)) %>%
arrange(desc(year, month))
## I checked if it is working
head(top_auto)
## # A tibble: 6 x 8
## brand_name auto_dom auto_imp auto_total month year day imp_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 13 13 9 2018 1 1
## 2 ASTON MARTIN 0 2 2 9 2018 1 1
## 3 AUDI 0 350 350 9 2018 1 1
## 4 BMW 0 158 158 9 2018 1 1
## 5 CITROEN 0 134 134 9 2018 1 1
## 6 DACIA 0 1141 1141 9 2018 1 1
Now I joined the tables.
mydata <- inner_join(top_sellers, top_auto, by =c("brand_name"))
## Checking
head(mydata)
## # A tibble: 6 x 10
## brand_name auto_imp.x market_share auto_dom auto_imp.y auto_total month
## <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 VOLKSWAGEN 13003 19.1% 0 1763 1763 9
## 2 VOLKSWAGEN 13003 19.1% 0 2567 2567 8
## 3 VOLKSWAGEN 13003 19.1% 0 4217 4217 7
## 4 VOLKSWAGEN 13003 19.1% 0 4428 4428 6
## 5 VOLKSWAGEN 13003 19.1% 0 5913 5913 5
## 6 VOLKSWAGEN 13003 19.1% 0 5801 5801 4
## # ... with 3 more variables: year <dbl>, day <dbl>, imp_ratio <dbl>
I used lubridate library for histroical information in the analysis.
plot <- mydata %>% rowwise() %>%
mutate(date = lubridate::as_date(paste(year, as.integer(month), as.integer(day), sep = "-")))