Install Package

In week 2 everyone was studied just one period of ODD data. Now we combine all data and we can study all periods to make deep analyses. First of all we have to install some package that will be used below.

install.packages("tidyverse", repos = "https://cran.r-project.org")
## package 'tidyverse' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\muharrem.cakir\AppData\Local\Temp\RtmpkBjMYP\downloaded_packages
install.packages("formattable", repos = "https://cran.r-project.org")
## package 'formattable' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\muharrem.cakir\AppData\Local\Temp\RtmpkBjMYP\downloaded_packages
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------------ tidyverse 1.2.1 --
## <U+221A> ggplot2 3.1.0     <U+221A> purrr   0.2.5
## <U+221A> tibble  1.4.2     <U+221A> dplyr   0.7.8
## <U+221A> tidyr   0.8.2     <U+221A> stringr 1.3.1
## <U+221A> readr   1.1.1     <U+221A> forcats 0.3.0
## -- Conflicts --------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(dplyr)
library(formattable)
library(ggplot2)

Download Data

Here we read data from github repository and assign it to a dataframe.

githubURL <- ("https://github.com/MEF-BDA503/pj18-muharremcakir81/blob/master/Week4/car_data_aggregate.rds?raw=true")
All_data<- readRDS(url(githubURL))
head(All_data)
## # A tibble: 6 x 12
##   brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
##   <chr>         <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>
## 1 ALFA ROMEO        0       13         13        0        0          0
## 2 ASTON MAR~        0        2          2        0        0          0
## 3 AUDI              0      350        350        0        0          0
## 4 BENTLEY           0        0          0        0        0          0
## 5 BMW               0      158        158        0        0          0
## 6 CITROEN           0      134        134        0      197        197
## # ... with 5 more variables: total_dom <dbl>, total_imp <dbl>,
## #   total_total <dbl>, year <dbl>, month <dbl>

DATA UNDERSTANDING & CLEANING CONTROL

Comment: when we grouped data by brandname we saw that some rows were appear like ‘TOPLAM’ or starts with ‘ODD…’ We cleaned some rows which’s brand_name variable include ‘TOPLAM’ and “ODD…..” value.

Control_Data <- All_data%>%
  group_by (brand_name)%>%
  summarize(count=n())


Cleaned_data <-All_data%>%
  filter(brand_name != "TOPLAM:")%>%
  filter(!grepl(pattern ="ODD", brand_name))

head(Cleaned_data)
## # A tibble: 6 x 12
##   brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
##   <chr>         <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>
## 1 ALFA ROMEO        0       13         13        0        0          0
## 2 ASTON MAR~        0        2          2        0        0          0
## 3 AUDI              0      350        350        0        0          0
## 4 BENTLEY           0        0          0        0        0          0
## 5 BMW               0      158        158        0        0          0
## 6 CITROEN           0      134        134        0      197        197
## # ... with 5 more variables: total_dom <dbl>, total_imp <dbl>,
## #   total_total <dbl>, year <dbl>, month <dbl>
#View(Cleaned_data)

DATA MODIFY

we just select brand_name and total Sales in 201809, 201808 and 201709 periods from all data respectively and assigng each period to different dataframe. Soma Data manipulation and enrichment operations are done to make comparison between 201809, 201808 and 201709 periods. So that we can see the increase rate for each brand.

total_201809 <- Cleaned_data %>%
            filter (year == 2018, month == 9) %>%
            select (brand_name,total_total) %>%
            rename(Total_201809 = total_total)
#View(total_201809)

#I just select brand_name and total Sales in 201808 period from all data. 

total_201808 <- Cleaned_data %>%
  filter (year == 2018, month == 8) %>%
  select (brand_name,total_total) %>%
  rename(Total_201808 = total_total)
#View(total_201808)

#I just select brand_name and total Sales in 201709 period from all data. 
total_201709 <- Cleaned_data %>%
  filter (year == 2017, month == 9) %>%
  select (brand_name,total_total) %>%
  rename(Total_201709 = total_total)
#View(total_201709)

#I just select brand_name and sum total sales of  first 9 months periods in 2018 from all data. 

total_2018_YTD <- Cleaned_data %>%
  filter (year == 2018) %>%
  group_by (brand_name)%>%
  summarize(Total_2018_YTD = sum(total_total)) %>%
  select (brand_name,Total_2018_YTD)

#View(total_2018_YTD)

#I just select brand_name and sum total sales of  first 9 months periods in 2017 from all data. 

total_2017_YTD <- Cleaned_data %>%
  filter (year == 2017,month <= 9) %>%
  group_by (brand_name)%>%
  summarize(Total_2017_YTD = sum(total_total)) %>%
  select (brand_name,Total_2017_YTD)

#View(total_2017_YTD)

JOIN ALL TEMP DATA SET

After join all temp dataframe each other , we replaced NA values to 0 in all variable.

Denormalize_data <- total_201709 %>%
  full_join(total_201808, by="brand_name")  %>%
  full_join(total_201809, by="brand_name") %>%
  full_join(total_2017_YTD, by="brand_name") %>%
  full_join(total_2018_YTD, by="brand_name")

Denormalize_data <- mutate_all(Denormalize_data,funs(replace(., is.na(.), 0)))
#View(Denormalize_data)

CALCULATE NEW VARIABLE

We calculate new variables to make strong analyses. Descriptions of these variables are: CM_PM_Total_Rate : change rate between Current month (201809) and previous month (201808) CM_PYM_Total_Rate : change rate between Current month (201809) and current month of previous year (201709) YTD_PYTD_Total_Rate :change rate between first 9 months of this year (201801-201809) and first 9 months of previous year (201701-201709)

Brand_Increase_Rate <- Denormalize_data%>%
  mutate(CM_PM_Total_Rate = round((ifelse (Total_201808  == 0 , NA, #(Total_201809 - Total_201808) *100,
                                    (Total_201809 - Total_201808) / Total_201808 *100)),1)) %>%
  mutate(CM_PYM_Total_Rate = round((ifelse (Total_201709  == 0 ,NA, #(Total_201809 - Total_201709) *100,
                                     (Total_201809 - Total_201709) / Total_201709 *100)),1)) %>%
  mutate(YTD_PYTD_Total_Rate = round((ifelse (Total_2017_YTD  == 0 ,(Total_2018_YTD - Total_2017_YTD) *100,
                                     (Total_2018_YTD - Total_2017_YTD) / Total_2017_YTD *100)),1)) %>%
  mutate(Percentage_2018_YTD = round((Total_2018_YTD / sum(Total_2018_YTD) *100),2), Percentage_2017_YTD = round((Total_2017_YTD / sum(Total_2017_YTD) *100),2)) %>%
  select(brand_name,Total_201809,Total_201808,Total_201709,Total_2018_YTD,Percentage_2018_YTD,Total_2017_YTD,Percentage_2017_YTD,
         CM_PM_Total_Rate,CM_PYM_Total_Rate,YTD_PYTD_Total_Rate)%>%
  arrange(desc(Percentage_2018_YTD))

head(Brand_Increase_Rate)
## # A tibble: 6 x 11
##   brand_name Total_201809 Total_201808 Total_201709 Total_2018_YTD
##   <chr>             <dbl>        <dbl>        <dbl>          <dbl>
## 1 RENAULT            3186         4102         9129          66517
## 2 FORD               2356         3709         7525          50489
## 3 VOLKSWAGEN         2239         3585            0          50435
## 4 FIAT               1677         3007         9693          49745
## 5 HYUNDAI            1535         1301         4148          27639
## 6 TOYOTA             1127         2274            0          23409
## # ... with 6 more variables: Percentage_2018_YTD <dbl>,
## #   Total_2017_YTD <dbl>, Percentage_2017_YTD <dbl>,
## #   CM_PM_Total_Rate <dbl>, CM_PYM_Total_Rate <dbl>,
## #   YTD_PYTD_Total_Rate <dbl>

FORMAT TABLE

To recognize the descreasing the sales amount from 2017 to 2018 and to see which brands are sold above the avearage, we formatted table… It’s clear that most of all brands’ selling is decrease from 2017 to 2018. VOLKWAGEN is sold %8.14 of all cars in 2017 and sold %10.9 of all data in 2018.So Volkwagen’s selling perfomance is increasing. Also it’s able to increase total selling in 9 months in 2018 across in 9 months in 2017 (%5,4) Also VOLVO’s performance is very impressive. its selling amount is increased %35.1 percentage in 9 months in 2018.

color <- formatter("span", style = x ~ style(color = ifelse(x > 0, "green",ifelse(x < 0, "red", "black"))))


avg_bold <- formatter("span",  style = x ~ style("font-weight" = ifelse(x > mean(x), "bold", NA)))


formattable(Brand_Increase_Rate, 
            list(Total_2018_YTD = avg_bold,
                 Total_2017_YTD = avg_bold, 
                 CM_PM_Total_Rate = color, 
                 CM_PYM_Total_Rate = color,
                 YTD_PYTD_Total_Rate = color))
brand_name Total_201809 Total_201808 Total_201709 Total_2018_YTD Percentage_2018_YTD Total_2017_YTD Percentage_2017_YTD CM_PM_Total_Rate CM_PYM_Total_Rate YTD_PYTD_Total_Rate
RENAULT 3186 4102 9129 66517 14.37 85622 14.57 -22.3 -65.1 -22.3
FORD 2356 3709 7525 50489 10.91 71918 12.23 -36.5 -68.7 -29.8
VOLKSWAGEN 2239 3585 0 50435 10.90 47841 8.14 -37.5 NA 5.4
FIAT 1677 3007 9693 49745 10.75 78785 13.40 -44.2 -82.7 -36.9
HYUNDAI 1535 1301 4148 27639 5.97 35704 6.07 18.0 -63.0 -22.6
TOYOTA 1127 2274 0 23409 5.06 19026 3.24 -50.4 NA 23.0
DACIA 1460 1687 3371 22079 4.77 32134 5.47 -13.5 -56.7 -31.3
PEUGEOT 717 1629 2621 21991 4.75 25438 4.33 -56.0 -72.6 -13.6
HONDA 1973 1913 2059 20388 4.40 18274 3.11 3.1 -4.2 11.6
MERCEDES-BENZ 1163 1427 3053 18577 4.01 23669 4.03 -18.5 -61.9 -21.5
OPEL 857 1688 2966 16944 3.66 29685 5.05 -49.2 -71.1 -42.9
NISSAN 1217 1381 1754 15881 3.43 20695 3.52 -11.9 -30.6 -23.3
SKODA 618 1098 2344 15773 3.41 15629 2.66 -43.7 -73.6 0.9
CITROEN 331 693 1913 9782 2.11 18092 3.08 -52.2 -82.7 -45.9
BMW 158 1005 1487 9704 2.10 11592 1.97 -84.3 -89.4 -16.3
AUDI 350 737 1352 9448 2.04 12350 2.10 -52.5 -74.1 -23.5
SEAT 285 862 763 8540 1.84 11514 1.96 -66.9 -62.6 -25.8
KIA 311 500 1093 7199 1.56 10051 1.71 -37.8 -71.5 -28.4
VOLVO 427 375 352 3623 0.78 2681 0.46 13.9 21.3 35.1
MITSUBISHI 213 400 294 3132 0.68 2585 0.44 -46.8 -27.6 21.2
SUZUKI 70 229 431 1841 0.40 2643 0.45 -69.4 -83.8 -30.3
ISUZU 76 67 187 1551 0.34 1563 0.27 13.4 -59.4 -0.8
JEEP 90 91 238 1491 0.32 1538 0.26 -1.1 -62.2 -3.1
IVECO 110 82 229 1146 0.25 1717 0.29 34.1 -52.0 -33.3
SUBARU 89 77 118 1042 0.23 1029 0.18 15.6 -24.6 1.3
LAND ROVER 45 22 186 859 0.19 1254 0.21 104.5 -75.8 -31.5
MAZDA 110 52 91 811 0.18 967 0.16 111.5 20.9 -16.1
MINI 45 70 137 804 0.17 887 0.15 -35.7 -67.2 -9.4
KARSAN 109 80 114 702 0.15 1023 0.17 36.2 -4.4 -31.4
PORSCHE 32 75 47 470 0.10 482 0.08 -57.3 -31.9 -2.5
SSANGYONG 4 34 46 279 0.06 506 0.09 -88.2 -91.3 -44.9
DS 9 64 21 202 0.04 89 0.02 -85.9 -57.1 127.0
ALFA ROMEO 13 15 15 141 0.03 246 0.04 -13.3 -13.3 -42.7
JAGUAR 17 8 34 125 0.03 254 0.04 112.5 -50.0 -50.8
LEXUS 2 1 21 39 0.01 73 0.01 100.0 -90.5 -46.6
MASERATI 2 2 6 34 0.01 44 0.01 0.0 -66.7 -22.7
ASTON MARTIN 2 1 0 10 0.00 15 0.00 100.0 NA -33.3
BENTLEY 0 1 3 6 0.00 11 0.00 -100.0 -100.0 -45.5
CHERY 0 0 0 0 0.00 0 0.00 NA NA 0.0
FERRARI 3 2 0 16 0.00 9 0.00 50.0 NA 77.8
GEELY 0 0 0 0 0.00 0 0.00 NA NA 0.0
INFINITI 0 0 6 16 0.00 78 0.01 NA -100.0 -79.5
LAMBORGHINI 0 0 0 2 0.00 4 0.00 NA NA -50.0
OTOKAR 0 0 0 0 0.00 0 0.00 NA NA 0.0
SMART 0 0 2 22 0.00 102 0.02 NA -100.0 -78.4

Finaly there are two graphic analyses here. We take 4 brands (RENAUT,FORD,VOLKSWAGEN,FIAT,HYUNDAI) that achive %50 of total car’s seling.

trend_analyses <- Cleaned_data %>%
  filter (brand_name %in% c("RENAULT","FORD","VOLKSWAGEN","FIAT","HYUNDAI")  & year == 2018)%>%
  select(brand_name,month,total_total)
#View(trend_analyses)

ggplot(trend_analyses,aes(x=month,y=total_total ,color = brand_name)) +geom_line()

brand_analyses <- Cleaned_data %>%
  filter (brand_name %in% c("RENAULT","FORD","VOLKSWAGEN","FIAT","HYUNDAI")  & year== 2018)%>%
  group_by(brand_name) %>%
  summarize(total_by_brand = sum(total_total))%>%
  select(brand_name,total_by_brand)
#View(brand_analyses)

ggplot(brand_analyses,aes(x=brand_name,y=total_by_brand)) +geom_col()