In week 2 everyone was studied just one period of ODD data. Now we combine all data and we can study all periods to make deep analyses. First of all we have to install some package that will be used below.
install.packages("tidyverse", repos = "https://cran.r-project.org")
## package 'tidyverse' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\muharrem.cakir\AppData\Local\Temp\RtmpkBjMYP\downloaded_packages
install.packages("formattable", repos = "https://cran.r-project.org")
## package 'formattable' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\muharrem.cakir\AppData\Local\Temp\RtmpkBjMYP\downloaded_packages
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------------ tidyverse 1.2.1 --
## <U+221A> ggplot2 3.1.0 <U+221A> purrr 0.2.5
## <U+221A> tibble 1.4.2 <U+221A> dplyr 0.7.8
## <U+221A> tidyr 0.8.2 <U+221A> stringr 1.3.1
## <U+221A> readr 1.1.1 <U+221A> forcats 0.3.0
## -- Conflicts --------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
library(formattable)
library(ggplot2)
Here we read data from github repository and assign it to a dataframe.
githubURL <- ("https://github.com/MEF-BDA503/pj18-muharremcakir81/blob/master/Week4/car_data_aggregate.rds?raw=true")
All_data<- readRDS(url(githubURL))
head(All_data)
## # A tibble: 6 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 13 13 0 0 0
## 2 ASTON MAR~ 0 2 2 0 0 0
## 3 AUDI 0 350 350 0 0 0
## 4 BENTLEY 0 0 0 0 0 0
## 5 BMW 0 158 158 0 0 0
## 6 CITROEN 0 134 134 0 197 197
## # ... with 5 more variables: total_dom <dbl>, total_imp <dbl>,
## # total_total <dbl>, year <dbl>, month <dbl>
Comment: when we grouped data by brandname we saw that some rows were appear like ‘TOPLAM’ or starts with ‘ODD…’ We cleaned some rows which’s brand_name variable include ‘TOPLAM’ and “ODD…..” value.
Control_Data <- All_data%>%
group_by (brand_name)%>%
summarize(count=n())
Cleaned_data <-All_data%>%
filter(brand_name != "TOPLAM:")%>%
filter(!grepl(pattern ="ODD", brand_name))
head(Cleaned_data)
## # A tibble: 6 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 13 13 0 0 0
## 2 ASTON MAR~ 0 2 2 0 0 0
## 3 AUDI 0 350 350 0 0 0
## 4 BENTLEY 0 0 0 0 0 0
## 5 BMW 0 158 158 0 0 0
## 6 CITROEN 0 134 134 0 197 197
## # ... with 5 more variables: total_dom <dbl>, total_imp <dbl>,
## # total_total <dbl>, year <dbl>, month <dbl>
#View(Cleaned_data)
we just select brand_name and total Sales in 201809, 201808 and 201709 periods from all data respectively and assigng each period to different dataframe. Soma Data manipulation and enrichment operations are done to make comparison between 201809, 201808 and 201709 periods. So that we can see the increase rate for each brand.
total_201809 <- Cleaned_data %>%
filter (year == 2018, month == 9) %>%
select (brand_name,total_total) %>%
rename(Total_201809 = total_total)
#View(total_201809)
#I just select brand_name and total Sales in 201808 period from all data.
total_201808 <- Cleaned_data %>%
filter (year == 2018, month == 8) %>%
select (brand_name,total_total) %>%
rename(Total_201808 = total_total)
#View(total_201808)
#I just select brand_name and total Sales in 201709 period from all data.
total_201709 <- Cleaned_data %>%
filter (year == 2017, month == 9) %>%
select (brand_name,total_total) %>%
rename(Total_201709 = total_total)
#View(total_201709)
#I just select brand_name and sum total sales of first 9 months periods in 2018 from all data.
total_2018_YTD <- Cleaned_data %>%
filter (year == 2018) %>%
group_by (brand_name)%>%
summarize(Total_2018_YTD = sum(total_total)) %>%
select (brand_name,Total_2018_YTD)
#View(total_2018_YTD)
#I just select brand_name and sum total sales of first 9 months periods in 2017 from all data.
total_2017_YTD <- Cleaned_data %>%
filter (year == 2017,month <= 9) %>%
group_by (brand_name)%>%
summarize(Total_2017_YTD = sum(total_total)) %>%
select (brand_name,Total_2017_YTD)
#View(total_2017_YTD)
After join all temp dataframe each other , we replaced NA values to 0 in all variable.
Denormalize_data <- total_201709 %>%
full_join(total_201808, by="brand_name") %>%
full_join(total_201809, by="brand_name") %>%
full_join(total_2017_YTD, by="brand_name") %>%
full_join(total_2018_YTD, by="brand_name")
Denormalize_data <- mutate_all(Denormalize_data,funs(replace(., is.na(.), 0)))
#View(Denormalize_data)
We calculate new variables to make strong analyses. Descriptions of these variables are: CM_PM_Total_Rate : change rate between Current month (201809) and previous month (201808) CM_PYM_Total_Rate : change rate between Current month (201809) and current month of previous year (201709) YTD_PYTD_Total_Rate :change rate between first 9 months of this year (201801-201809) and first 9 months of previous year (201701-201709)
Brand_Increase_Rate <- Denormalize_data%>%
mutate(CM_PM_Total_Rate = round((ifelse (Total_201808 == 0 , NA, #(Total_201809 - Total_201808) *100,
(Total_201809 - Total_201808) / Total_201808 *100)),1)) %>%
mutate(CM_PYM_Total_Rate = round((ifelse (Total_201709 == 0 ,NA, #(Total_201809 - Total_201709) *100,
(Total_201809 - Total_201709) / Total_201709 *100)),1)) %>%
mutate(YTD_PYTD_Total_Rate = round((ifelse (Total_2017_YTD == 0 ,(Total_2018_YTD - Total_2017_YTD) *100,
(Total_2018_YTD - Total_2017_YTD) / Total_2017_YTD *100)),1)) %>%
mutate(Percentage_2018_YTD = round((Total_2018_YTD / sum(Total_2018_YTD) *100),2), Percentage_2017_YTD = round((Total_2017_YTD / sum(Total_2017_YTD) *100),2)) %>%
select(brand_name,Total_201809,Total_201808,Total_201709,Total_2018_YTD,Percentage_2018_YTD,Total_2017_YTD,Percentage_2017_YTD,
CM_PM_Total_Rate,CM_PYM_Total_Rate,YTD_PYTD_Total_Rate)%>%
arrange(desc(Percentage_2018_YTD))
head(Brand_Increase_Rate)
## # A tibble: 6 x 11
## brand_name Total_201809 Total_201808 Total_201709 Total_2018_YTD
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 RENAULT 3186 4102 9129 66517
## 2 FORD 2356 3709 7525 50489
## 3 VOLKSWAGEN 2239 3585 0 50435
## 4 FIAT 1677 3007 9693 49745
## 5 HYUNDAI 1535 1301 4148 27639
## 6 TOYOTA 1127 2274 0 23409
## # ... with 6 more variables: Percentage_2018_YTD <dbl>,
## # Total_2017_YTD <dbl>, Percentage_2017_YTD <dbl>,
## # CM_PM_Total_Rate <dbl>, CM_PYM_Total_Rate <dbl>,
## # YTD_PYTD_Total_Rate <dbl>
To recognize the descreasing the sales amount from 2017 to 2018 and to see which brands are sold above the avearage, we formatted table… It’s clear that most of all brands’ selling is decrease from 2017 to 2018. VOLKWAGEN is sold %8.14 of all cars in 2017 and sold %10.9 of all data in 2018.So Volkwagen’s selling perfomance is increasing. Also it’s able to increase total selling in 9 months in 2018 across in 9 months in 2017 (%5,4) Also VOLVO’s performance is very impressive. its selling amount is increased %35.1 percentage in 9 months in 2018.
color <- formatter("span", style = x ~ style(color = ifelse(x > 0, "green",ifelse(x < 0, "red", "black"))))
avg_bold <- formatter("span", style = x ~ style("font-weight" = ifelse(x > mean(x), "bold", NA)))
formattable(Brand_Increase_Rate,
list(Total_2018_YTD = avg_bold,
Total_2017_YTD = avg_bold,
CM_PM_Total_Rate = color,
CM_PYM_Total_Rate = color,
YTD_PYTD_Total_Rate = color))
brand_name | Total_201809 | Total_201808 | Total_201709 | Total_2018_YTD | Percentage_2018_YTD | Total_2017_YTD | Percentage_2017_YTD | CM_PM_Total_Rate | CM_PYM_Total_Rate | YTD_PYTD_Total_Rate |
---|---|---|---|---|---|---|---|---|---|---|
RENAULT | 3186 | 4102 | 9129 | 66517 | 14.37 | 85622 | 14.57 | -22.3 | -65.1 | -22.3 |
FORD | 2356 | 3709 | 7525 | 50489 | 10.91 | 71918 | 12.23 | -36.5 | -68.7 | -29.8 |
VOLKSWAGEN | 2239 | 3585 | 0 | 50435 | 10.90 | 47841 | 8.14 | -37.5 | NA | 5.4 |
FIAT | 1677 | 3007 | 9693 | 49745 | 10.75 | 78785 | 13.40 | -44.2 | -82.7 | -36.9 |
HYUNDAI | 1535 | 1301 | 4148 | 27639 | 5.97 | 35704 | 6.07 | 18.0 | -63.0 | -22.6 |
TOYOTA | 1127 | 2274 | 0 | 23409 | 5.06 | 19026 | 3.24 | -50.4 | NA | 23.0 |
DACIA | 1460 | 1687 | 3371 | 22079 | 4.77 | 32134 | 5.47 | -13.5 | -56.7 | -31.3 |
PEUGEOT | 717 | 1629 | 2621 | 21991 | 4.75 | 25438 | 4.33 | -56.0 | -72.6 | -13.6 |
HONDA | 1973 | 1913 | 2059 | 20388 | 4.40 | 18274 | 3.11 | 3.1 | -4.2 | 11.6 |
MERCEDES-BENZ | 1163 | 1427 | 3053 | 18577 | 4.01 | 23669 | 4.03 | -18.5 | -61.9 | -21.5 |
OPEL | 857 | 1688 | 2966 | 16944 | 3.66 | 29685 | 5.05 | -49.2 | -71.1 | -42.9 |
NISSAN | 1217 | 1381 | 1754 | 15881 | 3.43 | 20695 | 3.52 | -11.9 | -30.6 | -23.3 |
SKODA | 618 | 1098 | 2344 | 15773 | 3.41 | 15629 | 2.66 | -43.7 | -73.6 | 0.9 |
CITROEN | 331 | 693 | 1913 | 9782 | 2.11 | 18092 | 3.08 | -52.2 | -82.7 | -45.9 |
BMW | 158 | 1005 | 1487 | 9704 | 2.10 | 11592 | 1.97 | -84.3 | -89.4 | -16.3 |
AUDI | 350 | 737 | 1352 | 9448 | 2.04 | 12350 | 2.10 | -52.5 | -74.1 | -23.5 |
SEAT | 285 | 862 | 763 | 8540 | 1.84 | 11514 | 1.96 | -66.9 | -62.6 | -25.8 |
KIA | 311 | 500 | 1093 | 7199 | 1.56 | 10051 | 1.71 | -37.8 | -71.5 | -28.4 |
VOLVO | 427 | 375 | 352 | 3623 | 0.78 | 2681 | 0.46 | 13.9 | 21.3 | 35.1 |
MITSUBISHI | 213 | 400 | 294 | 3132 | 0.68 | 2585 | 0.44 | -46.8 | -27.6 | 21.2 |
SUZUKI | 70 | 229 | 431 | 1841 | 0.40 | 2643 | 0.45 | -69.4 | -83.8 | -30.3 |
ISUZU | 76 | 67 | 187 | 1551 | 0.34 | 1563 | 0.27 | 13.4 | -59.4 | -0.8 |
JEEP | 90 | 91 | 238 | 1491 | 0.32 | 1538 | 0.26 | -1.1 | -62.2 | -3.1 |
IVECO | 110 | 82 | 229 | 1146 | 0.25 | 1717 | 0.29 | 34.1 | -52.0 | -33.3 |
SUBARU | 89 | 77 | 118 | 1042 | 0.23 | 1029 | 0.18 | 15.6 | -24.6 | 1.3 |
LAND ROVER | 45 | 22 | 186 | 859 | 0.19 | 1254 | 0.21 | 104.5 | -75.8 | -31.5 |
MAZDA | 110 | 52 | 91 | 811 | 0.18 | 967 | 0.16 | 111.5 | 20.9 | -16.1 |
MINI | 45 | 70 | 137 | 804 | 0.17 | 887 | 0.15 | -35.7 | -67.2 | -9.4 |
KARSAN | 109 | 80 | 114 | 702 | 0.15 | 1023 | 0.17 | 36.2 | -4.4 | -31.4 |
PORSCHE | 32 | 75 | 47 | 470 | 0.10 | 482 | 0.08 | -57.3 | -31.9 | -2.5 |
SSANGYONG | 4 | 34 | 46 | 279 | 0.06 | 506 | 0.09 | -88.2 | -91.3 | -44.9 |
DS | 9 | 64 | 21 | 202 | 0.04 | 89 | 0.02 | -85.9 | -57.1 | 127.0 |
ALFA ROMEO | 13 | 15 | 15 | 141 | 0.03 | 246 | 0.04 | -13.3 | -13.3 | -42.7 |
JAGUAR | 17 | 8 | 34 | 125 | 0.03 | 254 | 0.04 | 112.5 | -50.0 | -50.8 |
LEXUS | 2 | 1 | 21 | 39 | 0.01 | 73 | 0.01 | 100.0 | -90.5 | -46.6 |
MASERATI | 2 | 2 | 6 | 34 | 0.01 | 44 | 0.01 | 0.0 | -66.7 | -22.7 |
ASTON MARTIN | 2 | 1 | 0 | 10 | 0.00 | 15 | 0.00 | 100.0 | NA | -33.3 |
BENTLEY | 0 | 1 | 3 | 6 | 0.00 | 11 | 0.00 | -100.0 | -100.0 | -45.5 |
CHERY | 0 | 0 | 0 | 0 | 0.00 | 0 | 0.00 | NA | NA | 0.0 |
FERRARI | 3 | 2 | 0 | 16 | 0.00 | 9 | 0.00 | 50.0 | NA | 77.8 |
GEELY | 0 | 0 | 0 | 0 | 0.00 | 0 | 0.00 | NA | NA | 0.0 |
INFINITI | 0 | 0 | 6 | 16 | 0.00 | 78 | 0.01 | NA | -100.0 | -79.5 |
LAMBORGHINI | 0 | 0 | 0 | 2 | 0.00 | 4 | 0.00 | NA | NA | -50.0 |
OTOKAR | 0 | 0 | 0 | 0 | 0.00 | 0 | 0.00 | NA | NA | 0.0 |
SMART | 0 | 0 | 2 | 22 | 0.00 | 102 | 0.02 | NA | -100.0 | -78.4 |
trend_analyses <- Cleaned_data %>%
filter (brand_name %in% c("RENAULT","FORD","VOLKSWAGEN","FIAT","HYUNDAI") & year == 2018)%>%
select(brand_name,month,total_total)
#View(trend_analyses)
ggplot(trend_analyses,aes(x=month,y=total_total ,color = brand_name)) +geom_line()
brand_analyses <- Cleaned_data %>%
filter (brand_name %in% c("RENAULT","FORD","VOLKSWAGEN","FIAT","HYUNDAI") & year== 2018)%>%
group_by(brand_name) %>%
summarize(total_by_brand = sum(total_total))%>%
select(brand_name,total_by_brand)
#View(brand_analyses)
ggplot(brand_analyses,aes(x=brand_name,y=total_by_brand)) +geom_col()