Data Structure
Let’s have a quick look to the data structure.
tmp<-tempfile(fileext=".rds")
download.file("https://github.com/MEF-BDA503/pj18-yildizmust/blob/master/car_data_aggregate.rds?raw=true",destfile=tmp,mode='wb')
car_data_aggregate <- readRDS(tmp)
cda=car_data_aggregate
str(cda)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1490 obs. of 12 variables:
## $ brand_name : chr "ALFA ROMEO" "ASTON MARTIN" "AUDI" "BENTLEY" ...
## $ auto_dom : num 0 0 0 0 0 0 0 0 0 632 ...
## $ auto_imp : num 13 2 350 0 158 ...
## $ auto_total : num 13 2 350 0 158 ...
## $ comm_dom : num 0 0 0 0 0 0 0 0 0 789 ...
## $ comm_imp : num 0 0 0 0 0 197 319 0 0 199 ...
## $ comm_total : num 0 0 0 0 0 197 319 0 0 988 ...
## $ total_dom : num 0 0 0 0 0 ...
## $ total_imp : num 13 2 350 0 158 331 1460 9 3 256 ...
## $ total_total: num 13 2 350 0 158 ...
## $ year : num 2018 2018 2018 2018 2018 ...
## $ month : num 9 9 9 9 9 9 9 9 9 9 ...
head(cda)
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## 1 ALFA ROMEO 0 13 13 0 0 0
## 2 ASTON MARTIN 0 2 2 0 0 0
## 3 AUDI 0 350 350 0 0 0
## 4 BENTLEY 0 0 0 0 0 0
## 5 BMW 0 158 158 0 0 0
## 6 CITROEN 0 134 134 0 197 197
## total_dom total_imp total_total year month
## 1 0 13 13 2018 9
## 2 0 2 2 2018 9
## 3 0 350 350 2018 9
## 4 0 0 0 2018 9
## 5 0 158 158 2018 9
## 6 0 331 331 2018 9
tail(cda)
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## 1485 SSANGYONG 0 19 19 0 3 3
## 1486 TATA 0 0 0 0 9 9
## 1487 TOYOTA 1298 149 1447 0 34 34
## 1488 VOLKSWAGEN 0 2792 2792 0 1736 1736
## 1489 VOLVO 0 187 187 0 0 0
## 1490 TOPLAM: 7375 15983 23358 4815 4540 9355
## total_dom total_imp total_total year month
## 1485 0 22 22 2016 1
## 1486 0 9 9 2016 1
## 1487 1298 183 1481 2016 1
## 1488 0 4528 4528 2016 1
## 1489 0 187 187 2016 1
## 1490 12190 20523 32713 2016 1
Data Cleaning
There are some rows that have “TOPLAM:”, we should remove these rows. Secondly, assume that zero value total-total cells are not needed, can not be observd, so we should also remove these total_total rows that have zero values.
cda<-cda[!(cda$brand_name=="TOPLAM:"),]
cda<-cda[!(cda$total_total==0),]
Max Sales
Let’s find the max total sales of cars in the selected period.
head(cda[order(cda$total_total,decreasing = TRUE),],10)
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## 957 RENAULT 11492 6998 18490 0 2588 2588
## 419 RENAULT 11420 4820 16240 0 2814 2814
## 394 FIAT 7947 401 8348 8821 1023 9844
## 966 VOLKSWAGEN 0 14539 14539 0 3543 3543
## 395 FORD 706 3840 4546 11642 676 12318
## 930 FIAT 7522 422 7944 7680 1074 8754
## 427 VOLKSWAGEN 0 13003 13003 0 3290 3290
## 931 FORD 1938 3263 5201 9894 685 10579
## 1014 VOLKSWAGEN 0 10587 10587 0 3772 3772
## 464 RENAULT 7692 4283 11975 0 2009 2009
## total_dom total_imp total_total year month
## 957 11492 9586 21078 2016 12
## 419 11420 7634 19054 2017 12
## 394 16768 1424 18192 2017 12
## 966 0 18082 18082 2016 12
## 395 12348 4516 16864 2017 12
## 930 15202 1496 16698 2016 12
## 427 0 16293 16293 2017 12
## 931 11832 3948 15780 2016 12
## 1014 0 14359 14359 2016 11
## 464 7692 6292 13984 2017 11
head(cda[order(cda$total_total,decreasing = FALSE),],10)
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## 44 ASTON MARTIN 0 1 1 0 0 0
## 46 BENTLEY 0 1 1 0 0 0
## 65 LEXUS 0 1 1 0 0 0
## 86 ASTON MARTIN 0 1 1 0 0 0
## 128 ASTON MARTIN 0 1 1 0 0 0
## 130 BENTLEY 0 1 1 0 0 0
## 135 FERRARI 0 1 1 0 0 0
## 171 ASTON MARTIN 0 1 1 0 0 0
## 173 BENTLEY 0 1 1 0 0 0
## 178 FERRARI 0 1 1 0 0 0
## total_dom total_imp total_total year month
## 44 0 1 1 2018 8
## 46 0 1 1 2018 8
## 65 0 1 1 2018 8
## 86 0 1 1 2018 7
## 128 0 1 1 2018 6
## 130 0 1 1 2018 6
## 135 0 1 1 2018 6
## 171 0 1 1 2018 5
## 173 0 1 1 2018 5
## 178 0 1 1 2018 5
At the first ten max sales rows, Renault and Wolkswagen have three rows each, and fiat and ford have two rows each.
Is There any Seasonality in the Car Sales Data
Let’s calculate the average total car sales for each month of selected years.
for (i in 1:12) as.numeric(print.table(mean(cda$total_total[cda$month == i])))
## [1] 893.7565
## [1] 1241.5
## [1] 1972.449
## [1] 1776.298
## [1] 1917.742
## [1] 1834.886
## [1] 1613.033
## [1] 1474.694
## [1] 1297.43
## [1] 2063.688
## [1] 2755.16
## [1] 3392.098
There is a significant seasonality at the last months (November, December) of years.
Plot the Total Car Sales
By using ggplot2, plot a column diagram.Firstly, convert last to columns($year and $month) into date format.
cda$date1=paste0(cda$month,"/",1,"/",cda$year)
cda$date1=as.Date(cda$date1, format='%m/%d/%Y')
Then plot the Total Car Sales graph.
library(ggplot2)
g1= ggplot(data = cda) +
geom_col(aes(x = date1, y = cda$total_total),
color = "#09557f",
alpha = 0.6,
size = 0.6) +
labs(x = "Date",
y = "Total Sales",
title = "Car Sales") +
theme_minimal()
g1