Examine The Full Car Sales Data

Data Structure

Let’s have a quick look to the data structure.

tmp<-tempfile(fileext=".rds")
download.file("https://github.com/MEF-BDA503/pj18-yildizmust/blob/master/car_data_aggregate.rds?raw=true",destfile=tmp,mode='wb')
car_data_aggregate <- readRDS(tmp)
cda=car_data_aggregate
str(cda)
## Classes 'tbl_df', 'tbl' and 'data.frame':    1490 obs. of  12 variables:
##  $ brand_name : chr  "ALFA ROMEO" "ASTON MARTIN" "AUDI" "BENTLEY" ...
##  $ auto_dom   : num  0 0 0 0 0 0 0 0 0 632 ...
##  $ auto_imp   : num  13 2 350 0 158 ...
##  $ auto_total : num  13 2 350 0 158 ...
##  $ comm_dom   : num  0 0 0 0 0 0 0 0 0 789 ...
##  $ comm_imp   : num  0 0 0 0 0 197 319 0 0 199 ...
##  $ comm_total : num  0 0 0 0 0 197 319 0 0 988 ...
##  $ total_dom  : num  0 0 0 0 0 ...
##  $ total_imp  : num  13 2 350 0 158 331 1460 9 3 256 ...
##  $ total_total: num  13 2 350 0 158 ...
##  $ year       : num  2018 2018 2018 2018 2018 ...
##  $ month      : num  9 9 9 9 9 9 9 9 9 9 ...
head(cda)
##     brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## 1   ALFA ROMEO        0       13         13        0        0          0
## 2 ASTON MARTIN        0        2          2        0        0          0
## 3         AUDI        0      350        350        0        0          0
## 4      BENTLEY        0        0          0        0        0          0
## 5          BMW        0      158        158        0        0          0
## 6      CITROEN        0      134        134        0      197        197
##   total_dom total_imp total_total year month
## 1         0        13          13 2018     9
## 2         0         2           2 2018     9
## 3         0       350         350 2018     9
## 4         0         0           0 2018     9
## 5         0       158         158 2018     9
## 6         0       331         331 2018     9
tail(cda)
##      brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## 1485  SSANGYONG        0       19         19        0        3          3
## 1486       TATA        0        0          0        0        9          9
## 1487     TOYOTA     1298      149       1447        0       34         34
## 1488 VOLKSWAGEN        0     2792       2792        0     1736       1736
## 1489      VOLVO        0      187        187        0        0          0
## 1490    TOPLAM:     7375    15983      23358     4815     4540       9355
##      total_dom total_imp total_total year month
## 1485         0        22          22 2016     1
## 1486         0         9           9 2016     1
## 1487      1298       183        1481 2016     1
## 1488         0      4528        4528 2016     1
## 1489         0       187         187 2016     1
## 1490     12190     20523       32713 2016     1

Data Cleaning

There are some rows that have “TOPLAM:”, we should remove these rows. Secondly, assume that zero value total-total cells are not needed, can not be observd, so we should also remove these total_total rows that have zero values.

cda<-cda[!(cda$brand_name=="TOPLAM:"),]
cda<-cda[!(cda$total_total==0),]

Max Sales

Let’s find the max total sales of cars in the selected period.

head(cda[order(cda$total_total,decreasing = TRUE),],10)
##      brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## 957     RENAULT    11492     6998      18490        0     2588       2588
## 419     RENAULT    11420     4820      16240        0     2814       2814
## 394        FIAT     7947      401       8348     8821     1023       9844
## 966  VOLKSWAGEN        0    14539      14539        0     3543       3543
## 395        FORD      706     3840       4546    11642      676      12318
## 930        FIAT     7522      422       7944     7680     1074       8754
## 427  VOLKSWAGEN        0    13003      13003        0     3290       3290
## 931        FORD     1938     3263       5201     9894      685      10579
## 1014 VOLKSWAGEN        0    10587      10587        0     3772       3772
## 464     RENAULT     7692     4283      11975        0     2009       2009
##      total_dom total_imp total_total year month
## 957      11492      9586       21078 2016    12
## 419      11420      7634       19054 2017    12
## 394      16768      1424       18192 2017    12
## 966          0     18082       18082 2016    12
## 395      12348      4516       16864 2017    12
## 930      15202      1496       16698 2016    12
## 427          0     16293       16293 2017    12
## 931      11832      3948       15780 2016    12
## 1014         0     14359       14359 2016    11
## 464       7692      6292       13984 2017    11
head(cda[order(cda$total_total,decreasing = FALSE),],10)
##       brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## 44  ASTON MARTIN        0        1          1        0        0          0
## 46       BENTLEY        0        1          1        0        0          0
## 65         LEXUS        0        1          1        0        0          0
## 86  ASTON MARTIN        0        1          1        0        0          0
## 128 ASTON MARTIN        0        1          1        0        0          0
## 130      BENTLEY        0        1          1        0        0          0
## 135      FERRARI        0        1          1        0        0          0
## 171 ASTON MARTIN        0        1          1        0        0          0
## 173      BENTLEY        0        1          1        0        0          0
## 178      FERRARI        0        1          1        0        0          0
##     total_dom total_imp total_total year month
## 44          0         1           1 2018     8
## 46          0         1           1 2018     8
## 65          0         1           1 2018     8
## 86          0         1           1 2018     7
## 128         0         1           1 2018     6
## 130         0         1           1 2018     6
## 135         0         1           1 2018     6
## 171         0         1           1 2018     5
## 173         0         1           1 2018     5
## 178         0         1           1 2018     5

At the first ten max sales rows, Renault and Wolkswagen have three rows each, and fiat and ford have two rows each.

Is There any Seasonality in the Car Sales Data

Let’s calculate the average total car sales for each month of selected years.

for (i in 1:12) as.numeric(print.table(mean(cda$total_total[cda$month == i])))
## [1] 893.7565
## [1] 1241.5
## [1] 1972.449
## [1] 1776.298
## [1] 1917.742
## [1] 1834.886
## [1] 1613.033
## [1] 1474.694
## [1] 1297.43
## [1] 2063.688
## [1] 2755.16
## [1] 3392.098

There is a significant seasonality at the last months (November, December) of years.

Plot the Total Car Sales

By using ggplot2, plot a column diagram.Firstly, convert last to columns($year and $month) into date format.

cda$date1=paste0(cda$month,"/",1,"/",cda$year)
cda$date1=as.Date(cda$date1, format='%m/%d/%Y')

Then plot the Total Car Sales graph.

library(ggplot2)
g1= ggplot(data = cda) +
  geom_col(aes(x = date1, y = cda$total_total), 
            color = "#09557f",
            alpha = 0.6,
            size = 0.6) +
  labs(x = "Date", 
       y = "Total Sales",
       title = "Car Sales") +
  theme_minimal()
g1