In the second week, I analyzed one month of ODD car sales data. This week, the sales data for every month of 2016, 2017 and 2018 is combined, so the analysis covers a larger dataset.
First, I load the libraries we need, download the .rds file from GitHub and clean the data.
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------------------------------------------ tidyverse 1.2.1 --
## √ ggplot2 3.0.0 √ purrr 0.2.5
## √ tibble 1.4.2 √ dplyr 0.7.6
## √ tidyr 0.8.1 √ stringr 1.3.1
## √ readr 1.1.1 √ forcats 0.3.0
## -- Conflicts --------------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readxl)
library(dplyr)
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
tmp<-tempfile(fileext=".rds")
download.file("https://github.com/MEF-BDA503/mef-bda503.github.io/blob/master/files/car_data_aggregate.rds?raw=true",destfile=tmp,mode = 'wb')
raw_data<-read_rds(tmp)
file.remove(tmp)
## [1] TRUE
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total","year","month")
car_data <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.)))
car_data <- car_data %>%
filter(!(year==2017 & month==2 & total_dom==0 & total_imp==0 & total_total==0) ) %>%
filter(brand_name != "TOPLAM:")
car_data <- car_data %>% mutate(day=1)
car_data <- car_data %>% mutate(date=paste(year, month, day, sep="-")) %>% mutate(date= ymd(date))
head(raw_data)
## # A tibble: 6 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 13 13 0 0 0
## 2 ASTON MAR~ 0 2 2 0 0 0
## 3 AUDI 0 350 350 0 0 0
## 4 BENTLEY 0 0 0 0 0 0
## 5 BMW 0 158 158 0 0 0
## 6 CITROEN 0 134 134 0 197 197
## # ... with 5 more variables: total_dom <dbl>, total_imp <dbl>,
## # total_total <dbl>, year <dbl>, month <dbl>
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total","year","month")
all_car_data <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.)))
all_car_data <- all_car_data %>%
filter(!(year==2017 & month==2 & total_dom==0 & total_imp==0 & total_total==0) ) %>%
filter(brand_name != "TOPLAM:")
all_car_data <- all_car_data %>% mutate(day=1)
all_car_data <- all_car_data %>% mutate(date=paste(year, month, day, sep="-")) %>% mutate(date= ymd(date))
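The cleaning steps above (replace missing values with 0, drop the "TOPLAM:" total row, build a date on the first day of each month) can be sketched on a tiny example. This is only an illustration under stated assumptions: the rows in `toy` are made up and do not come from the ODD data, and the sketch uses `across()`, which requires dplyr 1.0 or later, instead of the `mutate_if()` and `funs()` calls used in this post.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy rows, made up for illustration only; they are not from the ODD data.
toy <- tibble(
  brand_name  = c("BRAND A", "BRAND B", "TOPLAM:"),
  total_dom   = c(10, NA, 10),
  total_imp   = c(5, 7, 12),
  total_total = c(15, 7, 22),
  year        = c(2018, 2018, 2018),
  month       = c(1, 2, 2)
)

toy_clean <- toy %>%
  # Replace NA with 0 in every numeric column
  mutate(across(where(is.numeric), ~ replace_na(.x, 0))) %>%
  # Drop the "TOPLAM:" (total) summary row
  filter(brand_name != "TOPLAM:") %>%
  # Build a Date on the first day of each month
  mutate(date = make_date(year, month, 1))

toy_clean
```

`make_date()` builds the date directly from the numeric year and month columns, so the intermediate `day` column and the `paste()` step are not needed in this variant.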
I chose six brands from the dataset and explored their monthly sales trend in each year of the data.
# Keep rows with commercial vehicle sales for the six chosen brands
comm_veh <- all_car_data %>%
filter(comm_total > 0, brand_name %in% c("RENAULT", "FIAT", "FORD", "VOLKSWAGEN", "ISUZU", "IVECO")) %>%
select(brand_name, comm_total, year, month)
View(comm_veh)
As the visuals show, commercial vehicle sales peak in the last months of the year. In contrast, in 2018 the sales numbers for those same months are decreasing.
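A monthly trend plot of the kind described here could be drawn along the following lines. This is a minimal sketch, not the plot used for the observation above: `toy_sales` and its numbers are made up for illustration, and the choice of one facet per brand with one line per year is an assumption about the layout. With the real data, `comm_veh` would take the place of `toy_sales`.

```r
library(dplyr)
library(ggplot2)

# Made-up monthly totals for two placeholder brands; these are not ODD figures.
toy_sales <- tibble(
  brand_name = rep(c("BRAND A", "BRAND B"), each = 6),
  year       = rep(rep(c(2017, 2018), each = 3), times = 2),
  month      = rep(c(1, 2, 3), times = 4),
  comm_total = c(100, 120, 150, 90, 80, 70, 40, 55, 60, 50, 45, 30)
)

# One panel per brand, one coloured line per year, months on the x axis
p <- ggplot(toy_sales, aes(x = month, y = comm_total, colour = factor(year))) +
  geom_line() +
  geom_point() +
  facet_wrap(~ brand_name, scales = "free_y") +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Commercial vehicle sales", colour = "Year")

p
```

Free y scales keep low-volume brands readable next to high-volume ones, at the cost of making absolute levels harder to compare across panels.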