ODD Data Analaysis-II

In week-2 home work we had ODD sales data in Excel for one month. We have copmltede data now. Codes below makes file operations and excludes unsuitable rows from data. Also makes new date column.

library(tidyverse)
## -- Attaching packages ---------------------------------------------------------- tidyverse 1.2.1 --
## <U+221A> ggplot2 3.0.0     <U+221A> purrr   0.2.5
## <U+221A> tibble  1.4.2     <U+221A> dplyr   0.7.6
## <U+221A> tidyr   0.8.1     <U+221A> stringr 1.3.1
## <U+221A> readr   1.1.1     <U+221A> forcats 0.3.0
## -- Conflicts ------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggplot2)
library(lubridate
        )
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
tmp<-tempfile(fileext=".rds")

download.file("https://github.com/MEF-BDA503/mef-bda503.github.io/blob/master/files/car_data_aggregate.rds?raw=true",destfile=tmp,mode = 'wb')
raw_data<-read_rds(tmp)
file.remove(tmp)
## [1] TRUE
head(raw_data)
## # A tibble: 6 x 12
##   brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
##   <chr>         <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>
## 1 ALFA ROMEO        0       13         13        0        0          0
## 2 ASTON MAR~        0        2          2        0        0          0
## 3 AUDI              0      350        350        0        0          0
## 4 BENTLEY           0        0          0        0        0          0
## 5 BMW               0      158        158        0        0          0
## 6 CITROEN           0      134        134        0      197        197
## # ... with 5 more variables: total_dom <dbl>, total_imp <dbl>,
## #   total_total <dbl>, year <dbl>, month <dbl>
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total","year","month")
car_data <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) 

car_data <- car_data  %>% 
  filter(!(year==2017 & month==2 & total_dom==0 & total_imp==0 & total_total==0) ) %>%
  filter(brand_name != "TOPLAM:")
car_data <- car_data %>% mutate(day=1)
car_data <- car_data %>% mutate(date=paste(year, month, day, sep="-")) %>% mutate(date= ymd(date))  

I want to explore total sales trend for the best twelve brand. For that reason, I need best twelve brand in total sales.

cardataSummaryOrder <- car_data %>% 
  group_by(brand_name) %>% 
  summarise(total_total=sum(total_total)) %>%
  arrange(desc(total_total)) %>%
  slice(1:12)

If we have total sales values per date and brand, we can check which brands in top 12. Below codes makes aggregation on data and outputs best 12 brands and their monthly total data.

cardataSummaryBrand <- car_data %>% 
  group_by(brand_name, date) %>% 
  summarise(total_total=sum(total_total)) 

cardatasummaryBrandforP <- semi_join(cardataSummaryBrand, cardataSummaryOrder,by="brand_name")

The visualization below shows us, genaral total sales trend for best performed brands. Trend lines tells us, all brands increase or decreases their sales at the same time. We couldn’t see any increase or decrease that only one brand has this fluctuation, generally brands sales moved to the same way simultaneously. There isn’t a brand with a distinguishing performance.

In the second analysis year to date total sales have been analyzed. Changes in YTD total sales visualized with geom_point chart. Colors shows increase/decrease percentage of YTD sales.

cardataSummaryYTDTotal <- car_data %>% 
  group_by(brand_name, year) %>% 
  summarise(ytd_total=sum(total_total)) %>%
  arrange(brand_name,year) %>%
  transmute(year, ytd_total, prev_YTD = lag(ytd_total), ytdChange=((ytd_total-prev_YTD)/ytd_total))