Loading Data

library(tidyverse)

## -- Attaching packages --------------------- tidyverse 1.2.1 --

## √ ggplot2 3.0.0     √ purrr   0.2.5
## √ tibble  1.4.2     √ dplyr   0.7.8
## √ tidyr   0.8.1     √ stringr 1.3.1
## √ readr   1.1.1     √ forcats 0.3.0

## -- Conflicts ------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(dplyr)
library(ggplot2)

data <- readRDS("car_data_aggregate.rds")

A Summary of Data

summary(data)

##   brand_name           auto_dom        auto_imp       auto_total   
##  Length:1259        Min.   :    0   Min.   :    0   Min.   :    0  
##  Class :character   1st Qu.:    0   1st Qu.:    3   1st Qu.:    3  
##  Mode  :character   Median :    0   Median :  116   Median :  127  
##                     Mean   :  451   Mean   : 1080   Mean   : 1531  
##                     3rd Qu.:    0   3rd Qu.: 1344   3rd Qu.: 1928  
##                     Max.   :20665   Max.   :48030   Max.   :65799  
##     comm_dom          comm_imp        comm_total        total_dom      
##  Min.   :    0.0   Min.   :   0.0   Min.   :    0.0   Min.   :    0.0  
##  1st Qu.:    0.0   1st Qu.:   0.0   1st Qu.:    0.0   1st Qu.:    0.0  
##  Median :    0.0   Median :   0.0   Median :    0.0   Median :    0.0  
##  Mean   :  248.6   Mean   : 219.0   Mean   :  467.6   Mean   :  699.7  
##  3rd Qu.:    0.0   3rd Qu.: 193.5   3rd Qu.:  220.0   3rd Qu.:    0.0  
##  Max.   :11642.0   Max.   :9848.0   Max.   :19623.0   Max.   :30440.0  
##    total_imp      total_total         year          month       
##  Min.   :    0   Min.   :    0   Min.   :2016   Min.   : 1.000  
##  1st Qu.:    8   1st Qu.:   13   1st Qu.:2016   1st Qu.: 4.000  
##  Median :  171   Median :  188   Median :2017   Median : 7.000  
##  Mean   : 1299   Mean   : 1998   Mean   :2017   Mean   : 6.441  
##  3rd Qu.: 1566   3rd Qu.: 2182   3rd Qu.:2018   3rd Qu.: 9.000  
##  Max.   :57355   Max.   :85422   Max.   :2018   Max.   :12.000

About Data

The summary shows no missing values, so we can start analyzing the dataset right away. The dataset is a merge of car data from different months. First I will ask questions about the data and look for their answers; gradually, the dataset should bring some clarity to these questions. Step by step, I will investigate the dataset and draw some conclusions from it.
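The summary output only reports NA counts for numeric columns; for a character column such as brand_name it prints just Length, Class and Mode. Counting the missing values directly covers every column. A minimal sketch, using a made-up toy data frame (the `toy` object and its values are illustrative and not taken from car_data_aggregate.rds):

```r
# Hypothetical toy data; the report itself reads car_data_aggregate.rds.
toy <- data.frame(
  brand_name = c("FIAT", "FORD", "OPEL"),
  auto_total = c(1200, 800, NA),
  comm_total = c(300, 450, 0),
  stringsAsFactors = FALSE
)

# Count the missing values in each column, character columns included.
na_counts <- colSums(is.na(toy))
na_counts

# TRUE only when no column contains an NA.
no_missing <- all(na_counts == 0)
no_missing
```

Running the same two calls on `data` would confirm the claim for all of its columns at once.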

Question 1

Looking at the total values, which brands are the most valuable? To find out, I use the arrange function to sort the brands by their average total.

data %>%
  group_by(brand_name)%>%
  summarize(avgTotal = mean(auto_total)) %>%
  select(brand_name,avgTotal) %>%
  arrange(desc(avgTotal)) %>%
  filter(avgTotal > 3500) %>%
  ggplot(data = ., aes(x = brand_name, y = avgTotal, 
    fill = brand_name)) + geom_bar(stat = "identity")

Judging by the auto_total values, FIAT, HYUNDAI, RENAULT, OPEL and VOLKSWAGEN are the most valuable brands.
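One caveat about the chart: ggplot2 places a character x variable in alphabetical order, so sorting the rows with arrange does not change the order of the bars. If bars sorted by value are wanted, one option is to turn brand_name into a factor with reorder. A small sketch on made-up numbers (the `toy` data frame is hypothetical; only the column names follow this report):

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy values; only the column names follow the report.
toy <- data.frame(
  brand_name = c("FIAT", "FIAT", "OPEL", "OPEL", "RENAULT", "RENAULT"),
  auto_total = c(4000, 5000, 3000, 3600, 7000, 8000),
  stringsAsFactors = FALSE
)

brand_avg <- toy %>%
  group_by(brand_name) %>%
  summarize(avgTotal = mean(auto_total)) %>%
  # reorder() returns a factor whose levels follow descending avgTotal
  mutate(brand_name = reorder(brand_name, -avgTotal))

# The factor levels decide the left-to-right order of the bars.
levels(brand_avg$brand_name)

# geom_col() is equivalent to geom_bar(stat = "identity")
p <- ggplot(brand_avg, aes(x = brand_name, y = avgTotal, fill = brand_name)) +
  geom_col()
p
```

The same mutate step could be added to any of the bar chart pipelines in this question, just before the ggplot call.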

If we look at the commercial total:

data %>%
  group_by(brand_name)%>%
  summarize(avgTotal = mean(comm_total)) %>%
  select(brand_name,avgTotal) %>%
  arrange(desc(avgTotal)) %>%
  filter(avgTotal > 1000) %>%
  ggplot(data = ., aes(x = brand_name, y = avgTotal, 
    fill = brand_name)) + geom_bar(stat = "identity")

Here we see that FIAT, FORD, RENAULT and VOLKSWAGEN are the brands with the highest commercial totals.

Lastly, we should look at the overall numbers of the car business. If we take the data as a total:

data %>%
  group_by(brand_name)%>%
  summarize(avgTotal = mean(total_total)) %>%
  select(brand_name,avgTotal) %>%
  arrange(desc(avgTotal)) %>%
  filter(avgTotal > 4000) %>%
  ggplot(data = ., aes(x = brand_name, y = avgTotal, 
    fill = brand_name)) + geom_bar(stat = "identity")

These are the brands with the highest overall totals. You can see that FIAT, FORD, HYUNDAI, RENAULT and VOLKSWAGEN have made the most valuable trades.

Question 2

We have seen which companies made the most valuable trades, but we should also look at how the brands have improved. For this, we must look at the difference from year to year and see which brands improved the most.

First, we look at how auto_total changes by year:

data %>%
  group_by(brand_name,year)%>%
  summarize(avgTotal = mean(auto_total)) %>%
  select(brand_name,avgTotal,year) %>%
  arrange(desc(avgTotal)) %>%
  filter(avgTotal > 3000) %>%
  ggplot(data = ., aes(x = year, y = avgTotal, 
    color = brand_name)) + geom_line()

Similarly, we can look at the changes in the commercial total:

data %>%
  group_by(brand_name,year)%>%
  summarize(avgTotal = mean(comm_total)) %>%
  select(brand_name,avgTotal,year) %>%
  arrange(desc(avgTotal)) %>%
  filter(avgTotal > 2000) %>%
  ggplot(data = ., aes(x = year, y = avgTotal, 
    color = brand_name)) + geom_line()

Lastly, we can look at the overall totals. How have they changed? To investigate this, let's look at the changes in total_total:

data %>%
  group_by(brand_name,year)%>%
  summarize(avgTotal = mean(total_total)) %>%
  select(brand_name,avgTotal,year) %>%
  arrange(desc(avgTotal)) %>%
  filter(avgTotal > 3000) %>%
  ggplot(data = ., aes(x = year, y = avgTotal, 
    color = brand_name)) + geom_line()

The totals of all these variables fall after 2017. So we can say that there was an economic effect on the brands beginning in 2017.
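The line charts show the direction of the change; to put a number on it, the yearly averages can be compared with the previous year for each brand. A rough sketch on made-up figures (the `toy` data frame and the `pct_change` column are hypothetical names introduced here, not part of the dataset):

```r
library(dplyr)

# Hypothetical toy values; only brand_name, year and total_total follow the report.
toy <- data.frame(
  brand_name  = rep(c("FIAT", "OPEL"), each = 3),
  year        = rep(2016:2018, times = 2),
  total_total = c(5000, 5500, 4400, 3000, 3300, 2970),
  stringsAsFactors = FALSE
)

yearly_change <- toy %>%
  group_by(brand_name, year) %>%
  summarize(avgTotal = mean(total_total)) %>%
  # summarize() drops only the last grouping level, so rows are still grouped by brand_name
  arrange(year, .by_group = TRUE) %>%
  # percentage change against the same brand's previous year (NA for the first year)
  mutate(pct_change = 100 * (avgTotal - lag(avgTotal)) / lag(avgTotal)) %>%
  ungroup()

yearly_change
```

A negative `pct_change` for 2018 would correspond to the drop after 2017 seen in the charts.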

Question 3

How do the monthly values of the most valuable brands change? We should investigate this by looking at the chart:

data %>%
  group_by(brand_name,year,month)%>%
  summarize(avgTotal = mean(total_total)) %>%
  select(brand_name,avgTotal,month) %>%
  arrange(desc(avgTotal)) %>%
  filter(avgTotal > 3000) %>%
  ggplot(data = ., aes(x = month, y = avgTotal, 
    color = brand_name)) + geom_line()

## Adding missing grouping variables: `year`
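The message above appears because, after summarize, the result is still grouped by brand_name and year, and select cannot drop a grouping variable, so year is added back. Since the chart maps only month to the x axis, each brand's line also joins points from different years. One possible alternative, sketched on made-up values, is to ungroup and build a date column so that each month of each year gets its own position (the `toy` data frame and the `date` column are assumptions made for this illustration):

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy values; only the column names follow the report.
toy <- data.frame(
  brand_name  = rep(c("FIAT", "OPEL"), each = 4),
  year        = rep(c(2016, 2016, 2017, 2017), times = 2),
  month       = rep(c(1, 2, 1, 2), times = 2),
  total_total = c(5000, 5200, 5500, 5300, 3000, 3100, 3300, 3200),
  stringsAsFactors = FALSE
)

monthly <- toy %>%
  group_by(brand_name, year, month) %>%
  summarize(avgTotal = mean(total_total)) %>%
  ungroup() %>%  # drop the leftover brand_name/year grouping
  # assumption: the first day of the month stands in for the whole month
  mutate(date = as.Date(sprintf("%d-%02d-01", as.integer(year), as.integer(month))))

# One continuous line per brand across all years.
p <- ggplot(monthly, aes(x = date, y = avgTotal, color = brand_name)) +
  geom_line()
p
```

With the grouping removed and year kept in the data, no grouping message is printed and the lines run in time order.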