Week-2 -2018 April ODD Data Analysis

ODD 2018 April Data Analaysis

In week-2 home work we had ODD sales data in Excel. After removing header and footer columns in excel, you can load the data in your R worksapce via the code below

#install.packages("tidyverse", repos = "https://cran.r-project.org") 
##if installed it is not essential
# Load the package to the session
library(tidyverse)

## -- Attaching packages ------------------------------------------------------------ tidyverse 1.2.1 --

## <U+221A> ggplot2 3.0.0     <U+221A> purrr   0.2.5
## <U+221A> tibble  1.4.2     <U+221A> dplyr   0.7.6
## <U+221A> tidyr   0.8.1     <U+221A> stringr 1.3.1
## <U+221A> readr   1.1.1     <U+221A> forcats 0.3.0

## -- Conflicts --------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(readxl)
# Download file from repository to the temp file
# Remove the temp file
tmp<-tempfile(fileext=".xlsx")
download.file("https://github.com/MEF-BDA503/pj18-omerbayir/blob/master/week2/201804.xlsx?raw=true",destfile=tmp,mode = 'wb')
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
file.remove(tmp)

## [1] TRUE

## my excel file name is 201804.xlsx
## raw_data <- read_excel("201804.xlsx", col_names = FALSE) 
## you can do the same thing  with this code but also you should check your working directory

# Remove the last two rows because they are irrelevant (total and empty rows)
# Let's see our raw data
head(raw_data)

## # A tibble: 6 x 10
##   X__1     X__2  X__3  X__4  X__5  X__6  X__7  X__8  X__9 X__10
##   <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 DS         NA    20    20    NA    NA     0     0    20    20
## 2 FERRARI    NA     3     3    NA    NA     0     0     3     3
## 3 FIAT     4175   130  4305  3183   249  3432  7358   379  7737
## 4 FORD      280  2506  2786  4301   247  4548  4581  2753  7334
## 5 HONDA    1881   646  2527    NA    NA     0  1881   646  2527
## 6 HYUNDAI  1631  2555  4186    NA   215   215  1631  2770  4401

We required to name our data columns with standardised names. Also if we have NA cells we replaced with them 0 or ‘.’ . My RDS file is ready and waiting to be merged by merger classmate. As seen below :

colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")

car_data_april_18 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2018,month=4)

saveRDS(car_data_april_18,file="odd_car_sales_data_april_18.rds")

You can get the R data file from here

After loading our data frame I’ve made two analysis.

car_data_april_18 %>%
  tbl_df()

## # A tibble: 35 x 12
##    brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
##    <chr>         <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>
##  1 DS                0       20         20        0        0          0
##  2 FERRARI           0        3          3        0        0          0
##  3 FIAT           4175      130       4305     3183      249       3432
##  4 FORD            280     2506       2786     4301      247       4548
##  5 HONDA          1881      646       2527        0        0          0
##  6 HYUNDAI        1631     2555       4186        0      215        215
##  7 INFINITI          0        2          2        0        0          0
##  8 ISUZU             0        0          0      161       19        180
##  9 IVECO             0        0          0        0      146        146
## 10 JAGUAR            0       18         18        0        0          0
## # ... with 25 more rows, and 5 more variables: total_dom <dbl>,
## #   total_imp <dbl>, total_total <dbl>, year <dbl>, month <dbl>

## First Analysis
car_data_april_18 %>% select(year, month, brand_name, total_total) %>% 
  mutate(all_Total=sum(total_total),percentage=total_total/sum(total_total)*100 ) %>%  
    filter(total_total==min(total_total) | total_total==max(total_total))

## # A tibble: 2 x 6
##    year month brand_name  total_total all_Total percentage
##   <dbl> <dbl> <chr>             <dbl>     <dbl>      <dbl>
## 1  2018     4 LAMBORGHINI           1     63641    0.00157
## 2  2018     4 RENAULT           11161     63641   17.5

In the first analysis you can see the best performed brand and the worst brands. Their percentage of total sold car in April 2018 can be shown.

## Second Analysis
car_data_april_18 %>% select(year, month, brand_name, total_total) %>% 
  mutate(percentage=total_total/sum(total_total)*100 ) %>%  
  arrange(desc(percentage)) %>% slice(1:10)

## # A tibble: 10 x 5
##     year month brand_name    total_total percentage
##    <dbl> <dbl> <chr>               <dbl>      <dbl>
##  1  2018     4 RENAULT             11161      17.5 
##  2  2018     4 VOLKSWAGEN           7790      12.2 
##  3  2018     4 FIAT                 7737      12.2 
##  4  2018     4 FORD                 7334      11.5 
##  5  2018     4 HYUNDAI              4401       6.92
##  6  2018     4 OPEL                 3476       5.46
##  7  2018     4 PEUGEOT              3392       5.33
##  8  2018     4 TOYOTA               3230       5.08
##  9  2018     4 MERCEDES-BENZ        2702       4.25
## 10  2018     4 HONDA                2527       3.97

Second analysis shows us total sales brand by brand and their percentage of total sales. The analysis shows only the top ten performer brands.

Week-2 -2018 April ODD Data Analysis

Ömer Bayır

29 Ekim 2018

ODD 2018 April Data Analaysis