Busra Koc 31/10/18

From Raw to Civilized Data

First I find the data on Otomotiv Distibutorleri Dernegi website. I am interested in July 2018 sales. I download the data change the name to odd_retail_sales_2018_07.xlsx . I will make a reproducible example of data analysis from the raw data located somewhere to the final analysis.

Downloading Raw Data

Before I start, I downloaded the tidyverse package.

# Before I start, I downloaded the tidyverse package.  
library(tidyverse)
# Create a temporary file
tmp<-tempfile(fileext="odd_retail_sales_2018_07.xlsx")

# Download file from repository to the temp file. In addition, to avoid getting error when loading URL, I added "mode = 'wb' to the code.
download.file("https://github.com/MEF-BDA503/pj18-busraakoc/blob/master/odd_retail_sales_2018_07.xlsx?raw=true",mode = 'wb',destfile=tmp)

# Read that excel file using readxl package's read_excel function. 
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)

# Remove the temp file
file.remove(tmp)
## [1] TRUE
# Remove the last two rows because they are irrelevant (total and empty rows).
raw_data <- raw_data %>% slice(-c(43,44))

# Let's see our raw data
head(raw_data)
## # A tibble: 6 x 10
##   X__1          X__2  X__3  X__4  X__5  X__6  X__7  X__8  X__9 X__10
##   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO      NA    14    14    NA    NA     0     0    14    14
## 2 ASTON MARTIN    NA     1     1    NA    NA     0     0     1     1
## 3 AUDI            NA   840   840    NA    NA     0     0   840   840
## 4 BENTLEY         NA    NA     0    NA    NA     0     0     0     0
## 5 BMW             NA   760   760    NA    NA     0     0   760   760
## 6 CITROEN         NA   922   922    NA   470   470     0  1392  1392

Making Data Civilized

In order to make the data standardized and workable we need to define column names and remove NA values for this example.

# Use the same column names in data.
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")

# Now we replace NA values with 0 and label the time period with year and month, so when we merge the data we won't be confused.
car_data_jul_18 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2018,month=7)

print(car_data_jul_18,width=Inf)
## # A tibble: 42 x 12
##    brand_name   auto_dom auto_imp auto_total comm_dom comm_imp comm_total
##    <chr>           <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>
##  1 ALFA ROMEO          0       14         14        0        0          0
##  2 ASTON MARTIN        0        1          1        0        0          0
##  3 AUDI                0      840        840        0        0          0
##  4 BENTLEY             0        0          0        0        0          0
##  5 BMW                 0      760        760        0        0          0
##  6 CITROEN             0      922        922        0      470        470
##  7 DACIA               0     2343       2343        0      259        259
##  8 DS                  0       10         10        0        0          0
##  9 FERRARI             0        2          2        0        0          0
## 10 FIAT             3720      112       3832     1787      248       2035
##    total_dom total_imp total_total  year month
##        <dbl>     <dbl>       <dbl> <dbl> <dbl>
##  1         0        14          14  2018     7
##  2         0         1           1  2018     7
##  3         0       840         840  2018     7
##  4         0         0           0  2018     7
##  5         0       760         760  2018     7
##  6         0      1392        1392  2018     7
##  7         0      2602        2602  2018     7
##  8         0        10          10  2018     7
##  9         0         2           2  2018     7
## 10      5507       360        5867  2018     7
## # ... with 32 more rows

Saving Civilized Data

saveRDS(car_data_jul_18,file="~/Desktop/odd_car_sales_data_jul_18.rds")

Finishing With Some Analysis

I wanted to see a list of total sales of brands with domestic and imported vehicle sales ordered in decreasing total sales.

car_data_jul_18 %>% 
  select(brand_name,total_dom,total_imp,total_total) %>%
  filter(total_dom > 0 & total_imp > 0) %>%
  arrange(desc(total_total))
## # A tibble: 8 x 4
##   brand_name total_dom total_imp total_total
##   <chr>          <dbl>     <dbl>       <dbl>
## 1 RENAULT         4774      2371        7145
## 2 FIAT            5507       360        5867
## 3 FORD            3643      2200        5843
## 4 TOYOTA          2827       321        3148
## 5 HYUNDAI         1107      1507        2614
## 6 HONDA           2041       402        2443
## 7 MITSUBISHI        12       317         329
## 8 ISUZU             68        55         123