Assignment

From Raw to Civilized Data

First, we find the data on the Otomotiv Distribütörleri Derneği (Automotive Distributors' Association) website. We are interested in the April 2016 sales. We download the file and rename it to odd_retail_sales_2016_04.xlsx. We will build a reproducible example of data analysis, from the raw data stored in a repository to the final analysis.

Download Raw Data

Our raw Excel file is in our repository. We can automatically download that file into a temporary file, read that Excel document into R, and then remove the temp file.

tmp<-tempfile(fileext=".xlsx")
# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/pj18-yildizmust/blob/master/odd_retail_sales_2016_04.xlsx?raw=true",destfile=tmp,mode='wb')
# Read that excel file using readxl package's read_excel function. You might need to adjust the parameters (skip, col_names) according to your raw file's format.
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
# Remove the temp file
file.remove(tmp)

## [1] TRUE
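The temp-file pattern can be tried without network access. The following is a minimal sketch, not the workflow used above: `read_via_tempfile` and `fake_download` are hypothetical names, a two-row CSV (values taken from this document's April 2016 figures) stands in for the Excel file, and `on.exit()` is used so that the temp file is removed even if reading fails.

```r
# Hypothetical helper: write a raw file to a temp location, read it, clean up.
# `fetch` is any function that writes the raw file to the given path
# (in the real workflow this role is played by download.file()).
read_via_tempfile <- function(fetch, reader, fileext = ".csv") {
  tmp <- tempfile(fileext = fileext)
  # Remove the temp file when the function exits, even on error
  on.exit(if (file.exists(tmp)) file.remove(tmp), add = TRUE)
  fetch(tmp)
  list(data = reader(tmp), path = tmp)
}

# Stand-in for the download step (assumption: a tiny CSV instead of .xlsx)
fake_download <- function(path) {
  writeLines(c("AUDI,2148", "BMW,2332"), path)
}

result <- read_via_tempfile(
  fetch  = fake_download,
  reader = function(path) {
    read.csv(path, header = FALSE, col.names = c("brand_name", "auto_total"))
  }
)

# The data survives, the temp file does not
nrow(result$data)
file.exists(result$path)
```

Wrapping the cleanup in `on.exit()` is a design choice: the explicit `file.remove(tmp)` call above is skipped if `read_excel()` throws an error, whereas the exit handler still runs.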

library(tidyverse)

## -- Attaching packages ------------------------------------------------------------------------------------------ tidyverse 1.2.1 --

## √ ggplot2 3.1.0     √ purrr   0.2.5
## √ tibble  1.4.2     √ dplyr   0.7.6
## √ tidyr   0.8.1     √ stringr 1.3.1
## √ readr   1.1.1     √ forcats 0.3.0

## -- Conflicts --------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

# Remove the last two rows because they are irrelevant (total and empty rows)
raw_data <- raw_data %>% slice(-c(49,50))

# Let's see our raw data
head(raw_data)

## # A tibble: 6 x 10
##   X__1          X__2  X__3  X__4  X__5  X__6  X__7  X__8  X__9 X__10
##   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO      NA    58    58    NA    NA     0     0    58    58
## 2 ASTON MARTIN    NA     4     4    NA    NA     0     0     4     4
## 3 AUDI            NA  2148  2148    NA    NA     0     0  2148  2148
## 4 BENTLEY         NA     0     0    NA    NA     0     0     0     0
## 5 BMW             NA  2332  2332    NA    NA     0     0  2332  2332
## 6 CHERY           NA     9     9    NA    NA     0     0     9     9
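The row indices 49 and 50 are specific to this month's file. As a hedged alternative, the last two rows can be dropped regardless of the row count with base R's `head()` and a negative `n`. The small data frame below is a stand-in: the three brand rows come from the output above, while the "TOTAL" label and the empty-row layout are assumptions about how the raw sheet ends.

```r
# Toy stand-in for raw_data: three brand rows from the output above,
# followed by a total row and an empty row (this layout is assumed).
toy_raw <- data.frame(
  X__1 = c("ALFA ROMEO", "ASTON MARTIN", "AUDI", "TOTAL", NA),
  X__3 = c(58, 4, 2148, 2210, NA),
  stringsAsFactors = FALSE
)

# head() with a negative n keeps everything except the last |n| rows
toy_clean <- head(toy_raw, -2)

# Equivalent dplyr form, if dplyr is attached:
# toy_clean <- toy_raw %>% slice(seq_len(n() - 2))

toy_clean
```

This only helps when the number of trailing junk rows is stable from month to month; if that varies, filtering on the content of the first column would be the safer route.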

Make Data Civilized

To make the data standardized and workable, we need to define column names and, for this example, replace NA values with 0. Please use the same column names in your examples as well.

# Use the same column names in your data.
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now we replace NA values with 0 and label the time period with year and month, so when we merge the data we won't be confused.
car_data_apr_16 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2016,month=4)
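The line above runs on the dplyr 0.7.6 shown in the startup message. From dplyr 0.8.0 onward `funs()` is deprecated, and since dplyr 1.0.0 `mutate_if()` is superseded by `across()`. Here is a sketch of the same step in the newer idiom, assuming dplyr 1.0.0 or later and tidyr are installed. The two-row tibble is a stand-in using values from the raw output above.

```r
library(dplyr)
library(tidyr)

# Stand-in for raw_data after renaming: two rows from the raw output above
toy <- tibble(
  brand_name = c("ALFA ROMEO", "AUDI"),
  auto_dom   = c(NA_real_, NA_real_),
  auto_imp   = c(58, 2148)
)

# Replace NA with 0 in every numeric column, then label the period
toy_clean <- toy %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0))) %>%
  mutate(year = 2016, month = 4)

toy_clean
```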

print(car_data_apr_16,width=Inf)

## # A tibble: 48 x 12
##    brand_name   auto_dom auto_imp auto_total comm_dom comm_imp comm_total
##    <chr>           <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>
##  1 ALFA ROMEO          0       58         58        0        0          0
##  2 ASTON MARTIN        0        4          4        0        0          0
##  3 AUDI                0     2148       2148        0        0          0
##  4 BENTLEY             0        0          0        0        0          0
##  5 BMW                 0     2332       2332        0        0          0
##  6 CHERY               0        9          9        0        0          0
##  7 CITROEN             0     1684       1684       95      670        765
##  8 DACIA               0     3555       3555        0      512        512
##  9 DS                  0       35         35        0        0          0
## 10 FERRARI             0        4          4        0        0          0
##    total_dom total_imp total_total  year month
##        <dbl>     <dbl>       <dbl> <dbl> <dbl>
##  1         0        58          58  2016     4
##  2         0         4           4  2016     4
##  3         0      2148        2148  2016     4
##  4         0         0           0  2016     4
##  5         0      2332        2332  2016     4
##  6         0         9           9  2016     4
##  7        95      2354        2449  2016     4
##  8         0      4067        4067  2016     4
##  9         0        35          35  2016     4
## 10         0         4           4  2016     4
## # ... with 38 more rows
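Once the columns are named, a quick consistency check becomes possible. The sketch below assumes, from the column names, that `dom` means domestic, `imp` means imported, and that each total is the sum of its parts. It uses two rows copied from the printed tibble above; on the full data the same expression could be run on `car_data_apr_16`.

```r
# Two rows copied from the printed tibble above
check <- data.frame(
  brand_name  = c("CITROEN", "DACIA"),
  auto_dom    = c(0, 0),
  auto_imp    = c(1684, 3555),
  auto_total  = c(1684, 3555),
  comm_dom    = c(95, 0),
  comm_imp    = c(670, 512),
  comm_total  = c(765, 512),
  total_dom   = c(95, 0),
  total_imp   = c(2354, 4067),
  total_total = c(2449, 4067),
  stringsAsFactors = FALSE
)

# Each *_total should equal domestic + imported, and each total_*
# should equal automobile + commercial (assumed meaning of the columns)
totals_ok <- with(check,
  auto_total  == auto_dom + auto_imp &
  comm_total  == comm_dom + comm_imp &
  total_dom   == auto_dom + comm_dom &
  total_imp   == auto_imp + comm_imp &
  total_total == auto_total + comm_total
)

# Brands whose figures do not add up
check$brand_name[!totals_ok]
```

A check like this is cheap insurance against a shifted `skip` value or misassigned column names when the same script is reused for another month's file.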

Save Your Civilized Data

One of the best methods is to save your data to an RDS or RData file. The difference is that an RDS file holds only one object, while an RData file can hold many. Since we have only one data frame here, we will go with RDS.

saveRDS(car_data_apr_16,file="C:/Users/LENOVO/Desktop/odd_car_sales_data_apr_16.rds")
# You can read that file with readRDS and assign the result to an object,
# e.g.
# rds_data <- readRDS("C:/Users/LENOVO/Desktop/odd_car_sales_data_apr_16.rds")
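The difference between the two formats can be seen in a short round trip. This is a sketch with stand-in objects (`brands` and `period` are illustrative names, with values taken from this document), written to temp files rather than a fixed desktop path.

```r
# Two small objects to save (values are from the tables in this document)
brands <- data.frame(
  brand_name  = c("AUDI", "BMW"),
  total_total = c(2148, 2332),
  stringsAsFactors = FALSE
)
period <- c(year = 2016, month = 4)

rds_path   <- tempfile(fileext = ".rds")
rdata_path <- tempfile(fileext = ".RData")

# RDS: one object per file; readRDS() returns it, so you choose the name
saveRDS(brands, file = rds_path)
brands_again <- readRDS(rds_path)

# RData: several objects per file; load() restores them under their
# original names and returns those names as a character vector
save(brands, period, file = rdata_path)
rm(brands, period)
restored <- load(rdata_path)

# Clean up the temp files
removed <- file.remove(rds_path, rdata_path)
```

Because `readRDS()` returns the object instead of restoring it under its saved name, it cannot silently overwrite something already in the workspace, which is one reason to prefer RDS for a single data frame.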

Finish With Some Analysis

You are free to do any analysis here. I wanted to see the total sales of brands that sell both automobiles and commercial vehicles, ordered by decreasing total sales.

car_data_apr_16 %>% 
  filter(auto_total > 0 & comm_total > 0) %>%
  select(brand_name,total_total) %>%
  arrange(desc(total_total))

## # A tibble: 14 x 2
##    brand_name    total_total
##    <chr>               <dbl>
##  1 RENAULT             12075
##  2 VOLKSWAGEN          10485
##  3 FIAT                 9521
##  4 FORD                 9151
##  5 HYUNDAI              4608
##  6 TOYOTA               4474
##  7 DACIA                4067
##  8 MERCEDES-BENZ        2976
##  9 NISSAN               2810
## 10 PEUGEOT              2571
## 11 CITROEN              2449
## 12 KIA                  1785
## 13 MITSUBISHI            414
## 14 SSANGYONG              53

Assignment_3

Mustafa Yıldız

15 November 2018