Car Sales Data Import

Download Raw Data

The raw excel file is in my repositoy.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

# Create a temporary file
tmp<-tempfile(fileext=".xlsx")
# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/pj18-orkunberkyuzbasioglu/raw/master/odd_retail_sales_2018_08.xlsx",destfile=tmp,mode="wb")
# Read that excel file using readxl package's read_excel function. You might need to adjust the parameters (skip, col_names) according to your raw file's format.
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
# Remove the temp file
file.remove(tmp)

## [1] TRUE

# Remove the last two rows because they are irrelevant (total and empty rows)
raw_data <- raw_data %>% slice(-c(43,44))

# Let's see our raw data
head(raw_data)

## # A tibble: 6 x 10
##   X__1          X__2  X__3  X__4  X__5  X__6  X__7  X__8  X__9 X__10
##   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO      NA    15    15    NA    NA     0     0    15    15
## 2 ASTON MARTIN    NA     1     1    NA    NA     0     0     1     1
## 3 AUDI            NA   737   737    NA    NA     0     0   737   737
## 4 BENTLEY         NA     1     1    NA    NA     0     0     1     1
## 5 BMW             NA  1005  1005    NA    NA     0     0  1005  1005
## 6 CITROEN         NA   495   495    NA   198   198     0   693   693

This is not completely OK.

##Make Data Civilized

In order to make the data standardized and workable we need to define column names and remove NA values for this example. I used the same column names in my examples as the example.

# Same column names for my data.
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now we replace NA values with 0 and label the time period with year and month, so when we merge the data we won't be confused.
car_data_sep_18 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2018,month=8)

print(car_data_sep_18,width=Inf)

## # A tibble: 42 x 12
##    brand_name   auto_dom auto_imp auto_total comm_dom comm_imp comm_total
##    <chr>           <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>
##  1 ALFA ROMEO          0       15         15        0        0          0
##  2 ASTON MARTIN        0        1          1        0        0          0
##  3 AUDI                0      737        737        0        0          0
##  4 BENTLEY             0        1          1        0        0          0
##  5 BMW                 0     1005       1005        0        0          0
##  6 CITROEN             0      495        495        0      198        198
##  7 DACIA               0     1220       1220        0      467        467
##  8 DS                  0       64         64        0        0          0
##  9 FERRARI             0        2          2        0        0          0
## 10 FIAT             1602       79       1681     1090      236       1326
##    total_dom total_imp total_total  year month
##        <dbl>     <dbl>       <dbl> <dbl> <dbl>
##  1         0        15          15  2018     8
##  2         0         1           1  2018     8
##  3         0       737         737  2018     8
##  4         0         1           1  2018     8
##  5         0      1005        1005  2018     8
##  6         0       693         693  2018     8
##  7         0      1687        1687  2018     8
##  8         0        64          64  2018     8
##  9         0         2           2  2018     8
## 10      2692       315        3007  2018     8
## # ... with 32 more rows

##Save Your Civilized Data The data was saved to an RDS object.

saveRDS(car_data_sep_18,file="C:/Users/orkun/OneDrive/Desktop/odd_car_sales_data_aug_18.rds")
# You can read that file by readRDS and assigning to an object 
# e.g 
# rds_data <- readRDS("~/YOUR_OWN_PATH/odd_car_sales_data_sep_18.rds")

#Finish With Some Analysis

car_data_sep_18 %>% 
  filter(auto_total > 0 & comm_total > 0) %>%
  select(brand_name,total_total) %>%
  arrange(desc(total_total))

## # A tibble: 14 x 2
##    brand_name    total_total
##    <chr>               <dbl>
##  1 RENAULT              4102
##  2 FORD                 3709
##  3 VOLKSWAGEN           3585
##  4 FIAT                 3007
##  5 TOYOTA               2274
##  6 DACIA                1687
##  7 PEUGEOT              1629
##  8 MERCEDES-BENZ        1427
##  9 NISSAN               1381
## 10 HYUNDAI              1301
## 11 CITROEN               693
## 12 KIA                   500
## 13 MITSUBISHI            400
## 14 SSANGYONG              34

Car Sales Data Import

Orkun Berk Yuzbasioglu

October 29, 2018

From Raw to Civilized Data

Download Raw Data