Busra Koc 31/10/18
First I find the data on Otomotiv Distibutorleri Dernegi website. I am interested in July 2018 sales. I download the data change the name to odd_retail_sales_2018_07.xlsx . I will make a reproducible example of data analysis from the raw data located somewhere to the final analysis.
Before I start, I downloaded the tidyverse package.
# Before I start, I downloaded the tidyverse package.
library(tidyverse)
# Create a temporary file
tmp<-tempfile(fileext="odd_retail_sales_2018_07.xlsx")
# Download file from repository to the temp file. In addition, to avoid getting error when loading URL, I added "mode = 'wb' to the code.
download.file("https://github.com/MEF-BDA503/pj18-busraakoc/blob/master/odd_retail_sales_2018_07.xlsx?raw=true",mode = 'wb',destfile=tmp)
# Read that excel file using readxl package's read_excel function.
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
# Remove the last two rows because they are irrelevant (total and empty rows).
raw_data <- raw_data %>% slice(-c(43,44))
# Let's see our raw data
head(raw_data)
## # A tibble: 6 x 10
## X__1 X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 X__10
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO NA 14 14 NA NA 0 0 14 14
## 2 ASTON MARTIN NA 1 1 NA NA 0 0 1 1
## 3 AUDI NA 840 840 NA NA 0 0 840 840
## 4 BENTLEY NA NA 0 NA NA 0 0 0 0
## 5 BMW NA 760 760 NA NA 0 0 760 760
## 6 CITROEN NA 922 922 NA 470 470 0 1392 1392
In order to make the data standardized and workable we need to define column names and remove NA values for this example.
# Use the same column names in data.
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now we replace NA values with 0 and label the time period with year and month, so when we merge the data we won't be confused.
car_data_jul_18 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2018,month=7)
print(car_data_jul_18,width=Inf)
## # A tibble: 42 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 14 14 0 0 0
## 2 ASTON MARTIN 0 1 1 0 0 0
## 3 AUDI 0 840 840 0 0 0
## 4 BENTLEY 0 0 0 0 0 0
## 5 BMW 0 760 760 0 0 0
## 6 CITROEN 0 922 922 0 470 470
## 7 DACIA 0 2343 2343 0 259 259
## 8 DS 0 10 10 0 0 0
## 9 FERRARI 0 2 2 0 0 0
## 10 FIAT 3720 112 3832 1787 248 2035
## total_dom total_imp total_total year month
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 14 14 2018 7
## 2 0 1 1 2018 7
## 3 0 840 840 2018 7
## 4 0 0 0 2018 7
## 5 0 760 760 2018 7
## 6 0 1392 1392 2018 7
## 7 0 2602 2602 2018 7
## 8 0 10 10 2018 7
## 9 0 2 2 2018 7
## 10 5507 360 5867 2018 7
## # ... with 32 more rows
saveRDS(car_data_jul_18,file="~/Desktop/odd_car_sales_data_jul_18.rds")
I wanted to see a list of total sales of brands with domestic and imported vehicle sales ordered in decreasing total sales.
car_data_jul_18 %>%
select(brand_name,total_dom,total_imp,total_total) %>%
filter(total_dom > 0 & total_imp > 0) %>%
arrange(desc(total_total))
## # A tibble: 8 x 4
## brand_name total_dom total_imp total_total
## <chr> <dbl> <dbl> <dbl>
## 1 RENAULT 4774 2371 7145
## 2 FIAT 5507 360 5867
## 3 FORD 3643 2200 5843
## 4 TOYOTA 2827 321 3148
## 5 HYUNDAI 1107 1507 2614
## 6 HONDA 2041 402 2443
## 7 MITSUBISHI 12 317 329
## 8 ISUZU 68 55 123