In week-2 we studied with tidyverse package and some functions in this package. This homework is about it. After we downloaded data from ODD official website, we changed the name to ODD_Retail_Sales_201806.xlsx. We will make some example from raw data to final analysis below.
At first, we load the data in our R workspace via the code below
install.packages("tidyverse", repos = "https://cran.r-project.org")
## package 'tidyverse' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\STUDENT\AppData\Local\Temp\Rtmp8gNkaG\downloaded_packages
library(tidyverse)
## -- Attaching packages -------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ----------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readxl)
tmp<-tempfile(fileext=".xlsx")
download.file("https://github.com/MEF-BDA503/pj18-muharremcakir81/blob/master/Week2/Odd_Retail_Sales_2018_06.xlsx?raw=true",destfile=tmp,mode = 'wb')
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
# Let's see our raw data
head(raw_data)
## # A tibble: 6 x 10
## X__1 X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 X__10
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO NA 10 10 NA NA 0 0 10 10
## 2 ASTON MARTIN NA 1 1 NA NA 0 0 1 1
## 3 AUDI NA 668 668 NA NA 0 0 668 668
## 4 BENTLEY NA 1 1 NA NA 0 0 1 1
## 5 BMW NA 1386 1386 NA NA 0 0 1386 1386
## 6 CITROEN NA 499 499 NA 520 520 0 1019 1019
file.remove(tmp)
## [1] TRUE
Now we need to remove NA values and define column names to make data standardized.
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now we replace NA values with 0 and label the time period with year and month, so when we merge the data we won't be confused.
car_data_201806 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2018,month=6)
#TO save our data;
saveRDS(car_data_201806,file="Odd_Retail_Sales_201806.rds")
print(car_data_201806,width=Inf)
## # A tibble: 44 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 10 10 0 0 0
## 2 ASTON MARTIN 0 1 1 0 0 0
## 3 AUDI 0 668 668 0 0 0
## 4 BENTLEY 0 1 1 0 0 0
## 5 BMW 0 1386 1386 0 0 0
## 6 CITROEN 0 499 499 0 520 520
## 7 DACIA 0 2254 2254 0 293 293
## 8 DS 0 9 9 0 0 0
## 9 FERRARI 0 1 1 0 0 0
## 10 FIAT 3106 139 3245 1689 223 1912
## total_dom total_imp total_total year month
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 10 10 2018 6
## 2 0 1 1 2018 6
## 3 0 668 668 2018 6
## 4 0 1 1 2018 6
## 5 0 1386 1386 2018 6
## 6 0 1019 1019 2018 6
## 7 0 2547 2547 2018 6
## 8 0 9 9 2018 6
## 9 0 1 1 2018 6
## 10 4795 362 5157 2018 6
## # ... with 34 more rows
Our data is ready to make analysis.
In first analyse , we can see the percentage of total_dom and percentage of total_imp for each brand ordered in decreasing Percentage_dom.
car_data_201806 %>% select(year, month, brand_name, total_total,total_dom,total_imp) %>%
mutate(Percentage_dom=total_dom/total_total * 100,Percentage_imp=total_imp/total_total * 100) %>%
select(year, month, brand_name, total_total,Percentage_dom,Percentage_imp) %>%
arrange(desc(Percentage_dom))
## # A tibble: 44 x 6
## year month brand_name total_total Percentage_dom Percentage_imp
## <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 2018 6 KARSAN 47 100 0
## 2 2018 6 FIAT 5157 93.0 7.02
## 3 2018 6 ISUZU 398 91.5 8.54
## 4 2018 6 TOYOTA 2275 82.5 17.5
## 5 2018 6 HONDA 2515 82.1 17.9
## 6 2018 6 RENAULT 8132 69.6 30.4
## 7 2018 6 FORD 5928 62.4 37.6
## 8 2018 6 HYUNDAI 3356 39.5 60.5
## 9 2018 6 TOPLAM: 51037 38.9 61.1
## 10 2018 6 MITSUBISHI 160 8.12 91.9
## # ... with 34 more rows
In second analyse , we can see the percentage of Auto_total and percentage of comm_total for each brands their percentage of total_dom greater than %50 ordered in decreasing Total Sales..
car_data_201806 %>% select(year, month, brand_name, total_total,total_dom,total_imp,comm_total,auto_total) %>%
mutate(Percentage_dom=total_dom/total_total * 100,Percentage_Auto=auto_total/total_total * 100,Percentage_comm=comm_total/total_total * 100) %>% filter(Percentage_dom > 50 ) %>%
select(year, month, brand_name,total_total,Percentage_Auto,Percentage_comm) %>%
arrange(desc(total_total))
## # A tibble: 7 x 6
## year month brand_name total_total Percentage_Auto Percentage_comm
## <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 2018 6 RENAULT 8132 91.3 8.67
## 2 2018 6 FORD 5928 43.1 56.9
## 3 2018 6 FIAT 5157 62.9 37.1
## 4 2018 6 HONDA 2515 100 0
## 5 2018 6 TOYOTA 2275 92.1 7.91
## 6 2018 6 ISUZU 398 0 100
## 7 2018 6 KARSAN 47 0 100
In third and last analyse , we can see the 14 brands theirs total sales is above the average total sales of all brands.
car_data_201806 %>% select(year, month, brand_name, total_total) %>%
mutate(average_Total=mean(total_total) ) %>%
filter(total_total > average_Total) %>%
arrange(desc(total_total))
## # A tibble: 8 x 5
## year month brand_name total_total average_Total
## <dbl> <dbl> <chr> <dbl> <dbl>
## 1 2018 6 TOPLAM: 51037 2320.
## 2 2018 6 RENAULT 8132 2320.
## 3 2018 6 FORD 5928 2320.
## 4 2018 6 VOLKSWAGEN 5521 2320.
## 5 2018 6 FIAT 5157 2320.
## 6 2018 6 HYUNDAI 3356 2320.
## 7 2018 6 DACIA 2547 2320.
## 8 2018 6 HONDA 2515 2320.