Ozgur Ozdemir - November 2018
I studied the February 2016 ODD data. Here is the link that i get data.
First, I downloaded the tidyverse package.
library(tidyverse)
## -- Attaching packages ---------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
tmp<-tempfile(fileext="2016subat.xlsx")
download.file("https://github.com/MEF-BDA503/pj18-ozdemiroz/blob/master/2016subat.xlsx?raw=true", mode = 'wb',destfile=tmp)
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
file.remove(tmp)
## [1] TRUE
raw_data <- raw_data %>% slice(-c(49,50))
head(raw_data)
## # A tibble: 6 x 10
## X__1 X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 X__10
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO NA 75 75 NA NA 0 0 75 75
## 2 ASTON MARTIN NA 1 1 NA NA 0 0 1 1
## 3 AUDI NA 1200 1200 NA NA 0 0 1200 1200
## 4 BENTLEY NA 1 1 NA NA 0 0 1 1
## 5 BMW NA 2042 2042 NA NA 0 0 2042 2042
## 6 CHERY NA 23 23 NA NA 0 0 23 23
In order to make the data standardized and workable we need to define column names and remove NA values for this example.
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
car_data_feb_16 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2016,month=2)
print(car_data_feb_16,width=Inf)
## # A tibble: 48 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 75 75 0 0 0
## 2 ASTON MARTIN 0 1 1 0 0 0
## 3 AUDI 0 1200 1200 0 0 0
## 4 BENTLEY 0 1 1 0 0 0
## 5 BMW 0 2042 2042 0 0 0
## 6 CHERY 0 23 23 0 0 0
## 7 CITROEN 0 366 366 40 213 253
## 8 DACIA 0 1963 1963 0 228 228
## 9 DS 0 34 34 0 0 0
## 10 FERRARI 0 1 1 0 0 0
## total_dom total_imp total_total year month
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 75 75 2016 2
## 2 0 1 1 2016 2
## 3 0 1200 1200 2016 2
## 4 0 1 1 2016 2
## 5 0 2042 2042 2016 2
## 6 0 23 23 2016 2
## 7 40 579 619 2016 2
## 8 0 2191 2191 2016 2
## 9 0 34 34 2016 2
## 10 0 1 1 2016 2
## # ... with 38 more rows
saveRDS(car_data_feb_16,file="~/Desktop/odd_car_sales_data_feb_16.rds")
I wanted to see a list of total sales of brands with domestic and imported vehicle sales ordered in decreasing total sales.
car_data_feb_16 %>%
select(brand_name,total_dom,total_imp,total_total) %>%
filter(total_dom > 0 & total_imp > 0) %>%
arrange(desc(total_total))
## # A tibble: 10 x 4
## brand_name total_dom total_imp total_total
## <chr> <dbl> <dbl> <dbl>
## 1 FORD 4018 2922 6940
## 2 RENAULT 3553 2609 6162
## 3 FIAT 4881 621 5502
## 4 HYUNDAI 1485 1616 3101
## 5 TOYOTA 1735 766 2501
## 6 PEUGEOT 91 2014 2105
## 7 HONDA 510 669 1179
## 8 CITROEN 40 579 619
## 9 MITSUBISHI 35 222 257
## 10 ISUZU 204 34 238