Car sales data downloaded from “Otomotiv Distributorleri Dernegi” website. My part was to obtain January 2016 sales and name of the excel file changed to “odd_retail_sales_2016_01.xlsx”.
My raw excel file is in my repository. I downloaded that file and put it in a temporary file. Then I read that excel document into R and removed the temp file. Last two rows also deleted from the dataset since it consists of total sales info which is unnecessary. Then I showed first 10 rows of the dataset.
tmp<-tempfile(fileext=".xlsx")
download.file("https://github.com/MEF-BDA503/pj18-mkerimacar/blob/master/odd_retail_sales_2016_01.xlsx?raw=true",destfile=tmp)
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
file.remove(tmp)
## [1] TRUE
raw_data <- raw_data %>% slice(-c(49,50))
head(raw_data,10)
## # A tibble: 10 x 10
## X__1 X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 X__10
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO NA 12 12 NA NA 0 0 12 12
## 2 "ASTON MART\u013~ NA 2 2 NA NA 0 0 2 2
## 3 AUDI NA 911 911 NA NA 0 0 911 911
## 4 BENTLEY NA NA 0 NA NA 0 0 0 0
## 5 BMW NA 496 496 NA NA 0 0 496 496
## 6 CHERY NA 30 30 NA NA 0 0 30 30
## 7 CITROEN NA 394 394 41 207 248 41 601 642
## 8 DACIA NA 1235 1235 NA 221 221 0 1456 1456
## 9 DS NA 8 8 NA NA 0 0 8 8
## 10 FERRARI NA 3 3 NA NA 0 0 3 3
In order to make the data standardized and workable I renamed column names and removed NA values from the dataset.
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
car_data_jan_16 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2016,month=1)
print(car_data_jan_16,width=Inf)
## # A tibble: 48 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 12 12 0 0
## 2 "ASTON MART\u0130N" 0 2 2 0 0
## 3 AUDI 0 911 911 0 0
## 4 BENTLEY 0 0 0 0 0
## 5 BMW 0 496 496 0 0
## 6 CHERY 0 30 30 0 0
## 7 CITROEN 0 394 394 41 207
## 8 DACIA 0 1235 1235 0 221
## 9 DS 0 8 8 0 0
## 10 FERRARI 0 3 3 0 0
## comm_total total_dom total_imp total_total year month
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 0 12 12 2016 1
## 2 0 0 2 2 2016 1
## 3 0 0 911 911 2016 1
## 4 0 0 0 0 2016 1
## 5 0 0 496 496 2016 1
## 6 0 0 30 30 2016 1
## 7 248 41 601 642 2016 1
## 8 221 0 1456 1456 2016 1
## 9 0 0 8 8 2016 1
## 10 0 0 3 3 2016 1
## # ... with 38 more rows
I wanted to see relative frequencies of total sales and I created a column named as total_relfreqs next to total sales (rounded to 3 decimals).
car_data_jan_16 %>%
filter(auto_total > 0 & comm_total > 0) %>%
select(brand_name,total_total) %>%
arrange(desc(total_total)) %>%
mutate(total_relfreqs = (total_total/sum(total_total))) %>%
mutate(total_relfreqs = sprintf("%0.3f", total_relfreqs))
## # A tibble: 13 x 3
## brand_name total_total total_relfreqs
## <chr> <dbl> <chr>
## 1 VOLKSWAGEN 4528 0.175
## 2 RENAULT 4519 0.174
## 3 FIAT 3843 0.148
## 4 FORD 3770 0.145
## 5 HYUNDAI 2301 0.089
## 6 TOYOTA 1481 0.057
## 7 DACIA 1456 0.056
## 8 MERCEDES-BENZ 1451 0.056
## 9 PEUGEOT 1111 0.043
## 10 CITROEN 642 0.025
## 11 KIA 418 0.016
## 12 MITSUBISHI 388 0.015
## 13 SSANGYONG 22 0.001