Serhan SÜER
First we find the data on Otomotiv Distibütörleri Derneği website. We are interested in January 2018 sales. We download the data change the name to odd_retail_sales_2018_01.xlsx.
Our raw excel file is in our repository. We can automatically download that file and put it in a temporary file. Then we can read that excel document into R and remove the temp file.
# Create a temporary file
tmp<-tempfile(fileext=".xlsx")
# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/pj18-SerhanSuer/blob/master/odd_retail_sales_2018_01.xlsx?raw=true",destfile=tmp)
# Read that excel file using readxl package's read_excel function. You might need to adjust the parameters (skip, col_names) according to your raw file's format.
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
# Remove the last two rows because they are irrelevant (total and empty rows)
raw_data <- raw_data %>% slice(-c(43,44))
# Let's see our raw data
head(raw_data)
## # A tibble: 6 x 10
## X__1 X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 X__10
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO NA 5 5 NA NA 0 0 5 5
## 2 ASTON MARTIN NA 1 1 NA NA 0 0 1 1
## 3 AUDI NA 1013 1013 NA NA 0 0 1013 1013
## 4 BENTLEY NA 0 0 NA NA 0 0 0 0
## 5 BMW NA 626 626 NA NA 0 0 626 626
## 6 CHERY NA NA 0 NA NA 0 0 0 0
In order to make the data standardized and workable we need to define column names and remove NA values for this example.
# Use the same column names in your data.
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now we replace NA values with 0 and label the time period with year and month, so when we merge the data we won't be confused.
car_data_jan_18 <- raw_data %>%
mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>%
mutate(year=2018,month=1)
print(car_data_jan_18,width=Inf)
## # A tibble: 43 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 5 5 0 0 0
## 2 ASTON MARTIN 0 1 1 0 0 0
## 3 AUDI 0 1013 1013 0 0 0
## 4 BENTLEY 0 0 0 0 0 0
## 5 BMW 0 626 626 0 0 0
## 6 CHERY 0 0 0 0 0 0
## 7 CITROEN 0 407 407 0 94 94
## 8 DACIA 0 937 937 0 254 254
## 9 DS 0 0 0 0 0 0
## 10 FERRARI 0 1 1 0 0 0
## total_dom total_imp total_total year month
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 5 5 2018 1
## 2 0 1 1 2018 1
## 3 0 1013 1013 2018 1
## 4 0 0 0 2018 1
## 5 0 626 626 2018 1
## 6 0 0 0 2018 1
## 7 0 501 501 2018 1
## 8 0 1191 1191 2018 1
## 9 0 0 0 2018 1
## 10 0 1 1 2018 1
## # ... with 33 more rows
I wanted to make four analysis related to this dataset.
car_data_jan_18 %>%
filter(total_imp == 0 & total_total != 0) %>%
select(brand_name,total_total) %>%
arrange(desc(total_total))
## # A tibble: 1 x 2
## brand_name total_total
## <chr> <dbl>
## 1 KARSAN 49
car_data_jan_18 %>%
filter(total_dom == 0 & total_total != 0) %>%
select(brand_name,total_total) %>%
arrange(desc(total_total))
## # A tibble: 28 x 2
## brand_name total_total
## <chr> <dbl>
## 1 VOLKSWAGEN 4127
## 2 MERCEDES-BENZ 1670
## 3 SKODA 1564
## 4 OPEL 1327
## 5 DACIA 1191
## 6 AUDI 1013
## 7 PEUGEOT 987
## 8 NISSAN 862
## 9 SEAT 663
## 10 BMW 626
## # ... with 18 more rows
car_data_jan_18 %>%
filter(comm_total == 0 & total_total != 0) %>%
select(brand_name,total_total) %>%
arrange(desc(total_total))
## # A tibble: 20 x 2
## brand_name total_total
## <chr> <dbl>
## 1 SKODA 1564
## 2 HONDA 1398
## 3 OPEL 1327
## 4 AUDI 1013
## 5 SEAT 663
## 6 BMW 626
## 7 JEEP 101
## 8 SUBARU 86
## 9 LAND ROVER 84
## 10 SUZUKI 67
## 11 MAZDA 62
## 12 MINI 55
## 13 PORSCHE 43
## 14 JAGUAR 12
## 15 ALFA ROMEO 5
## 16 MASERATI 4
## 17 INFINITI 3
## 18 LEXUS 2
## 19 ASTON MARTIN 1
## 20 FERRARI 1
car_data_jan_18 %>%
filter(auto_total == 0 & total_total != 0) %>%
select(brand_name,total_total) %>%
arrange(desc(total_total))
## # A tibble: 3 x 2
## brand_name total_total
## <chr> <dbl>
## 1 IVECO 122
## 2 ISUZU 121
## 3 KARSAN 49