First we find the data on Otomotiv Distibütörleri Derneği website. We are interested in September 2018 sales. We download the data change the name to odd_retail_sales_2018_09.xlsx
(change yours accordingly). We will make a reproducible example of data analysis from the raw data located somewhere to the final analysis.
Our raw excel file is in our repository. We can automatically download that file and put it in a temporary file. Then we can read that excel document into R and remove the temp file.
# Create a temporary file
tmp<-tempfile(fileext=".xlsx")
# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/mef-bda503.github.io/blob/master/files/odd_retail_sales_2018_09.xlsx?raw=true",destfile=tmp)
# Read that excel file using readxl package's read_excel function. You might need to adjust the parameters (skip, col_names) according to your raw file's format.
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
# Remove the last two rows because they are irrelevant (total and empty rows)
raw_data <- raw_data %>% slice(-c(43,44))
# Let's see our raw data
head(raw_data)
## # A tibble: 6 x 10
## X__1 X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 X__10
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO NA 13 13 NA NA 0 0 13 13
## 2 ASTON MARTIN NA 2 2 NA NA 0 0 2 2
## 3 AUDI NA 350 350 NA NA 0 0 350 350
## 4 BENTLEY NA 0 0 NA NA 0 0 0 0
## 5 BMW NA 158 158 NA NA 0 0 158 158
## 6 CITROEN NA 134 134 NA 197 197 0 331 331
It’s ok but needs some work.
In order to make the data standardized and workable we need to define column names and remove NA values for this example. Please use the same column names in your examples also.
# Use the same column names in your data.
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now we replace NA values with 0 and label the time period with year and month, so when we merge the data we won't be confused.
car_data_sep_18 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2018,month=9)
print(car_data_sep_18,width=Inf)
## # A tibble: 42 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 13 13 0 0 0
## 2 ASTON MARTIN 0 2 2 0 0 0
## 3 AUDI 0 350 350 0 0 0
## 4 BENTLEY 0 0 0 0 0 0
## 5 BMW 0 158 158 0 0 0
## 6 CITROEN 0 134 134 0 197 197
## 7 DACIA 0 1141 1141 0 319 319
## 8 DS 0 9 9 0 0 0
## 9 FERRARI 0 3 3 0 0 0
## 10 FIAT 632 57 689 789 199 988
## total_dom total_imp total_total year month
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 13 13 2018 9
## 2 0 2 2 2018 9
## 3 0 350 350 2018 9
## 4 0 0 0 2018 9
## 5 0 158 158 2018 9
## 6 0 331 331 2018 9
## 7 0 1460 1460 2018 9
## 8 0 9 9 2018 9
## 9 0 3 3 2018 9
## 10 1421 256 1677 2018 9
## # ... with 32 more rows
One of the best methods is to save your data to an RDS
or RData
file. The difference is RDS can hold only one object but RData can hold many. Since we have only one data frame here we will go with RDS.
saveRDS(car_data_sep_18,file="~/YOUR_OWN_PATH/odd_car_sales_data_sep_18.rds")
# You can read that file by readRDS and assigning to an object
# e.g
# rds_data <- readRDS("~/YOUR_OWN_PATH/odd_car_sales_data_sep_18.rds")
You are free to make any analysis here. I wanted to see a list of total sales of brands with both automobile and commercial vehicle sales ordered in decreasing total sales.
car_data_sep_18 %>%
filter(auto_total > 0 & comm_total > 0) %>%
select(brand_name,total_total) %>%
arrange(desc(total_total))
## # A tibble: 13 x 2
## brand_name total_total
## <chr> <dbl>
## 1 RENAULT 3186
## 2 FORD 2356
## 3 VOLKSWAGEN 2239
## 4 FIAT 1677
## 5 HYUNDAI 1535
## 6 DACIA 1460
## 7 NISSAN 1217
## 8 MERCEDES-BENZ 1163
## 9 TOYOTA 1127
## 10 PEUGEOT 717
## 11 CITROEN 331
## 12 KIA 311
## 13 MITSUBISHI 213
?functionname
command.