First we find the data on Otomotiv Distibütörleri Derneği website. We are interested in September 2018 sales. We download the data change the name to odd_retail_sales_2016_03.xlsx
(change yours accordingly). We will make a reproducible example of data analysis from the raw data located somewhere to the final analysis.
Our raw excel file is in our repository. We can automatically download that file and put it in a temporary file. Then we can read that excel document into R and remove the temp file.
# Create a temporary file
tmp<-tempfile(fileext=".xlsx")
# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/pj18-mrtgocer/blob/master/files/odd_retail_sales_2016_03.xlsx?raw=true",mode = 'wb',destfile=tmp)
# Read that excel file using readxl package's read_excel function. You might need to adjust the parameters (skip, col_names) according to your raw file's format.
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
# Remove the last two rows because they are irrelevant (total and empty rows)
raw_data <- raw_data %>% slice(-c(43,44))
# Let's see our raw data
head(raw_data)
## # A tibble: 6 x 10
## X__1 X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 X__10
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO NA 149 149 NA NA 0 0 149 149
## 2 ASTON MARTIN NA 2 2 NA NA 0 0 2 2
## 3 AUDI NA 1455 1455 NA NA 0 0 1455 1455
## 4 BENTLEY NA NA 0 NA NA 0 0 0 0
## 5 BMW NA 2080 2080 NA NA 0 0 2080 2080
## 6 CHERY NA 12 12 NA NA 0 0 12 12
It’s ok but needs some work.
In order to make the data standardized and workable we need to define column names and remove NA values for this example. Please use the same column names in your examples also.
# Use the same column names in your data.
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now we replace NA values with 0 and label the time period with year and month, so when we merge the data we won't be confused.
car_data_march_16 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2016,month=3)
print(car_data_march_16,width=Inf)
## # A tibble: 48 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 149 149 0 0 0
## 2 ASTON MARTIN 0 2 2 0 0 0
## 3 AUDI 0 1455 1455 0 0 0
## 4 BENTLEY 0 0 0 0 0 0
## 5 BMW 0 2080 2080 0 0 0
## 6 CHERY 0 12 12 0 0 0
## 7 CITROEN 0 1562 1562 127 752 879
## 8 DACIA 0 3402 3402 0 380 380
## 9 DS 0 42 42 0 0 0
## 10 FERRARI 0 0 0 0 0 0
## total_dom total_imp total_total year month
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 149 149 2016 3
## 2 0 2 2 2016 3
## 3 0 1455 1455 2016 3
## 4 0 0 0 2016 3
## 5 0 2080 2080 2016 3
## 6 0 12 12 2016 3
## 7 127 2314 2441 2016 3
## 8 0 3782 3782 2016 3
## 9 0 42 42 2016 3
## 10 0 0 0 2016 3
## # ... with 38 more rows
One of the best methods is to save your data to an RDS
or RData
file. The difference is RDS can hold only one object but RData can hold many. Since we have only one data frame here we will go with RDS.
saveRDS(car_data_march_16,file="/home/murat/Masaüstü/pj18-mrtgocer/files/odd_car_sales_data_march_16.rds")
# You can read that file by readRDS and assigning to an object
# e.g
# rds_data <- readRDS("~/YOUR_OWN_PATH/odd_car_sales_data_march_16.rds")
You are free to make any analysis here. I wanted to see a list of total sales of brands with both automobile and commercial vehicle sales ordered in decreasing total sales.
car_data_march_16 %>%
filter(auto_total > 0 & comm_total > 0) %>%
select(brand_name,total_total) %>%
arrange(desc(total_total))
## # A tibble: 15 x 2
## brand_name total_total
## <chr> <dbl>
## 1 TOPLAM: 82948
## 2 RENAULT 11293
## 3 VOLKSWAGEN 11293
## 4 FORD 9758
## 5 FIAT 8400
## 6 HYUNDAI 4567
## 7 DACIA 3782
## 8 TOYOTA 3630
## 9 MERCEDES-BENZ 3405
## 10 NISSAN 3164
## 11 PEUGEOT 3159
## 12 CITROEN 2441
## 13 KIA 1467
## 14 MITSUBISHI 436
## 15 SSANGYONG 53