The data for this post is provided from the web page of Otomotiv Distribütörleri Dernegi. The study is based on the August 2018 car sales. The data set is renamed as odd_retail_sales_2018_08.xlsx
.
The raw excel file is in my repositoy.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
# Create a temporary file
tmp<-tempfile(fileext=".xlsx")
# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/pj18-orkunberkyuzbasioglu/raw/master/odd_retail_sales_2018_08.xlsx",destfile=tmp,mode="wb")
# Read that excel file using readxl package's read_excel function. You might need to adjust the parameters (skip, col_names) according to your raw file's format.
raw_data<-readxl::read_excel(tmp,skip=7,col_names=FALSE)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
# Remove the last two rows because they are irrelevant (total and empty rows)
raw_data <- raw_data %>% slice(-c(43,44))
# Let's see our raw data
head(raw_data)
## # A tibble: 6 x 10
## X__1 X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 X__10
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO NA 15 15 NA NA 0 0 15 15
## 2 ASTON MARTIN NA 1 1 NA NA 0 0 1 1
## 3 AUDI NA 737 737 NA NA 0 0 737 737
## 4 BENTLEY NA 1 1 NA NA 0 0 1 1
## 5 BMW NA 1005 1005 NA NA 0 0 1005 1005
## 6 CITROEN NA 495 495 NA 198 198 0 693 693
This is not completely OK.
##Make Data Civilized
In order to make the data standardized and workable we need to define column names and remove NA values for this example. I used the same column names in my examples as the example.
# Same column names for my data.
colnames(raw_data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")
# Now we replace NA values with 0 and label the time period with year and month, so when we merge the data we won't be confused.
car_data_sep_18 <- raw_data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2018,month=8)
print(car_data_sep_18,width=Inf)
## # A tibble: 42 x 12
## brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALFA ROMEO 0 15 15 0 0 0
## 2 ASTON MARTIN 0 1 1 0 0 0
## 3 AUDI 0 737 737 0 0 0
## 4 BENTLEY 0 1 1 0 0 0
## 5 BMW 0 1005 1005 0 0 0
## 6 CITROEN 0 495 495 0 198 198
## 7 DACIA 0 1220 1220 0 467 467
## 8 DS 0 64 64 0 0 0
## 9 FERRARI 0 2 2 0 0 0
## 10 FIAT 1602 79 1681 1090 236 1326
## total_dom total_imp total_total year month
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 15 15 2018 8
## 2 0 1 1 2018 8
## 3 0 737 737 2018 8
## 4 0 1 1 2018 8
## 5 0 1005 1005 2018 8
## 6 0 693 693 2018 8
## 7 0 1687 1687 2018 8
## 8 0 64 64 2018 8
## 9 0 2 2 2018 8
## 10 2692 315 3007 2018 8
## # ... with 32 more rows
##Save Your Civilized Data The data was saved to an RDS
object.
saveRDS(car_data_sep_18,file="C:/Users/orkun/OneDrive/Desktop/odd_car_sales_data_aug_18.rds")
# You can read that file by readRDS and assigning to an object
# e.g
# rds_data <- readRDS("~/YOUR_OWN_PATH/odd_car_sales_data_sep_18.rds")
#Finish With Some Analysis
car_data_sep_18 %>%
filter(auto_total > 0 & comm_total > 0) %>%
select(brand_name,total_total) %>%
arrange(desc(total_total))
## # A tibble: 14 x 2
## brand_name total_total
## <chr> <dbl>
## 1 RENAULT 4102
## 2 FORD 3709
## 3 VOLKSWAGEN 3585
## 4 FIAT 3007
## 5 TOYOTA 2274
## 6 DACIA 1687
## 7 PEUGEOT 1629
## 8 MERCEDES-BENZ 1427
## 9 NISSAN 1381
## 10 HYUNDAI 1301
## 11 CITROEN 693
## 12 KIA 500
## 13 MITSUBISHI 400
## 14 SSANGYONG 34