Loading Library

First I need to open xlsx file. For this, I need xlsx library. In order to install the this library, I simply use “library(xlsx)” library function code for after usage of this.

library(xlsx)

“read.xlsx()” functions provides to upload data to R platform. By this, I can easily manuplate,change and play with data. With specific second code “1”, I manage the spreadsheet number which is needed for this function requirement.

data <- read.xlsx("automative.xlsx", 1 )

After that cleaning data is a must. For this reason, I must select and delete some columns which have NA values in it. with -c() structure, I include the some rows in data[] body and delete the rows which I don’t want.

data <- data[-c(1:5,51,52),]
head(data)
##             NA. NA..1 NA..2 NA..3 NA..4 NA..5 NA..6 NA..7 NA..8 NA..9
## 6    ALFA ROMEO  <NA>    13    13  <NA>  <NA>     0     0    13    13
## 7  ASTON MARTIN  <NA>     1     1  <NA>  <NA>     0     0     1     1
## 8          AUDI  <NA>  1552  1552  <NA>  <NA>     0     0  1552  1552
## 9       BENTLEY  <NA>     0     0  <NA>  <NA>     0     0     0     0
## 10          BMW  <NA>  1586  1586  <NA>  <NA>     0     0  1586  1586
## 11        CHERY  <NA>  <NA>     0  <NA>  <NA>     0     0     0     0

Column names are mess. For that, I need to change the column names with reasonable types of names. with a “colnames()” function of R, I can manage and put as variable the names of columns. After that, with a vector, I simply put the required names to this data type.

colnames(data) <- c("brand_name","auto_dom","auto_imp","auto_total","comm_dom","comm_imp","comm_total","total_dom","total_imp","total_total")

Here comes the tidyverse library. Tidyverse library is so intense and powerful in R. So that, I can create chains of functions with %>% pipe operator to mutate some columns as well as I can filter some data in data set.

library(tidyverse)
## -- Attaching packages --------------------- tidyverse 1.2.1 --
## <U+221A> ggplot2 3.0.0     <U+221A> purrr   0.2.5
## <U+221A> tibble  1.4.2     <U+221A> dplyr   0.7.6
## <U+221A> tidyr   0.8.1     <U+221A> stringr 1.3.1
## <U+221A> readr   1.1.1     <U+221A> forcats 0.3.0
## -- Conflicts ------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
car_data_march_18 <- data %>% mutate_if(is.numeric,funs(ifelse(is.na(.),0,.))) %>% mutate(year=2018,month=3)
print(car_data_march_18,width=Inf)
##       brand_name auto_dom auto_imp auto_total comm_dom comm_imp comm_total
## 1     ALFA ROMEO     <NA>       13         13     <NA>     <NA>          0
## 2   ASTON MARTIN     <NA>        1          1     <NA>     <NA>          0
## 3           AUDI     <NA>     1552       1552     <NA>     <NA>          0
## 4        BENTLEY     <NA>        0          0     <NA>     <NA>          0
## 5            BMW     <NA>     1586       1586     <NA>     <NA>          0
## 6          CHERY     <NA>     <NA>          0     <NA>     <NA>          0
## 7        CITROEN     <NA>     1086       1086     <NA>      706        706
## 8          DACIA     <NA>     3600       3600     <NA>      335        335
## 9             DS     <NA>       81         81     <NA>     <NA>          0
## 10       FERRARI     <NA>        2          2     <NA>     <NA>          0
## 11          FIAT     4496      142       4638     3392      401       3793
## 12          FORD      412     2638       3050     5217      251       5468
## 13         HONDA     2710      469       3179     <NA>     <NA>          0
## 14       HYUNDAI     2013     2376       4389     <NA>       28         28
## 15      INFINITI     <NA>        2          2     <NA>     <NA>          0
## 16         ISUZU     <NA>     <NA>          0      123       19        142
## 17         IVECO     <NA>     <NA>          0     <NA>      161        161
## 18        JAGUAR     <NA>       24         24     <NA>     <NA>          0
## 19          JEEP     <NA>      255        255     <NA>     <NA>          0
## 20        KARSAN     <NA>     <NA>          0      100     <NA>        100
## 21           KIA     <NA>      872        872     <NA>      207        207
## 22   LAMBORGHINI     <NA>        0          0     <NA>     <NA>          0
## 23    LAND ROVER     <NA>      162        162     <NA>     <NA>          0
## 24         LEXUS     <NA>        8          8     <NA>     <NA>          0
## 25      MASERATI     <NA>        6          6     <NA>     <NA>          0
## 26         MAZDA     <NA>      135        135     <NA>     <NA>          0
## 27 MERCEDES-BENZ     <NA>     2031       2031     <NA>     1121       1121
## 28          MINI     <NA>      116        116     <NA>     <NA>          0
## 29    MITSUBISHI     <NA>       21         21       24      403        427
## 30        NISSAN     <NA>     2908       2908     <NA>      197        197
## 31          OPEL     <NA>     2989       2989     <NA>     <NA>          0
## 32       PEUGEOT     <NA>     3038       3038     <NA>      485        485
## 33       PORSCHE     <NA>       61         61     <NA>     <NA>          0
## 34       RENAULT     7127     1821       8948     <NA>     1087       1087
## 35          SEAT     <NA>     1226       1226     <NA>     <NA>          0
## 36         SKODA     <NA>     2911       2911     <NA>     <NA>          0
## 37         SMART     <NA>        4          4     <NA>     <NA>          0
## 38     SSANGYONG     <NA>       38         38     <NA>        6          6
## 39        SUBARU     <NA>      224        224     <NA>     <NA>          0
## 40        SUZUKI     <NA>      323        323     <NA>     <NA>          0
## 41        TOYOTA     2961      294       3255     <NA>      310        310
## 42    VOLKSWAGEN     <NA>     6561       6561     <NA>     1974       1974
## 43         VOLVO     <NA>      503        503     <NA>     <NA>          0
## 44          <NA>     <NA>     <NA>       <NA>     <NA>     <NA>       <NA>
## 45       TOPLAM:    19719    40079      59798     8856     7691      16547
##    total_dom total_imp total_total year month
## 1          0        13          13 2018     3
## 2          0         1           1 2018     3
## 3          0      1552        1552 2018     3
## 4          0         0           0 2018     3
## 5          0      1586        1586 2018     3
## 6          0         0           0 2018     3
## 7          0      1792        1792 2018     3
## 8          0      3935        3935 2018     3
## 9          0        81          81 2018     3
## 10         0         2           2 2018     3
## 11      7888       543        8431 2018     3
## 12      5629      2889        8518 2018     3
## 13      2710       469        3179 2018     3
## 14      2013      2404        4417 2018     3
## 15         0         2           2 2018     3
## 16       123        19         142 2018     3
## 17         0       161         161 2018     3
## 18         0        24          24 2018     3
## 19         0       255         255 2018     3
## 20       100         0         100 2018     3
## 21         0      1079        1079 2018     3
## 22         0         0           0 2018     3
## 23         0       162         162 2018     3
## 24         0         8           8 2018     3
## 25         0         6           6 2018     3
## 26         0       135         135 2018     3
## 27         0      3152        3152 2018     3
## 28         0       116         116 2018     3
## 29        24       424         448 2018     3
## 30         0      3105        3105 2018     3
## 31         0      2989        2989 2018     3
## 32         0      3523        3523 2018     3
## 33         0        61          61 2018     3
## 34      7127      2908       10035 2018     3
## 35         0      1226        1226 2018     3
## 36         0      2911        2911 2018     3
## 37         0         4           4 2018     3
## 38         0        44          44 2018     3
## 39         0       224         224 2018     3
## 40         0       323         323 2018     3
## 41      2961       604        3565 2018     3
## 42         0      8535        8535 2018     3
## 43         0       503         503 2018     3
## 44      <NA>      <NA>        <NA> 2018     3
## 45     28575     47770       76345 2018     3

I must save the data in a safe folder to continue my progress. In order to do this, “saveRDS()” function helps me.

saveRDS(car_data_march_18,file="C:\\Users\\Baris\\Desktop\\Big Data Analytics MEF 2018-2019\\TERM1\\Data Anaytics Essential\\odd_car_sales_data_sep_18.rds")

Again with tidyverse I filter, select and arrange the columns of data set. Filter works with boolean operatos and select() just selects the required columns which I described. arrange() simply arranges the data key to described column in it.

car_data_march_18 %>% 
  filter(as.numeric(auto_total) > 0 & as.numeric(comm_total) > 0) %>%
  select(brand_name,total_total) %>%
  arrange(desc(total_total))
##       brand_name total_total
## 1     VOLKSWAGEN        8535
## 2           FORD        8518
## 3           FIAT        8431
## 4             DS          81
## 5          LEXUS           8
## 6        TOPLAM:       76345
## 7        PORSCHE          61
## 8       MASERATI           6
## 9          VOLVO         503
## 10    MITSUBISHI         448
## 11       HYUNDAI        4417
## 12     SSANGYONG          44
## 13         SMART           4
## 14         DACIA        3935
## 15        TOYOTA        3565
## 16       PEUGEOT        3523
## 17        SUZUKI         323
## 18         HONDA        3179
## 19 MERCEDES-BENZ        3152
## 20        NISSAN        3105
## 21          OPEL        2989
## 22         SKODA        2911
## 23          JEEP         255
## 24        JAGUAR          24
## 25        SUBARU         224
## 26       FERRARI           2
## 27      INFINITI           2
## 28       CITROEN        1792
## 29    LAND ROVER         162
## 30         IVECO         161
## 31           BMW        1586
## 32          AUDI        1552
## 33         ISUZU         142
## 34         MAZDA         135
## 35    ALFA ROMEO          13
## 36          SEAT        1226
## 37          MINI         116
## 38           KIA        1079
## 39       RENAULT       10035
## 40        KARSAN         100
## 41  ASTON MARTIN           1
## 42       BENTLEY           0
## 43         CHERY           0
## 44   LAMBORGHINI           0

An lastly, I filter and original table with filter, select, mutate and arrange. First I filter numeric data if these variables which I described are positive. After that, I calculate differences of comm_total and auto_total and values which comm_total bigger than auto_total. After I select the mutated values and arrange them from biggest to lowest.

car_data_march_18 %>%
  filter(as.numeric(auto_total) > 0 & as.numeric(comm_total) > 0) %>%
  mutate(CommAutoDif = as.numeric(comm_total) >as.numeric(auto_total),CommAutoDif2 = as.numeric(auto_total) - as.numeric(comm_total)) %>%
   select(brand_name,total_total,CommAutoDif,CommAutoDif2) %>%
  arrange(desc(CommAutoDif2))
##       brand_name total_total CommAutoDif CommAutoDif2
## 1             DS          81       FALSE           35
## 2        RENAULT       10035       FALSE           35
## 3          LEXUS           8       FALSE           34
## 4        PORSCHE          61       FALSE           32
## 5       MASERATI           6       FALSE           31
## 6          VOLVO         503       FALSE           29
## 7            KIA        1079       FALSE           27
## 8          SMART           4       FALSE           26
## 9     VOLKSWAGEN        8535       FALSE           25
## 10       TOPLAM:       76345       FALSE           24
## 11        SUZUKI         323       FALSE           22
## 12         HONDA        3179       FALSE           21
## 13          OPEL        2989       FALSE           18
## 14       HYUNDAI        4417       FALSE           17
## 15         SKODA        2911       FALSE           17
## 16          FIAT        8431       FALSE           15
## 17          JEEP         255       FALSE           15
## 18        JAGUAR          24       FALSE           14
## 19        SUBARU         224       FALSE           13
## 20         DACIA        3935       FALSE           12
## 21        TOYOTA        3565       FALSE           12
## 22       FERRARI           2       FALSE           10
## 23      INFINITI           2       FALSE           10
## 24    LAND ROVER         162       FALSE            9
## 25        NISSAN        3105       FALSE            9
## 26           BMW        1586       FALSE            8
## 27 MERCEDES-BENZ        3152       FALSE            8
## 28     SSANGYONG          44       FALSE            8
## 29          AUDI        1552       FALSE            7
## 30         MAZDA         135       FALSE            6
## 31    ALFA ROMEO          13       FALSE            5
## 32          FORD        8518       FALSE            4
## 33       PEUGEOT        3523       FALSE            4
## 34          SEAT        1226       FALSE            4
## 35          MINI         116       FALSE            3
## 36  ASTON MARTIN           1       FALSE            1
## 37       BENTLEY           0       FALSE            0
## 38         CHERY           0       FALSE            0
## 39   LAMBORGHINI           0       FALSE            0
## 40        KARSAN         100        TRUE           -1
## 41    MITSUBISHI         448        TRUE           -2
## 42         ISUZU         142        TRUE           -4
## 43         IVECO         161        TRUE           -5
## 44       CITROEN        1792        TRUE          -16