December 19, 2017

Packages Employed

library(plyr)
library(scales)
library(tidyverse)
library(ggplot2)
library(ggcorrplot)
library(ggthemes)
library(formattable)
library(htmlwidgets)
library(ggalt)
library(party)
library(rpart)
library(rpart.plot)
library(pROC)

Human Resources Data Set

Human Resources Analytics data from Kaggle: 14,999 rows and 10 columns

## Observations: 14,999
## Variables: 10
## $ satisfaction_level    <dbl> 0.38, 0.80, 0.11, 0.72, 0.37, 0.41, 0.10...
## $ last_evaluation       <dbl> 0.53, 0.86, 0.88, 0.87, 0.52, 0.50, 0.77...
## $ number_project        <int> 2, 5, 7, 5, 2, 2, 6, 5, 5, 2, 2, 6, 4, 2...
## $ average_montly_hours  <int> 157, 262, 272, 223, 159, 153, 247, 259, ...
## $ time_spend_company    <int> 3, 6, 4, 5, 3, 3, 4, 5, 5, 3, 3, 4, 5, 3...
## $ Work_accident         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ left                  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ promotion_last_5years <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ departments           <fctr> sales, sales, sales, sales, sales, sale...
## $ salary                <fctr> low, medium, medium, low, low, low, low...
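
As a rough illustration of how an overview like the one above can be produced, the sketch below rebuilds only the first five rows shown in the listing and inspects them with base R. The name `hr_head` is hypothetical, and the full Kaggle file is not reproduced here.

```r
# First five rows only, copied from the listing above.
# `hr_head` is a hypothetical name used for illustration.
hr_head <- data.frame(
  satisfaction_level    = c(0.38, 0.80, 0.11, 0.72, 0.37),
  last_evaluation       = c(0.53, 0.86, 0.88, 0.87, 0.52),
  number_project        = c(2L, 5L, 7L, 5L, 2L),
  average_montly_hours  = c(157L, 262L, 272L, 223L, 159L),
  time_spend_company    = c(3L, 6L, 4L, 5L, 3L),
  Work_accident         = rep(0L, 5),
  left                  = rep(1L, 5),
  promotion_last_5years = rep(0L, 5),
  departments           = factor(rep("sales", 5)),
  salary                = factor(c("low", "medium", "medium", "low", "low"),
                                 levels = c("low", "medium", "high"))
)

# Compact structure overview (a base-R counterpart of dplyr::glimpse()).
str(hr_head)

# Number of rows and columns.
dim(hr_head)
```

Declaring `salary` with explicit levels keeps the low-medium-high order, which is convenient for later plots and tables.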

Data Description

  • satisfaction_level: Job satisfaction level (0.0 - 1.0)
  • last_evaluation: Last evaluation score (0.0 - 1.0)
  • number_project: Number of projects worked on (yearly basis)
  • average_montly_hours: Average monthly working hours
  • time_spend_company: Time spent in the company (in years)
  • Work_accident: Whether they have had a work accident in the last 2 years (if yes 1, else 0)
  • promotion_last_5years: Whether they have had a promotion in the last 5 years
  • departments: Name of the department
  • salary: Salary (low-medium-high)
  • left: Whether the employee has left (if yes 1, else 0)

Descriptive Statistics of the Numerical Features

##  satisfaction_level last_evaluation  number_project  average_montly_hours
##  Min.   :0.0900     Min.   :0.3600   Min.   :2.000   Min.   : 96.0       
##  1st Qu.:0.4400     1st Qu.:0.5600   1st Qu.:3.000   1st Qu.:156.0       
##  Median :0.6400     Median :0.7200   Median :4.000   Median :200.0       
##  Mean   :0.6128     Mean   :0.7161   Mean   :3.803   Mean   :201.1       
##  3rd Qu.:0.8200     3rd Qu.:0.8700   3rd Qu.:5.000   3rd Qu.:245.0       
##  Max.   :1.0000     Max.   :1.0000   Max.   :7.000   Max.   :310.0       
##  time_spend_company Work_accident         left       
##  Min.   : 2.000     Min.   :0.0000   Min.   :0.0000  
##  1st Qu.: 3.000     1st Qu.:0.0000   1st Qu.:0.0000  
##  Median : 3.000     Median :0.0000   Median :0.0000  
##  Mean   : 3.498     Mean   :0.1446   Mean   :0.2381  
##  3rd Qu.: 4.000     3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :10.000     Max.   :1.0000   Max.   :1.0000  
##  promotion_last_5years
##  Min.   :0.00000      
##  1st Qu.:0.00000      
##  Median :0.00000      
##  Mean   :0.02127      
##  3rd Qu.:0.00000      
##  Max.   :1.00000

Departmentwise Salary Frequencies

Histograms of the Numerical Variables

How Many Left from Which Department?

Department Left Rate by Satisfaction & Salary

Satisfaction vs. Working Hours

Satisfaction vs. Time Spent

Group Characteristics

  • Group 1: Low satisfaction level, long working hours, leaves in the medium term (3-5 years)
  • Group 2: Mediocre satisfaction level, short working hours, leaves early (around 3 years)
  • Group 3: High satisfaction level, long working hours, leaves late (5-6 years)

Correlation Matrix

PCA: Biplot

PCA: Cumulative Variance
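
A minimal sketch of how cumulative explained variance can be computed with `prcomp`. The data are synthetic, drawn from the value ranges in the descriptive statistics, and are not the Kaggle data. Scaling the features is an assumption, since the deck does not state how the PCA was configured.

```r
set.seed(1)

# Synthetic stand-in for the numeric HR features (NOT the Kaggle data).
n <- 200
hours <- runif(n, 96, 310)
toy <- data.frame(
  satisfaction_level   = runif(n, 0.09, 1),
  last_evaluation      = runif(n, 0.36, 1),
  number_project       = round(2 + 5 * (hours - 96) / 214 + rnorm(n, 0, 0.5)),
  average_montly_hours = hours,
  time_spend_company   = sample(2:10, n, replace = TRUE)
)

# Assumption: scale the features so that hours do not dominate the components.
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Share of variance per component and its running total.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)
print(round(cum_var, 3))
```

The last entry of `cum_var` is 1 by construction. A common use of the curve is to pick the smallest number of components that reaches a chosen share of the variance.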

K-Means Clustering

##   satisfaction_level last_evaluation number_project average_montly_hours
## 1          0.6100000       0.8891911       4.936889             243.6409
## 2          0.4159149       0.5306140       2.178723             144.8322
## 3          0.2511361       0.8628964       5.780275             285.0799
##   time_spend_company Work_accident promotion_last_5years
## 1           4.778667    0.05333333          0.0008888889
## 2           3.071733    0.04741641          0.0091185410
## 3           4.262172    0.03870162          0.0037453184
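
A table of centers like the one above can be obtained from `kmeans` with three clusters. The sketch below is an illustration on synthetic data whose group means loosely follow the printed centers. It is not the Kaggle data, and the deck's exact preprocessing is not known.

```r
set.seed(42)

# Synthetic stand-in with three separated groups (NOT the Kaggle data).
# `make_group` is a hypothetical helper used only for this illustration.
make_group <- function(n, sat, hours, years) {
  data.frame(
    satisfaction_level   = rnorm(n, sat, 0.05),
    average_montly_hours = rnorm(n, hours, 10),
    time_spend_company   = rnorm(n, years, 0.3)
  )
}
toy <- rbind(
  make_group(100, 0.61, 244, 4.8),
  make_group(100, 0.42, 145, 3.1),
  make_group(100, 0.25, 285, 4.3)
)

# k = 3 as in the slide; nstart > 1 reduces sensitivity to the random start.
fit <- kmeans(toy, centers = 3, nstart = 20)

# Cluster centers, analogous to the table above.
print(fit$centers)

# Number of observations per cluster.
print(table(fit$cluster))
```

Cluster numbers returned by `kmeans` are arbitrary and can change between runs, so clusters are best identified by their centers rather than by their index.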

Cluster-Group Correspondence

  • Cluster 1: High satisfaction level, long working hours, leaves late (corresponds to Group 3)
  • Cluster 2: Medium satisfaction level, short working hours, leaves early (corresponds to Group 2)
  • Cluster 3: Low satisfaction level, very long working hours, leaves in the medium term (corresponds to Group 1)

Decision Tree Analysis And Prediction

The secondary factor in a satisfied employee's decision to quit is time spent in the company. Employees who have been with the same company for 4.5 to 6.5 years leave their comfort zone and quit because they work long hours (more than 216 hours per month), even though their managers rate their work highly (evaluation score > 0.80). We believe this category corresponds to the patient, hardworking, motivated employees of Cluster 1 (i.e. Group 3).
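
To illustrate how a classification tree can pick up thresholds like those described above, here is a minimal sketch with `rpart`. The data are synthetic, and the label is generated directly from the stated rule (4.5 to 6.5 years, more than 216 hours, evaluation above 0.80). This is not the deck's fitted model.

```r
library(rpart)

set.seed(7)

# Synthetic data (NOT the Kaggle data). The label is generated from the
# rule described in the text, purely to have something to fit.
n <- 3000
toy <- data.frame(
  last_evaluation      = runif(n, 0.36, 1),
  average_montly_hours = runif(n, 96, 310),
  time_spend_company   = sample(2:10, n, replace = TRUE)
)
rule <- toy$time_spend_company > 4.5 & toy$time_spend_company < 6.5 &
  toy$average_montly_hours > 216 & toy$last_evaluation > 0.80
toy$left <- factor(ifelse(rule, "yes", "no"), levels = c("no", "yes"))

# Classification tree on the three predictors.
tree <- rpart(left ~ last_evaluation + average_montly_hours + time_spend_company,
              data = toy, method = "class")

# Training accuracy versus always predicting the majority class.
pred <- predict(tree, toy, type = "class")
accuracy <- mean(pred == toy$left)
baseline <- max(prop.table(table(toy$left)))
print(c(accuracy = accuracy, baseline = baseline))
```

Printing `tree` lists the split points, and `rpart.plot::rpart.plot(tree)` draws them. On real data a held-out test set would be needed for an honest accuracy estimate.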

Conclusion

  • Satisfaction determines resignation: employees with low satisfaction quit.
  • People with VERY LOW satisfaction may work VERY LONG hours and leave in the medium term.
  • People with MEDIOCRE satisfaction DO NOT work long hours and resign quickly.
  • Patient, HIGHLY satisfied, hardworking employees quit last.
  • The promotion and salary system must be improved for the good of the employees.

References