I looked for a data set online and found the Human Resources data set on Kaggle. I thought I might find some interesting relationships in the working conditions it describes. I downloaded the CSV file and imported the data into RMarkdown using the read.csv function.
getwd()
## [1] "C:/Users/Lenovo/Desktop"
HumanResources <- read.csv("HR_comma_sep.csv")
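One caveat about the import, as a minimal sketch: since R 4.0.0, read.csv no longer converts text columns to factors by default, so the per-level counts that summary prints for the sales and salary columns need factor columns, for example via stringsAsFactors = TRUE. The small file below is made-up toy data, written to a temporary file only so the example runs without the Kaggle download; the name toy_path is hypothetical.

```r
# Toy stand-in for HR_comma_sep.csv (hypothetical values, not the real data)
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "satisfaction_level,number_project,sales,salary",
  "0.38,2,sales,low",
  "0.80,5,technical,medium",
  "0.11,7,sales,medium"
), toy_path)

# Stop early with a clear error if the file is not where we expect it
stopifnot(file.exists(toy_path))

# stringsAsFactors = TRUE turns text columns into factors,
# so summary() reports a count per level instead of "Length/Class/Mode"
toy <- read.csv(toy_path, stringsAsFactors = TRUE)
summary(toy$salary)
```

For the real file, the same argument can be added to the read.csv("HR_comma_sep.csv") call.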
I wanted to see a summary of the data. It shows the column names and some simple statistics for each numeric column: the minimum, the quartiles, the median, the mean, and the maximum, from which the range and the interquartile range follow.
summary(HumanResources)
## satisfaction_level last_evaluation number_project average_montly_hours
## Min. :0.0900 Min. :0.3600 Min. :2.000 Min. : 96.0
## 1st Qu.:0.4400 1st Qu.:0.5600 1st Qu.:3.000 1st Qu.:156.0
## Median :0.6400 Median :0.7200 Median :4.000 Median :200.0
## Mean :0.6128 Mean :0.7161 Mean :3.803 Mean :201.1
## 3rd Qu.:0.8200 3rd Qu.:0.8700 3rd Qu.:5.000 3rd Qu.:245.0
## Max. :1.0000 Max. :1.0000 Max. :7.000 Max. :310.0
##
## time_spend_company Work_accident left
## Min. : 2.000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 3.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 3.000 Median :0.0000 Median :0.0000
## Mean : 3.498 Mean :0.1446 Mean :0.2381
## 3rd Qu.: 4.000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :10.000 Max. :1.0000 Max. :1.0000
##
## promotion_last_5years sales salary
## Min. :0.00000 sales :4140 high :1237
## 1st Qu.:0.00000 technical :2720 low :7316
## Median :0.00000 support :2229 medium:6446
## Mean :0.02127 IT :1227
## 3rd Qu.:0.00000 product_mng: 902
## Max. :1.00000 marketing : 858
## (Other) :2923
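The range and interquartile range are not printed directly, but they follow from the summary: for satisfaction_level, for example, the interquartile range is 0.82 - 0.44 = 0.38 and the range runs from 0.09 to 1.00. As a small sketch, the base R functions below compute these values directly; the vector x is made up for illustration and is not the real column.

```r
# Hypothetical satisfaction scores, for illustration only
x <- c(0.09, 0.44, 0.64, 0.82, 1.00)

range(x)                    # smallest and largest value
diff(range(x))              # width of the range
quantile(x, c(0.25, 0.75))  # first and third quartile
IQR(x)                      # third quartile minus first quartile
```

On the real data, x would be a column such as HumanResources$satisfaction_level.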
str(HumanResources) # pass the data frame itself, not its name in quotes
names(HumanResources)
Next I loaded ggplot2 to make some simple plots. I wanted to see how the number of projects per worker is distributed.
library(ggplot2)
qplot(x=number_project, data=HumanResources)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
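The stat_bin() message appears because qplot draws a histogram with 30 bins when it is given a single numeric variable. Since number_project only takes a few whole-number values (2 to 7 in the summary), one alternative is a bar chart with one bar per value. This is only a sketch on made-up data; the data frame toy is hypothetical and stands in for HumanResources.

```r
library(ggplot2)

# Hypothetical project counts, standing in for HumanResources$number_project
toy <- data.frame(number_project = c(2, 3, 3, 4, 4, 4, 5))

# factor() makes the x axis discrete, so geom_bar() draws one bar per value
p <- ggplot(toy, aes(x = factor(number_project))) +
  geom_bar() +
  labs(x = "Number of projects", y = "Number of workers")
print(p)
```

Keeping qplot and adding binwidth = 1 would be another way to avoid the message.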
Next I wanted to check the monthly hours that the workers work. I set the bin width to 10.
qplot(x=average_montly_hours, data=HumanResources, binwidth=10)
That made me wonder whether there is a relationship between the average monthly hours and the number of projects a worker takes on. I used the facet_wrap function to draw one histogram per project count. Judging from the charts below, workers who take on more projects tend to work more hours.
qplot(x=average_montly_hours, data=HumanResources, binwidth=10)+
facet_wrap(~number_project)
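The faceted histograms give a visual impression; one way to back it up with numbers is to compare the mean monthly hours per project count and to look at the correlation. The sketch below uses made-up rows (the data frame toy and the name hours_by_projects are hypothetical), so it only shows the mechanics and says nothing about the real result.

```r
# Hypothetical rows, standing in for the two HumanResources columns
toy <- data.frame(
  number_project       = c(2, 2, 3, 3, 4, 4, 5, 5),
  average_montly_hours = c(140, 150, 180, 200, 210, 190, 240, 260)
)

# Mean monthly hours for each project count
hours_by_projects <- aggregate(average_montly_hours ~ number_project,
                               data = toy, FUN = mean)
print(hours_by_projects)

# Single-number summary of the linear association (between -1 and 1)
cor(toy$number_project, toy$average_montly_hours)
```

On the real data the same calls would use data = HumanResources; whether the means rise steadily with the project count is something to check there.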