Exploratory Data Analysis on School Life Expectancy of Different Countries by Gender

First of all, I set my working directory.

setwd('C:/Users/Duygu/Documents/GitHub/pj-cand/files')

Then I read the CSV file:

data <- read.csv('SchoolLifeExpectancy.csv')

The dimensions of my data are:

dim(data)
## [1] 1827    7

Let’s have a quick look at the first rows of the data:

head(data, 5)
##   Country.or.Area Subgroup Year                      Source  Unit Value
## 1     Afghanistan   Female 2004 UNESCO_UIS Database_Sep2007 Years     4
## 2     Afghanistan   Female 2003 UNESCO_UIS Database_Sep2007 Years     4
## 3     Afghanistan     Male 2004 UNESCO_UIS Database_Sep2007 Years     9
## 4     Afghanistan     Male 2003 UNESCO_UIS Database_Sep2007 Years     8
## 5         Albania   Female 2004 UNESCO_UIS Database_Sep2007 Years    12
##   Value.Footnotes
## 1               1
## 2               1
## 3               1
## 4               1
## 5               1

And the file ends like this. Note that the last three rows are not observations but a footnote legend:

tail(data, 5)
##      Country.or.Area             Subgroup Year                      Source
## 1823        Zimbabwe                 Male 2001 UNESCO_UIS Database_Sep2007
## 1824        Zimbabwe                 Male 2000 UNESCO_UIS Database_Sep2007
## 1825         fnSeqID             Footnote   NA                            
## 1826               1      UIS estimation.   NA                            
## 1827               2 National Estimation.   NA                            
##       Unit Value Value.Footnotes
## 1823 Years    10               1
## 1824 Years    10               1
## 1825          NA              NA
## 1826          NA              NA
## 1827          NA              NA
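
Since rows 1825 to 1827 hold the footnote legend rather than measurements, one possible next step is to drop them before any analysis. The following is only a minimal sketch of that idea, not necessarily how the analysis continues. The `toy` data frame is a hypothetical stand-in built from the rows printed above (with fewer columns), so the snippet runs without the CSV file.

```r
# Hypothetical stand-in for the tail of `data`, copied from the printed rows
toy <- data.frame(
  Country.or.Area = c("Zimbabwe", "Zimbabwe", "fnSeqID", "1", "2"),
  Subgroup = c("Male", "Male", "Footnote",
               "UIS estimation.", "National Estimation."),
  Year  = c(2001L, 2000L, NA, NA, NA),
  Value = c(10L, 10L, NA, NA, NA),
  stringsAsFactors = TRUE
)

# Keep only the rows whose Subgroup is an actual gender category
clean <- toy[toy$Subgroup %in% c("Female", "Male"), ]

# Remove factor levels that no longer occur (e.g. "Footnote", "1", "2")
clean <- droplevels(clean)

nrow(clean)
levels(clean$Subgroup)
```

Applied to the real `data` object, the same two lines would remove the legend rows and their leftover factor levels.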

Here is the structure of the data. The footnote rows show up here as extra factor levels such as "1", "2" and "Footnote":

str(data)
## 'data.frame':    1827 obs. of  7 variables:
##  $ Country.or.Area: Factor w/ 187 levels "1","2","Afghanistan",..: 3 3 3 3 4 4 4 4 4 4 ...
##  $ Subgroup       : Factor w/ 5 levels "Female","Footnote",..: 1 1 3 3 1 1 1 1 1 1 ...
##  $ Year           : int  2004 2003 2004 2003 2004 2003 2002 2001 2000 1999 ...
##  $ Source         : Factor w/ 2 levels "","UNESCO_UIS Database_Sep2007": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Unit           : Factor w/ 2 levels "","Years": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Value          : int  4 4 9 8 12 11 11 11 11 11 ...
##  $ Value.Footnotes: int  1 1 1 1 1 NA 1 1 1 1 ...
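
The structure shows the text columns stored as factors, with an empty string as one of the levels of `Source` and `Unit`. A possible alternative, sketched here under assumptions, is to read text as plain character vectors and treat empty fields as missing. The inline `csv_text` is a guess at what the raw file might look like (the header spelling in particular is an assumption); only `read.csv` and its `stringsAsFactors` and `na.strings` arguments are standard.

```r
# Assumed layout of the raw file, using values printed above
csv_text <- 'Country or Area,Subgroup,Year,Source,Unit,Value,Value Footnotes
Afghanistan,Female,2004,UNESCO_UIS Database_Sep2007,Years,4,1
Afghanistan,Male,2004,UNESCO_UIS Database_Sep2007,Years,9,1
fnSeqID,Footnote,,,,,
1,UIS estimation.,,,,,'

# Read text as character (not factor) and turn empty fields into NA
raw <- read.csv(text = csv_text,
                stringsAsFactors = FALSE,
                na.strings = "")

str(raw)
```

With the real file, the same arguments could be passed to `read.csv('SchoolLifeExpectancy.csv', ...)`.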

Here is a summary of the data:

summary(data)
##    Country.or.Area                 Subgroup        Year     
##  Aruba     :  14   Female              :912   Min.   :1999  
##  Australia :  14   Footnote            :  1   1st Qu.:2000  
##  Austria   :  14   Male                :912   Median :2002  
##  Azerbaijan:  14   National Estimation.:  1   Mean   :2002  
##  Bahamas   :  14   UIS estimation.     :  1   3rd Qu.:2004  
##  Belarus   :  14                              Max.   :2005  
##  (Other)   :1743                              NA's   :3     
##                          Source        Unit          Value     
##                             :   3        :   3   Min.   : 2.0  
##  UNESCO_UIS Database_Sep2007:1824   Years:1824   1st Qu.:11.0  
##                                                  Median :12.5  
##                                                  Mean   :12.3  
##                                                  3rd Qu.:14.0  
##                                                  Max.   :21.0  
##                                                  NA's   :3     
##  Value.Footnotes
##  Min.   :1.000  
##  1st Qu.:1.000  
##  Median :1.000  
##  Mean   :1.068  
##  3rd Qu.:1.000  
##  Max.   :2.000  
##  NA's   :503
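
The summary pools both genders together, so it does not yet say anything about the difference between them. A per-gender summary could be computed with base R's `aggregate`, as in the sketch below. The `toy` data frame is a hypothetical stand-in that contains only the five rows printed by `head(data, 5)`, so the resulting means illustrate the call and are not results for the full dataset.

```r
# Hypothetical stand-in: the five rows shown by head(data, 5)
toy <- data.frame(
  Country.or.Area = c("Afghanistan", "Afghanistan", "Afghanistan",
                      "Afghanistan", "Albania"),
  Subgroup = c("Female", "Female", "Male", "Male", "Female"),
  Year     = c(2004L, 2003L, 2004L, 2003L, 2004L),
  Value    = c(4L, 4L, 9L, 8L, 12L)
)

# Mean school life expectancy (in years) per gender
by_gender <- aggregate(Value ~ Subgroup, data = toy, FUN = mean)
by_gender
```

On the real data, the footnote rows would have to be excluded first, since their `Subgroup` is not a gender.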

That’s all for now.