Here are the libraries that we are going to use:
library(tidyverse) #Tidyverse package
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
library(readr) # reading CSV and other delimited text files
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following object is masked from 'package:purrr':
##
## transpose
library(ggplot2) # visualization
library(scales) # visualization
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
library(dplyr) # data manipulation
library(knitr)
UNODC Homicide Statistics provide users with a reference for the largest number of countries and the longest time series on homicide data possible. A variety of national and international sources on homicide have been considered to compile UNODC Homicide Statistics and, in order to present accurate and comparable statistics, data have been selected which conform as much as possible to the definition of intentional homicide used by UNODC for statistical purposes: "unlawful death purposefully inflicted on a person by another person".
All existing data sources on intentional homicides, both at national and international level, stem from either criminal justice or public health systems. In the former case, data are generated by law enforcement or criminal justice authorities in the process of recording and investigating a crime event while, in the latter, data are produced by health authorities certifying the cause of death of an individual. Public health data on homicides were derived from databases on deaths by cause disseminated by WHO, both at central level and through some of its regional offices.
Data included in the dataset correspond to the original value provided by the source of origin, since no statistical procedure or modelling was used to change collected values or to create new or revised figures.
Source: http://data.un.org/Data.aspx?d=UNODC&f=tableCode%3a1
1.1 Set the working directory and load the UN data :
setwd("C:/Users/merye/Desktop")
library(readr)
UNdata_Export_20171026_130851047 <- read_csv("C:/Users/merye/Desktop/UNdata_Export_20171026_130851047.zip")
## Parsed with column specification:
## cols(
## `Country or Area` = col_character(),
## Year = col_integer(),
## Count = col_integer(),
## Rate = col_double(),
## Source = col_character(),
## `Source Type` = col_character()
## )
View(UNdata_Export_20171026_130851047)
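The column specification printed above can also be passed to read_csv explicitly, so the column types do not depend on readr's guessing. The following is a minimal sketch under stated assumptions: the temporary file and the short name `homicides` are hypothetical, and the two data rows are just the first values shown in the glimpse output of this document.

```r
library(readr)

# Write a tiny CSV with the same header as the UN export.
# The temporary file is a stand-in for the real download (assumption).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Country or Area,Year,Count,Rate,Source,Source Type",
  "Afghanistan,2008,712,2.4,WHO,PH",
  "Albania,2010,127,4.0,CTS/Transmonee,CJ"
), tmp)

# Read it with the column types spelled out instead of guessed.
# `homicides` is a hypothetical, shorter name for the data frame.
homicides <- read_csv(
  tmp,
  col_types = cols(
    `Country or Area` = col_character(),
    Year = col_integer(),
    Count = col_integer(),
    Rate = col_double(),
    Source = col_character(),
    `Source Type` = col_character()
  )
)

print(dim(homicides))
```

A shorter object name like this would also make the later filtering and plotting calls easier to read.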
1.2 Check the dimensions of the data :
dim(UNdata_Export_20171026_130851047)
## [1] 1719 6
1.3 Glimpse data :
glimpse(UNdata_Export_20171026_130851047)
## Observations: 1,719
## Variables: 6
## $ `Country or Area` <chr> "Afghanistan", "Albania", "Albania", "Albani...
## $ Year <int> 2008, 2010, 2009, 2008, 2007, 2006, 2005, 20...
## $ Count <int> 712, 127, 85, 93, 105, 87, 131, 119, 144, 23...
## $ Rate <dbl> 2.4, 4.0, 2.7, 2.9, 3.3, 2.8, 4.2, 3.8, 4.6,...
## $ Source <chr> "WHO", "CTS/Transmonee", "CTS/Transmonee", "...
## $ `Source Type` <chr> "PH", "CJ", "CJ", "CJ", "CJ", "CJ", "CJ", "C...
1.4 Summary of data :
summary(UNdata_Export_20171026_130851047)
## Country or Area Year Count Rate
## Length:1719 Min. :1995 Min. : 0 Min. : 0.000
## Class :character 1st Qu.:2000 1st Qu.: 46 1st Qu.: 1.600
## Mode :character Median :2004 Median : 240 Median : 4.200
## Mean :2003 Mean : 2183 Mean : 9.497
## 3rd Qu.:2007 3rd Qu.: 879 3rd Qu.: 11.100
## Max. :2011 Max. :45559 Max. :139.100
## Source Source Type
## Length:1719 Length:1719
## Class :character Class :character
## Mode :character Mode :character
##
##
##
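Since the data come from both criminal justice (CJ) and public health (PH) sources, it may be worth checking how many rows each source type contributes and whether any country-year pair appears more than once. The sketch below uses a small made-up data frame (`un_toy`, hypothetical values) with the same column names as the UN export; the same calls should work on the real data frame.

```r
library(dplyr)

# Hypothetical toy rows with the columns of the UN export (values are made up).
un_toy <- tibble::tibble(
  `Country or Area` = c("A", "A", "B", "B", "C"),
  Year = c(2008L, 2008L, 2008L, 2009L, 2009L),
  Count = c(10L, 12L, 100L, 90L, 5L),
  Rate = c(1.0, 1.2, 5.0, 4.5, 0.5),
  Source = c("WHO", "CTS", "CTS", "CTS", "WHO"),
  `Source Type` = c("PH", "CJ", "CJ", "CJ", "PH")
)

# Number of observations per source type.
source_counts <- un_toy %>% count(`Source Type`)
print(source_counts)

# Country-year pairs that appear in more than one row.
dup_pairs <- un_toy %>%
  count(`Country or Area`, Year) %>%
  filter(n > 1)
print(dup_pairs)
```

If duplicated country-year pairs exist in the real data, stacked bar charts would add their values together, so it can make sense to pick one source type before plotting.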
2.1 Visualize the data with ggplot as a bar chart :
p1 <- ggplot(data = UNdata_Export_20171026_130851047, aes(x=Year, y=Rate))
p1 + geom_bar(stat="identity")
# According to the bar chart, 2008 has the tallest bar (the sum of rates across all countries reporting that year)
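With stat = "identity", geom_bar stacks every row that shares the same Year, so the height of a bar depends on how many rows exist for that year as well as on the rates themselves. A per-year summary can separate the two effects. This is a minimal sketch on made-up values (`un_toy` and `yearly` are hypothetical names), not a result from the UN data.

```r
library(dplyr)

# Hypothetical toy data: three rows in 2008, one row in each other year.
un_toy <- tibble::tibble(
  Year = c(2007L, 2008L, 2008L, 2008L, 2009L),
  Rate = c(4.0, 2.0, 3.0, 1.0, 5.0)
)

# Per year: number of rows, stacked total (what the bar shows), and median.
yearly <- un_toy %>%
  group_by(Year) %>%
  summarise(
    n_rows = n(),
    total_rate = sum(Rate),
    median_rate = median(Rate)
  )
print(yearly)
```

In this toy example the stacked total is highest in 2008 only because that year has the most rows, while its median rate is the lowest of the three years.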
2.2 Visualize the data with geom_point :
p3 <- ggplot(data = UNdata_Export_20171026_130851047, aes(x=Year, y=Count))
p3 + geom_point(stat = "identity")
2.3 Filtered data for Turkey :
# Keep only the rows for Turkey and plot the yearly counts as bars
filter1 <- UNdata_Export_20171026_130851047 %>%
  filter(`Country or Area` == "Turkey") %>%
  ggplot(aes(x = Year, y = Count)) +
  geom_bar(stat = "identity")
filter1
2.4 Filtered data for Luxembourg :
# Keep only the rows for Luxembourg and plot the yearly counts as bars
filter2 <- UNdata_Export_20171026_130851047 %>%
  filter(`Country or Area` == "Luxembourg") %>%
  ggplot(aes(x = Year, y = Count)) +
  geom_bar(stat = "identity")
filter2
2.5 Filtered data for India :
# Keep only the rows for India and plot the yearly rates as bars
filter3 <- UNdata_Export_20171026_130851047 %>%
  filter(`Country or Area` == "India") %>%
  ggplot(aes(x = Year, y = Rate)) +
  geom_bar(stat = "identity")
filter3
2.6 Filtered data for Brazil :
# Keep only the rows for Brazil and plot the yearly counts as bars
filter4 <- UNdata_Export_20171026_130851047 %>%
  filter(`Country or Area` == "Brazil") %>%
  ggplot(aes(x = Year, y = Count)) +
  geom_bar(stat = "identity")
filter4
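The four country plots follow the same filter-then-plot pattern, so they could be wrapped in one small function. The helper below is a hypothetical sketch (`plot_country` is not part of the original analysis), shown on made-up toy values; geom_col is used as the equivalent of geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: bar chart of one column for one country.
# `value` is the name of the column to plot, "Count" or "Rate".
plot_country <- function(data, country, value = "Count") {
  data %>%
    filter(`Country or Area` == country) %>%
    ggplot(aes(x = Year, y = .data[[value]])) +
    geom_col() +
    labs(title = country, y = value)
}

# Made-up toy rows with the column names of the UN export.
un_toy <- tibble::tibble(
  `Country or Area` = c("Turkey", "Turkey", "India"),
  Year = c(2007L, 2008L, 2008L),
  Count = c(100L, 120L, 300L),
  Rate = c(1.5, 1.7, 2.8)
)

p_turkey <- plot_country(un_toy, "Turkey")        # yearly counts
p_india <- plot_country(un_toy, "India", "Rate")  # yearly rates
```

Printing `p_turkey` or `p_india` draws the plot. On the real data, the first argument would be the imported data frame.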
We checked homicide counts and rates for all countries together and for a few individual countries.