This is my first data analysis in R. The data covers calls made to the London Fire Brigade between 1 January 2017 and 30 April 2017. The dataset is taken from Kaggle (London Fire Data).
I'll do a preliminary analysis of the data, answer some questions and produce some visualizations.
Hope you enjoy :)
Load the libraries used in this study.
# Load packages
library('ggplot2') # visualization
library('ggthemes') # visualization
library('dplyr') # data manipulation
library('scales') # visualization
Let's import the dataset. The first line of the file holds the column names, so we read it as a header.
# Read the data; the first row supplies the column names
calls <- read.csv("~/london_fire_brigade_service_calls.csv", header = TRUE)
# Check the dimensions of the data
dim(calls)
## [1] 32247 32
Our data consists of 32,247 rows and 32 columns. Let's see what these columns are and what they contain.
# Structure and sample contents
glimpse(calls)
## Observations: 32,247
## Variables: 32
## $ ï..address_qualifier <fctr> Within same buildi...
## $ borough_code <fctr> E09000007, E090000...
## $ borough_name <fctr> CAMDEN, NEWHAM, WA...
## $ cal_year <int> 2017, 2017, 2017, 2...
## $ date_of_call <fctr> 2017-01-20, 2017-0...
## $ easting_m <int> 529459, NA, 536990,...
## $ easting_rounded <int> 529450, 539650, 536...
## $ first_pump_arriving_attendance_time <int> 359, 211, NA, 295, ...
## $ first_pump_arriving_deployed_from_station <fctr> Euston, Stratford,...
## $ frs <fctr> London, London, Lo...
## $ hour_of_call <int> 8, 17, 18, 11, 17, ...
## $ incident_group <fctr> False Alarm, Speci...
## $ incident_number <fctr> 008148-20012017, 0...
## $ incident_station_ground <fctr> Euston, Stratford,...
## $ northing_m <int> 182009, NA, 189395,...
## $ northing_rounded <int> 182050, 183750, 189...
## $ num_pumps_attending <int> 1, 1, 1, 2, 1, 2, 4...
## $ num_stations_with_pumps_attending <int> 1, 1, 1, 2, 1, 2, 3...
## $ postcode_district <fctr> W1T, E15, E17, W13...
## $ postcode_full <fctr> W1T 7PD, , E17 6RH...
## $ proper_case <fctr> Camden, Newham, Wa...
## $ property_category <fctr> Non Residential, D...
## $ property_type <fctr> Purpose built offi...
## $ second_pump_arriving_attendance_time <int> NA, NA, NA, 660, NA...
## $ second_pump_arriving_deployed_from_station <fctr> , , , Southall, , ...
## $ special_service_type <fctr> , No action (not f...
## $ stop_code_description <fctr> AFA, Special Servi...
## $ time_of_call <fctr> 08:57:38, 17:42:29...
## $ timestamp_of_call <fctr> 2017-01-20 08:57:3...
## $ ward_code <fctr> E05000129, E050004...
## $ ward_name <fctr> BLOOMSBURY, WEST H...
## $ ward_name_new <fctr> BLOOMSBURY, WEST H...
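The first column name, `ï..address_qualifier`, suggests that the CSV begins with a UTF-8 byte-order mark (BOM) that `read.csv()` kept as part of the header. That cause is an assumption, not something confirmed by the dataset. The sketch below builds a tiny hypothetical file with a BOM and shows that declaring `fileEncoding = "UTF-8-BOM"` strips the mark on read.

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (BOM),
# mimicking the assumed cause of the "ï..address_qualifier" column name.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)  # the three BOM bytes
writeBin(charToRaw("address_qualifier,hour_of_call\nWithin same building,8\n"), con)
close(con)

# Declaring the encoding as "UTF-8-BOM" makes R drop the mark while reading
fixed <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(fixed)
```

If re-reading the file is not convenient, renaming the column after import with `names(calls)[1] <- "address_qualifier"` has the same effect.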
# Summary of data
summary(calls)
## ï..address_qualifier borough_code
## Correct incident location :19267 E09000033: 2469
## Within same building : 5669 E09000007: 1500
## In street outside gazetteer location : 2592 E09000028: 1435
## On land associated with building : 2114 E09000022: 1383
## In street close to gazetteer location : 1128 E09000030: 1330
## Open land/water - nearest gazetteer location: 866 E09000012: 1177
## (Other) : 611 (Other) :22953
## borough_name cal_year date_of_call easting_m
## WESTMINSTER : 2469 Min. :2017 2017-02-23: 525 Min. :493654
## CAMDEN : 1500 1st Qu.:2017 2017-04-09: 346 1st Qu.:525061
## SOUTHWARK : 1435 Median :2017 2017-01-22: 335 Median :530716
## LAMBETH : 1383 Mean :2017 2017-04-07: 329 Mean :530668
## TOWER HAMLETS: 1330 3rd Qu.:2017 2017-04-10: 317 3rd Qu.:536995
## HACKNEY : 1177 Max. :2017 2017-04-15: 315 Max. :560804
## (Other) :22953 (Other) :30080 NA's :15411
## easting_rounded first_pump_arriving_attendance_time
## Min. :492450 Min. : 2.0
## 1st Qu.:525150 1st Qu.: 232.0
## Median :530850 Median : 298.0
## Mean :530641 Mean : 318.2
## 3rd Qu.:536450 3rd Qu.: 379.0
## Max. :563150 Max. :1196.0
## NA's :1819
## first_pump_arriving_deployed_from_station frs
## : 1819 London :32078
## Soho : 1205 OverTheBorder: 169
## Lambeth : 641
## Paddington : 625
## Euston : 580
## West Hampstead: 560
## (Other) :26817
## hour_of_call incident_group incident_number
## Min. : 0.00 False Alarm :15732 000003-01012017: 1
## 1st Qu.: 9.00 Fire : 6434 000004-01012017: 1
## Median :14.00 Special Service:10081 000005-01012017: 1
## Mean :13.47 000006-01012017: 1
## 3rd Qu.:18.00 000007-01012017: 1
## Max. :23.00 000008-01012017: 1
## (Other) :32241
## incident_station_ground northing_m northing_rounded
## Soho : 1247 Min. :152868 Min. :152850
## Paddington : 695 1st Qu.:175863 1st Qu.:175950
## Euston : 694 Median :180962 Median :180950
## Lambeth : 653 Mean :180367 Mean :180430
## Shoreditch : 628 3rd Qu.:185018 3rd Qu.:185150
## West Hampstead: 574 Max. :204891 Max. :224250
## (Other) :27756 NA's :15411
## num_pumps_attending num_stations_with_pumps_attending postcode_district
## Min. :1.000 Min. :1.000 CR0 : 612
## 1st Qu.:1.000 1st Qu.:1.000 SE1 : 522
## Median :1.000 Median :1.000 E1 : 449
## Mean :1.536 Mean :1.357 N1 : 431
## 3rd Qu.:2.000 3rd Qu.:2.000 E14 : 417
## Max. :7.000 Max. :6.000 NW1 : 414
## NA's :68 NA's :68 (Other):29402
## postcode_full proper_case property_category
## :15411 Westminster : 2469 Dwelling :15240
## NW3 2PF : 36 Camden : 1500 Non Residential : 7769
## SW17 0QT: 31 Southwark : 1435 Road Vehicle : 2777
## SE18 4QH: 22 Lambeth : 1383 Outdoor : 2594
## TW6 2GB : 22 Tower Hamlets: 1330 Outdoor Structure: 2004
## EN5 3DJ : 21 Hackney : 1177 Other Residential: 1796
## (Other) :16704 (Other) :22953 (Other) : 67
## property_type
## Purpose Built Flats/Maisonettes - 4 to 9 storeys : 3823
## House - single occupancy : 3784
## Purpose Built Flats/Maisonettes - Up to 3 storeys: 2909
## Car : 1655
## Self contained Sheltered Housing : 1425
## Purpose built office : 1274
## (Other) :17377
## second_pump_arriving_attendance_time
## Min. : 4.0
## 1st Qu.: 299.0
## Median : 372.0
## Mean : 399.1
## 3rd Qu.: 464.0
## Max. :1195.0
## NA's :20281
## second_pump_arriving_deployed_from_station
## :20281
## Soho : 375
## Hammersmith : 342
## Paddington : 333
## West Hampstead: 287
## Lambeth : 285
## (Other) :10344
## special_service_type stop_code_description
## :22166 AFA :11811
## Flooding : 2061 Special Service :10073
## Effecting entry/exit : 1959 Primary Fire : 3586
## Lift Release : 1609 False alarm - Good intent: 3481
## RTC : 1391 Secondary Fire : 2831
## No action (not false alarm): 706 False alarm - Malicious : 440
## (Other) : 2355 (Other) : 25
## time_of_call timestamp_of_call ward_code
## 18:03:40: 6 2017-01-10 20:54:46: 2 E05000649: 623
## 12:00:03: 5 2017-01-26 22:51:36: 2 E05000644: 490
## 16:32:40: 5 2017-02-05 13:51:44: 2 E05000138: 213
## 18:58:49: 5 2017-02-16 11:50:55: 2 E05000641: 209
## 19:21:23: 5 2017-03-01 20:43:29: 2 E05000367: 205
## 19:22:30: 5 2017-03-11 20:36:59: 2 E05000331: 203
## (Other) :32216 (Other) :32235 (Other) :30304
## ward_name ward_name_new
## WEST END : 623 WEST END : 623
## ST. JAMES'S : 490 ST. JAMES'S : 490
## FAIRFIELD : 219 FAIRFIELD : 219
## HOLBORN AND COVENT GARDEN: 213 HOLBORN AND COVENT GARDEN: 213
## MARYLEBONE HIGH STREET : 209 MARYLEBONE HIGH STREET : 209
## BUNHILL : 205 BUNHILL : 205
## (Other) :30288 (Other) :30288
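The summary reports missing values in two forms: `NA's` for numeric columns (for example 1,819 in `first_pump_arriving_attendance_time`) and a blank factor level for text columns (the same 1,819 blanks in `first_pump_arriving_deployed_from_station`, and 15,411 blank `postcode_full` values matching the 15,411 `NA`s in `easting_m`). A minimal sketch that counts both kinds per column, shown on a small hypothetical data frame rather than `calls`:

```r
# Hypothetical toy frame mixing NA (numeric) and "" (text) missing entries
toy <- data.frame(
  easting_m = c(529459L, NA, 536990L),
  postcode_full = c("W1T 7PD", "", "E17 6RH"),
  stringsAsFactors = TRUE
)

# Count entries that are either NA or an empty string in each column
missing_per_column <- sapply(toy, function(col) {
  sum(is.na(col) | as.character(col) %in% "")
})
missing_per_column
```

Applied to `calls`, the same function would give a single table of missing data instead of reading it off the summary column by column.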
A trend analysis shows how the number of calls changes from day to day.
# Count the number of calls on each day, converting the factor dates to Date
grp_call <- calls %>%
  mutate(date_of_call = as.Date(as.character(date_of_call))) %>%
  group_by(date_of_call) %>%
  summarise(count = n())
# Plot the daily counts to see the trend
ggplot(data = grp_call, aes(x = date_of_call, y = count)) +
  geom_line(colour = "red") +
  scale_x_date(date_breaks = "1 week", date_labels = "%d %b") +
  labs(title = "Trend of Calls between Jan-April 2017", x = "Date of Call", y = "Count") +
  theme(axis.text.x = element_text(angle = -90, vjust = 0.5, hjust = 0))
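Daily counts are noisy, so a centered 7-day moving average can make the underlying trend easier to see. The sketch below uses hypothetical daily counts in place of `grp_call` (the real values come from the CSV) and base R's `stats::filter()`. The `stats::` prefix matters here because loading dplyr masks `filter()`.

```r
# Hypothetical daily counts standing in for grp_call; dates stored as a factor,
# as read.csv() returns them
toy_daily <- data.frame(
  date_of_call = factor(format(as.Date("2017-01-01") + 0:13)),
  count = c(250, 260, 270, 280, 290, 300, 310, 250, 260, 270, 280, 290, 300, 310)
)

# Convert the factor to Date so the x axis can be treated as continuous time
toy_daily$date <- as.Date(as.character(toy_daily$date_of_call))

# Centered 7-day moving average; the first and last 3 days lack a full window (NA)
toy_daily$ma7 <- as.numeric(stats::filter(toy_daily$count, rep(1 / 7, 7), sides = 2))
toy_daily
```

With the real data, the same expression applied to `grp_call$count` could be drawn as a second `geom_line()` layer over the raw daily counts.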
# Average number of calls per day
grp_call %>% summarise(count=mean(count))
## # A tibble: 1 x 1
## count
## <dbl>
## 1 268.725
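Because `grp_call` has one row per date, this mean equals the total number of calls divided by the number of distinct dates. A quick arithmetic check, using the row count from `dim(calls)` and the 120 days from 1 January to 30 April 2017 inclusive, confirms that every day in the period has at least one call:

```r
n_calls <- 32247  # rows reported by dim(calls)
# Inclusive day count from 1 January to 30 April 2017
n_days <- as.numeric(as.Date("2017-04-30") - as.Date("2017-01-01")) + 1
n_days            # 120
n_calls / n_days  # 268.725
```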
If we examine calls by hour of the day:
ggplot(data = calls, aes(x = hour_of_call)) +
  geom_bar() +
  labs(title = "Call trend by the hour of day", x = "Hour of Call", y = "Count")
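Each bar is simply the number of calls in that hour, so the busiest hour can also be read off numerically with `table()`. A minimal sketch on a hypothetical vector of call hours (not the real data); fixing the factor levels to `0:23` keeps hours with no calls in the table:

```r
# Hypothetical hours of call standing in for calls$hour_of_call
toy_hours <- c(8, 17, 18, 11, 17, 17, 9, 18)

# Count calls per hour, including empty hours
hour_counts <- table(factor(toy_hours, levels = 0:23))

# which.max() returns the first hour with the highest count
peak_hour <- as.integer(names(hour_counts)[which.max(hour_counts)])
peak_hour  # 17
```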
If we examine the response time of the first pump:
* Regardless of the hour of call, the median first response time is around 5 minutes (the overall median is 298 seconds).
# Convert the first pump's attendance time from seconds to minutes
data <- calls %>% mutate(first_pump_arriving_min = first_pump_arriving_attendance_time / 60)
# One box per hour of call
ggplot(data = data, aes(x = hour_of_call, y = first_pump_arriving_min)) +
  geom_boxplot(aes(group = hour_of_call), outlier.alpha = 0.1, outlier.colour = "red", colour = "#3366FF") +
  labs(title = "First Response Time by Hour of Call", x = "Hour of Call", y = "Minutes")
## Warning: Removed 1819 rows containing non-finite values (stat_boxplot).
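The line in the middle of each box is the median, so the per-hour medians can be computed directly with `tapply()`. The 1,819 removed rows are calls with no recorded first-pump time, which is why `na.rm = TRUE` is needed. A minimal sketch on a hypothetical toy frame using the same column names as `calls`:

```r
# Hypothetical toy data: attendance times in seconds, one missing
toy <- data.frame(
  hour_of_call = c(8, 8, 8, 17, 17, 17),
  first_pump_arriving_attendance_time = c(240, 300, NA, 280, 320, 360)
)

# Seconds to minutes, as in the plot above
toy$first_pump_arriving_min <- toy$first_pump_arriving_attendance_time / 60

# Median response time per hour, ignoring missing times
median_by_hour <- tapply(toy$first_pump_arriving_min, toy$hour_of_call, median, na.rm = TRUE)
round(median_by_hour, 2)
```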
What if we examine the busiest day, 23 February 2017, which had 525 calls?
# Day with the highest number of calls (see date_of_call in the summary above)
data <- calls %>%
  filter(date_of_call == "2017-02-23") %>%
  mutate(first_pump_arriving_min = first_pump_arriving_attendance_time / 60)
ggplot(data = data, aes(x = hour_of_call, y = first_pump_arriving_min)) +
  geom_boxplot(aes(group = hour_of_call), outlier.alpha = 0.1, outlier.colour = "red", colour = "#3366FF") +
  labs(title = "First Response Time by Hour of Call, 23 Feb 2017", x = "Hour of Call", y = "Minutes")
## Warning: Removed 25 rows containing non-finite values (stat_boxplot).
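The date above is hard-coded from the summary output. A minimal sketch of finding the busiest day programmatically instead, on a hypothetical vector of call dates:

```r
# Hypothetical call dates standing in for calls$date_of_call
toy_dates <- c("2017-02-23", "2017-02-23", "2017-04-09",
               "2017-02-23", "2017-04-09", "2017-01-22")

# Calls per date, then the date with the most calls
daily <- table(toy_dates)
busiest_day <- names(daily)[which.max(daily)]
busiest_day  # "2017-02-23"
```

With the real data, `busiest_day` could then replace the literal date in `filter(date_of_call == busiest_day)`.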
If we look at which property category receives the most calls, Dwelling comes first, as expected.
# Count the calls for each property category
data <- calls %>%
group_by(property_category) %>%
summarise(count=n()) %>%
arrange(desc(count))
data
## # A tibble: 9 x 2
## property_category count
## <fctr> <int>
## 1 Dwelling 15240
## 2 Non Residential 7769
## 3 Road Vehicle 2777
## 4 Outdoor 2594
## 5 Outdoor Structure 2004
## 6 Other Residential 1796
## 7 Aircraft 25
## 8 Boat 24
## 9 Rail Vehicle 18
ggplot(data = data, aes(x = reorder(property_category, -count), y = count)) +
  geom_col() +
  labs(title = "Calls for Property Category", x = "Property Category", y = "Count") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0))
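Raw counts are easier to compare as shares of all calls. Using the counts printed in the table above, a short base R calculation shows that dwellings account for roughly 47% of calls:

```r
# Counts copied from the property_category table above
category_counts <- c(
  Dwelling = 15240, `Non Residential` = 7769, `Road Vehicle` = 2777,
  Outdoor = 2594, `Outdoor Structure` = 2004, `Other Residential` = 1796,
  Aircraft = 25, Boat = 24, `Rail Vehicle` = 18
)

sum(category_counts)  # 32247, matching dim(calls)

# Percentage of all calls per category, rounded to one decimal place
shares <- round(100 * category_counts / sum(category_counts), 1)
shares
```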