Introduction

This is my first data analysis in R. The data covers calls made to the London Fire Brigade between January 1, 2017 and April 30, 2017. The dataset is taken from Kaggle | London Fire Data.

I’ll do a preliminary analysis of the data, answer some questions, and build some visualizations.

Hope you enjoy :)

Loading Data and Preliminary Analysis

Load the libraries used in this study.

# Load packages
library('ggplot2') # visualization
library('ggthemes') # visualization
library('dplyr') # data manipulation
library('scales') # visualization

Let’s import the data set, using its first line as the column header.

# Read the data; header = TRUE uses the first row as column names
calls <- read.csv("~/london_fire_brigade_service_calls.csv", header = TRUE)

# See the dimension of our data
dim(calls)
## [1] 32247    32

Our data consists of 32,247 rows and 32 columns. Let’s see what these columns are and what they contain.

# Structure and sample contents
glimpse(calls)
## Observations: 32,247
## Variables: 32
## $ ï..address_qualifier                       <fctr> Within same buildi...
## $ borough_code                               <fctr> E09000007, E090000...
## $ borough_name                               <fctr> CAMDEN, NEWHAM, WA...
## $ cal_year                                   <int> 2017, 2017, 2017, 2...
## $ date_of_call                               <fctr> 2017-01-20, 2017-0...
## $ easting_m                                  <int> 529459, NA, 536990,...
## $ easting_rounded                            <int> 529450, 539650, 536...
## $ first_pump_arriving_attendance_time        <int> 359, 211, NA, 295, ...
## $ first_pump_arriving_deployed_from_station  <fctr> Euston, Stratford,...
## $ frs                                        <fctr> London, London, Lo...
## $ hour_of_call                               <int> 8, 17, 18, 11, 17, ...
## $ incident_group                             <fctr> False Alarm, Speci...
## $ incident_number                            <fctr> 008148-20012017, 0...
## $ incident_station_ground                    <fctr> Euston, Stratford,...
## $ northing_m                                 <int> 182009, NA, 189395,...
## $ northing_rounded                           <int> 182050, 183750, 189...
## $ num_pumps_attending                        <int> 1, 1, 1, 2, 1, 2, 4...
## $ num_stations_with_pumps_attending          <int> 1, 1, 1, 2, 1, 2, 3...
## $ postcode_district                          <fctr> W1T, E15, E17, W13...
## $ postcode_full                              <fctr> W1T 7PD, , E17 6RH...
## $ proper_case                                <fctr> Camden, Newham, Wa...
## $ property_category                          <fctr> Non Residential, D...
## $ property_type                              <fctr> Purpose built offi...
## $ second_pump_arriving_attendance_time       <int> NA, NA, NA, 660, NA...
## $ second_pump_arriving_deployed_from_station <fctr> , , , Southall, , ...
## $ special_service_type                       <fctr> , No action (not f...
## $ stop_code_description                      <fctr> AFA, Special Servi...
## $ time_of_call                               <fctr> 08:57:38, 17:42:29...
## $ timestamp_of_call                          <fctr> 2017-01-20 08:57:3...
## $ ward_code                                  <fctr> E05000129, E050004...
## $ ward_name                                  <fctr> BLOOMSBURY, WEST H...
## $ ward_name_new                              <fctr> BLOOMSBURY, WEST H...
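The `ï..` prefix on the first column name is most likely a UTF-8 byte-order mark (BOM) at the start of the CSV: read under a Latin-1/Windows-1252 locale, the BOM bytes show up as `ï»¿`, and `make.names()` turns the two non-letter characters into dots. The sketch below assumes the Kaggle file does start with a BOM; it reproduces the situation with a tiny hypothetical two-column CSV written to a temporary file and reads it back with `fileEncoding = "UTF-8-BOM"`, which tells R to drop the mark.

```r
# Hypothetical two-column CSV standing in for the Kaggle file
bom_csv <- tempfile(fileext = ".csv")
csv_text <- "address_qualifier,borough_code\nWithin same building,E09000007\n"

# Write the UTF-8 byte-order mark (EF BB BF) followed by the CSV text
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(csv_text)), bom_csv)

# "UTF-8-BOM" makes R strip the mark, so the first column name comes out clean
fixed <- read.csv(bom_csv, header = TRUE, fileEncoding = "UTF-8-BOM")
print(names(fixed))

unlink(bom_csv)  # remove the temporary file
```

With the real data, the same `fileEncoding` argument could be added to the `read.csv()` call, or the column can simply be renamed afterwards with `names(calls)[1] <- "address_qualifier"`.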
# Summary of data
summary(calls)
##                                    ï..address_qualifier    borough_code  
##  Correct incident location                   :19267     E09000033: 2469  
##  Within same building                        : 5669     E09000007: 1500  
##  In street outside gazetteer location        : 2592     E09000028: 1435  
##  On land associated with building            : 2114     E09000022: 1383  
##  In street close to gazetteer location       : 1128     E09000030: 1330  
##  Open land/water - nearest gazetteer location:  866     E09000012: 1177  
##  (Other)                                     :  611     (Other)  :22953  
##         borough_name      cal_year        date_of_call     easting_m     
##  WESTMINSTER  : 2469   Min.   :2017   2017-02-23:  525   Min.   :493654  
##  CAMDEN       : 1500   1st Qu.:2017   2017-04-09:  346   1st Qu.:525061  
##  SOUTHWARK    : 1435   Median :2017   2017-01-22:  335   Median :530716  
##  LAMBETH      : 1383   Mean   :2017   2017-04-07:  329   Mean   :530668  
##  TOWER HAMLETS: 1330   3rd Qu.:2017   2017-04-10:  317   3rd Qu.:536995  
##  HACKNEY      : 1177   Max.   :2017   2017-04-15:  315   Max.   :560804  
##  (Other)      :22953                  (Other)   :30080   NA's   :15411   
##  easting_rounded  first_pump_arriving_attendance_time
##  Min.   :492450   Min.   :   2.0                     
##  1st Qu.:525150   1st Qu.: 232.0                     
##  Median :530850   Median : 298.0                     
##  Mean   :530641   Mean   : 318.2                     
##  3rd Qu.:536450   3rd Qu.: 379.0                     
##  Max.   :563150   Max.   :1196.0                     
##                   NA's   :1819                       
##  first_pump_arriving_deployed_from_station            frs       
##                : 1819                      London       :32078  
##  Soho          : 1205                      OverTheBorder:  169  
##  Lambeth       :  641                                           
##  Paddington    :  625                                           
##  Euston        :  580                                           
##  West Hampstead:  560                                           
##  (Other)       :26817                                           
##   hour_of_call           incident_group         incident_number 
##  Min.   : 0.00   False Alarm    :15732   000003-01012017:    1  
##  1st Qu.: 9.00   Fire           : 6434   000004-01012017:    1  
##  Median :14.00   Special Service:10081   000005-01012017:    1  
##  Mean   :13.47                           000006-01012017:    1  
##  3rd Qu.:18.00                           000007-01012017:    1  
##  Max.   :23.00                           000008-01012017:    1  
##                                          (Other)        :32241  
##    incident_station_ground   northing_m     northing_rounded
##  Soho          : 1247      Min.   :152868   Min.   :152850  
##  Paddington    :  695      1st Qu.:175863   1st Qu.:175950  
##  Euston        :  694      Median :180962   Median :180950  
##  Lambeth       :  653      Mean   :180367   Mean   :180430  
##  Shoreditch    :  628      3rd Qu.:185018   3rd Qu.:185150  
##  West Hampstead:  574      Max.   :204891   Max.   :224250  
##  (Other)       :27756      NA's   :15411                    
##  num_pumps_attending num_stations_with_pumps_attending postcode_district
##  Min.   :1.000       Min.   :1.000                     CR0    :  612    
##  1st Qu.:1.000       1st Qu.:1.000                     SE1    :  522    
##  Median :1.000       Median :1.000                     E1     :  449    
##  Mean   :1.536       Mean   :1.357                     N1     :  431    
##  3rd Qu.:2.000       3rd Qu.:2.000                     E14    :  417    
##  Max.   :7.000       Max.   :6.000                     NW1    :  414    
##  NA's   :68          NA's   :68                        (Other):29402    
##   postcode_full          proper_case            property_category
##          :15411   Westminster  : 2469   Dwelling         :15240  
##  NW3 2PF :   36   Camden       : 1500   Non Residential  : 7769  
##  SW17 0QT:   31   Southwark    : 1435   Road Vehicle     : 2777  
##  SE18 4QH:   22   Lambeth      : 1383   Outdoor          : 2594  
##  TW6 2GB :   22   Tower Hamlets: 1330   Outdoor Structure: 2004  
##  EN5 3DJ :   21   Hackney      : 1177   Other Residential: 1796  
##  (Other) :16704   (Other)      :22953   (Other)          :   67  
##                                            property_type  
##  Purpose Built Flats/Maisonettes - 4 to 9 storeys : 3823  
##  House - single occupancy                         : 3784  
##  Purpose Built Flats/Maisonettes - Up to 3 storeys: 2909  
##  Car                                              : 1655  
##  Self contained Sheltered Housing                 : 1425  
##  Purpose built office                             : 1274  
##  (Other)                                          :17377  
##  second_pump_arriving_attendance_time
##  Min.   :   4.0                      
##  1st Qu.: 299.0                      
##  Median : 372.0                      
##  Mean   : 399.1                      
##  3rd Qu.: 464.0                      
##  Max.   :1195.0                      
##  NA's   :20281                       
##  second_pump_arriving_deployed_from_station
##                :20281                      
##  Soho          :  375                      
##  Hammersmith   :  342                      
##  Paddington    :  333                      
##  West Hampstead:  287                      
##  Lambeth       :  285                      
##  (Other)       :10344                      
##                   special_service_type               stop_code_description
##                             :22166     AFA                      :11811    
##  Flooding                   : 2061     Special Service          :10073    
##  Effecting entry/exit       : 1959     Primary Fire             : 3586    
##  Lift Release               : 1609     False alarm - Good intent: 3481    
##  RTC                        : 1391     Secondary Fire           : 2831    
##  No action (not false alarm):  706     False alarm - Malicious  :  440    
##  (Other)                    : 2355     (Other)                  :   25    
##    time_of_call             timestamp_of_call     ward_code    
##  18:03:40:    6   2017-01-10 20:54:46:    2   E05000649:  623  
##  12:00:03:    5   2017-01-26 22:51:36:    2   E05000644:  490  
##  16:32:40:    5   2017-02-05 13:51:44:    2   E05000138:  213  
##  18:58:49:    5   2017-02-16 11:50:55:    2   E05000641:  209  
##  19:21:23:    5   2017-03-01 20:43:29:    2   E05000367:  205  
##  19:22:30:    5   2017-03-11 20:36:59:    2   E05000331:  203  
##  (Other) :32216   (Other)            :32235   (Other)  :30304  
##                      ward_name                       ward_name_new  
##  WEST END                 :  623   WEST END                 :  623  
##  ST. JAMES'S              :  490   ST. JAMES'S              :  490  
##  FAIRFIELD                :  219   FAIRFIELD                :  219  
##  HOLBORN AND COVENT GARDEN:  213   HOLBORN AND COVENT GARDEN:  213  
##  MARYLEBONE HIGH STREET   :  209   MARYLEBONE HIGH STREET   :  209  
##  BUNHILL                  :  205   BUNHILL                  :  205  
##  (Other)                  :30288   (Other)                  :30288

Data Exploration and Visualization

A trend analysis shows the number of calls on each day.

  1. As you can see, there is an outlier on February 23, when 525 calls were made.
  2. On average, about 269 calls were made per day (268.7).
# Find the numbers of call on each day.
grp_call <- calls %>%
  group_by(date_of_call) %>%
  summarise(count=n()) 

#Show on the graph, see the trends
ggplot(data=grp_call, aes(x=date_of_call, y=count, group = 1)) +
  geom_line(colour = "red") +
  labs(title="Trend of Calls between Jan-April 2017", x="Date of Call", y="Count") +
  theme (axis.text.x=element_text (angle=-90,vjust=0.5,hjust=0))

#Average calls
grp_call %>% summarise(count=mean(count)) 
## # A tibble: 1 x 1
##     count
##     <dbl>
## 1 268.725
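The outlier and the daily average can also be read off programmatically instead of from the chart. The sketch below uses base R and a hypothetical six-call toy vector in place of `calls$date_of_call`; it converts the factor to `Date`, counts calls per day with `table()`, and picks the busiest day with `which.max()`. Note that the mean only covers days that appear in the data, so a day with zero calls would be silently skipped.

```r
# Hypothetical toy data standing in for calls$date_of_call (a factor of "YYYY-MM-DD" strings)
toy_dates <- factor(c("2017-02-22", "2017-02-23", "2017-02-23",
                      "2017-02-23", "2017-02-24", "2017-02-24"))

# Go through character so the factor labels (not the level codes) become dates
call_day <- as.Date(as.character(toy_dates))

# Count calls per day and turn the table into a data frame
daily <- as.data.frame(table(call_day), stringsAsFactors = FALSE)
names(daily) <- c("date_of_call", "count")
daily$date_of_call <- as.Date(daily$date_of_call)

# Busiest day, and mean calls per day over the days present
peak <- daily[which.max(daily$count), ]
print(peak)
print(mean(daily$count))
```

Converting `date_of_call` to `Date` in the same way before plotting should also let ggplot2 draw a proper date axis instead of one rotated label per factor level.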

If we examine calls by the hour of the day:

ggplot(data = calls, aes(x = hour_of_call)) +
  geom_bar() +
  labs(title = "Call trend by the hour of day", x = "Hour of Call", y = "Count")
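To put a number on the busiest hour in the bar chart, `tabulate()` can count calls per hour. This is a base R sketch on a hypothetical ten-call toy vector standing in for `calls$hour_of_call`; because `tabulate()` only counts the values 1 to `nbins`, the hours are shifted by one so that midnight (hour 0) is included.

```r
# Hypothetical toy vector standing in for calls$hour_of_call (integers 0-23)
toy_hours <- c(8, 17, 17, 18, 11, 17, 18, 0, 23, 17)

# tabulate() counts 1..nbins, so shift by +1 to keep hour 0
per_hour <- tabulate(toy_hours + 1, nbins = 24)
names(per_hour) <- 0:23

# Hour with the most calls (which.max returns the first maximum on ties)
busiest_hour <- as.integer(names(which.max(per_hour)))
print(per_hour[per_hour > 0])
print(busiest_hour)
```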

If we examine the response time:
* Regardless of the hour of the call, the median first-pump response time stays at around 5 minutes (the overall median is 298 seconds and the mean is 318 seconds).

# Convert first-pump attendance time from seconds to minutes
data <- calls %>% mutate(first_pump_arriving_min = first_pump_arriving_attendance_time / 60)
ggplot(data = data, aes(y = first_pump_arriving_min, x = hour_of_call)) +
  geom_boxplot(aes(group = cut_width(hour_of_call, 0.5)), outlier.alpha = 0.1, outlier.colour = "red", colour = "#3366FF") +
  labs(title = "First Response Time by Hour of Call", x = "Hour of Call", y = "Minute")
## Warning: Removed 1819 rows containing non-finite values (stat_boxplot).
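Because the boxplots display medians, a small table of per-hour medians and means makes the "around 5 minutes" reading easier to check. A minimal base R sketch on hypothetical toy data (the column names mirror the real ones) converts seconds to minutes and uses `tapply()` with `na.rm = TRUE`, skipping missing attendance times much as ggplot2 drops the 1,819 non-finite rows.

```r
# Hypothetical toy data; attendance time is in seconds, NA = no first pump recorded
toy <- data.frame(
  hour_of_call = c(8, 8, 8, 8, 17, 17, 17),
  first_pump_arriving_attendance_time = c(240, 300, 540, NA, 360, 300, 420)
)
toy$first_pump_arriving_min <- toy$first_pump_arriving_attendance_time / 60

# How many rows a plot would drop as non-finite
n_missing <- sum(is.na(toy$first_pump_arriving_min))

# Per-hour median and mean, ignoring missing values
med_by_hour  <- tapply(toy$first_pump_arriving_min, toy$hour_of_call, median, na.rm = TRUE)
mean_by_hour <- tapply(toy$first_pump_arriving_min, toy$hour_of_call, mean, na.rm = TRUE)
print(round(cbind(median = med_by_hour, mean = mean_by_hour), 2))
```

A right-skewed hour (a few very slow arrivals) shows up as a mean noticeably above the median, which is why the boxplot alone can understate the average.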

What if we examine the day with the most calls (February 23)?

# Day with the highest number of calls
data <- calls %>% filter(date_of_call == "2017-02-23")
data <- data %>% mutate(first_pump_arriving_min = first_pump_arriving_attendance_time / 60)

ggplot(data = data, aes(y = first_pump_arriving_min, x = hour_of_call)) +
  geom_boxplot(aes(group = cut_width(hour_of_call, 0.5)), outlier.alpha = 0.1, outlier.colour = "red", colour = "#3366FF") +
  labs(title = "First Response Time by Hour of Call on 2017-02-23", x = "Hour of Call", y = "Minute")
## Warning: Removed 25 rows containing non-finite values (stat_boxplot).

If we look at which property category has the most calls, Dwelling (residential property) comes first, as expected, with 15,240 calls.

# Find the number of calls for each property category
data <- calls %>%
  group_by(property_category) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

data
## # A tibble: 9 x 2
##   property_category count
##              <fctr> <int>
## 1          Dwelling 15240
## 2   Non Residential  7769
## 3      Road Vehicle  2777
## 4           Outdoor  2594
## 5 Outdoor Structure  2004
## 6 Other Residential  1796
## 7          Aircraft    25
## 8              Boat    24
## 9      Rail Vehicle    18
ggplot(data = data, aes(x = reorder(property_category, -count), y = count)) +
  geom_bar(stat = "identity") +
  labs(title = "Calls for Property Category", x = "Property Category", y = "Count") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
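Turning the category counts into shares makes the comparison clearer. This sketch copies the counts from the table above into a named vector and uses base R `prop.table()`; on the full data, `prop.table(table(calls$property_category))` should give the same proportions.

```r
# Call counts per property category, copied from the summary table above
category_counts <- c(Dwelling = 15240, `Non Residential` = 7769,
                     `Road Vehicle` = 2777, Outdoor = 2594,
                     `Outdoor Structure` = 2004, `Other Residential` = 1796,
                     Aircraft = 25, Boat = 24, `Rail Vehicle` = 18)

# Percentage of all calls per category, rounded to one decimal place
share_pct <- round(100 * prop.table(category_counts), 1)
print(share_pct)
```

The counts add up to the 32,247 rows in the data set, and Dwelling alone accounts for about 47.3% of all calls.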