Turkey Migration Map

1- Source Files For The Analysis

We use data from TUIK (the Turkish Statistical Institute) to map domestic migration in Turkey. The original Excel file can be found at this link.

1.1- Description Of The Data Set

The data set obtained from the TUIK website covers four years of migration within Turkey. The original Excel file contains 329 rows and 86 columns. Each row represents a destination city that people migrated to, and the columns give the distribution of the migrated population by place of birth. There are 81 distinct Turkish cities as destinations and 83 distinct place-of-birth columns: the 81 cities plus two extra columns for people born abroad or in an unknown location. The remaining three columns hold the year, the destination and the total.

2- Objectives

The primary goal of this analysis is to identify and describe the specific patterns that shape migration dynamics in Turkey.
First, we will analyze how the migration figures evolved over the four years and then visualize the common patterns observed during this period.
Finally, we will try to bring in additional demographic statistics from TUIK to show what the population density across cities would be if people lived in their place of birth instead of the city they actually migrated to.

3- Abstract

Human migration is the movement of people from one place to another with the intention of settling, permanently or temporarily, in a new location. The movement is often over long distances and from one country to another, but internal migration is also possible; indeed, it is the dominant form globally. The simplest definition of migration is living things and people moving or relocating from one place to another. If migration takes place within the borders of a country, it is called internal migration; if it crosses a country’s border, it is called external or international migration. Migration can occur for social, economic, political, cultural, and ethnic reasons. In Turkey, internal migration began with the modernization of agriculture and the industrialization that followed the Second World War. Migration first occurred from villages to cities, then from small and medium-sized cities to large cities. In the 1990s, a new form of migration emerged: from cities to villages. Internal migration has caused social changes in both urban and rural settlements. Along with these changes, a number of problems emerged, especially in cities, and these problems persist to this day. This study analyzes internal migration from one city to another in Turkey between 2014 and 2017.

4- Data Cleaning & Pre-processing

We need the tidyverse library at this point.

library(tidyverse)

## -- Attaching packages -------------------------------------------------------- tidyverse 1.2.1 --

## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.8
## v tidyr   0.8.2     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0

## -- Conflicts ----------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

4.1- Reading The Data

We start by reading the raw data from the Excel file and removing the unnecessary trailing lines that contain footnotes.

# The first three lines are empty because of the Excel header layout, so skip them;
# the trailing rows removed by slice() hold only footnotes
tmp1 <- tempfile(fileext = ".xls")
download.file("https://github.com/MEF-BDA503/gpj18-r_boys/blob/master/source_files/domestic_migration_inflow&outflow.xls?raw=true",
              mode = "wb", destfile = tmp1)
maindata <- readxl::read_excel(tmp1, skip = 3) %>%
  slice(-c(329:333))

## Warning: Mangling the following names: Adiyaman -> Adiyaman, Agri -> Agri, Aydin -> Aydin, Balikesir -> Balikesir, Çankiri -> Çankiri, Diyarbakir -> Diyarbakir, Elazig -> Elazig, Eskisehir -> Eskisehir, Gümüshane -> Gümüshane, Istanbul -> Istanbul, Izmir -> Izmir, Kirklareli -> Kirklareli, Kirsehir -> Kirsehir, Kahramanmaras -> Kahramanmaras, Mugla -> Mugla, Mus -> Mus, Nevsehir -> Nevsehir, Nigde -> Nigde, Tekirdag -> Tekirdag, Sanliurfa -> Sanliurfa, Usak -> Usak, Kirikkale -> Kirikkale, Sirnak -> Sirnak, Bartin -> Bartin, Igdir -> Igdir, Yurt disi
## Abroad -> Yurt disi
## Abroad. Use enc2native() to avoid the warning.

The first thing to do at this stage is to check the data structure as R understands it, in order to identify what needs to be done to obtain clean, tidy data.

str(maindata)

## Classes 'tbl_df', 'tbl' and 'data.frame':    328 obs. of  86 variables:
##  $ Yil                : chr  "2017" "2017" "2017" "2017" ...
##  $ Alan               : chr  "Toplam-Total" "Adana" "Adiyaman" "Afyonkarahisar" ...
##  $ Toplam             : num  2684820 49509 18040 21453 15088 ...
##  $ Adana              : num  78422 21992 460 271 183 ...
##  $ Adiyaman           : num  34453 785 9448 100 115 ...
##  $ Afyonkarahisar     : num  28264 82 21 7301 112 ...
##  $ Agri               : num  40965 184 57 143 6201 ...
##  $ Amasya             : num  18889 61 12 68 63 ...
##  $ Ankara             : num  121902 827 208 943 427 ...
##  $ Antalya            : num  36961 344 93 590 99 ...
##  $ Artvin             : num  12760 35 4 23 18 ...
##  $ Aydin              : num  25531 92 44 281 118 ...
##  $ Balikesir          : num  34358 142 42 214 131 ...
##  $ Bilecik            : num  5080 9 3 41 11 14 238 128 6 68 ...
##  $ Bingöl             : num  15190 279 19 27 45 ...
##  $ Bitlis             : num  22844 397 41 54 139 ...
##  $ Bolu               : num  11036 45 9 49 35 ...
##  $ Burdur             : num  9149 45 10 169 29 ...
##  $ Bursa              : num  49327 163 74 396 131 ...
##  $ Çanakkale          : num  12448 40 7 61 53 ...
##  $ Çankiri            : num  15160 36 9 31 30 ...
##  $ Çorum              : num  35073 109 46 128 116 ...
##  $ Denizli            : num  24717 87 39 687 100 ...
##  $ Diyarbakir         : num  65475 1137 332 129 264 ...
##  $ Edirne             : num  14099 42 14 40 56 ...
##  $ Elazig             : num  27630 510 151 84 79 ...
##  $ Erzincan           : num  15579 53 16 28 57 ...
##  $ Erzurum            : num  52387 205 34 123 367 ...
##  $ Eskisehir          : num  24830 106 32 439 76 ...
##  $ Gaziantep          : num  50602 1081 958 210 124 ...
##  $ Giresun            : num  35988 50 32 58 69 ...
##  $ Gümüshane          : num  22786 21 7 31 31 ...
##  $ Hakkari            : num  13252 70 23 22 67 ...
##  $ Hatay              : num  53491 1661 250 191 160 ...
##  $ Isparta            : num  15828 76 21 515 56 ...
##  $ Mersin             : num  56079 2473 401 276 151 ...
##  $ Istanbul           : num  259999 1325 816 859 642 ...
##  $ Izmir              : num  74483 428 103 852 288 ...
##  $ Kars               : num  22714 48 24 91 114 ...
##  $ Kastamonu          : num  20273 56 18 34 26 ...
##  $ Kayseri            : num  43456 696 99 146 164 ...
##  $ Kirklareli         : num  10087 37 8 27 27 ...
##  $ Kirsehir           : num  15435 56 27 44 58 ...
##  $ Kocaeli            : num  33707 112 32 152 78 ...
##  $ Konya              : num  59738 494 106 1096 244 ...
##  $ Kütahya            : num  20652 51 14 395 77 ...
##  $ Malatya            : num  39688 605 787 95 101 ...
##  $ Manisa             : num  38248 153 37 486 141 ...
##  $ Kahramanmaras      : num  48577 1116 554 189 138 ...
##  $ Mardin             : num  45338 1302 134 94 112 ...
##  $ Mugla              : num  17591 110 27 198 49 ...
##  $ Mus                : num  29067 237 64 76 222 ...
##  $ Nevsehir           : num  13989 110 33 59 54 ...
##  $ Nigde              : num  18562 751 28 82 53 ...
##  $ Ordu               : num  52724 80 39 82 113 ...
##  $ Rize               : num  21043 48 13 44 42 ...
##  $ Sakarya            : num  22831 59 20 95 72 ...
##  $ Samsun             : num  61719 136 60 252 182 ...
##  $ Siirt              : num  19479 588 29 30 78 ...
##  $ Sinop              : num  15360 36 11 19 23 ...
##  $ Sivas              : num  41657 294 45 106 124 ...
##  $ Tekirdag           : num  16512 51 19 53 50 ...
##  $ Tokat              : num  50079 96 44 90 117 ...
##  $ Trabzon            : num  41060 101 31 80 170 ...
##  $ Tunceli            : num  7510 58 12 11 5 5 278 200 10 201 ...
##  $ Sanliurfa          : num  66673 1978 1086 276 163 ...
##  $ Usak               : num  11330 28 17 339 47 ...
##  $ Van                : num  55154 609 84 105 478 ...
##  $ Yozgat             : num  35518 113 31 131 98 ...
##  $ Zonguldak          : num  29971 62 22 112 58 ...
##  $ Aksaray            : num  16230 148 23 94 49 ...
##  $ Bayburt            : num  11300 11 8 15 22 10 723 90 14 36 ...
##  $ Karaman            : num  9960 69 18 85 17 10 458 774 11 128 ...
##  $ Kirikkale          : num  17452 64 13 73 88 ...
##  $ Batman             : num  26958 293 72 48 138 ...
##  $ Sirnak             : num  18436 330 41 19 62 ...
##  $ Bartin             : num  9312 16 12 27 16 ...
##  $ Ardahan            : num  11374 19 7 27 26 ...
##  $ Igdir              : num  11733 29 12 36 231 ...
##  $ Yalova             : num  4077 23 9 11 14 ...
##  $ Karabük            : num  11464 32 25 31 34 ...
##  $ Kilis              : num  7058 136 42 12 8 ...
##  $ Osmaniye           : num  27833 2243 199 111 107 ...
##  $ Düzce              : num  11054 24 8 28 29 ...
##  $ Yurt disi
## Abroad  : num  22972 332 45 124 53 ...
##  $ Bilinmeyen
## Unknown: num  26828 477 115 189 73 ...

4.2- Reshaping The Data

Since the file is structured horizontally (one column per place of birth), it is better to transform it into a vertical (long) format using the melt() function from the reshape2 library.

library(reshape2)

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

# Keep the first three columns (year, destination, total) as identifiers and melt the birth-place columns
melted_data <- melt(maindata, id.vars = c("Yil", "Alan", "Toplam"))
# This will be the main structure
head(melted_data)
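To make the wide-to-long step concrete, here is a minimal base-R sketch of what melt() does, applied to a two-city excerpt of the 2017 rows (the values are copied from the str() output above; the excerpt itself is only an illustration). The identifier columns are repeated once per birth-place column, and the birth-place columns are stacked one after another.

```r
# Toy wide table: 2017 rows for destinations Adana and Adiyaman,
# with only the Adana and Adiyaman birth-place columns kept
wide <- data.frame(
  Yil = c("2017", "2017"),
  Alan = c("Adana", "Adiyaman"),
  Toplam = c(49509, 18040),
  Adana = c(21992, 460),
  Adiyaman = c(785, 9448),
  stringsAsFactors = FALSE
)

id_cols <- c("Yil", "Alan", "Toplam")
measure_cols <- setdiff(names(wide), id_cols)

# Repeat the identifier rows once per measure column, then stack the measure columns
long <- data.frame(
  wide[rep(seq_len(nrow(wide)), times = length(measure_cols)), id_cols],
  variable = rep(measure_cols, each = nrow(wide)),
  value = unlist(wide[measure_cols], use.names = FALSE),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(long)
```

With reshape2 loaded, melt(wide, id.vars = c("Yil", "Alan", "Toplam")) should return the same four rows, except that melt() returns variable as a factor by default.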

4.3- Organizing Data

The current table contains additional information that we will not use in our analysis, so it is better to remove it at this point, along with some renaming and formatting.

clean_data <- melted_data %>%
  # Drop the total rows; totals can be recalculated easily if necessary
  filter(Alan != "Toplam-Total") %>%
  # Keep only the year, the destination city, the place of birth and the number of people who migrated
  select(Yil, Alan, variable, value) %>%
  arrange(Yil, Alan, variable, value)
# Use English column names so the language is uniform
colnames(clean_data) <- c("Year", "Destination", "Birth_Place", "People")
clean_data$Year <- as.integer(clean_data$Year)
suppressWarnings(clean_data$People <- as.integer(clean_data$People))
# melt() returns the former column names as a factor; convert them to character
clean_data$Birth_Place <- as.character(clean_data$Birth_Place)
# Convert to the native encoding, as suggested by the read_excel() warning
clean_data$Destination <- enc2native(clean_data$Destination)
clean_data$Birth_Place <- enc2native(clean_data$Birth_Place)
# This should be the representation of our data set
head(clean_data)

The Year column is currently stored as an integer. However, to make further analysis with ggplot2 and other packages easier, it is better to convert it to a date with the lubridate package.

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

clean_data$Year <- ymd(sprintf("%d-01-01",clean_data$Year))
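The same conversion can be sketched in base R, with as.Date() standing in for lubridate::ymd() (a swapped-in function, used here only so the example needs no packages). It also shows why the later filter(Year == "2017-01-01") works: when a Date is compared with a character string, R converts the string with as.Date() before comparing.

```r
years <- c(2014L, 2015L, 2016L, 2017L)

# Build ISO strings such as "2014-01-01" and parse them as Date objects
year_dates <- as.Date(sprintf("%d-01-01", years))

# Comparison with a string: the string is converted to a Date first
is_2017 <- year_dates == "2017-01-01"

print(year_dates)
print(is_2017)
```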

4.4- Translation from TR to EN

As the last data-cleaning step, we transliterate Turkish characters to their English equivalents using the functions below.

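# Apply several literal substitutions in sequence: pattern[i] is replaced by replacement[i]
mgsub <- function(pattern, replacement, x, ...) {
  n <- length(pattern)
  if (n != length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result <- x
  for (i in seq_len(n)) {
    # fixed = TRUE treats each pattern as a literal string, not a regular expression
    result <- gsub(pattern[i], replacement[i], result, fixed = TRUE)
  }
  return(result)
}

# Map each Turkish-specific letter to its closest English letter
tr_to_en <- function(datafile){
  turkish_letters <- c("Ç","Ş","Ğ","İ","Ü","Ö","ç","ş","ğ","ı","ü","ö")
  english_letters <- c("C","S","G","I","U","O","c","s","g","i","u","o")
  datafile <- mgsub(turkish_letters, english_letters, datafile)
  return(datafile)
}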

clean_data$Birth_Place <- tr_to_en(clean_data$Birth_Place)
clean_data$Destination <- tr_to_en(clean_data$Destination)
head(clean_data)
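Because every replacement here maps a single character to a single character, base R's chartr() can perform the same transliteration in one vectorized call. The sketch below is an alternative to the mgsub() loop, not what this analysis used; tr_to_ascii is a hypothetical name, and the Turkish letters are written as Unicode escapes so the example does not depend on the encoding of the script file.

```r
# Hypothetical helper: one-to-one character translation with base R's chartr()
tr_to_ascii <- function(x) {
  # Upper case Ç Ş Ğ İ Ü Ö followed by lower case ç ş ğ ı ü ö
  turkish <- "\u{00c7}\u{015e}\u{011e}\u{0130}\u{00dc}\u{00d6}\u{00e7}\u{015f}\u{011f}\u{0131}\u{00fc}\u{00f6}"
  english <- "CSGIUOcsgiuo"
  chartr(turkish, english, x)
}

# Şanlıurfa, Çankırı, Iğdır, İzmir
cities <- c("\u{015e}anl\u{0131}urfa", "\u{00c7}ank\u{0131}r\u{0131}",
            "I\u{011f}d\u{0131}r", "\u{0130}zmir")
print(tr_to_ascii(cities))
```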

4.5- Saving with RDS format

Saving the cleaned data as an RDS file lets us work on it later, if needed, without repeating the steps above.

saveRDS(clean_data, file = "D:/Data Analytics Esentials/project/shared/clean_migration.rds")
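As a quick sanity check of this workflow, the sketch below saves a toy two-row extract to a temporary file (tempfile() stands in for the project path) and reads it back with readRDS(). The RDS round trip restores the object exactly, including the Date class of Year, so nothing has to be re-parsed after loading.

```r
# Toy extract in the clean_data layout (2017, people born in Adana)
toy <- data.frame(
  Year = as.Date(c("2017-01-01", "2017-01-01")),
  Destination = c("Adana", "Adiyaman"),
  Birth_Place = c("Adana", "Adana"),
  People = c(21992L, 460L),
  stringsAsFactors = FALSE
)

rds_path <- tempfile(fileext = ".rds")
saveRDS(toy, file = rds_path)

# Reload later without repeating the cleaning steps
restored <- readRDS(rds_path)
str(restored)
```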

5- Exploratory Data Analysis (as of 2017)

Let’s first list the top 6 cities preferred as a “province of residence” and the top 6 cities that the migrated population has as a “place of birth”, so that we can compare how the two are related.

Top 6 Cities Preferred for Migration

clean_data %>%
  filter(Year == "2017-01-01") %>%
  group_by(Destination) %>%
  summarize(sumofpeople = sum(People)) %>%
  top_n(n = 6, wt = sumofpeople) %>%
  arrange(desc(sumofpeople)) %>%
  print(width=Inf)

## # A tibble: 6 x 2
##   Destination sumofpeople
##   <chr>             <int>
## 1 Istanbul         416587
## 2 Ankara           188100
## 3 Izmir            127394
## 4 Kocaeli           87796
## 5 Antalya           87232
## 6 Bursa             86119

Top 6 Cities That Migrated People Have as a “Place of Birth”

clean_data %>%
  filter(Year == "2017-01-01") %>%
  group_by(Birth_Place) %>%
  summarize(sumofpeople = sum(People)) %>%
  top_n(n = 6, wt = sumofpeople) %>%
  arrange(desc(sumofpeople)) %>%
  print(width = Inf)

## # A tibble: 6 x 2
##   Birth_Place sumofpeople
##   <chr>             <int>
## 1 Istanbul         259999
## 2 Ankara           121902
## 3 Adana             78422
## 4 Izmir             74483
## 5 Sanliurfa         66673
## 6 Diyarbakir        65475

As the two tables show, the rankings of the top 6 cities differ, except that Istanbul, Ankara and Izmir appear in both. This suggests that Kocaeli, Antalya and Bursa are more likely to be chosen as destinations than Adana, Sanliurfa and Diyarbakir, since many people born in the latter provinces migrated to another city rather than staying in their hometowns. Job opportunities and the density of industrial activity are possible drivers behind this pattern.
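The comparison can be made explicit with set operations on the two city lists. This minimal sketch copies the 2017 totals from the two tables above into named vectors; in the full analysis the same vectors could be taken from the two pipelines instead.

```r
# 2017 totals copied from the two tables above
top_destination <- c(Istanbul = 416587, Ankara = 188100, Izmir = 127394,
                     Kocaeli = 87796, Antalya = 87232, Bursa = 86119)
top_birth_place <- c(Istanbul = 259999, Ankara = 121902, Adana = 78422,
                     Izmir = 74483, Sanliurfa = 66673, Diyarbakir = 65475)

# Cities in both top-6 lists, and cities that appear in only one of them
in_both <- intersect(names(top_destination), names(top_birth_place))
destination_only <- setdiff(names(top_destination), names(top_birth_place))
birth_place_only <- setdiff(names(top_birth_place), names(top_destination))

print(in_both)
print(destination_only)
print(birth_place_only)
```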

We have seen that the three largest cities of Turkey, Istanbul, Ankara and Izmir, play a major role both as destinations and as places of birth of migrants. But what is the net impact of migration on their population? If migration to these cities exceeds migration from them, we should expect their population to increase, and vice versa.
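Written out, the net impact computed in the next step is inflow minus outflow. Using notation introduced here only for illustration, let P_{t,d,b} be the number of people born in b who migrated to destination d in year t (the People column of clean_data). Migrated_to sums over every place of birth b, including abroad and unknown, while Migrated_from sums over every destination d. The second line is a worked check with the 2017 figures for Istanbul.

```latex
\begin{aligned}
\text{Impact}_{t,c} &= \underbrace{\sum_{b} P_{t,c,b}}_{\text{Migrated\_to}} - \underbrace{\sum_{d} P_{t,d,c}}_{\text{Migrated\_from}} \\
\text{Impact}_{2017,\,\text{Istanbul}} &= 416587 - 259999 = 156588
\end{aligned}
```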

# Migration into the three cities: sum over all places of birth
top3to <- clean_data %>%
  filter(Destination %in% c("Istanbul", "Ankara", "Izmir")) %>%
  group_by(Year, City = Destination) %>%
  summarize(Migrated_to = sum(People))

# Migrants born in the three cities: sum over all destinations
top3fr <- clean_data %>%
  filter(Birth_Place %in% c("Istanbul", "Ankara", "Izmir")) %>%
  group_by(Year, City = Birth_Place) %>%
  summarize(Migrated_from = sum(People))

# Net impact = inflow minus outflow, per city and year
top3 <- inner_join(top3to, top3fr, by = c("Year", "City")) %>%
  mutate(Impact = Migrated_to - Migrated_from)
print(top3)

## # A tibble: 12 x 5
## # Groups:   Year [4]
##    Year       City     Migrated_to Migrated_from Impact
##    <date>     <chr>          <int>         <int>  <int>
##  1 2014-01-01 Ankara        203621        123590  80031
##  2 2014-01-01 Istanbul      438998        227860 211138
##  3 2014-01-01 Izmir         124439         71524  52915
##  4 2015-01-01 Ankara        204048        122252  81796
##  5 2015-01-01 Istanbul      453407        239472 213935
##  6 2015-01-01 Izmir         126238         74313  51925
##  7 2016-01-01 Ankara        177166        119985  57181
##  8 2016-01-01 Istanbul      369582        249689 119893
##  9 2016-01-01 Izmir         122668         73603  49065
## 10 2017-01-01 Ankara        188100        121902  66198
## 11 2017-01-01 Istanbul      416587        259999 156588
## 12 2017-01-01 Izmir         127394         74483  52911

6- Findings

6.1- Population Growth in the Three Largest Cities

The graph below shows the net annual impact of migration on the three largest Turkish cities. Between 2014 and 2017, net migration increased the population of all three cities in every year.

2015 was the peak of migration-induced population growth for Istanbul and Ankara (Izmir peaked in 2014, by a very small margin), and 2016 was the lowest year for all three cities.

library(ggplot2)
library(scales)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:purrr':
## 
##     discard

## The following object is masked from 'package:readr':
## 
##     col_factor

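ggplot(data = top3, aes(x = Year, y = Impact, color = City)) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  # Thousands separators on the y axis (comma() comes from the scales package)
  scale_y_continuous(labels = comma) +
  ggtitle("Net Migration Impact") +
  theme(legend.position = "bottom")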
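The peak and lowest years can be checked directly against the Impact column of top3. The sketch below rebuilds that column from the printed table (values copied from the output, not recomputed from the raw file) and picks, for each city, the year with the largest and the smallest net impact.

```r
# Net impact by year and city, copied from the printed top3 table
top3_impact <- data.frame(
  Year = rep(2014:2017, each = 3),
  City = rep(c("Ankara", "Istanbul", "Izmir"), times = 4),
  Impact = c(80031, 211138, 52915,
             81796, 213935, 51925,
             57181, 119893, 49065,
             66198, 156588, 52911),
  stringsAsFactors = FALSE
)

# For each city, the year with the highest and the lowest net impact
by_city <- split(top3_impact, top3_impact$City)
peak_year <- sapply(by_city, function(d) d$Year[which.max(d$Impact)])
lowest_year <- sapply(by_city, function(d) d$Year[which.min(d$Impact)])

print(peak_year)
print(lowest_year)
```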

6.2- Net Migration Impact for All Cities as of 2017

As shown below, 26 of the 81 cities were net receivers in 2017, while the remaining 55 cities lost population due to migration. In other words, migration concentrated the population in about 32% of Turkish cities in 2017. The graph also shows that net receivers tend to be located in the western, more industrialized part of the country. Without further information, a reasonable hypothesis is that most of the migration in 2017 was driven by economic motives.

library(scales)

# Total inflow per city and year: the city as destination
inc <- clean_data %>%
  group_by(Year, City = Destination) %>%
  summarize(Migrated_to = sum(People))

# Total outflow per city and year: the city as place of birth
dec <- clean_data %>%
  group_by(Year, City = Birth_Place) %>%
  summarize(Migrated_from = sum(People))

# The inner join keeps only the 81 cities; the abroad and unknown birth places have no destination match
nets <- inner_join(inc, dec, by = c("Year", "City")) %>%
  # Replace any missing totals with 0
  mutate_if(is.numeric, funs(ifelse(is.na(.), 0, .))) %>%
  mutate(Impact = Migrated_to - Migrated_from) %>%
  # The data are still grouped by Year, so Pct is relative to that year's total inflow
  mutate(Pct = percent(Impact / sum(Migrated_to))) %>%
  arrange(desc(Impact))
print(nets, width = Inf)

## # A tibble: 324 x 6
## # Groups:   Year [4]
##    Year       City     Migrated_to Migrated_from Impact Pct  
##    <date>     <chr>          <dbl>         <dbl>  <dbl> <chr>
##  1 2015-01-01 Istanbul      453407        239472 213935 7.91%
##  2 2014-01-01 Istanbul      438998        227860 211138 7.93%
##  3 2017-01-01 Istanbul      416587        259999 156588 5.85%
##  4 2016-01-01 Istanbul      369582        249689 119893 4.59%
##  5 2015-01-01 Ankara        204048        122252  81796 3.02%
##  6 2014-01-01 Ankara        203621        123590  80031 3.01%
##  7 2017-01-01 Ankara        188100        121902  66198 2.47%
##  8 2015-01-01 Antalya        96441         34773  61668 2.28%
##  9 2014-01-01 Antalya        93057         32290  60767 2.28%
## 10 2016-01-01 Ankara        177166        119985  57181 2.19%
## # ... with 314 more rows
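The row count above (324, that is 81 cities times 4 years) follows from the inner join: the birth-place side also contains the abroad and unknown categories, which never appear as destinations and therefore have no match. A minimal base-R sketch with merge(), whose default is an inner join, shows this on toy 2017 rows. The Istanbul and Ankara values are copied from the tables above, the abroad figure (22972) is the 2017 total from the str() output, and the "Yurt disi\nAbroad" label is assumed to match the transliterated column name.

```r
year_2017 <- as.Date("2017-01-01")

inc_toy <- data.frame(
  Year = year_2017,
  City = c("Istanbul", "Ankara"),
  Migrated_to = c(416587, 188100),
  stringsAsFactors = FALSE
)
# The birth-place side also has a category that is never a destination
dec_toy <- data.frame(
  Year = year_2017,
  City = c("Istanbul", "Ankara", "Yurt disi\nAbroad"),
  Migrated_from = c(259999, 121902, 22972),
  stringsAsFactors = FALSE
)

# merge() keeps only keys present in both tables (inner join by default)
nets_toy <- merge(inc_toy, dec_toy, by = c("Year", "City"))
nets_toy$Impact <- nets_toy$Migrated_to - nets_toy$Migrated_from
print(nets_toy)
```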

nets %>%
  filter(Year == "2017-01-01") %>%
  ggplot(aes(x = reorder(City, Impact), y = Impact, fill = City)) +
  geom_col() +
  coord_flip() +
  labs(x = "City", y = "Impact") +
  ggtitle("Net Migration by Cities as of 2017") +
  # After coord_flip() the 81 city names sit on the vertical axis, so shrink axis.text.y
  theme(legend.position = "none", axis.text.y = element_text(size = 5))
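The count of net receivers quoted for 2017 can be derived from nets by counting the cities with a positive Impact in that year. The helper below (net_receiver_share is a hypothetical name) sketches that count in base R and is exercised on toy rows: the Istanbul and Ankara values are figures printed above, while "City A" and "City B" and their values are made up for illustration.

```r
# Hypothetical helper: count cities with positive net migration in a given year
net_receiver_share <- function(nets, year) {
  in_year <- nets[nets$Year == as.Date(sprintf("%d-01-01", year)), ]
  receivers <- sum(in_year$Impact > 0)
  c(receivers = receivers, cities = nrow(in_year), share = receivers / nrow(in_year))
}

# Toy rows; City A and City B are hypothetical
toy_nets <- data.frame(
  Year = as.Date(c("2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01", "2016-01-01")),
  City = c("Istanbul", "Ankara", "City A", "City B", "Istanbul"),
  Impact = c(156588, 66198, -1500, -800, 119893),
  stringsAsFactors = FALSE
)
print(net_receiver_share(toy_nets, 2017))

# With 26 net receivers out of 81 cities, the share rounds to 32%
print(round(100 * 26 / 81))
```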