We are going to use TUIK data to map domestic migration in Turkey. The original Excel file can be found at this link.
The data set we obtained from the TUIK website covers four years of migration within Turkey. The original Excel file contains 329 rows and 86 columns. Each row represents a destination city that people migrated to, and each column gives the distribution of the migrated population by place of birth. There are 81 distinct Turkish cities as destinations and 83 distinct place-of-birth columns: the same 81 cities plus two more for people born abroad or in unknown locations.
Human migration is the movement of people from one place to another with the intention of settling, permanently or temporarily, in a new location. The movement is often over long distances and from one country to another, but internal migration is also possible; indeed, this is the dominant form globally. If migration takes place within the borders of a country, it is called internal migration; if it crosses a country’s border, it is called external or international migration. Migration can occur for social, economic, political, cultural, and ethnic reasons.

In Turkey, internal migration began with the modernization of agriculture and with industrialization after the Second World War. Migration first occurred from villages to cities, then from small and medium-sized cities to large cities. In the 1990s a new pattern emerged: from cities to villages. Internal migration has caused social changes in both city and village settlements. Along with these changes a number of problems emerged, especially in cities, and these problems persist to date. This study analyzes internal migration from one city to another in Turkey between 2014 and 2017.
We need the tidyverse library at this point.
## -- Attaching packages -------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.8
## v tidyr 0.8.2 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ----------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
We will start by reading the raw data from the Excel file and removing some unnecessary rows that contain footnotes.
# The first three rows are skipped because of the Excel header layout; the trailing rows contain only footnotes
tmp1 <- tempfile(fileext = ".xls")
download.file("https://github.com/MEF-BDA503/gpj18-r_boys/blob/master/source_files/domestic_migration_inflow&outflow.xls?raw=true", mode = "wb", destfile = tmp1)
maindata <- readxl::read_excel(tmp1, skip = 3) %>%
  slice(-c(329:333))
## Warning: Mangling the following names: Adiyaman -> Adiyaman, Agri -> Agri, Aydin -> Aydin, Balikesir -> Balikesir, Çankiri -> Çankiri, Diyarbakir -> Diyarbakir, Elazig -> Elazig, Eskisehir -> Eskisehir, Gümüshane -> Gümüshane, Istanbul -> Istanbul, Izmir -> Izmir, Kirklareli -> Kirklareli, Kirsehir -> Kirsehir, Kahramanmaras -> Kahramanmaras, Mugla -> Mugla, Mus -> Mus, Nevsehir -> Nevsehir, Nigde -> Nigde, Tekirdag -> Tekirdag, Sanliurfa -> Sanliurfa, Usak -> Usak, Kirikkale -> Kirikkale, Sirnak -> Sirnak, Bartin -> Bartin, Igdir -> Igdir, Yurt disi
## Abroad -> Yurt disi
## Abroad. Use enc2native() to avoid the warning.
The first thing to do at this stage is to check the data structure as R understands it, in order to identify what needs to be done to get clean and tidy data.
## Classes 'tbl_df', 'tbl' and 'data.frame': 328 obs. of 86 variables:
## $ Yil : chr "2017" "2017" "2017" "2017" ...
## $ Alan : chr "Toplam-Total" "Adana" "Adiyaman" "Afyonkarahisar" ...
## $ Toplam : num 2684820 49509 18040 21453 15088 ...
## $ Adana : num 78422 21992 460 271 183 ...
## $ Adiyaman : num 34453 785 9448 100 115 ...
## $ Afyonkarahisar : num 28264 82 21 7301 112 ...
## $ Agri : num 40965 184 57 143 6201 ...
## $ Amasya : num 18889 61 12 68 63 ...
## $ Ankara : num 121902 827 208 943 427 ...
## $ Antalya : num 36961 344 93 590 99 ...
## $ Artvin : num 12760 35 4 23 18 ...
## $ Aydin : num 25531 92 44 281 118 ...
## $ Balikesir : num 34358 142 42 214 131 ...
## $ Bilecik : num 5080 9 3 41 11 14 238 128 6 68 ...
## $ Bingöl : num 15190 279 19 27 45 ...
## $ Bitlis : num 22844 397 41 54 139 ...
## $ Bolu : num 11036 45 9 49 35 ...
## $ Burdur : num 9149 45 10 169 29 ...
## $ Bursa : num 49327 163 74 396 131 ...
## $ Çanakkale : num 12448 40 7 61 53 ...
## $ Çankiri : num 15160 36 9 31 30 ...
## $ Çorum : num 35073 109 46 128 116 ...
## $ Denizli : num 24717 87 39 687 100 ...
## $ Diyarbakir : num 65475 1137 332 129 264 ...
## $ Edirne : num 14099 42 14 40 56 ...
## $ Elazig : num 27630 510 151 84 79 ...
## $ Erzincan : num 15579 53 16 28 57 ...
## $ Erzurum : num 52387 205 34 123 367 ...
## $ Eskisehir : num 24830 106 32 439 76 ...
## $ Gaziantep : num 50602 1081 958 210 124 ...
## $ Giresun : num 35988 50 32 58 69 ...
## $ Gümüshane : num 22786 21 7 31 31 ...
## $ Hakkari : num 13252 70 23 22 67 ...
## $ Hatay : num 53491 1661 250 191 160 ...
## $ Isparta : num 15828 76 21 515 56 ...
## $ Mersin : num 56079 2473 401 276 151 ...
## $ Istanbul : num 259999 1325 816 859 642 ...
## $ Izmir : num 74483 428 103 852 288 ...
## $ Kars : num 22714 48 24 91 114 ...
## $ Kastamonu : num 20273 56 18 34 26 ...
## $ Kayseri : num 43456 696 99 146 164 ...
## $ Kirklareli : num 10087 37 8 27 27 ...
## $ Kirsehir : num 15435 56 27 44 58 ...
## $ Kocaeli : num 33707 112 32 152 78 ...
## $ Konya : num 59738 494 106 1096 244 ...
## $ Kütahya : num 20652 51 14 395 77 ...
## $ Malatya : num 39688 605 787 95 101 ...
## $ Manisa : num 38248 153 37 486 141 ...
## $ Kahramanmaras : num 48577 1116 554 189 138 ...
## $ Mardin : num 45338 1302 134 94 112 ...
## $ Mugla : num 17591 110 27 198 49 ...
## $ Mus : num 29067 237 64 76 222 ...
## $ Nevsehir : num 13989 110 33 59 54 ...
## $ Nigde : num 18562 751 28 82 53 ...
## $ Ordu : num 52724 80 39 82 113 ...
## $ Rize : num 21043 48 13 44 42 ...
## $ Sakarya : num 22831 59 20 95 72 ...
## $ Samsun : num 61719 136 60 252 182 ...
## $ Siirt : num 19479 588 29 30 78 ...
## $ Sinop : num 15360 36 11 19 23 ...
## $ Sivas : num 41657 294 45 106 124 ...
## $ Tekirdag : num 16512 51 19 53 50 ...
## $ Tokat : num 50079 96 44 90 117 ...
## $ Trabzon : num 41060 101 31 80 170 ...
## $ Tunceli : num 7510 58 12 11 5 5 278 200 10 201 ...
## $ Sanliurfa : num 66673 1978 1086 276 163 ...
## $ Usak : num 11330 28 17 339 47 ...
## $ Van : num 55154 609 84 105 478 ...
## $ Yozgat : num 35518 113 31 131 98 ...
## $ Zonguldak : num 29971 62 22 112 58 ...
## $ Aksaray : num 16230 148 23 94 49 ...
## $ Bayburt : num 11300 11 8 15 22 10 723 90 14 36 ...
## $ Karaman : num 9960 69 18 85 17 10 458 774 11 128 ...
## $ Kirikkale : num 17452 64 13 73 88 ...
## $ Batman : num 26958 293 72 48 138 ...
## $ Sirnak : num 18436 330 41 19 62 ...
## $ Bartin : num 9312 16 12 27 16 ...
## $ Ardahan : num 11374 19 7 27 26 ...
## $ Igdir : num 11733 29 12 36 231 ...
## $ Yalova : num 4077 23 9 11 14 ...
## $ Karabük : num 11464 32 25 31 34 ...
## $ Kilis : num 7058 136 42 12 8 ...
## $ Osmaniye : num 27833 2243 199 111 107 ...
## $ Düzce : num 11054 24 8 28 29 ...
## $ Yurt disi
## Abroad : num 22972 332 45 124 53 ...
## $ Bilinmeyen
## Unknown: num 26828 477 115 189 73 ...
Since the file is structured in a wide (horizontal) format, it is better to transform it to a long (vertical) format by using the melt function from the reshape2 library.
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
# The first three columns remain as they are while the rest are melted
melted_data <- melt(maindata, id.vars = c("Yil", "Alan", "Toplam"))
# This will be the main structure
head(melted_data)
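The same wide-to-long reshaping can also be sketched with `tidyr::pivot_longer()`. This is an alternative to `melt`, not the code used in this analysis; it assumes tidyr 1.0.0 or later (newer than the 0.8.2 attached above), and the small `wide` table with its values is made up for illustration.

```r
library(tidyr)

# Hypothetical miniature of the wide table: one row per destination,
# one column per place of birth (values are made up)
wide <- data.frame(
  Yil    = c("2017", "2017"),
  Alan   = c("Adana", "Ankara"),
  Toplam = c(30, 70),
  Adana  = c(10, 20),
  Ankara = c(20, 50)
)

# Keep the three id columns and stack every other column into name/value pairs
long <- pivot_longer(
  wide,
  cols      = -c(Yil, Alan, Toplam),
  names_to  = "variable",
  values_to = "value"
)

print(long)
```

Unlike `melt`, `pivot_longer()` orders the result row by row rather than column by column, so an explicit `arrange()` is needed if the order matters.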
The current table contains additional information that we will not use in our analysis. Thus, it is better to remove it at this point, along with some renaming and formatting.
clean_data <- melted_data %>%
  # We do not need the total rows, which can be recalculated easily if necessary
  filter(Alan != "Toplam-Total") %>%
  # We need only the period, the departure (birth place) and destination cities, and the number of people who migrated
  select(Yil, Alan, variable, value) %>%
  arrange(Yil, Alan, variable, value)
# To keep the language uniform, assign English names to the columns
colnames(clean_data) <- c("Year", "Destination", "Birth_Place", "People")
clean_data$Year <- as.integer(clean_data$Year)
suppressWarnings(clean_data$People <- as.integer(clean_data$People))
clean_data$Birth_Place <- as.character(clean_data$Birth_Place)
clean_data$Destination <- enc2native(clean_data$Destination)
clean_data$Birth_Place <- enc2native(clean_data$Birth_Place)
# This is what our data set now looks like
head(clean_data)
The Year column is currently in integer format, as seen above. However, for further analysis with ggplot or other packages it is better to transform it to date format with lubridate.
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
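The conversion code itself is not shown here. As a minimal sketch, assuming each year should be represented by its first day (which is consistent with filters such as `Year == "2017-01-01"` used later), an integer year can be turned into a `Date` with base R; `lubridate::make_date(year)` should give the same result. The `years` vector is a stand-in for `clean_data$Year`.

```r
# Stand-in for clean_data$Year after as.integer()
years <- c(2014L, 2015L, 2016L, 2017L)

# Represent each year by 1 January of that year
year_dates <- as.Date(sprintf("%d-01-01", years))

print(year_dates)
class(year_dates)

# A Date vector can be compared with an ISO date string directly
year_dates == "2017-01-01"
```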
As the last data cleaning step, we will convert Turkish characters to their English equivalents by using the function below.
mgsub <- function(pattern, replacement, x, ...) {
  n <- length(pattern)
  if (n != length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result <- x
  # Apply each substitution in turn
  for (i in seq_len(n)) {
    result <- gsub(pattern[i], replacement[i], result)
  }
  return(result)
}
tr_to_en <- function(datafile) {
  # Turkish-specific letters and their ASCII counterparts, position by position
  turkish_letters <- c("Ç", "Ş", "Ğ", "İ", "Ü", "Ö", "ç", "ş", "ğ", "ı", "ü", "ö")
  english_letters <- c("C", "S", "G", "I", "U", "O", "c", "s", "g", "i", "u", "o")
  datafile <- mgsub(turkish_letters, english_letters, datafile)
  return(datafile)
}
clean_data$Birth_Place <- tr_to_en(clean_data$Birth_Place)
clean_data$Destination <- tr_to_en(clean_data$Destination)
head(clean_data)
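For single-character substitutions like these, base R's `chartr()` can do the whole mapping in one call. The following is a sketch of an alternative, not the function used above; the helper name `tr_to_ascii` is hypothetical, and the letters are written as Unicode escapes so the script does not depend on the file encoding.

```r
# Hypothetical helper: map Turkish-specific letters to ASCII in one pass
tr_to_ascii <- function(x) {
  # The letters Ç Ş Ğ İ Ü Ö ç ş ğ ı ü ö, written as Unicode escapes
  turkish <- "\u00c7\u015e\u011e\u0130\u00dc\u00d6\u00e7\u015f\u011f\u0131\u00fc\u00f6"
  ascii   <- "CSGIUOcsgiuo"
  chartr(turkish, ascii, x)
}

# Three city names containing Turkish-specific letters
converted <- tr_to_ascii(c("\u015eanl\u0131urfa", "I\u011fd\u0131r", "\u00c7ank\u0131r\u0131"))
print(converted)
```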
Saving the result as an RDS file would allow us to work on the same data later, if needed, without repeating the steps above.
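A minimal sketch of that round trip with base R's `saveRDS()` and `readRDS()` follows. The small data frame stands in for `clean_data` with made-up values, and a temporary file is used here; in practice a project path of your choosing would replace it.

```r
# Stand-in for clean_data (values are made up)
example_data <- data.frame(
  Year        = as.Date(c("2017-01-01", "2017-01-01")),
  Destination = c("Adana", "Ankara"),
  Birth_Place = c("Ankara", "Adana"),
  People      = c(10L, 20L),
  stringsAsFactors = FALSE
)

# Write the object to disk, then read it back as a later session would
rds_path <- tempfile(fileext = ".rds")
saveRDS(example_data, rds_path)
restored <- readRDS(rds_path)

# The restored object keeps its column types, including the Date column
identical(example_data, restored)
```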
Let’s first list the top 6 cities preferred as a “province of residence” and the top 6 cities that the migrated population has as a “place of birth”, so that we can compare how the two relate to each other.
Top 6 Cities Preferred for Migration
clean_data %>%
  filter(Year == "2017-01-01") %>%
  group_by(Destination) %>%
  summarize(sumofpeople = sum(People)) %>%
  top_n(n = 6, wt = sumofpeople) %>%
  arrange(desc(sumofpeople)) %>%
  print(width = Inf)
## # A tibble: 6 x 2
## Destination sumofpeople
## <chr> <int>
## 1 Istanbul 416587
## 2 Ankara 188100
## 3 Izmir 127394
## 4 Kocaeli 87796
## 5 Antalya 87232
## 6 Bursa 86119
Top 6 Cities That Migrated People Have as a “Place of Birth”
clean_data %>%
  filter(Year == "2017-01-01") %>%
  group_by(Birth_Place) %>%
  summarize(sumofpeople = sum(People)) %>%
  top_n(n = 6, wt = sumofpeople) %>%
  arrange(desc(sumofpeople)) %>%
  print(width = Inf)
## # A tibble: 6 x 2
## Birth_Place sumofpeople
## <chr> <int>
## 1 Istanbul 259999
## 2 Ankara 121902
## 3 Adana 78422
## 4 Izmir 74483
## 5 Sanliurfa 66673
## 6 Diyarbakir 65475
As the two tables show, the top 6 rankings differ apart from Istanbul, Ankara and Izmir. We can conclude that Kocaeli, Antalya and Bursa are more likely to be chosen as destinations than Adana, Sanliurfa and Diyarbakir, since people born in the latter cities preferred to migrate to another city rather than stay in their hometowns. Job opportunities and the density of industrial activity are possible drivers of this.
We concluded that the three biggest cities of Turkey (Istanbul, Ankara and Izmir) play a major role both as destinations and as places people migrate from. But what is the net impact of migration on their population? If migration to these cities is higher than migration from them, we should expect an increase in population, and vice versa.
top3to <- clean_data %>%
  filter(Destination %in% c("Istanbul", "Ankara", "Izmir")) %>%
  group_by(Year, Destination) %>%
  summarize(Migrated_to = sum(People))
colnames(top3to) <- c("Year", "City", "Migrated_to")
top3fr <- clean_data %>%
  filter(Birth_Place %in% c("Istanbul", "Ankara", "Izmir")) %>%
  group_by(Year, Birth_Place) %>%
  summarize(Migrated_from = sum(People))
colnames(top3fr) <- c("Year", "City", "Migrated_from")
top3 <- inner_join(top3to, top3fr, by = c("Year", "City")) %>%
  mutate(Impact = Migrated_to - Migrated_from)
print(top3)
## # A tibble: 12 x 5
## # Groups: Year [4]
## Year City Migrated_to Migrated_from Impact
## <date> <chr> <int> <int> <int>
## 1 2014-01-01 Ankara 203621 123590 80031
## 2 2014-01-01 Istanbul 438998 227860 211138
## 3 2014-01-01 Izmir 124439 71524 52915
## 4 2015-01-01 Ankara 204048 122252 81796
## 5 2015-01-01 Istanbul 453407 239472 213935
## 6 2015-01-01 Izmir 126238 74313 51925
## 7 2016-01-01 Ankara 177166 119985 57181
## 8 2016-01-01 Istanbul 369582 249689 119893
## 9 2016-01-01 Izmir 122668 73603 49065
## 10 2017-01-01 Ankara 188100 121902 66198
## 11 2017-01-01 Istanbul 416587 259999 156588
## 12 2017-01-01 Izmir 127394 74483 52911
The graph below represents the net annual impact of migration on the three biggest Turkish cities. As shown, between 2014 and 2017 net migration increased the population of these cities in every year.
For Istanbul and Ankara, 2015 was the peak of migration-induced population growth, while for Izmir the peak was 2014 by a small margin (52915 versus 51925 in 2015); 2016 was the lowest year for all three cities.
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
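The plotting code for this graph is not shown. One way to draw it, as a sketch, is a line chart of the `Impact` column from the table printed above; the values below are copied from that table, while the chart type, labels, and styling are assumptions.

```r
library(ggplot2)

# Net impact per city and year, copied from the printed top3 table
top3_impact <- data.frame(
  Year   = rep(2014:2017, each = 3),
  City   = rep(c("Ankara", "Istanbul", "Izmir"), times = 4),
  Impact = c(80031, 211138, 52915,
             81796, 213935, 51925,
             57181, 119893, 49065,
             66198, 156588, 52911)
)

# One line per city; scales::comma formats the y axis with thousands separators
p <- ggplot(top3_impact, aes(x = Year, y = Impact, colour = City)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = scales::comma) +
  labs(x = "Year", y = "Net migration (people)",
       title = "Net Migration Impact on the Three Biggest Cities")

print(p)
```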
As shown below, 26 of the 81 cities are net receivers, while the remaining 55 cities lost people due to migration. From this perspective we can say that, through migration, population concentrated in 32% of Turkish cities in 2017. The graph also shows that net-receiver cities tend to be concentrated in the western and more industrialized part of the country. Without any other information, we can assume that the majority of migration in 2017 can be explained by economic motives.
library(scales)
inc <- clean_data %>%
  group_by(Year, Destination) %>%
  summarize(Migrated_to = sum(People))
colnames(inc) <- c("Year", "City", "Migrated_to")
dec <- clean_data %>%
  group_by(Year, Birth_Place) %>%
  summarize(Migrated_from = sum(People))
colnames(dec) <- c("Year", "City", "Migrated_from")
# print(dec)
nets <- inner_join(inc, dec, by = c("Year", "City")) %>%
  mutate_if(is.numeric, funs(ifelse(is.na(.), 0, .))) %>%
  mutate(Impact = Migrated_to - Migrated_from) %>%
  mutate(Pct = percent(Impact / sum(Migrated_to))) %>%
  arrange(desc(Impact))
print(nets, width = Inf)
## # A tibble: 324 x 6
## # Groups: Year [4]
## Year City Migrated_to Migrated_from Impact Pct
## <date> <chr> <dbl> <dbl> <dbl> <chr>
## 1 2015-01-01 Istanbul 453407 239472 213935 7.91%
## 2 2014-01-01 Istanbul 438998 227860 211138 7.93%
## 3 2017-01-01 Istanbul 416587 259999 156588 5.85%
## 4 2016-01-01 Istanbul 369582 249689 119893 4.59%
## 5 2015-01-01 Ankara 204048 122252 81796 3.02%
## 6 2014-01-01 Ankara 203621 123590 80031 3.01%
## 7 2017-01-01 Ankara 188100 121902 66198 2.47%
## 8 2015-01-01 Antalya 96441 34773 61668 2.28%
## 9 2014-01-01 Antalya 93057 32290 60767 2.28%
## 10 2016-01-01 Ankara 177166 119985 57181 2.19%
## # ... with 314 more rows
nets %>%
  filter(Year == "2017-01-01") %>%
  ggplot(aes(x = reorder(City, Impact), y = Impact, fill = City)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = "City", y = "Impact") +
  ggtitle("Net Migration by Cities as of 2017") +
  theme(legend.position = "none", axis.text.x = element_text(angle = 0.0, vjust = 0.0, hjust = 0.0, size = 1))
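The net-receiver count and the 32% share quoted earlier can be derived from a table shaped like `nets`. The sketch below uses a made-up four-city table rather than the real data, so only the method carries over.

```r
# Made-up stand-in for the 2017 rows of nets
nets_2017 <- data.frame(
  City          = c("A", "B", "C", "D"),
  Migrated_to   = c(500, 300, 120, 80),
  Migrated_from = c(200, 350, 150, 60)
)
nets_2017$Impact <- nets_2017$Migrated_to - nets_2017$Migrated_from

# A city is a net receiver when more people arrive than leave
n_receivers <- sum(nets_2017$Impact > 0)
share       <- n_receivers / nrow(nets_2017)

n_receivers
share

# With the figures quoted in the text: 26 net receivers out of 81 cities
round(100 * 26 / 81)
```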