We worked on trade statistics data of Turkey against the G7 countries and 3 more countries (Russia, India, China) both for export and import.Our aim is to analyze Turkey’s trade characteristics by product groups. Within this aim, we take trade values and quantities by product groups in MS Excel file format from trademap.org website for each trade partner country for the period of 2013 - 2017. In this study, We applied data manipulation to make it proper for analysis and explanatory data analysis with visulations and conclude with overall statistical overview of Turkey’s trade.
Following the extraction from source (Trademap.org), our raw data consist of 10 countries’ trade data against Turkey. This data splits to 2 with trade direction as “Export” and “Import”. The other dimension is stat for trade, one is “Quantity” of Export/Import, the other is “Value” in billion USD of Export/Import. Therefore we have four main data frame for each country (total of 40 excel files).
Data Frame | DF Definition |
---|---|
ExportValues | Export from Turkey to target countries and their values in billion USD |
ExportQuantity | Export from Turkey to target countries and their quantities in units(tons,gallons etc) |
ImportValues | Import to Turkey from target counties and their values in billion USD |
ImportQuantity | Import to Turkey from target counties and their quantities in units(tons,gallons etc) |
Table structures for “Quantity” and “Value” files are as follows:
Country | Product code | Product label | Quantity in 2013 | Unit | Quantity in 2014 | Unit | Quantity in 2015 | Unit | Quantity in 2016 | Unit | Quantity in 2017 | Unit |
---|---|---|---|---|---|---|---|---|---|---|---|---|
France | ’0306 | Crustaceans, whether in shell or not, live, fresh, chilled, frozen, dried, salted or in brine, … | 121 | Tons | 139 | Tons | 22 | Tons | 24 | Tons |
Table 1: Quantity excel table structure
Country | Product code | Product label | Value in 2013 | Value in 2014 | Value in 2015 | Value in 2016 | Value in 2017 |
---|---|---|---|---|---|---|---|
France | ’0306 | Crustaceans, whether in shell or not, live, fresh, chilled, frozen, dried, salted or in brine, … | 0 | 1155 | 1182 | 270 | 0 |
Table 2: Values excel table structure
In addition to main raw data, we also have look-up table for product codes; This lookup table registers product labels in 2 digit code that is less categorized than main data.
Product code | Product label |
---|---|
’01 | Live animals |
’02 | Meat and edible meat offal |
’03 | Fish and crustaceans, molluscs and other aquatic invertebrates |
’04 | Dairy produce; birds’ eggs; natural honey; edible products of animal origin, not elsewhere … |
Table 3: Product look-up excel table
Regarding the variables included in these tables; definitions are as follows:
ExportQuantity & ImportQuantity | |
---|---|
Country | Trade Partner of Turkey |
Product_Code | Traded Product Code in 4 digit |
Product_Label | Product Definition |
Unit | Metric of Quantity |
Quantity_2013 | Export or Import quantity in 2013 |
Quantity_2014 | Export or Import quantity in 2014 |
Quantity_2015 | Export or Import quantity in 2015 |
Quantity_2016 | Export or Import quantity in 2016 |
Quantity_2017 | Export or Import quantity in 2017 |
ExportValues & ImportValues | |
---|---|
Country | Trade Partner of Turkey |
Product_Code | Traded Product Code in 4 digit |
Product_Label | Product Definition |
Value_2013 | Export or Import value in 2013 |
Value_2014 | Export or Import value in 2014 |
Value_2015 | Export or Import value in 2015 |
Value_2016 | Export or Import value in 2016 |
Value_2017 | Export or Import value in 2017 |
In order to read these excel files, firstly we create a function named “getDataSet”:
PATH <-"C:/Users/emrek/Google Drive/BDA/503-EssentialsOfDataAnalytics/GitHub/gpj18-group_four/Raw Data"
setwd(PATH)
getDataSet <- function(pPattern){
nm <- list.files(path = PATH, pattern = pPattern)
if (exists("resultDataset")){
rm("resultDataset")
}
for (file in nm){
if (!exists("resultDataset")){
resultDataset <- read_excel(file, col_names=FALSE, skip=1)
}
else{
temp_dataset <-read_excel(file, col_names=FALSE, skip=1)
resultDataset<-rbind(resultDataset, temp_dataset)
rm(temp_dataset)
}
}
return(resultDataset)
}
import_quantity_dataset <- getDataSet('*.mport.*uant.*')
import_value_dataset <- getDataSet('*.mport.*alu.*')
export_quantity_dataset <- getDataSet('*.xport.*uant.*')
export_value_dataset <- getDataSet('*.xport.*alu.*')
colnames(import_quantity_dataset) <- c("Country","Product_Code","Product_Label","Quantity_2013","Unit1","Quantity_2014","Unit2","Quantity_2015","Unit3","Quantity_2016","Unit4","Quantity_2017","Unit5")
colnames(import_value_dataset) <- c("Country","Product_Code","Product_Label","Value_2013","Value_2014","Value_2015","Value_2016","Value_2017")
colnames(export_quantity_dataset) <- c("Country","Product_Code","Product_Label","Quantity_2013","Unit1","Quantity_2014","Unit2","Quantity_2015","Unit3","Quantity_2016","Unit4","Quantity_2017","Unit5")
colnames(export_value_dataset) <- c("Country","Product_Code","Product_Label","Value_2013","Value_2014","Value_2015","Value_2016","Value_2017")
import_quantity_dataset$Product_Code<- gsub("'", "", import_quantity_dataset$Product_Code)
import_value_dataset$Product_Code<- gsub("'", "",import_value_dataset$Product_Code)
export_quantity_dataset$Product_Code<- gsub("'", "",export_quantity_dataset$Product_Code)
export_value_dataset$Product_Code<- gsub("'", "",export_value_dataset$Product_Code)
And by using the getDataSet function, we merged the Turkey’s export-quantity, export-value, import-quantity, import-value tables for 10 selected countries.
glimpse(export_quantity_dataset)
## Observations: 12,223
## Variables: 13
## $ Country <chr> "Canada", "Canada", "Canada", "Canada", "Canada"...
## $ Product_Code <chr> "TOTAL", "7214", "6907", "2523", "7208", "6802",...
## $ Product_Label <chr> "All products", "Bars and rods, of iron or non-a...
## $ Quantity_2013 <dbl> NA, 131711, 1795, 22000, 220, 46950, 19782, 2821...
## $ Unit1 <chr> "No quantity", "Tons", "Tons", "Tons", "Tons", "...
## $ Quantity_2014 <dbl> NA, 133195, 2560, 54210, 157114, 49668, 38396, 1...
## $ Unit2 <chr> "No quantity", "Tons", "Tons", "Tons", "Tons", "...
## $ Quantity_2015 <dbl> NA, 1085, 2291, 78195, 10756, 38561, 29677, 4219...
## $ Unit3 <chr> "No quantity", "Tons", "Tons", "Tons", "Tons", "...
## $ Quantity_2016 <dbl> NA, 1759, 1935, 62500, 38354, 33166, 19759, 2239...
## $ Unit4 <chr> "No quantity", "Tons", "Tons", "Tons", "Tons", "...
## $ Quantity_2017 <dbl> 0, 220831, 140214, 129930, 80459, 35413, 34646, ...
## $ Unit5 <chr> "No quantity", "Tons", "Tons", "Tons", "Tons", "...
glimpse(export_value_dataset)
## Observations: 12,222
## Variables: 8
## $ Country <chr> "Canada", "Canada", "Canada", "Canada", "Canada"...
## $ Product_Code <chr> "7214", "8703", "8901", "0802", "8902", "7208", ...
## $ Product_Label <chr> "Bars and rods, of iron or non-alloy steel, not ...
## $ Value_2013 <dbl> 73607, 13976, 0, 57640, 0, 211, 895, 40420, 1052...
## $ Value_2014 <dbl> 73814, 2485, 0, 65460, 0, 96558, 1227, 46362, 11...
## $ Value_2015 <dbl> 616, 2477, 62, 105122, 0, 5033, 1052, 37154, 129...
## $ Value_2016 <dbl> 863, 94, 66747, 94623, 0, 18777, 832, 30918, 171...
## $ Value_2017 <dbl> 106068, 87274, 84922, 75731, 45504, 44261, 40668...
glimpse(import_quantity_dataset)
## Observations: 12,222
## Variables: 13
## $ Country <chr> "Canada", "Canada", "Canada", "Canada", "Canada"...
## $ Product_Code <chr> "2601", "7204", "2701", "4401", "0713", "1001", ...
## $ Product_Label <chr> "Iron ores and concentrates, incl. roasted iron ...
## $ Quantity_2013 <dbl> 465332, 373154, 338707, 313714, 194731, 230375, ...
## $ Unit1 <chr> "Tons", "Tons", "Tons", "Tons", "Tons", "Tons", ...
## $ Quantity_2014 <dbl> 162004, 277774, 491793, 126757, 298731, 89640, N...
## $ Unit2 <chr> "Tons", "Tons", "Tons", "Tons", "Tons", "Tons", ...
## $ Quantity_2015 <dbl> 494577, 317117, 504464, 297655, 304681, 161211, ...
## $ Unit3 <chr> "Tons", "Tons", "Tons", "Tons", "Tons", "Tons", ...
## $ Quantity_2016 <dbl> 504892, 372166, 1344338, 308932, 322942, 196538,...
## $ Unit4 <chr> "Tons", "Tons", "Tons", "Tons", "Tons", "Tons", ...
## $ Quantity_2017 <dbl> 675057, 526312, 485643, 241778, 235982, 145438, ...
## $ Unit5 <chr> "Tons", "Tons", "Tons", "Tons", "Tons", "Tons", ...
glimpse(import_value_dataset)
## Observations: 12,223
## Variables: 8
## $ Country <chr> "Canada", "Canada", "Canada", "Canada", "Canada"...
## $ Product_Code <chr> "7108", "0713", "7204", "2601", "2701", "2710", ...
## $ Product_Label <chr> "Gold, incl. gold plated with platinum, unwrough...
## $ Value_2013 <dbl> 230689, 132022, 147798, 77901, 62688, 6565, 1065...
## $ Value_2014 <dbl> 8259, 204972, 104914, 25982, 70544, 6244, 169636...
## $ Value_2015 <dbl> 760, 232255, 77607, 43341, 55067, 6183, 47389, 2...
## $ Value_2016 <dbl> 48488, 265707, 84380, 38810, 135545, 7990, 24360...
## $ Value_2017 <dbl> 1030428, 164565, 157861, 78121, 73275, 70084, 63...
We have 13 and 8 variables in quantity and value tables respectively both for export and import.
Now, it is time to join quantity and value data for both export and import files. While joining the tables, we should also clear the repetitive variable “Unit” for each year.
Also, we need to join product labels in 2 digit product codes to 4 digit product codes and labels.
#read the ProductCodes excel
file = "C:/Users/emrek/Google Drive/BDA/503-EssentialsOfDataAnalytics/GitHub/gpj18-group_four/2digitproductlist.xlsx"
prdCode <- read_excel(file, col_names=FALSE, skip=1)
prdCode$X__1 <- gsub("'", "", prdCode$X__1)
# join the import values and quantities
import_jTable <- import_quantity_dataset %>%
mutate( Unit= coalesce(Unit1,Unit2 ,Unit3 , Unit4,Unit5)) %>%
select( Country, Product_Code , Unit, Quantity_2013, Quantity_2014, Quantity_2015, Quantity_2016, Quantity_2017) %>%
full_join(import_value_dataset,by = c("Country","Product_Code"))
# seperate the first 2 digits of 4 digit product code and join the 2digit product code excel
import_jTable <- import_jTable %>% mutate(PcodeTwo=substr(Product_Code,1,2))
import_jTable <- left_join(import_jTable,prdCode,by =c("PcodeTwo"="X__1") )
# join the export values and quantities
export_jTable <- export_quantity_dataset %>%
mutate( Unit= coalesce(Unit1,Unit2 ,Unit3 , Unit4,Unit5)) %>%
select( Country, Product_Code , Unit, Quantity_2013, Quantity_2014,Quantity_2015,Quantity_2016,Quantity_2017) %>%
full_join(export_value_dataset,by = c("Country","Product_Code"))
# seperate the first 2 digits of 4 digit product code and join the 2digit product code excel
export_jTable <- export_jTable %>% mutate(PcodeTwo=substr(Product_Code,1,2))
export_jTable <- left_join(export_jTable,prdCode,by =c("PcodeTwo"="X__1") )
Now we have joined tables for import and export stats of Turkey.
glimpse(import_jTable)
## Observations: 12,223
## Variables: 16
## $ Country <chr> "Canada", "Canada", "Canada", "Canada", "Canada"...
## $ Product_Code <chr> "2601", "7204", "2701", "4401", "0713", "1001", ...
## $ Unit <chr> "Tons", "Tons", "Tons", "Tons", "Tons", "Tons", ...
## $ Quantity_2013 <dbl> 465332, 373154, 338707, 313714, 194731, 230375, ...
## $ Quantity_2014 <dbl> 162004, 277774, 491793, 126757, 298731, 89640, N...
## $ Quantity_2015 <dbl> 494577, 317117, 504464, 297655, 304681, 161211, ...
## $ Quantity_2016 <dbl> 504892, 372166, 1344338, 308932, 322942, 196538,...
## $ Quantity_2017 <dbl> 675057, 526312, 485643, 241778, 235982, 145438, ...
## $ Product_Label <chr> "Iron ores and concentrates, incl. roasted iron ...
## $ Value_2013 <dbl> 77901, 147798, 62688, 33992, 132022, 93218, 0, 6...
## $ Value_2014 <dbl> 25982, 104914, 70544, 15532, 204972, 31688, 0, 6...
## $ Value_2015 <dbl> 43341, 77607, 55067, 32940, 232255, 47770, 0, 61...
## $ Value_2016 <dbl> 38810, 84380, 135545, 31159, 265707, 54832, 0, 7...
## $ Value_2017 <dbl> 78121, 157861, 73275, 23684, 164565, 34618, 2364...
## $ PcodeTwo <chr> "26", "72", "27", "44", "07", "10", "27", "27", ...
## $ X__2 <chr> "Ores, slag and ash", "Iron and steel", "Mineral...
glimpse(export_jTable)
## Observations: 12,223
## Variables: 16
## $ Country <chr> "Canada", "Canada", "Canada", "Canada", "Canada"...
## $ Product_Code <chr> "TOTAL", "7214", "6907", "2523", "7208", "6802",...
## $ Unit <chr> "No quantity", "Tons", "Tons", "Tons", "Tons", "...
## $ Quantity_2013 <dbl> NA, 131711, 1795, 22000, 220, 46950, 19782, 2821...
## $ Quantity_2014 <dbl> NA, 133195, 2560, 54210, 157114, 49668, 38396, 1...
## $ Quantity_2015 <dbl> NA, 1085, 2291, 78195, 10756, 38561, 29677, 4219...
## $ Quantity_2016 <dbl> NA, 1759, 1935, 62500, 38354, 33166, 19759, 2239...
## $ Quantity_2017 <dbl> 0, 220831, 140214, 129930, 80459, 35413, 34646, ...
## $ Product_Label <chr> NA, "Bars and rods, of iron or non-alloy steel, ...
## $ Value_2013 <dbl> NA, 73607, 895, 1531, 211, 40420, 16625, 37956, ...
## $ Value_2014 <dbl> NA, 73814, 1227, 3961, 96558, 46362, 32160, 1484...
## $ Value_2015 <dbl> NA, 616, 1052, 5810, 5033, 37154, 23569, 6140, 0...
## $ Value_2016 <dbl> NA, 863, 832, 4590, 18777, 30918, 12319, 24849, ...
## $ Value_2017 <dbl> NA, 106068, 40668, 8801, 44261, 30766, 27136, 27...
## $ PcodeTwo <chr> "TO", "72", "69", "25", "72", "68", "73", "73", ...
## $ X__2 <chr> NA, "Iron and steel", "Ceramic products", "Salt;...
Both tables now have 16 variables and 12,223 rows.
But there is a row in Product_Code column as “TOTAL”, we must clear it.
export_jTable <- export_jTable %>% filter(!grepl("TOTAL", Product_Code))
import_jTable <- import_jTable %>% filter(!grepl("TOTAL", Product_Code))
We have 16 variables in these tables however quantity and value variables repeat for each year. So actually, it is better to organize the table that it will have variables “Year”, “Quantity”, “Value” instead of 10 columns.
#create new objects for each years' quantity
q2013<-import_jTable %>% select(Country, PcodeTwo, X__2 ,Product_Code, Product_Label, Unit, Quantity_2013) %>% mutate(year="2013") %>% rename("Quantity"=Quantity_2013)
q2014<-import_jTable %>% select(Country, PcodeTwo, X__2 ,Product_Code, Product_Label, Unit, Quantity_2014) %>% mutate(year="2014") %>% rename("Quantity"=Quantity_2014)
q2015<-import_jTable %>% select(Country, PcodeTwo, X__2 ,Product_Code, Product_Label, Unit, Quantity_2015) %>% mutate(year="2015") %>% rename("Quantity"=Quantity_2015)
q2016<-import_jTable %>% select(Country, PcodeTwo, X__2 ,Product_Code, Product_Label, Unit, Quantity_2016) %>% mutate(year="2016") %>% rename("Quantity"=Quantity_2016)
q2017<-import_jTable %>% select(Country, PcodeTwo, X__2 ,Product_Code, Product_Label, Unit, Quantity_2017) %>% mutate(year="2017") %>% rename("Quantity"=Quantity_2017)
#bind the objects
importQuant <- bind_rows(q2013,q2014,q2015,q2016,q2017)
#create new objects for each years' values
v2013<-import_jTable %>% select(Country,Product_Code, Value_2013) %>% rename("Values"=Value_2013) %>% mutate(year="2013")
v2014<-import_jTable %>% select(Country,Product_Code, Value_2014) %>% rename("Values"=Value_2014) %>% mutate(year="2014")
v2015<-import_jTable %>% select(Country,Product_Code, Value_2015) %>% rename("Values"=Value_2015) %>% mutate(year="2015")
v2016<-import_jTable %>% select(Country,Product_Code, Value_2016) %>% rename("Values"=Value_2016) %>% mutate(year="2016")
v2017<-import_jTable %>% select(Country,Product_Code, Value_2017) %>% rename("Values"=Value_2017) %>% mutate(year="2017")
#bind the objects
importValues <- bind_rows(v2013,v2014,v2015,v2016,v2017)
#join binded quantity and value
import_jTableMELTED <- importQuant %>% left_join(importValues, by=c("Country","Product_Code","year"))
#Repeat with export table
q2013<-export_jTable %>% select(Country, PcodeTwo, X__2 ,Product_Code, Product_Label, Unit, Quantity_2013) %>% mutate(year="2013") %>% rename("Quantity"=Quantity_2013)
q2014<-export_jTable %>% select(Country, PcodeTwo, X__2 ,Product_Code, Product_Label, Unit, Quantity_2014) %>% mutate(year="2014") %>% rename("Quantity"=Quantity_2014)
q2015<-export_jTable %>% select(Country, PcodeTwo, X__2 ,Product_Code, Product_Label, Unit, Quantity_2015) %>% mutate(year="2015") %>% rename("Quantity"=Quantity_2015)
q2016<-export_jTable %>% select(Country, PcodeTwo, X__2 ,Product_Code, Product_Label, Unit, Quantity_2016) %>% mutate(year="2016") %>% rename("Quantity"=Quantity_2016)
q2017<-export_jTable %>% select(Country, PcodeTwo, X__2 ,Product_Code, Product_Label, Unit, Quantity_2017) %>% mutate(year="2017") %>% rename("Quantity"=Quantity_2017)
exportQuant <- bind_rows(q2013,q2014,q2015,q2016,q2017)
v2013<-export_jTable %>% select(Country,Product_Code, Value_2013) %>% rename("Values"=Value_2013) %>% mutate(year="2013")
v2014<-export_jTable %>% select(Country,Product_Code, Value_2014) %>% rename("Values"=Value_2014) %>% mutate(year="2014")
v2015<-export_jTable %>% select(Country,Product_Code, Value_2015) %>% rename("Values"=Value_2015) %>% mutate(year="2015")
v2016<-export_jTable %>% select(Country,Product_Code, Value_2016) %>% rename("Values"=Value_2016) %>% mutate(year="2016")
v2017<-export_jTable %>% select(Country,Product_Code, Value_2017) %>% rename("Values"=Value_2017) %>% mutate(year="2017")
exportValues <- bind_rows(v2013,v2014,v2015,v2016,v2017)
export_jTableMELTED <- exportQuant %>% left_join(exportValues, by=c("Country","Product_Code","year"))
Now, we have more useful tables, with 9 variables and 61100 rows.
glimpse(export_jTableMELTED)
## Observations: 61,100
## Variables: 9
## $ Country <chr> "Canada", "Canada", "Canada", "Canada", "Canada"...
## $ PcodeTwo <chr> "72", "69", "25", "72", "68", "73", "73", "26", ...
## $ X__2 <chr> "Iron and steel", "Ceramic products", "Salt; sul...
## $ Product_Code <chr> "7214", "6907", "2523", "7208", "6802", "7306", ...
## $ Product_Label <chr> "Bars and rods, of iron or non-alloy steel, not ...
## $ Unit <chr> "Tons", "Tons", "Tons", "Tons", "Tons", "Tons", ...
## $ Quantity <dbl> 131711, 1795, 22000, 220, 46950, 19782, 28212, N...
## $ year <chr> "2013", "2013", "2013", "2013", "2013", "2013", ...
## $ Values <dbl> 73607, 895, 1531, 211, 40420, 16625, 37956, 0, 0...
glimpse(import_jTableMELTED)
## Observations: 61,100
## Variables: 9
## $ Country <chr> "Canada", "Canada", "Canada", "Canada", "Canada"...
## $ PcodeTwo <chr> "26", "72", "27", "44", "07", "10", "27", "27", ...
## $ X__2 <chr> "Ores, slag and ash", "Iron and steel", "Mineral...
## $ Product_Code <chr> "2601", "7204", "2701", "4401", "0713", "1001", ...
## $ Product_Label <chr> "Iron ores and concentrates, incl. roasted iron ...
## $ Unit <chr> "Tons", "Tons", "Tons", "Tons", "Tons", "Tons", ...
## $ Quantity <dbl> 465332, 373154, 338707, 313714, 194731, 230375, ...
## $ year <chr> "2013", "2013", "2013", "2013", "2013", "2013", ...
## $ Values <dbl> 77901, 147798, 62688, 33992, 132022, 93218, 0, 6...