This dataset contains Food Prices data for Turkey. Food prices data comes from the World Food Programme (WFP) and covers 52 different kinds of foods within 8 main categories such as cereals and tubers, milk and dairy, oil and fats and vegetables and fruits for Turkey grouped as national average, Ankara, Istanbul and Izmir. The data goes back to May 15, 2013 and includes 4,718 rows and 15 columns. The main numerical information in dataset is the food price. The data is basically categorized into food category, market, months and units.
The descriptions of the raw data are as below:
Our raw excel file is in our project repository. We can automatically download that file and put it in a temporary file. Then we can read that excel document into R and remove the temp file.
# Create a temporary file
tmp=tempfile(fileext=".xlsx")
# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/gpj18-data-r-sizlari/blob/master/wfp_food_prices_turkey.csv?raw=true",destfile=tmp,mode='wb')
# Read that csv file.
raw_data=read.csv(tmp,skip=1)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
# Remove unwanted columns, adjust column names and delete the rows type = "non-food"
refined_data=
raw_data %>%
select(-X.country.name,-X.adm1.name,-X.adm1.code,-X.item.code,-X,-X.1,-X.2,-X.item.type.code,-X.meta.id,-X.3)
colnames(refined_data) = c("date","name","unit","type","value","currency","market")
refined_data = subset(refined_data,subset=type != "non-food")
# Add month and year columns
refined_data=
refined_data %>%
mutate(date = ymd(date)) %>%
mutate_at(vars(date), funs(year, month))
# Save organized file as ".rds"
#saveRDS(raw_data, file = "/Users/serhansuer/Desktop/wfp_food_prices_turkey.rds")
# Reading organized ".rds" file from GPJ
# Create a temporary file
tmp=tempfile(fileext=".rds")
# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/gpj18-data-r-sizlari/blob/master/wfp_food_prices_turkey.rds?raw=true",destfile=tmp,mode='wb')
# Read that rds file.
refined_data=read_rds(tmp)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
head(refined_data)
## date name unit type value currency
## 1 2013-05-15 Rice - Retail KG cereals and tubers 4.4920 TRY
## 2 2013-06-15 Rice - Retail KG cereals and tubers 4.5786 TRY
## 3 2013-11-15 Rice - Retail KG cereals and tubers 4.7865 TRY
## 4 2013-12-15 Rice - Retail KG cereals and tubers 5.1337 TRY
## 5 2014-01-15 Rice - Retail KG cereals and tubers 5.5099 TRY
## 6 2014-02-15 Rice - Retail KG cereals and tubers 5.8146 TRY
## market year month
## 1 National Average 2013 5
## 2 National Average 2013 6
## 3 National Average 2013 11
## 4 National Average 2013 12
## 5 National Average 2014 1
## 6 National Average 2014 2
After organizing the data, the details of the column names & descriptions are represented below:
Since there are many foods in the data, to understand which food type we need to focus on for further analysis, in this graph we checked average prices per food type.
This graph shows that “milk and dairy” is the first type of foods that significant price changes occured, followed by “meat, fish and eggs”. Also “oil and fats” average prices increased after these two. The others’ average prices looks close to previous years.
refined_data %>%
group_by(type, year) %>%
summarise(yearly_average_price=mean(value)) %>%
ggplot(data = ., aes(x = type, y = yearly_average_price,
fill = as.character(year))) + geom_bar(stat = "identity", position=position_dodge()) + aes(x = type,
y = yearly_average_price) + labs(x = "", y = "", title = "Yearly Average Price per Food Type") + theme_bw() + theme( axis.text.x = element_text(angle = 65,
vjust = 0.5, hjust = 0.5, size = 12)) + scale_y_continuous(labels = scales::comma) + guides(fill=guide_legend(title="Year"))
When we checked the overall prices by market we can say that national average is below Istanbul, Ankara and Izmir but these three have close price levels.
refined_data %>%
group_by(type, market) %>%
summarise(market_average_price=mean(value)) %>%
ggplot(data = ., aes(x = type, y = market_average_price,
fill = as.character(market))) + geom_bar(stat = "identity", position = position_dodge()) + aes(x = type,
y = market_average_price) + labs(x = "", y = "", title = "Overall Average Price based on Market") + theme_bw() + theme( axis.text.x = element_text(angle = 65,
vjust = 0.5, hjust = 0.5, size = 12)) + scale_y_continuous(labels = scales::comma) + guides(fill=guide_legend(title="Market"))
We wanted to see further details of foods under “milk and dairy” which is the first type of foods that significant price changes occured.
refined_data %>%
filter(type=="milk and dairy") %>%
group_by(name, year) %>%
summarize(average_price=mean(value)) %>%
ggplot(data = ., aes(x = name, y = average_price,
fill = as.character(year))) + geom_bar(stat = "identity", position = position_dodge()) + aes(x = name,
y = average_price) + labs(x = "", y = "", title = "Average Price Development of Milk and Dairy") + theme_bw() + theme( axis.text.x = element_text(angle = 45,
vjust = 0.5, hjust = 0.5, size = 12)) + scale_y_continuous(labels = scales::comma) + guides(fill=guide_legend(title="Year"))
We were interested in yearly average price changes per food, so we added YoY_growth column and sorted the data according to YoY_growth to see the highest price changes.
price_change = refined_data %>%
filter(market=="National Average")%>%
group_by(name, year) %>%
summarise(yearly_average_price=mean(value)) %>%
mutate(YoY_growth = (yearly_average_price/lag(yearly_average_price))^(1/(year-lag(year))) - 1) %>%
arrange(desc(YoY_growth))
price_change
## # A tibble: 233 x 4
## # Groups: name [46]
## name year yearly_average_price YoY_growth
## <fct> <dbl> <dbl> <dbl>
## 1 Eggplants - Retail 2017 5.47 1.14
## 2 Onions - Retail 2018 2.23 1.07
## 3 Cucumbers (greenhouse) - Retail 2017 4.38 0.904
## 4 Potatoes - Retail 2018 2.28 0.812
## 5 Oil (olive) - Retail 2015 21.6 0.697
## 6 Oranges - Retail 2018 2.70 0.655
## 7 Cheese - Retail 2018 24.2 0.566
## 8 Fish (fresh) - Retail 2018 23.7 0.435
## 9 Meat (mutton) - Retail 2018 47.4 0.432
## 10 Bananas - Retail 2018 8.54 0.422
## # ... with 223 more rows
We also added a trend graph for top 5 food price changes and saw that eggplants, onions, cucumbers, potatoes and olive oil had the maximum price changes. In our first analysis, we saw that “milk and dairy” and “meat, fish and eggs” categories had significant price changes. However, when we made the analysis on food detail, we saw that none of top 5 belongs to “milk and dairy” and “meat, fish and eggs” categories.
price_change %>%
filter(name=="Eggplants - Retail" | name=="Onions - Retail" | name=="Cucumbers (greenhouse) - Retail" | name=="Potatoes - Retail" | name== "Oil (olive) - Retail" ) %>%
ggplot(aes(x = year, y = yearly_average_price, group=name, colour=name)) +
geom_line() +
geom_point()+
labs( x="" , y = "Yearly Average Price")+
theme(legend.position = "bottom", axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.0, size = 10))
For a better visibility of price changes, we also added a trend graph for top 5 food price changes (YoY_growth rates).
price_change %>%
filter(name=="Eggplants - Retail" | name=="Onions - Retail" | name=="Cucumbers (greenhouse) - Retail" | name=="Potatoes - Retail" | name== "Oil (olive) - Retail" ) %>%
filter(YoY_growth!="NA") %>%
ggplot(aes(x = year, y = YoY_growth, group=name, colour=name)) +
geom_line() +
geom_point()+
labs( x="" , y = "YoY_growth")+
theme(legend.position = "bottom", axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.0, size = 10))