Data Description

This dataset contains Food Prices data for Turkey. Food prices data comes from the World Food Programme (WFP) and covers 52 different kinds of foods within 8 main categories such as cereals and tubers, milk and dairy, oil and fats and vegetables and fruits for Turkey grouped as national average, Ankara, Istanbul and Izmir. The data goes back to May 15, 2013 and includes 4,718 rows and 15 columns. The main numerical information in dataset is the food price. The data is basically categorized into food category, market, months and units.

The descriptions of the raw data are as below:

Loading The Data

Our raw excel file is in our project repository. We can automatically download that file and put it in a temporary file. Then we can read that excel document into R and remove the temp file.

# Create a temporary file
tmp=tempfile(fileext=".xlsx")

# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/gpj18-data-r-sizlari/blob/master/wfp_food_prices_turkey.csv?raw=true",destfile=tmp,mode='wb')

# Read that csv file.
raw_data=read.csv(tmp,skip=1)

# Remove the temp file
file.remove(tmp)
## [1] TRUE

Data Preprocessing

# Remove unwanted columns, adjust column names and delete the rows type = "non-food" 
refined_data=
raw_data %>%
select(-X.country.name,-X.adm1.name,-X.adm1.code,-X.item.code,-X,-X.1,-X.2,-X.item.type.code,-X.meta.id,-X.3)
colnames(refined_data) = c("date","name","unit","type","value","currency","market")
refined_data = subset(refined_data,subset=type != "non-food")
# Add month and year columns 
refined_data=
refined_data %>%
mutate(date = ymd(date)) %>% 
mutate_at(vars(date), funs(year, month)) 

# Save organized file as ".rds"
#saveRDS(raw_data, file = "/Users/serhansuer/Desktop/wfp_food_prices_turkey.rds")

# Reading organized ".rds" file from GPJ 
# Create a temporary file
tmp=tempfile(fileext=".rds")
# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/gpj18-data-r-sizlari/blob/master/wfp_food_prices_turkey.rds?raw=true",destfile=tmp,mode='wb')
# Read that rds file.
refined_data=read_rds(tmp)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
head(refined_data)
##         date          name unit               type  value currency
## 1 2013-05-15 Rice - Retail   KG cereals and tubers 4.4920      TRY
## 2 2013-06-15 Rice - Retail   KG cereals and tubers 4.5786      TRY
## 3 2013-11-15 Rice - Retail   KG cereals and tubers 4.7865      TRY
## 4 2013-12-15 Rice - Retail   KG cereals and tubers 5.1337      TRY
## 5 2014-01-15 Rice - Retail   KG cereals and tubers 5.5099      TRY
## 6 2014-02-15 Rice - Retail   KG cereals and tubers 5.8146      TRY
##             market year month
## 1 National Average 2013     5
## 2 National Average 2013     6
## 3 National Average 2013    11
## 4 National Average 2013    12
## 5 National Average 2014     1
## 6 National Average 2014     2

Data Summary after Preprocessing

After organizing the data, the details of the column names & descriptions are represented below:

Yearly Average Price per Food Type

Since there are many foods in the data, to understand which food type we need to focus on for further analysis, in this graph we checked average prices per food type.

This graph shows that “milk and dairy” is the first type of foods that significant price changes occured, followed by “meat, fish and eggs”. Also “oil and fats” average prices increased after these two. The others’ average prices looks close to previous years.

refined_data %>% 
    group_by(type, year) %>%
    summarise(yearly_average_price=mean(value)) %>%
    ggplot(data = ., aes(x = type, y = yearly_average_price, 
    fill = as.character(year))) + geom_bar(stat = "identity", position=position_dodge()) + aes(x = type, 
    y =   yearly_average_price) + labs(x = "", y = "", title = "Yearly Average Price per Food Type") + theme_bw() + theme( axis.text.x = element_text(angle = 65, 
    vjust = 0.5, hjust = 0.5, size = 12)) + scale_y_continuous(labels = scales::comma) +   guides(fill=guide_legend(title="Year"))

Overall Average Price based on Market

When we checked the overall prices by market we can say that national average is below Istanbul, Ankara and Izmir but these three have close price levels.

    refined_data %>% 
    group_by(type, market) %>%
    summarise(market_average_price=mean(value)) %>%
    ggplot(data = ., aes(x = type, y = market_average_price, 
    fill = as.character(market))) + geom_bar(stat = "identity", position = position_dodge()) + aes(x = type, 
    y =   market_average_price) + labs(x = "", y = "", title = "Overall Average Price based on Market") + theme_bw() + theme( axis.text.x = element_text(angle = 65, 
    vjust = 0.5, hjust = 0.5, size = 12)) + scale_y_continuous(labels = scales::comma) +     guides(fill=guide_legend(title="Market")) 

Average Price Development of Milk and Dairy

We wanted to see further details of foods under “milk and dairy” which is the first type of foods that significant price changes occured.

refined_data %>% 
    filter(type=="milk and dairy") %>%
    group_by(name, year) %>%
    summarize(average_price=mean(value)) %>%
    ggplot(data = ., aes(x = name, y = average_price, 
    fill = as.character(year))) + geom_bar(stat = "identity", position = position_dodge()) + aes(x = name, 
    y =   average_price) + labs(x = "", y = "", title = "Average Price Development of Milk and Dairy") + theme_bw() + theme( axis.text.x = element_text(angle = 45, 
    vjust = 0.5, hjust = 0.5, size = 12)) + scale_y_continuous(labels = scales::comma) +   guides(fill=guide_legend(title="Year"))  

The products with maximum price change

We were interested in yearly average price changes per food, so we added YoY_growth column and sorted the data according to YoY_growth to see the highest price changes.

price_change = refined_data %>%
  filter(market=="National Average")%>%
  group_by(name, year) %>%
  summarise(yearly_average_price=mean(value)) %>%
  mutate(YoY_growth = (yearly_average_price/lag(yearly_average_price))^(1/(year-lag(year))) - 1) %>%
  arrange(desc(YoY_growth))
price_change
## # A tibble: 233 x 4
## # Groups:   name [46]
##    name                             year yearly_average_price YoY_growth
##    <fct>                           <dbl>                <dbl>      <dbl>
##  1 Eggplants - Retail               2017                 5.47      1.14 
##  2 Onions - Retail                  2018                 2.23      1.07 
##  3 Cucumbers (greenhouse) - Retail  2017                 4.38      0.904
##  4 Potatoes - Retail                2018                 2.28      0.812
##  5 Oil (olive) - Retail             2015                21.6       0.697
##  6 Oranges - Retail                 2018                 2.70      0.655
##  7 Cheese - Retail                  2018                24.2       0.566
##  8 Fish (fresh) - Retail            2018                23.7       0.435
##  9 Meat (mutton) - Retail           2018                47.4       0.432
## 10 Bananas - Retail                 2018                 8.54      0.422
## # ... with 223 more rows

We also added a trend graph for top 5 food price changes and saw that eggplants, onions, cucumbers, potatoes and olive oil had the maximum price changes. In our first analysis, we saw that “milk and dairy” and “meat, fish and eggs” categories had significant price changes. However, when we made the analysis on food detail, we saw that none of top 5 belongs to “milk and dairy” and “meat, fish and eggs” categories.

price_change %>% 
filter(name=="Eggplants - Retail" | name=="Onions - Retail" | name=="Cucumbers (greenhouse) - Retail" | name=="Potatoes - Retail" | name== "Oil (olive) - Retail" ) %>% 
  ggplot(aes(x = year, y = yearly_average_price, group=name, colour=name)) + 
  geom_line() + 
  geom_point()+
  labs( x="" , y = "Yearly Average Price")+ 
  theme(legend.position = "bottom", axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.0, size = 10)) 

For a better visibility of price changes, we also added a trend graph for top 5 food price changes (YoY_growth rates).

price_change %>% 
filter(name=="Eggplants - Retail" | name=="Onions - Retail" | name=="Cucumbers (greenhouse) - Retail" | name=="Potatoes - Retail" | name== "Oil (olive) - Retail" ) %>% 
  filter(YoY_growth!="NA") %>% 
  ggplot(aes(x = year, y = YoY_growth, group=name, colour=name)) + 
  geom_line() + 
  geom_point()+
  labs( x="" , y = "YoY_growth")+ 
  theme(legend.position = "bottom", axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.0, size = 10)) 

References: