Download Raw Data

Our raw excel file is in our project repository. We can automatically download that file and put it in a temporary file. Then we can read that excel document into R and remove the temp file.

# Create a temporary file
tmp=tempfile(fileext=".xlsx")

# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/gpj18-data-r-sizlari/blob/master/wfp_food_prices_turkey.csv?raw=true",destfile=tmp,mode='wb')

# Read that csv file.
raw_data=read.csv(tmp,skip=1)

# Remove the temp file
file.remove(tmp)
## [1] TRUE

Data Preprocessing

# Remove unwanted columns, adjust column names and delete the rows type = "non-food" 
raw_data=
raw_data %>%
select(-X.country.name,-X.adm1.name,-X.adm1.code,-X.item.code,-X,-X.1,-X.2,-X.item.type.code,-X.meta.id,-X.3)
colnames(raw_data) = c("date","name","unit","type","value","currency","market")
raw_data = subset(raw_data,subset=type != "non-food")
# Add month and year columns 
raw_data=
raw_data %>%
mutate(date = ymd(date)) %>% 
mutate_at(vars(date), funs(year, month)) 

# Save organized file as ".rds"
saveRDS(raw_data, file = "C:/Ufuk/BDA_503_Data_Analytics_Essentials/Project/Dataset/wfp_food_prices_turkey.rds")

# Reading organized ".rds" file from GPJ 
# Create a temporary file
tmp=tempfile(fileext=".rds")
# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/gpj18-data-r-sizlari/blob/master/wfp_food_prices_turkey.rds?raw=true",destfile=tmp,mode='wb')
# Read that rds file.
raw_data=read_rds(tmp)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
head(raw_data)
##         date          name unit               type  value currency
## 1 2013-05-15 Rice - Retail   KG cereals and tubers 4.4920      TRY
## 2 2013-06-15 Rice - Retail   KG cereals and tubers 4.5786      TRY
## 3 2013-11-15 Rice - Retail   KG cereals and tubers 4.7865      TRY
## 4 2013-12-15 Rice - Retail   KG cereals and tubers 5.1337      TRY
## 5 2014-01-15 Rice - Retail   KG cereals and tubers 5.5099      TRY
## 6 2014-02-15 Rice - Retail   KG cereals and tubers 5.8146      TRY
##             market year month
## 1 National Average 2013     5
## 2 National Average 2013     6
## 3 National Average 2013    11
## 4 National Average 2013    12
## 5 National Average 2014     1
## 6 National Average 2014     2

Data Summary after Preprocessing

This dataset contains Food Prices data for Turkey. Food prices data comes from the World Food Programme (WFP) and covers 52 different kinds of foods within 8 main categories such as cereals and tubers, milk and dairy, oil and fats and vegetables and fruits for Turkey grouped as national average, Ankara, Istanbul and Izmir. It is updated weekly but contains to a large extent monthly data. The data goes back to May 15, 2013 and includes 4,718 rows and 15 columns. The main numerical information in dataset is the food price. The data is basically categorized into food category, market, months and units.

After organizing the data, the details of the column names & descriptions are represented below:

Yearly Average Price per Food Type

    raw_data %>% 
    group_by(type, year) %>%
    summarise(yearly_average_price=mean(value)) %>%
    ggplot(data = ., aes(x = type, y = yearly_average_price, 
    fill = as.character(year))) + geom_bar(stat = "identity") + aes(x = reorder(type, -yearly_average_price), 
    y =   yearly_average_price) + labs(x = "", y = "", title = "Yearly Average Price per Food Type") + theme_bw() + theme( axis.text.x = element_text(angle = 45, 
    vjust = 0.5, hjust = 0.5, size = 12)) + scale_y_continuous(labels = scales::comma) +   guides(fill=guide_legend(title="Year", reverse=TRUE)) 

Overall Average Price based on Market

    raw_data %>% 
    group_by(type, market) %>%
    summarise(market_average_price=mean(value)) %>%
    ggplot(data = ., aes(x = type, y = market_average_price, 
    fill = as.character(market))) + geom_bar(stat = "identity") + aes(x = reorder(type, -market_average_price), 
    y =   market_average_price) + labs(x = "", y = "", title = "Overall Average Price based on Market") + theme_bw() + theme( axis.text.x = element_text(angle = 45, 
    vjust = 0.5, hjust = 0.5, size = 12)) + scale_y_continuous(labels = scales::comma) +     guides(fill=guide_legend(title="Market", reverse=TRUE)) 

Average Price Development of Milk and Dairy

    raw_data %>% 
    filter(type=="milk and dairy") %>%
    group_by(name, year) %>%
    summarize(average_price=mean(value)) %>%
    ggplot(data = ., aes(x = name, y = average_price, 
    fill = as.character(year))) + geom_bar(stat = "identity") + aes(x = reorder(name, -average_price), 
    y =   average_price) + labs(x = "", y = "", title = "Average Price Development of Milk and Dairy") + theme_bw() + theme( axis.text.x = element_text(angle = 45, 
    vjust = 0.5, hjust = 0.5, size = 12)) + scale_y_continuous(labels = scales::comma) +   guides(fill=guide_legend(title="Year", reverse=TRUE)) 

The products with maximum price change

price_change = raw_data %>%
  filter(market=="National Average")%>%
  group_by(name, year) %>%
  summarise(yearly_average_price=mean(value)) %>%
  mutate(YoY_growth = (yearly_average_price/lag(yearly_average_price))^(1/(year-lag(year))) - 1) %>%
  arrange(desc(YoY_growth))
price_change
## # A tibble: 233 x 4
## # Groups:   name [46]
##    name                             year yearly_average_price YoY_growth
##    <fct>                           <dbl>                <dbl>      <dbl>
##  1 Eggplants - Retail               2017                 5.47      1.14 
##  2 Onions - Retail                  2018                 2.23      1.07 
##  3 Cucumbers (greenhouse) - Retail  2017                 4.38      0.904
##  4 Potatoes - Retail                2018                 2.28      0.812
##  5 Oil (olive) - Retail             2015                21.6       0.697
##  6 Oranges - Retail                 2018                 2.70      0.655
##  7 Cheese - Retail                  2018                24.2       0.566
##  8 Fish (fresh) - Retail            2018                23.7       0.435
##  9 Meat (mutton) - Retail           2018                47.4       0.432
## 10 Bananas - Retail                 2018                 8.54      0.422
## # ... with 223 more rows
price_change %>% 
filter(name=="Eggplants - Retail" | name=="Onions - Retail" | name=="Cucumbers (greenhouse) - Retail" | name=="Potatoes - Retail" | name== "Oil (olive) - Retail" ) %>% 
  ggplot(aes(x = year, y = yearly_average_price, group=name, colour=name  )) + 
  geom_line() + 
  geom_point()+
  labs( x="Year" , y = "Yearly Average Price")+ 
  theme(legend.position = "bottom", axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.0, size = 10)) 

References: