Our raw excel file is in our project repository. We can automatically download that file and put it in a temporary file. Then we can read that excel document into R and remove the temp file.
# Create a temporary file
tmp=tempfile(fileext=".xlsx")
# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/gpj18-data-r-sizlari/blob/master/wfp_food_prices_turkey.csv?raw=true",destfile=tmp,mode='wb')
# Read that csv file.
raw_data=read.csv(tmp,skip=1)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
# Remove unwanted columns, adjust column names and delete the rows type = "non-food"
raw_data=
raw_data %>%
select(-X.country.name,-X.adm1.name,-X.adm1.code,-X.item.code,-X,-X.1,-X.2,-X.item.type.code,-X.meta.id,-X.3)
colnames(raw_data) = c("date","name","unit","type","value","currency","market")
raw_data = subset(raw_data,subset=type != "non-food")
# Add month and year columns
raw_data=
raw_data %>%
mutate(date = ymd(date)) %>%
mutate_at(vars(date), funs(year, month))
# Save organized file as ".rds"
saveRDS(raw_data, file = "C:/Ufuk/BDA_503_Data_Analytics_Essentials/Project/Dataset/wfp_food_prices_turkey.rds")
# Reading organized ".rds" file from GPJ
# Create a temporary file
tmp=tempfile(fileext=".rds")
# Download file from repository to the temp file
download.file("https://github.com/MEF-BDA503/gpj18-data-r-sizlari/blob/master/wfp_food_prices_turkey.rds?raw=true",destfile=tmp,mode='wb')
# Read that rds file.
raw_data=read_rds(tmp)
# Remove the temp file
file.remove(tmp)
## [1] TRUE
head(raw_data)
## date name unit type value currency
## 1 2013-05-15 Rice - Retail KG cereals and tubers 4.4920 TRY
## 2 2013-06-15 Rice - Retail KG cereals and tubers 4.5786 TRY
## 3 2013-11-15 Rice - Retail KG cereals and tubers 4.7865 TRY
## 4 2013-12-15 Rice - Retail KG cereals and tubers 5.1337 TRY
## 5 2014-01-15 Rice - Retail KG cereals and tubers 5.5099 TRY
## 6 2014-02-15 Rice - Retail KG cereals and tubers 5.8146 TRY
## market year month
## 1 National Average 2013 5
## 2 National Average 2013 6
## 3 National Average 2013 11
## 4 National Average 2013 12
## 5 National Average 2014 1
## 6 National Average 2014 2
This dataset contains Food Prices data for Turkey. Food prices data comes from the World Food Programme (WFP) and covers 52 different kinds of foods within 8 main categories such as cereals and tubers, milk and dairy, oil and fats and vegetables and fruits for Turkey grouped as national average, Ankara, Istanbul and Izmir. It is updated weekly but contains to a large extent monthly data. The data goes back to May 15, 2013 and includes 4,718 rows and 15 columns. The main numerical information in dataset is the food price. The data is basically categorized into food category, market, months and units.
After organizing the data, the details of the column names & descriptions are represented below:
raw_data %>%
group_by(type, year) %>%
summarise(yearly_average_price=mean(value)) %>%
ggplot(data = ., aes(x = type, y = yearly_average_price,
fill = as.character(year))) + geom_bar(stat = "identity") + aes(x = reorder(type, -yearly_average_price),
y = yearly_average_price) + labs(x = "", y = "", title = "Yearly Average Price per Food Type") + theme_bw() + theme( axis.text.x = element_text(angle = 45,
vjust = 0.5, hjust = 0.5, size = 12)) + scale_y_continuous(labels = scales::comma) + guides(fill=guide_legend(title="Year", reverse=TRUE))
raw_data %>%
group_by(type, market) %>%
summarise(market_average_price=mean(value)) %>%
ggplot(data = ., aes(x = type, y = market_average_price,
fill = as.character(market))) + geom_bar(stat = "identity") + aes(x = reorder(type, -market_average_price),
y = market_average_price) + labs(x = "", y = "", title = "Overall Average Price based on Market") + theme_bw() + theme( axis.text.x = element_text(angle = 45,
vjust = 0.5, hjust = 0.5, size = 12)) + scale_y_continuous(labels = scales::comma) + guides(fill=guide_legend(title="Market", reverse=TRUE))
raw_data %>%
filter(type=="milk and dairy") %>%
group_by(name, year) %>%
summarize(average_price=mean(value)) %>%
ggplot(data = ., aes(x = name, y = average_price,
fill = as.character(year))) + geom_bar(stat = "identity") + aes(x = reorder(name, -average_price),
y = average_price) + labs(x = "", y = "", title = "Average Price Development of Milk and Dairy") + theme_bw() + theme( axis.text.x = element_text(angle = 45,
vjust = 0.5, hjust = 0.5, size = 12)) + scale_y_continuous(labels = scales::comma) + guides(fill=guide_legend(title="Year", reverse=TRUE))
price_change = raw_data %>%
filter(market=="National Average")%>%
group_by(name, year) %>%
summarise(yearly_average_price=mean(value)) %>%
mutate(YoY_growth = (yearly_average_price/lag(yearly_average_price))^(1/(year-lag(year))) - 1) %>%
arrange(desc(YoY_growth))
price_change
## # A tibble: 233 x 4
## # Groups: name [46]
## name year yearly_average_price YoY_growth
## <fct> <dbl> <dbl> <dbl>
## 1 Eggplants - Retail 2017 5.47 1.14
## 2 Onions - Retail 2018 2.23 1.07
## 3 Cucumbers (greenhouse) - Retail 2017 4.38 0.904
## 4 Potatoes - Retail 2018 2.28 0.812
## 5 Oil (olive) - Retail 2015 21.6 0.697
## 6 Oranges - Retail 2018 2.70 0.655
## 7 Cheese - Retail 2018 24.2 0.566
## 8 Fish (fresh) - Retail 2018 23.7 0.435
## 9 Meat (mutton) - Retail 2018 47.4 0.432
## 10 Bananas - Retail 2018 8.54 0.422
## # ... with 223 more rows
price_change %>%
filter(name=="Eggplants - Retail" | name=="Onions - Retail" | name=="Cucumbers (greenhouse) - Retail" | name=="Potatoes - Retail" | name== "Oil (olive) - Retail" ) %>%
ggplot(aes(x = year, y = yearly_average_price, group=name, colour=name )) +
geom_line() +
geom_point()+
labs( x="Year" , y = "Yearly Average Price")+
theme(legend.position = "bottom", axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.0, size = 10))