Course Essentials
Group Projects
- Group project guidelines. (Updated! 2020-12-18)
Week 7 (Dec 30, 2020)
Guest Lecture: Önder Akar, CEO at smartPulse Technologies
Presentations!
Week 6 (Dec 16, 2020)
Guest lecture by Pınar Dursun, Postdoctoral Research Fellow at Sloan Kettering Institute & Umut Gündüz, Quant at a NYC fintech startup.
In Class Exercise: Check the NYC Airbnb Data Kaggle Page for notebooks applying machine learning algorithms to the data. Pick one notebook, give the link, and explain the process steps in your Progress Journal (you do not need to write code).
Assignment 3 (individual) (Due Date: Dec 24, 2020 23:59) There are 3 individual data sets / assignments. You may do all of them, but choose one to report. Add the assignment to your individual Progress Journal. If you add more than one assignment to your PJ, state which one you want to be graded. (P.S. These data sets are popular on the internet. If you draw inspiration from existing work, please cite it in a references section with links.)
- Assignment 3.1: Esoph and Youth Survey (html | pdf)
- Assignment 3.2: Spam Data (html | pdf)
- Assignment 3.3: Diamonds Data (html | pdf)
Resources
Week 5 (Dec 2, 2020)
Guest Lecture: Cem Vardar, Senior Director, Decision Support Systems at Carvana
- In Class Exercise: Try the exercises at the end of the rvest mini tutorial.
- Joins tutorial
- Mini tutorial on pivot longer/wider
- In-Class Exercise data: ATP World Tour 2017 Tennis data (RData) (Source) (Example analysis)
- In-class exercise (see the desired outputs on dummy data)
- Create a matrix (or data frame) of the “top 20” players (top winners by number of wins) with the number of matches played among themselves as the value. Plot a heatmap of this matrix.
- Create a matrix (or data frame) of top 20 players with win percentages (rows are winners).
- Do the same for top 5 countries.
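The pairwise matrices in the exercise above can be sketched on dummy data as follows. This is only an illustration in base R: the column names `winner` and `loser`, the player labels, and the results are invented and do not come from the ATP data set, and the example keeps the top 3 players instead of the top 20.

```r
# Dummy match results; `winner` and `loser` are hypothetical column names.
matches <- data.frame(
  winner = c("A", "A", "B", "C", "A", "B", "D", "C", "A", "B"),
  loser  = c("B", "C", "A", "A", "D", "C", "A", "B", "B", "D"),
  stringsAsFactors = FALSE
)

# Rank players by number of wins and keep the top 3 (top 20 in the exercise).
top_players <- names(sort(table(matches$winner), decreasing = TRUE))[1:3]

# Keep only the matches played among the top players.
top_matches <- matches[matches$winner %in% top_players &
                         matches$loser %in% top_players, ]

# Win counts: rows are winners, columns are losers.
wins <- table(
  winner = factor(top_matches$winner, levels = top_players),
  loser  = factor(top_matches$loser, levels = top_players)
)

# Matches played between each pair, regardless of who won.
played <- wins + t(wins)

# Win percentage of the row player against the column player
# (the diagonal is NaN because nobody plays against themselves).
win_pct <- round(100 * wins / played, 1)

# Heatmap of the match counts, with reordering and scaling switched off.
heatmap(unclass(played), Rowv = NA, Colv = NA, scale = "none")
```

Base R keeps the sketch free of extra packages; the same heatmap could also be drawn with ggplot2's `geom_tile()` after reshaping the matrix to long format.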
Week 4 (Nov 18, 2020)
- This week we will learn about shiny to create interactive dashboards on web browsers (official tutorial). Also see the Shiny Cheatsheet from RStudio. (Bonus: Check https://shinyapps.io to deploy your shiny apps.)
- Shiny in-class exercise starter code.
- Example run code from a Shiny application
shiny::runGitHub("BOUN-IE48A/boun-ie48a.github.io",subdir="files/shinyExample/")
- Bonus: You may look at shinydashboard for dashboard web apps.
- Bonus: You may look at shinyMobile for mobile optimized web apps.
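For orientation, a Shiny app pairs a UI definition with a server function. The following is a minimal sketch, not the course's starter code: the input and output IDs (`bins`, `hist`) are arbitrary choices, and it uses R's built-in `faithful` data set.

```r
library(shiny)

# UI: one slider input and one plot output.
ui <- fluidPage(
  titlePanel("Minimal Shiny sketch"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20)
    ),
    mainPanel(plotOutput("hist"))
  )
)

# Server: redraws the histogram whenever the slider value changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(faithful$waiting, breaks = input$bins,
         main = "Old Faithful waiting times", xlab = "Minutes")
  })
}

# Build the app object; in an interactive session, start it with runApp(app).
app <- shinyApp(ui = ui, server = server)
```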
Group Assignment: ISBIKE Data (Due Date: November 29, 2020 23:59)
isbike is the bike sharing service of Istanbul Metropolitan Municipality. This assignment requires you to take a snapshot of isbike station data and prepare an informative Shiny app with it.
- Download snapshot data from this link. (Data is originally taken from IBB Open Data Portal)
- Data is in JSON text format. You can use the fromJSON function from the jsonlite R package.
- Design of the Shiny app is up to you, but it should be informative. It should provide both station metrics and overall summary metrics.
- You should deploy your app to shinyapps.io and provide a working link from your Group Progress Journal to be graded.
- Bonus: You may also provide a mobile version of your app using shinyMobile. (You need to provide the plain version first, so there should be two links.)
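Reading JSON text with `fromJSON` can be sketched as below. The snapshot here is invented: the field names (`station`, `free_docks`, `bikes`) and values are assumptions for illustration and may not match the real isbike export, so inspect the actual file before relying on any column name.

```r
library(jsonlite)

# Hypothetical snapshot; field names and values are made up for illustration.
json_text <- '[
  {"station": "Station A", "free_docks": 5, "bikes": 10},
  {"station": "Station B", "free_docks": 0, "bikes": 12},
  {"station": "Station C", "free_docks": 8, "bikes": 2}
]'

# A JSON array of objects is simplified to a data frame.
stations <- fromJSON(json_text)

# Station-level metric: share of docks that currently hold a bike.
stations$capacity <- stations$free_docks + stations$bikes
stations$occupancy <- stations$bikes / stations$capacity

# Overall summary metric: total number of bikes across stations.
total_bikes <- sum(stations$bikes)
```

`fromJSON` also accepts a file path or URL in place of the JSON string, which is how a downloaded snapshot would typically be read.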
Week 3 (Nov 4, 2020)
- Introduction to ggplot2
- Extra Material: Introduction to ggplot2 with election data
- Extra Material: Introduction to ggplot2 with weather data
In-Class Exercise (Bonus) (Due Date: Nov 4, 2020 21:30)
This exercise will provide up to 5% bonus to your final grades. You are required to present a brief summary report on Istanbul’s property market using official statistics. Your report should not be long but it should tell a good story. You are expected to use your dplyr & ggplot2 skills and present an HTML output generated by RMarkdown. It is highly recommended that you do the Datacamp or relevant reading before the lecture.
- Download Data Set. Please inspect the data set before you start coding. You will see some titles and explanations after the data (this is the raw export).
- You can use the read_xlsx command of the readxl package. Use the n_max parameter to prevent the explanations from disrupting the data frame.
- Your report should include an introduction with max 2-3 bullet points about your findings (e.g. “Istanbul’s property prices increased sharply during Summer 2020. I think that it is due to high demand because of cheap credit availability for that limited time.”). Support your findings with one, or at most two visualizations. Have a brief conclusion part.
- Reports not linked from your Progress Journals will not be graded.
- The bonus deadline is strict, but you are welcome to add your analysis to your Progress Journal afterwards. It will be taken into account when determining your letter grade, regardless of the bonus.
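The effect of `n_max` can be sketched with an example workbook that ships with readxl, standing in for the assignment file. The value 10 is arbitrary; for the real export, the right value has to be found by inspecting where the data rows end and the explanations begin.

```r
library(readxl)

# Example workbook bundled with readxl; it stands in for the assignment file.
path <- readxl_example("datasets.xlsx")

# n_max caps the number of data rows read, so anything below that point
# (such as trailing notes in a raw export) is ignored.
first_rows <- read_xlsx(path, n_max = 10)
```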
Assignment: Electricity Market Prices (Due Date: Nov 12, 2020)
- Data source: EPIAS/EXIST MCP/SMP Page
- Assignment: Prepare a report about September 2020’s electricity prices using only MCP/SMP data using RMarkdown, dplyr and ggplot2.
- Download (1-30) September 2020 data from the source. (There is “download xls/csv” button at the bottom of the table on the left)
- Upload the RMarkdown HTML output to your Progress Journal. (You need to give a link on your PJ to your assignment, otherwise it won’t be evaluated.)
- Use this tutorial to learn more about the data.
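One possible starting point for such a report is a daily summary built with dplyr and plotted with ggplot2. The sketch below runs on dummy data: the column names (`datetime`, `mcp`, `smp`) and the prices are invented and will differ from the downloaded EPIAS file.

```r
library(dplyr)
library(ggplot2)

# Dummy hourly prices for two days; names and values are made up.
prices <- data.frame(
  datetime = seq(as.POSIXct("2020-09-01 00:00", tz = "UTC"),
                 by = "hour", length.out = 48),
  mcp = rep(c(280, 300, 320, 310), times = 12),
  smp = rep(c(270, 310, 330, 290), times = 12)
)

# Daily averages of both price series.
daily <- prices %>%
  mutate(day = as.Date(datetime)) %>%
  group_by(day) %>%
  summarise(avg_mcp = mean(mcp), avg_smp = mean(smp), .groups = "drop")

# Bar chart of the daily average MCP.
p <- ggplot(daily, aes(x = day, y = avg_mcp)) +
  geom_col() +
  labs(x = "Day", y = "Average MCP")
```

Printing `p` (or knitting the document) renders the plot; the same pattern extends to hourly profiles or MCP versus SMP comparisons.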
Week 2 (Oct 19, 2020)
Guest Lecture: Kadir Malak, Software Engineer at Tazi.ai
dplyr has undergone significant changes in version 1.0.0. You might want to update. See all changes from this link.
- Introduction to dplyr v1.0.0
- Extra Material: Introduction to dplyr with election data
- Extra Material (in Turkish): R ile Veri Analizi 101
Week 1 (Oct 7, 2020)
- Introduction to BDA503
- RMarkdown Homework (Deadline Oct 21, 18:30): Prepare an RMarkdown document. Introduce yourself in one paragraph (your name and surname, your work, your data interests, and how you use or plan to use data science skills in your current or future work). Add a link to your LinkedIn account. Watch some UseR-2020 videos (Main Link - YouTube Link) and write about one of them in your RMarkdown document. Find 3 R posts relevant to your interests and describe them. Generate the HTML output and put it in your progress journal repository. Provide a link to it from your Progress Journal page. See Example 1 and Example 2.
- Form teams of 4-5 and prepare for major projects (we will discuss in week 2).
- Cheat Sheet Heaven
- Introduction to R - Brief Presentation
- Introduction to R (html | pdf)
- R Fundamentals Exercises (solutions)
Week 0
- Some light reading about the previous year. (Read on Blog)
- Some light reading about instructor’s view on R. (Read on Blog)
- Some light reading about Progress Journals of previous years. (Read on Blog)
This course benefits from DataCamp for the Classroom program. See details here.
Course Archive
Miscellaneous
Data Sets for Prospective Projects
- YÖK
https://yokatlas.yok.gov.tr/ https://istatistik.yok.gov.tr/
- ÖSYS
http://www.osym.gov.tr/TR,6552/sureli-yayinlar.html
- EPİAŞ
https://seffaflik.epias.com.tr/transparency/
- SPK
- Merkez Bankası - CBRT
- Emeklilik Gözetim Merkezi
http://www.egm.org.tr/?pid=351
- TURKSTAT - TUIK
http://www.tuik.gov.tr/Start.do
http://www.tuik.gov.tr/takvim/tkvim.zul?submenuheader=0#tb1
Extra Materials
For audiovisual learners, some webinars here.
dplyr
- Official dplyr tutorial
- dplyr join functions
- dplyr join functions official tutorial
- dplyr Cheat Sheet
ggplot2
RMarkdown
- Introduction to RMarkdown - Official
- R4DS Book - Communication
- DataCamp - Authoring R Markdown Reports Free Part
- RMarkdown Cheat Sheet
Shiny
RStudio
External Good Resources About R and Data Science
- Introduction to Statistical Learning
- R for Data Science
- R’a Hızlı Giriş (Türkçe)
- The Elements of Statistical Learning
- Advanced R
- Bookdown Compilation
- Akademik Bilişim 2017 - R ile Veri Analizi Dersi
- BOUN-FE 522
- Learn X in Y Minutes - R
- dplyr vignettes
- ggplot2 workshop
- RStudio Cheat Sheets (Base R, dplyr, ggplot2, RMarkdown etc.)
- R Reference Cards
- data.table Cheat Sheet