Projects and Project Guidelines
Introduction
You are going to apply everything you have learned in this course to a real-life data analysis project. Each year has a theme, and each group is expected to use a comprehensive dataset from one source. You are going to learn, on your own:
- How to extract data
- How to preprocess
- How to analyze
- How to communicate and present your findings
Phases and Deadlines
- Project Proposal (Nov 9, 2023)
- Preparation of the preprocessed data (Nov 16, 2023)
- Exploratory Data Analysis (Dec 14, 2023)
- Rehearsal (Dec 28, 2023)
- Blog Post & Teaser (Dec 28, 2023)
- Final Project and Presentation (Jan 4, 2024)
The final presentation deadline is strict, but you will be able to revise your final project document.
Potential Projects and Proposals
This year’s theme is “Türkiye compared to X”. Here are the candidate data sources (domains).
- World Development Indicators (World Bank)
- PISA (Education)
- Global Dietary Database (Nutrition)
- World Integrated Trade Solution (World Bank)
Each team will choose one data source and propose a project. Almost every data set has a corresponding R package or R code; otherwise, there are download options. See the following links for some example solutions.
No two teams are allowed on the same source. First come, first served (based on your Project Proposals).
Project Proposals
One person from each group needs to write an email specifying the domain of their choice. The email should give a short reason for picking that domain, explain how the group is going to tell a story with numbers and statistics, and finally list the data sets the group is going to use.
Your analyses are expected to span multiple data sets and multiple years. However, you should provide a high-quality narrative with striking conclusions. Therefore, your analyses should not be shallow, nor should they be the same code repeated over numerous data sets (e.g., a bunch of yearly averages might mean nothing). A rule of thumb is to focus on one data set and supplement the story with 2-4 more data sets.
Good luck!
Project Guidelines
After your project proposal is accepted, you need to prepare your project according to these guidelines.
This document contains documentation and guidelines for the group project. You are expected to perform an analysis with R on a real data set about Türkiye and present your findings on your Group Progress Journal (GPJ) using Quarto and Shiny. At the end of the group project, you are expected to prepare a full report about your data and give a 15-minute presentation in class.
There might be additions to these guidelines, so keep an eye on the updates.
Required Final Deliverables
- Preprocessed data in RData or rds file format. Preprocessing code and explanations in HTML.
- An EDA analysis
- A full report in both HTML and PDF generated by Quarto
- A presentation in Quarto (bonus) or PowerPoint.
- A Shiny app (deployed to shinyapps.io)
- Rehearsal video
- A Medium post (300-500 words) to introduce your project with a link to your GPJ.
- A 30-second video teaser uploaded to YouTube (bonus)
Guidelines
Preprocessing
- Use the .RData (or rds) format to store your preprocessed data, and give an explicit link to your RData file in your GPJ and in your reports.
Some data sets need to be preprocessed before they are ready to analyze, and it can take several steps to get from raw data (xlsx, csv, etc.) to input data (preferably RData). Then you can start your analysis from clean input data.
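As a minimal sketch of that raw-to-input step, assuming a small made-up table in place of a real download: the data frame `raw`, its column names, and the file names below are hypothetical and only stand in for your own data. It uses base R only.

```r
# Hypothetical raw data; in a real project this would come from
# read.csv(), readxl::read_excel(), or a data package.
raw <- data.frame(
  country = c("Türkiye", "Germany", "Türkiye", "Germany"),
  year    = c("2020", "2020", "2021", "2021"),
  value   = c("1.5", "2.0", NA, "2.4"),
  stringsAsFactors = FALSE
)

# Preprocessing: convert text columns to proper types and drop missing values.
clean <- raw
clean$year  <- as.integer(clean$year)
clean$value <- as.numeric(clean$value)
clean <- clean[!is.na(clean$value), ]

# Store the analysis-ready data (written to a temporary folder in this sketch).
# saveRDS() stores a single object; save() stores named objects in an .RData file.
rds_path   <- file.path(tempdir(), "clean_data.rds")
rdata_path <- file.path(tempdir(), "clean_data.RData")
saveRDS(clean, rds_path)
save(clean, file = rdata_path)

# The analysis can then start from the clean input data.
reloaded <- readRDS(rds_path)
```

With an rds file the reader chooses the object name on loading (`readRDS()`), whereas `load()` on an .RData file restores objects under the names they were saved with.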
You need to provide a preprocessing section in a separate preprocessing HTML document. Show the steps from the raw data to the input data.
Both files should be accessible directly and explicit links should be provided.
It is also recommended to use the eval/echo trick to avoid showing hardcoded absolute paths: one code block with eval=FALSE and echo=TRUE that contains the relative (public) path, and one code block with eval=TRUE and echo=FALSE that contains the true local path. For example, instead of C:/MyName/MyDocuments/myfile.csv, readers should see pjournal.github.io/mygroup/myfile.csv.
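One way the eval/echo trick could look in a Quarto document is sketched below. The two paths are the ones from the example above; the `https://` prefix, the object name `raw`, and the use of `read.csv()` are assumptions for illustration.

````markdown
```{r}
#| eval: false
#| echo: true
# Shown to readers but never run: the public location of the data.
raw <- read.csv("https://pjournal.github.io/mygroup/myfile.csv")
```

```{r}
#| eval: true
#| echo: false
# Run while rendering but hidden from readers: the true local path.
raw <- read.csv("C:/MyName/MyDocuments/myfile.csv")
```
````

The rendered HTML then displays only the first block, while the document is actually built from the local file.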
Exploratory Data Analysis
Before giving your report its final shape, prepare an EDA report. In this document you will be exploring, as the name suggests. You are not required to do any analysis or draw conclusions, but you are asked to give a full picture of the data (what the columns mean, how the values are distributed, some ideas about possible analyses). Keep it short and add some plots. It is like a rehearsal for your full report.
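A minimal sketch of what such a first look could contain, assuming a small made-up data set: `eda_data` and its columns are hypothetical placeholders for your preprocessed data. It uses base R only; ggplot2 would work just as well for the plots.

```r
# Hypothetical preprocessed data; replace with your own .rds or .RData contents.
eda_data <- data.frame(
  country = rep(c("Türkiye", "Germany", "Japan"), each = 4),
  year    = rep(2018:2021, times = 3),
  value   = c(1.1, 1.3, NA, 1.7, 2.0, 2.1, 2.2, 2.4, 3.0, 2.9, 3.1, 3.3)
)

# What the columns are: names, types, and a few example values.
str(eda_data)

# How the values are distributed: quartiles, mean, and number of NAs.
summary(eda_data$value)

# Missing values per column.
missing_per_column <- colSums(is.na(eda_data))

# A simple grouped summary (the formula interface drops rows with NA by default).
mean_by_country <- aggregate(value ~ country, data = eda_data, FUN = mean)

# A couple of quick plots.
hist(eda_data$value, main = "Distribution of value", xlab = "Value")
boxplot(value ~ country, data = eda_data, main = "Value by country")
```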
Report
You can use any tool that is taught within the course or outside the course to enhance the “storytelling”; you are not confined to dplyr+ggplot2. For instance, you may use data.table + dygraphs if you want to. (It is possible to get a bonus if the result is really good!) At a minimum, you are required to use Quarto HTML output to show your work on the Group Progress Journal (GPJ).
Your analysis should contain more than code. Explain your work not just in code and plots but also in words. Make the code available but collapsible if possible. (see example)
The styling and coherence of your GPJ and analysis are also important. You should prepare your analyses as if you were presenting them to a board. Aim for minimal typos, a neat structure, and no long stretches of raw data output in your final reports. Styling is 35%. Good styling is up to 15% bonus. Bad styling can affect your grade by -20% regardless of the content.
Important! Add a Key Takeaways section at the top. It should contain no fewer than 3 and no more than 5 bullet points about your study. Key items include the topic and the data used (with links to both the raw source data and the analysis data), which aspects are important, which results are interesting, and finally what the main outcome is.
About your GPJs: Please keep them clean and in a good UX order. Put your names, the title of the project, and a brief description on the main page of the GPJ. A very good example is https://pjournal.github.io/mef04g-rhapsody/. Also, you can always improve.
Put a small paragraph explaining your project in your GPJs right under its title section.
Example for a hypothetical ISBIKE project:
Group Project: ISBIKE Analysis
We use bicycle station and utilization data from ISBIKE, the bicycle service of the Istanbul Metropolitan Municipality. The main objective of our analysis is to find actionable insights about the placement and replenishment of bikes. These insights might help the municipality expand ISBIKE to new locations and improve its services.
Please also put a PDF version of your final analysis on your GPJ.
Give each phase its own link. Make it very easy for the user to navigate your GPJ. User experience matters.
Ethics rule: Good artists copy, best artists steal. Nevertheless, if you copy some code or an idea from somewhere, please indicate the source explicitly with a direct link. It is never embarrassing to adapt a good idea to a new use and credit your source. It is, however, quite embarrassing to be exposed. Referencing is encouraged: up to 10% bonus for good references.
Shiny App
You are also required to prepare a Shiny app and present an interactive environment. You should also make it very easy for others to run your app via the shiny::runGitHub function, and deploy it to shinyapps.io (one per group is enough).
Don’t forget to provide a link to your Shiny app in your GPJ.
Try to stick to the first deadline so that you have some slack to reevaluate.
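As a rough sketch of how small a working app can be, assuming the shiny package is installed: the data frame `indicator`, the repository name, the GitHub user name, and the app folder below are all hypothetical placeholders, not part of the course material.

```r
library(shiny)

# Hypothetical indicator values; a real app would load the group's preprocessed data.
indicator <- data.frame(
  country = rep(c("Türkiye", "Germany"), each = 3),
  year    = rep(2019:2021, times = 2),
  value   = c(1.2, 1.5, 1.7, 2.0, 2.1, 2.4)
)

# User interface: one dropdown and one plot.
ui <- fluidPage(
  titlePanel("Türkiye compared to X (sketch)"),
  selectInput("country", "Country", choices = unique(indicator$country)),
  plotOutput("trend")
)

# Server logic: redraw the plot whenever the selected country changes.
server <- function(input, output, session) {
  output$trend <- renderPlot({
    selected <- indicator[indicator$country == input$country, ]
    plot(selected$year, selected$value, type = "b",
         xlab = "Year", ylab = "Value", main = input$country)
  })
}

# Build the app object; it is not launched until runApp() is called.
app <- shinyApp(ui, server)

# Launch locally:
# runApp(app)
# Let others run it straight from GitHub (placeholder repository and user):
# shiny::runGitHub("my-shiny-app", "my-github-user")
# Deploy to shinyapps.io with the rsconnect package (placeholder folder):
# rsconnect::deployApp("path/to/app")
```

For shiny::runGitHub to work, the repository needs the app code in an app.R file (or ui.R and server.R), either at the top level or in a folder passed via the subdir argument.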
Presentation
Upload your presentation to your GPJ. Consider preparing it with Quarto. Adding a PDF version is recommended. State clearly who the group members are and give a link to either your progress journals or your LinkedIn profiles.
You need to submit a rehearsal video by the deadline. 10% of your project points will come from rehearsal videos. There are two prerequisites.
- It should be under 15 minutes (strict) and it should cover most of your actual presentation. This is a rehearsal so you don’t need to care much about perfection. Try to make it look good but do not spend a lot of time on retakes.
- Send videos directly to my email. It is not required to publish them on your GPJ (but it is encouraged if you want to).
Blog Post (Medium)
Briefly describe your project in a blog post fashion. Do not include code, just describe your project clearly and give a link to your analysis and your GPJ main page. Also provide a link to your blog post in your GPJs.
Select one representative in each group to submit to EDA Journal, the “official” blog of BDA 503. Send an email to your instructor with your group name and the Medium username of the representative.
Submit the article to EDA Journal (help). Your instructor will review and publish it. Then you may add the link to your GPJ.
Grading Weights (Total 110%)
- Preprocessing Data and Report (10%)
- Exploratory Data Analysis (20%)
- Final Project and Presentation (Rehearsal included) (45%)
- Shiny App (15%)
- Medium Post (10%)
- BONUS: Teaser (10%)