About the Course

Introduction

MEF BDA 503 Data Analytics Essentials is a graduate-level data analytics course taught at MEF University (Istanbul, Turkey) under the Big Data Analytics (BDA) Master's programme.

BDA 503 is one of the core courses of the BDA programme. The course has been active since 2016 and has had a dedicated course webpage since 2017 (see the archive).

BDA 503’s main objective is to teach data analytics at a fundamental level. Its implementation rests on three core principles: Analysis, Communication and Reproducibility. The course is designed to teach all of these skills to students in a limited time.

There are no prerequisites of any kind, not even prior coding experience. Students of this course come from all sorts of backgrounds. In my personal opinion, those who work with Excel frequently benefit from the course the most.

The main language of the course is English. The main programming language is R.

Core Principles

  • Analysis is taking raw data and extracting meaningful insights from it. The main purpose is to teach Exploratory Data Analysis. tidyverse packages, especially dplyr and ggplot2, are taught and used in this course.

  • Communication is taking the conducted analysis and converting it into communication material (e.g. a report) so that stakeholders can understand the findings from the raw data. Two main R packages are taught and used: RMarkdown and Shiny. Quarto will gradually replace RMarkdown starting from Fall 2022.

  • Reproducibility is being able to recreate the analysis and findings using the same data and code. All of us do a lot of ad-hoc analyses and report findings based on their results. This is actually bad practice. Therefore, all assignments and projects are done with RMarkdown/Quarto to keep code and analysis together.

In this course the “stakeholders” are the public; therefore all work is kept in GitHub repositories called “Progress Journals”, and findings are displayed using GitHub Pages. Reproducibility and communication are achieved (up to a point) this way.

Progress

Although the course itself evolves continuously, the underlying process remains largely the same.

  • Some light reading before the course starts
  • 7-week biweekly lectures by the instructor
    • Minimal lecturing
    • In-class assignments
  • Guest lectures from the technical and business sides of data analytics, mainly by startup founders, deeply technical people, and analysts and managers from corporations
  • Individual and group assignments
  • Student group project presentations
  • A take-home Final

Lecture Progress

  • Lecture 1 - Introduction and Setup: Introduction to the course, setting up GitHub accounts and Progress Journals. Some base R functionality. The first assignment is always an introduction page written with RMarkdown/Quarto and published on the Progress Journal.
  • Lecture 2 - Introduction to dplyr: The first analysis tool is dplyr. Students take (rectangular) raw data and conduct simple analyses with the fundamental dplyr verbs such as select/rename, filter, arrange, mutate/transmute and group_by/summarize.
  • Lecture 3 - Introduction to ggplot2: Once simple analysis is learned, it is time for visualization. There is usually a guest lecturer.
  • Lecture 4 - Introduction to Shiny: After learning static reporting with dplyr, ggplot2 and Quarto/RMarkdown, it is time for interactive dashboards. There is usually a guest lecturer.
  • Lecture 5 - Further into data analysis: It is time to do analysis with multiple data sets and advanced transformations. Students learn about joining tables and transforming data sets between wide and long formats using tidyverse. There is usually a guest lecturer.
  • Lecture 6 - Advanced topics: This lecture is reserved for advanced topics such as an introduction to machine learning, Docker, working with Python through the reticulate package, and other packages such as golem. There is usually a guest lecturer.
  • Lecture 7 - Presentations: This time students are the lecturers.
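
To give a flavour of what Lectures 2 and 5 cover, here is a minimal sketch in R. It is not actual course material: the `sales` and `stores` tables and all of their column names are made up for illustration, and the dplyr and tidyr packages are assumed to be installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not from the course): one row per store and item
sales <- tibble(
  store_id = c(1, 1, 2, 2, 3),
  item     = c("tea", "coffee", "tea", "coffee", "tea"),
  units    = c(10, 4, 7, 12, 3),
  price    = c(2.5, 4.0, 2.5, 4.0, 2.5)
)

# Hypothetical lookup table with one row per store
stores <- tibble(
  store_id = c(1, 2, 3),
  city     = c("Istanbul", "Ankara", "Izmir")
)

# Lecture 2 style: chain the fundamental dplyr verbs
revenue_by_item <- sales %>%
  rename(unit_price = price) %>%            # rename a column
  filter(units >= 4) %>%                    # keep rows with at least 4 units
  mutate(revenue = units * unit_price) %>%  # add a computed column
  group_by(item) %>%                        # one group per item
  summarize(total_revenue = sum(revenue), .groups = "drop") %>%
  arrange(desc(total_revenue))              # largest revenue first

# Lecture 5 style: join two tables, then reshape from long to wide
units_wide <- sales %>%
  left_join(stores, by = "store_id") %>%    # attach the city to each sale
  select(city, item, units) %>%
  pivot_wider(names_from = item, values_from = units)

# ... and back from wide to long
units_long <- units_wide %>%
  pivot_longer(cols = -city, names_to = "item", values_to = "units")

print(revenue_by_item)
print(units_wide)
print(units_long)
```

In the wide table each item becomes its own column, so a store that sold no coffee gets a missing value (NA) in that column, and the NA is carried back when the table is made long again.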

Outcomes

  • Students learn data analysis and how to communicate their analyses to relevant stakeholders.
  • Students can code in R and use some of its most powerful packages to conduct and publish their analyses.
  • Students can use basic functions of GitHub and GitHub Pages (GitHub Actions is next).
  • Students are introduced to reproducibility.
  • Students have a portfolio page (Progress Journal) to showcase in the future (e.g. in job applications).