In this case study you will explore the 2017 university entrance examination (YGS/LYS) data. Work with your project group. (If you don’t have a group, form one according to the project guidelines.) Post your RMarkdown HTML output files to the group progress journals.

Brief: Suppose MEF University management asks you to examine the data and provide insights that help them understand MEF University’s place among its competitors and in the undergraduate market. Our technical team cleaned up the data for you as best they could (you can check the raw data from here). The data is provided with the commands below, followed by the necessary information about it. State your code and process explicitly and communicate them clearly. Assume management knows a bit of R and would like to reproduce your work in case there is any problem with the calculations. The university is not interested in universities abroad (IDs that start with 3 or 4).

# The Data

# First set your working directory
setwd("~/myBDAgroup/case_1/")
# Download from GitHub (do it only once)
download.file("…", "osym_data_2017.RData")
# Install tidyverse if not already installed
if (!("tidyverse" %in% rownames(installed.packages()))) {
  install.packages("tidyverse", repos = "https://cran.r-project.org")
}
# Load tidyverse package
library(tidyverse)
# Load the data
load("osym_data_2017.RData")

Now that we have loaded the data, let’s look at it. The data consists of undergraduate programs offered in 2017. Each program has an availability (i.e. a quota). Students are placed according to their preference lists and their scores: each program is filled with students in descending order of score until the number of placements equals the availability. The highest-scoring student placed in a program sets that program’s maximum score, and the last student placed sets its minimum score.
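The placement rule described above can be illustrated with a toy example. This is only a minimal sketch: the applicant scores and the quota are hypothetical values made up for illustration, and the real process also depends on each student’s preference list, which is ignored here.

```r
# Hypothetical applicant scores for a single program (not taken from the data set)
applicant_scores <- c(412.5, 388.1, 450.2, 301.7, 395.0, 366.4)
quota <- 4  # hypothetical general quota

# Rank applicants from highest to lowest score and fill seats until the quota is reached
placed <- head(sort(applicant_scores, decreasing = TRUE), quota)

max_score <- max(placed)  # score of the best placed student
min_score <- min(placed)  # score of the last placed student

placed
max_score
min_score
```

Under these assumptions, the two lowest-scoring applicants are left out, and the program’s minimum score is the score of the fourth-ranked applicant.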

Valedictorians (i.e. the best student of each high school) are sometimes considered separately. However, valedictorian quota can be transferred to the general quota if fewer valedictorians apply for a program than its valedictorian quota allows. (Hence, for some programs, general_placement > general_quota.)
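To make the quota transfer concrete, the sketch below flags programs whose general placement exceeds their general quota. The four rows are the first four values of each column as printed by glimpse() in this document; the helper column names extra_general_seats and unused_val_seats are introduced here only for illustration.

```r
library(dplyr)

# First four programs, copied from the glimpse() output of osym_data_2017
sample_programs <- data.frame(
  program_id        = c(100110266L, 100110487L, 100110724L, 100130252L),
  general_quota     = c(150L, 60L, 60L, 60L),
  general_placement = c(150L, 60L, 62L, 26L),
  val_quota         = c(4, 2, 2, 2),
  val_placement     = c(4, 2, 0, 0)
)

# Keep programs that placed more students than their general quota and
# compare the surplus with the valedictorian seats that were left unused
over_quota <- sample_programs %>%
  mutate(
    extra_general_seats = general_placement - general_quota,
    unused_val_seats    = val_quota - val_placement
  ) %>%
  filter(general_placement > general_quota)

over_quota
```

In this small sample only program 100110724 is over quota, and its surplus equals its unused valedictorian seats. Whether that equality holds for every program in the full data is something to check rather than assume.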

Now let’s see the data in detail.

glimpse(osym_data_2017)
## Observations: 11,031
## Variables: 14
## $ program_id        <int> 100110266, 100110487, 100110724, 100130252, ...
## $ university_name   <chr> "ABANT İZZET BAYSAL ÜNİVERSİTESİ", "ABANT İZ...
## $ city              <chr> "BOLU", "BOLU", "BOLU", "BOLU", "BOLU", "BOL...
## $ faculty_name      <chr> "Bolu Sağlık Yüksekokulu", "Bolu Turizm İşle...
## $ program_name      <chr> "Hemşirelik", "Gastronomi ve Mutfak Sanatlar...
## $ exam_type         <chr> "YGS_2", "YGS_4", "YGS_6", "YGS_6", "MF_3", ...
## $ general_quota     <int> 150, 60, 60, 60, 80, 1, 40, 60, 60, 80, 60, ...
## $ general_placement <int> 150, 60, 62, 26, 80, 1, 9, 62, 60, 81, 60, 7...
## $ min_score         <dbl> 328.8790, 346.4491, 225.7170, 199.2710, 446....
## $ max_score         <dbl> 376.3817, 388.3141, 290.2683, 234.9510, 451....
## $ val_quota         <dbl> 4, 2, 2, 2, 2, 0, 1, 2, 2, 2, 2, 2, 2, 3, 2,...
## $ val_placement     <dbl> 4, 2, 0, 0, 2, 0, 0, 0, 2, 1, 2, 2, 2, 3, 1,...
## $ val_min_score     <dbl> 312.8462, 293.6994, 180.0000, 180.0000, 437....
## $ val_max_score     <dbl> 328.0626, 328.7560, 180.0000, 180.0000, 442....
• program_id is the ID of the program. First 4 digits belong to the ID of the university. First digit defines the type of the university (1: State, 2: Private/Foundation, 3: Northern Cyprus, 4: Abroad).
• university_name is the name of the university.
• city is the city of the university. For Northern Cyprus and abroad universities, it might also include the country.
• faculty_name is the name of the faculty.
• program_name is the name of the program. The program name may include other information (İÖ: İkinci Öğretim, i.e. evening education; MTOK: Mesleki ve Teknik Ortaöğretim Kurumları, i.e. occupational and technical high schools; scholarship ratios; and language of education). There might be typos and other irregularities.
• exam_type is the type of the set of exams students should take. MF (matematik/fen) stands for math/science, TM (türkçe/matematik) stands for Turkish/math, TS (türkçe/sosyal) stands for Turkish/social sciences, and DIL stands for language studies. The numbers are subcategories. For instance, both medicine (Tıp) and engineering belong to MF, but medicine programs are MF_3 and engineering programs are MF_4. Similar programs tend to be in the same subgroup, though there are many exceptions.
• general_quota is the quota for all students.
• general_placement is the number of students placed from the general quota.
• min_score is the score of the last student placed in this program. Scores cannot be less than 180. If a program’s placement is 0, min_score is set to 180.
• max_score is the score of the best student placed in this program. Scores cannot be less than 180. If a program’s placement is 0, max_score is set to 180.
• The val_ columns (val_quota, val_placement, val_min_score, val_max_score) are the same as their general counterparts, but only for valedictorians.
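The program_id encoding described in the list above can be used to drop the universities that management is not interested in. The following is a minimal sketch: the first four IDs come from the glimpse() output, the last three IDs are hypothetical and added only so that every university type appears, and the column names university_id, type_digit and university_type are introduced here for illustration.

```r
library(dplyr)

sample_programs <- data.frame(
  program_id = c(
    100110266L, 100110487L, 100110724L, 100130252L,  # from the glimpse() output
    200110001L, 300110001L, 400110001L               # hypothetical IDs, illustration only
  )
)

# Labels for the first digit, as described in the program_id bullet
type_labels <- c("1" = "State", "2" = "Private/Foundation",
                 "3" = "Northern Cyprus", "4" = "Abroad")

decoded <- sample_programs %>%
  mutate(
    id_chr          = as.character(program_id),       # program_id is stored as an integer
    university_id   = substr(id_chr, 1, 4),           # first 4 digits: university
    type_digit      = substr(id_chr, 1, 1),           # first digit: university type
    university_type = unname(type_labels[type_digit])
  )

# Keep only state and private/foundation universities (IDs starting with 1 or 2)
domestic <- decoded %>%
  filter(type_digit %in% c("1", "2"))

domestic
```

On the real data the same mutate() and filter() steps could be applied to osym_data_2017 directly. Converting the ID to character first avoids relying on the IDs always having the same number of digits.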