Here is a first exploratory analysis of the competition dataset. We are provided with a list of real estate properties in three California counties (Los Angeles, Orange and Ventura), with data from 2016.
This is an early Exploratory Data Analysis for the Personalized Medicine: Redefining Cancer Treatment challenge. I will be using ggplot2 and the tidyverse tools to study and visualise the structures in the data.
Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges. However, it is mostly used in classification problems. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate.
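To make the "points in n-dimensional space" idea concrete, here is a minimal sketch of a linear SVM in plain Python. Each data item is a tuple of feature values, so a tuple of length n is one point in n-dimensional space. The excerpt above stops before describing the next step. This sketch assumes the usual one: finding a separating hyperplane (weights w and bias b) with a wide margin. The toy data, the function names train_linear_svm and predict, the hyperparameters, and the choice of batch subgradient descent on the regularised hinge loss are all illustrative assumptions, not something taken from the text, and production libraries typically use more sophisticated solvers.

```python
# Minimal linear SVM sketch in pure Python (no external libraries).
# Assumptions, not from the text: the toy data, the hyperparameters,
# and the subgradient-descent training loop are illustrative choices.

def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=2000):
    """Fit a hyperplane (w, b) by batch subgradient descent on
    lam/2 * ||w||^2 + mean(hinge loss). Labels must be +1 or -1."""
    n_features = len(points[0])   # n = number of coordinates per point
    m = len(points)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified
                for j in range(n_features):
                    grad_w[j] -= y * x[j] / m
                grad_b -= y / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify a point by which side of the hyperplane it falls on."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy data: two features per item, so n = 2.
POINTS = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0),
          (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
LABELS = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(POINTS, LABELS)
for point in [(0.0, 0.0), (8.0, 8.0)]:
    print(point, "->", predict(w, b, point))
```

The same code works unchanged for any number of features, since each point is just a tuple of coordinates. Only the length of w changes.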
Taxis are plentiful and convenient in New York City, but the city is also served by a wide network of commuter bicycles (Citi Bikes). If you need to get from, say, the West Village to the Garment District, are you better off time-wise hailing a cab, or heading over to the nearest Citi Bike station? Data scientist Todd W. Schneider crunched the numbers on travel times for both taxis and Citi Bikes to figure out which was better. Neither is universally better: for some trips taxis are most often faster, and for others bikes are. An interactive map (created with R) allows you to select the time of day and an origin neighborhood, and the map will then tell you the fraction of the time (according to the historical data) that a Citi Bike will outpace a taxi.
This book describes the process of analyzing data. The authors have extensive experience both managing data analysts and conducting their own data analyses, and this book is a distillation of their experience in a format that is applicable to both practitioners and managers in data science. Printed copies are available through Lulu.