About Me

I plan to use my data science skills on platforms such as Kaggle, or to apply my AI skills in a job. For this purpose, I am currently working through DataCamp tutorials to reach a solid level in data science. Besides, I regularly attend the ‘Sarıyer Akademi AI lessons’ in the Yaşar Kemal building.

Summary of ‘Tidy forecasting in R’

In the video, a forecast is compared to a fable: it is not true, but it tells us something about the future or about reality. In summary, all of the forecasting models are included in the fable package. All fable models produce mable-class objects. Forecasting works on mable objects and produces fable objects. The fable package could replace the hts package, and it simplifies the model development process. This makes R simpler to use and avoids the harsher parts of its interfaces. In particular, transformations and back-transformations are handled easily in this package.

video link

GitHub Documents on Using the Fable Package

GitHub is a good place for coders to improve their skills or learn new tools for their programming language. On this site, you can see how the fable package is used in R, along with other examples. Time series are the main concept behind the fable package, as you can see in the files. ARIMA modelling is one of the tools in the fable package, and all objects are stored in tidy format.

github link

AI in R

There are several machine learning algorithms, and every well-known algorithm has a place in the R programming language. For example, the k-nearest neighbors (kNN) algorithm computes Euclidean distances between points and uses them to separate the points into regions. Logistic regression is a regression by name, but it models the outcome as 1 or 0, so it is used to predict binary outcomes in a data set. There are more AI examples in the R language. These are:

  1. Linear Regression
  2. Logistic Regression
  3. Decision Tree
  4. SVM
  5. Naive Bayes
  6. kNN
  7. K-Means
  8. Random Forest
  9. Dimensionality Reduction Algorithms
  10. Gradient Boosting algorithms:
    • GBM
    • XGBoost
    • LightGBM
    • CatBoost

1. Linear Regression

It is used to estimate real values (cost of houses, number of calls, total sales, etc.) based on continuous variable(s). Here, we establish a relationship between the independent and dependent variables by fitting a best-fit line. This best-fit line is known as the regression line and is represented by the linear equation Y = a * X + b.
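As a rough illustration of fitting the line Y = a * X + b, here is a minimal least-squares sketch. It is written in plain Python (not R) only to keep it dependency-free; the function name `fit_line` and the sample points are made up for this example.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line Y = a * X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels out)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up points that lie exactly on Y = 2 * X + 1
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

With noisy real data the points would not sit exactly on the line, and `a` and `b` would be the values that minimise the squared vertical distances.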

2. Logistic Regression

Despite its name, it is a classification algorithm, not a regression algorithm. It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variable(s). In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function. Hence, it is also known as logit regression. Since it predicts a probability, its output values lie between 0 and 1 (as expected).
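To show why the output always lies between 0 and 1, here is a small sketch of the prediction step only, in plain Python rather than R. It applies the logistic (inverse logit) function to a * x + b; the coefficients `a` and `b` are hypothetical values chosen for illustration, not fitted to any data.

```python
import math


def predict_probability(x, a, b):
    """Probability of the event for input x: logistic function of a * x + b."""
    z = a * x + b
    return 1.0 / (1.0 + math.exp(-z))


# a and b are hypothetical coefficients, not estimated from a data set
for x in (-5.0, 0.0, 5.0):
    p = predict_probability(x, a=1.0, b=0.0)
    label = 1 if p >= 0.5 else 0  # turn the probability into a 0/1 class
    print(x, round(p, 3), label)
```

The 0.5 cut-off used to turn the probability into a class is a common default, and it can be moved depending on the problem.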

3. Decision Tree

It is a type of supervised learning algorithm that is mostly used for classification problems. Surprisingly, it works for both categorical and continuous dependent variables. In this algorithm, we split the population into two or more homogeneous sets. The split is based on the most significant attributes (independent variables) so that the groups are as distinct as possible.
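The idea of splitting into homogeneous sets can be sketched for a single numeric attribute. This is a plain Python sketch (not R), and it assumes Gini impurity as the measure of homogeneity, which is one common choice and not something stated above. The names `gini` and `best_split` and the data are made up for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a group: 0.0 means perfectly homogeneous."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split.

    Returns None if all values are identical and no split is possible.
    """
    best = None
    n = len(values)
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Made-up data: one numeric attribute and a yes/no label
ages = [22, 25, 30, 41, 47, 52]
buys = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, buys)
print(threshold, impurity)
```

A full tree would repeat this search over every attribute and then again inside each resulting group.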

4. SVM (Support Vector Machine)

It is a classification method. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate.

5. Naive Bayes

It is a classification technique based on Bayes’ theorem with an assumption of independence between predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, a naive Bayes classifier would consider all of these properties to independently contribute to the probability that this fruit is an apple.
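The independence assumption can be sketched with simple counting. The example below is plain Python (not R), uses a tiny made-up fruit table, and leaves out smoothing for unseen values, so it is only an illustration of the idea. The names `train` and `predict` are hypothetical.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
    return class_counts, feature_counts


def predict(row, class_counts, feature_counts):
    """Pick the class with the largest prior times per-feature likelihoods."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Each feature contributes independently: P(value | class)
            score *= feature_counts[(label, i)][value] / count
        scores[label] = score
    return max(scores, key=scores.get), scores


# Made-up training data: (color, shape) -> fruit
rows = [("red", "round"), ("red", "round"), ("green", "round"),
        ("yellow", "long"), ("yellow", "long"), ("green", "long")]
labels = ["apple", "apple", "apple", "banana", "banana", "banana"]

model = train(rows, labels)
label, scores = predict(("red", "round"), *model)
print(label, scores)
```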

6. kNN (k-Nearest Neighbors)

It can be used for both classification and regression problems. However, it is more widely used for classification problems in industry. K-nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of their k neighbors. A new case is assigned to the class that is most common among its k nearest neighbors, as measured by a distance function.
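The majority vote described above can be sketched in a few lines. This is plain Python (not R), it assumes Euclidean distance as the distance function, and the function name `knn_classify` and the points are made up for illustration.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Classify query by a majority vote among its k nearest training points."""
    # Pair each stored case with its Euclidean distance to the query
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up training cases in two dimensions
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(points, labels, (2, 2)))
print(knn_classify(points, labels, (6, 5)))
```

Note that nothing is fitted in advance: all the work happens when a new case is classified.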

7. K-Means

It is a type of unsupervised algorithm that solves the clustering problem. Its procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters). Data points inside a cluster are homogeneous, and they are heterogeneous with respect to the other clusters.
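One common way to carry out this procedure is to alternate between assigning points to the nearest centre and moving each centre to the mean of its points. The sketch below is plain Python (not R); the starting centroids are passed in by hand so the result is reproducible, and the function name `k_means` and the data are assumptions made for the example.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and updating centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up points forming two obvious groups, with k = 2 starting centroids
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

In practice the starting centroids are usually chosen at random, so different runs can give different clusters.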

8. Random Forest

Random Forest is a trademarked term for an ensemble of decision trees. In a Random Forest, we have a collection of decision trees (hence the name “forest”). To classify a new object based on its attributes, each tree gives a classification, and we say the tree “votes” for that class. The forest chooses the classification with the most votes (over all the trees in the forest).
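The voting step can be sketched on its own. In the plain Python example below (not R), the three "trees" are hand-written stand-ins with hypothetical rules, not trained decision trees, so only the way the forest combines votes is illustrated.

```python
from collections import Counter


# Hand-written stand-ins for trained trees (hypothetical rules)
def tree_1(fruit):
    return "apple" if fruit["color"] == "red" else "other"


def tree_2(fruit):
    return "apple" if fruit["shape"] == "round" else "other"


def tree_3(fruit):
    return "apple" if fruit["diameter_in"] < 2 else "other"


def forest_predict(trees, item):
    """Let every tree vote and return the class with the most votes."""
    votes = Counter(tree(item) for tree in trees)
    return votes.most_common(1)[0][0], votes


item = {"color": "red", "shape": "round", "diameter_in": 3}
prediction, votes = forest_predict([tree_1, tree_2, tree_3], item)
print(prediction, dict(votes))
```

Here one tree disagrees, but the forest still answers "apple" because two of the three votes go to that class.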

9. Dimensionality Reduction Algorithms

In the last 4-5 years, there has been an exponential increase in data capture at every possible stage. Corporations, government agencies, and research organisations are not only coming up with new data sources but are also capturing data in great detail.

These are the main algorithms that could be explained.

link