1) Needed Libraries

library(ggthemes)
library(Hmisc)
library(gridExtra)
library(GGally)
library(tibble)
library(dplyr)
library(tidyr)
library(reshape2)
library(tidyverse)
library(ggplot2)
library(corrgram)
library(corrplot)
library(date)
library(readxl)
library(knitr)

Setting theme

theme_set(theme_economist())

2) Data Preparations

In this project, we are analyzing the BIST30 members as of November 2018. The data set consist of last price ad volume data of equities listed in BIST30 Index from January 2011 to November 2018. Firstly, we have examined the data structure and, made some little transformation in the columns of data frame.

tmp<-tempfile(fileext=".xlsx")
download.file("https://github.com/MEF-BDA503/gpj18-data-jugglers/blob/master/XU030_v2.xlsx?raw=true",destfile=tmp,mode='wb')

bist30=read_xlsx(tmp)
str(bist30)

3) Beta Analysis

Normalization: Before starting beta analysis, we should normalize the data as we take date 01.01.2011 equals to 100 for all stocks’ price.

tmp<-tempfile(fileext=".xlsx")
download.file("https://github.com/MEF-BDA503/gpj18-data-jugglers/blob/master/XU030_v2.xlsx?raw=true",destfile=tmp,mode='wb')
bist30=read_xlsx(tmp)

bist30 <- cbind(bist30, apply(bist30[seq(2,60,2)], 2, function (a) a / a[[1]] * 100))

colnames(bist30)[62:91] <- paste(colnames(bist30[seq(2,60,2)]), "norm", sep = "_")

head(bist30[62:90],10)

We will divide the data into two groups, financial sector companies, reel sector companies. Then we will compare two groups according to the annual yields, and illustrate them by graphics for each year.

4) Correlation and Regression Analysis of Volume and Price

1) Correlation Tables for Normalized and Raw Data

Corrgram is a good source for correlation table, we can see the desired relations between two values in table by looking the

bistCorr30 <- bist30[2:60]
corrValuesTable <- cor(bist30[2:60],bist30[2:60])
corrgram(corrValuesTable,order=TRUE, lower.panel=panel.shade,
  upper.panel=panel.pie, text.panel=panel.txt,
  main="Correlation Data in Volume-Price Order")

Another correlation table for variables is provided by corrplot package. This is the visualization of it

corrplot(corrValuesTable, type = "upper", tl.pos = "td",
         method = "circle", tl.cex = 0.5, tl.col = 'black',
         order = "hclust", diag = FALSE)

Another correlation relation could be seen between normalized prices as below.

corrValuesPrices <- cor(bist30[62:90],bist30[62:90])
corrgram(corrValuesPrices,order=TRUE, lower.panel=panel.shade,
  upper.panel=panel.pie, text.panel=panel.txt,
  main="Correlation Data in Price Order")

And corrplot version of it

corrplot(corrValuesPrices, type = "upper", tl.pos = "td",
         method = "circle", tl.cex = 0.5, tl.col = 'black',
         order = "hclust", diag = FALSE)

2) The Correlation Values between Volume and Price

We should investigate that is there any meaningful connection between volume values and price values. If there is, we can assuma some predictions about company stock’s volumes or prices from another.

First we need company names vector for table

companyNames <- c('GARAN','SAHOL','BIMAS','ARCLK','EREGL','YKBNK','TOASO','TUPRS','KRDMD','TTRAK','ASELS','IPEKE','KOZAA','ASYAB','ENKAI','KOZAL','DOHOL','IHLAS','KCHOL','EKGYO','TCELL','MGROS','SISE','ISCTR','VAKBN','AKBNK','PETKM','THYAO','TTKOM','HALKB')

After that, we can apply correlatio between all the prices and volumes.

df <- bist30
corGaran <- cor(df$GARAN_Price,df$GARAN_Volume)
corSahol <- cor(df$SAHOL_Price,df$SAHOL_Volume)
corBimas <- cor(df$BIMAS_Price,df$BIMAS_Volume)
corArclk <- cor(df$ARCLK_Price,df$ARCLK_Volume)
corEregl <- cor(df$EREGL_Price,df$EREGL_Volume)
corYkbnk <- cor(df$`YKBNK _Price`,df$`YKBNK _Volume`)
corToaso <- cor(df$TOASO_Price,df$TOASO_Volume)
corTuprs <- cor(df$TUPRS_Price,df$TUPRS_Volume)
corKrdmd <- cor(df$KRDMD_Price,df$KRDMD_Volume)
corTtrak <- cor(df$TTRAK_Price,df$TTKOM_Volume)
corAsels <- cor(df$ASELS_Price,df$ASELS_Volume)
corIpeke <- cor(df$IPEKE_Price,df$IPEKE_Volume)
corKozaa <- cor(df$KOZAA_Price,df$KOZAA_Volume)
corAsyab <- cor(df$ASYAB_Price,df$ASYAB_Volume)
corEnkai <- cor(df$ENKAI_Price,df$ENKAI_Volume)
corKozal <- cor(df$KOZAL_Price,df$KOZAL_Volume)
corDohol <- cor(df$DOHOL_Price,df$DOHOL_Volume)
corIhlas <- cor(df$IHLAS_Price,df$IHLAS_Volume)
corKchol <- cor(df$KCHOL_Price,df$KCHOL_Volume)
corEgkyo <- cor(df$EKGYO_Price,df$EKGYO_Volume)
corTcell <- cor(df$TCELL_Price,df$TCELL_Volume)
corMgros <- cor(df$MGROS_Price,df$MGROS_Volume)
corSise <- cor(df$SISE_Price,df$SISE_Volume)
corIsctr <- cor(df$ISCTR_Price,df$ISCTR_Volume)
corVakbn <- cor(df$VAKBN_Price,df$VAKBN_Volume)
corAkbnk <- cor(df$AKBNK_Price,df$AKBNK_Volume)
corPetkm <- cor(df$PETKM_Price,df$PETKM_Volume)
corThyao <- cor(df$THYAO_Price,df$THYAO_Volume)
corTtkom <- cor(df$TTKOM_Price,df$TTKOM_Volume)
corHalkb <- cor(df$HALKB_Price,df$HALKB_Volume)

corVec <- c(corGaran,corSahol,corBimas,corArclk,corEregl,corYkbnk,corToaso,corTuprs,corKrdmd,corTtrak,corAsels,corIpeke,corKozaa,corAsyab,corEnkai,corKozal,corDohol,corIhlas,corKchol,corEgkyo,corTcell,corMgros,corSise,corIsctr,corVakbn,corAkbnk,corPetkm,corThyao,corTtkom,corHalkb )
corrTable <- melt(data.frame(companyNames ,corVec))
## Using companyNames as id variables
corrTable$variable <- NULL
corrTable

CorVec variable keeps the correlation values between prices and volumes . There are negative and positive values. The absolute value of correlation means the relationship of these two variables are more predictable and more meaningfull for future actions. Thus, investigating these correlation values could help for regression values to seek which stocks are the best for prediction.

corrTable %>% arrange(desc(value)) %>%
ggplot( aes(x=companyNames, y=value)) + 
  geom_bar(stat="identity", width=.5, fill="tomato3") + 
  labs(title="Ordered Bar Chart", 
       subtitle="Correlation Values", 
       caption="Correlation graph") + 
  theme(axis.text.x = element_text(angle=65, vjust=0.6))

Arranging to absolute values

corrTable %>% arrange(desc(abs(value)))

3) General Template for Highly Correlated Values

ggpairs(df, 
        columns = c("KRDMD_Price","KRDMD_Volume","ASELS_Price","ASELS_Volume","ASYAB_Price","ASYAB_Volume","TTKOM_Price","TTKOM_Volume","HALKB_Price","HALKB_Volume"), 
        upper = list(continuous = wrap("cor", 
                                       size = 10)), 
        lower = list(continuous = "smooth"))

4) Highly Correlated Price Values

First, we need to see the relation of the price and volumes for each column.

flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)
  data.frame(
    row = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor  =(cormat)[ut],
    p = pmat[ut]
    )
}
corrData<-rcorr(as.matrix(df[,2:60]))
generalCorData <- flattenCorrMatrix(corrData$r, corrData$P)
generalCorData

generalCorData$Names <- paste(generalCorData$row,generalCorData$column,sep= "&")

Comparison of the results between each column correlation values to related price-volume correlation values

generalCorData %>% arrange(desc(abs(cor))) %>%
  filter(cor > 0.95) %>%
ggplot( aes(x=Names, y=cor)) + 
  geom_bar(stat="identity", width=.5, fill="tomato3") + 
  labs(title="Ordered Bar Chart", 
       subtitle="Correlation Values", 
       caption="Correlation graph") + 
  theme(axis.text.x = element_text(angle=65, vjust=0.6))

generalCorData %>% arrange(desc(abs(cor))) %>%
  filter(cor > 0.95) 

Also correlation with negative high values are preferable to include in portfolio

generalCorData %>% arrange(desc(cor)) %>%
  filter(cor < -0.72) %>%
ggplot( aes(x=Names, y=cor)) + 
  geom_bar(stat="identity", width=.5, fill="tomato3") + 
  labs(title="Ordered Bar Chart", 
       subtitle="Correlation Values", 
       caption="Correlation graph") + 
  theme(axis.text.x = element_text(angle=65, vjust=0.6))

Let’s draw these relations with ggplot library

Result: So there is a highly correlation between some prices. Therefore, if there is relation, we can examine the predictions on these values.

ggpairs(df, 
        columns = c("ASYAB_Price","KCHOL_Price","BIMAS_Price","TTRAK_Price","TOASO_Price","IHLAS_Price"), 
        upper = list(continuous = wrap("cor", 
                                       size = 10)), 
        lower = list(continuous = "smooth"))

And the graph of negative correlation values.

ggpairs(df, 
        columns = c("ASELS_Price","PETKM_Price","EREGL_Price","SISE_Price","GARAN_Price","AKBNK_Price","TOASO_Price","KCHOL_Price","TUPRS_Price"), 
        upper = list(continuous = wrap("cor", 
                                       size = 10)), 
        lower = list(continuous = "smooth"))

5) Linear Models

1) Linear Regression Model for Mostly Correlated Volume-Price Values

We can analyze the first five company which have the highest correlation volume-price relation values of all.Thus, if we look at the correlation table of volume-price relation, we can see that:

KRDMD is the first one

ggplot(df, aes(df$KRDMD_Price, df$KRDMD_Volume)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

ASELS is the second one

ggplot(df, aes(df$ASELS_Price, df$ASELS_Volume)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

ASYAB is the third one

ggplot(df, aes(df$ASYAB_Price, df$ASYAB_Volume)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

TTKOM is the forth one

ggplot(df, aes(df$TTKOM_Price, df$TTKOM_Volume)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

HALKB is the fifth one

ggplot(df, aes(df$HALKB_Price, df$HALKB_Volume)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

2) Linear Regression Charts for These Highly Correlated Prices

TOASO & KCHOL is the first one

ggplot(df, aes(df$TOASO_Price, df$KCHOL_Price)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

TUPRS & ASELS is the second one

ggplot(df, aes(df$TUPRS_Price, df$ASELS_Price)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

ASELS & PETKM is the third one

ggplot(df, aes(df$ASELS_Price, df$PETKM_Price)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

EREGL & SISE is the forth one

ggplot(df, aes(df$EREGL_Price, df$SISE_Price)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

GARAN & AKBNK is the fifth one

ggplot(df, aes(df$GARAN_Price, df$AKBNK_Price)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

3) Linear Regression Charts for These Highly Negative Correlated Prices

ASYAB & KCHOL is the first one

ggplot(df, aes(df$ASYAB_Price, df$KCHOL_Price)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

ASYAB & BIMAS is the second one

ggplot(df, aes(df$ASYAB_Price, df$BIMAS_Price)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

ASYAB & TTRAK is the third one

ggplot(df, aes(df$ASYAB_Price, df$TTRAK_Price)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

ASYAB & TOASO is the forth one

ggplot(df, aes(df$ASYAB_Price, df$TOASO_Price)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

TTRAK & IHLAS is the fifth one

ggplot(df, aes(df$TTRAK_Price, df$IHLAS_Price)) + 
  geom_count(color="darkred") + 
  geom_smooth(method="lm", se=F,color="blue")

4) Linear Models for These Highly Correlated Prices

With lm function we can see the relation between data as linear model

linearMod1_1 <- lm(df$TOASO_Price ~ df$KCHOL_Price, data=df)
linearMod1_2 <- lm(df$TUPRS_Price ~ df$ASELS_Price, data=df)
linearMod1_3 <- lm(df$ASELS_Price ~ df$PETKM_Price, data=df)
linearMod1_4 <- lm(df$EREGL_Price ~ df$SISE_Price, data=df)
linearMod1_5 <- lm(df$GARAN_Price ~ df$AKBNK_Price, data=df)
summary(linearMod1_1)
summary(linearMod1_2)
summary(linearMod1_3)
summary(linearMod1_4)
summary(linearMod1_5)

Formula Model 1: \(-7.17270 + 2.13500 * X\)

Adjusted R-squared: \(0.9331\)

Formula Model 2: \(36.75406 + 2.90503 * X\)

Adjusted R-squared: \(0.9238\)

Formula Model 3: \(-5.34984 + 5.29687 * X\)

Adjusted R-squared: \(0.9141\)

Formula Model 4: \(-2.34248 + 2.44400 * X\)

Adjusted R-squared: \(0.9108\)

Formula Model 5: \(-0.41338 + 1.08461 * X\)

Adjusted R-squared: \(0.9101\)

The Adjusted R-squared values are really high. That’s why, we can say that, the prediction made between prices are highly elective and measurable.

5) Linear Models for These Specific Stocks

And linear models for these stocks are

linearMod2_1<- lm(df$KRDMD_Price ~ df$KRDMD_Volume, data=df)
linearMod2_2 <- lm(df$ASELS_Price ~ df$ASELS_Volume, data=df)
linearMod2_3 <- lm(df$ASYAB_Price ~ df$ASYAB_Volume, data=df)
linearMod2_4 <- lm(df$TTKOM_Price ~ df$TTKOM_Volume, data=df)
linearMod2_5 <- lm(df$HALKB_Price ~ df$HALKB_Volume, data=df)
summary(linearMod2_1)
summary(linearMod2_2)
summary(linearMod2_3)
summary(linearMod2_4)
summary(linearMod2_5)

Formula Model 1: \(8.495e-01 + 1.624e-08 * X\)

Adjusted R-squared: \(0.4025\)

Formula Model 2: \(7.391e+00 + 5.014e-07 * X\)

Adjusted R-squared: \(0.2563\)

Formula Model 3: \(1.183e+00 + 2.132e-08 * X\)

Adjusted R-squared: \(0.1787\)

Formula Model 4: \(6.662e+00 + -2.370e-08 * X\)

Adjusted R-squared: \(0.1714\)

Formula Model 5: \(1.375e+01 + -6.544e-08 * X\)

Adjusted R-squared: \(0.1586\)

The adjusted R-squared values are really high for volume-price relationship of same companies’ stock. That’s why, we can say that, the volume and price predictions are not reliable and not greatly measurable.

6) K-Means Analysis

We have divided the dataset into two groups as financial sector companies and real sector companies. Now we will check whether this classification is true or not by using k-means.

#we calculate the annual average of stocks.
annual_average<- aggregate(bist30[,62:91], list(format(bist30$Dates, "%Y")), mean)
View(annual_average)

#get transpose of annual_average dataset. 
transpose <- as.matrix(annual_average[,-1])
rownames(transpose) <- annual_average[,1]
annual_average <- t(transpose)
View(annual_average)

#lets define wcss(Within cluster sum of squares) vector. 
wcss <- vector()

#lets find the distance to the center for each predefined center numbers.
for (i in 1:7) wcss[i] <- sum(kmeans(annual_average, i)$withinss)
View(wcss)

#lets determine the breakdown point of the wcss graph. 
plot(1:7, wcss, type="l")

#3 centers are the optimum number for centers.
kmeans(annual_average, 3)

6.1) Cluster Table

Structuring the table for kmeans.

c1 <- companyNames
c2 <- c(3,3,2,2,2,3,2,2,2,2,1,3,3,3,3,3,3,3,2,3,3,3,2,3,3,3,2,2,3,3)
clusterFrame <- melt(data.frame(c1,c2))
## Using c1 as id variables
clusterFrame$variable <- NULL
names(clusterFrame) <- c("Companies","Groups")
Kmeans Cluster Table
Companies Groups
GARAN 3
SAHOL 3
BIMAS 2
ARCLK 2
EREGL 2
YKBNK 3
TOASO 2
TUPRS 2
KRDMD 2
TTRAK 2
ASELS 1
IPEKE 3
KOZAA 3
ASYAB 3
ENKAI 3
KOZAL 3
DOHOL 3
IHLAS 3
KCHOL 2
EKGYO 3
TCELL 3
MGROS 3
SISE 2
ISCTR 3
VAKBN 3
AKBNK 3
PETKM 2
THYAO 2
TTKOM 3
HALKB 3
ggplot(clusterFrame, aes(x =Groups , y = seq(1,30,1), colour = Groups,fill = Groups)) + geom_text(label = clusterFrame$Companies) + 
  xlab(label = "Numbers of Groups") +
  ylab(label = "Seperation according to Sequence") 

6.2) Sum of Squares

\[\frac{betweenSS}{totalSS} = 87.5 \]

Consequently, we can assume Aselsan as an outlier, so the result of k-means match up with our first assumption (there are two groups in bist30 stock such as real sector and financial sector) with regards to group classification.

7) Comperasion of Company Stocks

7.1) Price Momentums

We can measure the change in momentums by looking last and first 3 months average differences. The differences between last and first can show us what the price momentum would be beneficial if we purchase these stocks at 2011.

df1 <- head(bist30,n= 90)
df2 <- tail(bist30,n= 90)

averageFirst <- df1[0:61] %>% summarise_at(names(df1[0:61]), mean, na.rm = TRUE) %>%
  select(matches("price")) 
averageLast <- df2[0:61] %>% summarise_at(names(df2[0:61]), mean, na.rm = TRUE) %>%
  select(matches("price"))

momentums <- (averageLast - averageFirst) / averageFirst
momentums

First, we should combine the company names and momentums

df_momentums <-  as.data.frame(t(momentums))
momentumTable <-melt(data.frame(companyNames ,df_momentums))
## Using companyNames as id variables
momentumTable$variable <- NULL
momentumTable$companyNames <- paste(momentumTable$companyNames,"Momentum",sep= "&")
momentumTable %>% arrange(desc(value))

Lets visualize this

momentumTable %>%
  arrange(desc(value)) %>%
  ggplot(data = ., aes(x = reorder(companyNames, value), y = value, 
    fill = companyNames, color = companyNames)) + geom_bar(stat = 'identity') + theme(axis.text.x = element_text(angle=65, vjust=0.6)) + coord_flip()

7.2) Possible Highest Income Measure

We can find the max and min values of prices and calculate the measure of it. This will show us the a possible income for investor how to reach a maxiumum profit.

maxVal <- bist30 %>% summarise_all("max") %>%
  select(matches("norm")) 

minVal <- bist30 %>% summarise_all("min") %>%
  select(matches("norm"))


maxmin <- (maxVal-minVal) / minVal
maxmin

Then, we can turn this into a usable table

df_maxmin <-  as.data.frame(t(maxmin))
maxminTable <-melt(data.frame(companyNames ,df_maxmin))
## Using companyNames as id variables
maxminTable$variable <- NULL
maxminTable$companyNames <- paste(maxminTable$companyNames,"MaxminRange",sep= "&")
maxminTable %>% arrange(desc(value))

Visualization of this

maxminTable %>%
  arrange(desc(value)) %>%
  ggplot(data = ., aes(x = reorder(companyNames, value), y = value, 
    fill = companyNames, color = companyNames)) + geom_bar(stat = 'identity') + theme(axis.text.x = element_text(angle=65, vjust=0.6)) + coord_flip()

7.3) Economic Volume of Companies

The geometric average of volume and price can show us the economic value of companies. Thus, we can calculate this value and compare the companies by this.

df_volume <- bist30[0:61] %>% summarise_at(names(bist30[0:61]), mean, na.rm = TRUE) %>%
  select(matches("Volume")) 

df_prices <- bist30[0:61]%>% summarise_at(names(bist30[0:61]), mean, na.rm = TRUE) %>%
  select(matches("Price")) 

volumes <- as.data.frame(t(df_volume))
prices <- as.data.frame(t(df_prices))
economicValue <- data.frame(companyNames,sqrt(volumes*prices))

Visualization of this data frame

economicValue %>%
  arrange(desc(V1)) %>%
  ggplot(data = ., aes(x = reorder(companyNames, V1), y = V1, 
    fill = companyNames, color = companyNames)) + geom_bar(stat = 'identity') + theme(axis.text.x = element_text(angle=65, vjust=0.6)) + coord_flip()

7.4) Correlation of Economic Value and Momentums

cor(economicValue,momentumTable)

Correlation: \(-0.09753649\)

The information about companies and their stocks’ purchase has a correlation nearly -0.1 which is too low to guess about it. That’s why, economic values of companies are not good parameter for price predictions.

8) General Comperasion of Two Sector Banks and Others

8.1) Basic Average

First of all, we select the 6 banks from banking sector

names(bist30)
bankStocks <- c("GARAN_Price_norm","VAKBN_Price_norm","AKBNK_Price_norm","YKBNK _Price_norm","ASYAB_Price_norm","HALKB_Price_norm","ISCTR_Price_norm")

Selecting from bist30 data table as made below

newbist30 <- bist30[,62:91]
selectedComp1 <- newbist30[,bankStocks]
selectedComp2 <- newbist30[ , !(names(newbist30) %in% bankStocks)]
totalComp <- newbist30 %>% select(matches("norm"))
selectedComp2

Afer that, transpozing the vales help to summarise the values.

selectedComp2 <-  as.data.frame(t(selectedComp2))
selectedComp1 <-  as.data.frame(t(selectedComp1))
totalComp <- as.data.frame(t(totalComp))
names(selectedComp2) <- bist30$Dates
names(selectedComp1) <- bist30$Dates
names(totalComp) <- bist30$Dates
selectedComp1

As you can see we summarise the values of normalized prices

meanVal1 <- selectedComp1 %>% summarise_all(mean) 
meanVal2 <- selectedComp2 %>% summarise_all(mean)
Totalmean <- totalComp %>% summarise_all(mean)

Then,naming and again transposing data to visualize in normal form

newValue1<- as.data.frame(t(meanVal1))
newValue2 <- as.data.frame(t(meanVal2))
totalValue <-  as.data.frame(t(Totalmean))
names(newValue1) <- "Prices1"
names(newValue2) <- "Prices2"
names(totalValue) <- "TotalPrices"

Merging this data tables helps us to generate a mean table for sector and general companies.

CompanyPrices <-data.frame(bist30$Dates,newValue1$Prices1,newValue2$Prices2,totalValue$TotalPrices)
names(CompanyPrices) <- c("Dates","bankSectorPrices","OtherPrices","TotalPrices")
CompanyPrices

The normalized prices of groups to the years

ggplot(CompanyPrices, aes(x = Dates)) + 
  geom_line(aes(y = bankSectorPrices, colour = "bankSectorPrices")) +
  geom_line(aes(y = OtherPrices, colour = "OtherPrices")) +
  geom_line(aes(y = TotalPrices, colour = "TotalPrices")) +
  xlab("Years") + ylab("Normalized Prices")

After all, banking sector is below of the other sectors by means of price values.

8.2) Weighted Average

prices <- bist30[0:61] %>%
  select(matches("price")) 
volumes <- bist30[0:61] %>%
  select(matches("Volume"))

priceVolumeProduct <- prices * volumes

Volume Sums

volumeSums <- volumes %>% summarise_at(names(volumes), mean, na.rm = TRUE) 
TotalVolume <- rowSums(volumeSums)

Weigthed Average

weigthedAverage <- priceVolumeProduct / TotalVolume
weigthedAverage

Changing the names of columns

colnames(weigthedAverage) <- paste(companyNames,"wAvg",sep = "_")

weigthedAverage

Grouping according to sectors.

bankStocksAvg <- c(24,14,25,26,30,1,6)

In the below,we selected banking sectors and found the sums of groups. After that, collected them into a WavgTable data frame.

mainGroup <- weigthedAverage[,bankStocksAvg]
otherGroup <- weigthedAverage[ , c(-24,-14,-25,-26,-30,-1,-6)]
totalGroup <- weigthedAverage %>% select(matches("wAvg"))

mainGroup$Avg <- rowSums(mainGroup)
otherGroup$Avg <- rowSums(otherGroup)
totalGroup$Avg <- rowSums(totalGroup)
WAvgTable <- data.frame(bist30$Dates,mainGroup$Avg,otherGroup$Avg,totalGroup$Avg )

We can visualize table as before

ggplot(WAvgTable, aes(x = bist30.Dates)) + 
  geom_line(aes(y = mainGroup.Avg, colour = "mainGroup.Avg")) +
  geom_line(aes(y = otherGroup.Avg, colour = "otherGroup.Avg")) +
  geom_line(aes(y = totalGroup.Avg, colour = "totalGroup.Avg")) +
  xlab("Years") + ylab("Weighted Prices")

9) Conclusion

  • Kmeans shows us which prices have highly momentum and which prices have lower momentum. Aselsan is the outlier because of its highly increase in price to the normalized value. Then, group 2 have high momenum then 3 group. We can deduce that, group 1 and group 2 stocks would be the best options for inverstors to get high income return.

  • ASELS and KRDMD have higher correlation with their volume which means they are related with their purchase willingleness of their stocks. We can assuma that, some inverstors predicted that their prices would get higher.

  • The price correlation shows that some investors can find relation between stocks which have highly momentum and which don’t require the highly but have correlation so that they predict others’ price changes. TUPRS & ASELS, ASELS & PETKM,EREGL & SISE,TOASO & KCHOL have higher momentum than average. EREGL has the a high momentum but SISE has not. However, some inverstor can predict SISE’s prices by EREGL and collect this stock in their portfolio.

  • The product of volume-price which is similary their geometric average could shows us the economic value of companies. For example, GARANTI BANK reached the most valuable BANK in 3. quarter in this year. This information shows the market share of companies similarly regarless of their price momentum. That’s why, investors use this information to predict price changes. However, If we look at

  • As it is mentioned, economic value of a company and the price changes are not correlated. That’s why, the information about these area not properly truthful for investment.

  • Another information is, TUPRAS A.S and other heavy industry companies have high economic value but stock market cannot show this value when we look at the geometric average, because there is a posibility that the rising companies market their values higher than normally high companies.

  • There are highly capable and successfull investors and other type investors. If higher investment capacity investors use their money in highly production companies, than TUPRAS result is not suprising. Another low capital investor just triggers the Volume and that’s why information about Garanti and others could be manuplative If we focus the price changes.

  • The banking sector is below the averages of other sectors such as heavy industry and similar company stocks. This is according to economic volume of companies and generally these banks belongs main companies which focuses on other sectors as KOC,SABANCI etc. These companies creates banks in order to supply their main job area. That’s why we cannot expect that the stocks of their banks could be higher than their heavy industry companies logically.

10) References

[1] - Correlation Graphs link

[2] - Correlation Graphs link

[3] - General Graphs link

[4] - Math formulas for R Markdown link

[5] - R Markdown Style link

[6] - Merging line plots link

[7] - Column drop options link

[8] - Summarising the total columns link

[9] - Row Summation link