library(ggthemes)
library(Hmisc)
library(gridExtra)
library(GGally)
library(tibble)
library(dplyr)
library(tidyr)
library(reshape2)
library(tidyverse)
library(ggplot2)
library(corrgram)
library(corrplot)
library(date)
library(readxl)
library(knitr)
Setting theme
theme_set(theme_economist())
In this project, we are analyzing the BIST30 members as of November 2018. The data set consist of last price ad volume data of equities listed in BIST30 Index from January 2011 to November 2018. Firstly, we have examined the data structure and, made some little transformation in the columns of data frame.
tmp<-tempfile(fileext=".xlsx")
download.file("https://github.com/MEF-BDA503/gpj18-data-jugglers/blob/master/XU030_v2.xlsx?raw=true",destfile=tmp,mode='wb')
bist30=read_xlsx(tmp)
str(bist30)
Normalization: Before starting beta analysis, we should normalize the data as we take date 01.01.2011 equals to 100 for all stocks’ price.
tmp<-tempfile(fileext=".xlsx")
download.file("https://github.com/MEF-BDA503/gpj18-data-jugglers/blob/master/XU030_v2.xlsx?raw=true",destfile=tmp,mode='wb')
bist30=read_xlsx(tmp)
bist30 <- cbind(bist30, apply(bist30[seq(2,60,2)], 2, function (a) a / a[[1]] * 100))
colnames(bist30)[62:91] <- paste(colnames(bist30[seq(2,60,2)]), "norm", sep = "_")
head(bist30[62:90],10)
We will divide the data into two groups, financial sector companies, reel sector companies. Then we will compare two groups according to the annual yields, and illustrate them by graphics for each year.
Corrgram is a good source for correlation table, we can see the desired relations between two values in table by looking the
bistCorr30 <- bist30[2:60]
corrValuesTable <- cor(bist30[2:60],bist30[2:60])
corrgram(corrValuesTable,order=TRUE, lower.panel=panel.shade,
upper.panel=panel.pie, text.panel=panel.txt,
main="Correlation Data in Volume-Price Order")
Another correlation table for variables is provided by corrplot package. This is the visualization of it
corrplot(corrValuesTable, type = "upper", tl.pos = "td",
method = "circle", tl.cex = 0.5, tl.col = 'black',
order = "hclust", diag = FALSE)
Another correlation relation could be seen between normalized prices as below.
corrValuesPrices <- cor(bist30[62:90],bist30[62:90])
corrgram(corrValuesPrices,order=TRUE, lower.panel=panel.shade,
upper.panel=panel.pie, text.panel=panel.txt,
main="Correlation Data in Price Order")
And corrplot version of it
corrplot(corrValuesPrices, type = "upper", tl.pos = "td",
method = "circle", tl.cex = 0.5, tl.col = 'black',
order = "hclust", diag = FALSE)
We should investigate that is there any meaningful connection between volume values and price values. If there is, we can assuma some predictions about company stock’s volumes or prices from another.
First we need company names vector for table
companyNames <- c('GARAN','SAHOL','BIMAS','ARCLK','EREGL','YKBNK','TOASO','TUPRS','KRDMD','TTRAK','ASELS','IPEKE','KOZAA','ASYAB','ENKAI','KOZAL','DOHOL','IHLAS','KCHOL','EKGYO','TCELL','MGROS','SISE','ISCTR','VAKBN','AKBNK','PETKM','THYAO','TTKOM','HALKB')
After that, we can apply correlatio between all the prices and volumes.
df <- bist30
corGaran <- cor(df$GARAN_Price,df$GARAN_Volume)
corSahol <- cor(df$SAHOL_Price,df$SAHOL_Volume)
corBimas <- cor(df$BIMAS_Price,df$BIMAS_Volume)
corArclk <- cor(df$ARCLK_Price,df$ARCLK_Volume)
corEregl <- cor(df$EREGL_Price,df$EREGL_Volume)
corYkbnk <- cor(df$`YKBNK _Price`,df$`YKBNK _Volume`)
corToaso <- cor(df$TOASO_Price,df$TOASO_Volume)
corTuprs <- cor(df$TUPRS_Price,df$TUPRS_Volume)
corKrdmd <- cor(df$KRDMD_Price,df$KRDMD_Volume)
corTtrak <- cor(df$TTRAK_Price,df$TTKOM_Volume)
corAsels <- cor(df$ASELS_Price,df$ASELS_Volume)
corIpeke <- cor(df$IPEKE_Price,df$IPEKE_Volume)
corKozaa <- cor(df$KOZAA_Price,df$KOZAA_Volume)
corAsyab <- cor(df$ASYAB_Price,df$ASYAB_Volume)
corEnkai <- cor(df$ENKAI_Price,df$ENKAI_Volume)
corKozal <- cor(df$KOZAL_Price,df$KOZAL_Volume)
corDohol <- cor(df$DOHOL_Price,df$DOHOL_Volume)
corIhlas <- cor(df$IHLAS_Price,df$IHLAS_Volume)
corKchol <- cor(df$KCHOL_Price,df$KCHOL_Volume)
corEgkyo <- cor(df$EKGYO_Price,df$EKGYO_Volume)
corTcell <- cor(df$TCELL_Price,df$TCELL_Volume)
corMgros <- cor(df$MGROS_Price,df$MGROS_Volume)
corSise <- cor(df$SISE_Price,df$SISE_Volume)
corIsctr <- cor(df$ISCTR_Price,df$ISCTR_Volume)
corVakbn <- cor(df$VAKBN_Price,df$VAKBN_Volume)
corAkbnk <- cor(df$AKBNK_Price,df$AKBNK_Volume)
corPetkm <- cor(df$PETKM_Price,df$PETKM_Volume)
corThyao <- cor(df$THYAO_Price,df$THYAO_Volume)
corTtkom <- cor(df$TTKOM_Price,df$TTKOM_Volume)
corHalkb <- cor(df$HALKB_Price,df$HALKB_Volume)
corVec <- c(corGaran,corSahol,corBimas,corArclk,corEregl,corYkbnk,corToaso,corTuprs,corKrdmd,corTtrak,corAsels,corIpeke,corKozaa,corAsyab,corEnkai,corKozal,corDohol,corIhlas,corKchol,corEgkyo,corTcell,corMgros,corSise,corIsctr,corVakbn,corAkbnk,corPetkm,corThyao,corTtkom,corHalkb )
corrTable <- melt(data.frame(companyNames ,corVec))
## Using companyNames as id variables
corrTable$variable <- NULL
corrTable
CorVec variable keeps the correlation values between prices and volumes . There are negative and positive values. The absolute value of correlation means the relationship of these two variables are more predictable and more meaningfull for future actions. Thus, investigating these correlation values could help for regression values to seek which stocks are the best for prediction.
corrTable %>% arrange(desc(value)) %>%
ggplot( aes(x=companyNames, y=value)) +
geom_bar(stat="identity", width=.5, fill="tomato3") +
labs(title="Ordered Bar Chart",
subtitle="Correlation Values",
caption="Correlation graph") +
theme(axis.text.x = element_text(angle=65, vjust=0.6))
Arranging to absolute values
corrTable %>% arrange(desc(abs(value)))
And linear models for these stocks are
linearMod2_1<- lm(df$KRDMD_Price ~ df$KRDMD_Volume, data=df)
linearMod2_2 <- lm(df$ASELS_Price ~ df$ASELS_Volume, data=df)
linearMod2_3 <- lm(df$ASYAB_Price ~ df$ASYAB_Volume, data=df)
linearMod2_4 <- lm(df$TTKOM_Price ~ df$TTKOM_Volume, data=df)
linearMod2_5 <- lm(df$HALKB_Price ~ df$HALKB_Volume, data=df)
summary(linearMod2_1)
summary(linearMod2_2)
summary(linearMod2_3)
summary(linearMod2_4)
summary(linearMod2_5)
Formula Model 1: \(8.495e-01 + 1.624e-08 * X\)
Adjusted R-squared: \(0.4025\)
Formula Model 2: \(7.391e+00 + 5.014e-07 * X\)
Adjusted R-squared: \(0.2563\)
Formula Model 3: \(1.183e+00 + 2.132e-08 * X\)
Adjusted R-squared: \(0.1787\)
Formula Model 4: \(6.662e+00 + -2.370e-08 * X\)
Adjusted R-squared: \(0.1714\)
Formula Model 5: \(1.375e+01 + -6.544e-08 * X\)
Adjusted R-squared: \(0.1586\)
The adjusted R-squared values are really high for volume-price relationship of same companies’ stock. That’s why, we can say that, the volume and price predictions are not reliable and not greatly measurable.
We have divided the dataset into two groups as financial sector companies and real sector companies. Now we will check whether this classification is true or not by using k-means.
#we calculate the annual average of stocks.
annual_average<- aggregate(bist30[,62:91], list(format(bist30$Dates, "%Y")), mean)
View(annual_average)
#get transpose of annual_average dataset.
transpose <- as.matrix(annual_average[,-1])
rownames(transpose) <- annual_average[,1]
annual_average <- t(transpose)
View(annual_average)
#lets define wcss(Within cluster sum of squares) vector.
wcss <- vector()
#lets find the distance to the center for each predefined center numbers.
for (i in 1:7) wcss[i] <- sum(kmeans(annual_average, i)$withinss)
View(wcss)
#lets determine the breakdown point of the wcss graph.
plot(1:7, wcss, type="l")
#3 centers are the optimum number for centers.
kmeans(annual_average, 3)
Structuring the table for kmeans.
c1 <- companyNames
c2 <- c(3,3,2,2,2,3,2,2,2,2,1,3,3,3,3,3,3,3,2,3,3,3,2,3,3,3,2,2,3,3)
clusterFrame <- melt(data.frame(c1,c2))
## Using c1 as id variables
clusterFrame$variable <- NULL
names(clusterFrame) <- c("Companies","Groups")
Companies | Groups |
---|---|
GARAN | 3 |
SAHOL | 3 |
BIMAS | 2 |
ARCLK | 2 |
EREGL | 2 |
YKBNK | 3 |
TOASO | 2 |
TUPRS | 2 |
KRDMD | 2 |
TTRAK | 2 |
ASELS | 1 |
IPEKE | 3 |
KOZAA | 3 |
ASYAB | 3 |
ENKAI | 3 |
KOZAL | 3 |
DOHOL | 3 |
IHLAS | 3 |
KCHOL | 2 |
EKGYO | 3 |
TCELL | 3 |
MGROS | 3 |
SISE | 2 |
ISCTR | 3 |
VAKBN | 3 |
AKBNK | 3 |
PETKM | 2 |
THYAO | 2 |
TTKOM | 3 |
HALKB | 3 |
ggplot(clusterFrame, aes(x =Groups , y = seq(1,30,1), colour = Groups,fill = Groups)) + geom_text(label = clusterFrame$Companies) +
xlab(label = "Numbers of Groups") +
ylab(label = "Seperation according to Sequence")
\[\frac{betweenSS}{totalSS} = 87.5 \]
Consequently, we can assume Aselsan as an outlier, so the result of k-means match up with our first assumption (there are two groups in bist30 stock such as real sector and financial sector) with regards to group classification.
We can measure the change in momentums by looking last and first 3 months average differences. The differences between last and first can show us what the price momentum would be beneficial if we purchase these stocks at 2011.
df1 <- head(bist30,n= 90)
df2 <- tail(bist30,n= 90)
averageFirst <- df1[0:61] %>% summarise_at(names(df1[0:61]), mean, na.rm = TRUE) %>%
select(matches("price"))
averageLast <- df2[0:61] %>% summarise_at(names(df2[0:61]), mean, na.rm = TRUE) %>%
select(matches("price"))
momentums <- (averageLast - averageFirst) / averageFirst
momentums
First, we should combine the company names and momentums
df_momentums <- as.data.frame(t(momentums))
momentumTable <-melt(data.frame(companyNames ,df_momentums))
## Using companyNames as id variables
momentumTable$variable <- NULL
momentumTable$companyNames <- paste(momentumTable$companyNames,"Momentum",sep= "&")
momentumTable %>% arrange(desc(value))
Lets visualize this
momentumTable %>%
arrange(desc(value)) %>%
ggplot(data = ., aes(x = reorder(companyNames, value), y = value,
fill = companyNames, color = companyNames)) + geom_bar(stat = 'identity') + theme(axis.text.x = element_text(angle=65, vjust=0.6)) + coord_flip()
We can find the max and min values of prices and calculate the measure of it. This will show us the a possible income for investor how to reach a maxiumum profit.
maxVal <- bist30 %>% summarise_all("max") %>%
select(matches("norm"))
minVal <- bist30 %>% summarise_all("min") %>%
select(matches("norm"))
maxmin <- (maxVal-minVal) / minVal
maxmin
Then, we can turn this into a usable table
df_maxmin <- as.data.frame(t(maxmin))
maxminTable <-melt(data.frame(companyNames ,df_maxmin))
## Using companyNames as id variables
maxminTable$variable <- NULL
maxminTable$companyNames <- paste(maxminTable$companyNames,"MaxminRange",sep= "&")
maxminTable %>% arrange(desc(value))
Visualization of this
maxminTable %>%
arrange(desc(value)) %>%
ggplot(data = ., aes(x = reorder(companyNames, value), y = value,
fill = companyNames, color = companyNames)) + geom_bar(stat = 'identity') + theme(axis.text.x = element_text(angle=65, vjust=0.6)) + coord_flip()
The geometric average of volume and price can show us the economic value of companies. Thus, we can calculate this value and compare the companies by this.
df_volume <- bist30[0:61] %>% summarise_at(names(bist30[0:61]), mean, na.rm = TRUE) %>%
select(matches("Volume"))
df_prices <- bist30[0:61]%>% summarise_at(names(bist30[0:61]), mean, na.rm = TRUE) %>%
select(matches("Price"))
volumes <- as.data.frame(t(df_volume))
prices <- as.data.frame(t(df_prices))
economicValue <- data.frame(companyNames,sqrt(volumes*prices))
Visualization of this data frame
economicValue %>%
arrange(desc(V1)) %>%
ggplot(data = ., aes(x = reorder(companyNames, V1), y = V1,
fill = companyNames, color = companyNames)) + geom_bar(stat = 'identity') + theme(axis.text.x = element_text(angle=65, vjust=0.6)) + coord_flip()
cor(economicValue,momentumTable)
Correlation: \(-0.09753649\)
The information about companies and their stocks’ purchase has a correlation nearly -0.1 which is too low to guess about it. That’s why, economic values of companies are not good parameter for price predictions.
First of all, we select the 6 banks from banking sector
names(bist30)
bankStocks <- c("GARAN_Price_norm","VAKBN_Price_norm","AKBNK_Price_norm","YKBNK _Price_norm","ASYAB_Price_norm","HALKB_Price_norm","ISCTR_Price_norm")
Selecting from bist30 data table as made below
newbist30 <- bist30[,62:91]
selectedComp1 <- newbist30[,bankStocks]
selectedComp2 <- newbist30[ , !(names(newbist30) %in% bankStocks)]
totalComp <- newbist30 %>% select(matches("norm"))
selectedComp2
Afer that, transpozing the vales help to summarise the values.
selectedComp2 <- as.data.frame(t(selectedComp2))
selectedComp1 <- as.data.frame(t(selectedComp1))
totalComp <- as.data.frame(t(totalComp))
names(selectedComp2) <- bist30$Dates
names(selectedComp1) <- bist30$Dates
names(totalComp) <- bist30$Dates
selectedComp1
As you can see we summarise the values of normalized prices
meanVal1 <- selectedComp1 %>% summarise_all(mean)
meanVal2 <- selectedComp2 %>% summarise_all(mean)
Totalmean <- totalComp %>% summarise_all(mean)
Then,naming and again transposing data to visualize in normal form
newValue1<- as.data.frame(t(meanVal1))
newValue2 <- as.data.frame(t(meanVal2))
totalValue <- as.data.frame(t(Totalmean))
names(newValue1) <- "Prices1"
names(newValue2) <- "Prices2"
names(totalValue) <- "TotalPrices"
Merging this data tables helps us to generate a mean table for sector and general companies.
CompanyPrices <-data.frame(bist30$Dates,newValue1$Prices1,newValue2$Prices2,totalValue$TotalPrices)
names(CompanyPrices) <- c("Dates","bankSectorPrices","OtherPrices","TotalPrices")
CompanyPrices
The normalized prices of groups to the years
ggplot(CompanyPrices, aes(x = Dates)) +
geom_line(aes(y = bankSectorPrices, colour = "bankSectorPrices")) +
geom_line(aes(y = OtherPrices, colour = "OtherPrices")) +
geom_line(aes(y = TotalPrices, colour = "TotalPrices")) +
xlab("Years") + ylab("Normalized Prices")
After all, banking sector is below of the other sectors by means of price values.
prices <- bist30[0:61] %>%
select(matches("price"))
volumes <- bist30[0:61] %>%
select(matches("Volume"))
priceVolumeProduct <- prices * volumes
Volume Sums
volumeSums <- volumes %>% summarise_at(names(volumes), mean, na.rm = TRUE)
TotalVolume <- rowSums(volumeSums)
Weigthed Average
weigthedAverage <- priceVolumeProduct / TotalVolume
weigthedAverage
Changing the names of columns
colnames(weigthedAverage) <- paste(companyNames,"wAvg",sep = "_")
weigthedAverage
Grouping according to sectors.
bankStocksAvg <- c(24,14,25,26,30,1,6)
In the below,we selected banking sectors and found the sums of groups. After that, collected them into a WavgTable data frame.
mainGroup <- weigthedAverage[,bankStocksAvg]
otherGroup <- weigthedAverage[ , c(-24,-14,-25,-26,-30,-1,-6)]
totalGroup <- weigthedAverage %>% select(matches("wAvg"))
mainGroup$Avg <- rowSums(mainGroup)
otherGroup$Avg <- rowSums(otherGroup)
totalGroup$Avg <- rowSums(totalGroup)
WAvgTable <- data.frame(bist30$Dates,mainGroup$Avg,otherGroup$Avg,totalGroup$Avg )
We can visualize table as before
ggplot(WAvgTable, aes(x = bist30.Dates)) +
geom_line(aes(y = mainGroup.Avg, colour = "mainGroup.Avg")) +
geom_line(aes(y = otherGroup.Avg, colour = "otherGroup.Avg")) +
geom_line(aes(y = totalGroup.Avg, colour = "totalGroup.Avg")) +
xlab("Years") + ylab("Weighted Prices")
Kmeans shows us which prices have highly momentum and which prices have lower momentum. Aselsan is the outlier because of its highly increase in price to the normalized value. Then, group 2 have high momenum then 3 group. We can deduce that, group 1 and group 2 stocks would be the best options for inverstors to get high income return.
ASELS and KRDMD have higher correlation with their volume which means they are related with their purchase willingleness of their stocks. We can assuma that, some inverstors predicted that their prices would get higher.
The price correlation shows that some investors can find relation between stocks which have highly momentum and which don’t require the highly but have correlation so that they predict others’ price changes. TUPRS & ASELS, ASELS & PETKM,EREGL & SISE,TOASO & KCHOL have higher momentum than average. EREGL has the a high momentum but SISE has not. However, some inverstor can predict SISE’s prices by EREGL and collect this stock in their portfolio.
The product of volume-price which is similary their geometric average could shows us the economic value of companies. For example, GARANTI BANK reached the most valuable BANK in 3. quarter in this year. This information shows the market share of companies similarly regarless of their price momentum. That’s why, investors use this information to predict price changes. However, If we look at
As it is mentioned, economic value of a company and the price changes are not correlated. That’s why, the information about these area not properly truthful for investment.
Another information is, TUPRAS A.S and other heavy industry companies have high economic value but stock market cannot show this value when we look at the geometric average, because there is a posibility that the rising companies market their values higher than normally high companies.
There are highly capable and successfull investors and other type investors. If higher investment capacity investors use their money in highly production companies, than TUPRAS result is not suprising. Another low capital investor just triggers the Volume and that’s why information about Garanti and others could be manuplative If we focus the price changes.
The banking sector is below the averages of other sectors such as heavy industry and similar company stocks. This is according to economic volume of companies and generally these banks belongs main companies which focuses on other sectors as KOC,SABANCI etc. These companies creates banks in order to supply their main job area. That’s why we cannot expect that the stocks of their banks could be higher than their heavy industry companies logically.