Introduction

This dataset contains a list of video games with sales greater than 100,000 copies. It was generated by a scrape of vgchartz.com.

Fields include

  1. Rank - Ranking of overall sales
  2. Name - The game’s name
  3. Platform - Platform of the game’s release (e.g., PC, PS4)
  4. Year - Year of the game’s release
  5. Genre - Genre of the game
  6. Publisher - Publisher of the game
  7. NA_Sales - Sales in North America (in millions)
  8. EU_Sales - Sales in Europe (in millions)
  9. JP_Sales - Sales in Japan (in millions)
  10. Other_Sales - Sales in the rest of the world (in millions)
  11. Global_Sales - Total worldwide sales (in millions)

The script to scrape the data is available at https://github.com/GregorUT/vgchartzScrape. It is based on BeautifulSoup using Python. There are 16,598 records. Two records were dropped due to incomplete information.

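# stringsAsFactors = TRUE reproduces the Factor columns shown below (the default changed in R 4.0)
data <- read.csv("vgsales.csv", header = TRUE, stringsAsFactors = TRUE)
str(data)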
## 'data.frame':    16598 obs. of  11 variables:
##  $ Rank        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Name        : Factor w/ 11493 levels "'98 Koshien",..: 10991 9343 5532 10993 7370 9707 6648 10989 6651 2594 ...
##  $ Platform    : Factor w/ 31 levels "2600","3DO","3DS",..: 26 12 26 26 6 6 5 26 26 12 ...
##  $ Year        : Factor w/ 40 levels "1980","1981",..: 27 6 29 30 17 10 27 27 30 5 ...
##  $ Genre       : Factor w/ 12 levels "Action","Adventure",..: 11 5 7 11 8 6 5 4 5 9 ...
##  $ Publisher   : Factor w/ 579 levels "10TACLE Studios",..: 369 369 369 369 369 369 369 369 369 369 ...
##  $ NA_Sales    : num  41.5 29.1 15.8 15.8 11.3 ...
##  $ EU_Sales    : num  29.02 3.58 12.88 11.01 8.89 ...
##  $ JP_Sales    : num  3.77 6.81 3.79 3.28 10.22 ...
##  $ Other_Sales : num  8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
##  $ Global_Sales: num  82.7 40.2 35.8 33 31.4 ...

Data Visualizations

  1. Data Preparation for Visualization and Library Imports:
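
The preparation code for this step is not shown, yet the later plots use three objects that are never defined here: mstppEU, mstppEUandJP and mstppEUandJPFilter. The following is a minimal sketch of how they might be built, not the original code. The grouping by Name, the "three most recent years" rule and the toy data frame (used in place of vgsales.csv so the sketch runs on its own) are all assumptions; only the object and column names come from the plots below.

```r
library(ggplot2)  # provides ggplot(), aes(), and the geom_*() layers

# Toy stand-in for vgsales.csv so the sketch runs on its own; the values are made up.
data <- data.frame(
  Name     = c("Game A", "Game A", "Game B", "Game C", "Game D"),
  Year     = factor(c("2010", "2014", "2015", "2016", "N/A")),
  Genre    = c("Action", "Action", "Sports", "Racing", "Puzzle"),
  EU_Sales = c(1.0, 0.5, 2.0, 0.3, 0.1),
  JP_Sales = c(0.2, 0.1, 0.0, 0.4, 0.0)
)

# Assumption: mstppEU holds total EU sales per game title.
mstppEU <- aggregate(EU_Sales ~ Name, data = data, FUN = sum)
names(mstppEU)[2] <- "sumEU_Sales"

# Assumption: mstppEUandJP holds total EU and JP sales per game title.
mstppEUandJP <- aggregate(cbind(EU_Sales, JP_Sales) ~ Name, data = data, FUN = sum)
names(mstppEUandJP)[2:3] <- c("sumEU_Sales", "sumJP_Sales")

# Assumption: mstppEUandJPFilter keeps rows from the three most recent years.
# Year is a factor, so convert through character; non-numeric labels become NA.
years  <- suppressWarnings(as.integer(as.character(data$Year)))
latest <- max(years, na.rm = TRUE)
mstppEUandJPFilter <- data[!is.na(years) & years > latest - 3, ]
```

With the real file, the toy data frame would be replaced by the result of read.csv("vgsales.csv"). Converting Year through as.character() matters because as.integer() on a factor returns the internal level codes rather than the years.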

  2. Bar Visualization for Genre:

ggplot(data,  aes(x = factor(""), fill = Genre) ) +
  geom_bar()

  3. Pie Chart Visualization for Genre:

ggplot(data,
       aes(x = factor(""), fill = Genre) ) +
  geom_bar() +
  coord_polar(theta = "y") +
  scale_x_discrete("")

  4. Histogram for EU Sales Count:

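# Histogram of summed EU sales; the colour = cut mapping was dropped because
# the data has no cut column and the fixed outline colour overrides it anyway
ggplot(mstppEU, aes(sumEU_Sales)) +
  geom_histogram(color = "black", fill = "darkblue", bins = 30) +
  ggtitle("EU Sales geom_histogram()")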

  5. Releases of Video Games by Year:

# One bar per release year; bar height is the number of records for that year
ggplot(data, aes(Year)) +
  geom_bar(fill = "grey") +
  xlab("Year") +
  ylab("Number of games") +
  ggtitle("Releases of Video Games by Year")

  6. Scatter Plot of EU_Sales and JP_Sales:

ggplot(mstppEUandJP, aes(sumEU_Sales, sumJP_Sales)) +
  geom_point() +
  xlab("EU_Sales") +
  ylab("JP_Sales")

  7. Sales by Genre for the Last 3 Years:

# geom_bar() counts rows, so each bar shows the number of titles per genre and year
ggplot(mstppEUandJPFilter, aes(Year, fill = Genre)) +
  geom_bar(position = "dodge")