Content
1. Price: price in US dollars ($326–$18,823)
2. Carat: weight of the diamond (0.2–5.01)
3. Cut: quality of the cut (Fair, Good, Very Good, Premium, Ideal)
4. Color: diamond color, from J (worst) to D (best)
5. Clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
6. X: length in mm (0–10.74)
7. Y: width in mm (0–58.9)
8. Z: depth in mm (0–31.8)
9. Depth: total depth percentage = z / mean(x, y) = 2 * z / (x + y), expressed as a percentage (43–79)
10. Table: width of the top of the diamond relative to its widest point (43–95)
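The depth definition above can be checked against the recorded column. The following is a minimal sketch, assuming the `diamonds` data set bundled with ggplot2 is an acceptable stand-in for the `data` object used in this post (the post's own loading step is not shown); `d` and `depth_calc` are names introduced here for illustration only.

```r
library(ggplot2)  # ships the diamonds data set, used here as a stand-in for `data`

d <- as.data.frame(diamonds)

# Drop rows where a dimension is recorded as 0; the formula is undefined there
d <- d[d$x > 0 & d$y > 0 & d$z > 0, ]

# Recompute total depth percentage: 100 * z / mean(x, y) = 200 * z / (x + y)
d$depth_calc <- 200 * d$z / (d$x + d$y)

# Distribution of the absolute gap between recomputed and recorded depth
summary(abs(d$depth_calc - d$depth))
```

Rows with a large gap are worth inspecting by hand, since the ranges listed above (a minimum of 0 for x, y and z, and maxima of 58.9 and 31.8 for y and z) suggest a few implausible measurements.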
1. We check the data:
glimpse(data)
## Observations: 53,940
## Variables: 11
## $ X       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...
## $ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, ...
## $ cut     <fctr> Ideal, Premium, Good, Premium, Good, Very Good, Very ...
## $ color   <fctr> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J,...
## $ clarity <fctr> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, S...
## $ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, ...
## $ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54...
## $ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339,...
## $ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, ...
## $ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, ...
## $ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, ...
2. Checking statistical values for each column:
summary(data)
##        X             carat               cut        color    
##  Min.   :    1   Min.   :0.2000   Fair     : 1610   D: 6775  
##  1st Qu.:13486   1st Qu.:0.4000   Good     : 4906   E: 9797  
##  Median :26971   Median :0.7000   Ideal    :21551   F: 9542  
##  Mean   :26971   Mean   :0.7979   Premium  :13791   G:11292  
##  3rd Qu.:40455   3rd Qu.:1.0400   Very Good:12082   H: 8304  
##  Max.   :53940   Max.   :5.0100                     I: 5422  
##                                                     J: 2808  
##     clarity          depth           table           price      
##  SI1    :13065   Min.   :43.00   Min.   :43.00   Min.   :  326  
##  VS2    :12258   1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950  
##  SI2    : 9194   Median :61.80   Median :57.00   Median : 2401  
##  VS1    : 8171   Mean   :61.75   Mean   :57.46   Mean   : 3933  
##  VVS2   : 5066   3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324  
##  VVS1   : 3655   Max.   :79.00   Max.   :95.00   Max.   :18823  
##  (Other): 2531                                                  
##        x                y                z         
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 4.710   1st Qu.: 4.720   1st Qu.: 2.910  
##  Median : 5.700   Median : 5.710   Median : 3.530  
##  Mean   : 5.731   Mean   : 5.735   Mean   : 3.539  
##  3rd Qu.: 6.540   3rd Qu.: 6.540   3rd Qu.: 4.040  
##  Max.   :10.740   Max.   :58.900   Max.   :31.800  
##
qplot(carat, price, data=data, color=color, shape=cut)
qplot(log(carat), log(price),
      data=data, color=clarity)
When we take the log of both carat and price, the plot becomes roughly linear. Price and carat are positively associated.
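The roughly linear log-log relationship can be quantified with a simple regression. This is a sketch only, not part of the original analysis: it assumes the `diamonds` data set bundled with ggplot2 as a stand-in for `data`, and `fit` is a name introduced here.

```r
library(ggplot2)  # provides the diamonds data set (stand-in for `data`)

d <- diamonds

# Ordinary least squares on the log-log scale:
# log(price) = a + b * log(carat)
fit <- lm(log(price) ~ log(carat), data = d)

# Intercept a and slope b; a positive slope matches the positive association
coef(fit)

# Share of variance in log(price) explained by log(carat) alone
summary(fit)$r.squared
```

On this scale the slope reads as an elasticity: it approximates the percentage change in price that goes with a one percent change in carat.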
Faceting lets us compare the same numeric variables across the levels of a categorical variable, with one panel per level:
qplot(price, carat, data=data,
      facets = . ~ color)
Next we facet by both color and clarity:
qplot(price, carat, data=data,
      facets = color ~ clarity)
qplot(cut, data=data, geom="bar")
# p<-ggplot(data=data, aes(x=cut, y=price)) +
#   geom_bar(stat="identity", fill="steelblue")
# 
# p
qplot(price, data=data, binwidth = 1000,
      geom="histogram")
b1 <- ggplot(data, aes(x=color, fill=cut)) +
  theme(axis.text.x = element_text(angle = 60, hjust=1))+
  geom_bar() +
  labs(x="Diamond Color", y="Number of Diamonds", fill="Cut")
b1
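Since `qplot()` is deprecated as of ggplot2 3.4.0, the quick plots above can also be written with `ggplot()`. The following is a sketch of equivalents for three of them, assuming the bundled `diamonds` data set as a stand-in for `data`; the object names `p_loglog`, `p_facets` and `p_hist` are introduced here for illustration.

```r
library(ggplot2)

d <- diamonds  # stand-in for the `data` object used above

# Equivalent of qplot(log(carat), log(price), data=data, color=clarity)
p_loglog <- ggplot(d, aes(x = log(carat), y = log(price), color = clarity)) +
  geom_point()

# Equivalent of qplot(price, carat, data=data, facets = color ~ clarity)
p_facets <- ggplot(d, aes(x = price, y = carat)) +
  geom_point() +
  facet_grid(color ~ clarity)

# Equivalent of qplot(price, data=data, binwidth = 1000, geom="histogram")
p_hist <- ggplot(d, aes(x = price)) +
  geom_histogram(binwidth = 1000)

# Printing a plot object draws it
p_loglog
```

Building each plot as an object, as `b1` does above, makes it easy to add layers or themes later without repeating the call.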