Content
1. price: price in US dollars ($326–$18,823)
2. carat: weight of the diamond (0.2–5.01)
3. cut: quality of the cut (Fair, Good, Very Good, Premium, Ideal)
4. color: diamond color, from J (worst) to D (best)
5. clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
6. x: length in mm (0–10.74)
7. y: width in mm (0–58.9)
8. z: depth in mm (0–31.8)
9. depth: total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79)
10. table: width of the top of the diamond relative to its widest point (43–95)
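The depth formula in item 9 can be checked against the first rows of the data. This is a minimal sketch, assuming the percentage is the ratio multiplied by 100 (the list states the ratio without that factor). The names `first_rows` and `depth_calc` are hypothetical, and the values are copied from the first three rows of the `glimpse()` output in step 1.

```r
# First three rows of x, y, z and depth, copied from the glimpse() output
first_rows <- data.frame(
  x     = c(3.95, 3.89, 4.05),
  y     = c(3.98, 3.84, 4.07),
  z     = c(2.43, 2.31, 2.31),
  depth = c(61.5, 59.8, 56.9)
)

# depth percentage = 100 * z / mean(x, y) = 100 * 2 * z / (x + y)
# (the factor 100 is an assumption; see the lead-in)
first_rows$depth_calc <- 100 * 2 * first_rows$z / (first_rows$x + first_rows$y)

# Compare the recomputed values with the recorded ones
print(round(first_rows$depth_calc, 2))
print(round(abs(first_rows$depth_calc - first_rows$depth), 2))
```

The recomputed values should land close to the recorded `depth` column. Small gaps may come from x, y and z being stored to two decimal places.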
1. Inspect the structure of the data:
library(dplyr)  # provides glimpse()
glimpse(data)
## Observations: 53,940
## Variables: 11
## $ X <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...
## $ carat <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, ...
## $ cut <fctr> Ideal, Premium, Good, Premium, Good, Very Good, Very ...
## $ color <fctr> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J,...
## $ clarity <fctr> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, S...
## $ depth <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, ...
## $ table <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54...
## $ price <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339,...
## $ x <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, ...
## $ y <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, ...
## $ z <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, ...
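The output shows `cut`, `color` and `clarity` stored as plain factors, so R does not know their quality ordering. The sketch below is one way to encode the orderings given in the variable list. The data frame `sample_rows` is a hypothetical stand-in built from the first five rows printed above, and the `*_levels` vectors are names introduced here for illustration.

```r
# Quality orderings taken from the variable descriptions (worst to best)
cut_levels     <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
color_levels   <- c("J", "I", "H", "G", "F", "E", "D")
clarity_levels <- c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF")

# Hypothetical stand-in: first five rows copied from the glimpse() output
sample_rows <- data.frame(
  cut     = c("Ideal", "Premium", "Good", "Premium", "Good"),
  color   = c("E", "E", "E", "I", "J"),
  clarity = c("SI2", "SI1", "VS1", "VS2", "SI2"),
  stringsAsFactors = TRUE
)

# Re-encode each column as an ordered factor
sample_rows$cut     <- factor(sample_rows$cut, levels = cut_levels, ordered = TRUE)
sample_rows$color   <- factor(sample_rows$color, levels = color_levels, ordered = TRUE)
sample_rows$clarity <- factor(sample_rows$clarity, levels = clarity_levels, ordered = TRUE)

# Ordered factors support comparisons: is the "Good" cut in row 3
# ranked below the "Ideal" cut in row 1?
print(sample_rows$cut[3] < sample_rows$cut[1])
```

With ordered factors, summaries and plot legends follow the quality ranking instead of alphabetical order.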
2. Check summary statistics for each column:
summary(data)
##        X             carat               cut        color
##  Min.   :    1   Min.   :0.2000   Fair     : 1610   D: 6775
##  1st Qu.:13486   1st Qu.:0.4000   Good     : 4906   E: 9797
##  Median :26971   Median :0.7000   Ideal    :21551   F: 9542
##  Mean   :26971   Mean   :0.7979   Premium  :13791   G:11292
##  3rd Qu.:40455   3rd Qu.:1.0400   Very Good:12082   H: 8304
##  Max.   :53940   Max.   :5.0100                     I: 5422
##                                                     J: 2808
##     clarity          depth           table           price
##  SI1    :13065   Min.   :43.00   Min.   :43.00   Min.   :  326
##  VS2    :12258   1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950
##  SI2    : 9194   Median :61.80   Median :57.00   Median : 2401
##  VS1    : 8171   Mean   :61.75   Mean   :57.46   Mean   : 3933
##  VVS2   : 5066   3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324
##  VVS1   : 3655   Max.   :79.00   Max.   :95.00   Max.   :18823
##  (Other): 2531
##        x                y                z
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000
##  1st Qu.: 4.710   1st Qu.: 4.720   1st Qu.: 2.910
##  Median : 5.700   Median : 5.710   Median : 3.530
##  Mean   : 5.731   Mean   : 5.735   Mean   : 3.539
##  3rd Qu.: 6.540   3rd Qu.: 6.540   3rd Qu.: 4.040
##  Max.   :10.740   Max.   :58.900   Max.   :31.800
##
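The summary shows a minimum of 0 for x, y and z, and maxima for y (58.9) and z (31.8) that are far from the third quartiles, which suggests a few bad records. The sketch below is one way to list them. It assumes that the `diamonds` data set bundled with ggplot2 holds the same records as `data`; the 20 mm cutoff and the names `d`, `zero_dim`, `too_large` and `suspect` are hypothetical choices made for illustration.

```r
library(ggplot2)  # only used here for the bundled `diamonds` data set

# Assumption: ggplot2's `diamonds` stands in for the `data` object used above
d <- as.data.frame(diamonds)

# A physical stone cannot have a zero dimension
zero_dim <- d$x == 0 | d$y == 0 | d$z == 0

# Hypothetical cutoff: treat any dimension above 20 mm as implausible
too_large <- d$x > 20 | d$y > 20 | d$z > 20

# Collect the suspicious rows for inspection
suspect <- d[zero_dim | too_large, c("carat", "x", "y", "z")]
print(nrow(suspect))
print(head(suspect))
```

Whether to drop or repair such rows depends on the analysis; listing them first makes that decision explicit.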
library(ggplot2)  # provides qplot() and ggplot()
qplot(carat, price, data=data, color=color, shape=cut)
qplot(log(carat), log(price),
data=data, color=clarity)
After taking the log of both carat and price, the relationship looks approximately linear. Price and carat are positively associated.
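The visual impression can be backed by a simple fit. This is a minimal sketch, not part of the original analysis: it fits an ordinary least squares line on the log-log scale, assuming that ggplot2's bundled `diamonds` data set holds the same records as `data`. The name `fit` is hypothetical.

```r
library(ggplot2)  # only used here for the bundled `diamonds` data set

# Assumption: ggplot2's `diamonds` stands in for the `data` object used above
fit <- lm(log(price) ~ log(carat), data = diamonds)

# The slope estimates how many percent price changes
# for a one percent change in carat
print(coef(fit))

# Share of the variance in log(price) explained by log(carat)
print(summary(fit)$r.squared)
```

A positive slope and a high R-squared would agree with the roughly linear, upward-sloping cloud in the log-log plot.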
Faceting splits the plot into panels, one per level of a categorical variable, so the same numeric relationship can be compared across groups:
qplot(price, carat, data=data,
facets = . ~ color)
Next, we facet by both color and clarity:
qplot(price, carat, data=data,
facets = color ~ clarity)
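`qplot()` has been deprecated since ggplot2 3.4.0, so the same grid can also be written with `ggplot()` and `facet_grid()`. The sketch below assumes that ggplot2's bundled `diamonds` data set stands in for `data`; the point transparency and size are illustrative choices, not settings from the original plot.

```r
library(ggplot2)

# Assumption: ggplot2's `diamonds` stands in for the `data` object used above
p <- ggplot(diamonds, aes(x = price, y = carat)) +
  geom_point(alpha = 0.2, size = 0.5) +  # transparency reduces overplotting
  facet_grid(color ~ clarity)            # rows: color, columns: clarity

print(p)
```

Using `facet_grid(. ~ color)` instead gives the single-row layout of the previous plot.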
qplot(cut, data=data, geom="bar")
# p<-ggplot(data=data, aes(x=cut, y=price)) +
# geom_bar(stat="identity", fill="steelblue")
#
# p
qplot(price, data=data, binwidth = 1000,
geom="histogram")
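The counts behind a histogram with a bin width of 1000 can be computed directly, which helps when reading the plot. This is a sketch under the assumption that ggplot2's bundled `diamonds` data set stands in for `data`; the bin edges starting at 0 and the names `breaks` and `bins` are hypothetical choices, so the bins may not line up exactly with the default ones drawn by `qplot()`.

```r
library(ggplot2)  # only used here for the bundled `diamonds` data set

# Assumption: ggplot2's `diamonds` stands in for the `data` object used above
# Bin edges every 1000 dollars, covering the price range 326 to 18,823
breaks <- seq(0, 19000, by = 1000)

# Count how many diamonds fall in each price bin
bins <- table(cut(diamonds$price, breaks = breaks))
print(bins)

# The same bins drawn with ggplot(); `boundary = 0` aligns bin edges to 0
p <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 1000, boundary = 0)
print(p)
```

The summary above already hints at a right-skewed distribution, since the median price (2401) is well below the mean (3933).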
b1 <- ggplot(data, aes(x = color, fill = as.character(cut))) +
  geom_bar() +  # bar height is the number of diamonds, not their price
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  labs(x = "Diamond color", y = "Number of diamonds", fill = "Cut")
b1
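The numbers behind this stacked bar chart are a two-way count of color by cut. A minimal sketch, assuming that ggplot2's bundled `diamonds` data set stands in for `data`; the name `counts` is hypothetical.

```r
library(ggplot2)  # only used here for the bundled `diamonds` data set

# Assumption: ggplot2's `diamonds` stands in for the `data` object used above
# Rows are colors, columns are cuts; each cell is one bar segment
counts <- table(color = diamonds$color, cut = diamonds$cut)
print(counts)

# Row totals give the full height of each bar
print(rowSums(counts))
```

The row totals should match the per-color counts reported by `summary(data)`.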