The following analyses are conducted based on the sales figures of an FMCG company.
Data shows us details such as invoice date,sales amount,purchased item,item categories etc.
It also includes info about customer characteristics like gender, age,address, segmentation, loyalty.
Processed data is 31.2 MB. There are approximately 494K rows with 15 attributes.
Here you may find the necessary packages. The references are indicated on the Citations.
library(dplyr) #[1]
library(tidyverse) #[2]
library(ggplot2) #[3]
library(scales) #[4]
library(readxl) #[5]
library(lubridate) #[6]
library(treemapify) #[7]
library(magrittr) #[8]
Excel file is downloaded from github to a local data frame (raw_data) and prepared for analysis.
Here the raw data structure can be seen in the manner of the number of variables, variable types and the number of observations.
There are 15 variables and 494.305 observations to be conducted.
project_raw_data <- read_excel("sales_data_v10.xlsx")
project_raw_data$SALES <- as.numeric(gsub(",", ".", gsub("\\.", "", project_raw_data$SALES)))
project_raw_data$UNIT_PRICE <- as.numeric(gsub(",", ".", gsub("\\.", "", project_raw_data$UNIT_PRICE)))
glimpse(project_raw_data)
## Observations: 494,305
## Variables: 15
## $ INVOICEID <dbl> 53927301, 53927301, 53927302, 53927302, 53927...
## $ ACCOUNTNUM <dbl> 23349830, 23349830, 10915900, 10915900, 10915...
## $ GENDER <chr> "M", "M", "F", "F", "F", "F", "F", "F", "F", ...
## $ AGE <dbl> 46, 46, 33, 33, 33, 33, 33, 33, 33, 33, 33, 3...
## $ LOA <dbl> 25, 25, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2...
## $ CITY <chr> "Uşak", "Uşak", "Kırklareli", "Kırklareli", "...
## $ Customer_Segment <chr> "E", "E", "B", "B", "B", "B", "B", "B", "B", ...
## $ BUSINESS <chr> "BEAUTY", "BEAUTY", "BEAUTY", "BEAUTY", "BEAU...
## $ CATEGORY_ <chr> "BODY", "TOILETRIES", "BODY", "TOILETRIES", "...
## $ SEGMENT_ <chr> "MY BODY", "STYLING", "HAND TREATMENT", "WASH...
## $ SALES <dbl> 223.400, 15.200, 0.758, 0.888, 0.758, 0.942, ...
## $ UNIT_PRICE <dbl> 22.340, 3.040, 0.758, 0.888, 0.758, 0.942, 0....
## $ QUANTITY <dbl> 10, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ `FIRST ORDER` <dttm> 2016-12-23, 2016-12-23, 2018-09-06, 2018-09-...
## $ INVOICE_DATE <dttm> 2018-09-29, 2018-09-29, 2018-09-29, 2018-09-...
dim(project_raw_data)
## [1] 494305 15
The first graph in following shows us the total sales for 3 business segments (Beauty, Fashion and Home).
Accordingly, the main business category of this company is ‘beauty’, since the majority of sales comes from this group. ‘Fashion’ has also small share in total sales, while ‘home’ share in total is negligible.
From A to J, there are 10 customer segments in total. The other output out of the first graph is the contribution amount of every single customer segment to total sales.
It is showed that, customers in D segment brings the highest contribution; while F and J segment follow it,respectively.
The second graph shows the average sales for each segment. Once we compare it with the previous one, it results in the fact that, the middle segments have the higher contribution to total sales.
However, J segment, which is the premium one, has the highest average sales among all. The reason is,the number of customers in this premium (J segment) is quite low.But their total sales amount is quite high. Therefore, the average sales of this J segment is the highest one .
sales_by_acc_loa <-
project_raw_data %>%
select(ACCOUNTNUM, LOA,Customer_Segment,SALES,BUSINESS) %>%
group_by(ACCOUNTNUM, LOA,Customer_Segment,BUSINESS) %>%
summarise(total_sales=sum(SALES))
glimpse(sales_by_acc_loa)
## Observations: 48,320
## Variables: 5
## $ ACCOUNTNUM <dbl> 997983, 998144, 998144, 998144, 998316, 99961...
## $ LOA <dbl> 150, 245, 245, 245, 152, 335, 334, 334, 335, ...
## $ Customer_Segment <chr> "D", "C", "C", "C", "D", "D", "F", "F", "C", ...
## $ BUSINESS <chr> "BEAUTY", "BEAUTY", "FASHION", "HOME", "BEAUT...
## $ total_sales <dbl> 28.934, 71.392, 13.676, 3.990, 36.236, 216.01...
total_sales_by_segment_bus <-
sales_by_acc_loa %>%
group_by(Customer_Segment, BUSINESS) %>%
summarise(total_segment_bus_sales=sum(total_sales))
total_sales_by_segment_bus %>%
ggplot(aes(x=Customer_Segment, y=total_segment_bus_sales, fill=BUSINESS )) +
geom_bar(stat="identity")+
labs( x="Customer Segment" , y = "Total sales")+
ggtitle("total sales based on Sales Representative Segment") +
theme(legend.position = "bottom", axis.text.x = element_text(angle = 0.0, vjust = 0.0, hjust = 0.0, size = 10))
avg_sales_by_segment_bus <-
sales_by_acc_loa %>%
group_by(Customer_Segment, BUSINESS) %>%
summarise(total_segment_bus_sales=mean(total_sales))
avg_sales_by_segment_bus %>%
ggplot(aes(x=Customer_Segment, y=total_segment_bus_sales, fill=BUSINESS )) +
geom_bar(stat="identity")+
labs( x="Customer Segment" , y = "Average Sales")+
ggtitle("average sales based on Sales Representative Segment") +
theme(legend.position = "bottom", axis.text.x = element_text(angle = 0.0, vjust = 0.0, hjust = 0.0, size = 10))
In the third graph, top 20 cities,based on their total sales amount are showed. Istabul,Bursa and Izmir are in top 3.
sales_by_city <-
project_raw_data %>%
group_by(CITY) %>%
summarise(total_sales=sum(SALES)) %>%
arrange(desc(total_sales)) %>%
slice(1:20)
glimpse(sales_by_city)
## Observations: 20
## Variables: 2
## $ CITY <chr> "İstanbul", "Bursa", "İzmir", "Ankara", "Gaziantep...
## $ total_sales <dbl> 789032.15, 386678.62, 301098.37, 257775.61, 169061...
sales_by_city %>%
ggplot(aes(x=reorder(CITY,total_sales), y=total_sales, fill=CITY )) +
geom_bar(stat="identity")+
coord_flip()+
labs( x="CITY" , y = "Total sales")+
ggtitle("total sales based on city") +
theme(legend.position = "none", axis.text.x = element_text(angle = 0.0, vjust = 0.0, hjust = 0.0, size = 1))
In the 4th graph, weekdays are desired to compared with respect to their sales contribution, distributon is given in a pie chart.
Saturday has slightly lower share in a week.
sales_per_day <-
project_raw_data %>%
select(INVOICE_DATE,SALES) %>%
mutate(INVOICE_DATE2=lubridate::as_date(ymd(INVOICE_DATE))) %>%
mutate(weekday=weekdays(INVOICE_DATE2)) %>%
group_by(weekday) %>%
summarise(avg_sale=mean(SALES)) %>%
mutate(day_order=c(5,6,3,7,1,4,2)) %>%
arrange(day_order)
glimpse(sales_per_day)
## Observations: 7
## Variables: 3
## $ weekday <chr> "Pazartesi", "Salı", "Çarşamba", "Perşembe", "Cuma",...
## $ avg_sale <dbl> 8.639990, 9.330272, 10.781722, 10.095326, 9.677520, ...
## $ day_order <dbl> 1, 2, 3, 4, 5, 6, 7
sales_per_day %>%
ggplot(aes(x=" ",y=avg_sale, fill=reorder(weekday,day_order))) +
geom_bar(stat="identity")+
coord_polar("y",start = 0)+
theme(axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank())
5th graph in below shows us the share of all these sub-categories in total sales. Bigger area indicates bigger share.
Accordingly, fragnance is the first, and it is followed by toiletres and color respectively.
categorized_sales <-
project_raw_data %>%
select(CATEGORY_, SALES) %>%
group_by(CATEGORY_) %>%
summarise(SALES = sum(SALES))
ggplot(categorized_sales, aes(area = SALES, fill = CATEGORY_, label = CATEGORY_)) +
geom_treemap() +
geom_treemap_text(fontface = "italic", colour = "white", place = "centre", grow = TRUE)+
theme(legend.position = "bottom")
[1]Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2018). dplyr: A Grammar of Data Manipulation. R package version 0.7.7.
https://CRAN.R-project.org/package=dplyr
[2]Hadley Wickham (2017). tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1.
https://CRAN.R-project.org/package=tidyverse
[3]H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
[4]Hadley Wickham (2018). scales: Scale Functions for Visualization. R package version 1.0.0.
https://CRAN.R-project.org/package=scale
[5] Hadley Wickham and Jennifer Bryan (2018). readxl: Read Excel Files. R package version 1.1.0.
https: //CRAN.R-project.org/package=readxl
[6]Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25.
http://www.jstatsoft.org/v40/i03/
[7]David Wilkins (2018). treemapify: Draw Treemaps in ‘ggplot2’. R package version 2.5.2.
https://CRAN.R-project.org/package=treemapify
[8]Stefan Milton Bache and Hadley Wickham (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5.
https://CRAN.R-project.org/package=magrittr