Introduction to FMCG Company Sales Analysis World

The following analyses are conducted based on the sales figures of an FMCG company.
Data shows us details such as invoice date,sales amount,purchased item,item categories etc.
It also includes info about customer characteristics like gender, age,address, segmentation, loyalty.
Processed data is 31.2 MB. There are approximately 494K rows with 15 attributes.

Data Fields:

Analysis-Necessary Packages

Here you may find the necessary packages. The references are indicated on the Citations.

library(dplyr)                                #[1]
library(tidyverse)                            #[2]
library(ggplot2)                              #[3]
library(scales)                               #[4]
library(readxl)                               #[5]
library(lubridate)                            #[6]
library(treemapify)                           #[7]
library(magrittr)                             #[8]

Downloading the Data

Excel file is downloaded from github to a local data frame (raw_data) and prepared for analysis.
Here the raw data structure can be seen in the manner of the number of variables, variable types and the number of observations.
There are 15 variables and 494.305 observations to be conducted.

project_raw_data <- read_excel("sales_data_v10.xlsx")
project_raw_data$SALES <- as.numeric(gsub(",", ".", gsub("\\.", "", project_raw_data$SALES)))
project_raw_data$UNIT_PRICE <- as.numeric(gsub(",", ".", gsub("\\.", "", project_raw_data$UNIT_PRICE)))
glimpse(project_raw_data)
## Observations: 494,305
## Variables: 15
## $ INVOICEID        <dbl> 53927301, 53927301, 53927302, 53927302, 53927...
## $ ACCOUNTNUM       <dbl> 23349830, 23349830, 10915900, 10915900, 10915...
## $ GENDER           <chr> "M", "M", "F", "F", "F", "F", "F", "F", "F", ...
## $ AGE              <dbl> 46, 46, 33, 33, 33, 33, 33, 33, 33, 33, 33, 3...
## $ LOA              <dbl> 25, 25, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2...
## $ CITY             <chr> "Uşak", "Uşak", "Kırklareli", "Kırklareli", "...
## $ Customer_Segment <chr> "E", "E", "B", "B", "B", "B", "B", "B", "B", ...
## $ BUSINESS         <chr> "BEAUTY", "BEAUTY", "BEAUTY", "BEAUTY", "BEAU...
## $ CATEGORY_        <chr> "BODY", "TOILETRIES", "BODY", "TOILETRIES", "...
## $ SEGMENT_         <chr> "MY BODY", "STYLING", "HAND TREATMENT", "WASH...
## $ SALES            <dbl> 223.400, 15.200, 0.758, 0.888, 0.758, 0.942, ...
## $ UNIT_PRICE       <dbl> 22.340, 3.040, 0.758, 0.888, 0.758, 0.942, 0....
## $ QUANTITY         <dbl> 10, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ `FIRST ORDER`    <dttm> 2016-12-23, 2016-12-23, 2018-09-06, 2018-09-...
## $ INVOICE_DATE     <dttm> 2018-09-29, 2018-09-29, 2018-09-29, 2018-09-...
dim(project_raw_data)
## [1] 494305     15

Exploratory Data Analysis

1. Total Sales vs Customer Segments (1st Graph)

Beauty is more driver than the Fashion and Home. D segment is the highest for contribution to the total sales.

The first graph in following shows us the total sales for 3 business segments (Beauty, Fashion and Home).
Accordingly, the main business category of this company is ‘beauty’, since the majority of sales comes from this group. ‘Fashion’ has also small share in total sales, while ‘home’ share in total is negligible.

From A to J, there are 10 customer segments in total. The other output out of the first graph is the contribution amount of every single customer segment to total sales.

It is showed that, customers in D segment brings the highest contribution; while F and J segment follow it,respectively.

2. Average Sales vs Customer Segments (2nd Graph)

J segment is Atom-Ant. The highest average sales with lower sales contribution to the total sales.

The second graph shows the average sales for each segment. Once we compare it with the previous one, it results in the fact that, the middle segments have the higher contribution to total sales.

However, J segment, which is the premium one, has the highest average sales among all. The reason is,the number of customers in this premium (J segment) is quite low.But their total sales amount is quite high. Therefore, the average sales of this J segment is the highest one .

sales_by_acc_loa <- 
  project_raw_data %>% 
  select(ACCOUNTNUM, LOA,Customer_Segment,SALES,BUSINESS) %>%
  group_by(ACCOUNTNUM, LOA,Customer_Segment,BUSINESS) %>% 
  summarise(total_sales=sum(SALES))

glimpse(sales_by_acc_loa)
## Observations: 48,320
## Variables: 5
## $ ACCOUNTNUM       <dbl> 997983, 998144, 998144, 998144, 998316, 99961...
## $ LOA              <dbl> 150, 245, 245, 245, 152, 335, 334, 334, 335, ...
## $ Customer_Segment <chr> "D", "C", "C", "C", "D", "D", "F", "F", "C", ...
## $ BUSINESS         <chr> "BEAUTY", "BEAUTY", "FASHION", "HOME", "BEAUT...
## $ total_sales      <dbl> 28.934, 71.392, 13.676, 3.990, 36.236, 216.01...
total_sales_by_segment_bus <- 
  sales_by_acc_loa %>% 
  group_by(Customer_Segment, BUSINESS) %>% 
  summarise(total_segment_bus_sales=sum(total_sales))

total_sales_by_segment_bus %>% 
        ggplot(aes(x=Customer_Segment, y=total_segment_bus_sales, fill=BUSINESS )) + 
        geom_bar(stat="identity")+
        labs( x="Customer Segment" , y = "Total sales")+
        ggtitle("total sales based on Sales Representative Segment") +
        theme(legend.position = "bottom", axis.text.x = element_text(angle = 0.0, vjust = 0.0, hjust = 0.0, size = 10))

avg_sales_by_segment_bus <- 
  sales_by_acc_loa %>% 
  group_by(Customer_Segment, BUSINESS) %>% 
  summarise(total_segment_bus_sales=mean(total_sales))

avg_sales_by_segment_bus %>% 
        ggplot(aes(x=Customer_Segment, y=total_segment_bus_sales, fill=BUSINESS )) + 
        geom_bar(stat="identity")+
        labs( x="Customer Segment" , y = "Average Sales")+
        ggtitle("average sales based on Sales Representative Segment") +
        theme(legend.position = "bottom", axis.text.x = element_text(angle = 0.0, vjust = 0.0, hjust = 0.0, size = 10))

3. Total Sales based on the City (3rd Graph)

Everybody knows, Istanbul has huge market in Turkey. However, do not underestimate Bursa, please. Bursa is runner-up city.

In the third graph, top 20 cities,based on their total sales amount are showed. Istabul,Bursa and Izmir are in top 3.

sales_by_city <- 
        project_raw_data %>% 
        group_by(CITY) %>% 
        summarise(total_sales=sum(SALES)) %>% 
        arrange(desc(total_sales)) %>%
        slice(1:20)

glimpse(sales_by_city)
## Observations: 20
## Variables: 2
## $ CITY        <chr> "İstanbul", "Bursa", "İzmir", "Ankara", "Gaziantep...
## $ total_sales <dbl> 789032.15, 386678.62, 301098.37, 257775.61, 169061...
sales_by_city %>% 
                  ggplot(aes(x=reorder(CITY,total_sales), y=total_sales, fill=CITY )) + 
                  geom_bar(stat="identity")+
                  coord_flip()+
                  labs( x="CITY" , y = "Total sales")+
                  ggtitle("total sales based on city") +
                  theme(legend.position = "none", axis.text.x = element_text(angle = 0.0, vjust = 0.0, hjust = 0.0, size = 1))

5. Which Product Category is the Most Famous? (5th Graph)

The Oscar goes to the Fragrance . Fragrance is very important. If you think not, Please watch the movie “Perfume: The Story of a Murderer”

There are sub- categories in each of the business units.To illustrate,‘beauty’ business category includes 5 different sub-categories; such as:
  • body
  • toiletries
  • face
  • fragnance
  • color

5th graph in below shows us the share of all these sub-categories in total sales. Bigger area indicates bigger share.
Accordingly, fragnance is the first, and it is followed by toiletres and color respectively.

categorized_sales <-
  project_raw_data %>% 
  select(CATEGORY_, SALES) %>%
  group_by(CATEGORY_) %>%
  summarise(SALES = sum(SALES)) 

ggplot(categorized_sales, aes(area = SALES, fill = CATEGORY_, label = CATEGORY_)) +
  geom_treemap() +
  geom_treemap_text(fontface = "italic", colour = "white", place = "centre", grow = TRUE)+
  theme(legend.position = "bottom") 

CITIATIONS:

[1]Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2018). dplyr: A Grammar of Data Manipulation. R package version 0.7.7.
https://CRAN.R-project.org/package=dplyr

[2]Hadley Wickham (2017). tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1.
https://CRAN.R-project.org/package=tidyverse

[3]H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

[4]Hadley Wickham (2018). scales: Scale Functions for Visualization. R package version 1.0.0.
https://CRAN.R-project.org/package=scale

[5] Hadley Wickham and Jennifer Bryan (2018). readxl: Read Excel Files. R package version 1.1.0.
https: //CRAN.R-project.org/package=readxl

[6]Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25.
http://www.jstatsoft.org/v40/i03/

[7]David Wilkins (2018). treemapify: Draw Treemaps in ‘ggplot2’. R package version 2.5.2.
https://CRAN.R-project.org/package=treemapify

[8]Stefan Milton Bache and Hadley Wickham (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5.
https://CRAN.R-project.org/package=magrittr