STARBUCKS CAPSTONE

Rajath Nag Nagaraj (Raj)
Jun 26, 2020


Starbucks Image Copyright Microsoft Design

This blog post answers key business questions about the Starbucks datasets using the CRISP-DM approach.

Starbucks Corporation is an American multinational chain of coffeehouses and roaster reserves headquartered in Seattle, Washington. As the world’s largest coffeehouse chain, Starbucks is seen as the main representative of the United States’ second wave of coffee culture. Since the 2000s, third wave coffee makers have targeted quality-minded coffee drinkers with hand-made coffee based on lighter roasts, while Starbucks nowadays uses automatic espresso machines for efficiency. As of early 2020, the company operates over 30,000 locations worldwide in more than 70 countries. Starbucks locations serve hot and cold drinks, whole-bean coffee, micro ground instant coffee known as VIA, espresso, caffe latte, full- and loose-leaf teas including Teavana tea products, Evolution Fresh juices, Frappuccino beverages, La Boulange pastries, and snacks such as chips and crackers; some offerings (including the annual fall launch of the Pumpkin Spice Latte) are seasonal or specific to the store's locality.

Project Overview:

Starbucks sends offers to its customers through its mobile app, messages, emails, and other channels. The offer types are discounts, BOGO (buy one get one free), and informational offers. The task is to find out how customers respond to these offers.

The main goal of this project is to predict whether a customer will respond to the offer given to them.

This blog walks through the process of building a machine learning model to achieve this, using the steps below:

Explore and visualize the datasets.
Analyze the data using graphs.
Pre-process the data.
Apply ML models for prediction.
Analyze the models and interpret the results.

Every business has questions that need to be answered, and the easiest way to answer them is to use the relevant data together with the right data-mining techniques. As Daniel Keys Moran put it: “You can have data without information, but you cannot have information without data.”

Answering any such question starts with understanding the business context behind it, which is exactly where the CRISP-DM methodology comes into the picture. The CRISP-DM process is as follows:

CRISP-DM: data-mining sequence

  1. Business Understanding: This step is about understanding the intended outcome of the project. It is more crucial than any other step, because the project's analytics could lead to major decisions.
  2. Data Understanding: This step starts with collecting data related to the question domain and understanding what each dimension in the data describes. It also includes describing the dataset to understand the spread of the data, i.e. exploring it with appropriate graphs and tables, and verifying data quality, e.g. finding missing values and errors in column data.
  3. Data Preparation: This step starts with merging the datasets if there are multiple data sources. It covers aggregating columns, handling missing values according to the column type (using the mode or mean), dropping columns that are not useful for the analysis depending on their importance, and creating dummy columns for categorical data.
  4. Modelling: This step covers choosing the modelling technique, listing the techniques and assumptions used, and, most importantly, designing a model and testing the results it produces.

The key business questions answered from the datasets are:

1. Percentage of customers by gender.
2. Salary range of the customers.
3. Distribution of male and female income in the customer base.
4. Which offer is more popular with which gender.
5. Number of days each gender takes to complete an offer.
6. Which offer type is more popular across gender and age.
7. Which kind of offer each gender likes most.
8. Finally, predicting whether a customer will respond to the offer given to them.

Data Sets

The data is contained in three files:

portfolio.json — containing offer ids and metadata about each offer (duration, type, etc.)
profile.json — demographic data for each customer
transcript.json — records for transactions, offers received, offers viewed, and offers completed

Here is the schema and explanation of each variable in the dataset:

Step 1: Understanding the dataset from a business perspective:

portfolio.json
id (string) — offer id
offer_type (string) — type of offer, i.e. BOGO, discount, informational
difficulty (int) — minimum required spend to complete an offer
reward (int) — reward given for completing an offer
duration (int) — time for offer to be open, in days
channels (list of strings)

profile.json
age (int) — age of the customer.
became_member_on (int) — date when customer created an app account.
gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F).
id (str) — customer id.
income (float) — customer’s income.

transcript.json
event (str) — record description (i.e. transaction, offer received, offer viewed, etc.).
person (str) — customer id.
time (int) — time in hours since start of test. The data begins at time t=0.
value (dict of strings) — either an offer id or transaction amount depending on the record.

Step 2 & 3: Data Understanding and Preparation:

Exploring and preparing the datasets:

  1. Portfolio dataset:

The dataset contains 6 columns and 10 rows.

Portfolio dataset: overview of the dataset.

The key aspects of the Portfolio dataset are:

  1. There are three offer types: BOGO (buy one get one free), informational, and discount.
  2. There are no NA values, i.e. no empty rows or columns.
  3. Offers were sent through four channels: email, mobile, social media, and web.

Data pre-processing steps for the portfolio dataset:

1. Creating dummy variables for channels.
2. Creating dummy variables for offer type.
3. Renaming id to offer id.
4. Mapping offer ids to simple integer values from 1 to n.
5. Encoding offer type into numerical values.
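The five steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy rows, the offer ids offer_a to offer_c, and the integer codes in the hypothetical OFFER_TYPE_CODES mapping are made up for the example.

```python
import pandas as pd

# Toy rows shaped like portfolio.json; the ids and values are made up.
portfolio = pd.DataFrame({
    "id": ["offer_a", "offer_b", "offer_c"],
    "offer_type": ["bogo", "discount", "informational"],
    "difficulty": [10, 7, 0],
    "reward": [10, 3, 0],
    "duration": [7, 7, 4],
    "channels": [["email", "mobile", "social"],
                 ["web", "email", "mobile"],
                 ["web", "email"]],
})

# Hypothetical integer codes for the offer types (step 5).
OFFER_TYPE_CODES = {"bogo": 1, "discount": 2, "informational": 3}


def preprocess_portfolio(df: pd.DataFrame) -> tuple[pd.DataFrame, dict]:
    df = df.copy()
    # Step 1: one 0/1 column per channel, title-cased (Email, Mobile, Social, Web).
    channel_dummies = (
        df["channels"].str.join("|").str.get_dummies().rename(columns=str.title)
    )
    # Step 2: one 0/1 column per offer type (bogo, discount, informational).
    type_dummies = pd.get_dummies(df["offer_type"], dtype=int)
    # Step 3: rename id to offer_id.
    df = df.rename(columns={"id": "offer_id"})
    # Step 4: map each offer id string to an integer 1..n; keep the mapping.
    id_map = {oid: i for i, oid in enumerate(df["offer_id"].unique(), start=1)}
    df["offer_id"] = df["offer_id"].map(id_map)
    # Step 5: replace the offer_type text with its integer code.
    df["offer_type"] = df["offer_type"].map(OFFER_TYPE_CODES)
    df = pd.concat([df.drop(columns="channels"), channel_dummies, type_dummies], axis=1)
    return df, id_map


clean_portfolio, offer_id_map = preprocess_portfolio(portfolio)
print(clean_portfolio)
```

Keeping offer_id_map around matters in this sketch because transcript records still carry the original offer id strings, so the same mapping would have to be applied there before joining.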

View of Portfolio dataset after pre-processing.

2. Profile dataset:

The dataset contains 5 columns and 17,000 rows.

View of the Profile dataset before wrangling.

The key aspects of the profile dataset are:

  1. The percentage of male and female customers are 57% and 41% respectively.
Customer gender distribution in profile dataset.

The graph clearly shows that the male customer base of Starbucks is larger than the female customer base: men make up 57% of customers.
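A percentage split like this can be computed directly with pandas. The sketch below uses made-up toy data rather than profile.json; value_counts skips missing genders by default, so the shares are relative to customers whose gender is known.

```python
import pandas as pd


def gender_share(profile: pd.DataFrame) -> pd.Series:
    """Percentage of customers per gender; rows with a missing gender are ignored."""
    return (profile["gender"].value_counts(normalize=True) * 100).round(1)


# Toy data (not the real profile.json): 57 M, 41 F, 2 O and 5 missing values.
toy_profile = pd.DataFrame({"gender": ["M"] * 57 + ["F"] * 41 + ["O"] * 2 + [None] * 5})
shares = gender_share(toy_profile)
print(shares)
```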

2. Some columns contain null values.

Null value overview.

Data pre-processing steps for the profile dataset:

1. Encoding gender.
2. Creating a column with the number of days since membership.
3. Renaming id to customer id.
4. Mapping age to age groups.
5. Creating income groups: Basic, Average, and High.
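The profile steps can be sketched as below. This is a minimal pandas version built on several assumptions that do not come from the original notebook: the gender codes (M=1, F=2, O=3), the age cut points behind the four age groups, the income cut points behind the Basic/Average/High groups, and measuring membership days against the latest sign-up date in the data. Rows with missing gender or income are dropped, matching the decision to drop NA values in this analysis.

```python
import numpy as np
import pandas as pd

# Toy rows shaped like profile.json; values are made up.
profile = pd.DataFrame({
    "gender": ["M", "F", None, "O"],
    "age": [25, 45, 30, 70],
    "id": ["c1", "c2", "c3", "c4"],
    "became_member_on": [20170101, 20180101, 20170601, 20180111],
    "income": [45000.0, 72000.0, np.nan, 95000.0],
})


def preprocess_profile(df: pd.DataFrame, reference_date=None) -> pd.DataFrame:
    # Step 3 (done first here): rename id; then drop rows missing gender or income.
    df = df.rename(columns={"id": "customer_id"}).dropna(subset=["gender", "income"]).copy()
    # Step 1: hypothetical integer codes for gender.
    df["gender"] = df["gender"].map({"M": 1, "F": 2, "O": 3})
    # Step 2: became_member_on is an int like 20170212, so parse it as YYYYMMDD.
    joined = pd.to_datetime(df["became_member_on"].astype(str), format="%Y%m%d")
    ref = pd.Timestamp(reference_date) if reference_date is not None else joined.max()
    df["member_days"] = (ref - joined).dt.days
    # Step 4: 1=teenager, 2=young-adult, 3=adult, 4=elderly (assumed cut points).
    df["age_group"] = pd.cut(df["age"], bins=[0, 19, 35, 60, np.inf],
                             labels=[1, 2, 3, 4]).astype(int)
    # Step 5: 1=Basic, 2=Average, 3=High (assumed cut points).
    df["income_range"] = pd.cut(df["income"], bins=[0, 50_000, 80_000, np.inf],
                                labels=[1, 2, 3]).astype(int)
    return df


clean_profile = preprocess_profile(profile)
print(clean_profile)
```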

View of Profile dataset after preprocessing.

3. Distribution of the age of customers.

Distribution of age of the customer base.

The age distribution is roughly symmetric about its center: most customers are between 20 and 60 years old, and the largest group is 40 to 50 years old.

4. Understanding the salary categories of the customer base.

The salary analysis of customer base.

The graph shows that customers in the $40,000 to $50,000 income group visit Starbucks most often.

5. Distribution of Male and Female salaries:

The graphs show that both female and male incomes lie in the 20K to 120K range. Male customers are concentrated in the 30K to 80K range, while the female distribution is somewhat more uniform.

3. Transcript Dataset:

View into Transcript dataset.

The key aspects of the transcript dataset are:

  1. There were no rows or columns with null values.
  2. The counts of each event type are: transactions 138,953, offers received 76,277, and offers completed 33,579.

Processing steps for the transcript dataset:

1. Renaming person to customer id.
2. Creating columns for reward and amount.
3. Filtering out transaction and offer received events.
4. Dropping NA columns.
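A minimal sketch of these transcript steps follows. It assumes that value holds one dict per row with keys such as "offer id" or "offer_id", "reward", and "amount" (the sketch accepts both spellings of the offer key), and it interprets "dropping NA columns" as dropping columns that are entirely empty after filtering. The toy rows are made up.

```python
import pandas as pd

# Toy rows shaped like transcript.json; values are made up.
transcript = pd.DataFrame({
    "person": ["c1", "c1", "c1", "c2"],
    "event": ["offer received", "offer viewed", "offer completed", "transaction"],
    "value": [{"offer id": "offer_a"}, {"offer id": "offer_a"},
              {"offer_id": "offer_a", "reward": 5}, {"amount": 12.5}],
    "time": [0, 6, 132, 18],
})


def preprocess_transcript(df: pd.DataFrame) -> pd.DataFrame:
    # Step 1: rename person to customer_id.
    df = df.rename(columns={"person": "customer_id"}).copy()
    # Step 2: unpack the value dict into offer_id, reward and amount columns.
    df["offer_id"] = df["value"].apply(lambda v: v.get("offer id", v.get("offer_id")))
    df["reward"] = df["value"].apply(lambda v: v.get("reward"))
    df["amount"] = df["value"].apply(lambda v: v.get("amount"))
    # Step 3: keep only 'offer viewed' and 'offer completed' events.
    df = df[~df["event"].isin(["transaction", "offer received"])]
    # Step 4: drop the raw dict and any column that is now entirely NA.
    df = df.drop(columns="value").dropna(axis=1, how="all")
    return df.reset_index(drop=True)


clean_transcript = preprocess_transcript(transcript)
print(clean_transcript)
```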

Combining the datasets:

  1. The transcript and portfolio datasets were combined into one data frame with a left join on offer id.
  2. The profile dataset was then joined to the merged dataset on customer id.
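Both joins can be written as chained pandas merges. In this sketch with made-up toy frames, both the transcript and the portfolio carry a reward column, so pandas's default suffixes produce reward_x (from the transcript) and reward_y (from the portfolio); that may explain the reward_x name in the feature list, though this is an inference.

```python
import pandas as pd

# Toy frames (made up) that have already been pre-processed.
transcript = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2"],
    "event": ["offer viewed", "offer completed", "offer viewed"],
    "time": [6, 132, 24],
    "offer_id": [1, 1, 2],
    "reward": [None, 10, None],
})
portfolio = pd.DataFrame({"offer_id": [1, 2], "reward": [10, 3],
                          "difficulty": [10, 7], "duration": [7, 7]})
profile = pd.DataFrame({"customer_id": ["c1", "c2"], "gender": [1, 2], "age_group": [2, 3]})

# A left join keeps every transcript event, even if no offer or customer matches.
merged = (
    transcript.merge(portfolio, on="offer_id", how="left")
    .merge(profile, on="customer_id", how="left")
)
print(merged)
```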

Analysis of the merged data frame:

The visualization shows that most male customers belong to the basic salary range of 30K to 40K; there are fewer high-salaried men.

Among female customers, average-salary earners outnumber both higher and lower earners, which indicates that women with average salaries visit Starbucks more often.

Overall, men with basic salaries (40K to 50K) visit most often, followed by men with average salaries and then women in the average and high salary categories. Men account for the largest number of visits overall.

The graphs indicate that men both view and complete more offers than women; men complete about twice as many offers as women.

A small experiment measuring how long each gender takes on average to complete an offer showed that male and female customers take almost the same number of days, about 16.
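One way to compute such an average is sketched below with made-up toy rows. Because time is recorded in hours since the start of the test, dividing by 24 gives days since the start of the test for each offer completed event; measuring the gap between receiving and completing an offer would instead require matching each completion to its offer received event.

```python
import pandas as pd


def mean_days_to_complete(merged: pd.DataFrame) -> pd.Series:
    """Average completion time in days per gender, from 'offer completed' events only."""
    completed = merged[merged["event"] == "offer completed"]
    # time is in hours since the start of the test, so divide by 24 to get days.
    return (completed["time"] / 24).groupby(completed["gender"]).mean()


# Toy rows (made up); the 'offer viewed' row is ignored by the function.
toy = pd.DataFrame({
    "event": ["offer completed"] * 4 + ["offer viewed"],
    "gender": ["M", "M", "F", "F", "M"],
    "time": [360, 408, 384, 384, 12],
})
days = mean_days_to_complete(toy)
print(days)
```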

The graph shows that young adults complete offers the quickest, followed by teenagers.

The graph shows how popular each offer type is in each age group. BOGO is the most popular offer across all age groups, followed by discount offers; teenagers and young adults have similar offer preferences.

The graph shows that Buy One Get One Free (BOGO) is the most popular offer among both male and female customers.
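Popularity tables like the ones behind these graphs can be built with pd.crosstab. The sketch counts offer completed events per group and offer type on made-up toy rows; using completions (rather than views) as the measure of popularity is an assumption.

```python
import pandas as pd


def offer_popularity(merged: pd.DataFrame, by: str) -> pd.DataFrame:
    """Count completed offers per value of `by` (rows) and offer type (columns)."""
    completed = merged[merged["event"] == "offer completed"]
    return pd.crosstab(completed[by], completed["offer_type"])


# Toy rows (made up); the last row is only a view, so it is not counted.
toy = pd.DataFrame({
    "event": ["offer completed"] * 4 + ["offer viewed"],
    "age_group": ["young-adult", "young-adult", "young-adult", "adult", "adult"],
    "gender": ["M", "M", "F", "F", "M"],
    "offer_type": ["bogo", "bogo", "discount", "bogo", "discount"],
})
table_age = offer_popularity(toy, "age_group")
table_gender = offer_popularity(toy, "gender")
# Most completed offer type per age group.
most_popular = table_age.idxmax(axis=1)
print(table_age, table_gender, most_popular, sep="\n\n")
```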

Step 4: Data Modelling:

Data Modeling: Applying Machine Learning Algorithms.

We split the data into features and target labels, keeping ONLY the features we believe are important for the model to predict accurately.

Those features are as follows:

- time
- offer_id
- amount
- reward_x
- difficulty
- duration
- offer_type
- gender
- age_group
- income_range
- member_type
- Social
- Web
- age

The target column is event:

1 : offer completed
2 : offer viewed

  1. Support Vector Machine (SVM): a supervised learning algorithm applied to predict event.

Test accuracy: 98%.

The confusion matrix shows a few false negatives, which is not a good sign for a model, so we need to explore models that could reduce them.
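A minimal scikit-learn version of this step is sketched below on synthetic data from make_classification, standing in for the 14 selected features. Scaling the features before an RBF-kernel SVC, the split settings, and treating offer completed (label 1) as the positive class when counting false negatives are all assumptions; the 98% figure above comes from the real data, not from this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 14 features; labels 1 = completed, 2 = viewed.
X, y = make_classification(n_samples=500, n_features=14, random_state=0)
y = y + 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# SVMs are sensitive to feature scale, so standardize before the SVC.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)

acc = accuracy_score(y_test, y_pred)
# Rows are actual labels [1, 2]; columns are predicted labels [1, 2].
cm = confusion_matrix(y_test, y_pred, labels=[1, 2])
# With 1 as the positive class, a false negative is a completed offer predicted as viewed.
false_negatives = cm[0, 1]
print(f"accuracy={acc:.2%}, false negatives={false_negatives}")
```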

2. Decision Tree Algorithm:

The model looks very good: there are no false negatives, and the test accuracy was 100%.

These results compare the decision tree and support vector machine models: on the test set, the decision tree reached 100% accuracy and the SVM about 98%. Since the problem is binary (predicting whether a customer views or completes an offer from features such as age, gender, and income range), I chose the decision tree model for offer prediction.

In the confusion matrix, the true positive rate (TPR) is 100% and the false negative rate is 0, which is very important for any model. The decision tree model is therefore used for further prediction.
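The rates mentioned here follow directly from the 2x2 confusion matrix. The sketch below again uses synthetic data, treats offer completed (label 1) as the positive class, and computes TPR, FNR, and TNR for a decision tree; it also shows what these rates look like for a perfect classifier. The helper name rates_from_confusion is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def rates_from_confusion(cm):
    """TPR, FNR and TNR from a 2x2 matrix ordered [positive, negative]."""
    tp, fn = cm[0, 0], cm[0, 1]
    fp, tn = cm[1, 0], cm[1, 1]
    return {"TPR": float(tp / (tp + fn)),   # share of completed offers caught
            "FNR": float(fn / (tp + fn)),   # share of completed offers missed
            "TNR": float(tn / (tn + fp))}   # share of viewed-only events caught


# A perfect classifier: every completed (1) and viewed (2) event is predicted correctly.
perfect = np.array([[50, 0], [0, 50]])
print(rates_from_confusion(perfect))  # {'TPR': 1.0, 'FNR': 0.0, 'TNR': 1.0}

# Synthetic stand-in for the real features; labels 1 = completed, 2 = viewed.
X, y = make_classification(n_samples=500, n_features=14, random_state=0)
y = y + 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
cm = confusion_matrix(y_test, tree.predict(X_test), labels=[1, 2])
tree_rates = rates_from_confusion(cm)
print(tree_rates)
```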

Further Improvement:

This model could be improved by replacing NA values with the respective mean or mode instead of dropping them, as I did for this analysis. Grid search could also be applied as more data comes in, which could improve classification. Finally, neural networks built with TensorFlow could predict more precisely which offer type will work for each individual.
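Both the imputation and the grid search ideas can be sketched with pandas and scikit-learn. The first part fills missing income with the mean and missing gender with the mode; the second runs GridSearchCV over a few decision tree settings on synthetic data. The toy values, the parameter grid, and cv=5 are illustrative assumptions, not tuned values.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def impute_profile(profile: pd.DataFrame) -> pd.DataFrame:
    df = profile.copy()
    # Numeric column: fill gaps with the column mean.
    df["income"] = df["income"].fillna(df["income"].mean())
    # Categorical column: fill gaps with the most frequent value (mode).
    df["gender"] = df["gender"].fillna(df["gender"].mode().iloc[0])
    return df


toy_profile = pd.DataFrame({"gender": ["M", "F", "M", None],
                            "income": [40000.0, 60000.0, 50000.0, np.nan]})
filled = impute_profile(toy_profile)
print(filled)

# Grid search over decision tree settings with 5-fold cross-validation.
X, y = make_classification(n_samples=300, n_features=14, random_state=0)
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, None], "min_samples_leaf": [1, 5, 10]},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```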

Step 5: Conclusion:


***This project started with the idea of analyzing whether a customer just views an offer or completes it.***

The project notebook started by analyzing each dataset individually, visualizing it, and finding the relations between factors such as gender, age, and income and the offers. Processing the datasets took quite a significant amount of time and effort: the steps included creating features for further analysis and converting text items into dummy columns. Dimensions such as age and income were mapped into groups to categorize the offers for prediction.

Age was categorized into 4 groups:

1 : teenager
2 : young-adult
3 : adult
4 : elderly

Income was categorized into 3 groups:

1 : Basic
2 : Average
3 : High

A quick summary of the analysis and graphs:

1. Most customers have incomes between 30,000 and 115,000.
2. Most male customers belong to the basic salary range of 30K to 40K; there are fewer high-salaried men.
3. Among female customers, average-salary earners outnumber both higher and lower earners, which indicates that women with average salaries visit Starbucks more often.
4. Overall, men with basic salaries (40K to 50K) visit most often, followed by men with average salaries and then women in the average and high salary categories. Men account for the largest number of visits overall.
5. Men both view and complete more offers than women; men complete about twice as many offers as women.
6. On average, male and female customers take almost the same number of days to complete an offer, about ***16 days***.
7. Young adults complete offers the quickest, followed by teenagers.
8. BOGO is the most popular offer type across all age groups, followed by discount offers; teenagers and young adults have similar offer preferences.
9. On the test set, the decision tree reached 100% accuracy and the SVM about 98%. Since the problem is binary (predicting whether a customer views or completes an offer from features such as age, gender, and income range), I chose the decision tree model for offer prediction.

Takeaways from the project:

1. Data Processing skills.
2. Visualizing the datasets with graphs and the pandas module.
3. Applying ML models for further prediction.
4. Storytelling from the project.

Further Improvement:

Applying ML models, such as a TensorFlow deep learning model, to predict which offer type (BOGO, discount, etc.) is best for each individual, and creating a web application using HTML and CSS.

References:

  1. Udacity Data Science Nanodegree content and datasets.
  2. Starbucks, for the dataset.
  3. https://en.wikipedia.org/wiki/Starbucks (Starbucks information).
  4. https://medium.com/@najlaa.shariefi/starbucks-capstone-challenge-5cf9c97f532c


Written by Rajath Nag Nagaraj (Raj)

I go by "Raj" and I am a Senior Applied Gen-AI Scientist at a Fortune 100 Company.
