Small Bakery, Big Data

I believe that small businesses are the biggest drivers of a country's economy. They employ the most people after public services, and they are the ones that suffer most from change.

A big company like Starbucks or Costa can deal with market volatility and usually has an army of people making sure operations have as much insight as they possibly can.

What about a small bakery or coffee shop?

How can they know what is going on in their own business?

Luckily we can help. We will now demonstrate that in literally 15 minutes we can analyse a bakery's data and surface some useful information.

We will be using the bakery transaction data from Kaggle, with some inspiration from Derek Ren's post on Medium. As he said:

I believe even small business can utilise data to uncover business and customer insights.

Now let's dive into the information.

Background and Defined Questions

Variables Description: This transaction dataset is really straightforward. It provides four pieces of information:

Date: It includes transactions from 2016-10-30 to 2017-04-09, roughly six months of transaction activity.

Time: The time at which the customer bought the items is recorded alongside the date.

Transaction: Each transaction has a unique ID. Altogether there were about 9k transactions over the six months.

Items: The specific name of each item. About 94 different products are sold in the bakery.
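To make the four fields concrete, here is a minimal Python sketch of reading data in this shape. The column names (Date, Time, Transaction, Item), the inline sample rows and the helper name load_rows are assumptions for illustration, not the actual Kaggle file. It uses only the standard library.

```python
import csv
import io
from datetime import datetime

# Tiny made-up sample in the assumed layout: Date, Time, Transaction, Item.
SAMPLE_CSV = """Date,Time,Transaction,Item
2016-10-30,09:58:11,1,Bread
2016-10-30,10:05:34,2,Coffee
2016-10-30,10:05:34,2,Pastry
2016-10-30,10:07:57,3,Coffee
"""

def load_rows(text):
    """Parse CSV text into dicts with a combined timestamp and an integer transaction ID."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "when": datetime.strptime(f"{rec['Date']} {rec['Time']}", "%Y-%m-%d %H:%M:%S"),
            "transaction": int(rec["Transaction"]),
            "item": rec["Item"].strip(),
        })
    return rows

rows = load_rows(SAMPLE_CSV)
# One transaction can span several rows, so count distinct IDs rather than rows.
n_transactions = len({r["transaction"] for r in rows})
n_items = len({r["item"] for r in rows})
print(n_transactions, "transactions,", n_items, "distinct items")
```

Counting distinct transaction IDs and distinct item names on the real file is how figures such as "about 9k transactions" and "about 94 products" could be checked.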


As usual, we defined three key questions we would like to answer with our analysis:

Business Performance: What are the core products that bring us the most revenue? How are our core products growing?

Traffic Insight: Are there peak patterns, where certain products are heavily purchased around certain times?

Market Basket: What kinds of products are more likely to be purchased together? What is our growth opportunity?

Business Performance

Since this dataset records only the transactions and not the revenue, we can only look at how many times each product was purchased during the six months.

Top 12 Selling Products in Bakery

Overall, coffee was the best seller over the past six months: 5.4k cups of coffee were purchased, accounting for 26.68% of all items sold. Bread comes second (3.3k, 16.21%) and tea third (1.4k, 7%). The remaining products form a long tail.
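The ranking above comes down to counting item lines and dividing by the total. A minimal sketch, assuming a flat list of item names (one entry per item sold); the sample values and the function name top_products are made up for illustration:

```python
from collections import Counter

# Made-up item lines, one entry per item sold (not the real bakery data).
items_sold = ["Coffee", "Bread", "Coffee", "Tea", "Coffee", "Bread", "Cake", "Coffee"]

def top_products(items, n):
    """Return (item, count, share in percent) for the n most frequent items."""
    counts = Counter(items)
    total = sum(counts.values())
    return [
        (item, count, round(100 * count / total, 2))
        for item, count in counts.most_common(n)
    ]

ranking = top_products(items_sold, 3)
for item, count, share in ranking:
    print(f"{item}: {count} ({share}%)")
```

The share here is relative to all item lines, not to the number of transactions, because one transaction can contain several items.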

Top 6 products in bakery

When we look at the sales of our core products by month, we find that most of them stay relatively uniform. Most experienced a small drop between December and January, after which the numbers climbed slightly again. It could be that this is a festive period, or that there is less foot traffic due to the weather. We need more information to determine the cause.

However, when we look at our second tier of core products (top 7–12), we find that some products perform quite differently. For example, Farm House (which I assume is a type of bread) and Medialuna (a croissant) experienced a consecutive five-month decline. It is very hard to explain the decline without more information and context. We can consult with the bakery and ask further questions about recipe changes, promotions that ended and so on to determine the cause.
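A month-by-month view like the one discussed above can be sketched by grouping item lines by product and calendar month. The sample data and the helper names monthly_counts and is_declining are hypothetical, and "declining" is defined here simply as a strict drop from each month to the next:

```python
from collections import defaultdict
from datetime import date

# Made-up (date, item) lines for illustration only.
sales = [
    (date(2016, 11, 3), "Medialuna"),
    (date(2016, 11, 17), "Medialuna"),
    (date(2016, 11, 20), "Medialuna"),
    (date(2016, 12, 5), "Medialuna"),
    (date(2016, 12, 9), "Medialuna"),
    (date(2017, 1, 12), "Medialuna"),
    (date(2016, 11, 8), "Coffee"),
    (date(2017, 1, 4), "Coffee"),
]

def monthly_counts(lines):
    """Count item lines per product and calendar month, keyed as 'YYYY-MM'."""
    table = defaultdict(lambda: defaultdict(int))
    for day, item in lines:
        table[item][day.strftime("%Y-%m")] += 1
    return table

def is_declining(series):
    """True if counts fall strictly from each recorded month to the next.

    Months with no sales at all are absent from the series and are not
    treated as zero in this simple version.
    """
    values = [series[month] for month in sorted(series)]
    return len(values) > 1 and all(a > b for a, b in zip(values, values[1:]))

counts = monthly_counts(sales)
for item, series in counts.items():
    print(item, dict(sorted(series.items())), "declining:", is_declining(series))
```

Because the data runs from 2016-10-30 to 2017-04-09, October and April are partial months, so it is worth leaving them out before reading a trend from counts like these.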

Traffic Insight

Secondly, I calculated the average sales per hour for each top product to identify whether there are products that customers tend to buy at a specific time. We could then set up specific in-store campaigns to boost sales, or create a loyalty scheme for customers who purchase certain items at those times.

Hourly transactions by product

Based on our visualisation, we find that:

Bread and Coffee seem to share a similar pattern. Both products peak around 10 am (before work) and have a smaller peak around 2 pm (after the lunch break).

Pastry shows one peak at 10 am as well. There is also a small peak around 5 pm (after work, before dinner).

Sandwich seems to be a lunch choice for many people. It reaches its peak around 1 pm and then falls gradually.

Tea was chosen more often in the afternoon after lunch.
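The hourly averages behind these observations can be sketched as follows. This is one plausible reading of "average sales each hour": total item lines per (product, hour) divided by the number of days present in the data. The sample records and the helper names hourly_average and peak_hour are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

# Made-up (timestamp, item) lines covering two days.
lines = [
    (datetime(2017, 1, 9, 10, 5), "Coffee"),
    (datetime(2017, 1, 9, 10, 40), "Coffee"),
    (datetime(2017, 1, 9, 13, 10), "Sandwich"),
    (datetime(2017, 1, 10, 10, 15), "Coffee"),
    (datetime(2017, 1, 10, 13, 5), "Sandwich"),
    (datetime(2017, 1, 10, 14, 30), "Coffee"),
]

def hourly_average(records):
    """Average number of item lines per day for each (item, hour) pair."""
    n_days = len({when.date() for when, _ in records})
    totals = Counter((item, when.hour) for when, item in records)
    return {key: total / n_days for key, total in totals.items()}

def peak_hour(averages, item):
    """Hour of the day with the highest average for one item."""
    hours = {hour: avg for (name, hour), avg in averages.items() if name == item}
    return max(hours, key=hours.get)

avg = hourly_average(lines)
print("Coffee peaks at", peak_hour(avg, "Coffee"))
print("Sandwich peaks at", peak_hour(avg, "Sandwich"))
```

Dividing by the number of days keeps products comparable even if the observation window changes; plotting the averages per hour would give a chart like the ones described here.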

Hourly transactions by product

As we look through our second-tier products (7–12), we find that the morning peak (around 10 am) and the afternoon peak (2–3 pm) are quite prevalent across different products. Besides that, there are also some interesting points worth noting:

Hot chocolate shows a peak around 6 pm, which is quite different from other products. One hypothesis is that people would like something to warm them up but don’t want a drink with caffeine.

Market Basket Analysis

We have analysed how many of each product were sold and when they were sold. Finally, we come to the question: which items are more likely to be purchased together?

I won’t cover the specific methodology of market basket analysis. Those who are interested in this technique can refer to the Kaggle kernel created by Xavier. Basically, there are three metrics for evaluating a market basket:

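Support: how frequently the item set appears in the data set.

Confidence: the percentage of transactions containing X in which Y is also bought.

Lift: how much more likely X and Y are to be bought together than if they were independent of each other. A lift above 1 means the pair is bought together more often than chance would suggest.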
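The three metrics can be written down in a few lines. This is a minimal sketch using the standard definitions on made-up baskets, not the code from the Kaggle kernel mentioned above; the function names are chosen for illustration.

```python
# Made-up baskets: each set holds the items in one transaction.
baskets = [
    {"Coffee", "Toast"},
    {"Coffee", "Cake"},
    {"Coffee"},
    {"Bread"},
    {"Coffee", "Toast", "Bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Of the transactions containing x, the fraction that also contain y."""
    return support({x, y}, transactions) / support({x}, transactions)

def lift(x, y, transactions):
    """Confidence of x -> y relative to how often y is bought overall."""
    return confidence(x, y, transactions) / support({y}, transactions)

print("support:", support({"Toast", "Coffee"}, baskets))
print("confidence:", confidence("Toast", "Coffee", baskets))
print("lift:", lift("Toast", "Coffee", baskets))
```

In this toy data, toast appears in two of five baskets and always with coffee, while coffee appears in four of five, so the lift of toast with coffee is above 1.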

Market basket analysis against coffee

After calculating these three metrics for all the different combinations, we selected the top 10 item sets by lift, subject to minimum thresholds for support and confidence. We find that in all of these association rules, the items are connected with coffee. The redder the circle, the more likely the two items are to be purchased together, which indicates that toast and coffee are most likely to be bought together (lift = 1.47). The bigger the circle, the more frequently the set occurs. Here, cake and coffee were most frequently bought together (support = 0.054).

Coffee sitting at the centre of the association network is what we expected, since it accounts for 26.7% of items sold. But if we exclude coffee from our analysis, will we find any interesting co-consumption patterns between other products? Even though there are not many transactions for them right now, could we turn them into a growth opportunity?

After I exclude all of the coffee records, the association rules network looks more diversified, even though we have to lower the support threshold. I find some interesting connections that may be worth researching further:

Salad + Extra Salami or Feta: People usually like to personalise their salad. By offering extra add-ons, we could potentially increase the average price of a salad.

Cookies + Alfajores + Juice: People who eat cookies or alfajores would choose juice as their second choice after coffee.

Coke + Juice + Sandwiches: People who eat sandwiches often choose juice or coke. Here we see a strong connection between food and beverages, so a promotion could be offered as a meal deal.
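The coffee-free re-run can be sketched as follows: drop the excluded item from every basket, then rank the remaining item pairs by lift above a minimum support. The baskets, the function name pair_rules and the threshold values are assumptions for illustration; dropping baskets that become empty is one way to read "exclude all of the coffee records".

```python
from collections import Counter
from itertools import combinations

# Made-up baskets for illustration only.
baskets = [
    {"Coffee", "Sandwich", "Juice"},
    {"Sandwich", "Juice"},
    {"Coffee", "Cookies"},
    {"Sandwich", "Coke"},
    {"Coffee", "Bread"},
    {"Cookies", "Juice"},
]

def pair_rules(transactions, exclude=(), min_support=0.0):
    """Rank item pairs by lift after removing excluded items from every basket."""
    cleaned = [basket - set(exclude) for basket in transactions]
    cleaned = [basket for basket in cleaned if basket]  # drop baskets left empty
    n = len(cleaned)
    item_counts = Counter(item for basket in cleaned for item in basket)
    pair_counts = Counter(
        pair for basket in cleaned for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        pair_support = count / n
        if pair_support >= min_support:
            pair_lift = pair_support / ((item_counts[a] / n) * (item_counts[b] / n))
            rules.append((a, b, round(pair_support, 3), round(pair_lift, 3)))
    return sorted(rules, key=lambda rule: rule[3], reverse=True)

strict = pair_rules(baskets, exclude={"Coffee"}, min_support=0.3)
relaxed = pair_rules(baskets, exclude={"Coffee"}, min_support=0.1)
print(len(strict), "rule(s) at min_support=0.3;", len(relaxed), "at min_support=0.1")
```

Lowering min_support lets rarer pairs through, which mirrors the trade-off described above: the network becomes more diverse, but each rule rests on fewer transactions.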

Final Thoughts

There are obviously more topics we could explore in this dataset. And if we look at this bakery business from a higher level, I think there is more information that needs to be included:

Price: the price of each SKU. By joining the transaction data and price data, we could analyse the revenue of our products and think about growing our premium range.

Cost: from an operational and financial perspective, we also need to analyse the inventory and profitability of our product list. There are probably some products we could cut, or new SKUs we could introduce.

Customer: to understand who our target and most profitable customers are, it would be ideal to know who bought which products.

Stock: to understand whether any product repeatedly runs out of stock, and to try to predict when it will before it does.

Categories: hot, cold, salty, sweet, etc. Categories and segments can help us understand trends.

This is just a simple exercise showing what can be achieved with very simple data. The next step could be to create a data stream and a dashboard that would show the insights from the transactions in real time.

For more information please contact us and we can drive change.

#unidata #unifyingdata #drivechange #datascience

Contact Us

Gloucester Terrace W2 6HP, LONDON, United Kingdom

© 2019 by UNI.