Tiew Kee Hui's Blog

Association Rule Mining on the Extended Bakery dataset

January 30, 2017 | 48 Minute Read

Introduction

We used the 75,000-receipt data from the Extended Bakery dataset (apriori.zip), which can be found at this website.

We also used EB-build-goods.sql to convert the product IDs to their names. The original file can be found here.

Objective

What is the domain and what are the potential benefits to be derived from association rule mining?


Association rule mining is a data mining method that specialises in finding frequent patterns, associations, correlations or causal structures in data such as the Extended Bakery dataset provided here. Applied to this dataset, it could potentially help improve inventory management, customer purchase prediction and time-related sales.

Association rule mining, sometimes referred to as 'Market Basket Analysis', is one of the most prominent techniques in data mining for finding useful insights in a particular domain. It is a rule-based machine learning method designed to discover frequently co-occurring items in transactional and even relational databases. Transaction data is normally categorical (non-numeric), which makes association rule mining a pertinent method, because it handles this form of data well when searching for interesting discoveries.

An association rule has two parts: an antecedent (if) and a consequent (then). The antecedent is an item (or set of items) found in the data, and the consequent is an item (or set of items) found in combination with the antecedent. The strength of an association rule can be measured in terms of its support and confidence. Support indicates how frequently the items appear together in the database, i.e. the fraction of transactions that contain every item in the rule. This matters because a rule with very low support is likely to be uninteresting from a business perspective; for example, it may prove unprofitable to promote items that customers seldom buy together. Confidence, on the other hand, indicates how often the if/then statement has been found to be true: the fraction of transactions containing the antecedent that also contain the consequent. It essentially measures the reliability of the inference made by a rule. Lift, which we use later to sort the rules, divides a rule's confidence by the support of the consequent; a lift above 1 means the items are bought together more often than would be expected if they were independent.
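To make these measures concrete, here is a small base-R sketch that computes support, confidence and lift for a single rule. The five receipts and the helper names (toy_support, toy_confidence, toy_lift) are our own hypothetical examples, not part of the bakery data or of the arules package.

```r
# Hypothetical receipts (not the bakery data): one character vector per receipt
toy_receipts <- list(
  c("Coffee", "Croissant"),
  c("Coffee", "Croissant", "Juice"),
  c("Coffee", "Muffin"),
  c("Croissant", "Juice"),
  c("Coffee", "Croissant")
)

# Support: fraction of receipts that contain every item in `items`
toy_support <- function(items, trans) {
  mean(vapply(trans, function(t) all(items %in% t), logical(1)))
}

# Confidence of lhs => rhs: support(lhs and rhs together) / support(lhs)
toy_confidence <- function(lhs, rhs, trans) {
  toy_support(c(lhs, rhs), trans) / toy_support(lhs, trans)
}

# Lift of lhs => rhs: confidence / support(rhs); 1 means the items are independent
toy_lift <- function(lhs, rhs, trans) {
  toy_confidence(lhs, rhs, trans) / toy_support(rhs, trans)
}

print(c(support    = toy_support(c("Juice", "Croissant"), toy_receipts),
        confidence = toy_confidence("Juice", "Croissant", toy_receipts),
        lift       = toy_lift("Juice", "Croissant", toy_receipts)))
```

Here {Juice} => {Croissant} has support 0.4, confidence 1 and lift 1.25. By contrast, {Coffee} => {Croissant} has a respectable confidence of 0.75 but a lift of only 0.9375, because Croissant already appears on 80% of the receipts; this is why lift is a useful complement to confidence.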

A classic example is the beer-and-diapers association that is often mentioned in data mining books. The association suggests a strong relationship between sales of diapers and beer, because many customers who buy diapers also buy beer. Such findings help retailers learn about their customers' purchasing behavior, and can support a variety of business applications such as marketing promotions, inventory management and customer relationship management.

Association mining has been used broadly in many application domains besides business, and its application areas have grown significantly in recent years. Some recent applications are the discovery of patterns in biological databases, the extraction of knowledge from software engineering metrics and the optimization of users' profiles for web system personalization. A well-known retail example comes from Walmart in 2004, when a series of hurricanes crossed the state of Florida. Walmart mined its massive retail transaction database to see what customers really wanted to buy before the arrival of a hurricane. They found one particular item whose sales increased by a factor of 7 over normal shopping days, a huge lift for a real-world case. That one item was not bottled water, batteries, beer, flashlights, generators or any of the usual things we might imagine. It was strawberry Pop-Tarts. Walmart therefore stocked its stores with tons of strawberry Pop-Tarts before the next hurricanes, and they sold out. That is a win-win: Walmart wins by making the sale, and customers win by getting the product they most want.

Now that we have established what association rules are and how they are used, we can consider how to apply them to the Extended Bakery dataset. The question then becomes: what good may come from the relationships and rules that will be found? Overall, many growth-inducing outcomes can follow from gleaning and using these rules. First, the owners of the business would become much more knowledgeable about their own business. After considering this information, they can form courses of action based on evidence rather than on simple guesses.

The business may increase sales of specific items by displaying highly correlated items together. This should boost sales of the set, as it increases customers' awareness of the other item(s) in it. The owners can also choose sets of items that sell well together and offer them as more desirable promotions, for example set meals and coupon discounts. This option is viable because it would draw more customers into the shop, and thus increase both sales and awareness of the shop. In conclusion, using association rules may result in higher sales and greater awareness of the store, among both potential customers and the owners.

Association Rule Mining

Dataset description

There are 3 files in the 75000-receipt folder, all in .csv format. The first is the sparse vector file, 75000-out1.csv. Each row starts with a receipt ID followed by the food IDs of the pastries and/or drinks sold on that receipt.

The next file is the full binary vector file, 75000-out2.csv. It has 51 columns: the first column is the receipt ID, and the following 50 columns signify the absence or presence of each item on the given receipt. For example, the second column is 1 if food ID 0 was bought on the receipt, the third column is 0 if food ID 1 was not bought, and so on.
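To make the relationship between the two formats concrete, here is a small base-R sketch that expands sparse rows (a receipt ID followed by food IDs) into the 51-column binary layout described above. The two example rows are receipts 1 and 2 as they appear in the head(bakery) output further down; the helper name sparse_to_binary is our own, and we assume food IDs run from 0 to 49.

```r
# Expand sparse receipts into a binary matrix with one column per food ID 0..49.
# sparse_to_binary is a hypothetical helper, not part of any package.
sparse_to_binary <- function(rows, n_items = 50) {
  out <- matrix(0L, nrow = length(rows), ncol = n_items + 1,
                dimnames = list(NULL, c("ReceiptID", paste0("Food", 0:(n_items - 1)))))
  for (i in seq_along(rows)) {
    out[i, 1] <- rows[[i]][1]        # first value is the receipt ID
    food_ids <- rows[[i]][-1]        # remaining values are food IDs
    out[i, food_ids + 2] <- 1L       # food ID k lives in column k + 2
  }
  out
}

rows <- list(c(1L, 11L, 21L), c(2L, 7L, 11L, 37L, 45L))
binary <- sparse_to_binary(rows)
print(binary[, c("ReceiptID", "Food7", "Food11", "Food21", "Food45")])
```

The sparse file is more compact because most receipts contain only a handful of the 50 items, which is why we work from it in the rest of this post.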

The last file is the items table file, 75000i.csv. It has 3 columns: the receipt ID, the quantity of the item purchased, and the food ID of the item purchased. For example, if receipt 1 bought two each of food IDs 2 and 4, there would be two rows:

1 2 2

1 2 4
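Because the items table stores one row per (receipt, food) pair, recovering each basket means grouping rows by receipt ID. A minimal base-R sketch using split(): the first two rows are the example above, the third row is hypothetical, and the column names are our own since the file has no header.

```r
# Items-table rows: receipt ID, quantity, food ID (column names are our own).
# The first two rows are the example above; the third row is hypothetical.
items <- data.frame(
  Receipt  = c(1, 1, 2),
  Quantity = c(2, 2, 1),
  FoodId   = c(2, 4, 7)
)

# Group food IDs by receipt to recover each basket; quantities are ignored
# because association rules only use presence or absence of an item
baskets <- split(items$FoodId, items$Receipt)
print(baskets)

# Total quantity per receipt, in case it is needed for other analyses
totals <- tapply(items$Quantity, items$Receipt, sum)
print(totals)
```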

Preprocessing

Let’s start with the main file, 75000-out1.csv. As stated above, each row starts with a receipt ID followed by the food ID of the pastries and/or drinks sold in that receipt.

We want to eliminate some rows. Rows with only one item are redundant, because no rule linking two or more items can be generated from a receipt that contains a single item. If a row has only one purchase, its third column (V3) will be NA, since the first column (V1) holds the receipt ID and the second column (V2) holds the first item purchased. Therefore, every row whose third column (V3) is NA is a receipt with only one item.

Besides that, since we do not want the association rule mining algorithm to treat the receipt ID as part of the transaction, we have to remove the first column of the dataset.

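library(plyr)
library(dplyr)
library(ggplot2)
library(arules)
library(plotly)

# Rows have different lengths, so find the longest row to size the columns
num <- max(count.fields("75000-out1.csv", sep = ","))

# Read the ragged file; fill = TRUE pads shorter rows with NA
bakery <- read.table("75000-out1.csv", header = FALSE, sep = ",", col.names = paste0("V", 1:num), fill = TRUE)
head(bakery)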
##   V1 V2 V3 V4 V5 V6 V7 V8 V9
## 1  1 11 21 NA NA NA NA NA NA
## 2  2  7 11 37 45 NA NA NA NA
## 3  3  3 33 42 NA NA NA NA NA
## 4  4  5 12 17 47 NA NA NA NA
## 5  5  6 18 42 NA NA NA NA NA
## 6  6  2  4 34 NA NA NA NA NA
# Find receipts that contain only one item (their V3 is NA) and drop them
removedReceipts <- is.na(bakery$V3)
bakery <- bakery[!removedReceipts, ]
head(bakery)
##   V1 V2 V3 V4 V5 V6 V7 V8 V9
## 1  1 11 21 NA NA NA NA NA NA
## 2  2  7 11 37 45 NA NA NA NA
## 3  3  3 33 42 NA NA NA NA NA
## 4  4  5 12 17 47 NA NA NA NA
## 5  5  6 18 42 NA NA NA NA NA
## 6  6  2  4 34 NA NA NA NA NA
# Remove the first column (receipt ID), keeping only the item columns
bakery <- bakery[2:num]
head(bakery)
##   V2 V3 V4 V5 V6 V7 V8 V9
## 1 11 21 NA NA NA NA NA NA
## 2  7 11 37 45 NA NA NA NA
## 3  3 33 42 NA NA NA NA NA
## 4  5 12 17 47 NA NA NA NA
## 5  6 18 42 NA NA NA NA NA
## 6  2  4 34 NA NA NA NA NA
nrow(bakery)
## [1] 71408

Of the 75000 initial transactions, 71408 remain.
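As a quick check, 75000 - 71408 = 3592 single-item receipts were dropped. Testing V3 works because read.table() with fill = TRUE pads shorter rows with NA at the end, so a receipt's items always start in V2. A slightly more explicit alternative is to count the non-missing item columns directly; the sketch below applies it to a small hypothetical data frame laid out like bakery.

```r
# Hypothetical receipts laid out like `bakery` (V1 = receipt ID, items from V2 on)
toy <- data.frame(
  V1 = c(1, 2, 3),
  V2 = c(11, 5, 7),
  V3 = c(21, NA, 11),
  V4 = c(NA, NA, 37)
)

# Number of items on each receipt = non-missing values outside the ID column
item_counts <- rowSums(!is.na(toy[, -1]))

# Keep receipts with at least two items and drop the receipt ID column
kept <- toy[item_counts >= 2, -1]
print(item_counts)
print(kept)
```

Counting items this way does not depend on the position of the NA padding, and the same item_counts vector can be reused to look at the distribution of basket sizes.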

Rules based only on food IDs would be quite meaningless: a rule like 1 -> 20 tells you nothing unless you know which foods those IDs stand for. Therefore, all food IDs should be converted to their names. To do so, we use the EB-build-goods.sql file to find out which food ID corresponds to which food name. The file can be opened in any text editor, and its contents are SQL statements such as:

insert into goods values (0,'Chocolate','Cake',8.95,'Food');

We want to convert this into a .csv file to load it into R to perform the conversion. We cleaned up all the SQL statements using the Replace function in Sublime Text 3 so that each row is in the following format:

0,Chocolate Cake
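The same clean-up can also be scripted rather than done by hand. The sketch below is our own alternative to the Sublime Text replacements, not what we actually ran: it assumes every statement follows the pattern shown above with straight single quotes and no apostrophes inside names, and it keeps only the ID and the first two quoted fields. The second SQL line is hypothetical (its price is a placeholder).

```r
# The first line is the example statement; the second is hypothetical
sql_lines <- c(
  "insert into goods values (0,'Chocolate','Cake',8.95,'Food');",
  "insert into goods values (11,'Apple','Pie',0.00,'Food');"
)

# Capture the ID and the first two quoted fields, then rebuild "ID,Flavour Food"
pattern <- "^insert into goods values \\(([0-9]+),'([^']*)','([^']*)',.*$"
cleaned <- sub(pattern, "\\1,\\2 \\3", sql_lines, ignore.case = TRUE)
print(cleaned)

# The cleaned lines parse as a two-column CSV in the same shape as food.csv
food_demo <- read.csv(text = cleaned, header = FALSE,
                      col.names = c("FoodId", "FoodName"), stringsAsFactors = FALSE)
print(food_demo)

# writeLines(cleaned, "food.csv") would save the result to disk
```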

We then saved that file as food.csv. The .csv file can be found here. After converting all the food IDs to food names, the result is saved into a new file called foodUpdated.csv.

# Lookup table from food ID to food name
food <- read.csv('food.csv', header = FALSE, col.names = c('FoodId', 'FoodName'), sep = ',')

# Change every food ID to its food name (missing items stay NA)
newBakery <- data.frame(lapply(bakery, function(x) food$FoodName[match(x, food$FoodId)]))

# Save one receipt per row, writing missing items as empty fields
write.table(newBakery, 'foodUpdated.csv', sep = ',', col.names = FALSE, row.names = FALSE, na = '')

head(newBakery)
##                 V2               V3                  V4
## 1        Apple Pie   Ganache Cookie                <NA>
## 2    Coffee Eclair        Apple Pie        Almond Twist
## 3       Opera Cake Cheese Croissant        Orange Juice
## 4     Truffle Cake       Apple Tart      Chocolate Tart
## 5 Chocolate Eclair      Cherry Tart        Orange Juice
## 6      Casino Cake  Strawberry Cake Chocolate Croissant
##                    V5   V6   V7   V8   V9
## 1                <NA> <NA> <NA> <NA> <NA>
## 2          Hot Coffee <NA> <NA> <NA> <NA>
## 3                <NA> <NA> <NA> <NA> <NA>
## 4 Vanilla Frappuccino <NA> <NA> <NA> <NA>
## 5                <NA> <NA> <NA> <NA> <NA>
## 6                <NA> <NA> <NA> <NA> <NA>

Rule Mining Process

Since the dataset is still quite small, the Apriori algorithm is sufficient to generate our rules; we do not need FP-growth and its FP-tree. We want rules with fairly high confidence, even if they apply to only a small share of customers. The parameters for our algorithm are a minimum support of 0.02 (0.02 × 71408 ≈ 1428 receipts) and a minimum confidence of 0.5. We set the minimum rule length to 2 so that every generated rule involves at least 2 items. The algorithm generated 116 rules and completed all the processing in 0.07s on our local computer.

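# Read each row of foodUpdated.csv as one basket (transaction) of item names
transactions <- read.transactions("foodUpdated.csv", format = "basket", sep = ",")

# Mine rules with minimum support 0.02, minimum confidence 0.5 and at least 2 items
rules <- apriori(transactions, parameter = list(supp = 0.02, conf = 0.5, minlen = 2, target = "rules"))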
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.02      2
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 1428 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 71408 transaction(s)] done [0.02s].
## sorting and recoding items ... [50 item(s)] done [0.00s].
## creating transaction tree ... done [0.03s].
## checking subsets of size 1 2 3 4 5 done [0.01s].
## writing ... [116 rule(s)] done [0.00s].
## creating S4 object  ... done [0.01s].
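The log line "checking subsets of size 1 2 3 4 5" reflects Apriori's level-wise search: frequent itemsets of size k are joined into candidates of size k + 1, any candidate with an infrequent k-subset is pruned, and only the survivors have their support counted. The base-R sketch below illustrates that idea on five hypothetical receipts loosely modelled on the bakery items. It is a teaching simplification that stops at frequent itemsets; it is not how arules' apriori() is implemented (arules calls a compiled C implementation).

```r
# Hypothetical toy receipts (not the bakery data)
receipts <- list(
  c("Apple Pie", "Almond Twist", "Coffee Eclair"),
  c("Apple Pie", "Almond Twist", "Coffee Eclair", "Hot Coffee"),
  c("Apple Pie", "Almond Twist"),
  c("Opera Cake", "Cherry Tart"),
  c("Apple Pie", "Coffee Eclair", "Hot Coffee")
)

# Support of an itemset = fraction of receipts containing all of its items
itemset_support <- function(items, trans) {
  mean(vapply(trans, function(t) all(items %in% t), logical(1)))
}

apriori_sketch <- function(trans, min_sup) {
  items <- sort(unique(unlist(trans)))
  # Level 1: frequent single items
  current <- Filter(function(s) itemset_support(s, trans) >= min_sup, as.list(items))
  frequent <- current
  k <- 1
  while (length(current) > 1) {
    # Join step: union pairs of frequent k-itemsets that share k - 1 items
    candidates <- list()
    for (i in seq_len(length(current) - 1)) {
      for (j in (i + 1):length(current)) {
        u <- sort(union(current[[i]], current[[j]]))
        if (length(u) == k + 1) candidates[[length(candidates) + 1]] <- u
      }
    }
    candidates <- unique(candidates)
    # Prune step: every k-subset of a candidate must itself be frequent
    is_frequent_k <- function(s) any(vapply(current, identical, logical(1), s))
    candidates <- Filter(function(cand) {
      all(vapply(seq_along(cand), function(d) is_frequent_k(cand[-d]), logical(1)))
    }, candidates)
    # Count step: keep only candidates that reach the minimum support
    current <- Filter(function(s) itemset_support(s, trans) >= min_sup, candidates)
    frequent <- c(frequent, current)
    k <- k + 1
  }
  frequent
}

freq <- apriori_sketch(receipts, min_sup = 0.3)
for (s in freq) {
  cat(sprintf("{%s} support = %.1f\n", paste(s, collapse = ", "),
              itemset_support(s, receipts)))
}
```

With a minimum support of 0.3, the sketch finds 4 frequent single items, 5 frequent pairs and 2 frequent triples; the 4-item candidate is pruned because one of its subsets ({Almond Twist, Hot Coffee}) is infrequent. The pruning step is what keeps the search cheap on the real data.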

The algorithm generated 116 rules. Next, we remove any redundant rules.
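As we understand the arules documentation, a rule counts as redundant when a more general rule (same consequent, smaller antecedent) has the same or a higher confidence, so the extra antecedent items add no predictive power. The base-R sketch below applies that simplified check to four hypothetical rules with made-up confidences; is_redundant_demo is our own helper and only illustrates the idea, not arules' is.redundant() implementation.

```r
# Hypothetical rules; the confidence values are made up for illustration
rules_demo <- list(
  list(lhs = c("Almond Twist"),                  rhs = "Apple Pie",  conf = 0.80),
  list(lhs = c("Almond Twist", "Coffee Eclair"), rhs = "Apple Pie",  conf = 0.92),
  list(lhs = c("Almond Twist", "Hot Coffee"),    rhs = "Apple Pie",  conf = 0.78),
  list(lhs = c("Coffee Eclair"),                 rhs = "Hot Coffee", conf = 0.60)
)

# Flag a rule when another rule with the same rhs has a proper-subset lhs
# (i.e. it is more general) and a confidence at least as high
is_redundant_demo <- function(rules) {
  vapply(seq_along(rules), function(i) {
    r <- rules[[i]]
    any(vapply(rules[-i], function(g) {
      identical(g$rhs, r$rhs) &&
        all(g$lhs %in% r$lhs) &&
        length(g$lhs) < length(r$lhs) &&
        g$conf >= r$conf
    }, logical(1)))
  }, logical(1))
}

redundant_flags <- is_redundant_demo(rules_demo)
print(redundant_flags)
kept_rules <- rules_demo[!redundant_flags]
```

Only the third rule is dropped: {Almond Twist} => {Apple Pie} is more general and more confident (0.80 vs 0.78). The second rule survives because adding Coffee Eclair raises the confidence, which is why many long rules remain in the sorted list.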

#Remove redundant rules (if any)

redundant <- is.redundant(rules, measure = "confidence")
rules <- rules[!redundant]
inspect(sort(rules, by = "lift"))
##       lhs                      rhs                      support confidence      lift
## [1]   {Green Tea,
##        Lemon Cookie,
##        Lemon Lemonade,
##        Raspberry Lemonade}  => {Raspberry Cookie}    0.02177627  1.0000000 14.281600
## [2]   {Green Tea,
##        Lemon Cookie,
##        Lemon Lemonade,
##        Raspberry Cookie}    => {Raspberry Lemonade}  0.02177627  1.0000000 14.250249
## [3]   {Green Tea,
##        Lemon Lemonade,
##        Raspberry Lemonade}  => {Raspberry Cookie}    0.02179028  0.9974359 14.244981
## [4]   {Lemon Cookie,
##        Lemon Lemonade,
##        Raspberry Lemonade}  => {Raspberry Cookie}    0.02684573  0.9973985 14.244447
## [5]   {Apple Danish,
##        Apple Tart,
##        Cherry Soda}         => {Apple Croissant}     0.02162223  0.9929260 14.228951
## [6]   {Green Tea,
##        Lemon Cookie,
##        Raspberry Cookie}    => {Raspberry Lemonade}  0.02177627  0.9980745 14.222810
## [7]   {Green Tea,
##        Lemon Cookie,
##        Lemon Lemonade}      => {Raspberry Cookie}    0.02177627  0.9942455 14.199417
## [8]   {Green Tea,
##        Lemon Lemonade,
##        Raspberry Cookie}    => {Raspberry Lemonade}  0.02179028  0.9942492 14.168299
## [9]   {Green Tea,
##        Lemon Cookie,
##        Lemon Lemonade}      => {Raspberry Lemonade}  0.02177627  0.9942455 14.168247
## [10]  {Green Tea,
##        Lemon Lemonade,
##        Raspberry Cookie,
##        Raspberry Lemonade}  => {Lemon Cookie}        0.02177627  0.9993573 14.161958
## [11]  {Green Tea,
##        Lemon Cookie,
##        Raspberry Lemonade}  => {Raspberry Cookie}    0.02177627  0.9904459 14.145152
## [12]  {Lemon Cookie,
##        Lemon Lemonade,
##        Raspberry Cookie}    => {Raspberry Lemonade}  0.02684573  0.9922360 14.139611
## [13]  {Green Tea,
##        Lemon Cookie,
##        Raspberry Cookie,
##        Raspberry Lemonade}  => {Lemon Lemonade}      0.02177627  1.0000000 14.129007
## [14]  {Green Tea,
##        Lemon Lemonade,
##        Raspberry Lemonade}  => {Lemon Cookie}        0.02177627  0.9967949 14.125646
## [15]  {Green Tea,
##        Lemon Cookie,
##        Raspberry Cookie}    => {Lemon Lemonade}      0.02177627  0.9980745 14.101801
## [16]  {Green Tea,
##        Raspberry Cookie,
##        Raspberry Lemonade}  => {Lemon Cookie}        0.02177627  0.9942455 14.089519
## [17]  {Green Tea,
##        Lemon Lemonade,
##        Raspberry Cookie}    => {Lemon Cookie}        0.02177627  0.9936102 14.080516
## [18]  {Apple Croissant,
##        Apple Tart,
##        Cherry Soda}         => {Apple Danish}        0.02162223  0.9910141 14.077250
## [19]  {Lemon Lemonade,
##        Raspberry Cookie,
##        Raspberry Lemonade}  => {Lemon Cookie}        0.02684573  0.9927499 14.068324
## [20]  {Lemon Cookie,
##        Raspberry Cookie,
##        Raspberry Lemonade}  => {Lemon Lemonade}      0.02684573  0.9953271 14.062983
## [21]  {Green Tea,
##        Raspberry Cookie,
##        Raspberry Lemonade}  => {Lemon Lemonade}      0.02179028  0.9948849 14.056736
## [22]  {Apple Croissant,
##        Apple Danish,
##        Cherry Soda}         => {Apple Tart}          0.02162223  0.9897436 14.017376
## [23]  {Green Tea,
##        Lemon Cookie,
##        Raspberry Lemonade}  => {Lemon Lemonade}      0.02177627  0.9904459 13.994016
## [24]  {Lemon Cookie,
##        Lemon Lemonade}      => {Raspberry Cookie}    0.02705579  0.9261745 13.227254
## [25]  {Lemon Lemonade,
##        Raspberry Lemonade}  => {Raspberry Cookie}    0.02704179  0.9248084 13.207744
## [26]  {Lemon Cookie,
##        Raspberry Lemonade}  => {Raspberry Cookie}    0.02697177  0.9228558 13.179857
## [27]  {Lemon Cookie,
##        Raspberry Cookie}    => {Raspberry Lemonade}  0.02697177  0.9228558 13.150925
## [28]  {Lemon Cookie,
##        Lemon Lemonade}      => {Raspberry Lemonade}  0.02691575  0.9213806 13.129904
## [29]  {Raspberry Cookie,
##        Raspberry Lemonade}  => {Lemon Cookie}        0.02697177  0.9264069 13.128173
## [30]  {Raspberry Cookie,
##        Raspberry Lemonade}  => {Lemon Lemonade}      0.02704179  0.9288119 13.123190
## [31]  {Apple Danish,
##        Apple Tart}          => {Apple Croissant}     0.02678972  0.9157492 13.122981
## [32]  {Lemon Lemonade,
##        Raspberry Cookie}    => {Raspberry Lemonade}  0.02704179  0.9190861 13.097207
## [33]  {Lemon Cookie,
##        Raspberry Cookie}    => {Lemon Lemonade}      0.02705579  0.9257307 13.079655
## [34]  {Apple Tart,
##        Cherry Soda}         => {Apple Croissant}     0.02181828  0.9116442 13.064156
## [35]  {Lemon Lemonade,
##        Raspberry Lemonade}  => {Lemon Cookie}        0.02691575  0.9204981 13.044439
## [36]  {Lemon Lemonade,
##        Raspberry Cookie}    => {Lemon Cookie}        0.02705579  0.9195621 13.031175
## [37]  {Apple Croissant,
##        Apple Tart}          => {Apple Danish}        0.02678972  0.9161877 13.014349
## [38]  {Lemon Cookie,
##        Raspberry Lemonade}  => {Lemon Lemonade}      0.02691575  0.9209391 13.011955
## [39]  {Green Tea,
##        Raspberry Lemonade}  => {Raspberry Cookie}    0.02190231  0.9098313 12.993847
## [40]  {Apple Croissant,
##        Apple Danish}        => {Apple Tart}          0.02678972  0.9144359 12.950822
## [41]  {Green Tea,
##        Raspberry Lemonade}  => {Lemon Cookie}        0.02198633  0.9133217 12.942742
## [42]  {Green Tea,
##        Lemon Lemonade}      => {Raspberry Cookie}    0.02191631  0.9061957 12.941925
## [43]  {Apple Tart,
##        Cherry Soda}         => {Apple Danish}        0.02177627  0.9098888 12.924874
## [44]  {Green Tea,
##        Lemon Cookie}        => {Raspberry Lemonade}  0.02198633  0.9069902 12.924836
## [45]  {Apple Croissant,
##        Cherry Soda}         => {Apple Danish}        0.02184629  0.9085614 12.906019
## [46]  {Apple Danish,
##        Cherry Soda}         => {Apple Croissant}     0.02184629  0.8991354 12.884901
## [47]  {Green Tea,
##        Lemon Lemonade}      => {Raspberry Lemonade}  0.02184629  0.9033005 12.872258
## [48]  {Green Tea,
##        Lemon Cookie}        => {Raspberry Cookie}    0.02181828  0.9000578 12.854265
## [49]  {Apple Croissant,
##        Cherry Soda}         => {Apple Tart}          0.02181828  0.9073966 12.851126
## [50]  {Green Tea,
##        Raspberry Cookie}    => {Raspberry Lemonade}  0.02190231  0.9009217 12.838358
## [51]  {Green Tea,
##        Lemon Lemonade}      => {Lemon Cookie}        0.02190231  0.9056167 12.833553
## [52]  {Green Tea,
##        Raspberry Lemonade}  => {Lemon Lemonade}      0.02184629  0.9075044 12.822135
## [53]  {Green Tea,
##        Lemon Cookie}        => {Lemon Lemonade}      0.02190231  0.9035240 12.765896
## [54]  {Green Tea,
##        Raspberry Cookie}    => {Lemon Lemonade}      0.02191631  0.9014977 12.737267
## [55]  {Green Tea,
##        Raspberry Cookie}    => {Lemon Cookie}        0.02181828  0.8974654 12.718042
## [56]  {Apple Danish,
##        Cherry Soda}         => {Apple Tart}          0.02177627  0.8962536 12.693312
## [57]  {Apple Croissant,
##        Apple Danish,
##        Apple Tart}          => {Cherry Soda}         0.02162223  0.8071093 12.589353
## [58]  {Lemon Cookie,
##        Lemon Lemonade,
##        Raspberry Cookie,
##        Raspberry Lemonade}  => {Green Tea}           0.02177627  0.8111633 12.551148
## [59]  {Lemon Cookie,
##        Lemon Lemonade,
##        Raspberry Lemonade}  => {Green Tea}           0.02177627  0.8090531 12.518497
## [60]  {Lemon Cookie,
##        Raspberry Cookie,
##        Raspberry Lemonade}  => {Green Tea}           0.02177627  0.8073728 12.492498
## [61]  {Lemon Lemonade,
##        Raspberry Cookie,
##        Raspberry Lemonade}  => {Green Tea}           0.02179028  0.8058001 12.468163
## [62]  {Lemon Cookie,
##        Lemon Lemonade,
##        Raspberry Cookie}    => {Green Tea}           0.02177627  0.8048654 12.453701
## [63]  {Apple Pie,
##        Coffee Eclair,
##        Hot Coffee}          => {Almond Twist}        0.02932445  0.9938301 12.385239
## [64]  {Almond Twist,
##        Coffee Eclair,
##        Hot Coffee}          => {Apple Pie}           0.02932445  0.9928876 12.375654
## [65]  {Vanilla Frappuccino,
##        Walnut Cookie}       => {Chocolate Tart}      0.02810609  0.9396067 12.252637
## [66]  {Coffee Eclair,
##        Single Espresso}     => {Blackberry Tart}     0.02856823  0.9222423 11.691014
## [67]  {Chocolate Tart,
##        Walnut Cookie}       => {Vanilla Frappuccino} 0.02810609  0.9369748 11.646213
## [68]  {Raspberry Cookie,
##        Raspberry Lemonade}  => {Green Tea}           0.02190231  0.7522848 11.640119
## [69]  {Lemon Cookie,
##        Raspberry Lemonade}  => {Green Tea}           0.02198633  0.7522760 11.639984
## [70]  {Apple Croissant,
##        Apple Tart}          => {Cherry Soda}         0.02181828  0.7461686 11.638796
## [71]  {Apple Croissant,
##        Apple Danish}        => {Cherry Soda}         0.02184629  0.7456979 11.631454
## [72]  {Apple Danish,
##        Apple Tart}          => {Cherry Soda}         0.02177627  0.7443753 11.610824
## [73]  {Lemon Cookie,
##        Lemon Lemonade}      => {Green Tea}           0.02190231  0.7497603 11.601058
## [74]  {Lemon Lemonade,
##        Raspberry Lemonade}  => {Green Tea}           0.02184629  0.7471264 11.560304
## [75]  {Lemon Cookie,
##        Raspberry Cookie}    => {Green Tea}           0.02181828  0.7465261 11.551016
## [76]  {Lemon Lemonade,
##        Raspberry Cookie}    => {Green Tea}           0.02191631  0.7448834 11.525598
## [77]  {Almond Twist,
##        Coffee Eclair}       => {Apple Pie}           0.03604638  0.9245690 11.524109
## [78]  {Apple Pie,
##        Coffee Eclair}       => {Almond Twist}        0.03604638  0.9209302 11.476751
## [79]  {Almond Twist,
##        Hot Coffee}          => {Apple Pie}           0.02946449  0.9072876 11.308709
## [80]  {Apple Pie,
##        Hot Coffee}          => {Almond Twist}        0.02946449  0.9041685 11.267864
## [81]  {Coffee Eclair,
##        Hot Coffee}          => {Almond Twist}        0.02953451  0.8910013 11.103773
## [82]  {Coffee Eclair,
##        Hot Coffee}          => {Apple Pie}           0.02950650  0.8901563 11.095179
## [83]  {Casino Cake,
##        Chocolate Coffee}    => {Chocolate Cake}      0.03506610  0.9474082 10.901149
## [84]  {Casino Cake,
##        Chocolate Cake}      => {Chocolate Coffee}    0.03506610  0.9395872 10.854885
## [85]  {Apricot Croissant,
##        Hot Coffee}          => {Blueberry Tart}      0.03447793  0.9280060 10.745428
## [86]  {Blueberry Tart,
##        Hot Coffee}          => {Apricot Croissant}   0.03447793  0.9368341 10.710447
## [87]  {Blackberry Tart,
##        Coffee Eclair}       => {Single Espresso}     0.02856823  0.7469791 10.604431
## [88]  {Chocolate Tart,
##        Vanilla Frappuccino} => {Walnut Cookie}       0.02810609  0.7441602 10.564412
## [89]  {Apricot Danish,
##        Opera Cake}          => {Cherry Tart}         0.04317443  0.9553765  9.864304
## [90]  {Cherry Tart,
##        Opera Cake}          => {Apricot Danish}      0.04317443  0.9477405  9.838095
## [91]  {Chocolate Cake,
##        Chocolate Coffee}    => {Casino Cake}         0.03506610  0.7580987  9.731136
## [92]  {Apricot Danish,
##        Cherry Tart}         => {Opera Cake}          0.04317443  0.7742341  9.060391
## [93]  {Almond Twist,
##        Apple Pie,
##        Hot Coffee}          => {Coffee Eclair}       0.02932445  0.9952471  8.730787
## [94]  {Almond Twist,
##        Apple Pie}           => {Coffee Eclair}       0.03604638  0.9356598  8.208058
## [95]  {Blackberry Tart,
##        Single Espresso}     => {Coffee Eclair}       0.02856823  0.9230769  8.097675
## [96]  {Almond Twist,
##        Hot Coffee}          => {Coffee Eclair}       0.02953451  0.9094437  7.978078
## [97]  {Apple Pie,
##        Hot Coffee}          => {Coffee Eclair}       0.02950650  0.9054577  7.943111
## [98]  {Almond Twist,
##        Apple Pie,
##        Coffee Eclair}       => {Hot Coffee}          0.02932445  0.8135198  7.610615
## [99]  {Almond Twist,
##        Apple Pie}           => {Hot Coffee}          0.02946449  0.7648128  7.154952
## [100] {Almond Twist,
##        Coffee Eclair}       => {Hot Coffee}          0.02953451  0.7575431  7.086943
## [101] {Apricot Croissant,
##        Blueberry Tart}      => {Hot Coffee}          0.03447793  0.7545204  7.058665
## [102] {Apple Pie,
##        Coffee Eclair}       => {Hot Coffee}          0.02950650  0.7538462  7.052358
## [103] {Chocolate Cake}      => {Chocolate Coffee}    0.04625532  0.5322269  6.148723

## [104] {Chocolate Coffee}    => {Chocolate Cake}      0.04625532  0.5343796  6.148723

## [105] {Blueberry Tart}      => {Apricot Croissant}   0.04569516  0.5291065  6.049062

## [106] {Apricot Croissant}   => {Blueberry Tart}      0.04569516  0.5224143  6.049062

## [107] {Cherry Tart}         => {Apricot Danish}      0.05576406  0.5757663  5.976788

## [108] {Apricot Danish}      => {Cherry Tart}         0.05576406  0.5788632  5.976788

## [109] {Bottled Water}       => {Berry Tart}          0.03970143  0.5090681  5.828368

## [110] {Truffle Cake}        => {Gongolais Cookie}    0.04612929  0.5397346  5.808797

## [111] {Cheese Croissant}    => {Orange Juice}        0.04523303  0.5294214  5.612370

## [112] {Napoleon Cake}       => {Strawberry Cake}     0.04531705  0.5274654  5.547164

## [113] {Marzipan Cookie}     => {Tuile Cookie}        0.05348140  0.5723063  5.529326

## [114] {Tuile Cookie}        => {Marzipan Cookie}     0.05348140  0.5167095  5.529326

## [115] {Opera Cake}          => {Cherry Tart}         0.04555512  0.5331039  5.504321

## [116] {Opera Cake}          => {Apricot Danish}      0.04519102  0.5288430  5.489696

It looks like there were no redundant rules.
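As a reminder of what "redundant" means here, a common definition is that a rule is redundant if a more general rule (one whose LHS is a proper subset of its LHS, with the same RHS) reaches the same or a higher confidence. We assume this matches the default behaviour of arules' `is.redundant()`, but the function below is only a minimal base-R sketch of the idea, not arules' implementation. `is_redundant_sketch()` is a hypothetical helper, and the confidences are copied from rules 93, 94, 96 and 97 in the listing above.

```r
# Sketch: rule i is flagged as redundant if some other rule j has the same RHS,
# an LHS that is a proper subset of rule i's LHS, and confidence >= rule i's
is_redundant_sketch <- function(lhs, rhs, confidence) {
  n <- length(lhs)
  vapply(seq_len(n), function(i) {
    any(vapply(seq_len(n), function(j) {
      j != i &&
        identical(rhs[[j]], rhs[[i]]) &&
        all(lhs[[j]] %in% lhs[[i]]) &&
        length(lhs[[j]]) < length(lhs[[i]]) &&
        confidence[j] >= confidence[i]
    }, logical(1)))
  }, logical(1))
}

# Rules 93, 94, 96 and 97 from the listing above (all have RHS {Coffee Eclair})
lhs <- list(c("Almond Twist", "Apple Pie", "Hot Coffee"),
            c("Almond Twist", "Apple Pie"),
            c("Almond Twist", "Hot Coffee"),
            c("Apple Pie", "Hot Coffee"))
rhs <- rep(list("Coffee Eclair"), 4)
conf <- c(0.9952471, 0.9356598, 0.9094437, 0.9054577)

print(is_redundant_sketch(lhs, rhs, conf))
# [1] FALSE FALSE FALSE FALSE
```

None of these four rules is flagged: the most specific rule (93) is also the most confident one, which is consistent with finding no redundant rules.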

Summary and selection of rules

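All the rules (except the last rule) have lifts of more than 5.5. This means that the items in each rule are bought together far more often than they would be if they were purchased independently: a lift of 5.5 means the items appear together about 5.5 times as often as independence would predict.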
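Lift compares how often the items are bought together with how often we would expect them together if they were bought independently: `lift(A => B) = support({A, B}) / (support({A}) * support({B})) = confidence(A => B) / support({B})`. The sketch below computes the three measures on a tiny, made-up set of receipts; `support()` and `rule_measures()` are hypothetical helpers for illustration, not arules functions.

```r
# Hypothetical toy receipts: rows are receipts, columns are items
tx <- matrix(c(TRUE,  TRUE,  FALSE,
               TRUE,  TRUE,  FALSE,
               FALSE, FALSE, TRUE,
               FALSE, FALSE, TRUE),
             ncol = 3, byrow = TRUE,
             dimnames = list(NULL, c("A", "B", "C")))

# Support: fraction of receipts that contain every item in `items`
support <- function(tx, items) {
  mean(apply(tx[, items, drop = FALSE], 1, all))
}

# Support, confidence and lift of the rule lhs => rhs
rule_measures <- function(tx, lhs, rhs) {
  s_rule <- support(tx, c(lhs, rhs))
  conf <- s_rule / support(tx, lhs)
  c(support = s_rule, confidence = conf, lift = conf / support(tx, rhs))
}

rule_measures(tx, "A", "B")  # support 0.5, confidence 1, lift 2
rule_measures(tx, "B", "A")  # same lift: the formula is symmetric in A and B
```

Because the formula is symmetric, a 2-item rule and its reversed twin always share the same lift, which is why rules 103 and 104 above both show a lift of 6.148723 while their confidences differ slightly.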

From all the rules above, the rules that stand out are rules 1, 2, 10, 13 and 58. The items in these rules also appear frequently in many other, smaller rules. These items are Lemon Lemonade, Raspberry Lemonade, Lemon Cookie, Raspberry Cookie and Green Tea. We should also keep in mind that the antecedent and consequent often just swap places (a -> b and b -> a) to form new rules. To make the list easier to read, let us keep only one rule per itemset: rules generated from the same itemset share the same support and differ only in confidence and lift, so the antecedent and consequent are effectively interchangeable. We will also sort the rules by support and by lift.
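To show the idea behind `rules[!duplicated(generatingItemsets(rules))]` without needing arules, here is a minimal base-R sketch on a plain data frame. We assume `generatingItemsets()` returns each rule's LHS and RHS items pooled into one itemset, and that `duplicated()` then keeps the first rule per itemset. `itemset_key()` is a hypothetical helper, multi-item sides are assumed to be comma-separated strings, and the numbers are copied from rules 103, 104 and 108 above.

```r
# Rules 103, 104 and 108 from the listing above: 103 and 104 share one itemset
rules_df <- data.frame(
  lhs     = c("Chocolate Cake", "Chocolate Coffee", "Apricot Danish"),
  rhs     = c("Chocolate Coffee", "Chocolate Cake", "Cherry Tart"),
  support = c(0.04625532, 0.04625532, 0.05576406),
  lift    = c(6.148723, 6.148723, 5.976788),
  stringsAsFactors = FALSE
)

# Canonical key for the itemset a rule was generated from: LHS and RHS items
# pooled, de-duplicated and sorted, so "a => b" and "b => a" get the same key
itemset_key <- function(lhs, rhs) {
  items <- c(strsplit(lhs, ",\\s*")[[1]], strsplit(rhs, ",\\s*")[[1]])
  paste(sort(unique(items)), collapse = ",")
}

keys <- unname(mapply(itemset_key, rules_df$lhs, rules_df$rhs))

# Keep the first rule for each itemset, then sort two ways
new_rules  <- rules_df[!duplicated(keys), ]
by_support <- new_rules[order(new_rules$support, decreasing = TRUE), ]
by_lift    <- new_rules[order(new_rules$lift, decreasing = TRUE), ]
```

Since `duplicated()` keeps the first occurrence, which direction of a rule survives depends on the order of the input rules.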

#Keeping only the first rule generated from each itemset

new.rules <- rules[!duplicated(generatingItemsets(rules))]

#Sorting the rules based on support and lift

rules.sorted.sup <- sort(new.rules, by = "support")
rules.sorted.lift <- sort(new.rules, by = "lift")
inspect(rules.sorted.lift)
##      lhs                     rhs                      support confidence      lift

## [1]  {Green Tea,                                                                  

##       Lemon Cookie,                                                               

##       Raspberry Cookie}   => {Raspberry Lemonade}  0.02177627  0.9980745 14.222810

## [2]  {Green Tea,                                                                  

##       Lemon Cookie,                                                               

##       Raspberry Cookie,                                                           

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02177627  1.0000000 14.129007

## [3]  {Green Tea,                                                                  

##       Lemon Cookie,                                                               

##       Raspberry Cookie}   => {Lemon Lemonade}      0.02177627  0.9980745 14.101801

## [4]  {Lemon Cookie,                                                               

##       Raspberry Cookie,                                                           

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02684573  0.9953271 14.062983

## [5]  {Green Tea,                                                                  

##       Raspberry Cookie,                                                           

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02179028  0.9948849 14.056736

## [6]  {Apple Croissant,                                                            

##       Apple Danish,                                                               

##       Cherry Soda}        => {Apple Tart}          0.02162223  0.9897436 14.017376

## [7]  {Green Tea,                                                                  

##       Lemon Cookie,                                                               

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02177627  0.9904459 13.994016

## [8]  {Lemon Cookie,                                                               

##       Raspberry Cookie}   => {Raspberry Lemonade}  0.02697177  0.9228558 13.150925

## [9]  {Raspberry Cookie,                                                           

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02704179  0.9288119 13.123190

## [10] {Lemon Cookie,                                                               

##       Raspberry Cookie}   => {Lemon Lemonade}      0.02705579  0.9257307 13.079655

## [11] {Lemon Cookie,                                                               

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02691575  0.9209391 13.011955

## [12] {Apple Croissant,                                                            

##       Apple Danish}       => {Apple Tart}          0.02678972  0.9144359 12.950822

## [13] {Green Tea,                                                                  

##       Lemon Cookie}       => {Raspberry Lemonade}  0.02198633  0.9069902 12.924836

## [14] {Apple Croissant,                                                            

##       Cherry Soda}        => {Apple Danish}        0.02184629  0.9085614 12.906019

## [15] {Apple Croissant,                                                            

##       Cherry Soda}        => {Apple Tart}          0.02181828  0.9073966 12.851126

## [16] {Green Tea,                                                                  

##       Raspberry Cookie}   => {Raspberry Lemonade}  0.02190231  0.9009217 12.838358

## [17] {Green Tea,                                                                  

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02184629  0.9075044 12.822135

## [18] {Green Tea,                                                                  

##       Lemon Cookie}       => {Lemon Lemonade}      0.02190231  0.9035240 12.765896

## [19] {Green Tea,                                                                  

##       Raspberry Cookie}   => {Lemon Lemonade}      0.02191631  0.9014977 12.737267

## [20] {Green Tea,                                                                  

##       Raspberry Cookie}   => {Lemon Cookie}        0.02181828  0.8974654 12.718042

## [21] {Apple Danish,                                                               

##       Cherry Soda}        => {Apple Tart}          0.02177627  0.8962536 12.693312

## [22] {Chocolate Tart,                                                             

##       Walnut Cookie}      => {Vanilla Frappuccino} 0.02810609  0.9369748 11.646213

## [23] {Casino Cake,                                                                

##       Chocolate Coffee}   => {Chocolate Cake}      0.03506610  0.9474082 10.901149

## [24] {Apricot Danish,                                                             

##       Opera Cake}         => {Cherry Tart}         0.04317443  0.9553765  9.864304

## [25] {Almond Twist,                                                               

##       Apple Pie,                                                                  

##       Hot Coffee}         => {Coffee Eclair}       0.02932445  0.9952471  8.730787

## [26] {Almond Twist,                                                               

##       Apple Pie}          => {Coffee Eclair}       0.03604638  0.9356598  8.208058

## [27] {Blackberry Tart,                                                            

##       Single Espresso}    => {Coffee Eclair}       0.02856823  0.9230769  8.097675

## [28] {Almond Twist,                                                               

##       Hot Coffee}         => {Coffee Eclair}       0.02953451  0.9094437  7.978078

## [29] {Apple Pie,                                                                  

##       Hot Coffee}         => {Coffee Eclair}       0.02950650  0.9054577  7.943111

## [30] {Almond Twist,                                                               

##       Apple Pie}          => {Hot Coffee}          0.02946449  0.7648128  7.154952

## [31] {Apricot Croissant,                                                          

##       Blueberry Tart}     => {Hot Coffee}          0.03447793  0.7545204  7.058665

## [32] {Chocolate Coffee}   => {Chocolate Cake}      0.04625532  0.5343796  6.148723

## [33] {Blueberry Tart}     => {Apricot Croissant}   0.04569516  0.5291065  6.049062

## [34] {Apricot Danish}     => {Cherry Tart}         0.05576406  0.5788632  5.976788

## [35] {Bottled Water}      => {Berry Tart}          0.03970143  0.5090681  5.828368

## [36] {Truffle Cake}       => {Gongolais Cookie}    0.04612929  0.5397346  5.808797

## [37] {Cheese Croissant}   => {Orange Juice}        0.04523303  0.5294214  5.612370

## [38] {Napoleon Cake}      => {Strawberry Cake}     0.04531705  0.5274654  5.547164

## [39] {Marzipan Cookie}    => {Tuile Cookie}        0.05348140  0.5723063  5.529326

## [40] {Opera Cake}         => {Cherry Tart}         0.04555512  0.5331039  5.504321

## [41] {Opera Cake}         => {Apricot Danish}      0.04519102  0.5288430  5.489696

It looks like many rules were removed because they belonged to the same itemset as a rule that was kept (the list shrank from 116 rules to 41). Take a look at rule number 2.

{Green Tea, Lemon Cookie, Raspberry Cookie, Raspberry Lemonade} => {Lemon Lemonade}

This is the rule with the 5 items we discussed previously: Lemon Lemonade, Raspberry Lemonade, Lemon Cookie, Raspberry Cookie and Green Tea. Many smaller rules (rules 1, 3, 4, 5 and others) reuse these items as well. Thus, this is one rule that we could show our customer. Next, let us look at the rules sorted by support.
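One way to check which smaller rules "reuse" the 5 items is to test whether each rule's itemset (LHS plus RHS) is a subset of the 5-item set. The sketch below does this in base R for three rules copied from the lift-sorted listing above (rules 1, 8 and 12); the list names are only labels we chose for illustration.

```r
# The 5 items of rule 2
big <- c("Green Tea", "Lemon Cookie", "Raspberry Cookie",
         "Raspberry Lemonade", "Lemon Lemonade")

# Itemsets (LHS + RHS) of rules 1, 8 and 12 from the lift-sorted listing
itemsets <- list(
  rule1  = c("Green Tea", "Lemon Cookie", "Raspberry Cookie", "Raspberry Lemonade"),
  rule8  = c("Lemon Cookie", "Raspberry Cookie", "Raspberry Lemonade"),
  rule12 = c("Apple Croissant", "Apple Danish", "Apple Tart")
)

# TRUE when every item of the rule also appears in rule 2
covered <- vapply(itemsets, function(s) all(s %in% big), logical(1))
covered
# rule1 and rule8 are TRUE, rule12 is FALSE
```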

inspect(rules.sorted.sup)
##      lhs                     rhs                      support confidence      lift

## [1]  {Apricot Danish}     => {Cherry Tart}         0.05576406  0.5788632  5.976788

## [2]  {Marzipan Cookie}    => {Tuile Cookie}        0.05348140  0.5723063  5.529326

## [3]  {Chocolate Coffee}   => {Chocolate Cake}      0.04625532  0.5343796  6.148723

## [4]  {Truffle Cake}       => {Gongolais Cookie}    0.04612929  0.5397346  5.808797

## [5]  {Blueberry Tart}     => {Apricot Croissant}   0.04569516  0.5291065  6.049062

## [6]  {Opera Cake}         => {Cherry Tart}         0.04555512  0.5331039  5.504321

## [7]  {Napoleon Cake}      => {Strawberry Cake}     0.04531705  0.5274654  5.547164

## [8]  {Cheese Croissant}   => {Orange Juice}        0.04523303  0.5294214  5.612370

## [9]  {Opera Cake}         => {Apricot Danish}      0.04519102  0.5288430  5.489696

## [10] {Apricot Danish,                                                             

##       Opera Cake}         => {Cherry Tart}         0.04317443  0.9553765  9.864304

## [11] {Bottled Water}      => {Berry Tart}          0.03970143  0.5090681  5.828368

## [12] {Almond Twist,                                                               

##       Apple Pie}          => {Coffee Eclair}       0.03604638  0.9356598  8.208058

## [13] {Casino Cake,                                                                

##       Chocolate Coffee}   => {Chocolate Cake}      0.03506610  0.9474082 10.901149

## [14] {Apricot Croissant,                                                          

##       Blueberry Tart}     => {Hot Coffee}          0.03447793  0.7545204  7.058665

## [15] {Almond Twist,                                                               

##       Hot Coffee}         => {Coffee Eclair}       0.02953451  0.9094437  7.978078

## [16] {Apple Pie,                                                                  

##       Hot Coffee}         => {Coffee Eclair}       0.02950650  0.9054577  7.943111

## [17] {Almond Twist,                                                               

##       Apple Pie}          => {Hot Coffee}          0.02946449  0.7648128  7.154952

## [18] {Almond Twist,                                                               

##       Apple Pie,                                                                  

##       Hot Coffee}         => {Coffee Eclair}       0.02932445  0.9952471  8.730787

## [19] {Blackberry Tart,                                                            

##       Single Espresso}    => {Coffee Eclair}       0.02856823  0.9230769  8.097675

## [20] {Chocolate Tart,                                                             

##       Walnut Cookie}      => {Vanilla Frappuccino} 0.02810609  0.9369748 11.646213

## [21] {Lemon Cookie,                                                               

##       Raspberry Cookie}   => {Lemon Lemonade}      0.02705579  0.9257307 13.079655

## [22] {Raspberry Cookie,                                                           

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02704179  0.9288119 13.123190

## [23] {Lemon Cookie,                                                               

##       Raspberry Cookie}   => {Raspberry Lemonade}  0.02697177  0.9228558 13.150925

## [24] {Lemon Cookie,                                                               

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02691575  0.9209391 13.011955

## [25] {Lemon Cookie,                                                               

##       Raspberry Cookie,                                                           

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02684573  0.9953271 14.062983

## [26] {Apple Croissant,                                                            

##       Apple Danish}       => {Apple Tart}          0.02678972  0.9144359 12.950822

## [27] {Green Tea,                                                                  

##       Lemon Cookie}       => {Raspberry Lemonade}  0.02198633  0.9069902 12.924836

## [28] {Green Tea,                                                                  

##       Raspberry Cookie}   => {Lemon Lemonade}      0.02191631  0.9014977 12.737267

## [29] {Green Tea,                                                                  

##       Raspberry Cookie}   => {Raspberry Lemonade}  0.02190231  0.9009217 12.838358

## [30] {Green Tea,                                                                  

##       Lemon Cookie}       => {Lemon Lemonade}      0.02190231  0.9035240 12.765896

## [31] {Apple Croissant,                                                            

##       Cherry Soda}        => {Apple Danish}        0.02184629  0.9085614 12.906019

## [32] {Green Tea,                                                                  

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02184629  0.9075044 12.822135

## [33] {Apple Croissant,                                                            

##       Cherry Soda}        => {Apple Tart}          0.02181828  0.9073966 12.851126

## [34] {Green Tea,                                                                  

##       Raspberry Cookie}   => {Lemon Cookie}        0.02181828  0.8974654 12.718042

## [35] {Green Tea,                                                                  

##       Raspberry Cookie,                                                           

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02179028  0.9948849 14.056736

## [36] {Apple Danish,                                                               

##       Cherry Soda}        => {Apple Tart}          0.02177627  0.8962536 12.693312

## [37] {Green Tea,                                                                  

##       Lemon Cookie,                                                               

##       Raspberry Cookie}   => {Raspberry Lemonade}  0.02177627  0.9980745 14.222810

## [38] {Green Tea,                                                                  

##       Lemon Cookie,                                                               

##       Raspberry Cookie}   => {Lemon Lemonade}      0.02177627  0.9980745 14.101801

## [39] {Green Tea,                                                                  

##       Lemon Cookie,                                                               

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02177627  0.9904459 13.994016

## [40] {Green Tea,                                                                  

##       Lemon Cookie,                                                               

##       Raspberry Cookie,                                                           

##       Raspberry Lemonade} => {Lemon Lemonade}      0.02177627  1.0000000 14.129007

## [41] {Apple Croissant,                                                            

##       Apple Danish,                                                               

##       Cherry Soda}        => {Apple Tart}          0.02162223  0.9897436 14.017376

Let us take rules 1, 2 and 3 because they have the highest support, and rules 1 and 2 also have an acceptably high confidence (> 0.55). We chose the 3 rules with the highest support because we want the items that many people are buying.

{Apricot Danish} => {Cherry Tart}

{Marzipan Cookie} => {Tuile Cookie}

{Chocolate Coffee} => {Chocolate Cake}
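The selection above can be written as a small filter: take the 3 rules with the highest support, then flag the ones whose confidence passes the 0.55 threshold. The sketch below is base R on the first four rows copied from the support-sorted listing; it is an illustration of the selection logic, not the script used for this post.

```r
# First four rules of the support-sorted listing above
top <- data.frame(
  lhs        = c("Apricot Danish", "Marzipan Cookie", "Chocolate Coffee", "Truffle Cake"),
  rhs        = c("Cherry Tart", "Tuile Cookie", "Chocolate Cake", "Gongolais Cookie"),
  support    = c(0.05576406, 0.05348140, 0.04625532, 0.04612929),
  confidence = c(0.5788632, 0.5723063, 0.5343796, 0.5397346),
  stringsAsFactors = FALSE
)

# The 3 rules with the highest support
top3 <- head(top[order(top$support, decreasing = TRUE), ], 3)

# Flag the rules that also pass the confidence threshold used in the text
top3$confident <- top3$confidence > 0.55
top3
```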

Recommendation Part 1

First, let us explain this rule: {Green Tea, Lemon Cookie, Raspberry Cookie, Raspberry Lemonade} => {Lemon Lemonade}

Please keep in mind that we removed rules that shared an itemset with another rule, and that the antecedent and consequent are interchangeable. There are 2 options that we would like to offer the bakery manager:

  1. The manager should bundle the items together. The bundle could be one of several combinations, because there are smaller rules whose itemsets are subsets of these 5 items. For example, he could bundle Lemon Cookies + Raspberry Cookies + Green Tea, Lemon Cookies + Raspberry Cookies + Raspberry Lemonade, Lemon Cookies + Raspberry Cookies + Lemon Lemonade, and so on. He could even bundle all 5 items in one package, because the rule holds for all 5 items. But how many of each item should he include per bundle if he bundles all 5 items together? We shall try to answer that question later.

  2. The manager could bundle the cookies together but exclude the drinks. He could then place the drinks near the cookies so that customers can grab a drink after picking up the cookies. He could also experiment with slightly increasing the price of Green Tea: since Green Tea is already on the LHS of the rule, people are already buying it together with the two types of cookies. We chose Green Tea because the EB-build-goods.sql file lists its original price as $1.85, while both Lemonades cost $3.25. Since Green Tea is that much cheaper than the Lemonades, a slight price increase may not deter customers from buying it.
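To see how many bundles option 1 could give, we can enumerate every combination of 2 to 5 of the items with base R's `combn()`. This is only a list of candidates: not every combination is itself backed by a mined rule, so each candidate should be checked against the rule list before it is used.

```r
items <- c("Lemon Cookie", "Raspberry Cookie", "Green Tea",
           "Raspberry Lemonade", "Lemon Lemonade")

# Every bundle of 2 to 5 distinct items, written as one string per bundle
bundles <- unlist(lapply(2:length(items), function(k)
  combn(items, k, FUN = paste, collapse = " + ")))

length(bundles)  # 26 = choose(5, 2) + choose(5, 3) + choose(5, 4) + choose(5, 5)
head(bundles, 3)
```

The cookie-only bundle from option 2 is the first candidate, "Lemon Cookie + Raspberry Cookie".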

Next, we will explain the three 2-item rules:

{Apricot Danish} => {Cherry Tart}

{Marzipan Cookie} => {Tuile Cookie}

{Chocolate Coffee} => {Chocolate Cake}

We recommend that the manager produce coupons for the 2-item rules. For example, the coupons could be:

  1. Buy 5 Apricot Danishes for 1 free Cherry Tart

  2. Buy Chocolate Cake and Chocolate Coffee for $ X.XX

  3. Buy 10 Marzipan Cookies and 10 Tuile Cookies for $ X.XX
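When choosing the $ X.XX values, it helps to compare each coupon's effective discount. The two helpers below are hypothetical and the prices in the usage lines are placeholders, not the bakery's real prices (those would come from EB-build-goods.sql).

```r
# Effective discount of a "buy n of item A, get one item B free" coupon:
# the customer receives n + 1 items but pays only for the n A items
free_item_discount <- function(n, price_a, price_b) {
  price_b / (n * price_a + price_b)
}

# Bundle price after a percentage discount on the items' list prices
bundle_price <- function(prices, discount) {
  round(sum(prices) * (1 - discount), 2)
}

# Placeholder prices, for illustration only
free_item_discount(5, price_a = 1, price_b = 1)  # 1/6, about 16.7% off
bundle_price(c(2, 3), discount = 0.1)            # 4.5
```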

Recommendation Part 2 (How much to bundle?)

Let us explore how many of each item the manager should include if he bundles all 5 items of this rule together:

{Green Tea, Lemon Cookie, Raspberry Cookie, Raspberry Lemonade} => {Lemon Lemonade}

#Reading in the binary vector receipt file

binaryVector = read.csv("75000-out2.csv", header = FALSE)

#Removing the column for receipt ID

binaryVector <- binaryVector[2:51]

#Changing the column names to their respective food names

colnames(binaryVector) <- food$FoodName

#Reading in the file that records how many units of each item were purchased on each receipt

purchases <- read.csv("75000i.csv", header = FALSE, col.names = c("ReceiptID", "Amount", "Food"))

#Removing the unnecessary receipts by their ReceiptID

removedReceipts.num <- which(removedReceipts == TRUE)
purchases <- purchases[!(purchases$ReceiptID %in% removedReceipts.num), ]

#Converting the food IDs to their food names

purchases$Food <- food$FoodName[match(purchases$Food, food$FoodId)]

#Finding the receipts that contain all 5 items we are interested in (the row number is used as the receipt ID)

toCheck <- which(binaryVector$`Lemon Lemonade` == 1 & binaryVector$`Raspberry Lemonade` == 1 &
                   binaryVector$`Lemon Cookie` == 1 & binaryVector$`Raspberry Cookie` == 1 &
                   binaryVector$`Green Tea` == 1)

#Filtering out the receipts based on the receipt ID and only for the food we want to check

receiptsToCheck <- purchases[(purchases$ReceiptID %in% toCheck), ]
receiptsToCheck <- receiptsToCheck[receiptsToCheck$Food %in% c("Lemon Lemonade", "Raspberry Lemonade", "Lemon Cookie", "Raspberry Cookie", "Green Tea"), ]
receiptsToCheck$Food <- as.factor(receiptsToCheck$Food)
head(receiptsToCheck)
##     ReceiptID Amount               Food

## 76         23      4   Raspberry Cookie

## 77         23      1       Lemon Cookie

## 78         23      1     Lemon Lemonade

## 79         23      5 Raspberry Lemonade

## 80         23      3          Green Tea

## 188        55      2   Raspberry Cookie
#Data frame with the total number of units purchased for each item

dfToCheck <- data.frame(receiptsToCheck %>% group_by(Food) %>% summarise(Frequency = sum(Amount)))

#Get the mean number of items purchased in a receipt

dfToCheck$Mean <- dfToCheck$Frequency / length(unique(receiptsToCheck$ReceiptID))

ggplotly(ggplot(dfToCheck, aes(x = Food, y = Mean, fill = Food)) +
           geom_bar(stat = "identity") +
           ggtitle("Average number of items purchased per receipt") +
           geom_text(aes(label = round(Mean, 2))))



Whenever all 5 items were bought on the same receipt, customers bought about 3 units of each item on average. This suggests bundling 3 Lemon Cookies, 3 Raspberry Cookies, 3 Green Teas, 3 Raspberry Lemonades and 3 Lemon Lemonades in a single package. It is a nice coincidence that the package also has 3 flavours (Lemon, Raspberry and Green Tea).
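For readers without dplyr, the same "average units per receipt" calculation can be sketched in base R with `tapply()`. The sketch works directly on a long purchases table in the same shape as 75000i.csv (ReceiptID, Amount, Food) instead of the binary vector file; the data and the 2 target items are made up for illustration. Rounding the mean gives the number of units of each item per bundle.

```r
# Hypothetical toy data in the same long format as 75000i.csv
purchases <- data.frame(
  ReceiptID = c(1, 1, 1, 2, 2, 3, 3),
  Amount    = c(4, 1, 3, 2, 3, 1, 5),
  Food      = c("Lemon Cookie", "Green Tea", "Apple Pie",
                "Lemon Cookie", "Green Tea", "Lemon Cookie", "Apple Pie"),
  stringsAsFactors = FALSE
)
targets <- c("Lemon Cookie", "Green Tea")

# Receipts that contain every target item (the role of toCheck)
has_all <- tapply(purchases$Food, purchases$ReceiptID,
                  function(f) all(targets %in% f))
keep_ids <- as.numeric(names(has_all)[has_all])

# Units of each target item on those receipts, averaged per receipt
sub <- purchases[purchases$ReceiptID %in% keep_ids & purchases$Food %in% targets, ]
totals <- tapply(sub$Amount, sub$Food, sum)
mean_per_receipt <- totals / length(keep_ids)

round(mean_per_receipt)  # Green Tea 2, Lemon Cookie 3
```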

The R Script for the exploratory exercise can be found here.