Lets take a store data
pen,pencil
pencil,book,eraser
pen,book,eraser,chalk
pen,eraser,chalk
pen,pencil,book
pen,pencil,book,eraser
pen,Ink
pen,pencil,book
pen,pencil,eraser
pencil,book,chalk
To start with Apriori follow the below steps.
Step 1: Initially we need to find Item 1 Frequent Dataset
c1
------
book 6
chalk 3
eraser 6
pen 8
pencil 7
Ink 1
We will say that an item set is frequent if it appears in at least 3 transactions of the itemset: the value 3 is the support threshold.
Support count = 3 (user defined)
So the items less that support count can be discarded form F1 frequent Dataset.
so our new set will be
L1
------
book 6
chalk 3
eraser 6
pen 8
pencil 7
Step 2: We need to generate size 2 frequent item pair sets by joining L1 set
eg:{book} U {chalk} => {book,chalk} and so on..
{book,chalk}
{book,eraser}
{book,pen}
{book,pencil}
{chalk,eraser}
{chalk,pen}
{chalk,pencil}
{eraser,pen}
{eraser,pencil}
{pen,pencil}
Once the transactions are joined we need to identify the no occurence of the above data items in original transaction(That will be the support count of C2)
C2 ---------------- {book,chalk} 2 {book,eraser} 2 {book,pen} 4 {book,pencil} 5 {chalk,eraser} 2 {chalk,pen} 2 {chalk,pencil} 0 {eraser,pen} 5 {eraser,pencil} 3 {pen,pencil} 5
Transactions less that support count can be discarded form C2 frequent Dataset
L2
----------------
{book,pen} 4
{book,pencil} 5
{eraser,pen} 5
{eraser,pencil} 3
{pen,pencil} 5
To find C3 loop through L2
eg: {book,pen} U {book,pencil} => {book,pen,pencil}
C3
-------------------------
{book,pen,pencil} 3
{chalk,eraser,pen} 2
{eraser,pen,pencil} 2
Transactions less that support count can be discarded form C3 frequent Dataset
L3
-------------------------
{book,pen,pencil} 3
There are no transaction to join further.
So our Frequent item sets are
L1:
-------
book 6
chalk 3
eraser 6
pen 8
pencil 7
L2:
-----------------
{book,pen} 4
{book,pencil} 5
{eraser,pen} 5
{eraser,pencil} 3
{pen,pencil} 5
L3
-------------------------
{book,pen,pencil} 3
Step 3: We need to generate Strong Assosiaction Rules for frequent Set using L1,L2and L3
Say confidence is 60% and Support count is 3.So we have to find the Transactions with no.of item 3 and which has a confidence >=60.Now we can identify L3 set
{book,pen,pencil} 3
Finding Ruleset
{book,pen} => pencil
{book,pencil} => pen
{pen,pencil} => book
pencil => {book,pen}
pen => {book,pencil}
book => {pen,pencil}
Now we need to find the confidence of each transaction
eg: {book,pen} => pencil = support Cnt{book,pen,pencil}/ support count({pencil})
Therefore rules having confidence greater than and equal to 60 are
book,pen=>pencil 75.0
book,pencil=>pen 60.0
pen,pencil=>book 60.0
These are the strongest rules.
If a customer buys book and pen he have a tendency to buy a pencil too. Like wise if he buys book and pencil he may buy pen too.