Saturday, 23 August 2014

Example for Apriori Algorithm


Lets take a store data
pen,pencil
pencil,book,eraser
pen,book,eraser,chalk
pen,eraser,chalk
pen,pencil,book
pen,pencil,book,eraser
pen,Ink
pen,pencil,book
pen,pencil,eraser
pencil,book,chalk
To start with Apriori follow the below steps.
Step 1: Initially we need to find Item 1 Frequent Dataset
c1
------
book 6
chalk 3
eraser 6
pen 8
pencil 7
Ink 1
We will say that an item set is frequent if it appears in at least 3 transactions of the itemset: the value 3 is the support threshold.

Support count = 3 (user defined)

So the items less that support count can be discarded form F1 frequent Dataset.
so our new set will be
L1
------
book 6
chalk 3
eraser 6
pen 8
pencil 7
Step 2: We need to generate size 2 frequent item pair sets by joining L1 set
eg:{book} U {chalk} => {book,chalk} and so on..
{book,chalk}
{book,eraser}
{book,pen}
{book,pencil}


{chalk,eraser} 
{chalk,pen} 
{chalk,pencil}

{eraser,pen} 
{eraser,pencil} 

{pen,pencil}
Once the transactions are joined we need to identify the no occurence of the above data items in original transaction(That will be the support count of C2)
C2
----------------
{book,chalk} 2
{book,eraser} 2
{book,pen} 4
{book,pencil} 5


{chalk,eraser} 2
{chalk,pen} 2
{chalk,pencil} 0

{eraser,pen} 5
{eraser,pencil} 3

{pen,pencil} 5
Transactions less that support count can be discarded form C2 frequent Dataset
L2
----------------
{book,pen} 4
{book,pencil} 5
{eraser,pen} 5
{eraser,pencil} 3
{pen,pencil} 5
To find C3 loop through L2
eg: {book,pen} U {book,pencil} => {book,pen,pencil}
C3
-------------------------
{book,pen,pencil} 3
{chalk,eraser,pen} 2
{eraser,pen,pencil} 2
Transactions less that support count can be discarded form C3 frequent Dataset
L3
-------------------------
{book,pen,pencil} 3
There are no transaction to join further.
So our Frequent item sets are
L1:
-------
book 6
chalk 3
eraser 6
pen 8
pencil 7

L2:
-----------------
{book,pen} 4
{book,pencil} 5
{eraser,pen} 5
{eraser,pencil} 3
{pen,pencil} 5


L3
-------------------------
{book,pen,pencil} 3
Step 3: We need to generate Strong Assosiaction  Rules for frequent Set using L1,L2and L3

Say confidence is 60% and Support count is 3.So we have to find the Transactions with no.of item 3  and which has a confidence >=60.Now we can identify L3 set
{book,pen,pencil} 3

Finding Ruleset
{book,pen} => pencil
{book,pencil} => pen
{pen,pencil} => book

pencil => {book,pen}
pen => {book,pencil}
book => {pen,pencil}
Now we need to find the confidence of each transaction
eg: {book,pen} => pencil
           = support Cnt{book,pen,pencil}/ support count({pencil})

Therefore rules having confidence greater than and equal to 60 are
book,pen=>pencil 75.0
book,pencil=>pen 60.0
pen,pencil=>book 60.0
These are the strongest rules.
If a customer buys book and pen he have a tendency to buy a pencil too. Like wise if he buys book and pencil he may buy pen too.

2 comments:

  1. please provide code in java

    thanks
    siva'
    9030865822

    ReplyDelete
  2. {book,pen} => pencil
    = support Cnt{book,pen}/ support count({pencil})
    = 4 / 8 = 50% which is lesser than 60%

    ReplyDelete