A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions. The main advantage of naive Bayes is that, because the attributes are assumed to be independent given the class, it requires only a small amount of training data to estimate the probabilities necessary for classification.
In general, all machine learning algorithms need to be trained for supervised learning tasks like classification and prediction. Training means fitting them to particular inputs so that later we can test them on unknown inputs (which they have never seen before) and have them classify or predict based on what they learned. This is what most machine learning techniques, such as neural networks, SVMs, and Bayesian classifiers, are based on.
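To make the decision rule concrete before the worked example, here is a minimal sketch in Python. The names (classify, priors, likelihoods) are illustrative, not from any particular library; the function simply picks the class c that maximizes P(c) multiplied by the product of the attribute likelihoods P(value|c):

    def classify(x, priors, likelihoods):
        """Return the class c maximizing P(c) * product of P(value | c)."""
        best_class, best_score = None, 0.0
        for c, prior in priors.items():
            score = prior
            for attribute, value in x.items():
                # The "naive" step: attributes are treated as independent
                # given the class, so their likelihoods simply multiply.
                score *= likelihoods[c].get((attribute, value), 0.0)
            if score > best_score:
                best_class, best_score = c, score
        return best_class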
How to Apply Naive Bayes to Predict an Outcome
----------------------------------------------
Let's try it out using an example.
In the above training data we have two class labels, buys_computer = no and buys_computer = yes, and we know four attributes:
1. Whether the age is youth, middle_aged, or senior.
2. Whether the income is high, low, or medium.
3. Whether the person is a student or not.
4. Whether the credit rating is excellent or fair.
There are several probabilities to pre-compute from the training dataset for future prediction.
Prior Probabilities
-------------------
P(yes) = 9/14 = 0.643
The universe is all 14 training tuples (9 yes + 5 no); 9 of them have class label "yes".
P(no) = 5/14 = 0.357
Of the same 14 tuples, 5 have class label "no".
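A quick sketch of the same computation in Python, with the 9/5 class split read straight from the 14 training tuples:

    from collections import Counter

    # Class column of the training data: 9 "yes" tuples and 5 "no" tuples.
    labels = ["yes"] * 9 + ["no"] * 5
    counts = Counter(labels)
    total = sum(counts.values())

    priors = {c: n / total for c, n in counts.items()}
    print(priors)  # {'yes': 0.642..., 'no': 0.357...}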
Likelihood Probabilities
-------------------------
P(youth|yes) = 2/9 = 0.222
Given that the class label is "yes", the universe is the 9 "yes" tuples; 2 of them are youth.
P(youth|no) = 3/5 = 0.600
Given that the class label is "no", the universe is the 5 "no" tuples; 3 of them are youth.
...
...
P(fair|yes) = 6/9 = 0.667
P(fair|no) = 2/5 = 0.400
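These likelihoods are just per-class counts divided by the class size. Here is a sketch in Python; the counts are the ones used in the worked example, and only the four attribute values needed for the prediction below are included:

    # Count of each (attribute, value) pair within each class, read from
    # the training data, divided by that class's tuple count (9 and 5).
    class_counts = {"yes": 9, "no": 5}
    value_counts = {
        "yes": {("age", "youth"): 2, ("income", "medium"): 4,
                ("student", "yes"): 6, ("credit_rating", "fair"): 6},
        "no":  {("age", "youth"): 3, ("income", "medium"): 2,
                ("student", "yes"): 1, ("credit_rating", "fair"): 2},
    }

    likelihoods = {
        c: {pair: n / class_counts[c] for pair, n in pairs.items()}
        for c, pairs in value_counts.items()
    }
    print(likelihoods["yes"][("age", "youth")])  # 0.222...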
How to Classify an Outcome
--------------------------
Let's say we are given an unknown tuple X and asked to predict its buys_computer class. We are told that its properties are
X => age = youth, income = medium, student = yes, credit rating = fair
We need to maximize P(X|Ci)P(Ci), for i = 1, 2.

P(Ci), the prior probability of each class, can be computed from the training tuples:
P(X|yes)P(yes)
= P(youth|yes) * P(medium|yes) * P(student = yes|yes) * P(fair|yes) * P(yes)
= (0.222 * 0.444 * 0.667 * 0.667) * 0.643
= 0.028
P(X|no)P(no)
= P(youth|no) * P(medium|no) * P(student = yes|no) * P(fair|no) * P(no)
= (0.600 * 0.400 * 0.200 * 0.400) * 0.357
= 0.007

Since 0.028 > 0.007, naive Bayes predicts buys_computer = yes for X.
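The same arithmetic as a short Python snippet, reusing the probabilities computed above (the flat dictionary layout is just a convenience for this single query):

    # Score both classes for X = (youth, medium, student=yes, fair).
    priors = {"yes": 9 / 14, "no": 5 / 14}
    likelihoods = {
        "yes": {"youth": 2 / 9, "medium": 4 / 9, "student_yes": 6 / 9, "fair": 6 / 9},
        "no":  {"youth": 3 / 5, "medium": 2 / 5, "student_yes": 1 / 5, "fair": 2 / 5},
    }

    scores = {}
    for c in priors:
        score = priors[c]
        for value in ("youth", "medium", "student_yes", "fair"):
            score *= likelihoods[c][value]
        scores[c] = score

    print(scores)                       # {'yes': 0.0282..., 'no': 0.0068...}
    print(max(scores, key=scores.get))  # yes

In practice, implementations usually sum log-probabilities instead of multiplying raw probabilities to avoid numeric underflow, and apply Laplace smoothing so that an unseen attribute value does not zero out an entire product.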