The Naive Bayes classifier is a classification technique based on Bayes' Theorem. It rests on the assumption that the predictors are independent of each other. In simple words, the Naive Bayes classifier assumes that the presence of a particular feature in a class is independent of (unrelated to) the presence of any other feature in the same class. Let's understand this concept with an example: a fruit may be considered an orange if it is orange in color, approximately round, and about 2.5 inches in diameter. All of these properties are treated as contributing independently to the probability that this fruit is an orange, even if these features actually depend on each other. This is why it is known as 'naive'.
The Naive Bayes algorithm is simple to understand and easy to build. It does not involve any complicated iterative parameter estimation. We can use a Naive Bayes classifier on a small data set as well as on a large data set, even for highly sophisticated classification tasks.
The Naive Bayes classifier is based on the Bayes theorem of probability. Bayes' theorem can be used to calculate the posterior probability P(y | X) from P(y), P(X), and P(X | y). The mathematical equation for Bayes' Theorem is:

P(y | X) = P(X | y) P(y) / P(X)
From the equation, we have:

P(y | X): the posterior probability of class y given the features X
P(y): the prior probability of class y
P(X | y): the likelihood, i.e. the probability of the features X given class y
P(X): the prior probability of the features (the evidence)
Since the Naive Bayes classifier assumes the independence of the predictors (features), for a feature vector X = (x1, x2, ..., xn) we can calculate the output probability using Bayes' theorem as:

P(y | x1, ..., xn) = P(x1 | y) P(x2 | y) ... P(xn | y) P(y) / (P(x1) P(x2) ... P(xn))
This can be represented as:

P(y | x1, ..., xn) = P(y) ∏ P(xi | y) / (P(x1) P(x2) ... P(xn)), where the product runs over i = 1 to n.
Since the denominator is constant for a given input, we can write:

P(y | x1, ..., xn) ∝ P(y) ∏ P(xi | y)
Now, to create a Naive Bayes classifier model, we find this probability for every possible value of the class variable y and pick the class with the maximum probability. This can be expressed mathematically as:

y = argmax over y of P(y) ∏ P(xi | y)
So, finally, we are left with the task of calculating P(y) and P(xi | y).
NOTE: P(y) is also called the class probability and P(xi | y) is called the conditional probability.
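The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the classes, feature names, and probability values below are hypothetical numbers chosen only to show the argmax over P(y) ∏ P(xi | y).

```python
# Hypothetical priors P(y) and conditionals P(x_i | y) for two made-up classes.
prior = {"orange": 0.6, "lemon": 0.4}
likelihood = {
    "orange": {"color=orange": 0.8, "shape=round": 0.9},
    "lemon":  {"color=orange": 0.1, "shape=round": 0.7},
}

def predict(features):
    scores = {}
    for y, p_y in prior.items():
        score = p_y
        for x in features:
            score *= likelihood[y][x]  # multiply P(x_i | y), assuming independence
        scores[y] = score
    # return the class with the maximum unnormalized posterior
    return max(scores, key=scores.get)

print(predict(["color=orange", "shape=round"]))  # "orange"
```

For the input above, the score for "orange" is 0.6 × 0.8 × 0.9 = 0.432 versus 0.4 × 0.1 × 0.7 = 0.028 for "lemon", so the argmax picks "orange".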
Let's understand the working of the Naive Bayes classifier using an example. Below is the training data set for playing golf under different circumstances. We have the features Outlook, Temperature, Humidity, and Windy, and we are given the label Play Golf under different combinations of those features. We need to predict whether to play or not for new test data (that we provide) using the Naive Bayes classification algorithm. Let's do it step by step and learn this algorithm.
| # | Outlook | Temperature | Humidity | Windy | Play Golf |
|---|---------|-------------|----------|-------|-----------|
| 0 | Rainy | Hot | High | False | No |
| 1 | Rainy | Hot | High | True | No |
| 2 | Overcast | Hot | High | False | Yes |
| 3 | Sunny | Mild | High | False | Yes |
| 4 | Sunny | Cool | Normal | False | Yes |
| 5 | Sunny | Cool | Normal | True | No |
| 6 | Overcast | Cool | Normal | True | Yes |
| 7 | Rainy | Mild | High | False | No |
| 8 | Rainy | Cool | Normal | False | Yes |
| 9 | Sunny | Mild | Normal | False | Yes |
| 10 | Rainy | Mild | Normal | True | Yes |
| 11 | Overcast | Mild | High | True | Yes |
| 12 | Overcast | Hot | Normal | False | Yes |
| 13 | Sunny | Mild | High | True | No |
Here, the attributes are Outlook, Temperature, Humidity, and Windy, and the class (or target) is Play Golf.
Step 1: Convert the given training data set into a frequency table
Step 2: Create a Likelihood table (or you can say a probability table) by finding the probabilities.
In those tables we have calculated both P(y) (i.e. P(Yes) and P(No)) and P(xi | y) (e.g. P(Humidity = High | Yes)).
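Steps 1 and 2 can be reproduced in Python from the training table above. This is a sketch: the helper names (`freq`, `cond`) are my own, and the rows are exactly the 14 training examples from the data set.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# The 14 training rows: (Outlook, Temperature, Humidity, Windy, PlayGolf)
data = [
    ("Rainy","Hot","High","False","No"), ("Rainy","Hot","High","True","No"),
    ("Overcast","Hot","High","False","Yes"), ("Sunny","Mild","High","False","Yes"),
    ("Sunny","Cool","Normal","False","Yes"), ("Sunny","Cool","Normal","True","No"),
    ("Overcast","Cool","Normal","True","Yes"), ("Rainy","Mild","High","False","No"),
    ("Rainy","Cool","Normal","False","Yes"), ("Sunny","Mild","Normal","False","Yes"),
    ("Rainy","Mild","Normal","True","Yes"), ("Overcast","Mild","High","True","Yes"),
    ("Overcast","Hot","Normal","False","Yes"), ("Sunny","Mild","High","True","No"),
]
features = ["Outlook", "Temperature", "Humidity", "Windy"]

# Step 1: frequency table — count each (feature value, class) pair
class_counts = Counter(row[-1] for row in data)
freq = defaultdict(Counter)  # freq[(feature, value)][class] -> count
for row in data:
    y = row[-1]
    for name, value in zip(features, row[:-1]):
        freq[(name, value)][y] += 1

# Step 2: likelihood table — class probability P(y) and conditional P(x_i | y)
n = len(data)
prior = {y: Fraction(c, n) for y, c in class_counts.items()}

def cond(feature, value, y):
    return Fraction(freq[(feature, value)][y], class_counts[y])

print(prior["Yes"])                     # 9/14
print(cond("Humidity", "High", "Yes"))  # 1/3 (i.e. 3/9)
```

Using `Fraction` keeps the table entries as exact ratios of counts, matching how the likelihood table is read by hand.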
Step 3: Now, apply the Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest posterior probability will be the outcome of the prediction.
Suppose our test data is test = (Sunny, Hot, Normal, False). For this input, we need to predict whether it will be okay to play golf or not.
Let's calculate.

Probability of playing golf:

P(Yes | test) = P(Sunny | Yes) P(Hot | Yes) P(Normal | Yes) P(False | Yes) P(Yes) / P(test)

Probability of not playing golf:

P(No | test) = P(Sunny | No) P(Hot | No) P(Normal | No) P(False | No) P(No) / P(test)

Here, we can see that both probabilities share the common factor P(test), so we can ignore it. Thus we get the calculation as follows:

P(Yes | test) ∝ (3/9) × (2/9) × (6/9) × (6/9) × (9/14) ≈ 0.0212

and,

P(No | test) ∝ (2/5) × (2/5) × (1/5) × (2/5) × (5/14) ≈ 0.0046

To convert these numbers into actual probabilities, we normalize them as follows:

P(Yes | test) = 0.0212 / (0.0212 + 0.0046) ≈ 0.82

and,

P(No | test) = 0.0046 / (0.0212 + 0.0046) ≈ 0.18

From the above calculations, we see that P(Yes | test) > P(No | test).
Thus, the prediction for golf played is 'Yes'.
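The arithmetic for this prediction can be checked with a few lines of Python. This is a sketch: the fractions below are the counts read off the likelihood tables (e.g. 3 of the 9 "Yes" rows have Outlook = Sunny), hard-coded here for brevity.

```python
# Posterior scores for test = (Sunny, Hot, Normal, False)
p_yes = (3/9) * (2/9) * (6/9) * (6/9) * (9/14)  # P(x_i | Yes) terms times P(Yes)
p_no  = (2/5) * (2/5) * (1/5) * (2/5) * (5/14)  # P(x_i | No) terms times P(No)

# Normalize so the two scores sum to 1 (this replaces dividing by P(test))
total = p_yes + p_no
print(round(p_yes, 4), round(p_no, 4))                  # 0.0212 0.0046
print(round(p_yes / total, 2), round(p_no / total, 2))  # 0.82 0.18
```

Since the normalized probability for "Yes" (about 0.82) exceeds that for "No" (about 0.18), the code agrees with the hand calculation.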
Pros:

It is simple, fast, and easy to implement.
It works well even with a small training data set, and it scales to high-dimensional data.
It handles multi-class prediction naturally.
Cons:

The independence assumption rarely holds in real data, which can hurt accuracy.
If a feature value in the test data was never seen with a class in training, the model assigns it zero probability (the zero-frequency problem); this is usually handled with smoothing, such as Laplace smoothing.
The estimated probability values should not be taken too literally; the ranking of classes is more reliable than the probabilities themselves.