CHAPTER 11
Intermediate
Naive Bayes Classification
Updated: May 16, 2026
6 min read
1. Introduction
Every algorithm we have used so far relies on drawing physical lines or geometric boundaries through data points. But how do you draw a line through a text document? You can't. Text requires a completely different approach. Naive Bayes abandons geometry entirely and relies purely on Probability. It is the algorithm behind the world's first successful spam filters, and it remains one of the fastest, most effective tools for Natural Language Processing (NLP).
2. Learning Objectives
By the end of this chapter, you will be able to:
- Understand the basics of Bayes' Theorem.
- Explain why the algorithm is called "Naive".
- Train a `MultinomialNB` model for text classification.
- Train a `GaussianNB` model for continuous numerical data.
- Build a basic Sentiment Analysis system.
3. Bayes' Theorem Simplified
Bayes' Theorem is a mathematical formula for calculating Conditional Probability: *What is the probability of an event happening, given that another event has already happened?*
In machine learning, we ask: *What is the probability that this email is SPAM, given that it contains the word "VIAGRA"?*
- The algorithm calculates this by looking at historical data: Out of all the past emails that contained the word "VIAGRA", what percentage were Spam? If it was 99%, the model confidently predicts Spam.
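The calculation described above can be sketched with a few made-up counts. Every number below is hypothetical, chosen only so that the result matches the 99% figure in the text; the point is to show that Bayes' Theorem (prior times likelihood, divided by evidence) gives the same answer as counting directly.

```python
# Hypothetical counts from an imaginary archive of 1,000 past emails.
total_emails = 1000
spam_emails = 200        # emails labelled Spam
spam_with_word = 99      # Spam emails that contain the word
ham_with_word = 1        # non-Spam emails that contain the word

# The three ingredients of Bayes' Theorem.
p_spam = spam_emails / total_emails                        # prior: P(Spam)
p_word_given_spam = spam_with_word / spam_emails           # likelihood: P(word | Spam)
p_word = (spam_with_word + ham_with_word) / total_emails   # evidence: P(word)

# Posterior: P(Spam | word) = P(word | Spam) * P(Spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word

# Direct count: of all emails containing the word, what fraction were Spam?
direct_count = spam_with_word / (spam_with_word + ham_with_word)

print(round(p_spam_given_word, 2), round(direct_count, 2))
```

Both routes agree because, for a single word, Bayes' Theorem is just a rearrangement of the same counts. The formula becomes essential once many words have to be combined.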
4. Why is it "Naive"?
If an email contains the phrase "Free Money", a human knows those two words belong together. The Naive Bayes algorithm is "Naive" because it assumes that, once the class is known, every single word (feature) is completely independent of every other word. It assumes that seeing the word "Free" tells you nothing about whether the word "Money" also appears. Despite this mathematically flawed assumption, the algorithm works astonishingly well in practice!
5. Gaussian vs. Multinomial
Scikit-learn offers different versions of Naive Bayes depending on your data:
- Gaussian Naive Bayes (`GaussianNB`): Use this when your features are continuous decimal numbers (like Height, Weight, Salary). It assumes the data follows a bell-curve (Normal Distribution).
- Multinomial Naive Bayes (`MultinomialNB`): Use this when your features are discrete counts (like the number of times a word appears in an email). This is the absolute standard for Text Classification.
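To make the distinction concrete, here is a minimal sketch that fits each variant on the kind of data it expects. The feature values, word columns, and labels are all invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Continuous features (made-up values): [height_cm, weight_kg].
X_cont = np.array([[150.0, 50.0], [155.0, 54.0], [160.0, 57.0],
                   [180.0, 82.0], [185.0, 88.0], [190.0, 95.0]])
y_cont = np.array([0, 0, 0, 1, 1, 1])

gnb = GaussianNB()
gnb.fit(X_cont, y_cont)
gauss_pred = gnb.predict([[158.0, 55.0], [187.0, 90.0]])

# Discrete counts (made-up values): how often the words
# "free", "money", "meeting" appear in each email.
X_counts = np.array([[3, 2, 0], [2, 3, 0], [4, 1, 0],
                     [0, 0, 2], [0, 1, 3], [0, 0, 4]])
y_counts = np.array([1, 1, 1, 0, 0, 0])  # 1 = Spam, 0 = Not Spam

mnb = MultinomialNB()
mnb.fit(X_counts, y_counts)
multi_pred = mnb.predict([[2, 2, 0], [0, 0, 3]])

print(gauss_pred, multi_pred)
```

Both classes share the same `fit` / `predict` interface; only the assumed distribution of the features differs.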
6. Mini Project: Sentiment Analysis System
Let's build a simple `MultinomialNB` model to classify movie reviews as Positive (1) or Negative (0).
*Note: Algorithms cannot read text. We must convert the text into numerical word counts using a CountVectorizer first.*
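The chapter's original listing is not reproduced here, so the following is a minimal sketch of what such a system could look like. The six training reviews and the two test reviews are invented; a real project would need far more data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up training set: 1 = Positive, 0 = Negative.
reviews = [
    "I loved this movie, it was fantastic",
    "Great acting and a wonderful story",
    "An amazing film, I loved every minute",
    "Terrible plot and awful acting",
    "I hated this boring movie",
    "A dull film, a complete waste of time",
]
labels = [1, 1, 1, 0, 0, 0]

# Step 1: turn each review into a row of word counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(reviews)

# Step 2: train the classifier on the counts.
model = MultinomialNB()
model.fit(X_train, labels)

# Step 3: vectorize new reviews with the SAME fitted vectorizer, then predict.
new_reviews = ["What a wonderful, fantastic film", "Boring and awful"]
X_new = vectorizer.transform(new_reviews)
predictions = model.predict(X_new)

print(predictions)
```

Note that new text goes through `transform`, not `fit_transform`, so the word columns line up with the ones the model was trained on.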
7. The Zero-Frequency Problem (Laplace Smoothing)
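What if a new review contains the word "Atrocious", but the model has *never* seen that word in one of the classes during training (say, it appeared only in Negative reviews)? The estimated probability of "Atrocious" given the Positive class is then 0 divided by the total word count for that class, which is exactly 0. Because Naive Bayes multiplies probabilities together, a single 0 wipes out the entire product for that class, no matter what the other words say. (A word that appears nowhere in the training data is simply dropped by CountVectorizer, because it is not in the vocabulary.) The Fix: Scikit-learn applies *Laplace Smoothing* by default (alpha=1.0), which adds a baseline count of "1" to every word in the vocabulary for every class, ensuring the math never hits exactly zero.
8. Common Mistakes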
- Using GaussianNB for Text: If you pass text frequency counts into `GaussianNB`, the mathematical assumptions will break down, resulting in terrible accuracy. Always match the algorithm to the data type.
- Using Naive Bayes for complex numerical patterns: While NB is incredible for text, it is generally outperformed by Random Forests and Logistic Regression on standard tabular (CSV) data because the "naive" assumption of feature independence is rarely true in finance or healthcare.
9. Best Practices
- Text Classification Baseline: Whenever you face an NLP problem (Spam, Sentiment, Topic Categorization), run a `CountVectorizer` + `MultinomialNB` pipeline first. It takes about 3 lines of code, trains in milliseconds on small datasets, and can hit 90%+ accuracy on easier datasets without any tuning, giving you a baseline that fancier models have to beat.
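A baseline along these lines might look like the sketch below, which uses scikit-learn's `make_pipeline` helper to chain the two steps. The toy emails and labels are invented, so the predictions say nothing about real-world accuracy.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy data: 1 = Spam, 0 = Not Spam.
texts = [
    "win free money now",
    "free prize claim now",
    "cheap money offer free",
    "meeting at noon tomorrow",
    "project report attached",
    "lunch with the team tomorrow",
]
targets = [1, 1, 1, 0, 0, 0]

# The baseline itself: build, fit, predict.
baseline = make_pipeline(CountVectorizer(), MultinomialNB())
baseline.fit(texts, targets)
baseline_pred = baseline.predict(["claim your free money", "team meeting tomorrow"])

print(baseline_pred)
```

Wrapping both steps in one pipeline means raw strings go in and labels come out, which avoids accidentally fitting the vectorizer on test data.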
10. Exercises
- 1. According to the "Naive" assumption in Naive Bayes, what is the relationship between the features in the dataset?
- 2. Which variant of Naive Bayes should be used for predicting categories based on text frequency counts?
11. MCQ Quiz with Answers
Question 1
Why is the Naive Bayes algorithm called "Naive"?
Question 2
To train a Naive Bayes model on raw English sentences, what must be done to the text first?
12. Interview Questions
- Q: Explain Bayes' Theorem in the context of a Spam Filter. What exactly is the model calculating?
- Q: What is Laplace Smoothing, and why is it mathematically required in a Multinomial Naive Bayes text classifier?
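One way to prepare for the Laplace Smoothing question is to work the arithmetic by hand. The sketch below uses the standard additive-smoothing formula, (count + alpha) / (class total + alpha * vocabulary size); the helper name `smoothed_prob` and all counts are hypothetical.

```python
def smoothed_prob(word_count, class_total, vocab_size, alpha=1.0):
    """Estimate P(word | class) with additive (Laplace) smoothing."""
    return (word_count + alpha) / (class_total + alpha * vocab_size)

# Hypothetical counts: "atrocious" never appears in Positive reviews.
vocab_size = 1000    # distinct words in the vocabulary
pos_total = 5000     # total word occurrences across Positive reviews
atrocious_in_pos = 0

# Without smoothing the estimate is exactly zero...
unsmoothed = atrocious_in_pos / pos_total

# ...and one zero factor wipes out the whole product for that class.
other_word_probs = [0.02, 0.01, 0.03]
product_unsmoothed = unsmoothed
for p in other_word_probs:
    product_unsmoothed *= p

# With alpha = 1 the estimate is small but non-zero.
smoothed = smoothed_prob(atrocious_in_pos, pos_total, vocab_size)
product_smoothed = smoothed
for p in other_word_probs:
    product_smoothed *= p

print(unsmoothed, smoothed, product_unsmoothed, product_smoothed)
```

The smoothed estimate works out to 1/6000: still tiny, so the unseen word counts as weak evidence, but it no longer vetoes the class outright.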