CHAPTER 07
Intermediate
K-Nearest Neighbors (KNN)
Updated: May 16, 2026
1. Introduction
Logistic Regression uses calculus-based optimization to draw a straight line (a linear decision boundary) between classes. But humans don't categorize things using calculus; we categorize things by association. If you see an unknown animal standing in the middle of a flock of sheep, you assume it is probably a sheep. K-Nearest Neighbors (KNN) works exactly like this. It is one of the simplest and most intuitive algorithms in machine learning, and unlike Logistic Regression, it can produce non-linear decision boundaries. In this chapter, we will learn how to classify data based purely on proximity.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Explain the logic behind K-Nearest Neighbors.
- Understand Euclidean Distance.
- Explain the importance of the hyperparameter K.
- Train a KNeighborsClassifier in scikit-learn.
- Understand the computational drawbacks of KNN.
3. How KNN Works (The "Lazy" Learner)
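KNN is fundamentally different from most other algorithms because it doesn't actually "learn" a mathematical formula during training. When you call model.fit(), it essentially memorizes the entire training dataset: it stores the coordinates of all the dots (depending on its settings, scikit-learn may also organize them into a search tree to speed up later lookups). All the real work is deferred to prediction time.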
When you ask it to make a prediction for a *new* dot, it follows these steps:
- 1. In the basic (brute-force) version, it calculates the distance from the new dot to *every single dot* in the memorized dataset.
- 2. It finds the K closest dots (the Nearest Neighbors).
- 3. It takes a majority vote. If K=3, and the 3 closest neighbors are 2 Cats and 1 Dog, it predicts the new dot is a Cat!
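The three steps above can be sketched from scratch in plain Python. This is an illustrative toy, not scikit-learn's implementation: the class name LazyKNN, the method predict_one, and the animal data (two made-up numeric features per animal) are all hypothetical. Notice that fit does nothing except store the data, and all the distance work happens in predict_one.

```python
import math
from collections import Counter


class LazyKNN:
    """Hypothetical from-scratch brute-force KNN classifier (illustration only)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is pure memorization: store the points and labels as-is.
        self.X = [tuple(point) for point in X]
        self.y = list(y)
        return self

    def predict_one(self, query):
        # Step 1: distance from the query to every memorized point.
        distances = [(math.dist(query, point), label) for point, label in zip(self.X, self.y)]
        # Step 2: keep the K closest points.
        nearest = sorted(distances, key=lambda pair: pair[0])[: self.k]
        # Step 3: majority vote among their labels.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Made-up toy data: two numeric features per animal.
X_train = [(4.0, 6.0), (4.5, 5.5), (5.0, 7.0), (8.0, 8.0), (30.0, 10.0), (25.0, 12.0)]
y_train = ["Cat", "Cat", "Cat", "Dog", "Dog", "Dog"]

model = LazyKNN(k=3).fit(X_train, y_train)
# The 3 nearest neighbors of (6.0, 7.2) are 2 Cats and 1 Dog, so the vote picks Cat.
print(model.predict_one((6.0, 7.2)))  # -> Cat
```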
4. Distance Metrics
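How does it calculate distance? By default, scikit-learn uses Euclidean Distance (technically metric='minkowski' with p=2, which is the same thing). Euclidean distance is the straight-line distance between two points, calculated using the Pythagorean theorem ($c^2 = a^2 + b^2$): for two points $(x_1, y_1)$ and $(x_2, y_2)$, the distance is $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$.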
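With more than two features, the same idea extends to $d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$. Below is a minimal sketch of that formula; the function name euclidean_distance is our own, and math.dist is the standard-library equivalent (Python 3.8+). The example points form a classic 3-4-5 right triangle.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of features."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of features")
    # Pythagorean theorem generalized to n dimensions: square root of summed squared differences.
    return math.sqrt(sum((p_i - q_i) ** 2 for p_i, q_i in zip(p, q)))


# Legs of length 3 and 4, so the straight-line distance (hypotenuse) is 5.
print(euclidean_distance((1, 2), (4, 6)))  # -> 5.0
print(math.dist((1, 2), (4, 6)))           # -> 5.0
```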
5. Choosing the K-Value
The hyperparameter K (the number of voting neighbors) is the most critical setting in KNN.
- K = 1: The algorithm only looks at the *single* closest neighbor. This leads to severe Overfitting (High Variance): it creates highly jagged, complex decision boundaries that memorize outliers.
- K = 100: When K is large relative to the dataset (for example, 100 neighbors out of a few hundred rows), the algorithm becomes too smooth and ignores local clusters, leading to Underfitting (High Bias).
- The Sweet Spot: A common starting range is an odd number between 3 and 15 (odd numbers prevent 50/50 ties in binary classification). The best value for a specific dataset is usually found with cross-validation.
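One way to find the sweet spot empirically is to score several K values with cross-validation and keep the best. The sketch below assumes scikit-learn is installed and uses its built-in Iris dataset; the odd range 1 to 15 simply mirrors the rule of thumb (Iris has three classes, so the binary tie argument does not strictly apply). The StandardScaler sits inside a pipeline so that each fold is scaled using only its own training split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

scores = {}
for k in range(1, 16, 2):  # odd values: 1, 3, 5, ..., 15
    # Scaling happens inside the pipeline, so every CV fold is scaled on its own training data.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

for k, score in scores.items():
    print(f"K={k:2d}  mean CV accuracy={score:.3f}")

best_k = max(scores, key=scores.get)
print("Best K:", best_k)
```

Very small K values tend to score worse on unseen folds because they chase individual points; the exact winner depends on the dataset, which is why it is worth measuring rather than guessing.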
6. Mini Project: Iris Flower Classification
Let's build a KNN model to classify flower species based on the length and width of their petals.
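Here is a minimal sketch of the mini project, assuming scikit-learn is installed. It uses only the two petal columns of scikit-learn's built-in Iris dataset; the 80/20 split, random_state=42, n_neighbors=5, and the new flower's measurements are illustrative choices, not values from the original project.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
# Columns 2 and 3 are petal length (cm) and petal width (cm).
X = iris.data[:, 2:4]
y = iris.target

# Hold out 20% of the flowers for testing; stratify keeps the 3 species balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Both features are in centimetres on similar ranges, so this small example skips scaling.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)  # "training" just stores the data

y_pred = knn.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))

# Classify a new flower: petal length 4.8 cm, petal width 1.6 cm (made-up measurement).
new_flower = [[4.8, 1.6]]
print("Predicted species:", iris.target_names[knn.predict(new_flower)[0]])
```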
7. The Mandatory Rule: Feature Scaling
WARNING: KNN relies entirely on calculating physical geometry (distances). If one feature is Salary (values in the tens of thousands of dollars) and another is Age (values in the tens of years), the massive numbers in Salary will mathematically dwarf the Age column. The algorithm will effectively ignore Age! You MUST use a StandardScaler (or MinMaxScaler) on your X features before using KNN, fitting the scaler on the training data only and then applying the same transformation to new data.
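The effect is easy to reproduce by hand. In this sketch the four people and the query are made-up (age in years, salary in dollars), and the standardization mimics StandardScaler by subtracting each column's mean and dividing by its population standard deviation, with statistics computed from the training rows only. Without scaling, the nearest neighbor is a 60-year-old who merely has a similar salary; after scaling, it is the 26-year-old.

```python
import math
import statistics

# Made-up training rows: (age in years, salary in dollars).
people = {
    "A": (26, 60_000),
    "B": (60, 51_000),
    "C": (40, 80_000),
    "D": (30, 40_000),
}
query = (25, 50_000)


def nearest(rows, q):
    """Return the name of the row closest to q by Euclidean distance."""
    return min(rows, key=lambda name: math.dist(rows[name], q))


# Unscaled: salary gaps in the thousands swamp age gaps in the tens, so B (age 60) wins.
print("Unscaled nearest:", nearest(people, query))  # -> B

# Z-score each column using training-row statistics (population std, as StandardScaler does).
columns = list(zip(*people.values()))
means = [statistics.mean(col) for col in columns]
stds = [statistics.pstdev(col) for col in columns]


def scale(row):
    return tuple((value - m) / s for value, m, s in zip(row, means, stds))


scaled_people = {name: scale(row) for name, row in people.items()}
print("Scaled nearest:", nearest(scaled_people, scale(query)))  # -> A
```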
8. Common Mistakes
- Using an Even K-Value in Binary Classification: If predicting Yes/No, and you set K=4, you might get 2 "Yes" votes and 2 "No" votes. Scikit-learn then has to break the tie with a fixed internal rule rather than any evidence in your data. Always use an odd number (3, 5, 7) for binary tasks.
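- Using KNN in Production with Massive Datasets: Because brute-force KNN calculates the distance to *every single training point* during the prediction phase, it is slow at inference time. With 10 million rows, predicting for one new user means 10 million distance calculations, which is often too slow for production systems that need predictions in milliseconds. Tree-based indexes (scikit-learn's KD-tree and Ball tree options) can speed up searches on low-dimensional data, but prediction cost still grows with the size of the training set.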
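To see where the prediction-time cost comes from, the sketch below (assuming scikit-learn and NumPy are installed) times predictions with algorithm="brute", which compares every query to every training row, against algorithm="kd_tree", scikit-learn's KD-tree index. The synthetic dataset, its sizes, and the labeling rule are arbitrary choices for illustration, and the printed timings will vary by machine; both strategies are exact, so they should agree on the predictions. Tree indexes lose most of their advantage as the number of features grows.

```python
import time

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100_000, 3))  # synthetic: 100k rows, 3 features
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)  # arbitrary labeling rule
X_query = rng.normal(size=(500, 3))

predictions = {}
for algorithm in ("brute", "kd_tree"):
    model = KNeighborsClassifier(n_neighbors=5, algorithm=algorithm).fit(X_train, y_train)
    start = time.perf_counter()
    predictions[algorithm] = model.predict(X_query)
    elapsed = time.perf_counter() - start
    print(f"{algorithm:>8}: {elapsed * 1000:.1f} ms for {len(X_query)} predictions")

# Both searches return exact nearest neighbors, so predictions should match.
agreement = np.mean(predictions["brute"] == predictions["kd_tree"])
print("Prediction agreement:", agreement)
```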
9. Best Practices
- Use for Non-Linear, Complex Clusters: If you plot your data and the classes form weird, amoeba-like shapes rather than straight lines, KNN will easily wrap its boundary around them because it has no rigid mathematical equation restricting its shape.
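A quick way to check this is scikit-learn's make_moons dataset, two interleaving half-circles that no straight line can separate. The sketch below (assuming scikit-learn is installed) compares Logistic Regression with KNN on the same split; the sample size, noise level, and K=7 are illustrative choices. Both moon features already sit on similar ranges, so scaling is skipped here.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Two interleaving half-circles: the true class boundary is curved.
X, y = make_moons(n_samples=600, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

logreg = LogisticRegression().fit(X_train, y_train)        # linear boundary
knn = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)  # boundary follows the data

logreg_acc = logreg.score(X_test, y_test)
knn_acc = knn.score(X_test, y_test)
print(f"Logistic Regression accuracy: {logreg_acc:.3f}")
print(f"KNN (K=7) accuracy:           {knn_acc:.3f}")
```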
10. Exercises
- 1. If you configure KNeighborsClassifier(n_neighbors=5), and the 5 closest neighbors to a new point belong to classes [1, 1, 0, 1, 0], what will the final prediction be?
- 2. Why is KNN referred to as a "Lazy" learning algorithm?
11. MCQ Quiz with Answers
Question 1
How does the K-Nearest Neighbors algorithm make a prediction for a new, unseen data point?
Question 2
Which preprocessing step is absolutely mandatory before fitting a KNN model to prevent features with large numeric scales from dominating the distance math?
12. Interview Questions
- Q: Explain the impact of the hyperparameter K on the Bias-Variance tradeoff in a KNN model. What happens when K is very small vs. very large?
- Q: Explain why KNN is extremely fast to train but incredibly slow to make predictions compared to Logistic Regression.
13. FAQs
Q: Can KNN be used for Regression (predicting continuous numbers)?
A: Yes! KNeighborsRegressor works the exact same way, but instead of taking a majority *vote* of the neighbors' classes, it calculates the mathematical *average* of the neighbors' continuous values.
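A minimal sketch of the averaging behavior, assuming scikit-learn is installed; the one-feature training data is made up. With the default weights="uniform", the prediction is the plain mean of the K nearest targets, which the code checks by hand.

```python
from sklearn.neighbors import KNeighborsRegressor

# Made-up 1-feature data: X is the input, y the continuous target.
X_train = [[1.0], [2.0], [3.0], [10.0], [11.0]]
y_train = [1.5, 2.0, 2.5, 20.0, 22.0]

reg = KNeighborsRegressor(n_neighbors=3)  # default weights="uniform": a plain average
reg.fit(X_train, y_train)

prediction = reg.predict([[2.2]])[0]
# The 3 nearest inputs to 2.2 are 2.0, 3.0 and 1.0, so average their targets by hand.
manual = (2.0 + 2.5 + 1.5) / 3
print(prediction, manual)  # both equal 2.0
```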