Classification Algorithms
CHAPTER 07 Intermediate

K-Nearest Neighbors (KNN)

Updated: May 16, 2026


1. Introduction

Logistic Regression uses calculus-based optimization to fit a straight-line (linear) decision boundary. But humans don't usually categorize things with equations. We categorize things by association. If you see an unknown animal standing in the middle of a flock of sheep, you assume it is probably a sheep. K-Nearest Neighbors (KNN) works exactly like this. It is one of the simplest and most intuitive algorithms in machine learning, and unlike Logistic Regression it can produce non-linear decision boundaries. In this chapter, we will learn how to classify data based purely on proximity.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Explain the logic behind K-Nearest Neighbors.
  • Understand Euclidean Distance.
  • Explain the importance of the hyperparameter K.
  • Train a KNeighborsClassifier in scikit-learn.
  • Understand the computational drawbacks of KNN.

3. How KNN Works (The "Lazy" Learner)

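KNN is fundamentally different from almost every other algorithm because it doesn't "learn" a mathematical formula during training. When you call model.fit(), it essentially memorizes the entire training dataset, storing the coordinates of all the dots. (Depending on its settings, scikit-learn may also build a search structure such as a KD-tree or Ball tree so that neighbors can be found faster later, but no model parameters are learned.) The real work is postponed until prediction time.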

When you ask it to make a prediction for a *new* dot, it follows these steps:

  1. It calculates the distance from the new dot to *every single dot* in the memorized dataset.
  2. It finds the K closest dots (the Nearest Neighbors).
  3. It takes a majority vote among their labels. If K=3 and the 3 closest neighbors are 2 Cats and 1 Dog, it predicts the new dot is a Cat!
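The three steps above can be sketched from scratch in NumPy. This is a minimal, brute-force illustration assuming small in-memory data; `SimpleKNN` is a hypothetical class written for this chapter, not scikit-learn's implementation (which can also use tree-based neighbor search).

```python
import numpy as np
from collections import Counter


class SimpleKNN:
    """Hypothetical brute-force KNN classifier, for illustration only."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorizing the data: no formula is learned.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict_one(self, x):
        # Step 1: Euclidean distance from the new point to every stored point.
        diffs = self.X_train - np.asarray(x, dtype=float)
        distances = np.sqrt((diffs ** 2).sum(axis=1))
        # Step 2: indices of the k smallest distances.
        nearest = np.argsort(distances)[: self.k]
        # Step 3: majority vote among the neighbors' labels.
        # (On a tie, Counter.most_common favors the tied label seen first in distance order.)
        votes = Counter(self.y_train[nearest].tolist())
        return votes.most_common(1)[0][0]

    def predict(self, X):
        return np.array([self.predict_one(x) for x in np.asarray(X, dtype=float)])


# Same toy petal data as the mini project in this chapter: [Petal Length, Petal Width]
X_train = [[1.5, 0.2], [1.4, 0.2], [4.7, 1.4], [4.5, 1.5]]
y_train = [0, 0, 1, 1]

model = SimpleKNN(k=3).fit(X_train, y_train)
print(model.predict([[4.6, 1.4]]))  # [1]
```

Note that `fit` does nothing but store arrays; every distance is computed inside `predict`, which is exactly the "lazy" behavior described above.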

4. Distance Metrics

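How does it calculate distance? By default, scikit-learn's KNeighborsClassifier uses the Minkowski metric with p=2, which is exactly Euclidean Distance: the straight-line distance between two points. In two dimensions this is the Pythagorean theorem, $c^2 = a^2 + b^2$; for two points $\mathbf{p}$ and $\mathbf{q}$ with $n$ features it generalizes to $d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$.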
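As a quick check of the formula, here is a hypothetical `euclidean_distance` helper in NumPy, compared against Python's built-in `math.dist` (available since Python 3.8).

```python
import math

import numpy as np


def euclidean_distance(p, q):
    """Hypothetical helper: square root of the sum of squared coordinate differences."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))


# 2-D case: the Pythagorean theorem with legs a = 3 and b = 4.
print(euclidean_distance([0, 0], [3, 4]))  # 5.0

# The same formula works for any number of features; math.dist agrees.
p, q = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(euclidean_distance(p, q), math.dist(p, q))  # 5.0 5.0
```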

5. Choosing the K-Value

The hyperparameter K (the number of voting neighbors) is the most critical setting in KNN.
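  • K = 1: The algorithm only looks at the *single* closest neighbor. This tends to cause Overfitting (High Variance): the decision boundary becomes jagged and bends around every outlier.
  • Very large K (e.g., K = 100 on a modest dataset): The algorithm averages over so many neighbors that the boundary becomes too smooth and ignores local clusters, leading to Underfitting (High Bias). In the extreme, when K equals the number of training points, every prediction is simply the overall majority class.
  • The Sweet Spot: A common starting range is an odd number between 3 and 15, tuned on your data with cross-validation. (Odd numbers prevent 50/50 ties in binary classification.)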
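One common way to pick K is to compare a range of odd values with cross-validation. The sketch below assumes scikit-learn's bundled Iris dataset (all four features are measured in centimeters, so scaling is skipped for brevity) and 5-fold cross-validation; the best K it reports is specific to this data, not a general rule.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

candidate_ks = range(1, 16, 2)  # odd values 1, 3, ..., 15
mean_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds for this K
    mean_scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(mean_scores, key=mean_scores.get)
for k, score in mean_scores.items():
    print(f"K={k:2d}  mean CV accuracy={score:.3f}")
print("Best K on this data:", best_k)
```

Plotting `mean_scores` against K usually shows the pattern described above: weaker scores at the extremes and a plateau in between.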

6. Mini Project: Iris Flower Classification

Let's build a KNN model to classify flower species based on the length and width of their petals.
```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# 1. Historical Training Data
# Features: [Petal Length, Petal Width]
X_train = np.array([
    [1.5, 0.2], # Species 0 (Setosa)
    [1.4, 0.2], # Species 0
    [4.7, 1.4], # Species 1 (Versicolor)
    [4.5, 1.5]  # Species 1
])

y_train = np.array([0, 0, 1, 1])

# 2. Initialize the Model
# n_neighbors is the "K" value! We ask the 3 closest flowers.
knn_model = KNeighborsClassifier(n_neighbors=3)

# 3. "Train" the Model (It just memorizes the points)
knn_model.fit(X_train, y_train)

# 4. Make a Prediction!
# A new flower has a Petal Length of 4.6 and Width of 1.4
X_test = np.array([[4.6, 1.4]])

prediction = knn_model.predict(X_test)
print(f"Predicted Flower Species: Class {prediction[0]}")
# Output: Predicted Flower Species: Class 1
```
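To see *why* the model predicted Class 1, you can ask it which training points voted. This sketch rebuilds the same four training flowers and calls `kneighbors` (distances and row indices of the nearest neighbors) and `predict_proba` (the share of neighbor votes per class, in the order of `knn_model.classes_`).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[1.5, 0.2], [1.4, 0.2], [4.7, 1.4], [4.5, 1.5]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[4.6, 1.4]])

knn_model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Which training rows are the 3 nearest neighbors, and how far away are they?
distances, indices = knn_model.kneighbors(X_test)
print(indices)             # [[2 3 0]]
print(distances.round(3))  # nearest first

# Fraction of the 3 votes going to each class (about 1/3 for Class 0, 2/3 for Class 1)
print(knn_model.predict_proba(X_test))
```

The two Versicolor rows (indices 2 and 3) are far closer than the Setosa row, so the vote is 2 to 1 for Class 1. With only four training points and K=3, one Setosa always ends up in the vote.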

7. The Mandatory Rule: Feature Scaling

WARNING: KNN relies entirely on distances. If one feature is Salary (values in the tens of thousands) and another is Age (values in the tens), differences in Salary will mathematically dwarf differences in Age, and the algorithm will behave as if Age barely matters at all! You MUST scale your X features, for example with a StandardScaler (or MinMaxScaler), before using KNN. Fit the scaler on the training data only, then apply the same transformation to any new data.
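The effect is easy to reproduce. In this sketch the Salary/Age rows and labels are hypothetical (labels are assigned by age alone), and K=1 is used only so that a single neighbor decides. Wrapping the scaler and the model in `make_pipeline` ensures the scaler is fit on the training data only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: [Salary, Age]; class 1 = older people.
X_train = np.array([
    [40_000, 22],
    [42_000, 25],
    [41_000, 60],
    [43_000, 65],
])
y_train = np.array([0, 0, 1, 1])
X_new = np.array([[42_000, 62]])  # a 62-year-old

# Without scaling: salary differences dominate the distance.
raw_knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# With scaling: both features are standardized to mean 0, std 1 before distances are computed.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
scaled_knn.fit(X_train, y_train)

print(raw_knn.predict(X_new))     # [0]
print(scaled_knn.predict(X_new))  # [1]
```

Without scaling, the nearest person is the 25-year-old with an identical salary (distance 37, versus about 1,000 to everyone else). After standardization, the 60-year-old becomes the nearest neighbor.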

8. Common Mistakes

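  • Using an Even K-Value in Binary Classification: If predicting Yes/No and you set K=4, you might get 2 "Yes" votes and 2 "No" votes. Scikit-learn then has to break the tie by an internal rule rather than by anything in your data. Always use an odd number (3, 5, 7) for binary tasks.
  • Using KNN in Production with Massive Datasets: Prediction requires searching the training set for the nearest points, and a brute-force search computes the distance to *every single training point*. Inference cost therefore grows with the size of the training data, so with millions of rows a single prediction can be orders of magnitude slower than a Logistic Regression, which evaluates one equation. Tree-based indexes (KD-tree, Ball tree) speed up the search when there are few features but lose their advantage as the number of features grows. In production, we usually need models that predict in milliseconds.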
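A minimal sketch of the tie problem, using hypothetical 1-D data with two points per class. With K=4, `predict_proba` shows an exact 50/50 split; with K=3, two classes can never split the vote evenly.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D data: two points of each class.
X_train = np.array([[1.0], [2.0], [4.0], [5.0]])
y_train = np.array([0, 0, 1, 1])
X_new = np.array([[3.2]])

even_k = KNeighborsClassifier(n_neighbors=4).fit(X_train, y_train)
odd_k = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

print(even_k.predict_proba(X_new))  # 2 votes each: a 50/50 tie
print(odd_k.predict_proba(X_new))   # neighbors 4.0, 2.0, 5.0: 2 votes to 1 for class 1
print(odd_k.predict(X_new))         # [1]
```

Which class `even_k.predict` returns for the tie depends on scikit-learn's internal tie-breaking, not on the data, which is why it is not shown here.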

9. Best Practices

  • Use for Non-Linear, Complex Clusters: If you plot your data and the classes form irregular, amoeba-like shapes rather than being separable by straight lines, KNN can wrap its boundary around them because no fixed equation restricts the boundary's shape.
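A quick way to see this, assuming scikit-learn's `make_circles` toy dataset (one class forms a ring inside the other, so no straight line separates them): compare 5-fold cross-validated accuracy of Logistic Regression, whose boundary is a straight line, against KNN. The exact numbers depend on the seed and noise level chosen here.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Two concentric rings: inner radius 0.5, outer radius 1.0, with Gaussian noise.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.5, random_state=0)

linear_acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
knn_acc = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5).mean()

print(f"Logistic Regression: {linear_acc:.2f}")
print(f"KNN (K=5):           {knn_acc:.2f}")
```

Logistic Regression typically lands close to chance on this data, while KNN follows the ring-shaped boundary.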

10. Exercises

  1. If you configure KNeighborsClassifier(n_neighbors=5), and the 5 closest neighbors to a new point belong to Classes [1, 1, 0, 1, 0], what will the final prediction be?
  2. Why is KNN referred to as a "Lazy" learning algorithm?

11. MCQ Quiz with Answers

Question 1

How does the K-Nearest Neighbors algorithm make a prediction for a new, unseen data point?

Question 2

Which preprocessing step is absolutely mandatory before fitting a KNN model to prevent features with large numeric scales from dominating the distance math?

12. Interview Questions

  • Q: Explain the impact of the hyperparameter K on the Bias-Variance tradeoff in a KNN model. What happens when K is very small vs. very large?
  • Q: Explain why KNN is extremely fast to train but incredibly slow to make predictions compared to Logistic Regression.

13. FAQs

Q: Can KNN be used for Regression (predicting continuous numbers)? A: Yes! KNeighborsRegressor works the exact same way, but instead of taking a majority *vote* of the neighbors' classes, it calculates the mathematical *average* of the neighbors' continuous values.
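A minimal sketch with hypothetical 1-D data: with `n_neighbors=2` and scikit-learn's default uniform weights, the prediction is the plain average of the two nearest targets.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical 1-D data: feature x, continuous target y.
X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([10.0, 20.0, 30.0, 100.0])

reg = KNeighborsRegressor(n_neighbors=2).fit(X_train, y_train)

# The 2 nearest neighbors of x = 2.4 are x = 2 and x = 3, so the prediction is (20 + 30) / 2.
print(reg.predict([[2.4]]))  # [25.]
```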

14. Summary

K-Nearest Neighbors is a triumph of simplicity. By abandoning complex equations and relying purely on geometric proximity and majority voting, KNN can draw highly complex, non-linear boundaries. However, its strict requirement for scaled features and its heavy computational load at prediction time make it better suited to prototyping and small-to-medium datasets than to latency-sensitive deployment at scale.

15. Next Chapter Recommendation

KNN classifies based on distance, but what if we want an algorithm that classifies based on logical, human-readable rules? In Chapter 8: Decision Tree Classification, we will abandon geometry entirely and learn how algorithms can build dynamic, automated flowcharts to make decisions.
