Classification Algorithms
CHAPTER 02 Intermediate

Setting Up Python and Machine Learning Environment

Updated: May 16, 2026
6 min read

1. Introduction

You cannot build a serious machine learning project with just a text editor and an empty terminal. Machine Learning relies on a specific, highly optimized software stack to load large datasets and perform fast matrix math. "It works on my machine" is a familiar problem in AI: code that runs on one computer fails on another because the installed library versions differ. In this chapter, we will set up a standardized, reproducible environment so that your code behaves consistently as you build classification models.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Install a stable version of Python.
  • Create and manage isolated Virtual Environments.
  • Install the core ML stack (scikit-learn, numpy, pandas).
  • Set up Visual Studio Code (VS Code).
  • Configure and launch Jupyter Notebooks.

3. Installing Python

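Install the 64-bit version of Python. Machine learning libraries are built and tested primarily for 64-bit Python, and 32-bit builds may lack prebuilt packages for them. *Warning: Do not download pre-release (alpha, beta, or release candidate) versions of Python. Stick to stable releases (e.g., Python 3.10 or 3.11) to ensure compatibility with scientific libraries.*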

Windows:

  1. Go to python.org/downloads and download the Windows installer.
  2. CRITICAL: On the very first screen of the installer, check the box that says "Add Python to PATH" (recent installers label it "Add python.exe to PATH"). If you forget this, your terminal will not recognize Python commands.

macOS:

  1. Open Terminal and install Homebrew (skip this step if Homebrew is already installed): /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  2. Install Python via Homebrew: brew install python

Linux (Ubuntu): Run these commands in your terminal:

  1. sudo apt update
  2. sudo apt install python3 python3-pip python3-venv
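To confirm the installation, you can run a short check from Python itself. The sketch below is one possible approach, not an official tool: `check_python` is a hypothetical helper name, and the 3.10 minimum simply mirrors the example versions above. It detects a 64-bit build from the size of a C pointer and flags pre-release builds via `sys.version_info.releaselevel`.

```python
import platform
import struct
import sys


def check_python(min_version=(3, 10)):
    """Describe the running interpreter (hypothetical helper for this chapter)."""
    # The size of a C pointer ("P") is 8 bytes on a 64-bit build, 4 on 32-bit.
    bits = struct.calcsize("P") * 8
    return {
        "version": platform.python_version(),
        "bits": bits,
        "is_64bit": bits == 64,
        # Compare (major, minor) against the assumed minimum version.
        "meets_minimum": sys.version_info[:2] >= min_version,
        # "final" marks a stable release; "alpha", "beta", "candidate" are pre-releases.
        "is_stable": sys.version_info.releaselevel == "final",
    }


if __name__ == "__main__":
    for key, value in check_python().items():
        print(f"{key}: {value}")
```

Save it to any file and run it with python (Windows) or python3 (Mac/Linux); on a correct setup, is_64bit and is_stable should both print True.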

4. Virtual Environments (The Sandbox)

Avoid installing machine learning libraries into your global (system-wide) Python. Different projects often need different versions of the same library, and a single global installation can hold only one version of each, so conflicts are likely. We use a Virtual Environment to create an isolated sandbox for this course.
  1. Open your terminal/command prompt.
  2. Create a folder for the course and move into it: mkdir classification_course, then cd classification_course
  3. Create the virtual environment (named ai_env):
     • Windows: python -m venv ai_env
     • Mac/Linux: python3 -m venv ai_env
  4. Activate the environment:
     • Windows: ai_env\Scripts\activate
     • Mac/Linux: source ai_env/bin/activate
*(You should now see (ai_env) at the beginning of your command prompt line).*
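The (ai_env) prefix is a visual cue. From inside Python, a common check is to compare sys.prefix with sys.base_prefix: in an environment created by venv, sys.prefix points at the environment folder, while sys.base_prefix still points at the original Python installation. A minimal sketch, where `in_virtualenv` is a hypothetical helper name:

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs from a venv-created environment."""
    # Inside a venv, sys.prefix is the environment folder (e.g. .../ai_env),
    # while sys.base_prefix remains the base Python installation.
    return sys.prefix != sys.base_prefix


print("Inside a virtual environment:", in_virtualenv())
print("Active prefix:", sys.prefix)
```

With ai_env activated, this should print True and a path ending in ai_env.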

5. Installing the ML Stack (Scikit-Learn)

With your sandbox activated, we will use Python's package manager (pip) to download the industry-standard libraries.

Run the following command:

```bash
pip install scikit-learn numpy pandas matplotlib jupyter
```

*What are we installing?*

  • scikit-learn: The core library for classical machine learning, providing the Classification algorithms used in this course (Decision Trees, SVMs, etc.).
  • numpy: The engine for fast, vectorized array and matrix math.
  • pandas: The "programmable spreadsheet" for loading and cleaning datasets.
  • matplotlib: The visualization library for drawing graphs.
  • jupyter: The interactive coding environment.
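To see how these roles fit together, here is a minimal sketch on a tiny made-up dataset (the columns hours_studied, practice_tests and passed are hypothetical, not real data): pandas holds the table, NumPy holds the numeric arrays that scikit-learn consumes, and scikit-learn trains a classifier. LogisticRegression is used here only as one example classifier.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# pandas: a small, made-up table of students (hypothetical data for illustration).
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8],
    "practice_tests": [0, 1, 0, 1, 2, 2, 3, 3],
    "passed": [0, 0, 0, 0, 1, 1, 1, 1],
})

# NumPy: convert the feature columns and the label column into arrays.
X = df[["hours_studied", "practice_tests"]].to_numpy(dtype=float)
y = df["passed"].to_numpy()

# scikit-learn: fit a classifier and predict the class of a new student.
model = LogisticRegression()
model.fit(X, y)
new_student = np.array([[6.5, 2.0]])
print("Predicted class:", model.predict(new_student)[0])
```

matplotlib would come in at the next step, to plot the data or the model's decisions.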

6. VS Code and Jupyter Notebook Setup

For exploratory work, Data Scientists usually prefer Jupyter Notebooks (.ipynb) over standard .py files. A notebook lets you run code in small blocks (cells) and see the output, including graphs, directly below each block.
  1. Download Visual Studio Code from code.visualstudio.com.
  2. Open VS Code and go to the Extensions tab (the squares icon on the left).
  3. Search for and install the Python extension and the Jupyter extension.
  4. Open your classification_course folder in VS Code.
  5. Create a new file named test_env.ipynb.
  6. Open the file, click "Select Kernel" in the top right corner, and select your ai_env Python environment.
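Behind the scenes, an .ipynb file is plain JSON that stores a list of cells. The sketch below builds a minimal notebook with the standard json module to make that concrete. The structure follows the nbformat version 4 layout as an assumption, the file name demo.ipynb and the helper make_cell are hypothetical, and notebooks saved by VS Code or Jupyter also record kernel metadata once you select a kernel. In practice you would create notebooks through VS Code as described in the steps above.

```python
import json


def make_cell(cell_type, source):
    """Build one notebook cell in the (assumed) nbformat 4 layout."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells also track outputs and an execution counter (None = never run).
        cell["outputs"] = []
        cell["execution_count"] = None
    return cell


notebook = {
    "cells": [
        make_cell("markdown", "# Environment check"),
        make_cell("code", "import sklearn\nprint(sklearn.__version__)"),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Write the notebook to disk; VS Code can open the resulting file directly.
with open("demo.ipynb", "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)
```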

7. Verify the Installation

Let's make sure everything works. Type this into the first cell of your notebook and run it by clicking the Run (play) button or pressing Shift+Enter:
```python
import sklearn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print(f"Scikit-Learn version installed: {sklearn.__version__}")
print(f"Pandas version installed: {pd.__version__}")

# Test the visualization engine
x = np.array([1, 2, 3])
y = np.array([1, 0, 1]) # Binary classes

plt.scatter(x, y, color='blue', s=100)
plt.title("Classification Environment Verification")
plt.yticks([0, 1], ["Class 0", "Class 1"])
plt.show()
```

*If it prints the versions and displays a graph with 3 blue dots, your computer is ready to build AI!*
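The cell above reports only scikit-learn and pandas. To list all five packages from the install command without importing them, one option is the standard library's importlib.metadata (Python 3.8+), which reads the version pip recorded for each distribution. It takes the pip names (scikit-learn), not the import names (sklearn). `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as passed to pip install in this chapter.
PACKAGES = ["scikit-learn", "numpy", "pandas", "matplotlib", "jupyter"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in the active environment
    return result


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```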

8. Common Mistakes

  • Forgetting to activate the environment: If you open a new terminal (for example, after restarting your computer) and run code that contains import sklearn without activating ai_env first, Python raises a ModuleNotFoundError, because the global interpreter cannot see packages installed inside the environment. Run ai_env\Scripts\activate (Windows) or source ai_env/bin/activate (Mac/Linux) every time you open a new terminal for this project.
  • Using Anaconda without need: Anaconda is a popular alternative Python distribution, but its full installer ships hundreds of libraries you may never use, taking up a lot of disk space. The pip and venv method shown here is much lighter and is widely used in professional deployment.
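When a ModuleNotFoundError appears, the usual cause is that a different interpreter is running than the one in ai_env. A quick diagnostic sketch: print sys.executable (the interpreter actually in use) and use importlib.util.find_spec to check which import names resolve. The helper name missing_modules is hypothetical; note that the list uses import names, so scikit-learn appears as sklearn.

```python
import importlib.util
import sys

# Import names (not pip names) of the core libraries used in this course.
MODULES = ["sklearn", "numpy", "pandas", "matplotlib"]


def missing_modules(names):
    """Return the import names that the current interpreter cannot find."""
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Interpreter in use:", sys.executable)
missing = missing_modules(MODULES)
if missing:
    print("Missing:", ", ".join(missing), "(is ai_env activated?)")
else:
    print("All core libraries are importable.")
```

If the printed interpreter path does not point inside your ai_env folder, activate the environment (or, in VS Code, reselect the kernel) and try again.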

9. Best Practices

  • Freeze your requirements: Once your project is working, run pip freeze > requirements.txt in your terminal. This saves a list of every library and its exact version number. When you share your code, others can run pip install -r requirements.txt to replicate your exact environment.
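In practice, use pip freeze itself. To make the idea concrete, here is a simplified sketch of the kind of output it produces, built with importlib.metadata: one name==version line per installed distribution, sorted alphabetically. It is only an approximation (pip freeze also handles editable installs and direct URL references, and skips pip itself by default), and both freeze_lines and the file name requirements_sketch.txt are hypothetical.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Approximate `pip freeze`: sorted name==version lines (simplified sketch)."""
    pins = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name and name.lower() not in pins:
            # Keep the first occurrence found on the import path.
            pins[name.lower()] = f"{name}=={dist.version}"
    return [pins[key] for key in sorted(pins)]


with open("requirements_sketch.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(freeze_lines()) + "\n")
```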

10. Exercises

  1. Create a new virtual environment named sandbox, activate it, and install only pandas.
  2. Write a block of Python code in a Jupyter Notebook to print out the current version of NumPy you have installed.

11. MCQ Quiz with Answers

Question 1

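Why is it standard practice to use a Virtual Environment (venv) when building Machine Learning projects?

Answer: A virtual environment isolates each project's libraries, so different projects can use different library versions without conflicting with each other.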

Question 2

Which tool is primarily used by Data Scientists to write interactive code blocks and visualize graphs inline?

Answer: Jupyter Notebooks (.ipynb).

12. Interview Questions

  • Q: Explain the purpose of a requirements.txt file and how it aids in model deployment.
  • Q: Describe the distinct roles of NumPy, Pandas, and Scikit-learn in a standard machine learning pipeline.

13. FAQs

Q: Do I need to buy an expensive NVIDIA GPU to do classification?

A: No. Deep Learning (Neural Networks) typically benefits from powerful GPUs, but the traditional Classification algorithms we use in this course (via Scikit-learn) are designed to run on your computer's standard CPU.

14. Summary

By installing Python, isolating your dependencies with a virtual environment, and configuring Jupyter Notebooks inside VS Code, you have built a reproducible foundation for the rest of this course. Your computer is now set up as a dedicated Machine Learning workstation.

15. Next Chapter Recommendation

We have the tools, but we must ensure we speak the language. In Chapter 3: Python Basics for Machine Learning, we will review the exact Python syntax—variables, data structures, and functions—required to write analytical scripts.
