Setting Up Python and Machine Learning Environment
# CHAPTER 2
1. Introduction
You cannot build an enterprise-grade AI model with just a text editor and an empty terminal. Machine Learning requires a specific, highly optimized software stack to process massive datasets and perform complex matrix math. "It works on my machine" is a familiar complaint in AI projects: code that runs on one computer fails on another because the installed library versions differ. In this chapter, we will establish a standardized, reproducible environment so that your code behaves the same way each time as you build classification models.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Install a stable version of Python.
- Create and manage isolated Virtual Environments.
- Install the core ML stack (scikit-learn, numpy, pandas).
- Set up Visual Studio Code (VS Code).
- Configure and launch Jupyter Notebooks.
3. Installing Python
The scientific libraries used in this course are distributed primarily for 64-bit Python, so install a 64-bit build.
*Warning: Do not download "Beta" versions of Python. Stick to stable releases (e.g., Python 3.10 or 3.11) to ensure compatibility with scientific libraries.*
Windows:
- 1. Go to python.org/downloads and download the Windows installer.
- 2. CRITICAL: On the very first screen of the installer, check the box that says "Add Python to PATH". If you forget this, your terminal will not recognize Python commands.
macOS:
- 1. Open Terminal and install Homebrew:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- 2. Install Python via Homebrew:
brew install python
Linux (Ubuntu):
Run these commands in your terminal:
sudo apt update
sudo apt install python3 python3-pip python3-venv
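Once Python is installed, the two recommendations above (a 64-bit build and a stable release) can be checked from Python itself. The following is a minimal sketch using only the standard library; the function name and the MIN_VERSION threshold are illustrative assumptions, not part of any official tool.

```python
import struct
import sys

MIN_VERSION = (3, 10)  # assumption: the oldest release suggested in this chapter


def describe_interpreter():
    """Return (version_tuple, pointer_bits, is_prerelease) for the running Python."""
    version = tuple(sys.version_info[:3])
    bits = struct.calcsize("P") * 8  # size of a C pointer in bits: 64 on a 64-bit build
    is_prerelease = sys.version_info.releaselevel != "final"
    return version, bits, is_prerelease


version, bits, is_prerelease = describe_interpreter()
print(f"Python {'.'.join(map(str, version))}, {bits}-bit")
if version[:2] < MIN_VERSION:
    print("Warning: older than the release this chapter suggests.")
if bits != 64:
    print("Warning: this is not a 64-bit build.")
if is_prerelease:
    print("Warning: this is a pre-release (alpha/beta/rc) build.")
```

Save it as a file and run it with python (Windows) or python3 (Mac/Linux); no warnings means the interpreter matches the advice in this section.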
4. Virtual Environments (The Sandbox)
Avoid installing machine learning libraries globally on your main computer. Different projects require different versions of libraries, and a global installation can hold only one version of each library at a time, so conflicts follow. We use a Virtual Environment to create an isolated sandbox for this course.
- 1. Open your terminal/command prompt.
- 2. Create a folder for the course and enter it:
mkdir classification_course
cd classification_course
- 3. Create the virtual environment (named ai_env):
- Windows:
python -m venv ai_env
- Mac/Linux:
python3 -m venv ai_env
- 4. Activate the environment:
- Windows:
ai_env\Scripts\activate
- Mac/Linux:
source ai_env/bin/activate
*(Once the environment is active, you will see (ai_env) at the beginning of your command prompt line).*
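Besides looking at the prompt, you can ask Python whether it is running inside a virtual environment. This is a small sketch that relies on how venv sets sys.prefix; the helper name is an illustrative assumption.

```python
import sys


def in_virtual_environment():
    """True when the running interpreter belongs to a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print("Interpreter prefix:", sys.prefix)
print("Inside a virtual environment:", in_virtual_environment())
```

Run with the environment activated, the prefix should end in ai_env; run from a fresh terminal without activation, the check should report False.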
5. Installing the ML Stack (Scikit-Learn)
With your sandbox activated, we will use Python's package manager (pip) to download the industry-standard libraries.
Run the following command:
pip install scikit-learn numpy pandas matplotlib jupyter
*What are we installing?*
- scikit-learn: The core library for this course, providing the classification algorithms we will use (Decision Trees, SVMs, etc.).
- numpy: The engine for fast array and matrix math.
- pandas: The "programmable spreadsheet" for loading and cleaning datasets.
- matplotlib: The visualization library for drawing graphs.
- jupyter: The interactive coding environment.
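To confirm what pip actually installed, the standard library can report the version of each package by its pip name. The sketch below is one possible approach, assuming Python 3.8 or newer (where importlib.metadata is available); the function name is illustrative.

```python
from importlib import metadata

# Names as pip knows them. Note that scikit-learn is imported in code as "sklearn".
PACKAGES = ["scikit-learn", "numpy", "pandas", "matplotlib", "jupyter"]


def installed_versions(names):
    """Map each pip package name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:<14}{version or 'NOT INSTALLED'}")
```

Any line reading NOT INSTALLED usually means the command was run without the sandbox activated.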
6. VS Code and Jupyter Notebook Setup
Data Scientists rarely write code in standard .py files when exploring data. They use Jupyter Notebooks (.ipynb), which allow you to run code in small blocks (cells) and see the output, including graphs, directly below each block.
- 1. Download Visual Studio Code from code.visualstudio.com.
- 2. Open VS Code and go to the Extensions tab (the squares icon on the left).
- 3. Search for and install the Python extension and the Jupyter extension.
- 4. Open your classification_course folder in VS Code.
- 5. Create a new file named test_env.ipynb.
- 6. Open the file, click "Select Kernel" in the top right corner, and select your ai_env Python environment.
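If you are unsure whether the kernel you selected is really the ai_env interpreter, a notebook cell can check the path of the Python executable it is running on. This is a rough sketch; the helper name is an assumption, and it only tests whether a folder named ai_env appears in the interpreter path.

```python
import sys
from pathlib import Path


def interpreter_in_env(env_name="ai_env", executable=None):
    """True when the interpreter path contains a folder named env_name."""
    path = Path(executable or sys.executable or "")
    return env_name in path.parts


print("Kernel interpreter:", sys.executable)
print("Running from ai_env:", interpreter_in_env())
```

If the second line prints False, reopen "Select Kernel" and pick the interpreter located inside the ai_env folder.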
7. Verify the Installation
Let's ensure everything is working. Type this into the first cell of your Notebook and click Play (or press Shift+Enter):
*If it prints the versions and displays a graph with 3 blue dots, your computer is ready to build AI!*
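A verification cell matching that description might look like the following sketch. It assumes the packages from Section 5 are installed in the selected kernel; the three plotted points are arbitrary, and a missing library is reported rather than stopping the cell.

```python
import importlib

# Import names (not pip names) of the libraries installed in Section 5.
MODULES = ["sklearn", "numpy", "pandas", "matplotlib"]

versions = {}
for name in MODULES:
    try:
        versions[name] = importlib.import_module(name).__version__
    except ImportError:
        versions[name] = None  # missing: the environment needs attention
    print(f"{name}: {versions[name] or 'NOT INSTALLED'}")

if versions["matplotlib"] is not None:
    import matplotlib.pyplot as plt

    # Three arbitrary points, drawn in blue to match the description above.
    plt.scatter([1, 2, 3], [2, 4, 6], color="blue")
    plt.title("Environment check")
    plt.show()
```

If any line reads NOT INSTALLED, the most likely cause is that the notebook kernel is not the ai_env interpreter, or that the install command was run outside the activated sandbox.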
8. Common Mistakes
- Forgetting to activate the environment: If you restart your computer, open a new terminal, start Python, and run import sklearn, it will raise a ModuleNotFoundError, because that terminal is using the global Python rather than your sandbox. You must activate the environment (ai_env\Scripts\activate on Windows, source ai_env/bin/activate on Mac/Linux) every time you open a new terminal for this project.
- Using Anaconda recklessly: While Anaconda is a popular alternative distribution for Python, its full installer bundles hundreds of libraries, many of which you may never use, and takes up considerable disk space. The pip and venv method shown here is much lighter and is widely used for professional deployment.
9. Best Practices
- Freeze your requirements: Once your project is working, run pip freeze > requirements.txt in your terminal. This saves a list of every library and its exact version number. When you share your code, others can run pip install -r requirements.txt to replicate your exact environment.
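To see how a frozen file ties an environment to exact versions, the sketch below parses name==version lines (the form pip freeze writes for ordinary packages) and compares them with what is installed. The function names are illustrative, and the version numbers in the example text are placeholders, not recommendations.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and entries without an exact pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def mismatches(pins):
    """Return {name: (pinned, installed)} for packages that differ or are missing."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            report[name] = (pinned, installed)
    return report


# Hypothetical file contents; the versions are placeholders.
example = """\
# produced by: pip freeze > requirements.txt
numpy==1.26.4
pandas==2.2.2
"""
pins = parse_pins(example)
print(pins)
print(mismatches(pins))
```

An empty report from mismatches means the current environment matches the file; in practice you would read the text from your own requirements.txt instead of the example string.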
10. Exercises
- 1. Create a new virtual environment named sandbox, activate it, and install only pandas.
- 2. Write a block of Python code in a Jupyter Notebook to print out the current version of NumPy you have installed.
11. MCQ Quiz with Answers
Why is it standard practice to use a Virtual Environment (venv) when building Machine Learning projects?
Which tool is primarily used by Data Scientists to write interactive code blocks and visualize graphs inline?
12. Interview Questions
- Q: Explain the purpose of a requirements.txt file and how it aids in model deployment.
- Q: Describe the distinct roles of NumPy, Pandas, and Scikit-learn in a standard machine learning pipeline.