Scikit-learn Basics
CHAPTER 02 Intermediate

Setting Up Python and Scikit-learn Environment

Updated: May 16, 2026

1. Introduction

A chef cannot cook without a kitchen, and a data scientist cannot build models without the proper environment. Before we write our first machine learning algorithm, we need to install Python, an Integrated Development Environment (IDE), and the necessary scientific libraries. In this chapter, we will set up a professional Machine Learning development environment that will serve you throughout your career.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Install the latest version of Python on Windows, macOS, or Linux.
  • Understand and create Python Virtual Environments.
  • Install VS Code and the Jupyter Notebook extension.
  • Use pip to install Scikit-learn, NumPy, and Pandas.
  • Verify your installation with a simple test script.

3. Installing Python

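Scikit-learn requires Python 3. Recent Scikit-learn releases no longer support old versions such as Python 3.8 (which has itself reached end-of-life), so install the newest stable Python 3 release and check the official Scikit-learn installation page for the minimum version required by the release you install.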

Windows:

  1. Go to python.org/downloads.
  2. Download the Windows installer.
  3. CRITICAL STEP: When you run the installer, check the box labeled "Add python.exe to PATH" (older installers say "Add Python to PATH") at the bottom of the window before clicking "Install Now".

macOS:

  1. macOS may already provide a python3 command (through Apple's Command Line Tools), but it is often an older version.
  2. A convenient way to install modern Python on a Mac is Homebrew. Open your terminal and run: brew install python (afterwards, use the python3 command).

Linux (Ubuntu/Debian): Open your terminal and run these two commands:

    sudo apt update
    sudo apt install python3 python3-pip python3-venv
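Once Python is installed, you can ask Python itself which interpreter is running and whether it meets a minimum version. The sketch below assumes a minimum of 3.10 purely for illustration; MIN_VERSION is a hypothetical name, and you should set it to whatever the Scikit-learn release you plan to install requires.

```python
import sys

# Illustrative threshold only; check the Scikit-learn installation docs for the real minimum.
MIN_VERSION = (3, 10)


def python_is_recent_enough(minimum=MIN_VERSION):
    """Return True if the running interpreter is at least `minimum`."""
    # sys.version_info compares like a tuple: (major, minor, micro, ...)
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Version: {sys.version.split()[0]}")
    if python_is_recent_enough():
        print("Python version looks fine.")
    else:
        print(f"Please install Python {MIN_VERSION[0]}.{MIN_VERSION[1]} or newer.")
```

Save it as check_python.py and run it with python (Windows) or python3 (macOS/Linux); the printed interpreter path also tells you which installation your terminal is actually using.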

4. Virtual Environments

When you work on multiple Python projects, they might require different versions of libraries (e.g., Project A needs Scikit-learn 1.0, Project B needs Scikit-learn 1.2). If you install everything globally, there is only one copy of each library, so upgrading it for one project can break the other. Virtual Environments solve this by giving each project its own isolated folder containing its own copy (or link) of the Python interpreter and its own installed packages.
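To see what an "isolated folder" means in practice, the standard-library venv module (the same module behind python -m venv) can also create an environment from Python code. This minimal sketch builds a throwaway environment in a temporary directory, skips installing pip to keep it fast, and inspects what was created; make_demo_env is a hypothetical helper name used only for this illustration.

```python
import sys
import tempfile
import venv
from pathlib import Path


def make_demo_env(parent: Path) -> Path:
    """Create a bare virtual environment inside `parent` and return its path."""
    env_dir = parent / "env"
    # with_pip=False skips installing pip, which keeps the demo quick.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = make_demo_env(Path(tmp))
    contents = sorted(p.name for p in env_dir.iterdir())
    # pyvenv.cfg records which base Python the environment was created from.
    cfg_text = (env_dir / "pyvenv.cfg").read_text()
    # Activation scripts live in Scripts/ on Windows and bin/ elsewhere.
    scripts = env_dir / ("Scripts" if sys.platform == "win32" else "bin")
    has_activate = any(scripts.glob("activate*"))

print("Environment contents:", contents)
print("Activation scripts present:", has_activate)
print(cfg_text)
```

The pyvenv.cfg file is what marks the folder as a virtual environment, and packages you later install with pip land in the environment's own site-packages directory rather than in the global installation.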

Let's create a folder for our course and set up an environment:

  1. Open your terminal/command prompt.
  2. Create a folder with mkdir ml_course and enter it with cd ml_course.
  3. Create the virtual environment (named env):
     • Windows: python -m venv env
     • Mac/Linux: python3 -m venv env
  4. Activate the environment:
     • Windows Command Prompt: env\Scripts\activate
     • Windows PowerShell: .\env\Scripts\Activate.ps1
     • Mac/Linux: source env/bin/activate

*You will know it worked because your terminal prompt will now start with (env).*
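If you want a check that does not depend on how your prompt looks, Python can tell you whether it is running inside a virtual environment: inside a venv, sys.prefix points at the environment folder while sys.base_prefix still points at the base installation. A minimal sketch, assuming an environment created with venv as above (other tools such as conda manage environments differently); in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    # venv sets sys.prefix to the environment folder; base_prefix keeps the
    # original installation, so the two differ only inside an environment.
    return sys.prefix != sys.base_prefix


print("Inside a virtual environment:", in_virtualenv())
print("sys.prefix:", sys.prefix)
```

Run it after activating env; it should print True and a path ending in env.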

5. Installing the ML Libraries

With the virtual environment active, we use Python's package manager, pip, to install our tools.

Run this command in your terminal:

```bash
pip install scikit-learn pandas numpy matplotlib jupyter
```

*Note: This will download and install the core libraries we need: scikit-learn for machine learning, pandas and numpy for data manipulation, matplotlib for visualization, and jupyter for notebooks.*
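To confirm what pip installed without importing each library, you can read the package metadata with importlib.metadata from the standard library. Note that these are pip's distribution names (scikit-learn, not sklearn). A sketch that reports a version, or None, for each package from the command above; installed_versions and PACKAGES are hypothetical names for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as passed to `pip install`.
PACKAGES = ["scikit-learn", "pandas", "numpy", "matplotlib", "jupyter"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in the active environment
    return result


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

Because it only reads metadata, this runs quickly even for large libraries, and it reports on whichever environment is currently active.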

6. Setting Up VS Code and Jupyter Notebooks

While you can write ML code in a standard .py file, many data scientists prefer Jupyter Notebooks for exploratory work. Notebooks let you write code in chunks (cells), run them one at a time, and see visualizations (like charts and graphs) directly beneath the code.

  1. Download and install Visual Studio Code (VS Code) from code.visualstudio.com.
  2. Open VS Code and go to the Extensions tab (the squares icon on the left).
  3. Search for and install the Python extension (by Microsoft).
  4. Search for and install the Jupyter extension (by Microsoft).

7. Step-by-Step Implementation: Your First Notebook

Let's verify everything is working.
  1. In VS Code, open the ml_course folder you created earlier.
  2. Create a new file named test.ipynb. The .ipynb extension tells VS Code this is a Jupyter Notebook.
  3. Open test.ipynb. In the top right corner, click "Select Kernel", choose "Python Environments", and select the env virtual environment we created earlier.
  4. Type the following code into the first cell and click the "Play" (run) button next to the cell, or press Shift+Enter, to run it:
```python
import sklearn
import pandas as pd
import numpy as np

# Each library exposes its version string as __version__
print(f"Scikit-learn version: {sklearn.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")
print("Environment setup successful! 🚀")
```

*If this prints the versions without an error, you are ready to go!*

8. Alternative Setup: Google Colab

If your computer is very old or you are struggling with the installation, you can skip all of this and use Google Colab, a free, cloud-based Jupyter Notebook environment that runs in your browser. Python, Scikit-learn, and Pandas come pre-installed. Sign in with a Google account at colab.research.google.com and start coding!
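If you move notebooks between Colab and your local setup, it can help for the code to know where it is running. A common heuristic, which is an assumption on our part and not an official Colab API, is to check whether the google.colab package can be imported; running_in_colab is a hypothetical helper name.

```python
def running_in_colab() -> bool:
    """Heuristic: Colab ships a `google.colab` package that local installs normally lack."""
    try:
        import google.colab  # noqa: F401  (normally only present on Colab)
    except ImportError:
        return False
    return True


if running_in_colab():
    print("Running on Google Colab: libraries are already installed.")
else:
    print("Running locally: make sure your env is activated and selected as the kernel.")
```

Treat the result as a hint rather than a guarantee, since nothing stops a local machine from having a package with the same name installed.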

9. Common Mistakes

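  • Forgetting to activate the virtual environment: If you open a new terminal tomorrow and run python script.py without first running the activate command, the system Python starts instead of the one in env. It cannot see the libraries you installed in env, so you will typically get a ModuleNotFoundError.
  • Installing globally using sudo pip on Mac/Linux: This can overwrite Python libraries that your operating system's own tools depend on and break them. For this reason, recent Ubuntu and Debian releases refuse such installs with an "externally-managed-environment" error. Always use virtual environments.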
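A quick way to diagnose that ModuleNotFoundError is to ask Python, before importing anything, which of the course libraries it can find. Note that import names and pip names differ for some packages (pip install scikit-learn provides import sklearn). The mapping below covers only this chapter's libraries, and missing_libraries and LIBRARIES are hypothetical names for this sketch.

```python
import importlib.util
import sys

# Import name -> name to use with `pip install`.
LIBRARIES = {
    "sklearn": "scikit-learn",
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
}


def missing_libraries(libraries=LIBRARIES):
    """Return the pip names of libraries that the current interpreter cannot import."""
    # find_spec returns None for a top-level module that is not installed.
    return [pip_name for mod, pip_name in libraries.items()
            if importlib.util.find_spec(mod) is None]


missing = missing_libraries()
if missing:
    print(f"Missing in {sys.prefix}: {', '.join(missing)}")
    print("Is your virtual environment activated? If so, run:")
    print(f"    pip install {' '.join(missing)}")
else:
    print("All course libraries are importable.")
```

Printing sys.prefix alongside the result makes it obvious when you are accidentally running the system Python instead of the one in env.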

10. Best Practices

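  • requirements.txt: When you share your code, others need to know which libraries (and which versions) to install. With your virtual environment active, run pip freeze > requirements.txt to write every installed package and its exact version to a file. Anyone can then recreate the same setup inside their own virtual environment with pip install -r requirements.txt.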
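pip freeze is the tool to use in practice, but the idea behind it is simple: list every installed distribution with an exact name==version pin. This rough Python equivalent uses importlib.metadata.distributions(); it does not replicate all of pip freeze's behavior (for example, its handling of editable installs or of pip's own packages), so treat it as an illustration only. freeze_lines is a hypothetical helper name.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Return sorted 'name==version' lines for the active environment."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no recorded name
            pins.add(f"{name}=={dist.version}")
    # Sort case-insensitively so the list is easy to scan.
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze_lines()))
```

Each line has the same name==version shape that pip install -r expects, which is why an exact pin like numpy==<version> reproduces the same library version on another machine.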

11. Exercises

  1. Open your terminal, create a new folder called ml_practice, create a virtual environment inside it, activate it, and install only numpy.
  2. Create a Jupyter Notebook, import numpy as np, and run print(np.array([1, 2, 3])).

12. MCQ Quiz with Answers

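Question 1

Why is it highly recommended to use Python Virtual Environments?

Answer: Each project gets its own isolated set of dependencies, so projects that need different library versions (for example, Scikit-learn 1.0 and 1.2) do not conflict.

Question 2

Which tool allows data scientists to write Python code in interactive cells and view charts directly inline?

Answer: Jupyter Notebooks.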

13. Interview Questions

  • Q: Explain what pip is and how it relates to Scikit-learn.
  • Q: What is the difference between writing Python code in a .py file versus a .ipynb (Jupyter Notebook) file?
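One concrete way to answer the second question: a .py file is plain source code, while a .ipynb file is a JSON document holding a list of cells plus their saved outputs and metadata. The sketch below builds a minimal notebook in the nbformat 4 layout with only the standard library; real tools (Jupyter, VS Code, or the nbformat package) add more metadata, and code_cell and minimal_notebook are hypothetical helper names.

```python
import json


def code_cell(source: str) -> dict:
    """Build one code cell in the nbformat 4 layout (no outputs yet)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],  # Jupyter stores results (text, charts) here after running
        "source": source.splitlines(keepends=True),
    }


def minimal_notebook(sources):
    """Return a notebook dict with one code cell per source string, in order."""
    return {
        "cells": [code_cell(src) for src in sources],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


nb = minimal_notebook([
    "import numpy as np",
    "np.array([1, 2, 3])",
])
print(json.dumps(nb, indent=1))
```

Redirecting the output to a file (for example, python make_nb.py > minimal.ipynb) should give a notebook that VS Code can open; the stored outputs list is exactly what a .py file has no place for.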

14. FAQs

Q: Can I use PyCharm or Anaconda instead of VS Code? A: Yes! Anaconda is not an editor but a Python distribution; it is highly popular in data science because it comes pre-packaged with Jupyter, Scikit-learn, and its own conda environment manager. PyCharm is a full IDE with strong support for ML work. VS Code is taught here because it is free, lightweight, and one of the most widely used editors.

15. Summary

Setting up an ML environment requires care. By leveraging Python Virtual Environments, we ensure our projects remain stable and isolated. Combining VS Code with Jupyter Notebooks gives us a powerful, interactive canvas to manipulate data, train models, and visualize results seamlessly.

16. Next Chapter Recommendation

If you already know Python, you can breeze through the next chapter. If you are new to the language or need a refresher, Chapter 3: Python Basics for Machine Learning will cover the exact Python syntax (lists, dictionaries, loops, functions) you need to succeed in this course.
