Classification Algorithms
CHAPTER 03 Intermediate

Python Basics for Machine Learning

Updated: May 16, 2026
6 min read


1. Introduction

To command a machine learning algorithm, you must speak its language. Python is the dominant language of machine learning and AI. Its clean, readable syntax lets data scientists focus on the math and the data rather than on manual memory management or a separate compilation step. In this chapter, we will cover the foundational Python concepts that you will use in nearly every data script you write.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Define variables and identify core data types.
  • Store complex data in Lists and Dictionaries.
  • Use if/else logic to categorize data.
  • Iterate through records using for loops.
  • Write reusable functions for data analysis.

3. Variables and Data Types

Python is dynamically typed: you do not declare a variable's type. Python determines it at runtime from the value you assign.
```python
# Integer (Whole numbers, used for counts or discrete classes)
customer_age = 34
fraud_class = 1  # 1 for Fraud, 0 for Not Fraud

# Float (Decimals, used for probabilities or scaled features)
transaction_amount = 250.75
spam_probability = 0.89

# String (Text data, which must be encoded later)
user_city = "London"

# Boolean (True/False flags)
is_active = True

# Printing output cleanly using f-strings
print(f"Customer from {user_city} spent ${transaction_amount}")
```
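To see which type Python inferred, you can pass any value to the built-in `type()` function. The short sketch below re-creates a few of the variables above, shows that the same name can later hold a value of a different type, and shows that `bool` is a subclass of `int`, which is why summing a list of True/False flags counts the `True` values.

```python
# Python attaches the type to the value, not to the variable name
customer_age = 34
spam_probability = 0.89
user_city = "London"
is_active = True

print(type(customer_age).__name__)      # int
print(type(spam_probability).__name__)  # float
print(type(user_city).__name__)         # str
print(type(is_active).__name__)         # bool

# The same name can later be rebound to a value of a different type
customer_age = "thirty-four"
print(type(customer_age).__name__)      # str

# bool is a subclass of int: True counts as 1 and False as 0
flags = [True, False, True, True]
active_count = sum(flags)
print(active_count)                     # 3
```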

4. Data Structures: Lists

In AI, we deal with thousands of data points. We group them using a List (an ordered, mutable collection of items).
```python
# A list of probabilities output by a model
probabilities = [0.15, 0.88, 0.45, 0.92, 0.05]

# Accessing the first probability (Python is 0-indexed!)
print(probabilities[0])  # Output: 0.15

# Slicing: Getting the first 3 probabilities
print(probabilities[0:3]) # Output: [0.15, 0.88, 0.45]

# Appending a new prediction
probabilities.append(0.77)

# Finding the length of the list
print(len(probabilities)) # Output: 6
```
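A few more list operations come up constantly when inspecting model output. This sketch starts from the list as it stands after the `.append(0.77)` call above and uses only built-in functions.

```python
probabilities = [0.15, 0.88, 0.45, 0.92, 0.05, 0.77]

# Negative indices count from the end of the list
last_prob = probabilities[-1]            # 0.77

# Built-in functions work directly on lists
highest = max(probabilities)             # 0.92
average = sum(probabilities) / len(probabilities)

# sorted() returns a NEW list; the original order is unchanged
top_two = sorted(probabilities, reverse=True)[:2]   # [0.92, 0.88]

print(last_prob, highest, top_two)
print(round(average, 3))
```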

5. Data Structures: Dictionaries

Dictionaries store data in key: value pairs. They are perfect for structuring a single row of messy data before loading it into Pandas.
```python
# Storing features for a single email
email_features = {
    "word_count": 450,
    "contains_link": True,
    "sender_domain": "unknown.com"
}

# Accessing a specific feature
print(email_features["word_count"]) # Output: 450

# Updating a feature
email_features["contains_link"] = False

# Adding a new feature
email_features["spam_label"] = 1
```
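Indexing a key that does not exist raises a `KeyError`, so real pipelines often use `.get()` with a default instead. The sketch below continues from the dictionary above (the `num_attachments` key is a hypothetical feature that is deliberately missing). It also shows that a list of such dictionaries, one per row, is a shape that `pandas.DataFrame` can be built from.

```python
email_features = {
    "word_count": 450,
    "contains_link": False,
    "sender_domain": "unknown.com",
    "spam_label": 1,
}

# .get() returns a default instead of raising KeyError for a missing key
attachments = email_features.get("num_attachments", 0)   # 0

# The "in" operator checks keys, not values
has_label = "spam_label" in email_features               # True

# .items() yields (key, value) pairs for looping
for name, value in email_features.items():
    print(f"{name}: {value}")

# A list of dictionaries holds several rows of raw records
rows = [
    {"word_count": 450, "spam_label": 1},
    {"word_count": 120, "spam_label": 0},
]
print(rows[1]["word_count"])   # 120
```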

6. Conditions (If / Elif / Else)

We use conditional logic to make hard decisions based on thresholds (e.g., converting a probability into a final class prediction).
```python
spam_probability = 0.82
threshold = 0.50

if spam_probability >= threshold:
    print("Classified as: SPAM (Class 1)")
elif spam_probability < 0.10:
    print("Classified as: DEFINITELY SAFE (Class 0)")
else:
    print("Classified as: SAFE (Class 0)")
```
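Python checks the branches from top to bottom and runs only the first one that matches, so branch order matters. Python also lets you chain comparisons, which reads like interval notation. In this sketch the 0.40 to 0.60 "uncertain" band is an arbitrary choice for illustration, not a standard cutoff.

```python
spam_probability = 0.55

# Chained comparison: same as (0.40 <= spam_probability) and (spam_probability < 0.60)
is_uncertain = 0.40 <= spam_probability < 0.60

if is_uncertain:
    label = "REVIEW MANUALLY"
elif spam_probability >= 0.60:
    label = "SPAM (Class 1)"
else:
    label = "SAFE (Class 0)"

print(f"Classified as: {label}")   # Classified as: REVIEW MANUALLY
```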

7. Loops (For and While)

Loops let you iterate through a dataset one record at a time. Although vectorized Pandas operations are preferred for large datasets, for loops remain essential for custom metrics and hyperparameter tuning.
```python
# Iterating over a list of model predictions
predictions = [1, 0, 1, 1, 0]

spam_count = 0
for pred in predictions:
    if pred == 1:
        spam_count += 1

print(f"Total spam emails found: {spam_count}")

# Looping a specific number of times
for i in range(3):
    print(f"Training Epoch {i}...")
```
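A while loop keeps running as long as its condition is true, which suits "repeat until something happens" logic. The sketch below uses made-up validation losses and a simple stop-when-below-target rule; it illustrates the loop, not how any particular library implements early stopping. It also shows `enumerate()`, which hands you each item's index alongside the item.

```python
# Hypothetical validation loss after each epoch (made-up numbers)
val_losses = [0.90, 0.62, 0.48, 0.41, 0.39, 0.38]
target_loss = 0.40

# while: keep going until the condition fails
epoch = 0
while epoch < len(val_losses) and val_losses[epoch] > target_loss:
    epoch += 1

if epoch < len(val_losses):
    print(f"Target loss reached at epoch {epoch}")   # epoch 4
else:
    print("Target loss never reached")

# enumerate() yields (index, item) pairs
predictions = [1, 0, 1, 1, 0]
flagged = []
for i, pred in enumerate(predictions):
    if pred == 1:
        flagged.append(i)
        print(f"Email {i} flagged as spam")
```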

8. List Comprehensions

A list comprehension is a "Pythonic" way to build a new list from an existing one in a single, readable line. For simple transformations, it replaces a bulkier for loop that appends to an empty list.
```python
# Raw probabilities
probs = [0.1, 0.9, 0.4, 0.8]

# Convert probabilities to hard classes (1 if >= 0.5, else 0)
# This is a List Comprehension!
classes = [1 if p >= 0.5 else 0 for p in probs]

print(classes)
# Output: [0, 1, 0, 1]
```
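To make the equivalence concrete, the sketch below writes the same conversion as an explicit loop and then adds a filter. Note the position of the `if`: `x if cond else y` before `for` transforms every item, while `if cond` after `for` keeps only the items that pass.

```python
probs = [0.1, 0.9, 0.4, 0.8]

# The comprehension from above
classes = [1 if p >= 0.5 else 0 for p in probs]

# The equivalent, longer for loop
classes_loop = []
for p in probs:
    classes_loop.append(1 if p >= 0.5 else 0)

print(classes_loop == classes)   # True

# A trailing "if" FILTERS items instead of transforming them
confident = [p for p in probs if p >= 0.8]
print(confident)                 # [0.9, 0.8]
```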

9. Functions for Data Workflows

Functions encapsulate logic into reusable blocks, preventing messy, repetitive code.
```python
def calculate_accuracy(correct_predictions, total_predictions):
    """
    Calculates the accuracy percentage of a model.
    """
    if total_predictions == 0:
        return 0.0

    accuracy = (correct_predictions / total_predictions) * 100
    return round(accuracy, 2)

# Calling the function
acc = calculate_accuracy(correct_predictions=85, total_predictions=100)
print(f"Model Accuracy: {acc}%")
```
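Functions become more reusable with default parameter values. The sketch below defines two hypothetical helpers, `predict_classes` (with a default `threshold=0.5`) and `count_correct` (which pairs labels with the built-in `zip()`), and produces the "correct predictions" count that a function like `calculate_accuracy` expects. The ground-truth labels are made up for illustration.

```python
def predict_classes(probabilities, threshold=0.5):
    """Convert probabilities into hard 0/1 labels using a threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def count_correct(y_true, y_pred):
    """Count positions where the predicted label equals the true label."""
    # zip() walks both lists in step; sum() adds up a 1 for every match
    return sum(1 for t, p in zip(y_true, y_pred) if t == p)


probs = [0.1, 0.9, 0.4, 0.8]
y_true = [0, 1, 1, 1]                                # hypothetical true labels

y_pred = predict_classes(probs)                      # default 0.5 -> [0, 1, 0, 1]
y_pred_low = predict_classes(probs, threshold=0.3)   # -> [0, 1, 1, 1]

correct = count_correct(y_true, y_pred)              # 3
correct_low = count_correct(y_true, y_pred_low)      # 4
accuracy = round(correct / len(y_true) * 100, 2)
print(f"Accuracy at 0.5: {accuracy}%")               # Accuracy at 0.5: 75.0%
```

Lowering the threshold happens to help on these four made-up labels; on real data, moving the threshold trades false positives against false negatives.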

10. Common Mistakes

  • Indentation Errors: Unlike C++ or Java, which use {} to define code blocks, Python uses indentation (leading whitespace). If you forget to indent the code inside an if statement, Python refuses to run the file and raises an IndentationError.
  • Modifying a list while looping through it: If you iterate through a list with a for loop and call .remove() on that same list inside the loop, the loop skips elements and produces hard-to-trace bugs. Build a new list instead, for example with a list comprehension.
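Both mistakes are easy to reproduce. In the sketch below, removing items during iteration shifts the remaining items left, so the element right after each removal is never visited. The second part passes a badly indented snippet to the built-in `compile()` to show that the error is raised before any code runs.

```python
probs = [0.05, 0.03, 0.90, 0.02]

# BUGGY: removing items from the list we are iterating over
buggy = probs.copy()
for p in buggy:
    if p < 0.10:
        buggy.remove(p)
print(buggy)    # [0.03, 0.9]  <- 0.03 was skipped, not removed

# FIX: build a new list instead
fixed = [p for p in probs if p >= 0.10]
print(fixed)    # [0.9]

# Indentation is part of the syntax: the body of the if is not indented
bad_source = "if True:\nprint('inside the if')"
indent_error_caught = False
try:
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    indent_error_caught = True
    print(f"IndentationError: {err.msg}")
```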

11. Best Practices

  • Docstrings and Comments: Write a brief docstring ("""...""") directly under each function definition, and add # comments where the logic is not obvious. When you revisit your code six months from now, you will thank yourself.
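A docstring is more than a comment: Python stores it on the function object, and `help()` displays it. The function below is a hypothetical helper that applies min-max scaling, included only to show the pattern.

```python
def min_max_scale(value, min_value, max_value):
    """
    Min-max scale a single value into the range [0, 1].

    Returns 0.0 when min_value equals max_value, to avoid dividing by zero.
    """
    if max_value == min_value:
        return 0.0
    return (value - min_value) / (max_value - min_value)


# The docstring is stored on the function; help(min_max_scale) prints it too
print(min_max_scale.__doc__)
print(min_max_scale(50, 0, 200))   # 0.25
```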

12. Exercises

  1. Create a dictionary that stores the configuration for a classification model: algorithm as "KNN", neighbors as 5, and metric as "euclidean".
  2. Write a list comprehension that loops through [10, 20, 30, 40] and divides every number by 10.
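One possible solution to both exercises, to compare against after you have tried them yourself:

```python
# Exercise 1: model configuration stored as a dictionary
model_config = {
    "algorithm": "KNN",
    "neighbors": 5,
    "metric": "euclidean",
}

# Exercise 2: divide every number by 10 with a list comprehension
values = [10, 20, 30, 40]
scaled = [v / 10 for v in values]
print(scaled)   # [1.0, 2.0, 3.0, 4.0]
```

In Python 3, `/` always returns a float; use `//` (floor division) if you want the integers `[1, 2, 3, 4]`.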

13. MCQ Quiz with Answers

Question 1

Which data structure is best used to store data as key-value pairs, similar to a JSON object?

Answer: A dictionary (dict).

Question 2

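How does Python indicate that a block of code belongs inside a for loop?

Answer: By indentation. The for line ends with a colon, and every line of the loop body is indented beneath it (conventionally by 4 spaces).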

14. Interview Questions

  • Q: Explain the difference between a List and a Dictionary, and provide an example of when you would use each in a data processing pipeline.
  • Q: What is a List Comprehension in Python, and why is it preferred for simple data transformations?

15. FAQs

Q: Do I need to learn Object-Oriented Programming (classes) for this course? A: Not extensively. Scikit-learn is built around classes, but as a beginner you will primarily be *using* its pre-built classes (instantiating them and calling methods like .fit()) rather than writing your own from scratch.
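To see what "instantiate, then call .fit()" means without installing anything, here is a hypothetical toy class written in plain Python. It only mimics the fit/predict pattern; scikit-learn's own `DummyClassifier(strategy="most_frequent")` does the same job, and you would normally use that instead of writing your own.

```python
from collections import Counter


class MajorityClassClassifier:
    """Toy classifier that always predicts the most common training label."""

    def fit(self, X, y):
        # Learn the most frequent label; this toy model ignores the features X
        self.majority_label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Predict the learned label for every input row
        return [self.majority_label_ for _ in X]


model = MajorityClassClassifier()              # 1. instantiate
model.fit([[450], [120], [800]], [0, 0, 1])    # 2. fit on features and labels
print(model.predict([[300], [50]]))            # 3. predict -> [0, 0]
```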

16. Summary

Python's elegance is its greatest strength. By mastering variables, structuring data with lists and dictionaries, executing conditional logic, and building reusable functions, you now possess the grammatical foundation required to write analytical scripts and command machine learning algorithms.

17. Next Chapter Recommendation

Standard Python lists are flexible, but they are much slower than specialized array libraries for numerical work on millions of rows. In Chapter 4: NumPy, Pandas, and Data Preparation, we will introduce the heavy-duty data science libraries required to wrangle large CSV files.
