Python Basics for Regression Analysis

Updated: May 16, 2026

# CHAPTER 3

1. Introduction

To build predictive models, you need to speak the language of data science, and in this course that language is Python. Python is widely favored by the analytics community because its syntax is readable, which lets analysts focus on statistics rather than on memory management or semicolons. This chapter covers the core Python programming concepts, from variables to functions, that form the backbone of every data analysis script.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Define variables and identify core data types.
  • Store collections of data in Lists and Dictionaries.
  • Control program flow using if/else conditions.
  • Iterate through data using for and while loops.
  • Write reusable functions for data analysis.

3. Variables and Data Types

In Python, a variable is created the moment you assign a value to it. You do not need to declare its type (e.g., int or str); Python infers the type from the value.
```python
# Integer (Whole numbers, like number of bedrooms)
bedrooms = 3

# Float (Decimals, like square footage or regression coefficients)
square_feet = 1500.5
slope = 2.45

# String (Text data, like cities or categories)
city = "New York"

# Boolean (True/False flags)
is_renovated = True

# Printing variables using f-strings
print(f"The house in {city} has {bedrooms} bedrooms and is {square_feet} sqft.")
```
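Because types are inferred rather than declared, it can help to check what Python actually stored. The following is a minimal sketch that reuses the variables above; the names raw_bedrooms and price_per_sqft, and the 300000 price, are made up for illustration.

```python
# Values reused from the house example above.
bedrooms = 3
square_feet = 1500.5
city = "New York"
is_renovated = True

# type() reports the type Python inferred from each assigned value.
print(type(bedrooms).__name__)      # int
print(type(square_feet).__name__)   # float
print(type(city).__name__)          # str
print(type(is_renovated).__name__)  # bool

# Dividing an int by a float produces a float.
price_per_sqft = 300000 / square_feet  # 300000 is a hypothetical price
print(type(price_per_sqft).__name__)   # float

# Numbers read from a text file often arrive as strings and must be converted.
raw_bedrooms = "4"  # hypothetical raw value
total_bedrooms = bedrooms + int(raw_bedrooms)
print(total_bedrooms)  # 7
```

Without the int() conversion, adding bedrooms to raw_bedrooms would raise a TypeError, because Python does not silently mix numbers and text.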

4. Data Structures: Lists

In data science, you rarely process one number at a time. A Python list is an ordered, mutable (changeable) collection of items.
```python
# A list of historical house prices
prices = [250000, 300000, 450000, 200000, 500000]

# Accessing the first item (0-indexed)
print(prices[0])  # Output: 250000

# Accessing the last item
print(prices[-1])  # Output: 500000

# Slicing (getting the first 3 items)
print(prices[0:3])  # Output: [250000, 300000, 450000]

# Appending a new value
prices.append(320000)
```
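Once the data is in a list, a few built-in functions already give a quick summary. This sketch continues the prices example above and uses only Python built-ins; the variable names are illustrative.

```python
prices = [250000, 300000, 450000, 200000, 500000]
prices.append(320000)

# Built-in functions summarize a list without any extra libraries.
count = len(prices)
total = sum(prices)
lowest = min(prices)
highest = max(prices)
print(count, total, lowest, highest)  # 6 2020000 200000 500000

# sorted() returns a new list and leaves the original order unchanged.
sorted_prices = sorted(prices)
print(sorted_prices)  # [200000, 250000, 300000, 320000, 450000, 500000]
print(prices[-1])     # 320000 (the appended value is still last)

# Negative indices also work in slices: the last two items.
print(prices[-2:])  # [500000, 320000]
```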

5. Data Structures: Dictionaries

Dictionaries store data in key: value pairs. They are useful for structuring messy data before loading it into a Pandas DataFrame.
```python
# Storing a single row of house data
house_data = {
    "Address": "123 Main St",
    "Price": 300000,
    "Bedrooms": 4
}

# Accessing a value by its key
print(house_data["Price"])  # Output: 300000

# Updating a value
house_data["Price"] = 315000

# Adding a new key
house_data["Has_Pool"] = False
```
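Messy data often has missing fields, and reading a missing key with square brackets raises a KeyError. The sketch below shows two common ways to handle that, plus one way to hold several rows at once. The "Lot_Size" key and the second row are hypothetical, added only for illustration.

```python
# One row of house data, as built in the example above.
house_data = {
    "Address": "123 Main St",
    "Price": 315000,
    "Bedrooms": 4,
    "Has_Pool": False,
}

# Check whether a key exists before reading it.
has_price = "Price" in house_data
print(has_price)  # True

# .get() returns None (or a default you supply) instead of raising KeyError.
lot_size = house_data.get("Lot_Size")             # hypothetical missing key
lot_size_or_zero = house_data.get("Lot_Size", 0)
print(lot_size, lot_size_or_zero)  # None 0

# Several rows can be stored as a list of dictionaries, one dictionary per row.
rows = [
    house_data,
    {"Address": "456 Oak Ave", "Price": 280000, "Bedrooms": 3, "Has_Pool": True},  # made-up row
]
print(len(rows))         # 2
print(rows[1]["Price"])  # 280000
```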

6. Conditions (If / Else)

We use conditional logic to clean data or make analytical decisions based on thresholds.
```python
house_price = 450000
budget = 400000

if house_price <= budget:
    print("This house is within budget.")
elif house_price <= budget + 50000:
    print("This house is slightly over budget. Negotiate.")
else:
    print("This house is out of budget. Drop from analysis.")
```
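The same if/elif/else pattern applies to data cleaning. Here is a small sketch that flags an impossible record by combining conditions with `or` and `and`. The record values and the 10000 sqft threshold are assumptions chosen for illustration, not rules from this chapter.

```python
# A hypothetical record with a data-entry error (negative square footage).
square_feet = -120.0
bedrooms = 3

# 'or' is true if either condition holds; 'and' requires both.
if square_feet <= 0 or bedrooms < 0:
    status = "invalid"
elif square_feet > 10000 and bedrooms <= 1:
    status = "suspicious"
else:
    status = "ok"

print(status)  # invalid
```

Conditions are checked from top to bottom, and only the first matching branch runs, so the most serious check goes first.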

7. Loops (For and While)

Loops are how we iterate through datasets. While we usually use Pandas to avoid writing manual loops, understanding them is crucial for custom metrics.
```python
# A 'for' loop iterating over a list
salaries = [50000, 60000, 70000]
for salary in salaries:
    # Give everyone a 10% raise
    new_salary = salary * 1.10
    # Format to 2 decimals; raw floats can print as 55000.00000000001
    print(f"New Salary: ${new_salary:,.2f}")

# A 'for' loop using range
# Loop 5 times (i takes the values 0 through 4)
for i in range(5):
    print(f"Processing row {i}...")
```
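A while loop repeats as long as a condition stays true, which is useful when you do not know the number of iterations in advance. The following is a minimal sketch; the savings figures are invented for illustration.

```python
# Hypothetical numbers: how many monthly deposits until savings reach the budget?
budget = 400000
savings = 340000
monthly_deposit = 5000

months = 0
# The condition is re-checked before every pass through the loop body.
while savings < budget:
    savings += monthly_deposit
    months += 1

print(months)   # 12
print(savings)  # 400000
```

The loop body must eventually make the condition false. If monthly_deposit were 0, this loop would never end.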

8. List Comprehensions

A list comprehension is a concise, "Pythonic" way to create a new list by transforming an existing list in a single line of code.
```python
# Raw prices in dollars
prices = [100000, 200000, 300000]

# Convert to thousands of dollars (e.g., 100k)
prices_in_k = [p / 1000 for p in prices]

print(prices_in_k)
# Output: [100.0, 200.0, 300.0]
```
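To see what the comprehension replaces, this sketch writes the same transformation as a regular for loop, then adds an optional `if` clause that filters items. The 200000 cutoff is an arbitrary value for illustration.

```python
prices = [100000, 200000, 300000]

# The equivalent 'for' loop: create an empty list, then append to it.
prices_in_k_loop = []
for p in prices:
    prices_in_k_loop.append(p / 1000)

# The comprehension builds the same list in one line.
prices_in_k = [p / 1000 for p in prices]
print(prices_in_k == prices_in_k_loop)  # True

# An 'if' clause at the end keeps only the items that pass the test.
expensive_in_k = [p / 1000 for p in prices if p >= 200000]
print(expensive_in_k)  # [200.0, 300.0]
```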

9. Functions for Data Analysis

Functions allow you to encapsulate code into reusable blocks. You will often write custom functions to calculate specific statistical error metrics.
```python
def calculate_average(number_list):
    """
    Takes a non-empty list of numbers and returns the mathematical average (mean).
    """
    total_sum = sum(number_list)
    count = len(number_list)
    # Note: an empty list makes count 0 and raises ZeroDivisionError
    return total_sum / count

# Calling the function
house_prices = [300000, 400000, 500000]
avg_price = calculate_average(house_prices)
print(f"The average price is: ${avg_price:,.2f}")
# Output: The average price is: $400,000.00
```
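As an example of a custom error function, here is a sketch of mean absolute error (MAE), the average absolute gap between actual and predicted values. The function name mean_abs_error and the sample numbers are hypothetical; the sketch uses only built-ins rather than any library implementation.

```python
def mean_abs_error(actual, predicted):
    """
    Returns the mean absolute error between two equal-length, non-empty lists.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")

    total_error = 0
    # zip() pairs each actual value with its matching prediction.
    for a, p in zip(actual, predicted):
        total_error += abs(a - p)
    return total_error / len(actual)

# Made-up actual prices and model predictions.
actual_prices = [300000, 400000, 500000]
predicted_prices = [310000, 390000, 520000]

mae = mean_abs_error(actual_prices, predicted_prices)
print(f"MAE: ${mae:,.2f}")  # MAE: $13,333.33
```

The absolute errors here are 10000, 10000, and 20000, so the mean is 40000 / 3.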

10. Common Mistakes

  • Indentation Errors: Python does not use curly braces {} to define code blocks as C++ or Java do. It uses whitespace (indentation). If you forget to indent the code inside a for loop or if statement, Python stops with an IndentationError before running any of the code in that file.
  • Zero-Indexing: Beginners often try to access the first item of a list using my_list[1]. In Python, the first item is always my_list[0]; my_list[1] is the second item.
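Both mistakes above can be reproduced safely. This sketch catches the errors so the script keeps running; the bad loop is stored as a string and passed to the built-in compile(), because a file containing it directly would not run at all.

```python
prices = [250000, 300000, 450000]

# Zero-indexing: index 1 silently returns the SECOND item.
first_item = prices[0]
second_item = prices[1]
print(first_item, second_item)  # 250000 300000

# Indexing one past the end raises IndexError.
try:
    prices[len(prices)]
    index_error = None
except IndexError as exc:
    index_error = type(exc).__name__
print(index_error)  # IndexError

# Indentation: the loop body below is not indented, so it cannot be compiled.
bad_source = "for p in [1, 2, 3]:\nprint(p)\n"
try:
    compile(bad_source, "<example>", "exec")
    indentation_error = None
except IndentationError as exc:
    indentation_error = type(exc).__name__
print(indentation_error)  # IndentationError
```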

11. Best Practices

  • Type Hinting: While Python doesn't require you to declare types, adding "Type Hints" makes data scripts much easier to read and debug, e.g., def calculate_average(prices: list) -> float: Python does not enforce hints at runtime; they serve readers, editors, and type-checking tools.
  • Docstrings: Always write a brief explanation ("""...""") under your function definition explaining what the function expects and what it returns.
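Both practices can be combined in one function. This is a minimal sketch that adds hints and a fuller docstring to calculate_average; the list[float] syntax assumes Python 3.9 or newer, and the empty-list check is an addition not present in the earlier version.

```python
def calculate_average(prices: list[float]) -> float:
    """Return the arithmetic mean of a list of prices.

    Args:
        prices: House prices in dollars. Must contain at least one value.

    Returns:
        The mean price as a float.

    Raises:
        ValueError: If prices is empty.
    """
    if not prices:
        raise ValueError("prices must contain at least one value")
    return sum(prices) / len(prices)

avg_price = calculate_average([300000, 400000, 500000])
print(avg_price)  # 400000.0

# The docstring is stored on the function and is what help() displays.
first_doc_line = calculate_average.__doc__.splitlines()[0]
print(first_doc_line)  # Return the arithmetic mean of a list of prices.

# Hints are not enforced at runtime: passing a tuple still works.
print(calculate_average((1, 2, 3)))  # 2.0
```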

12. Exercises

  1. Create a dictionary that holds the configuration for a machine learning run: algorithm as "Linear Regression", test_size of 0.2, and random_state as 42.
  2. Write a for loop that iterates over a list of numbers [1, 2, 3, 4, 5] and prints the square of each number.

13. MCQ Quiz with Answers

Question 1

Which data structure stores elements in Key-Value pairs?

Answer: A dictionary.

Question 2

How does Python define the scope of a code block (like the code inside an if statement)?

Answer: By indentation (whitespace), not curly braces.

14. Interview Questions

  • Q: Explain the difference between a List and a Dictionary in Python, and provide a data analytics use case for each.
  • Q: What is a List Comprehension, and why is it preferred over a standard for loop for simple data transformations?

15. FAQs

Q: Do I need to be an expert Python software engineer to do Machine Learning?
A: No. While advanced Object-Oriented Python knowledge is necessary for deploying applications, building and analyzing predictive models mostly requires a solid understanding of the basics covered here, plus a deep understanding of data manipulation libraries (which we cover next!).

16. Summary

Python's elegant and highly readable syntax is the foundation of modern data science. By mastering variables, data structures like lists and dictionaries, control flow logic, and reusable functions, you possess the grammatical tools necessary to write complex analytics scripts.

17. Next Chapter Recommendation

Standard Python lists are great, but they are too slow and unstructured for large datasets containing many thousands of rows and columns. In Chapter 4: NumPy, Pandas, and Data Preparation, we will introduce the industry-standard libraries used to load CSV files and clean data.
