CHAPTER 12 Beginner

Data Selection and Filtering

Updated: May 18, 2026

5 min read

# CHAPTER 12

Data Selection and Filtering

1. Chapter Introduction

Selecting the right subset of data is the most frequent Pandas operation. This chapter covers all four access methods — loc, iloc, boolean indexing, and query() — with real-world patterns.

2. loc — Label-Based Selection

python

1234567891011121314151617181920

import pandas as pd

df = pd.DataFrame({
    &#039;Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    &#039;Dept': ['Eng', 'Mkt', 'Eng', 'Sales', 'HR'],
    &#039;Salary': [85000, 62000, 91000, 55000, 58000],
    &#039;Experience': [5, 3, 8, 2, 4],
    &#039;Rating': [4.5, 3.8, 4.9, 3.5, 4.1]
}, index=[&#039;E001', 'E002', 'E003', 'E004', 'E005'])

# loc[row_label, col_label] — label-based
print(df.loc[&#039;E001'])                        # Single row
print(df.loc[&#039;E001', 'Salary'])              # Single value: 85000
print(df.loc[&#039;E001':'E003'])                 # Row slice (inclusive!)
print(df.loc[&#039;E001':'E003', ['Name', 'Salary']])  # Rows + specific cols
print(df.loc[:, &#039;Salary':'Rating'])          # All rows, col range

# Update using loc
df.loc[&#039;E001', 'Salary'] = 90000
df.loc[&#039;E002', ['Rating', 'Experience']] = [4.0, 4]

3. iloc — Position-Based Selection

python

123456789101112

# iloc[row_pos, col_pos] — integer position-based
print(df.iloc[0])                   # First row
print(df.iloc[0, 2])                # Row 0, col 2: Salary
print(df.iloc[1:4])                 # Rows 1,2,3 (stop exclusive)
print(df.iloc[::2])                 # Every other row
print(df.iloc[:, [0, 2]])           # Cols 0 and 2 (Name, Salary)
print(df.iloc[-1])                  # Last row
print(df.iloc[0:3, 0:3])           # Top-left 3x3 block

# Key difference: loc slice is INCLUSIVE, iloc is EXCLUSIVE at end
# df.loc['E001':'E003'] → 3 rows (E001, E002, E003)
# df.iloc[0:3]          → 3 rows (0, 1, 2) same result here

4. Boolean Filtering

python

12345678910111213141516171819202122232425

df = pd.read_csv(&#039;employees.csv') if False else pd.DataFrame({
    &#039;Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve', 'Frank'],
    &#039;Dept': ['Eng', 'Mkt', 'Eng', 'Sales', 'HR', 'Eng'],
    &#039;Salary': [85000, 62000, 91000, 55000, 58000, 78000],
    &#039;Experience': [5, 3, 8, 2, 4, 6],
    &#039;Rating': [4.5, 3.8, 4.9, 3.5, 4.1, 4.3]
})

# Single condition
high_salary = df[df[&#039;Salary'] > 70000]
print(high_salary)

# Multiple conditions (& = AND, | = OR, ~ = NOT)
senior_eng = df[(df[&#039;Dept'] == 'Eng') & (df['Experience'] >= 5)]
print(senior_eng)

low_or_high = df[(df[&#039;Salary'] < 60000) | (df['Salary'] > 85000)]
not_sales = df[~(df[&#039;Dept'] == 'Sales')]

# isin — filter by list of values
selected_depts = df[df[&#039;Dept'].isin(['Eng', 'HR'])]

# str operations on string columns
engineers = df[df[&#039;Name'].str.startswith('A')]
upper_case = df[df[&#039;Dept'].str.upper() == 'ENG']

5. query() — SQL-Like Filtering

python

1234567891011121314

# query() — cleaner syntax for complex filters
high_rated = df.query("Rating > 4.0")
senior_high_pay = df.query("Experience >= 5 and Salary > 75000")

# Using Python variables in query (@ prefix)
min_salary = 65000
max_exp = 7
result = df.query("Salary > @min_salary and Experience <= @max_exp")

# String queries
eng_staff = df.query("Dept == &#039;Eng'")
not_mkt = df.query("Dept != &#039;Mkt'")

print(df.query("Salary > 70000 and Rating >= 4.0"))

6. Selecting and Filtering Together

python

1234567891011121314

# Get specific columns after filtering
result = df.loc[df[&#039;Dept'] == 'Eng', ['Name', 'Salary', 'Rating']]

# Get all salaries above median as Series
median_sal = df[&#039;Salary'].median()
above_median = df.loc[df[&#039;Salary'] > median_sal, 'Salary']

# Update filtered rows
df.loc[df[&#039;Rating'] >= 4.5, 'Level'] = 'Star'
df.loc[df[&#039;Rating'] < 4.5, 'Level'] = 'Standard'

# at and iat — fastest single-cell access
print(df.at[0, &#039;Name'])     # Label-based single cell
print(df.iat[0, 0])         # Position-based single cell

7. Common Mistakes

df[df['col'] == 'val']['other'] = x: This is chained indexing — doesn't reliably update. Use df.loc[condition, 'other'] = x.

and/or vs &/|: Python's and/or doesn't work element-wise. Use & and | with parentheses around each condition.

8. MCQs

Question 1

`df.loc[0:3]` on default index returns?

Question 2

`df.iloc[0:3]` returns?

Question 3

`df[df['Dept'].isin(['Eng', 'HR'])]` filters?

Question 4

`~mask` in boolean filter?

Question 5

`df.query("Salary > @threshold")` uses?

Question 6

`df.at[0, 'Name']` is faster than `df.loc[0, 'Name']` because?

Question 7

`df.loc[:, 'A':'C']` selects?

Question 8

Boolean condition with `|` requires?

Question 9

`df[df['Name'].str.startswith('A')]` returns?

Question 10

Correct way to update filtered rows?

9. Interview Questions

Q: What is the difference between loc and iloc in Pandas?

Q: How do you filter a DataFrame with multiple conditions?

10. Summary

Four selection tools: loc (label), iloc (position), boolean filtering (condition masks), query() (SQL-like strings). Always use loc for conditional updates to avoid chained indexing. Parentheses around each condition in multi-condition boolean filters are mandatory.

11. Next Chapter Recommendation

In Chapter 13: Data Cleaning in Pandas, we tackle real-world messy data — duplicates, formatting issues, inconsistent values, and column standardization.

Browse All 21+ Subject Areas

Quick Links

Visual Algorithm Labs

Frontend Dev

The Future of Web Architecture in 2026

Practice Quizzes

Data Selection and Filtering #

1. Chapter Introduction #

2. loc — Label-Based Selection #

3. iloc — Position-Based Selection #

4. Boolean Filtering #

5. query() — SQL-Like Filtering #

6. Selecting and Filtering Together #

7. Common Mistakes #

8. MCQs #

df.loc[0:3] on default index returns?

df.iloc[0:3] returns?

df[df['Dept'].isin(['Eng', 'HR'])] filters?

~mask in boolean filter?

df.query("Salary > @threshold") uses?

df.at[0, 'Name'] is faster than df.loc[0, 'Name'] because?

df.loc[:, 'A':'C'] selects?

Boolean condition with | requires?

df[df['Name'].str.startswith('A')] returns?

Correct way to update filtered rows?

9. Interview Questions #

10. Summary #

11. Next Chapter Recommendation #

Finish this Chapter

Discussion

Explore More

📖 Related Tutorials 5

Send Feedback / Bug

Feedback Submitted!

Data Selection and Filtering

1. Chapter Introduction

2. loc — Label-Based Selection

3. iloc — Position-Based Selection

4. Boolean Filtering

5. query() — SQL-Like Filtering

6. Selecting and Filtering Together

7. Common Mistakes

8. MCQs

`df.loc[0:3]` on default index returns?

`df.iloc[0:3]` returns?

`df[df['Dept'].isin(['Eng', 'HR'])]` filters?

`~mask` in boolean filter?

`df.query("Salary > @threshold")` uses?

`df.at[0, 'Name']` is faster than `df.loc[0, 'Name']` because?

`df.loc[:, 'A':'C']` selects?

Boolean condition with `|` requires?

`df[df['Name'].str.startswith('A')]` returns?

9. Interview Questions

10. Summary

11. Next Chapter Recommendation