CHAPTER 13 Beginner

Introduction to Pandas

Updated: May 18, 2026

1. Chapter Introduction

NumPy is fast, but it is limited. A NumPy array is built for homogeneous data: every element shares one data type, and a plain array has no column names. Real-world data is messy: a single table can contain names (text), ages (integers), and salaries (floats). To handle this, we need Pandas. Often described as "Excel on steroids," Pandas sits at the core of the Python data science workflow.
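To make the contrast concrete, here is a minimal sketch with made-up values (the names `row` and `people` are only for illustration). It shows NumPy coercing mixed values to a single type, while a DataFrame keeps one type per column.

```python
import numpy as np
import pandas as pd

# NumPy coerces mixed values to one common dtype (here: strings)
row = np.array(["Alice", 25, 50000.0])
print(row.dtype.kind)  # 'U' means every element became a Unicode string

# A DataFrame keeps a separate dtype for each column
people = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "Salary": [50000.0, 62000.5],
})
print(people.dtypes)
```

The exact dtype names printed for the text column can vary between Pandas versions, so treat the printed labels as indicative.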

2. What is Pandas?

Pandas is a software library built *on top* of NumPy. It provides high-performance, easy-to-use data structures and data analysis tools.

If NumPy is for pure math, Pandas is for data manipulation, cleaning, and preparation.

Installation & Importing:

bash

# Run this in a Jupyter notebook cell; in a regular terminal, drop the leading "!"
!pip install pandas

python

# 'pd' is the universal industry standard alias
import pandas as pd

3. The Two Core Structures

Pandas has two primary data structures: the Series (1D) and the DataFrame (2D).

1. The Pandas Series: A Series is essentially a single column of data. It is a 1D array, but unlike a NumPy array, it carries an explicit "Index" (a label for each row). If you do not supply labels, Pandas uses 0, 1, 2, and so on.

python

import pandas as pd

# Create a Series from a list
ages = pd.Series([22, 35, 58], name="Age")
print(ages)
# Output:
# 0    22
# 1    35
# 2    58
# Name: Age, dtype: int64
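The index does not have to be 0, 1, 2. As a small sketch, the same values can be given text labels (the names "Ann", "Ben", and "Cara" are invented for this example) and then looked up either by label or by position.

```python
import pandas as pd

# Same values as before, but with hypothetical name labels instead of 0, 1, 2
ages = pd.Series([22, 35, 58], index=["Ann", "Ben", "Cara"], name="Age")

print(ages["Ben"])       # look up by label
print(ages.iloc[0])      # look up by position
print(list(ages.index))  # the labels themselves
```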

2. The Pandas DataFrame: A DataFrame is a 2D table (like an Excel sheet or SQL table). It is essentially a collection of Series sharing the same index.

python

# Create a DataFrame from a Dictionary
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Department": ["Sales", "IT", "HR"]
}

df = pd.DataFrame(data)
print(df)
# Output:
#       Name  Age Department
# 0    Alice   25      Sales
# 1      Bob   30         IT
# 2  Charlie   35         HR
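The "collection of Series sharing the same index" idea can be checked directly. The sketch below rebuilds the same table and pulls out one column; the variable name `age_column` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Department": ["Sales", "IT", "HR"],
})

age_column = df["Age"]                    # pull out one column
print(type(age_column).__name__)          # Series
print(age_column.index.equals(df.index))  # True: the column shares the table's index
print(list(df.columns))                   # ['Name', 'Age', 'Department']
```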

4. Data Loading

Data scientists rarely type out DataFrames by hand. They load them from files. Pandas can read many common formats (CSV, Excel, JSON, SQL query results) with a single function call.

python

# The most common command in data science
# Read a CSV file into a DataFrame
df = pd.read_csv('employee_data.csv')

# Other formats:
# pd.read_excel('data.xlsx')
# pd.read_json('data.json')
# pd.read_sql('SELECT * FROM table', connection)
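If you do not have a CSV file at hand, you can still try `read_csv` by feeding it text held in memory. This is a sketch with invented file contents; in a real project you would pass a path such as 'employee_data.csv' instead of the `io.StringIO` wrapper.

```python
import io
import pandas as pd

# Hypothetical file contents standing in for a real CSV on disk
csv_text = """ID,Name,Salary
101,John,60000
102,Sarah,85000
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)          # (2, 3)
print(list(df.columns))  # ['ID', 'Name', 'Salary']
```

The first line of the text is treated as the header row, which is the default behavior of `read_csv`.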

5. Data Inspection

Once the data is loaded, the very first thing you do is inspect it to see what you are dealing with.

python

# 1. Look at the top 5 rows (CRITICAL)
print(df.head())

# 2. Look at the bottom 3 rows
print(df.tail(3))

# 3. Get the shape (rows, columns)
print(df.shape)

# 4. Get a technical summary (column names, non-null counts, data types)
# info() prints the summary itself and returns None, so do not wrap it in print()
df.info()

# 5. Get a statistical summary (Mean, min, max for numerical columns)
print(df.describe())
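One detail worth seeing for yourself: `info()` writes its summary straight to the screen and returns `None`. The sketch below uses a tiny made-up table with one missing value and captures the summary as text through the `buf` parameter, which is handy when you want to log it.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, None],  # one missing value, for illustration
})

print(df.shape)  # (3, 2)

# Pass buf= to capture the summary instead of printing it
buffer = io.StringIO()
result = df.info(buf=buffer)
summary_text = buffer.getvalue()

print(result)        # None
print(summary_text)  # includes the non-null count for each column
```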

6. Mini Project: Employee Dataset Analyzer

Let's simulate loading an employee dataset and profiling it.

python

import pandas as pd

# Simulating data loading
data = {
    "ID": [101, 102, 103, 104],
    "Name": ["John", "Sarah", "Mike", "Emma"],
    "Salary": [60000, 85000, 50000, 92000],
    "Years_Exp": [2, 5, 1, 7]
}
df = pd.DataFrame(data)

print("--- DATASET PREVIEW ---")
print(df.head(2))

print("\n--- QUICK STATISTICS ---")
# .describe() automatically ignores text columns like 'Name'
print(df.describe())
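As a possible extension of the mini project, note that `describe()` returns a DataFrame of its own, so individual statistics can be looked up by label. The sketch below reuses the same data; the variable name `stats` is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "Name": ["John", "Sarah", "Mike", "Emma"],
    "Salary": [60000, 85000, 50000, 92000],
    "Years_Exp": [2, 5, 1, 7],
})

stats = df.describe()

# Look up single statistics by row label and column name
print(stats.loc["mean", "Salary"])    # 71750.0
print(stats.loc["max", "Years_Exp"])  # 7.0

# 'ID' is numeric, so it is summarized too, even though its mean is not meaningful
print(list(stats.columns))            # ['ID', 'Salary', 'Years_Exp']
```

Identifier columns such as ID show up in the summary simply because they are stored as numbers, so read the output with the meaning of each column in mind.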

7. Common Mistakes

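Printing the whole DataFrame: If you load a 1-million-row CSV and type print(df), Pandas truncates the output by default, so you see only the first and last few rows; if the display limits have been raised, Jupyter can become very slow trying to render everything. *Always* use df.head() to look at the data.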

Confusing Series and DataFrames: A Series is a single column. A DataFrame is a table. If you extract one column from a DataFrame (df['Name']), it becomes a Series.
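A quick way to see the difference is to check the type and shape of what you get back. In this sketch (made-up data, illustrative variable names), single brackets return a Series, while passing a list of column names returns a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

one_column = df["Name"]   # single brackets: a Series
sub_table = df[["Name"]]  # a list of column names: a DataFrame

print(type(one_column).__name__, one_column.shape)  # Series (2,)
print(type(sub_table).__name__, sub_table.shape)    # DataFrame (2, 1)
```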

8. MCQs

Question 1

What is Pandas?

Question 2

What is the standard alias for importing Pandas?

Question 3

What is a 1D column of data called in Pandas?

Question 4

What is a 2D table of data called in Pandas?

Question 5

How do you load a CSV file into a Pandas DataFrame?

Question 6

What method should you always use to preview the first 5 rows of a newly loaded DataFrame?

Question 7

Which method provides a technical summary, showing the number of non-null values and data types for each column?

Question 8

Which method calculates the mean, minimum, and maximum for all numerical columns?

Question 9

If a DataFrame has 100 rows and 5 columns, what does `df.shape` return?

Question 10

Can a DataFrame hold mixed data types (e.g., text, floats, booleans) across different columns?

a) Yes, that is the main advantage over a NumPy matrix
b) No, it must be homogeneous

Answer: a

9. Interview Questions

Q: What is the difference between a Pandas Series and a Pandas DataFrame?

Q: Walk me through the exact steps (and functions) you would use immediately after loading an unknown CSV file into Pandas.
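One possible way to structure an answer to the second question is sketched below. The helper `first_look` is hypothetical (it is not part of Pandas), and the CSV text is invented so the example runs without a file on disk.

```python
import io
import pandas as pd

def first_look(df: pd.DataFrame, n: int = 5) -> dict:
    """Hypothetical helper: gather the usual first-pass checks in one place."""
    return {
        "preview": df.head(n),                 # top rows
        "shape": df.shape,                     # (rows, columns)
        "missing": df.isna().sum().to_dict(),  # missing values per column
        "stats": df.describe(),                # summary of numeric columns
    }

csv_text = "Name,Age\nAlice,25\nBob,\n"  # made-up data with one missing Age
report = first_look(pd.read_csv(io.StringIO(csv_text)))

print(report["shape"])    # (2, 2)
print(report["missing"])  # one missing value in 'Age'
```

In an interview, walking through df.head(), df.shape, df.info(), and df.describe() in that order covers the same ground without any helper.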

10. Summary

Pandas is the core of the Data Science workflow. It allows you to load mixed tabular data from CSVs or SQL databases into a DataFrame. Once loaded, use df.head() to visually inspect the data, df.info() to check for missing values and data types, and df.describe() to understand the basic statistics of your numbers.

11. Next Chapter Recommendation

In Chapter 14: Pandas Series and DataFrames, we will learn how to navigate this data—selecting specific rows, filtering for specific conditions (like Salary > 50000), and adding new columns.
