Skip to main content
Pandas & NumPy
CHAPTER 25 Beginner

Pandas with SQL Databases

Updated: May 18, 2026
5 min read

# CHAPTER 25

Pandas with SQL Databases

1. Chapter Introduction

Most production data lives in SQL databases — MySQL, PostgreSQL, SQLite. Pandas bridges the gap between databases and Python analytics: read SQL results into DataFrames, transform data in Python, write results back to the database.

2. SQLite Setup (Zero-Config)

python
123456789101112131415161718192021222324252627282930313233343536
import pandas as pd
import sqlite3
import numpy as np

# Create in-memory SQLite database (no server needed)
conn = sqlite3.connect('sales.db')   # Or ':memory:' for in-memory

# Create tables from DataFrames
products = pd.DataFrame({
    'product_id': [1, 2, 3, 4, 5],
    'name': ['Laptop', 'Phone', 'Monitor', 'Desk', 'Chair'],
    'category': ['Electronics', 'Electronics', 'Electronics', 'Furniture', 'Furniture'],
    'price': [1200, 500, 300, 450, 200]
})

customers = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Carol', 'David'],
    'city': ['NYC', 'LA', 'Chicago', 'Houston']
})

np.random.seed(42)
n = 100
orders = pd.DataFrame({
    'order_id': range(1, n+1),
    'customer_id': np.random.randint(1, 5, n),
    'product_id': np.random.randint(1, 6, n),
    'quantity': np.random.randint(1, 10, n),
    'order_date': pd.date_range('2024-01-01', periods=n, freq='2D').astype(str)
})

# Write to SQLite
products.to_sql('products', conn, if_exists='replace', index=False)
customers.to_sql('customers', conn, if_exists='replace', index=False)
orders.to_sql('orders', conn, if_exists='replace', index=False)
print("Tables created successfully!")

3. Reading SQL Data

python
123456789101112131415161718192021222324252627282930
# Simple query
df = pd.read_sql("SELECT * FROM products", conn)
print(df)

# Filtered query
df_electronics = pd.read_sql("""
    SELECT name, price
    FROM products
    WHERE category = 'Electronics'
    ORDER BY price DESC
""", conn)
print(df_electronics)

# Join query
df_orders = pd.read_sql("""
    SELECT
        o.order_id,
        c.name as customer_name,
        p.name as product_name,
        p.price,
        o.quantity,
        (p.price * o.quantity) as revenue,
        o.order_date
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    JOIN products p ON o.product_id = p.product_id
    ORDER BY o.order_date
""", conn)
print(f"\nOrders with details ({len(df_orders)} rows):")
print(df_orders.head(10))

4. MySQL Integration (with SQLAlchemy)

python
12345678910111213141516171819202122
# MySQL connection (requires: pip install sqlalchemy mysql-connector-python)
from sqlalchemy import create_engine

# Connection string format:
# mysql+mysqlconnector://user:password@host:port/database
engine = create_engine('mysql+mysqlconnector://root:password@localhost:3306/mydb')

# Read from MySQL
df = pd.read_sql("SELECT * FROM employees WHERE department = 'Engineering'", engine)

# Read with params (safe parameterized queries)
dept = 'Engineering'
df = pd.read_sql("SELECT * FROM employees WHERE department = %(dept)s",
                 engine, params={'dept': dept})

# Write DataFrame to MySQL
df_results = pd.DataFrame({
    'report_date': [pd.Timestamp.now().date()],
    'total_revenue': [df_orders['revenue'].sum()],
    'order_count': [len(df_orders)]
})
df_results.to_sql('daily_reports', engine, if_exists='append', index=False)

5. Mini Project: Sales Database Analyzer

python
12345678910111213141516171819202122232425262728293031323334353637383940414243444546
import pandas as pd
import sqlite3
import numpy as np

# Using our existing connection
print("=" * 55)
print("SALES DATABASE ANALYSIS")
print("=" * 55)

# 1. Revenue by product
revenue_by_product = pd.read_sql("""
    SELECT p.name as Product, p.category as Category,
           SUM(o.quantity * p.price) as Total_Revenue,
           SUM(o.quantity) as Units_Sold,
           COUNT(o.order_id) as Order_Count
    FROM orders o
    JOIN products p ON o.product_id = p.product_id
    GROUP BY p.product_id, p.name, p.category
    ORDER BY Total_Revenue DESC
""", conn)
print("\n1. Revenue by Product:")
print(revenue_by_product.to_string(index=False))

# 2. Customer analysis
customer_analysis = pd.read_sql("""
    SELECT c.name as Customer, c.city as City,
           COUNT(o.order_id) as Orders,
           SUM(o.quantity * p.price) as Total_Spent
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    JOIN products p ON o.product_id = p.product_id
    GROUP BY c.customer_id, c.name, c.city
    ORDER BY Total_Spent DESC
""", conn)
print("\n2. Customer Spending:")
print(customer_analysis.to_string(index=False))

# 3. Category breakdown (SQL + Pandas)
category_rev = revenue_by_product.groupby('Category').agg(
    Total=('Total_Revenue', 'sum'),
    Products=('Product', 'count')
)
print("\n3. Revenue by Category:")
print(category_rev)

conn.close()

6. Common Mistakes

  • SQL injection risk: Never use f-strings for SQL queries with user input. Always use parameterized queries (%(param)s or ?).
  • if_exists='replace' drops entire table: Use if_exists='append' to add rows, or fail to prevent accidents.

7. MCQs

Question 1

pd.read_sql(query, conn) returns?

Question 2

to_sql(if_exists='append') does?

Question 3

SQLAlchemy is needed for?

Question 4

Parameterized queries prevent?

Question 5

to_sql(index=False) prevents?

Question 6

if_exists='replace' in to_sql?

Question 7

SQLite advantage for development?

Question 8

pd.read_sql_table('products', engine) reads?

Question 9

Connection string for MySQL (SQLAlchemy)?

Q10. Best approach for writing analysis results to DB? a) Manual INSERT c) DataFrame.to_sql() with if_exists='append' c) Print and copy — Answer: b

8. Interview Questions

  • Q: How do you read data from a MySQL database into a Pandas DataFrame?
  • Q: What is the risk of using f-strings to build SQL queries?

9. Summary

Pandas-SQL integration: read_sql() converts query results to DataFrames, to_sql() writes DataFrames to tables. Always use parameterized queries for security. SQLite needs no server — perfect for prototyping. SQLAlchemy engines connect to all production databases (MySQL, PostgreSQL, MSSQL).

10. Next Chapter Recommendation

In Chapter 26: Preparing Data for Machine Learning, we engineer features, encode categoricals, normalize, and split data for scikit-learn models.

Finish this Chapter

Save your progress on your learning path and prepare for coding interview challenges.

Discussion

Join the discussion

Log in or create a free account to participate.

Sort: ·