Python for Data Science
CHAPTER 26 Beginner

Working with APIs and Web Data

Updated: May 18, 2026
5 min read

1. Chapter Introduction

In previous chapters, data was magically handed to you in clean .csv files. In the real world, Data Scientists have to fetch data themselves from live web servers. If you want live stock prices, weather updates, or Twitter data, you must connect to an API (Application Programming Interface). This chapter teaches you how to request data from the web and parse it into Pandas DataFrames.

2. What is a REST API?

An API is a bridge that allows two applications to talk to each other. A REST API does this over HTTP, the same protocol your browser uses. Instead of a human typing a URL into a browser to see a webpage, a Python script sends a GET request to an API URL and receives structured data, usually JSON, in return.
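
To make the idea of JSON data concrete, here is a minimal sketch that uses only Python's built-in json module. The payload string is a made-up stand-in for what a server might send back, not a real API response.

```python
import json

# Hypothetical response body, invented for illustration only
raw_body = '{"login": "example-user", "followers": 42, "tags": ["python", "data"]}'

# json.loads turns JSON text into ordinary Python objects:
# JSON objects become dicts, arrays become lists, numbers become int or float
parsed = json.loads(raw_body)

print(type(parsed).__name__)    # dict
print(parsed["followers"] + 1)  # 43
print(parsed["tags"][0])        # python
```

Once the text is parsed, nothing about it is web-specific any more: it is plain dictionaries and lists that you index as usual.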

3. The requests Library

The requests library is the de facto standard for making HTTP calls in Python. *(Install it with pip install requests if it is not present; in a Jupyter notebook, prefix the command with !.)*

python
import requests

# 1. Define the API endpoint URL (This is a public, free API)
url = "https://api.github.com/users/github"

# 2. Send a GET request to the URL (the timeout stops the script from waiting forever)
response = requests.get(url, timeout=10)

# 3. Check the Status Code (200 = OK, 404 = Not Found, 500 = Server Error)
print(f"Status Code: {response.status_code}")

# 4. Extract the JSON data from the response
data = response.json()

# 5. The data is now a standard Python Dictionary!
print(f"Name: {data['name']}")
print(f"Followers: {data['followers']}")

4. Passing Parameters to APIs

APIs often require you to pass parameters to filter the data (like searching for a specific city's weather). You do this using the params argument.

python
# A free API for searching Universities
url = "http://universities.hipolabs.com/search"

# We want to search for universities in the United Kingdom
# The API documentation tells us the parameter name is 'country'
parameters = {"country": "United Kingdom"}

# Send the request with the parameters attached
response = requests.get(url, params=parameters)

# The result is a List of Dictionaries
uk_unis = response.json()

print(f"Found {len(uk_unis)} universities in the UK.")
print(f"First result: {uk_unis[0]['name']}")
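
It can help to see what the params argument does to the URL. The sketch below builds the query string by hand with the standard library's urllib.parse.urlencode. It illustrates the idea and is not the exact code inside requests. With a real response object, response.url shows the final URL that was requested.

```python
from urllib.parse import urlencode

base_url = "http://universities.hipolabs.com/search"
parameters = {"country": "United Kingdom"}

# urlencode escapes characters that are not URL-safe (the space becomes "+")
query_string = urlencode(parameters)
full_url = f"{base_url}?{query_string}"

print(full_url)
# http://universities.hipolabs.com/search?country=United+Kingdom
```

Letting the library do this encoding is safer than gluing strings together yourself, because values containing spaces, ampersands, or non-ASCII characters are escaped for you.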

5. Converting API Data to Pandas

Many APIs return a JSON array of objects, which response.json() parses into a Python list of dictionaries. pd.DataFrame accepts that structure directly: each dictionary becomes a row and each key becomes a column.

python
import pandas as pd

# Convert the JSON data from the previous step directly into a DataFrame
df = pd.DataFrame(uk_unis)

print(df.head())
# You can now use all your Pandas skills to analyze the web data!
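
Real responses are often nested, with dictionaries inside dictionaries. One option in that case is pd.json_normalize, which flattens nested keys into dotted column names. The records below are invented sample data, not output from any API in this chapter.

```python
import pandas as pd

# Made-up records with a nested "location" dictionary (illustrative only)
records = [
    {"name": "Station A", "location": {"lat": 51.5, "lon": -0.1}},
    {"name": "Station B", "location": {"lat": 55.9, "lon": -3.2}},
]

# Nested keys become columns such as "location.lat" and "location.lon"
flat_df = pd.json_normalize(records)

print(flat_df.columns.tolist())
print(flat_df["location.lat"].max())
```

Passing the same records to pd.DataFrame would instead leave a single "location" column holding whole dictionaries, which is harder to filter or aggregate.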

6. Mini Project: ISS Location Tracker

Let's build a script that fetches the current location of the International Space Station (ISS) and converts it to a DataFrame.

python
import requests
import pandas as pd
from datetime import datetime

# Public API for the International Space Station
url = "http://api.open-notify.org/iss-now.json"

try:
    response = requests.get(url, timeout=10)
    # Raise an HTTPError if the status code is 4xx or 5xx
    response.raise_for_status()
    
    data = response.json()
    
    # Extract specific nested data
    timestamp = datetime.fromtimestamp(data['timestamp'])
    latitude = float(data['iss_position']['latitude'])
    longitude = float(data['iss_position']['longitude'])
    
    # Format into a Dictionary, then a DataFrame
    iss_data = {
        "Time": [timestamp],
        "Latitude": [latitude],
        "Longitude": [longitude]
    }
    
    df = pd.DataFrame(iss_data)
    
    print("--- LIVE ISS TRACKER ---")
    print(df)
    
except requests.exceptions.RequestException as e:
    print(f"Error fetching data: {e}")
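
One way to make this project easier to test is to separate parsing from fetching. The sketch below moves the parsing step into a function (parse_iss_payload is a name chosen here, not part of any library) and runs it on a hand-written payload that mimics the shape used above, so it works without a network connection. It also converts the timestamp to UTC explicitly, whereas datetime.fromtimestamp without a timezone uses the machine's local time.

```python
from datetime import datetime, timezone


def parse_iss_payload(payload):
    """Turn one ISS payload into a flat row (hypothetical helper)."""
    position = payload["iss_position"]
    return {
        "Time": datetime.fromtimestamp(payload["timestamp"], tz=timezone.utc),
        "Latitude": float(position["latitude"]),
        "Longitude": float(position["longitude"]),
    }


# Hand-written sample in the same shape as the response (values invented)
sample = {
    "timestamp": 0,
    "iss_position": {"latitude": "12.5", "longitude": "-45.25"},
}

row = parse_iss_payload(sample)
print(row["Time"].isoformat())            # 1970-01-01T00:00:00+00:00
print(row["Latitude"], row["Longitude"])  # 12.5 -45.25
```

A list of such rows can be passed straight to pd.DataFrame, which is handy if you poll the API several times and want one row per reading.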

7. API Keys and Authentication

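Many of the most useful APIs (such as OpenAI, Twitter, or stock market data providers) are not free, or limit free usage. They require an API key, a secret token that identifies your account so the provider can authenticate you and track your usage.

NEVER hardcode your API key in your script like this: api_key = "12345ABC". If you push that script to GitHub, anyone who can see the repository can copy your key and use it at your expense.

Best Practice: Store the key in a separate file called .env, keep that file out of version control (for example by listing it in .gitignore), and load the key at runtime with the os module or the python-dotenv library.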
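
As a minimal sketch of that practice, the function below reads the key from an environment variable using only the os module. The function name load_api_key and the variable name MY_API_KEY are placeholders chosen for this example; use whatever name your project or API provider suggests.

```python
import os


def load_api_key(var_name="MY_API_KEY"):
    """Read an API key from an environment variable (hypothetical helper)."""
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key
```

If you use python-dotenv, calling load_dotenv() (imported from dotenv) before load_api_key() copies the entries from your .env file into the environment, so the same function works unchanged. How the key is then sent, for example as a query parameter or a request header, depends on the API, so check its documentation.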

8. Common Mistakes

  • Ignoring Status Codes: Assuming every requests.get() succeeds. If the server returns an error page instead of JSON, response.json() will raise an exception. Always check that response.status_code == 200 (or call response.raise_for_status()) before processing the data.
  • DDoS-ing an API: If you put requests.get() inside a for loop that fires thousands of requests with no pause, the API may throttle you or ban your IP address. Pause between requests, for example with time.sleep(1), and follow the rate limits in the API's documentation.
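
Both habits can be combined in one small loop. The sketch below is one possible shape, not a standard recipe: fetch_all is a name invented here, and it takes the fetching function as an argument so the example can be run and tested without a network connection. Any callable that returns an object with status_code and json() will work.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0):
    """Fetch each URL politely: pause between calls, skip failed responses."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first
            time.sleep(delay_seconds)
        response = fetch(url)
        if response.status_code == 200:
            results.append(response.json())
        else:
            print(f"Skipping {url}: status {response.status_code}")
    return results
```

In a real script you might call it as fetch_all(urls, lambda u: requests.get(u, timeout=10)). A fixed one-second pause is only a starting point; the right delay is whatever the API's documented rate limit allows.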

9. MCQs

Question 1

What does API stand for?

Question 2

What is the industry-standard Python library for fetching data from web APIs?

Question 3

Which HTTP method is used to retrieve data from a server?

Question 4

What HTTP status code indicates a successful request?

Question 5

In what format do most modern APIs return data?

Question 6

What does response.json() do in the requests library?

Question 7

How do you pass search filters to a GET request?

Question 8

Why is it dangerous to hardcode an API key directly into your .py or .ipynb file?

Question 9

If an API returns a List of Dictionaries, what Pandas function easily converts it into a tabular format?

Question 10

What Python module should you use to pause your script to prevent hammering an API with too many requests?

10. Interview Questions

  • Q: Walk me through the Python code required to hit a REST API, check for errors, and convert the JSON response into a Pandas DataFrame.
  • Q: How do you securely manage secret API keys in a data science project?

11. Summary

Data Science requires data. The requests library allows Python to communicate with web servers via GET requests. Always check that response.status_code == 200 to ensure success. Extract the data using .json(), and easily convert the resulting list of dictionaries into a Pandas DataFrame for analysis. Remember to never hardcode API keys and respect server rate limits using time.sleep().

12. Next Chapter Recommendation

In Chapter 27: Real-World Data Science Projects, we will discuss five distinct architectures for end-to-end data science portfolio projects that combine APIs, Pandas, Seaborn, and Machine Learning.
