Data Structures in Jupyter
# CHAPTER 8
1. Chapter Introduction
Variables hold a single piece of information, like age = 25. But what if you need to store the ages of 10,000 customers? You need a Data Structure. Data structures are containers that organize and store data efficiently. This chapter covers Python's four built-in data structures (Lists, Tuples, Sets, and Dictionaries) and how to interact with them in Jupyter.
2. Lists: Ordered and Mutable
A List is a collection of items in a specific order. You can change, add, or remove items after the list is created (Mutable). They are defined using square brackets [].
Cell 1:
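A minimal sketch of what such a cell might contain, assuming a hypothetical list named fruits (the name and values are illustrative only):

```python
# Hypothetical example data; the names and values are illustrative only.
fruits = ["apple", "banana", "cherry"]

first = fruits[0]          # indexing starts at 0, so this is "apple"
fruits.append("date")      # add an item to the end of the list
fruits[1] = "blueberry"    # replace the second item in place
fruits.remove("cherry")    # remove the first item equal to "cherry"

print(first)    # apple
print(fruits)   # ['apple', 'blueberry', 'date']
```

Each operation modifies the same list object rather than creating a new one, which is what "mutable" means in practice.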
3. Tuples: Ordered and Immutable
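A Tuple is ordered and indexable like a list, except it CANNOT be changed after it is created (Immutable). Tuples are usually written with parentheses (), although it is the commas that create the tuple, so a one-item tuple needs a trailing comma: (42,). Use tuples for data that should never be altered (like geographic coordinates).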
Cell 2:
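A minimal sketch of what such a cell might contain, assuming a hypothetical coordinates tuple (the values are illustrative only):

```python
# Hypothetical latitude/longitude pair; the values are illustrative only.
coordinates = (40.7128, -74.0060)

# Tuples support indexing and unpacking, just like lists.
latitude, longitude = coordinates

# Attempting to change an element raises TypeError.
error_message = None
try:
    coordinates[0] = 0.0
except TypeError as exc:
    error_message = str(exc)
    print("Cannot modify a tuple:", error_message)

# A one-item tuple needs a trailing comma; (42) alone is just the integer 42.
single = (42,)
not_a_tuple = (42)
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple int
```

Catching the TypeError here is only for demonstration, so the cell keeps running after the failed assignment.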
4. Sets: Unordered and Unique
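A Set is an unordered collection with NO duplicate elements. Sets are written with curly braces {}, for example {1, 2, 3}. Note that empty braces {} create an empty dictionary, so an empty set must be written set(). Membership checks on a set take roughly constant time on average regardless of its size, which makes sets well suited to checking whether an item exists or to removing duplicates from a list.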
Cell 3:
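A minimal sketch of what such a cell might contain, assuming a hypothetical list of visitor names that contains duplicates:

```python
# Hypothetical data with duplicates; the names are illustrative only.
visits = ["alice", "bob", "alice", "carol", "bob"]

# Converting a list to a set drops the duplicates.
unique_visitors = set(visits)
print(len(unique_visitors))  # 3

# Membership testing uses the `in` operator.
has_alice = "alice" in unique_visitors
has_dave = "dave" in unique_visitors
print(has_alice, has_dave)  # True False

# Sets have no defined order; sort them when a stable order is needed.
ordered = sorted(unique_visitors)
print(ordered)  # ['alice', 'bob', 'carol']

# An empty set must be created with set(), because {} is an empty dict.
empty_set = set()
empty_dict = {}
```

Converting to a set discards the original order of the list, so sorted() is one way to get a predictable sequence back.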
5. Dictionaries: Key-Value Pairs
A Dictionary stores data in Key-Value pairs, just like a real dictionary maps a word to its definition. They are defined using curly braces {}, with a colon : separating the key and value. Dictionaries matter most when interacting with web APIs, because a JSON object maps directly onto a Python dictionary.
Cell 4:
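A minimal sketch of what such a cell might contain, assuming a hypothetical user record and a made-up JSON string standing in for an API response:

```python
import json

# Hypothetical record; the keys and values are illustrative only.
user = {"name": "Alice", "age": 25}

name = user["name"]                   # look up a value by its key
user["email"] = "alice@example.com"   # add a new key-value pair
user["age"] = 26                      # overwrite an existing value

# .get() returns a default instead of raising KeyError for a missing key.
nickname = user.get("nickname", "n/a")

for key, value in user.items():
    print(key, "->", value)

# A JSON object parses into a dictionary (the string below is made up).
payload = json.loads('{"name": "Alice", "active": true}')
print(type(payload).__name__)  # dict
```

Using user["nickname"] instead of user.get("nickname", "n/a") would raise a KeyError, since that key does not exist.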
6. Nested Data Structures
In real-world data science, structures are often nested inside each other. A common example is a list of dictionaries, where each dictionary describes one record.
Cell 5:
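A minimal sketch of what such a cell might contain, assuming a hypothetical list of customer records (all names and fields are illustrative only):

```python
# Hypothetical records; each dictionary describes one customer.
customers = [
    {"name": "Alice", "age": 25, "tags": ["new", "newsletter"]},
    {"name": "Bob", "age": 31, "tags": []},
]

# Chain the lookups from the outside in: list index first, then dict key.
second_name = customers[1]["name"]
first_tag = customers[0]["tags"][0]
print(second_name, first_tag)  # Bob new

# Collect one field from every record.
ages = [customer["age"] for customer in customers]
print(ages)  # [25, 31]
```

Reading the access expression left to right mirrors the nesting: customers[0] is a dictionary, ["tags"] is a list inside it, and [0] is the first item of that list.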
7. Jupyter Display Magic
In Jupyter, if you put a large dictionary or list on the last line of a cell, Jupyter displays its value automatically, and its pretty printer breaks long containers across several lines. Calling print() on the same object typically produces one long, hard-to-read line of text.
Cell 6:
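A minimal sketch of what such a cell might contain, assuming a hypothetical inventory dictionary. The pprint call is an addition for illustration: it comes from the standard library and gives a similar multi-line layout outside Jupyter, though it is not the same formatter Jupyter uses.

```python
from pprint import pformat

# Hypothetical data, generated only to make the dictionary large.
inventory = {
    f"item_{i}": {"price": i * 1.5, "in_stock": i % 2 == 0}
    for i in range(10)
}

# print() writes the whole dictionary as a single long line.
print(inventory)

# pformat() wraps the output across lines, similar in spirit to
# what Jupyter shows for a bare expression on a cell's last line.
formatted = pformat(inventory)
print(formatted)

# In Jupyter, leaving the variable name as the last line displays it.
# In a plain script this line does nothing.
inventory
```

Only the last expression in a cell is displayed this way, so values earlier in the cell still need print() or a separate cell.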
8. Common Mistakes
- Zero-Indexing Confusion: Beginners often try to get the first item in a list using fruits[1]. In Python, the first item is ALWAYS index 0. fruits[1] gets the *second* item.
- Trying to order Sets: Because Sets are unordered, you cannot use an index on them. my_set[0] raises a TypeError.
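Both mistakes can be reproduced in a short cell. This is a minimal sketch, assuming hypothetical fruits and my_set values:

```python
# Hypothetical data; the names and values are illustrative only.
fruits = ["apple", "banana", "cherry"]

first = fruits[0]    # "apple": the first item is at index 0
second = fruits[1]   # "banana": index 1 is the second item
last = fruits[-1]    # "cherry": negative indexes count from the end

my_set = {"x", "y", "z"}

# Indexing a set raises TypeError because sets have no positions.
set_error = None
try:
    my_set[0]
except TypeError as exc:
    set_error = str(exc)
    print("Indexing a set failed:", set_error)

# Convert to a sorted list first when positions are needed.
as_list = sorted(my_set)
print(as_list[0])  # x
```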
9. MCQs
Which data structure uses square brackets [] and allows you to change its contents?
How do you access the *first* item in a Python list named data?
Which data structure uses parentheses () and CANNOT be modified after creation?
If you have a list with duplicates and you want to quickly remove them, you can convert the list into a:
Dictionaries store data in what format?
How do you access the value "Alice" in this dictionary: user = {"name": "Alice"}?
What happens if you try to append a new item to a Tuple?
Which index number accesses the *last* item in a list?
Why is it better to just type a large dictionary's variable name on the last line of a Jupyter cell instead of using print()?
10. Interview Questions
- Q: Explain the difference between a List and a Tuple. When would you choose to use a Tuple over a List?
- Q: You have a list of 100,000 employee names, and you need to check if "Alice" is in the list. To optimize for speed, which data structure should you convert the list into before checking?
11. Summary
Python provides four core data structures. Lists [] are ordered and mutable, perfect for sequences of data. Tuples () are ordered and immutable, used for fixed data. Sets {} are unordered and unique, used for fast membership testing and deduplication. Dictionaries {"key": "value"} map keys to values and are essential for complex data storage.