Skip to main content
System Design – Complete Beginner to Advanced Guide
CHAPTER 04 Intermediate

Databases in System Design

Updated: May 16, 2026
30 min read

# CHAPTER 4

Databases in System Design

1. Introduction

Web servers are transient; they can crash, reboot, and be replaced in seconds without a user noticing. Data, however, is sacred. If a user uploads a photo, pays for a subscription, or saves a password, that data must survive server crashes, power outages, and data center fires. The Database is the beating heart of any software architecture. However, as traffic scales from 10,000 rows of data to 10 Billion rows, traditional database setups will violently collapse. In this chapter, we will master Databases in System Design. We will navigate the great divide between SQL (Relational) and NoSQL (Non-Relational) databases, understand the mechanics of Replication, and learn how to permanently secure the memory of our applications.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Compare and contrast SQL (Relational) and NoSQL (Non-Relational) databases.
  • Define Database Normalization and strict schemas.
  • Understand the CAP Theorem implications for choosing a database.
  • Engineer Database Replication (Master-Slave) for high availability and read-scaling.
  • Architect Database Sharding (Partitioning) for massive horizontal scale.

3. SQL (Relational Databases)

SQL (Structured Query Language) databases have dominated the industry for 40 years.
  • Examples: MySQL, PostgreSQL, Oracle.
  • The Structure: Data is stored in strict, highly structured Tables (Rows and Columns), exactly like Excel spreadsheets.
  • Relationships: Tables are linked together using "Foreign Keys" (e.g., A User table is mathematically linked to an Orders table).
  • ACID Compliance: SQL databases guarantee absolute data integrity. If a bank transfer fails halfway through, the database guarantees the entire transaction is rolled back. There is zero chance of money disappearing.
  • The Weakness: Because of strict relational mathematics, SQL databases are incredibly difficult to scale horizontally across multiple servers. They prefer to be scaled Vertically (one massive server).

4. NoSQL (Non-Relational Databases)

NoSQL emerged to handle the massive, unstructured data explosion of Web 2.0.
  • Examples: MongoDB, Cassandra, DynamoDB.
  • The Structure: Data is not stored in rigid tables. It is often stored as flexible JSON-like documents. A single document can contain a user's profile, their friends list, and their settings all in one nested blob.
  • The Strength (Flexibility): You do not need to define a strict schema beforehand. You can add new fields on the fly.
  • The Superpower (Scale): NoSQL databases were built from day one to scale horizontally across thousands of cheap servers. If you need to store billions of tweets, NoSQL is the answer.

5. Choosing Between SQL and NoSQL

The most common system design interview question.
  • Choose SQL when: You are building a financial system, a billing engine, or an inventory tracker. You need strict data integrity, complex queries, and absolute consistency.
  • Choose NoSQL when: You are building a social media feed, IoT sensor logs, or an analytics engine. You have massive amounts of rapidly changing, unstructured data, and you prioritize speed and horizontal scalability over strict relational consistency.

6. Database Replication (Master-Slave)

How do you scale an SQL database to handle massive read traffic?
  • The Bottleneck: If 100,000 users are trying to read a blog post from a single database simultaneously, it will crash.
  • The Solution: Replication.
  • The Architecture:
  • Master DB: You have one primary database. It is the ONLY database allowed to perform Writes (Inserts, Updates, Deletes).
  • Slave (Replica) DBs: You create 5 identical copies of the Master. They constantly sync data from the Master. They are ONLY allowed to perform Reads.
  • The Result: When 100,000 users request the blog post, a load balancer splits those "Read" requests evenly across the 5 Replicas.

7. Database Sharding (Partitioning)

What happens when your database physically runs out of hard drive space?
  • The Concept: Sharding is horizontally scaling a database. You take one massive database of 100 million users and split it into 5 smaller databases of 20 million users each, hosted on 5 different physical servers.
  • The Routing Logic (The Shard Key): You must define a rule. For example: "Users with IDs 1-20M go to Server A. Users with IDs 20M-40M go to Server B."
  • *Warning:* Sharding introduces immense engineering complexity. Querying data across multiple shards (e.g., finding the top 10 highest-paying users globally) is extremely slow and difficult.

8. Diagrams/Visual Suggestions

*Architecture Diagram: Master-Slave Replication*
text
12345678
[ Application Servers ]
       |   (Write Requests)
       v
[ MASTER Database ] --(Data Sync)--> [ REPLICA 1 ] (Read-only)
                    --(Data Sync)--> [ REPLICA 2 ] (Read-only)
                    --(Data Sync)--> [ REPLICA 3 ] (Read-only)

[ Application Servers ] --(Read Requests)--> [ REPLICAS ]

9. Best Practices

  • Polyglot Persistence: Modern complex systems do not use just one database type. A massive architecture (like Amazon) will use SQL (PostgreSQL) for billing and user accounts, NoSQL (MongoDB) for the product catalog, and an In-Memory store (Redis) for shopping carts. Use the right tool for the specific job.

10. Common Mistakes

  • Sharding Too Early: Sharding is the absolute "last resort" of database scaling. Once you shard a database, it is nearly impossible to un-shard it. *The Fix:* Exhaust all other options first. Upgrade the hardware (Vertical scale), add Read Replicas, optimize your SQL queries, and implement aggressive Caching before you ever consider sharding.

11. Mini Project: Architect an E-Commerce DB

Let's design the data layer for a store.
  1. 1. The Users & Orders (SQL): We need strict integrity so nobody gets charged twice. We choose PostgreSQL. We set up 1 Master DB for writing orders, and 3 Replicas for users checking their order history.
  1. 2. The Product Catalog (NoSQL): Products have wildly different attributes (a TV has 'screen size'; a shirt has 'fabric type'). Strict SQL tables would be messy. We choose MongoDB to store flexible JSON documents for fast catalog searching.
  1. 3. *Result:* A resilient, purpose-built, polyglot data architecture.

12. Practice Exercises

  1. 1. Compare SQL and NoSQL databases. Provide one real-world scenario where SQL is mandatory, and one scenario where NoSQL is heavily preferred.
  1. 2. Explain the architecture of "Master-Slave Replication." How does this specific pattern solve the bottleneck of massive "Read" traffic?

13. MCQs with Answers

Question 1

When architecting a financial banking application where data integrity is absolute and transactions must possess ACID compliance (guaranteeing money cannot be lost during a server failure), which database architecture is strictly required?

Question 2

An architecture has one massive database that has reached its maximum physical hard drive capacity (10 Terabytes). The engineering team decides to split the database horizontally, moving Users A-M to Server 1, and Users N-Z to Server 2. What is this advanced scaling technique called?

14. Interview Questions

  • Q: A system is experiencing immense lag because 50,000 users are simultaneously trying to view a live leaderboard. Walk me through how you would implement Database Replication to resolve this read-heavy bottleneck.
  • Q: What is Database Sharding? Explain the concept of a "Shard Key" and discuss the immense operational complexities introduced when you shard a relational database.
  • Q: You are building the architecture for an Uber-like ride-sharing app. Would you use SQL or NoSQL to store the constantly updating GPS coordinates of the drivers? Defend your choice.

15. Summary

In Chapter 4, we secured the foundation of permanent memory. We navigated the great architectural debate between the strict, mathematically reliable structure of Relational SQL and the flexible, infinitely scalable power of Document NoSQL. We learned that massive traffic requires specific architectural patterns: utilizing Master-Slave Replication to distribute crushing read loads, and deploying Sharding as the ultimate weapon to conquer hard drive ceilings. By understanding the strengths and weaknesses of these storage engines, we can architect polyglot systems that ensure data is never lost and always accessible.

16. Next Chapter Recommendation

Reading from a database is physically slow. To achieve lightning-fast response times, we must build an intermediate layer of memory. Proceed to Chapter 5: Caching Strategies.

Finish this Chapter

Save your progress on your learning path and prepare for coding interview challenges.

Discussion

Join the discussion

Log in or create a free account to participate.

Sort: ·