CHAPTER 14 Intermediate

Recurrent Neural Networks (RNN)

Updated: May 16, 2026

6 min read

1. Introduction

A standard Dense neural network has amnesia. When it looks at Word #2 in a sentence, it has completely forgotten Word #1. This is fine for classifying a single static image, but it is disastrous for processing language, where context and word order dictate the meaning of the sentence. To solve this, researchers created the Recurrent Neural Network (RNN). An RNN has a built-in "memory" loop, allowing it to remember past inputs to understand the present. In this chapter, we will learn how RNNs model time and sequence.

2. Learning Objectives

By the end of this chapter, you will be able to:

Define Sequential Data.

Explain the architecture of a Recurrent Neural Network (RNN).

Understand how the Hidden State acts as memory.

Implement a SimpleRNN layer in Keras.

Identify the limitations of basic RNNs (Vanishing Gradients).

3. What is Sequential Data?

Sequential data is any data where the *order* of the data points matters.

Text: "I am happy" is a statement. "Am I happy" is a question. Same words, different order, different meaning.

Time-Series: Stock market prices over a week. The price on Tuesday is heavily dependent on the price on Monday.

Audio: A spoken sentence is just a sequence of soundwaves over time.

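Standard Dense networks and image-style CNNs cannot handle sequential data effectively: they expect a fixed-size input and carry no memory from one step of a sequence to the next.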
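To make "order matters" concrete, here is a minimal sketch. The two sentences and the `bag_of_words` helper are illustrative assumptions, not part of the chapter's dataset. It shows that a representation which only counts words cannot tell apart two sentences with opposite meanings.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count the words in a sentence while discarding their order."""
    return Counter(sentence.lower().split())

a = "dog bites man"
b = "man bites dog"

# Identical word counts, so an order-blind model sees the same input twice.
same_bag = bag_of_words(a) == bag_of_words(b)

# As ordered sequences, the two sentences are different.
same_sequence = a.split() == b.split()

print(same_bag, same_sequence)  # prints: True False
```

A model for sequential data needs to work with the second view, where position is part of the input.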

4. How an RNN Works

Imagine reading a book. As you read the current word, your brain holds the context of the previous words to understand the sentence. An RNN works on the same principle.

1. At Time Step 1, the RNN reads Word 1 ("The"). It processes it and generates a Hidden State (a memory summary).

2. At Time Step 2, the RNN reads Word 2 ("Dog"). *Crucially, it also reads the Hidden State from Time Step 1!* It combines the new word with the old memory to generate a new, updated Hidden State.

3. This loop continues until the end of the sentence. By the final word, the Hidden State contains a mathematical summary of the entire sentence's context.
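The steps above can be sketched in a few lines of NumPy. This is a simplified illustration, not Keras's internal implementation: the weights are random rather than trained, the word vectors are made up, and the names `W_x`, `W_h`, `b`, and `rnn_step` are assumptions chosen for this example. The update rule used is the common form h_t = tanh(W_x x_t + W_h h_(t-1) + b).

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, hidden_dim = 4, 3  # illustrative sizes
sentence = ["the", "dog", "barks"]

# Hypothetical word vectors; a real model would learn these in an Embedding layer.
word_vectors = {word: rng.normal(size=input_dim) for word in sentence}

# Untrained weights: W_x mixes in the new word, W_h mixes in the old memory.
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """Combine the current input with the previous Hidden State."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(hidden_dim)  # the memory is empty before the first word
states = []
for word in sentence:
    h = rnn_step(word_vectors[word], h)  # the same weights are reused at every step
    states.append(h)
    print(word, h.round(3))
```

After the loop, `h` is the final Hidden State: a fixed-size vector that depends on every word in the sentence and on the order in which the words arrived.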

5. Implementing a Simple RNN in Keras

Let's build an RNN for Sentiment Analysis (predicting if a movie review is Positive or Negative).

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, SimpleRNN, Dense

vocab_size = 10000  # number of distinct words in the vocabulary
embed_dim = 32      # size of each word vector
max_length = 100    # every review is padded or truncated to 100 tokens

model = Sequential([
    # Declare the input shape: a sequence of max_length integer word IDs.
    # (Newer Keras versions deprecate Embedding's input_length argument.)
    Input(shape=(max_length,)),

    # 1. Word Embeddings (Covered in Chapter 13)
    Embedding(input_dim=vocab_size, output_dim=embed_dim),

    # 2. The Recurrent Layer!
    # It reads the sequence word by word, updating its memory state.
    SimpleRNN(64),

    # 3. Dense Output Layer
    # Based on the final memory state, is the review positive (1) or negative (0)?
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
```

*Notice we did not use Flatten()! By default, the SimpleRNN layer reads the whole sequence and outputs only its final memory state: one flat vector of 64 values per review, which feeds directly into the Dense layer.*
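As a sanity check on the model summary, the parameter counts can be worked out by hand. The sketch below assumes the layer sizes used above and the standard formulas for these layers with their default bias terms. The variable names are illustrative.

```python
vocab_size, embed_dim = 10000, 32
rnn_units, dense_units = 64, 1

# Embedding: one embed_dim-sized vector per vocabulary word.
embedding_params = vocab_size * embed_dim

# SimpleRNN: input weights + recurrent weights + one bias, for each unit.
rnn_params = rnn_units * (embed_dim + rnn_units + 1)

# Dense: one weight per incoming value, plus a bias, for each output unit.
dense_params = dense_units * (rnn_units + 1)

total_params = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total_params)
# prints: 320000 6208 65 326273
```

Note that the recurrent layer's size does not depend on the sequence length: the same 6,208 weights are reused at each of the 100 time steps.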

6. Multi-Layer RNNs

Just like stacking Dense layers, you can stack RNN layers to extract deeper sequential patterns. However, there is a catch. By default, an RNN only outputs its *final* memory state at the end of the sentence. If you stack a second RNN on top of it, the second RNN has no sequence to read! You must set return_sequences=True on the first RNN so it outputs a memory state for *every single word*, creating a sequence for the next layer.

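```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, SimpleRNN, Dense

vocab_size, embed_dim, max_length = 10000, 32, 100  # same values as before

model = Sequential([
    Input(shape=(max_length,)),
    Embedding(vocab_size, embed_dim),

    # RNN 1: Returns a sequence of memory states
    SimpleRNN(64, return_sequences=True),

    # RNN 2: Reads the sequence from RNN 1, outputs only the final state
    SimpleRNN(32),

    Dense(1, activation='sigmoid')
])
```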
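To see what return_sequences changes, here is a minimal NumPy sketch of a recurrent layer. It is an illustration under simplifying assumptions (random untrained weights, a hypothetical `simple_rnn_forward` function and `make_weights` helper), not the Keras implementation. The point is the shape of the output in each mode.

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_rnn_forward(inputs, W_x, W_h, b, return_sequences=False):
    """Run a basic RNN over inputs of shape (batch, timesteps, features)."""
    batch, timesteps, _ = inputs.shape
    h = np.zeros((batch, W_h.shape[0]))
    all_states = []
    for t in range(timesteps):
        h = np.tanh(inputs[:, t, :] @ W_x + h @ W_h + b)
        all_states.append(h)
    if return_sequences:
        return np.stack(all_states, axis=1)  # (batch, timesteps, units)
    return h                                 # (batch, units)

def make_weights(features, units):
    """Hypothetical helper: small random weights and a zero bias."""
    W_x = rng.normal(scale=0.1, size=(features, units))
    W_h = rng.normal(scale=0.1, size=(units, units))
    return W_x, W_h, np.zeros(units)

x = rng.normal(size=(2, 100, 32))  # 2 reviews, 100 words, 32-dim embeddings
w1 = make_weights(32, 64)
w2 = make_weights(64, 32)

seq = simple_rnn_forward(x, *w1, return_sequences=True)
final = simple_rnn_forward(seq, *w2)
last_only = simple_rnn_forward(x, *w1)

print(seq.shape, final.shape, last_only.shape)
# prints: (2, 100, 64) (2, 32) (2, 64)
```

The first layer's full output `seq` still has a time axis, so a second recurrent layer can loop over it. `last_only` has lost that axis, which is why it can feed a Dense layer but not another RNN.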

7. The Vanishing Gradient Problem

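SimpleRNN is elegant in theory, but it struggles in practice with long sequences. During training, the error signal must travel backward through every time step (backpropagation through time). At each step it is multiplied by the recurrent weights and by the slope of the activation function, so in a 100-word paragraph the gradient that reaches Word 1 can shrink toward zero (The Vanishing Gradient Problem). The network then never learns to keep Word 1's information in its Hidden State. *Result:* As a rough rule of thumb, a SimpleRNN reliably captures dependencies only across short spans (about 10-15 words). For long documents, it suffers from severe amnesia.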
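The shrinking effect is easy to reproduce with a one-unit RNN. The sketch below is a toy illustration under assumed values: the weights `w` and `u`, the constant input `x`, and the function name are all made up for this example. It applies the chain rule to measure how much the final Hidden State changes when the first Hidden State changes.

```python
import math

def gradient_to_first_state(num_steps, w=0.9, u=0.5, x=1.0):
    """Return d h_T / d h_1 for a one-unit RNN: h_t = tanh(w * h_prev + u * x)."""
    h = math.tanh(u * x)  # h_1, starting from a zero hidden state
    grad = 1.0            # d h_1 / d h_1
    for _ in range(num_steps - 1):
        h = math.tanh(w * h + u * x)
        # Chain rule: d h_t / d h_prev = w * (1 - h_t^2), which is below 1 here.
        grad *= w * (1.0 - h * h)
    return grad

for steps in (2, 10, 50, 100):
    print(steps, gradient_to_first_state(steps))
```

Each extra time step multiplies the gradient by another factor smaller than 1, so the value printed for 100 steps is many orders of magnitude smaller than the value for 10 steps. A weight update driven by such a gradient barely changes how the network treats early words.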

8. Common Mistakes

Forgetting return_sequences=True: If you try to stack two RNNs without this parameter, Keras raises a shape mismatch error. The second RNN expects a 3D input of shape (batch, time steps, features), but the first RNN passed it only its final state, a 2D tensor of shape (batch, units).

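Using RNNs for non-sequential data: If you use an RNN to process tabular data (like predicting house prices based on Square Footage and Beds), the code will run, but the model imposes an arbitrary order on the columns and will usually do no better than a simpler Dense network. Tabular features have no logical time-step order.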

9. Best Practices

Use RNNs for Baselines: While SimpleRNN is rarely used in modern production (due to the Vanishing Gradient problem), building one is an excellent way to baseline your NLP pipeline before moving to more complex models.

10. Exercises

1. What is the "Hidden State" in an RNN, and what biological function does it mimic?

2. If you are building a 3-layer RNN architecture, which layers need the return_sequences=True parameter?

11. MCQ Quiz with Answers

Question 1

What specific capability makes an RNN superior to a Dense network for processing text?

Question 2

Why is a `SimpleRNN` ineffective at processing very long documents?

12. Interview Questions

Q: Explain the flow of data and the updating of the Hidden State through a single SimpleRNN layer during one time step.

Q: When stacking recurrent layers in Keras, why is the return_sequences=True parameter necessary?

13. FAQs

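Q: Can RNNs predict the future? A: To a degree. By feeding an RNN historical stock prices (e.g., Days 1-10), you can train it to predict the price on Day 11. This is called Time-Series Forecasting. How accurate the forecast is depends on how closely the future follows the patterns in the training data.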
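Before any model is involved, forecasting requires slicing the history into (input window, next value) pairs. Here is a minimal sketch of that step. The `make_windows` helper is hypothetical and the prices are made up for illustration.

```python
def make_windows(series, window_size):
    """Split a series into (window of past values, next value) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])
        targets.append(series[start + window_size])
    return inputs, targets

# Made-up closing prices for Days 1-14.
prices = [float(100 + day) for day in range(1, 15)]

X, y = make_windows(prices, window_size=10)

print(len(X))        # prints: 4
print(X[0], y[0])    # Days 1-10 as input, Day 11 as the target
```

To train a recurrent layer on these pairs, each window would also need a trailing feature axis, giving an input of shape (samples, 10, 1).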

14. Summary

Recurrent Neural Networks introduced the concept of memory into Deep Learning. By utilizing a hidden state that loops and updates with every new time step, RNNs can process sequential data like language and time-series. However, their struggle to retain long-term memory requires a more advanced solution.

15. Next Chapter Recommendation

How do we fix the Vanishing Gradient problem? How do we build a model that can remember the first word of a 500-word essay? In Chapter 15: LSTM and Sequence Models, we will introduce the heavy machinery of NLP: The Long Short-Term Memory network.
