CHAPTER 14
Intermediate
Recurrent Neural Networks (RNN)
Updated: May 16, 2026
1. Introduction
A standard Dense neural network has amnesia. When it looks at Word #2 in a sentence, it has completely forgotten Word #1. This is fine for classifying a single static image, but it is disastrous for processing language, where context and word order dictate the meaning of the sentence. To solve this, researchers created the Recurrent Neural Network (RNN). An RNN has a built-in "memory" loop, allowing it to remember past inputs to understand the present. In this chapter, we will learn how RNNs model time and sequence.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Define Sequential Data.
- Explain the architecture of a Recurrent Neural Network (RNN).
- Understand how the Hidden State acts as memory.
- Implement a SimpleRNN layer in Keras.
- Identify the limitations of basic RNNs (Vanishing Gradients).
3. What is Sequential Data?
Sequential data is any data where the *order* of the data points matters.
- Text: "I am happy" is a plain statement. Reordering the same words into "Happy am I" changes the tone, and "Am I happy" turns it into a question.
- Time-Series: Stock market prices over a week. The price on Tuesday is heavily dependent on the price on Monday.
- Audio: A spoken sentence is just a sequence of soundwaves over time.
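To make the time-series case concrete, here is a minimal sketch of how an ordered series can be cut into fixed-length sequences. It uses NumPy; the prices are made-up illustrative values and the window length of 3 is an arbitrary assumption. The (samples, time steps, features) layout is the one Keras recurrent layers expect.

```python
import numpy as np

# Made-up daily closing prices, in order (illustrative values only).
prices = np.array([101.0, 102.5, 101.8, 103.2, 104.0, 103.5, 105.1])

WINDOW = 3  # assumed number of past days used to predict the next day

# Each sample is WINDOW consecutive prices; the target is the price that follows.
X = np.stack([prices[i:i + WINDOW] for i in range(len(prices) - WINDOW)])
y = prices[WINDOW:]

# Add a trailing "features" axis: one feature (the price) per time step.
X = X[..., np.newaxis]

print(X.shape)  # (4, 3, 1) -> (samples, time steps, features)
print(y.shape)  # (4,)
```

Shuffling the values inside a window would produce a different sample with a different meaning, which is exactly what makes the data sequential.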
4. How an RNN Works
Imagine reading a book. As you read the current word, your brain holds the context of the previous words to understand the sentence. An RNN does the exact same thing.
- 1. At Time Step 1, the RNN reads Word 1 ("The"). It processes it and generates a Hidden State (a memory summary).
- 2. At Time Step 2, the RNN reads Word 2 ("Dog"). *Crucially, it also reads the Hidden State from Time Step 1!* It combines the new word with the old memory to generate a new, updated Hidden State.
- 3. This loop continues until the end of the sentence. By the final word, the Hidden State contains a mathematical summary of the entire sentence's context.
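The three steps above can be sketched in plain NumPy. This is an illustration under stated assumptions, not Keras's internal code: the word vectors and weights are random placeholders, the third word is hypothetical, and the update rule h_t = tanh(W_x x_t + W_h h_(t-1) + b) is the standard simple-RNN recurrence (tanh is also the default activation of SimpleRNN).

```python
import numpy as np

rng = np.random.default_rng(42)
EMBED_DIM, UNITS = 4, 3  # assumed sizes, chosen small for readability

# Placeholder word vectors; a real model would learn these in an Embedding layer.
sentence = ["The", "Dog", "Barks"]  # third word is a hypothetical addition
word_vectors = {word: rng.normal(size=EMBED_DIM) for word in sentence}

# Randomly initialised weights, standing in for trained parameters.
W_x = rng.normal(scale=0.5, size=(UNITS, EMBED_DIM))  # input -> hidden
W_h = rng.normal(scale=0.5, size=(UNITS, UNITS))      # previous hidden -> hidden
b = np.zeros(UNITS)

h = np.zeros(UNITS)  # the Hidden State starts empty
history = []
for step, word in enumerate(sentence, start=1):
    # Combine the new word with the old memory to get the updated memory.
    h = np.tanh(W_x @ word_vectors[word] + W_h @ h + b)
    history.append(h.copy())
    print(f"Time Step {step} ({word!r}): hidden state = {np.round(h, 3)}")
```

The same W_x, W_h and b are reused at every time step; only the Hidden State h changes as the sentence is read.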
5. Implementing a Simple RNN in Keras
Let's build an RNN for Sentiment Analysis (predicting if a movie review is Positive or Negative).
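The original listing for this section is not available here, so what follows is a minimal sketch of such a model, not the chapter's exact code. The vocabulary size, review length, and layer widths are assumptions, and random integers stand in for tokenized reviews.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 10_000  # assumed vocabulary size
MAX_LEN = 100        # assumed review length in tokens, after padding

model = keras.Sequential([
    keras.Input(shape=(MAX_LEN,), dtype="int32"),
    # Turn each word index into a 32-dimensional vector.
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=32),
    # Read the sequence word by word; output only the final Hidden State.
    layers.SimpleRNN(32),
    # One sigmoid unit: probability that the review is Positive.
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random token IDs standing in for 4 tokenized reviews (placeholder data).
fake_reviews = np.random.randint(0, VOCAB_SIZE, size=(4, MAX_LEN))
probs = model.predict(fake_reviews, verbose=0)
print(probs.shape)  # (4, 1): one probability per review
```

With real data you would replace fake_reviews with padded integer sequences and call model.fit on them together with their 0/1 labels.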
*Notice we did not use Flatten()! The SimpleRNN layer automatically reads the sequence and outputs a flat 1D vector representing the final memory state, which feeds perfectly into the Dense layer.*
6. Multi-Layer RNNs
Just like stacking Dense layers, you can stack RNN layers to extract deeper sequential patterns. However, there is a catch. By default, an RNN only outputs its *final* memory state at the end of the sentence. If you stack a second RNN on top of it, the second RNN has no sequence to read! You must set return_sequences=True on the first RNN so it outputs a memory state for *every single word*, creating a sequence for the next layer.
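The original listing is not available here either, so the following is a minimal sketch of a two-layer stack rather than the chapter's exact code. The sizes and the layer names rnn_1 and rnn_2 are assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 10_000  # assumed vocabulary size
MAX_LEN = 100        # assumed review length in tokens, after padding

model = keras.Sequential([
    keras.Input(shape=(MAX_LEN,), dtype="int32"),
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=32),
    # First RNN: emit a Hidden State for every word so the next RNN has a sequence.
    layers.SimpleRNN(32, return_sequences=True, name="rnn_1"),
    # Last RNN: keep the default and emit only the final Hidden State.
    layers.SimpleRNN(32, name="rnn_2"),
    layers.Dense(1, activation="sigmoid"),
])

rnn_1_shape = tuple(model.get_layer("rnn_1").output.shape)
rnn_2_shape = tuple(model.get_layer("rnn_2").output.shape)
print(rnn_1_shape)  # (None, 100, 32): one state per word
print(rnn_2_shape)  # (None, 32): final state only
```

The leading None in each shape is the batch dimension, which Keras leaves open until data is passed in.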
7. The Vanishing Gradient Problem
SimpleRNN is brilliant in theory, but terrible in practice for long sequences.
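If a paragraph is 100 words long, the training signal (the gradient) for Word 1 has to travel backward through all 100 time steps. At each step it is multiplied by the recurrent weights and the activation's derivative, and when those factors are smaller than 1 the product shrinks toward zero (The Vanishing Gradient Problem). The network therefore learns almost nothing about how Word 1 should influence its output at Word 100.
*Result:* In practice, a SimpleRNN tends to capture only short-range context, roughly 10-15 words as a rule of thumb. For long documents, it suffers from severe amnesia.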
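The shrinking can be observed numerically. The sketch below is a NumPy illustration under stated assumptions, not a training run: the recurrent weights are random and deliberately rescaled so their largest singular value is 0.9, and random vectors stand in for the word inputs. It tracks how strongly the Hidden State at step t still depends on the starting state.

```python
import numpy as np

rng = np.random.default_rng(0)
UNITS = 16  # assumed hidden size

# Random recurrent weights, rescaled so the largest singular value is 0.9 (assumption).
W_h = rng.normal(size=(UNITS, UNITS))
W_h *= 0.9 / np.linalg.norm(W_h, 2)

h = np.zeros(UNITS)
grad = np.eye(UNITS)  # sensitivity of the current state to the starting state
norms = []
for t in range(100):
    # One simple-RNN step with a random stand-in for the word input.
    h = np.tanh(W_h @ h + rng.normal(size=UNITS))
    # Jacobian of this step: tanh derivative (1 - h^2) times the recurrent weights.
    jacobian = (1.0 - h**2)[:, None] * W_h
    # Chain rule: multiply the per-step Jacobians together.
    grad = jacobian @ grad
    norms.append(np.linalg.norm(grad))

for step in (1, 10, 50, 100):
    print(f"after {step:3d} steps: gradient norm = {norms[step - 1]:.2e}")
```

Because every per-step factor has norm below 1 here, the printed values fall steadily toward zero. With recurrent weights larger than 1 the opposite problem, exploding gradients, can appear instead.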
8. Common Mistakes
- Forgetting return_sequences=True: If you try to stack two RNNs without this parameter, Keras raises a shape error (typically reporting expected ndim=3, found ndim=2). The second RNN expects a sequence for each sample, shaped (time steps, features), but the first RNN passed it only a flat vector holding its final Hidden State.
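- Using RNNs for non-sequential data: If you use an RNN to process tabular data (like predicting house prices based on Square Footage and Beds), it is a poor fit. Tabular features have no logical time-step order, so the recurrence imposes an arbitrary ordering and rarely helps compared with a plain Dense network.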
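The first mistake can be reproduced in a few lines. This is a minimal sketch with assumed sizes; build_stack is a hypothetical helper written for this example, and the exact wording of the error message may differ between Keras versions.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_stack(return_sequences):
    """Hypothetical helper: two stacked SimpleRNN layers on a (20, 8) sequence."""
    return keras.Sequential([
        keras.Input(shape=(20, 8)),  # assumed: 20 time steps, 8 features each
        layers.SimpleRNN(16, return_sequences=return_sequences),
        layers.SimpleRNN(16),
    ])


# Without return_sequences=True, the second RNN receives a flat vector.
try:
    build_stack(return_sequences=False)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# With return_sequences=True on the first RNN, the stack builds cleanly.
fixed = build_stack(return_sequences=True)
print(fixed.output_shape)  # (None, 16)
```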
9. Best Practices
- Use RNNs for Baselines: While SimpleRNN is rarely used in modern production (due to the Vanishing Gradient problem), building one is an excellent way to baseline your NLP pipeline before moving to more complex models.
10. Exercises
- 1. What is the "Hidden State" in an RNN, and what biological function does it mimic?
- 2. If you are building a 3-layer RNN architecture, which layers need the return_sequences=True parameter?
11. MCQ Quiz with Answers
Question 1
What specific capability makes an RNN superior to a Dense network for processing text?
Question 2
Why is a SimpleRNN ineffective at processing very long documents?
12. Interview Questions
- Q: Explain the flow of data and the updating of the Hidden State through a single SimpleRNN layer during one time step.
- Q: When stacking recurrent layers in Keras, why is the return_sequences=True parameter necessary?