TensorFlow Introduction
CHAPTER 16 Intermediate

Saving, Loading, and Deploying Models

Updated: May 16, 2026

1. Introduction

A deep learning model delivers no value while it remains trapped inside a Jupyter Notebook. If you train a model for 48 hours on a cloud GPU to detect cancer in X-rays, you must be able to save that "brain" to a file, download it, and deploy it to hospital servers or mobile apps. In this chapter, we move from Model Training to Model Deployment: we learn how to serialize Keras models and introduce the technologies used to serve them to the world.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Save a trained Keras model to the hard drive.
  • Understand the SavedModel vs. .keras formats.
  • Load a saved model and make predictions.
  • Understand basic web deployment via REST APIs.
  • Convert a model to TensorFlow Lite for mobile deployment.

3. Saving a Keras Model

Saving a model in TensorFlow is a single line of code. When you save a model, TensorFlow saves:
  1. The Architecture (what layers you used).
  2. The Weights (the millions of numbers learned during training).
  3. The Optimizer State (so you can pause training today and resume it tomorrow exactly where you left off).
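```python
import tensorflow as tf

# Assume 'model' is a fully trained Keras Sequential model
print("Saving model to disk...")

# Modern Keras format (recommended): architecture + weights + optimizer state
model.save('my_classifier.keras')

# SavedModel format (a folder of protobuf files, used by TF Serving).
# In Keras 2 (TF <= 2.15), model.save('my_classifier_folder') produced it;
# in Keras 3 (TF 2.16+) that call raises an error, so use export() instead.
# The exported folder is inference-only: reload it with tf.saved_model.load().
model.export('my_classifier_folder')
```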
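You can check the three claims above yourself with a minimal round-trip sketch. It assumes a tiny hypothetical regression model trained on random data, and the file name `roundtrip_demo.keras` is made up. The script saves the model in the `.keras` format, reloads it, and then compares three things: the layer names, the predictions, and the optimizer's step counter.

```python
import numpy as np
import tensorflow as tf

# Hypothetical tiny model standing in for your trained classifier
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Train briefly on random data so the optimizer accumulates some state
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4)).astype("float32")
y = rng.normal(size=(32, 1)).astype("float32")
model.fit(X, y, epochs=2, batch_size=8, verbose=0)  # 4 steps/epoch -> 8 steps total

model.save("roundtrip_demo.keras")
restored = tf.keras.models.load_model("roundtrip_demo.keras")

# 1. Architecture: the same stack of layers
same_layers = [layer.name for layer in model.layers] == [layer.name for layer in restored.layers]
# 2. Weights: identical predictions on the same inputs
same_preds = np.allclose(model.predict(X, verbose=0), restored.predict(X, verbose=0))
# 3. Optimizer state: the step counter carries over, so training can resume
same_steps = int(model.optimizer.iterations.numpy()) == int(restored.optimizer.iterations.numpy())

print("architecture:", same_layers, "weights:", same_preds, "optimizer:", same_steps)
```

If any of the three checks prints `False`, the file was probably saved in a format that drops part of the package, for example weights only.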

4. Loading a Model

Imagine a software engineer writing the code for the backend web server. They don't need the training data or the training code; they just load the finished file.
```python
import tensorflow as tf
import numpy as np

# Load the entire model (architecture, weights, optimizer state) back into memory
loaded_model = tf.keras.models.load_model('my_classifier.keras')

# Verify it loaded correctly
loaded_model.summary()

# Create dummy input data: a batch containing one 28x28 grayscale image
# (assumes the model was trained on inputs of shape (28, 28))
dummy_image = np.zeros((1, 28, 28), dtype=np.float32)

# Make a prediction using the loaded model!
predictions = loaded_model.predict(dummy_image)
print("Prediction Array:", predictions)
```
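With a softmax output layer, the printed array holds one row of class probabilities per input image. The sketch below turns each row into a class index. It assumes a hypothetical untrained 10-class stand-in model saved as `digit_demo.keras`, so the snippet runs on its own. It also shows how to add the batch dimension that `predict()` expects when you have only a single image.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the saved classifier: 28x28 inputs, 10 classes
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.save("digit_demo.keras")

loaded_model = tf.keras.models.load_model("digit_demo.keras")

# predict() works on batches: shape (batch_size, 28, 28)
batch = np.zeros((3, 28, 28), dtype=np.float32)
probs = loaded_model.predict(batch, verbose=0)  # shape (3, 10): one row per image

# Each row is a probability distribution; argmax picks the most likely class
predicted_classes = np.argmax(probs, axis=1)

# A single image of shape (28, 28) needs a leading batch axis first
single = np.zeros((28, 28), dtype=np.float32)
single_probs = loaded_model.predict(single[np.newaxis, ...], verbose=0)

print(probs.shape, predicted_classes, single_probs.shape)
```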

5. Checkpointing During Training

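What if your computer crashes at Epoch 49 out of 50? Without a saved copy, you lose hours of training. To prevent this, we use the ModelCheckpoint callback, which saves the model to the hard drive during training. By default it saves after every epoch; with save_best_only=True, it saves only when the monitored metric improves.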
```python
from tensorflow.keras.callbacks import ModelCheckpoint

# Save only the best model (lowest validation loss) to 'best_model.keras'.
# With save_best_only=True the file is overwritten only when val_loss improves;
# set it to False to save after every epoch instead.
checkpoint = ModelCheckpoint(
    filepath='best_model.keras',
    monitor='val_loss',
    save_best_only=True
)

# Pass the callback into model.fit
# model.fit(X_train, y_train, epochs=50, validation_split=0.2, callbacks=[checkpoint])
```
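The sketch below shows what recovery after a crash can look like. It uses a hypothetical tiny regression model (`build_model()` and the random data are made up for illustration). A first session trains for 3 epochs with the checkpoint callback. A second session reloads `best_model.keras` and continues with `initial_epoch=3`. Because `save_best_only=True` keeps the *best* epoch rather than the *latest*, resuming this way is an approximation: the reloaded weights may come from an earlier epoch than the last one that ran.

```python
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint

def build_model():
    # Hypothetical tiny regression model standing in for your real one
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4)).astype("float32")
y_train = X_train.sum(axis=1, keepdims=True)

checkpoint = ModelCheckpoint(filepath="best_model.keras", monitor="val_loss", save_best_only=True)

# Session 1: imagine the machine crashes after epoch 3
model = build_model()
model.fit(X_train, y_train, epochs=3, validation_split=0.2, callbacks=[checkpoint], verbose=0)
print("checkpoint on disk:", os.path.exists("best_model.keras"))

# Session 2: reload the checkpoint (weights + optimizer state) and continue to epoch 6
resumed = tf.keras.models.load_model("best_model.keras")
history = resumed.fit(
    X_train, y_train,
    initial_epoch=3, epochs=6,  # with initial_epoch, 'epochs' means the final epoch
    validation_split=0.2, callbacks=[checkpoint], verbose=0,
)
print("epochs run in session 2:", len(history.history["loss"]))
```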

6. Deployment Basics (REST API)

To allow a mobile app or a website to use your TensorFlow model, you usually wrap it in a REST API using a Python web framework like FastAPI or Flask.
  1. The web server loads your .keras file into memory once, when it starts.
  2. A user uploads a photo on the website. The website sends an HTTP POST request containing the image data to your API.
  3. The Python API converts the image into a NumPy array, applies the same preprocessing used during training, and calls loaded_model.predict().
  4. The API sends the result ("It's a Dog!") back to the website as JSON.
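Steps 2 to 4 can be sketched without any web framework. The function `handle_predict` is hypothetical: it stands for the body you would place inside a FastAPI or Flask POST route. The sketch makes several assumptions, all of which you would adapt to your own app:

- the request is JSON with a made-up `"pixels"` field holding a 28x28 image;
- `CLASS_NAMES` is a hypothetical label list;
- training scaled pixels by dividing by 255.0.

A small untrained model stands in for the file loaded in step 1.

```python
import json
import numpy as np
import tensorflow as tf

# Step 1 (stand-in): in a real server this would be
# tf.keras.models.load_model('my_classifier.keras'), run once at startup
loaded_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3, activation="softmax"),
])
CLASS_NAMES = ["Cat", "Dog", "Bird"]  # hypothetical labels, in training order

def handle_predict(request_body: str) -> str:
    """Hypothetical body of a POST /predict route (steps 2-4)."""
    payload = json.loads(request_body)                       # step 2: JSON from the website
    image = np.asarray(payload["pixels"], dtype=np.float32)  # step 3: to a NumPy array
    image = image / 255.0                                    # assumed: same scaling as training
    probs = loaded_model.predict(image[np.newaxis, ...], verbose=0)[0]
    label = CLASS_NAMES[int(np.argmax(probs))]
    return json.dumps({"label": label, "confidence": float(np.max(probs))})  # step 4

# Simulated request: a 28x28 image encoded as nested lists
fake_request = json.dumps({"pixels": np.zeros((28, 28)).tolist()})
print(handle_predict(fake_request))
```

The model is loaded once at module level rather than inside the handler. Reloading it on every request would add the full load time to each prediction.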

*(Note: At large enterprise scale, companies often use TensorFlow Serving, part of the TFX ecosystem. It is a dedicated C++ server that loads models in the SavedModel format and answers prediction requests over REST or gRPC at high throughput, without Flask.)*
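TensorFlow Serving reads SavedModels from numbered version folders under a model base path. The sketch below exports a hypothetical stand-in model with `model.export()` (available in TF 2.13+ and Keras 3) into the made-up path `serving_models/my_classifier/1`. It then reloads the folder with plain `tf.saved_model.load()` and calls its `serve` endpoint to confirm the export matches the Keras model.

```python
import os
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for your trained model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# TF Serving expects <model_base_path>/<version_number>/saved_model.pb
export_path = os.path.join("serving_models", "my_classifier", "1")
model.export(export_path)  # writes an inference-only SavedModel

print(sorted(os.listdir(export_path)))  # includes saved_model.pb and variables

# Sanity check: reload with plain TensorFlow and call the 'serve' endpoint
reloaded = tf.saved_model.load(export_path)
x = np.ones((1, 4), dtype="float32")
print("matches Keras:", np.allclose(reloaded.serve(x), model(x)))
```

You can point TF Serving at `serving_models/my_classifier`, for example with the official `tensorflow/serving` Docker image and `MODEL_NAME=my_classifier`. It then serves REST requests at `http://localhost:8501/v1/models/my_classifier:predict`, with JSON bodies such as `{"instances": [[1.0, 1.0, 1.0, 1.0]]}`.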

7. TensorFlow Lite (Mobile and IoT)

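If you want your model to run directly on an iPhone or Android phone (without needing internet access), you generally cannot ship the .keras file as-is. Running it requires the full TensorFlow runtime, which is too large and too power-hungry for a mobile device. TensorFlow Lite (TFLite) solves this with two pieces. A converter shrinks your model into a compact, efficient .tflite file, and a lightweight interpreter runs that file on the device.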
```python
import tensorflow as tf

# Convert a standard Keras model to TF Lite
# ('model' is assumed to be the trained Keras model from earlier)
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Apply dynamic-range quantization: weights are stored as 8-bit integers
# instead of 32-bit floats, shrinking the file by up to roughly 4x
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()

# Save the compact file to disk
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
```

*An Android developer can now add this model.tflite file to an Android Studio project and run it on the device with the TensorFlow Lite interpreter!*
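Before handing the file to a mobile developer, you can run it on your desktop with the same TFLite interpreter a phone would use. This sketch uses a hypothetical untrained 28x28 classifier and converts it through an exported SavedModel (`model.export()` followed by `TFLiteConverter.from_saved_model()`), an alternative route to `from_keras_model()`. The folder name `tflite_export` is made up. The script compares the size of a float32 conversion with a quantized one, then checks that the quantized model's output stays close to the Keras output.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the trained 28x28 classifier
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Export once, then convert twice: plain float32 and quantized
export_dir = "tflite_export"
model.export(export_dir)

def convert(quantize: bool) -> bytes:
    converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
    if quantize:
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()

float_model = convert(quantize=False)
quant_model = convert(quantize=True)
print(f"float32: {len(float_model)} bytes, quantized: {len(quant_model)} bytes")

# Run the quantized model with the TFLite interpreter, as the phone app would
interpreter = tf.lite.Interpreter(model_content=quant_model)
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]
interpreter.resize_tensor_input(input_index, [1, 28, 28])  # a batch of one image
interpreter.allocate_tensors()

image = np.random.default_rng(0).random((1, 28, 28), dtype=np.float32)
interpreter.set_tensor(input_index, image)
interpreter.invoke()
tflite_probs = interpreter.get_tensor(output_index)

keras_probs = model(image).numpy()
print("largest difference vs. Keras:", np.abs(tflite_probs - keras_probs).max())
```

Quantization trades a little accuracy for size, so expect small differences rather than an exact match.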

8. Common Mistakes

  • Saving only Weights: You can call model.save_weights('model.weights.h5') (Keras 3 requires the .weights.h5 suffix). However, this *only* saves the numbers. To load them later, you must first rebuild the exact same layer architecture in code and then call model.load_weights(). Use model.save() when you want the entire package.
  • Preprocessing Mismatch: If your training script scales images by dividing by 255.0, your Flask web API MUST also divide incoming user images by 255.0 before calling .predict(). If you forget, the model will output garbage predictions.
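One way to avoid a preprocessing mismatch is to keep the preprocessing in a single function that both the training script and the web API import. The sketch below assumes uint8 pixel input and divide-by-255 scaling. The function name `preprocess` and the idea of a shared `preprocessing.py` module are hypothetical.

```python
import numpy as np

# Hypothetical shared function (e.g. in preprocessing.py), imported by BOTH
# the training script and the web API so the scaling cannot drift apart.
def preprocess(images) -> np.ndarray:
    """Convert raw 0-255 pixels to float32 in [0, 1], with a batch axis."""
    x = np.asarray(images, dtype=np.float32) / 255.0
    if x.ndim == 2:  # a single 28x28 image -> add the batch dimension
        x = x[np.newaxis, ...]
    return x

# Training side: a batch of raw uint8 images
train_images = np.random.default_rng(0).integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
X_train = preprocess(train_images)

# Serving side: one raw image from a user upload
user_image = train_images[0]
X_request = preprocess(user_image)

print(X_train.shape, X_request.shape, float(X_request.max()) <= 1.0)
```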

9. Best Practices

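  • Use .keras format: In older tutorials, you will see models saved as .h5 (HDF5), which Keras now treats as a legacy format. Use the modern .keras extension to save and reload Keras models. Export a SavedModel folder (with model.export() in recent versions) when a serving tool such as TensorFlow Serving needs one.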

10. Exercises

  1. Write the code to load a saved model named sentiment_v1.keras into a variable named production_model.
  2. Why is TensorFlow Lite necessary for mobile deployment?

11. MCQ Quiz with Answers

Question 1

What does model.save('my_model.keras') actually save to the hard drive?

Question 2

Which TensorFlow Callback is used to save the model automatically during training, ensuring you don't lose progress if your computer crashes?

12. Interview Questions

  • Q: Explain the difference between deploying a model via a REST API (like FastAPI) versus deploying it using TensorFlow Lite on an edge device.
  • Q: If you receive a raw user image via a web API, what crucial steps must occur before you can pass it to your loaded Keras model for prediction?

13. FAQs

Q: What is TensorFlow.js? A: Similar to TF Lite, TensorFlow.js allows you to convert your Keras model and run it directly in a user's web browser using JavaScript. This uses the user's local hardware to make predictions, saving you from paying for expensive cloud servers!

14. Summary

Saving and loading models is the bridge between the Data Scientist and the Software Engineer. By using Keras's built-in saving functions, protecting training runs with ModelCheckpoint, and understanding deployment pathways like REST APIs and TF Lite, we ensure our AI models can actually reach the end user.

15. Next Chapter Recommendation

When working with 100GB datasets, your program will run out of RAM if you try to load everything at once using Pandas or standard loops. In Chapter 17: TensorFlow Data Pipelines, we will learn the industrial-grade tf.data API to stream data into the GPU with maximum efficiency.
