CHAPTER 16 Intermediate

Saving, Loading, and Deploying Models

Updated: May 16, 2026

1. Introduction

A deep learning model delivers no value while it remains trapped inside a Jupyter Notebook. If you train a model for 48 hours on a cloud GPU to detect cancer in X-rays, you must be able to save that "brain" to a file, download it, and deploy it onto hospital servers or mobile apps. In this chapter, we transition from model training to model deployment: we learn how to serialize Keras models and introduce the technologies used to serve them to the world.

2. Learning Objectives

By the end of this chapter, you will be able to:

Save a trained Keras model to the hard drive.

Understand the SavedModel vs. .keras formats.

Load a saved model and make predictions.

Understand basic web deployment via REST APIs.

Convert a model to TensorFlow Lite for mobile deployment.

3. Saving a Keras Model

Saving a model in TensorFlow is a single line of code. When you save a model, TensorFlow saves:

1. The Architecture (what layers you used).

2. The Weights (the millions of numbers learned during training).

3. The Optimizer State (so you can pause training today and resume it tomorrow exactly where you left off).

python

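import tensorflow as tf

# Assume 'model' is a fully trained Keras Sequential model
print("Saving model to disk...")

# Modern Keras format: a single file (recommended for Keras 3 / TF 2.13+)
model.save('my_classifier.keras')

# SavedModel format (creates a folder containing protobuf files).
# In older TF 2.x releases that bundle Keras 2, a path without an extension writes a SavedModel:
#     model.save('my_classifier_folder')
# Keras 3 rejects extension-less paths in model.save(); export the inference graph instead:
model.export('my_classifier_folder')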

4. Loading a Model

Imagine a software engineer is writing the code for the backend web server. They don't need the training data or the compilation code. They just load the finished file.

python

import tensorflow as tf
import numpy as np

# Load the entire model back into memory
loaded_model = tf.keras.models.load_model('my_classifier.keras')

# Verify it loaded correctly
loaded_model.summary()

# Create dummy input data (e.g., a single 28x28 grayscale image)
dummy_image = np.zeros((1, 28, 28))

# Make a prediction using the loaded model!
predictions = loaded_model.predict(dummy_image)
print("Prediction Array:", predictions)
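A quick way to gain confidence in a saved file is to check that the reloaded model reproduces the original model's outputs. The sketch below does this for a small, untrained stand-in model; the architecture, the `roundtrip_check.keras` file name, and the random input batch are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a trained classifier: 28x28 input, 10 classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Save to a temporary location and load the file back.
path = os.path.join(tempfile.mkdtemp(), "roundtrip_check.keras")
model.save(path)
reloaded = tf.keras.models.load_model(path)

# Feed the same random batch to both copies.
rng = np.random.default_rng(seed=0)
batch = rng.random((4, 28, 28)).astype("float32")

original_out = model.predict(batch, verbose=0)
reloaded_out = reloaded.predict(batch, verbose=0)

# Identical weights should give numerically identical outputs.
outputs_match = bool(np.allclose(original_out, reloaded_out, atol=1e-6))
print("Outputs match after reload:", outputs_match)
```

The same comparison can be run once against a real trained model before handing the file to the backend team.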

5. Checkpointing During Training

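What if your computer crashes at Epoch 49 out of 50? You lose hours of training. To prevent this, we use the ModelCheckpoint callback. By default it saves the model to the hard drive at the end of every epoch; with save_best_only=True, as in the example below, it overwrites the file only when the monitored metric (here, validation loss) improves.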

python

from tensorflow.keras.callbacks import ModelCheckpoint

# Save only the best model (lowest validation loss) to 'best_model.keras'
checkpoint = ModelCheckpoint(
    filepath='best_model.keras',
    monitor='val_loss',
    save_best_only=True
)

# Pass the callback into model.fit
# model.fit(X_train, y_train, epochs=50, validation_split=0.2, callbacks=[checkpoint])
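For readers who want to run the callback end to end, here is a minimal sketch on synthetic data. The dataset, the two-layer model, the three-epoch run, and the temporary checkpoint path are all assumptions chosen to keep the example fast; they are not the chapter's actual training setup.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(seed=0)
X_train = rng.random((200, 8)).astype("float32")
y_train = X_train.sum(axis=1, keepdims=True)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Keep only the model with the lowest validation loss seen so far.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "best_model.keras")
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    monitor="val_loss",
    save_best_only=True,
)

history = model.fit(
    X_train, y_train,
    epochs=3,
    validation_split=0.2,
    callbacks=[checkpoint],
    verbose=0,
)

# After a crash, a fresh process would start from this file.
restored = tf.keras.models.load_model(checkpoint_path)
restored_preds = restored.predict(X_train[:2], verbose=0)
print("Checkpoint exists:", os.path.exists(checkpoint_path))
print("Epochs recorded:", len(history.history["val_loss"]))
```

Because the saved file includes the optimizer state, calling `restored.fit(...)` would continue training from the checkpointed model rather than from scratch.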

6. Deployment Basics (REST API)

To allow a mobile app or a website to use your TensorFlow model, you usually wrap it in a REST API using a Python web framework like FastAPI or Flask.

1. The web server loads your .keras file into memory.

2. A user uploads a photo on the website. The website sends an HTTP POST request containing the image data to your API.

3. The Python API converts the image into a NumPy array and calls loaded_model.predict().

4. The API sends the result ("It's a Dog!") back to the website via JSON.
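The four steps above can be sketched with Flask as follows. This is a simplified illustration, not a production server: the `/predict` route, the `pixels` JSON field (a 28x28 nested list instead of an uploaded photo file), the response keys, and the `create_app` factory are all hypothetical names chosen for this example. The factory accepts any object with a `.predict()` method so the app can be exercised without a real model.

```python
import numpy as np
from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app around any object that exposes .predict(batch)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Step 2: the client sends the image data in an HTTP POST request.
        payload = request.get_json(silent=True)
        if not isinstance(payload, dict) or "pixels" not in payload:
            return jsonify({"error": "missing 'pixels' field"}), 400

        # Step 3: convert the request data into a NumPy batch of one image.
        try:
            image = np.asarray(payload["pixels"], dtype="float32")
        except (TypeError, ValueError):
            return jsonify({"error": "'pixels' must be numeric"}), 400
        if image.shape != (28, 28):
            return jsonify({"error": "expected a 28x28 image"}), 400

        # Assumption: the model was trained on pixels scaled to [0, 1].
        batch = image[np.newaxis, ...] / 255.0
        probabilities = np.asarray(model.predict(batch))[0]

        # Step 4: send the result back as JSON.
        return jsonify({
            "class_index": int(np.argmax(probabilities)),
            "confidence": float(np.max(probabilities)),
        })

    return app
```

To serve a real model, one would load it once at startup (step 1), for example `app = create_app(tf.keras.models.load_model('my_classifier.keras'))`, and run the app behind a WSGI server. Loading once, rather than per request, avoids paying the load cost on every prediction.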

*(Note: At large enterprise scale, companies often use TensorFlow Serving, a dedicated C++ server from the TFX ecosystem designed to handle thousands of predictions per second without a Python web framework.)*

7. TensorFlow Lite (Mobile and IoT)

If you want your model to run directly on an iPhone (without needing internet access), shipping the full .keras file together with the full TensorFlow runtime is impractical: it costs too much storage, memory, and battery. TensorFlow Lite (TFLite) provides a converter that shrinks your model into a compact, efficient .tflite file, plus a lightweight on-device runtime to execute it.

python

import tensorflow as tf

# Convert a standard Keras model to TF Lite
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Apply default post-training quantization. Without a representative dataset this is
# dynamic-range quantization: weights are stored as 8-bit integers instead of 32-bit floats.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()

# Save the tiny file to disk
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

*An Android developer can now drag and drop this model.tflite file straight into Android Studio!*
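To make the float-to-integer idea concrete, here is a small NumPy-only sketch of symmetric 8-bit quantization applied to a single weight matrix. It illustrates the principle only and is not TFLite's actual implementation; the matrix shape, the random weights, and the single per-tensor scale are simplifying assumptions.

```python
import numpy as np

# Hypothetical layer weights: 256 x 128 values stored as 32-bit floats.
rng = np.random.default_rng(seed=0)
weights = rng.normal(0.0, 0.1, size=(256, 128)).astype(np.float32)

# One scale for the whole tensor: map the largest magnitude to 127.
scale = float(np.abs(weights).max()) / 127.0
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to see how much precision was lost.
restored = quantized.astype(np.float32) * scale
max_error = float(np.abs(weights - restored).max())

size_ratio = weights.nbytes / quantized.nbytes
print("float32 bytes:", weights.nbytes)
print("int8 bytes:   ", quantized.nbytes)
print("size ratio:   ", size_ratio)
print("max abs error:", max_error)
```

Each weight shrinks from four bytes to one, and the rounding error per weight is bounded by half the scale. That trade of a small precision loss for a roughly fourfold reduction in weight storage is what makes quantized models attractive on phones.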

8. Common Mistakes

Saving only Weights: You can call model.save_weights('model.weights.h5') (Keras 3 requires the .weights.h5 suffix; older versions also accepted names such as 'weights.h5'). However, this *only* saves the numbers. To load them later, you must first rebuild the exact same layer architecture in code. Prefer model.save() so that the entire package is stored together.

Preprocessing Mismatch: If your training script scales images by dividing by 255.0, your Flask web API MUST also divide incoming user images by 255.0 before calling .predict(). If you forget, the model will output garbage predictions.
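One way to guard against a preprocessing mismatch is to keep the scaling logic in a single function that both the training script and the API import. The sketch below assumes 28x28 grayscale images with 0-255 pixel values; the `preprocess` name and the all-white test image are hypothetical.

```python
import numpy as np


def preprocess(image_uint8):
    """Scale raw 0-255 pixel values to the 0-1 range used during training."""
    return np.asarray(image_uint8, dtype=np.float32) / 255.0


# The training script and the API handler both call the same function.
raw_image = np.full((28, 28), 255, dtype=np.uint8)  # an all-white test image

training_input = preprocess(raw_image)
serving_input = preprocess(raw_image)[np.newaxis, ...]  # add the batch axis

print(training_input.max(), serving_input.shape)
```

With a single shared function, a change to the scaling rule automatically applies to both training and serving.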

9. Best Practices

Use .keras format: In older tutorials, you will see models saved as .h5. This HDF5 format is now considered legacy. Use the modern .keras extension for saving and reloading in Keras, and the SavedModel folder format when a serving tool such as TensorFlow Serving requires it.

10. Exercises

1. Write the code to load a saved model named sentiment_v1.keras into a variable named production_model.

2. Why is TensorFlow Lite necessary for mobile deployment?

11. MCQ Quiz with Answers

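Question 1

What does `model.save('my_model.keras')` actually save to the hard drive?

Answer: The entire model: its architecture, its learned weights, and the optimizer state.

Question 2

Which TensorFlow Callback is used to save the model automatically during training, ensuring you don't lose progress if your computer crashes?

Answer: The ModelCheckpoint callback.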

12. Interview Questions

Q: Explain the difference between deploying a model via a REST API (like FastAPI) versus deploying it using TensorFlow Lite on an edge device.

Q: If you receive a raw user image via a web API, what crucial steps must occur before you can pass it to your loaded Keras model for prediction?

13. FAQs

Q: What is TensorFlow.js? A: Similar to TF Lite, TensorFlow.js allows you to convert your Keras model and run it directly in a user's web browser using JavaScript. This uses the user's local hardware to make predictions, saving you from paying for expensive cloud servers!

14. Summary

Saving and loading models is the bridge between the Data Scientist and the Software Engineer. By using Keras's built-in saving functions, protecting training runs with ModelCheckpoint, and understanding deployment pathways like REST APIs and TF Lite, we ensure our AI models can actually reach the end user.

15. Next Chapter Recommendation

When working with 100GB datasets, you will run out of RAM if you try to load everything at once using Pandas or standard loops. In Chapter 17: TensorFlow Data Pipelines, we will learn the industrial-grade tf.data API to stream data into the GPU with maximum efficiency.
