# CHAPTER 16
Saving, Loading, and Deploying Models
1. Introduction
A deep learning model is entirely useless if it remains trapped inside a Jupyter Notebook. If you train a model for 48 hours on a cloud GPU to detect cancer in X-rays, you must be able to save that "brain" to a file, download it, and deploy it onto hospital servers or mobile apps. In this chapter, we transition from Model Training to Model Deployment: we learn how to serialize Keras models and introduce the technologies used to serve them to the world.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Save a trained Keras model to the hard drive.
- Understand the SavedModel vs. `.keras` formats.
- Load a saved model and make predictions.
- Understand basic web deployment via REST APIs.
- Describe how TensorFlow Lite supports mobile deployment.
3. Saving a Keras Model
Saving a model in TensorFlow is a single line of code. When you save a model, TensorFlow saves:
- 1. The Architecture (what layers you used).
- 2. The Weights (the millions of numbers learned during training).
- 3. The Optimizer State (so you can pause training today and resume it tomorrow exactly where you left off).
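As a minimal sketch of that single line, the example below builds a tiny model, trains it for one epoch on random data, and saves it. The layer sizes, the random data, and the file name `my_model.keras` are assumptions chosen only so the example runs on its own; in practice `model` would be your trained network.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Tiny hypothetical model and random data, used only so there is something to save.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

x_train = np.random.rand(32, 4).astype("float32")
y_train = np.random.randint(0, 2, size=(32, 1)).astype("float32")
model.fit(x_train, y_train, epochs=1, verbose=0)

# The single line that matters: architecture, weights, and optimizer state
# all go into one .keras file. A temporary folder keeps the example tidy.
save_path = os.path.join(tempfile.mkdtemp(), "my_model.keras")
model.save(save_path)
```

The optimizer state is included here because the model was compiled before saving.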
4. Loading a Model
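Loading is the mirror image of saving: one call to `tf.keras.models.load_model` rebuilds the model from the file. The sketch below first creates and saves a tiny stand-in model so that it can run on its own; the model, the file name, and the input shape are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Stand-in for the training side: create and save a tiny hypothetical model.
model_path = os.path.join(tempfile.mkdtemp(), "my_model.keras")
trained = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
trained.save(model_path)

# Deployment side: no training data and no layer definitions are needed.
loaded_model = tf.keras.models.load_model(model_path)

# Predict on three new samples with four features each.
new_samples = np.random.rand(3, 4).astype("float32")
predictions = loaded_model.predict(new_samples, verbose=0)
print(predictions.shape)  # (3, 1)
```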
Imagine a software engineer is writing the code for the backend web server. They don't need the training data or the compilation code. They just load the finished file.
5. Checkpointing During Training
What if your computer crashes at Epoch 49 out of 50? You lose hours of training. To prevent this, we use the `ModelCheckpoint` callback. It automatically saves the model to the hard drive at the end of every epoch.
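A minimal sketch of checkpointing, assuming a tiny model and random data as stand-ins for a real training job. The file name `checkpoint.keras` and the epoch counts are illustrative. After a simulated crash, the checkpoint is reloaded and training continues from epoch 3.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical model and data, only so the example is self-contained.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
x_train = np.random.rand(64, 4).astype("float32")
y_train = np.random.randint(0, 2, size=(64, 1)).astype("float32")

# The callback overwrites this file with the full model after every epoch.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.keras")
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path)

model.fit(x_train, y_train, epochs=3, callbacks=[checkpoint_cb], verbose=0)

# After a crash: reload the last checkpoint and pick up at epoch 3.
restored = tf.keras.models.load_model(checkpoint_path)
restored.fit(x_train, y_train, initial_epoch=3, epochs=5, verbose=0)
```

Passing `save_best_only=True` to `ModelCheckpoint` is a common variation that keeps only the epoch with the best monitored metric instead of the most recent one.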
6. Deployment Basics (REST API)
To allow a mobile app or a website to use your TensorFlow model, you usually wrap it in a REST API using a Python web framework like FastAPI or Flask.
- 1. The web server loads your `.keras` file into memory.
- 2. A user uploads a photo on the website. The website sends an HTTP `POST` request containing the image data to your API.
- 3. The Python API converts the image into a NumPy array and calls `loaded_model.predict()`.
- 4. The API sends the result ("It's a Dog!") back to the website via JSON.
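The four steps above can be sketched with Flask roughly as follows. This is an illustration under stated assumptions, not a production server: the `/predict` route, the `image` form field, the 32x32 input size, the `cat`/`dog` labels, and the tiny untrained stand-in model are all hypothetical.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

IMG_SIZE = (32, 32)           # hypothetical input size
CLASS_NAMES = ["cat", "dog"]  # hypothetical labels

# Stand-in for a real trained model so the sketch runs end to end.
_model_path = os.path.join(tempfile.mkdtemp(), "model.keras")
tf.keras.Sequential([
    tf.keras.Input(shape=(*IMG_SIZE, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(len(CLASS_NAMES), activation="softmax"),
]).save(_model_path)

app = Flask(__name__)

# Step 1: load the .keras file into memory once, at server start-up.
loaded_model = tf.keras.models.load_model(_model_path)


@app.route("/predict", methods=["POST"])
def predict():
    # Step 2: read the uploaded image bytes from the POST request.
    image_bytes = request.files["image"].read()

    # Step 3: decode, resize, scale to 0-1, and add a batch dimension.
    image = tf.io.decode_image(image_bytes, channels=3, expand_animations=False)
    image = tf.image.resize(image, IMG_SIZE) / 255.0
    batch = np.expand_dims(image.numpy(), axis=0)
    probabilities = loaded_model.predict(batch, verbose=0)[0]

    # Step 4: send the result back as JSON.
    label = CLASS_NAMES[int(np.argmax(probabilities))]
    return jsonify({"label": label, "confidence": float(np.max(probabilities))})
```

Loading the model once at start-up, rather than inside the request handler, avoids re-reading the file on every request. The app can be started with Flask's development server for local testing.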
*(Note: For massive enterprise scale, companies use TensorFlow Serving (part of the TFX ecosystem), a dedicated C++ server designed specifically to handle thousands of predictions per second without Flask.)*
7. TensorFlow Lite (Mobile and IoT)
If you want your model to run directly on an iPhone (without needing internet access), you cannot use the massive `.keras` file. It takes up too much space and battery.
TensorFlow Lite (TFLite) provides a converter that crushes your model down to a tiny, highly efficient `.tflite` file, along with a lightweight runtime for executing that file on the device.
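A minimal sketch of the conversion, assuming a tiny stand-in model and the output name `model.tflite`. The optional `optimizations` setting asks the converter to apply its default size and latency optimizations.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-in for a trained Keras model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Convert the in-memory Keras model to the TFLite flat-buffer format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional optimizations
tflite_bytes = converter.convert()

# Write the converted model to disk.
tflite_path = os.path.join(tempfile.mkdtemp(), "model.tflite")
with open(tflite_path, "wb") as f:
    f.write(tflite_bytes)
```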
*Once the model is converted, an Android developer can drag and drop the resulting `model.tflite` file straight into Android Studio!*
8. Common Mistakes
- Saving only Weights: You can call `model.save_weights('model.weights.h5')` (recent Keras versions require the `.weights.h5` suffix). However, this *only* saves the numbers. If you try to load them later, you must first manually rewrite the exact `Sequential` layer architecture. Always use `model.save()` to save the entire package.
- Preprocessing Mismatch: If your training script scales images by dividing by 255.0, your Flask web API MUST also divide incoming user images by 255.0 before calling `.predict()`. If you forget, the model will output garbage predictions.
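One way to guard against a preprocessing mismatch is to keep the scaling in a single function that both the training script and the web API import. The sketch below assumes 0-255 pixel inputs; the function name `preprocess` is hypothetical.

```python
import numpy as np


def preprocess(image):
    """Scale raw 0-255 pixel values to the 0.0-1.0 range used in training."""
    return np.asarray(image, dtype="float32") / 255.0


# The training script and the API both call the same function,
# so the model always sees inputs on the same scale.
raw_image = np.array([[0, 128, 255]], dtype="uint8")
scaled = preprocess(raw_image)
```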
9. Best Practices
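- Use `.keras` format: In older tutorials, you will see models saved as `.h5`. This format is now considered legacy. Prefer the modern `.keras` extension, which is the default in recent Keras versions; the SavedModel folder format is mainly used when exporting a model for serving.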
10. Exercises
- 1. Write the code to load a saved model named `sentiment_v1.keras` into a variable named `production_model`.
- 2. Why is TensorFlow Lite necessary for mobile deployment?
11. MCQ Quiz with Answers
What does model.save('my_model.keras') actually save to the hard drive?
Which TensorFlow Callback is used to save the model automatically during training, ensuring you don't lose progress if your computer crashes?
12. Interview Questions
- Q: Explain the difference between deploying a model via a REST API (like FastAPI) versus deploying it using TensorFlow Lite on an edge device.
- Q: If you receive a raw user image via a web API, what crucial steps must occur before you can pass it to your loaded Keras model for prediction?
13. FAQs
Q: What is TensorFlow.js?
A: Similar to TF Lite, TensorFlow.js allows you to convert your Keras model and run it directly in a user's web browser using JavaScript. This uses the user's local hardware to make predictions, saving you from paying for expensive cloud servers!
14. Summary
Saving and loading models is the bridge between the Data Scientist and the Software Engineer. By using Keras's built-in saving functions, relying on `ModelCheckpoint` to secure our training sessions, and understanding deployment pathways like REST APIs and TF Lite, we ensure our AI models can actually reach the end-user.
15. Next Chapter Recommendation
When working with 100GB datasets, you will run out of RAM if you try to load everything at once using Pandas or standard loops. In Chapter 17: TensorFlow Data Pipelines, we will learn the industrial-grade `tf.data` API to stream data into the GPU with maximum efficiency.