
Serving Keras Models With TensorFlow
The next step after training a model with TensorFlow is to deploy it. TensorFlow Serving is a great choice for serving TensorFlow models.
Discover how to serve models using TensorFlow Serving in this article.
What is TensorFlow Serving?
TensorFlow Serving is a tool for deploying machine learning models in production environments. It is not limited to serving TensorFlow models and exposes an API for querying the deployed model. It loads your machine learning models, serves them to clients, and manages all of their versions.
TensorFlow Serving Architecture
TensorFlow Serving is made up of:
- Servables, the central abstraction that clients use to perform computation
- Servable versions, which enable serving different models and configurations side by side (a sample version-pinning config follows this list)
- Models, which are represented as servables
- Loaders, which manage the lifecycle of a servable
- Sources, which find and provide servables
- Aspired versions, the set of servable versions that should be loaded and ready
- Managers, which handle the full lifecycle of servables
- The core, which manages servable lifecycles and metrics
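As a concrete example of servable versions, TensorFlow Serving can read a model config file that pins which versions of a model to load. The sketch below is only an illustration: the models.config file name and the /models/vgg16 base path are assumptions chosen to match the deployment example later in this article.
# Hypothetical models.config telling TensorFlow Serving to load only
# version 1 of the vgg16 model.
cat > models.config <<'EOF'
model_config_list {
  config {
    name: "vgg16"
    base_path: "/models/vgg16"
    model_platform: "tensorflow"
    model_version_policy {
      specific { versions: 1 }
    }
  }
}
EOF
# The file is passed to the server with the --model_config_file flag when starting it.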
How to Install TensorFlow Serving
Install TensorFlow Serving via Docker to start using it to deploy models:
docker pull tensorflow/serving
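If you want to confirm the download, listing the local images is a quick, TensorFlow-agnostic check:
# List the pulled tensorflow/serving images; the tag and size confirm the pull worked.
docker image ls tensorflow/serving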
Serving a Keras Image Classification Model With TensorFlow Serving
You have already seen how to build an image classification model with Keras and Convolutional Neural Networks. Let's therefore look at how to deploy a pre-trained Keras model.
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np
The first step is to download the model:
model = VGG16(weights='imagenet')
Next, test the model on a sample image to make sure you get the expected results. The process involves:
- Setting the path to the image
- Loading the image using Keras
- Converting the image to an array
- Processing the image using the model’s preprocessing function
- Running predictions
- Getting the predicted labels using the model's decode function
img_path = 'cow.jpeg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
# Predicted: [('n02129165', 'lion', 0.9999999), ('n02130308',
# 'cheetah', 7.703386e-08), ('n02128385', 'leopard', 6.330456e-09)]
You are now certain that the model is working as expected. The next step is to save it in readiness for deployment.
model.save('vgg16/1')
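Before deploying, you can optionally inspect what was exported. The saved_model_cli tool that ships with TensorFlow prints the serving signature (input and output tensor names and shapes); this sketch assumes you run it from the directory that contains the vgg16/1 folder created above.
# Show the default serving signature of the exported SavedModel.
saved_model_cli show --dir vgg16/1 --tag_set serve --signature_def serving_default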
The final step is to deploy the model using TensorFlow Serving. Here is the meaning of the parameters in the following deployment command:
- --name is the name of the container running the model. Any name will suffice.
- -p 8501:8501 publishes the container's port 8501 to port 8501 on the local machine, making it possible to call the model running in the container.
- --mount type=bind,source=/fullpath/vgg16/,target=/models/vgg16 mounts the model located at the source path into the container's /models/vgg16 folder.
- -e MODEL_NAME=vgg16 tells TensorFlow Serving to load the model called vgg16.
- -t tensorflow/serving attaches a terminal and specifies the image to run, which is the one you downloaded previously.
Adding & at the end of the command will run the container in the background.
Run the command below on the terminal to serve the model:
docker run -p 8501:8501 --name tfserving_vgg16 \
  --mount type=bind,source=/home/derrick/DataScience/serving/vgg16/,target=/models/vgg16 \
  -e MODEL_NAME=vgg16 -t tensorflow/serving
The command will provision a REST API at http://localhost:8501.
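Before sending predictions, it is worth confirming that the model actually loaded. TensorFlow Serving exposes a model status endpoint on the same port:
# Returns the loaded version(s) of vgg16 and their state (e.g. AVAILABLE).
curl http://localhost:8501/v1/models/vgg16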
Making API Calls to the Served Model
The final step is to make API calls to the deployed model. You can do this by sending the calls to http://localhost:8501/v1/models/vgg16:predict.
import json
import requests

# Build the JSON payload from the preprocessed image batch created earlier.
data = json.dumps({"instances": x.tolist()})
headers = {"content-type": "application/json"}

# Send the request to the REST endpoint exposed by TensorFlow Serving.
json_response = requests.post('http://localhost:8501/v1/models/vgg16/versions/1:predict',
                              data=data,
                              headers=headers)

# Decode the response back into human-readable ImageNet labels.
predictions = json.loads(json_response.text)
print('Predicted:', decode_predictions(np.array(predictions['predictions']), top=3)[0])
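The same prediction request can also be issued from the command line. This is only a sketch: payload.json is a hypothetical file assumed to contain the same {"instances": [...]} body built with json.dumps above.
# POST the JSON payload to the predict endpoint of version 1 of the model.
curl -X POST -H "Content-Type: application/json" \
  -d @payload.json \
  http://localhost:8501/v1/models/vgg16/versions/1:predict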
To serve models with GPU acceleration, pull the GPU image:
docker pull tensorflow/serving:latest-gpu
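Running the GPU image looks almost identical to the CPU command. The sketch below assumes the NVIDIA Container Toolkit is installed on the host so Docker can expose GPUs to the container.
# Same bind mount and model name as before, but using the GPU image
# and giving the container access to all host GPUs.
docker run --gpus all -p 8501:8501 --name tfserving_vgg16_gpu \
  --mount type=bind,source=/home/derrick/DataScience/serving/vgg16/,target=/models/vgg16 \
  -e MODEL_NAME=vgg16 -t tensorflow/serving:latest-gpu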
Final Thoughts
Serving the model with TensorFlow Serving is only the beginning of the deployment journey. Thereafter, you’ll need to test how the model performs on real-world data. You also have to consider the desired model latency and throughput and check if your deployment is meeting all those requirements. Check out ML School if you are interested in diving deeper into model deployment.