Serving Keras Models With TensorFlow

Derrick Mwiti

The next step after training a model with TensorFlow is to deploy it. TensorFlow Serving is a great choice for serving TensorFlow models.

This article shows how to serve models using TensorFlow Serving.

What is TensorFlow Serving?

TensorFlow Serving is a tool for deploying machine learning models in production environments. It is not limited to serving TensorFlow models, and it provides an API for querying the model. It loads machine learning models, serves them to clients, and manages their versions.

TensorFlow Serving Architecture

TensorFlow Serving is made up of:

  • Servables, which are the main abstraction in serving models
  • Servable versions, which enable the usage of different models and configurations
  • Models, which are represented as servables
  • Loaders, which manage the lifecycle of a servable
  • Sources, which find and provide servables
  • Aspired versions, which are the versions of a servable that should be loaded and ready
  • Managers, which handle the lifecycle of servables
  • Core, which manages the lifecycle of servables and metrics
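
To make these roles concrete, here is a toy sketch in plain Python. It is not the TensorFlow Serving API: the class names Loader and Manager mirror the concepts above, while the source function, the "scale" servable, and the lambdas standing in for models are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Servable = Callable[[float], float]  # stand-in for a loaded model


@dataclass
class Loader:
    """Knows how to load one version of a servable."""
    version: int
    load: Callable[[], Servable]


@dataclass
class Manager:
    """Tracks the loaded versions of each servable and hands them to clients."""
    loaded: Dict[str, Dict[int, Servable]] = field(default_factory=dict)

    def set_aspired_versions(self, name: str, loaders: List[Loader]) -> None:
        aspired = {loader.version: loader for loader in loaders}
        current = self.loaded.setdefault(name, {})
        for version in list(current):  # unload what is no longer aspired
            if version not in aspired:
                del current[version]
        for version, loader in aspired.items():  # load what is missing
            if version not in current:
                current[version] = loader.load()

    def get_servable(self, name: str, version: Optional[int] = None) -> Servable:
        versions = self.loaded[name]
        return versions[max(versions) if version is None else version]


def source() -> List[Loader]:
    """Plays the role of a Source: reports which versions should be served."""
    return [
        Loader(version=1, load=lambda: (lambda x: 2 * x)),
        Loader(version=2, load=lambda: (lambda x: 3 * x)),
    ]


manager = Manager()
manager.set_aspired_versions("scale", source())
print(manager.get_servable("scale")(10))             # uses the latest version (2)
print(manager.get_servable("scale", version=1)(10))  # uses the pinned version 1
```

The point of the sketch is the division of labour: the source decides which versions should exist, loaders know how to bring one version to life, and the manager reconciles what is loaded with what is aspired.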

How to Install TensorFlow Serving

Install TensorFlow Serving via Docker to start using it to deploy models:

docker pull tensorflow/serving

Serving a Keras Image Classification Model With TensorFlow Serving

You have already seen how to build an image classification model with Keras and Convolutional Neural Networks. Let’s now look at how to deploy a pre-trained Keras model, starting with the imports:

from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np

The first step is to download the model:

model = VGG16(weights='imagenet')

Next, test the model on a sample image to make sure you get the expected results. The process involves:

  • Setting the path to the image
  • Loading the image using Keras
  • Converting the image to an array
  • Adding a batch dimension to the array
  • Processing the image using the model’s preprocessing function
  • Running predictions
  • Getting the predicted labels using the model's decode function

img_path = 'cow.jpeg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
# Predicted: [('n02129165', 'lion', 0.9999999), ('n02130308',
# 'cheetah', 7.703386e-08), ('n02128385', 'leopard', 6.330456e-09)]
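
For intuition about the preprocessing step, the sketch below approximates what preprocess_input does for VGG16, assuming the default channels-last layout: it reorders the channels from RGB to BGR and subtracts the ImageNet per-channel means, with no scaling. The function name vgg16_preprocess_sketch is hypothetical; in practice, keep using the library's own preprocess_input.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by the "caffe"-style preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg16_preprocess_sketch(batch_rgb):
    """Approximate VGG16 preprocessing for a (batch, height, width, 3) RGB array."""
    batch = np.asarray(batch_rgb, dtype=np.float32)
    bgr = batch[..., ::-1]          # flip the last axis: RGB -> BGR
    return bgr - IMAGENET_BGR_MEAN  # zero-center each channel, no scaling


# A dummy 224x224 image with a batch dimension, mirroring the shape used above.
dummy = np.zeros((1, 224, 224, 3), dtype=np.float32)
print(vgg16_preprocess_sketch(dummy).shape)
```

This also explains why the request sent to the served model later must contain the preprocessed array: the container only runs the network, not the preprocessing.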

You are now certain that the model is working as expected. The next step is to save it, in readiness for deployment, to a versioned directory such as vgg16/1.
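
TensorFlow Serving expects the model directory to contain numbered version subdirectories (vgg16/1, vgg16/2, and so on). The helper below is a small sketch for picking the next version directory; the name next_version_dir is hypothetical. The save call itself is only mentioned in a comment because it depends on your TensorFlow and Keras versions.

```python
from pathlib import Path


def next_version_dir(model_base: str) -> Path:
    """Return model_base/<n>, where n is one more than the highest existing version."""
    base = Path(model_base)
    versions = []
    if base.is_dir():
        versions = [int(p.name) for p in base.iterdir()
                    if p.is_dir() and p.name.isdigit()]
    return base / str(max(versions, default=0) + 1)


export_dir = next_version_dir("vgg16")
print(export_dir)

# Assumption: with tf.keras in TensorFlow 2.x, model.save(str(export_dir)) writes a
# SavedModel; newer Keras releases use model.export(str(export_dir)) for that purpose.
```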

The next step is to deploy the model using TensorFlow Serving. Here is the meaning of the parameters in the deployment command that follows:

  • --name tfserving_vgg16 sets the name of the container running the model. Any name will suffice.
  • -p 8501:8501 publishes the container's port 8501 to port 8501 on the local machine, making it possible to make calls to the model in the container.
  • --mount type=bind,source=/fullpath/vgg16/,target=/models/vgg16 bind-mounts the model directory at the source path into the container’s /models/vgg16 folder. The files are not copied.
  • -e MODEL_NAME=vgg16 tells TensorFlow Serving to load the model called vgg16.
  • -t allocates a pseudo-terminal for the container, and tensorflow/serving is the image to run, which is the one you pulled previously.

Adding & at the end of the command will run the container in the background.

Run the command below on the terminal to serve the model:

docker run -p 8501:8501 --name tfserving_vgg16 \
  --mount type=bind,source=/home/derrick/DataScience/serving/vgg16/,target=/models/vgg16 \
  -e MODEL_NAME=vgg16 -t tensorflow/serving

The command will provision a REST API at http://localhost:8501.
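
Before sending predictions, it can help to confirm that the model has loaded. TensorFlow Serving's REST API reports model status on a GET request to /v1/models/<model name>. The sketch below builds the relevant URLs and fetches the status with the standard library; the helper names model_url and fetch_model_status are hypothetical, and the host and port assume the docker command used here.

```python
import json
from typing import Optional
from urllib.request import urlopen


def model_url(model: str, host: str = "localhost", port: int = 8501,
              version: Optional[int] = None, verb: Optional[str] = None) -> str:
    """Build a TensorFlow Serving REST URL, optionally pinned to a version and verb."""
    url = f"http://{host}:{port}/v1/models/{model}"
    if version is not None:
        url += f"/versions/{version}"
    if verb is not None:
        url += f":{verb}"
    return url


def fetch_model_status(model: str, **kwargs) -> dict:
    """GET the model status; requires the serving container to be running."""
    with urlopen(model_url(model, **kwargs), timeout=5) as response:
        return json.load(response)


print(model_url("vgg16"))                             # status endpoint
print(model_url("vgg16", version=1, verb="predict"))  # prediction endpoint
```

Calling fetch_model_status("vgg16") once the container is up should return a JSON document describing the loaded versions and their state.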

Making API Calls to the Served Model

The final step is to make API calls to the deployed model. You can do this by sending the calls to http://localhost:8501/v1/models/vgg16:predict.

import json
import requests

# x is the preprocessed image batch created earlier
data = json.dumps({"instances": x.tolist()})
headers = {"content-type": "application/json"}
# Port 8501 is the REST port published by the docker run command
json_response = requests.post(
    'http://localhost:8501/v1/models/vgg16/versions/1:predict',
    data=data, headers=headers)
predictions = json.loads(json_response.text)
print('Predicted:', decode_predictions(np.array(predictions['predictions']), top=3)[0])
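
The response body is a JSON object with a "predictions" key that holds one list of scores per instance in the request. As a rough illustration of that shape, the sketch below ranks the scores without Keras. The top_k helper and the three-class fake response are hypothetical; a real VGG16 response has 1,000 scores per image.

```python
import json


def top_k(response_text: str, k: int = 3):
    """Return the k highest (class_index, score) pairs for each instance."""
    predictions = json.loads(response_text)["predictions"]
    results = []
    for scores in predictions:
        ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
        results.append(ranked[:k])
    return results


# A made-up response for one instance and three classes, for illustration only.
fake_response = json.dumps({"predictions": [[0.1, 0.7, 0.2]]})
print(top_k(fake_response, k=2))  # [[(1, 0.7), (2, 0.2)]]
```

decode_predictions does the same ranking and additionally maps each class index to its ImageNet label.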

Pull the GPU image to serve models with GPUs enabled.

docker pull tensorflow/serving:latest-gpu

Final Thoughts

Serving the model with TensorFlow Serving is only the beginning of the deployment journey. Thereafter, you’ll need to test how the model performs on real-world data. You also have to consider the desired model latency and throughput and check if your deployment is meeting all those requirements. Check out ML School if you are interested in diving deeper into model deployment.


Derrick Mwiti

Google Developer Expert - Machine Learning
