Distributed training with TensorFlow: How to train Keras models on multiple GPUs

Derrick Mwiti

Training computer vision models takes a long time because both the models and the image datasets are large, and the problem is worst when you train on a single GPU. You can reduce the training time by distributing the work across several GPUs. This article shows how to train a TensorFlow image classification model on multiple GPUs.

Data loading with TensorFlow

The data you will use is provided by Hugging Face for the AI or Not image classification competition. You can follow along with this Kaggle Notebook. Select the accelerator option with 2 GPUs.

The first step is to load the data from the directory containing the images.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing import image_dataset_from_directory

size = 224  # every image is resized to size x size pixels
training_set = image_dataset_from_directory(
    "/kaggle/input/aidata/train", shuffle=True, batch_size=32, image_size=(size, size)
)
val_dataset = image_dataset_from_directory(
    "/kaggle/input/aidata/val", shuffle=True, batch_size=32, image_size=(size, size)
)

Interested in learning more about image classification with TensorFlow? Check out our How to build CNN in TensorFlow tutorial.
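
If you want to see what the loader returns before downloading the competition data, the sketch below builds a tiny throwaway image folder and loads it the same way. The folder names `ai` and `real`, the helper `make_fake_image_dir`, and the random pixels are all hypothetical stand-ins; the call uses `tf.keras.utils.image_dataset_from_directory`, the newer path to the same function.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def make_fake_image_dir(root, class_names=("ai", "real"), per_class=4, size=32):
    """Write a few random PNGs into one sub-folder per class (hypothetical layout)."""
    rng = np.random.default_rng(0)
    for name in class_names:
        os.makedirs(os.path.join(root, name), exist_ok=True)
        for i in range(per_class):
            pixels = rng.integers(0, 256, size=(size, size, 3), dtype=np.uint8)
            png = tf.io.encode_png(tf.constant(pixels))
            tf.io.write_file(os.path.join(root, name, f"img_{i}.png"), png)


root = tempfile.mkdtemp()
make_fake_image_dir(root)

# One sub-folder per class; labels are inferred from the folder names.
dataset = tf.keras.utils.image_dataset_from_directory(
    root, shuffle=True, batch_size=4, image_size=(224, 224)
)
images, labels = next(iter(dataset))
print(dataset.class_names)         # folder names in alphabetical order
print(images.shape, labels.shape)  # one batch of resized images and integer labels
```

Each element of the dataset is a batch of images already resized to `image_size`, paired with integer labels derived from the sub-folder names.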

Data augmentation with Keras

Image augmentation helps improve the model's performance by exposing it to images at various angles and aspect ratios.

Let's perform some basic image augmentation.

data_augmentation = keras.Sequential(
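
The listing above is cut off in this copy, so the exact layers are not known. A minimal sketch of such a pipeline, assuming random flips, rotations, and zooms (the layers and their factors are assumptions for illustration, not the article's values):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Assumed layers and factors, chosen for illustration only.
data_augmentation = keras.Sequential(
    [
        keras.layers.RandomFlip("horizontal"),
        keras.layers.RandomRotation(0.1),  # rotate by up to +/-10% of a full turn
        keras.layers.RandomZoom(0.1),      # zoom in or out by up to 10%
    ]
)

# Random stand-in batch; the augmentation layers only act when training=True.
batch = tf.constant(np.random.rand(2, 224, 224, 3).astype("float32"))
augmented = data_augmentation(batch, training=True)
print(augmented.shape)
```

These layers change the pixel content but keep the tensor shape, so the pipeline can be placed at the front of a model without affecting the layers after it.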

Visualizing image data with Matplotlib

Use Matplotlib to visualize the effect of the augmentations defined above on a sample image.

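import numpy as np
import matplotlib.pyplot as plt

# Show 12 random augmentations of the first image in one training batch.
for images, labels in training_set.take(1):
    plt.figure(figsize=(12, 12))
    first_image = images[0]
    for i in range(12):
        ax = plt.subplot(3, 4, i + 1)
        # training=True is needed for the random layers to apply outside fit().
        augmented_image = data_augmentation(
            tf.expand_dims(first_image, 0), training=True
        )
        # The listing is cut off here; the display calls below are an assumed completion.
        plt.imshow(augmented_image[0].numpy().astype("int32"))
        plt.axis("off")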

Distributed training with Keras

TensorFlow provides various strategies for distributed training. One of them is MirroredStrategy, which performs synchronous distributed training on multiple GPUs on a single machine. It creates one replica per GPU and mirrors all model variables across the replicas. The copies of each variable together form a single conceptual variable called a MirroredVariable, and they are kept in sync by applying identical updates to every copy.
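
As a quick check of the mirroring behavior, the sketch below creates a strategy and a single variable inside its scope. The variable name `demo_weight` is made up for illustration. On a machine without GPUs the strategy falls back to a single replica, so the code still runs.

```python
import tensorflow as tf

# Uses every visible GPU; falls back to one replica when no GPU is present.
strategy = tf.distribute.MirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # A variable created inside the scope is managed by the strategy,
    # which keeps one copy per replica.
    weight = tf.Variable(1.0, name="demo_weight")

print(type(weight).__name__)
print(weight.read_value().numpy())
```

On the two-GPU Kaggle accelerator, `num_replicas_in_sync` should report 2.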

Use EfficientNetV2M via transfer learning. To train the model with the mirrored strategy, open a mirrored_strategy.scope() and define the model within that scope.

Apart from the model, the metrics and optimizer must be defined within the scope. Creating the variables within this scope leads to the creation of distributed variables.

Train the model normally after defining the variables within the mirrored_strategy scope. MirroredStrategy will run the training on all the available GPUs, or only on the ones you specify manually.
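
To restrict training to specific GPUs, MirroredStrategy accepts a `devices` list. The sketch below keeps at most the first two visible GPUs; the limit of two is an arbitrary choice for illustration. Passing `None` lets the strategy choose its default devices, which keeps the snippet runnable on a CPU-only machine.

```python
import tensorflow as tf

gpus = tf.config.list_logical_devices("GPU")

# Keep at most the first two GPUs; None means "use the default device selection".
devices = [gpu.name for gpu in gpus[:2]] or None

strategy = tf.distribute.MirroredStrategy(devices=devices)
print("training on", strategy.num_replicas_in_sync, "replica(s)")
```

Device names can also be written by hand, for example `["/gpu:0", "/gpu:1"]`, when you already know which GPUs you want.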

mirrored_strategy = tf.distribute.MirroredStrategy()

with mirrored_strategy.scope():
    # Pretrained backbone without its classification head.
    # include_top and weights are assumed here; the original call is cut off.
    base_model = tf.keras.applications.EfficientNetV2M(
        input_shape=(size, size, 3),
        include_top=False,
        weights="imagenet",
    )
    base_model.trainable = False  # freeze the backbone for transfer learning

    inputs = keras.Input(shape=(size, size, 3))
    x = data_augmentation(inputs)
    x = tf.keras.applications.efficientnet_v2.preprocess_input(x)
    x = base_model(x, training=False)  # keep batch norm layers in inference mode
    x = keras.layers.GlobalAveragePooling2D()(x)
    x = keras.layers.Dropout(0.2)(x)
    outputs = keras.layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)

    # The metric and optimizer are also created inside the scope.
    accuracy = keras.metrics.BinaryAccuracy()
    optimizer = tf.keras.optimizers.Adam()

model.compile(optimizer=optimizer, loss=tf.keras.losses.BinaryCrossentropy(), metrics=[accuracy])
model.fit(training_set, epochs=10, validation_data=val_dataset)

Notice that the two GPUs on Kaggle are being used.
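
With Keras `fit` under MirroredStrategy, each dataset batch is a global batch that is split across the replicas, so a batch of 32 on two GPUs gives each GPU 16 images. A common adjustment, sketched below, is to treat 32 as the per-replica size and scale the global batch by the number of replicas. The random tensors are stand-in data, not the article's images.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

per_replica_batch_size = 32  # the batch size used earlier, now treated as per-GPU
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

# Stand-in data; in the article this would be the image datasets.
features = tf.random.uniform((256, 8))
labels = tf.cast(tf.random.uniform((256, 1)) > 0.5, tf.float32)
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(global_batch_size)

print("global batch size:", global_batch_size)
```

A larger global batch may also call for retuning the learning rate, which is outside the scope of this sketch.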

Final thoughts

There are other training strategies you can try apart from the mirrored strategy; they include:

  • TPUStrategy for training on TPUs.
  • MultiWorkerMirroredStrategy for distributed training across multiple workers.
  • ParameterServerStrategy, a data-parallel method for training on multiple machines.
  • CentralStorageStrategy, which performs synchronous training but places the variables on the CPU instead of mirroring them.
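
All of these strategies share the same `scope()` interface, so the model-building code stays the same when the strategy changes. The sketch below illustrates that with a hypothetical helper, `pick_strategy`, which only chooses between MirroredStrategy and TensorFlow's default strategy. The TPU, multi-worker, and parameter server strategies need cluster configuration that is not covered here.

```python
import tensorflow as tf


def pick_strategy():
    """Return MirroredStrategy when several GPUs are visible, else the default strategy."""
    gpus = tf.config.list_logical_devices("GPU")
    if len(gpus) > 1:
        return tf.distribute.MirroredStrategy()
    return tf.distribute.get_strategy()  # default strategy, a single replica


strategy = pick_strategy()

# The model code does not depend on which strategy was selected.
with strategy.scope():
    model = tf.keras.Sequential(
        [tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)]
    )

print(type(strategy).__name__, strategy.num_replicas_in_sync)
```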

Derrick Mwiti

Google Developer Expert - Machine Learning