Fine-tuning has become the new training because training large language models (LLMs) from scratch is computationally expensive. It also requires collecting and preparing large datasets, which is time intensive. These resources are within the reach of only a few individuals and organizations. Fortunately, there are many open-source LLMs that you can leverage for different use cases. For instance, you can fine-tune a GPT model to follow instructions.

These large language models can be fine-tuned on commodity GPUs such as the ones offered for free on Kaggle notebooks or Colab. To fine-tune the model you will need:

- An environment with at least one GPU, such as the ones mentioned above
- An open-source model, such as the ones offered by KerasNLP
- A dataset for fine-tuning the model which you can create from scratch or pick from Kaggle Datasets or Hugging Face datasets

You will also need to ensure that the dataset is in the required instruction format. There are many formats and any of them will get the job done. You just need to pick one and format the training dataset in that style. A few popular formats include the Alpaca style and the Llama style.
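To make the format concrete, here is a minimal sketch of a single Alpaca-style training example built with plain string formatting (the header wording varies between datasets, and the field names here are illustrative):

```python
# A minimal Alpaca-style template; the exact header wording varies between datasets.
alpaca_template = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{response}"
)

sample = alpaca_template.format(
    instruction="Translate the sentence to French.",
    input="Good morning.",
    response="Bonjour.",
)
print(sample)
```

Every training example in the dataset is rendered into this one flat string, so the model learns to associate the `### Response:` marker with the start of its answer.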

This article will explore how to fine-tune the GPT-2 model using KerasNLP to follow instructions in the Alpaca style. You can follow along with this Kaggle notebook.


The data for training the model must be a set of instructions followed by the desired responses. First, prepare a prompt template:

```
prompt_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""
```

Next, load a dataset with the instructions, inputs, and responses. The Alpaca dataset is a good option here. Load it and convert it into a Pandas DataFrame.

```
import pandas as pd
from datasets import load_dataset

qa_data = load_dataset("tatsu-lab/alpaca", split="train")
df = pd.DataFrame(qa_data)
df.head()
```

Create an empty list and populate it with formatted examples, filling the template's instruction, input, and output placeholders from the dataset.

```
examples = df.to_dict()
num_examples = len(examples["input"])
qa_finetuning_dataset = []
for i in range(num_examples):
    instruction = examples["instruction"][i]
    input = examples["input"][i]
    output = examples["output"][i]
    text_with_prompt_template = prompt_template.format(
        instruction=instruction, input=input, output=output
    )
    qa_finetuning_dataset.append({"text": text_with_prompt_template})

from pprint import pprint
print("One sample from the data:")
pprint(qa_finetuning_dataset[0])
```

Optionally, you can convert it to a Pandas DataFrame and upload it to your Hugging Face account for future reference.

```
qa_llm_data = pd.DataFrame(columns=["text"], data=qa_finetuning_dataset)
qa_llm_data.head()
from datasets import Dataset
dataset = Dataset.from_pandas(qa_llm_data)
dataset.push_to_hub("YOUR_USERNAME/alpaca", token="YOUR_HF_TOKEN")
```

Fortunately, the Alpaca dataset already contains the formatted text in the `text` column.

Import the required packages and move on to the next step:

```
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import keras_nlp
import datasets
from datasets import load_dataset
import pandas as pd
```

Load the Alpaca dataset into a Pandas DataFrame. Select the `text` column, since it contains the data we need to train the model.

```
dataset = load_dataset("tatsu-lab/alpaca", split="train")
df = pd.DataFrame(dataset)
df = df[['text']]
df.head()
```

Split the data into a training and validation set. We use 90% for training the model and the rest for validation.

```
n = int(0.9 * len(df)) # first 90% will be train, rest val
train_examples = df[:n]
val_examples = df[n:]
```

Next, convert the dataset into a TensorFlow dataset.

```
train_examples = tf.data.Dataset.from_tensor_slices((train_examples))
val_examples = tf.data.Dataset.from_tensor_slices((val_examples))
```

Tune the dataset for performance by batching and prefetching to ensure the data loading process is not a bottleneck when training the model.

```
BUFFER_SIZE = 20000
BATCH_SIZE = 32

def make_batches(ds):
    return (
        ds.shuffle(BUFFER_SIZE)
        .batch(BATCH_SIZE)
        .prefetch(buffer_size=tf.data.AUTOTUNE)
    )

# Create training and validation set batches
train_batches = make_batches(train_examples)
val_batches = make_batches(val_examples)
```

To train the model, we need to define the following items:

- The GPT model
- The model preprocessor that is responsible for tokenization and other preprocessing functions
- The training loss
- Training metrics
- Number of training epochs

Using a learning rate schedule with `PolynomialDecay` leads to a model with good results. It decays the initial learning rate to the end learning rate over the given number of decay steps by applying a polynomial decay function; the current step is used to compute the decayed learning rate.
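To see what the schedule does, here is a small sketch of the decay formula that TensorFlow documents for `PolynomialDecay`. With the default power of 1.0 it is simply a linear ramp from the initial to the end learning rate; the step values below are illustrative:

```python
def polynomial_decay(step, initial_lr=5e-5, end_lr=0.0, decay_steps=1000, power=1.0):
    # Decayed learning rate, following the documented PolynomialDecay formula
    step = min(step, decay_steps)  # the rate stays at end_lr past decay_steps
    return (initial_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr

print(polynomial_decay(0))     # start of training: the initial rate
print(polynomial_decay(500))   # halfway: midway between initial and end rate
print(polynomial_decay(1000))  # end of training: the end rate
```

With `power=1.0`, each step subtracts the same small amount from the learning rate until it reaches zero at the final step.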

```
num_epochs = 5

learning_rate = tf.keras.optimizers.schedules.PolynomialDecay(
    5e-5,
    decay_steps=train_batches.cardinality() * num_epochs,
    end_learning_rate=0.0,
)
optimizer = tf.keras.optimizers.Adam(learning_rate)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
```

Next, define the preprocessor and model. The `GPT2CausalLM` class provides a GPT-2 model for text generation. A causal model works by predicting the next word, given a sequence of words. The preprocessor automatically applies all the required preprocessing to the input string. To skip preprocessing, pass `preprocessor=None`, in which case you will need to preprocess the data yourself before passing it to the model.
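The "predict the next word" idea can be illustrated with a toy bigram model; this is only an analogy for causal language modeling, not how GPT-2 works internally, and the corpus below is made up:

```python
from collections import Counter, defaultdict

# Toy corpus; a causal LM learns which word tends to follow which.
corpus = "the cat sat on the mat".split()

bigram_counts = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    bigram_counts[prev_word][next_word] += 1

def predict_next(word):
    # Return the most frequent continuation observed after `word`
    return bigram_counts[word].most_common(1)[0][0]

print(predict_next("cat"))  # a word observed to follow "cat"
```

GPT-2 does the same thing at scale: given all previous tokens, it outputs a distribution over the next token, and fine-tuning shifts that distribution toward instruction-following continuations.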

```
preprocessor = keras_nlp.models.GPT2CausalLMPreprocessor.from_preset(
    "gpt2_base_en",
    sequence_length=300,
)
generator = keras_nlp.models.GPT2CausalLM.from_preset(
    "gpt2_base_en", preprocessor=preprocessor
)
```

We set the `sequence_length` to 300 for fast training and so that the model and data fit on a single GPU.


Compile the model using the default sampler. KerasNLP provides `keras_nlp.samplers` for controlling the text generation process. The default sampler is `top_k`.

```
generator.compile(
    optimizer=optimizer,
    loss=loss,
    weighted_metrics=["accuracy"],
)
```

Train the model for 5 epochs. This will take ~2 hours on Kaggle.

```
history = generator.fit(train_batches, validation_data=val_batches, epochs=num_epochs)
```

Prompt the model using the `generate` method:

```
prompt = "How to make banana bread?"
output = generator.generate(f"### Instruction:\n{prompt}\n### Response:\n", max_length=300)
print(output)
```

Before fine-tuning, the base model would not have produced a response in this instruction-following format.

In this blog post, you have learned how to perform instruction fine-tuning using Keras. The resulting model can follow instructions, unlike the base model. You can improve on this model by:

- Training for more epochs
- Using a larger dataset
- Fine-tuning a larger GPT-2 model instead of the base version, such as `gpt2_medium_en`, `gpt2_large_en`, or `gpt2_extra_large_en`
- Trying a different generative model such as Llama

With the plethora of open-source language models, it's incredibly difficult to determine if a piece of text is AI generated. However, with a good dataset, you can train a model in TensorFlow to detect if a large language model generated text. It's such an interesting problem that there is even a Kaggle competition dedicated to solving it.

In this blog post, we'll take a stab at solving this problem using TensorFlow.

We kick off by importing all the required modules:

- Pandas to load the dataset
- NumPy's `array` to convert the text to NumPy arrays
- Matplotlib to plot the training and validation charts
- Scikit-learn to split the data into training and validation sets
- TensorFlow utilities for building the network

```
from numpy import array
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM, Bidirectional
```

To train the model, we use the DAIGT V2 Train Dataset. Load the dataset with Pandas and drop any duplicate rows.

```
test = pd.read_csv('/kaggle/input/llm-detect-ai-generated-text/test_essays.csv')
sub = pd.read_csv('/kaggle/input/llm-detect-ai-generated-text/sample_submission.csv')
org_train = pd.read_csv('/kaggle/input/llm-detect-ai-generated-text/train_essays.csv')
train = pd.read_csv("/kaggle/input/daigt-v2-train-dataset/train_v2_drcat_02.csv", sep=',')
train = train.drop_duplicates(subset=['text'])
```

Display part of the training set:

`train.head()`

For this exercise, we are interested in the `text` and `label` columns.

Next, split the data into a training and validation set using Scikit-learn. We'll use 80% for training and 20% for validation.

```
docs = train['text']
labels = array(train['label'])
X_train, X_test, y_train, y_test = train_test_split(docs, labels, test_size=0.2, random_state=0)
```

Deep learning models don't understand raw text. We, therefore, have to convert the text to a numerical representation. In TensorFlow, this is done using the `TextVectorization` layer. Given text, the layer will create a sequence of integers. Some of the arguments you can pass to the layer are:

- `standardize` applies a standardization to the text; for example, `lower_and_strip_punctuation` lowercases the text and removes punctuation.
- `max_tokens` determines the vocabulary size.
- `output_mode` dictates the output of the layer; for example, `int` outputs integers.
- `output_sequence_length` ensures that the text is padded or truncated to the maximum sequence length.

```
max_features = 150000  # Maximum vocab size.
batch_size = 32
max_len = 300  # Sequence length to pad the outputs to.

vectorize_layer = tf.keras.layers.TextVectorization(
    standardize='lower_and_strip_punctuation',
    max_tokens=max_features,
    output_mode='int',
    output_sequence_length=max_len,
)
vectorize_layer.adapt(X_train, batch_size=None)
vocab_size = vectorize_layer.vocabulary_size()

X_train_padded = vectorize_layer(X_train)
X_test_padded = vectorize_layer(X_test)
test_data = vectorize_layer(test['text'])
```

Next, create a TensorFlow dataset and create batches. Setting up a TensorFlow dataset allows you to configure further data settings such as prefetching the data with an automatic buffer size.

```
training_data = tf.data.Dataset.from_tensor_slices((X_train_padded, y_train))
validation_data = tf.data.Dataset.from_tensor_slices((X_test_padded, y_test))
training_data = training_data.batch(batch_size)
validation_data = validation_data.batch(batch_size)
```

A word embedding is a representation of text data in a vector space such that similar words appear close to each other. In this case, words that are more likely to be generated by an LLM may be close to each other. In TensorFlow, we can use the Embedding layer to achieve that. You can either train one from scratch or use a pre-trained one. In this case, we will do the latter.
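"Close to each other" is usually measured with cosine similarity between embedding vectors. Here is a minimal sketch using made-up 3-dimensional vectors (real GloVe vectors have 50 to 300 dimensions, and these toy values are purely illustrative):

```python
import math

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up toy embeddings, not real GloVe vectors
king = [0.8, 0.6, 0.1]
queen = [0.7, 0.7, 0.2]
banana = [-0.5, 0.1, 0.9]

print(cosine_similarity(king, queen))   # high: related words
print(cosine_similarity(king, banana))  # low: unrelated words
```

A similarity near 1 means the vectors point in nearly the same direction, which is how an embedding space encodes "these words appear in similar contexts."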

We will use the pre-trained GloVe embeddings to initialize the Embedding layer. The process involves loading the embeddings into an `embeddings_index` dictionary.

```
embeddings_index = {}
f = open('/kaggle/input/glove7b/glove.6B.300d.txt')
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()
print('Found %s word vectors.' % len(embeddings_index))
```

For example, here is part of the vector representation of the word `word`:

The next step is to create an embedding matrix by looking up every word in our vocabulary and fetching its embedding vector from the `embeddings_index` dictionary. If a word can't be found, it will be represented by zeros.

```
vocabulary = vectorize_layer.get_vocabulary()
# 300 is the GloVe embedding dimension (which happens to equal max_len here)
embedding_matrix = np.zeros((len(vocabulary) + 1, 300))
for i, word in enumerate(vocabulary):
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # Words not found in the embedding index stay all zeros.
        embedding_matrix[i] = embedding_vector
```

To create the network, we start by creating an embedding layer from the computed embedding matrix. We do this by initializing the Embedding layer and setting its weights to the embedding matrix. Setting `trainable` to `False` ensures that the layer is not trained again.

`input_dim` is set to `vocab_size + 1`, representing the size of the vocabulary. The second argument, `output_dim`, is the dimension of the dense embedding. `input_length` is the length of the input sequences.

```
embedding_layer = Embedding(vocab_size + 1,
                            300,  # the GloVe embedding dimension
                            weights=[embedding_matrix],
                            input_length=max_len,
                            trainable=False)
```

Define the model using the Keras Sequential API:

```
model = Sequential([
    embedding_layer,
    Bidirectional(LSTM(256, return_sequences=True)),
    Bidirectional(LSTM(128, return_sequences=True)),
    Bidirectional(LSTM(64, return_sequences=True)),
    Bidirectional(LSTM(32)),
    Dense(300, activation='relu'),
    Dense(150, activation='relu'),
    Dense(75, activation='relu'),
    Dense(24, activation='relu'),
    Dense(1, activation='sigmoid')
])
```

The network uses bidirectional LSTMs to ensure that information flows in both directions. Since it's a binary classification problem, the final output has 1 unit and the sigmoid activation function.
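A quick sketch of the sigmoid function itself, which squashes the single output unit into the (0, 1) range so it can be read as the probability that the text is AI-generated (the input values below are arbitrary):

```python
import math

def sigmoid(x):
    # Maps any real number into the open interval (0, 1)
    return 1 / (1 + math.exp(-x))

print(sigmoid(-4.0))  # close to 0: likely human-written
print(sigmoid(0.0))   # 0.5: undecided
print(sigmoid(4.0))   # close to 1: likely AI-generated
```

This is why `binary_crossentropy` pairs with a sigmoid output: the loss compares this probability against the 0/1 label.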

Compile and train the model:

```
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
num_epochs = 20
history = model.fit(training_data, epochs=num_epochs, validation_data=validation_data)
```

When training is complete, we can plot the training and validation charts using Matplotlib:

```
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()
```

Finally, you can run predictions on the test set.

`final_preds = model.predict(test_data)`

Some of the things you could do to improve this model include:

- Use a different metric because the dataset is slightly imbalanced
- Try a Transformer network instead
- Source a different and better dataset

JAX is a high-performance library that offers accelerated computing through XLA and just-in-time (JIT) compilation. It also has handy features that let you write one codebase that can be applied to batches of data and run on CPUs, GPUs, or TPUs. However, one of its biggest selling points is its speed of execution compared to NumPy and other numerical computing libraries.

In this article, you will learn how to define and train Convolutional Neural Networks in JAX. It's useful if you have already gone through:

- (JAX) What it is and how to use it in Python
- How to build CNN in TensorFlow
- How to load datasets in JAX
- Optimizers in JAX
- JAX loss functions


In this project, we will use the cats and dogs dataset to build a CNN in JAX that differentiates between cats and dogs. Download and unzip the dataset:

```
kaggle datasets download -d chetankv/dogs-cats-images
unzip dogs-cats-images.zip
```

Import all the packages needed for this project:

```
import os
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import jax
from jax import numpy as jnp
import optax
from tqdm.auto import tqdm
import flax
from flax import linen as nn
from flax.training import train_state
import dm_pix as pix # pip install dm-pix
```

Confirm that you have GPU access:

`jax.local_devices()`

JAX doesn't ship with any data loading functionality. You can use your favorite library to load and process the data. In this case, let's use TensorFlow. First, define the path to the images and the batch size:

```
base_dir = "dog vs cat/dataset/training_set"
batch_size = 64
```

Load the training, validation, and evaluation images from the folders:

```
training_set = tf.keras.utils.image_dataset_from_directory(
    base_dir, validation_split=0.2, batch_size=batch_size, subset="training", seed=5603
)
validation_set = tf.keras.utils.image_dataset_from_directory(
    base_dir, validation_split=0.2, batch_size=batch_size, subset="validation", seed=5603
)
eval_set = tf.keras.utils.image_dataset_from_directory(
    "dog vs cat/dataset/test_set", batch_size=batch_size
)
```

Next, define functions for scaling and resizing the images. Scaling stabilizes the training process by keeping the input values small. We also resize all the images to the same size; the larger the images, the longer training takes.

```
IMG_SIZE = 128

resize_and_rescale = tf.keras.Sequential(
    [
        tf.keras.layers.Resizing(IMG_SIZE, IMG_SIZE),
        tf.keras.layers.Rescaling(1.0 / 255),
    ]
)
```

Data augmentation modifies existing data to prevent the model from overfitting by showing it varied versions of the images. In JAX, we can do this using the PIX library. In this case, we apply the following transformations:

- Brightness adjustment
- Flipping the images
- Rotating the images

Here's the function that will do that:

```
rng = jax.random.PRNGKey(0)
rng, inp_rng, init_rng = jax.random.split(rng, 3)
delta = 0.42

@jax.jit
def data_augmentation(image):
    new_image = pix.adjust_brightness(image=image, delta=delta)
    new_image = pix.random_brightness(image=new_image, max_delta=delta, key=inp_rng)
    new_image = pix.flip_up_down(image=new_image)
    new_image = pix.flip_left_right(image=new_image)
    new_image = pix.rot90(k=1, image=new_image)  # k = number of times the rotation is applied
    return new_image
```

We create a PRNG key before defining the function and pass it to `pix.random_brightness`, since that operation is random. This ensures that we get the same transformation whenever the function is called with the same key, in line with JAX's expectation of pure functions.
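The same idea can be sketched with Python's standard library: seeding a generator explicitly makes a "random" transformation reproducible, which is the property JAX keys give to pure functions. This is only an analogy; JAX's key-splitting mechanics differ, and the function below is hypothetical:

```python
import random

def random_brightness(pixel, seed):
    # An explicit seed plays the role of a JAX PRNG key:
    # the same seed always produces the same "random" shift.
    rng = random.Random(seed)
    return pixel + rng.uniform(-0.42, 0.42)

first = random_brightness(0.5, seed=0)
second = random_brightness(0.5, seed=0)
print(first == second)  # True: same key, same transformation
```

Contrast this with global random state, where calling the function twice would give two different results and break reproducibility.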

Apply the function to a bunch of images and visualize them using Matplotlib to ensure that everything is working as expected.

```
plt.figure(figsize=(10, 10))
augmented_images = []
for images, _ in training_set.take(1):
    for i in range(9):
        augmented_image = data_augmentation(np.array(images[i], dtype=jnp.float32))
        augmented_images.append(augmented_image)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[i].astype("uint8"))
        plt.axis("off")
```


When creating the augmented images above, we used a for loop. However, we don't want to do this when training the model because it's inefficient. The solution is a mapping function that does this automatically. Fortunately, JAX ships with the `vmap` function, which lets you easily convert a function written for a single example to run on a batch.

`jit_data_augmentation = jax.vmap(data_augmentation)`

The resulting data will be TensorFlow tensors because we used TensorFlow to process the data. However, passing TensorFlow tensors to a JAX model will lead to data type errors. We, therefore, have to convert the image data to NumPy arrays.

Start by shuffling the dataset and applying the scale and resizing functions.

```
AUTOTUNE = tf.data.AUTOTUNE

def prepare(ds, shuffle=False):
    # Rescale and resize all datasets.
    ds = ds.map(lambda x, y: (resize_and_rescale(x), y), num_parallel_calls=AUTOTUNE)
    if shuffle:
        ds = ds.shuffle(1000)
    # Use buffered prefetching on all datasets.
    return ds.prefetch(buffer_size=AUTOTUNE)

train_ds = prepare(training_set, shuffle=True)
val_ds = prepare(validation_set)
evaluation_set = prepare(eval_set)
```

Next, convert the datasets into NumPy arrays using TensorFlow datasets:

```
def get_batches(ds):
    data = ds.prefetch(1)
    # tfds.as_numpy converts the tf.data.Dataset into an iterable of NumPy arrays
    return tfds.as_numpy(data)

training_data = get_batches(train_ds)
validation_data = get_batches(val_ds)
evaluation_data = get_batches(evaluation_set)
```

Defining a CNN in JAX can be done using Flax's `setup` or `@nn.compact` style. Here's a CNN with 3 convolutional blocks:

```
class_names = training_set.class_names
num_classes = len(class_names)

class CNN(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.Conv(features=128, kernel_size=(3, 3))(x)
        x = nn.relu(x)
        x = nn.max_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = nn.Conv(features=64, kernel_size=(3, 3))(x)
        x = nn.relu(x)
        x = nn.max_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = nn.Conv(features=32, kernel_size=(3, 3))(x)
        x = nn.relu(x)
        x = nn.max_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = x.reshape((x.shape[0], -1))  # flatten
        x = nn.Dense(features=256)(x)
        x = nn.Dense(features=128)(x)
        x = nn.relu(x)
        x = nn.Dense(features=num_classes)(x)
        return x
```

Each CNN block has a different number of features, but they all share the same **kernel size**, window shape, and **pooling strides**. The convolutional layers pass the image through a kernel, usually a 3 by 3 matrix, producing a **feature map**. Here the feature map keeps the same spatial size as the input because the padding argument is `SAME` by default; no padding is applied with the `VALID` option.

The **ReLU activation** is applied to ensure non-linearity in the network.

Pooling reduces the size of the feature map further by applying a **pooling filter**, usually a 2 by 2 matrix. **Max pooling** picks the largest value in each window, while **average pooling** computes the mean of the values in each window.
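As a sketch, here is 2x2 pooling on a toy 4x4 feature map, comparing max and average pooling (the numbers are made up for illustration):

```python
def pool_2x2(feature_map, mode="max"):
    # Slide a non-overlapping 2x2 window over the feature map
    pooled = []
    for i in range(0, len(feature_map), 2):
        row = []
        for j in range(0, len(feature_map[0]), 2):
            window = [
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            ]
            row.append(max(window) if mode == "max" else sum(window) / 4)
        pooled.append(row)
    return pooled

fmap = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
]
print(pool_2x2(fmap, "max"))      # [[7, 8], [3, 4]]
print(pool_2x2(fmap, "average"))  # [[4.0, 5.0], [2.0, 3.0]]
```

Either way, each 2x2 window collapses to a single value, halving the feature map's height and width, which is exactly what `nn.max_pool` with `window_shape=(2, 2)` and `strides=(2, 2)` does above.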

We flatten the **pooled feature map** before passing it to the fully connected layers. This results in a single column known as a **flattened feature map**.

Initialize the JAX CNN model with a data sample of the same shape as the expected image data to create the model parameters. The initialization process requires a pseudo-random number generator (PRNG) key.

```
model = CNN()
inp = jnp.ones([1, IMG_SIZE, IMG_SIZE, 3])
# Initialize the model
params = model.init(init_rng, inp)
# print(params)
```

The structure of the parameters mirrors the CNN network you defined. The `kernel` and `bias` entries hold the **weights and biases** of the JAX CNN model. Apply the model to a sample input and check the output:

`model.apply(params, inp)`

In Flax, the training state holds the model variables, such as the parameters and the optimizer state. It is created with `flax.training.train_state.TrainState`. We define a training state with the Adam optimizer at a learning rate of 1e-5.

```
learning_rate = 1e-5
# You could also try larger rates such as 1e-3, the default in tf.keras.optimizers.Adam
optimizer = optax.adam(learning_rate=learning_rate)

model_state = train_state.TrainState.create(
    apply_fn=model.apply, params=params, tx=optimizer
)
```

As the model trains, we need to track the loss and accuracy, which we can later use to plot the model's performance. Note that inside the metrics function we apply the jitted and vmapped augmentation function.

We obtain the metrics by:

- Computing the logits by applying the `params` to the images
- One-hot encoding the labels
- Computing the loss using `sigmoid_binary_cross_entropy`, since it is a binary classification problem
- Obtaining the accuracy from the logits

```
def calculate_loss_acc(state, params, batch):
    data_input, labels = batch
    data_input = jit_data_augmentation(data_input)
    # Obtain the logits and predictions of the model for the input data
    logits = state.apply_fn(params, data_input)
    # Calculate the loss and accuracy
    labels_onehot = jax.nn.one_hot(labels, num_classes=num_classes)
    # Uncomment the line below (and comment the sigmoid loss) for multiclass classification
    # loss = optax.softmax_cross_entropy(logits, labels_onehot).mean()
    loss = optax.sigmoid_binary_cross_entropy(logits, labels_onehot).mean()
    acc = jnp.mean(jnp.argmax(logits, -1) == labels)
    return loss, acc
```

Let's break down the code:

- `data_input, labels = batch`: The function expects a `batch` parameter containing the images and labels.
- `data_input = jit_data_augmentation(data_input)`: The input data is passed through the `jit_data_augmentation` function, which applies data augmentation to enhance the diversity of the training data.
- `logits = state.apply_fn(params, data_input)`: The model's forward pass is computed using the `apply_fn` method of the `state` object. The `params` are the model parameters and `data_input` is the augmented input data. The result, `logits`, represents the raw output of the model before applying any activation function.
- `labels_onehot = jax.nn.one_hot(labels, num_classes=num_classes)`: The labels are converted into one-hot encoded format using the `one_hot` function from JAX.
- `loss = optax.sigmoid_binary_cross_entropy(logits, labels_onehot).mean()`: The loss is computed using the sigmoid binary cross-entropy loss function.
- `acc = jnp.mean(jnp.argmax(logits, -1) == labels)`: The accuracy is calculated by comparing the predicted class indices (argmax of the logits) with the actual labels. The result is a boolean array, and `jnp.mean` computes the average accuracy.
- The function returns a tuple containing the loss and accuracy.

Test the metrics function on a batch of data:

```
batch = next(iter(training_data))
calculate_loss_acc(model_state, model_state.params, batch)
```


When training the model, we need to compute the gradients. This is done using the `value_and_grad` function. The `value` part of the name indicates that the function returns additional outputs besides the gradients. `argnums` is 1 because the parameters to differentiate with respect to are passed as the second argument. Setting `has_aux` to `True` means that the second output element (here, the accuracy) is auxiliary data, while the first is the output of the function being differentiated. The `apply_gradients` function updates the model parameters with the computed gradients. Decorating the functions with `jax.jit` makes them faster, since they are compiled with XLA.
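As an aside, the value-plus-gradient idea can be sketched in plain Python with central finite differences. This is only an analogy: JAX computes exact gradients via automatic differentiation, and the helper below is hypothetical:

```python
def value_and_grad(f, eps=1e-6):
    # Return both f(x) and a finite-difference estimate of f'(x),
    # mirroring the (value, grad) pair that jax.value_and_grad produces.
    def wrapped(x):
        value = f(x)
        grad = (f(x + eps) - f(x - eps)) / (2 * eps)
        return value, grad
    return wrapped

square = lambda x: x ** 2
value, grad = value_and_grad(square)(3.0)
print(value)  # 9.0
print(grad)   # approximately 6.0, since d/dx x^2 = 2x
```

Returning the loss value together with the gradients in one pass avoids a second forward computation, which is exactly why the training step below uses `jax.value_and_grad` rather than `jax.grad`.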

```
@jax.jit  # Jit the function for efficiency
def train_step(state, batch):
    # Gradient function
    grad_fn = jax.value_and_grad(
        calculate_loss_acc,  # Function to calculate the loss
        argnums=1,  # Parameters are the second argument of the function
        has_aux=True,  # Function has additional outputs, here accuracy
    )
    # Determine gradients for current model, parameters and batch
    (loss, acc), grads = grad_fn(state, state.params, batch)
    # Perform parameter update with gradients and optimizer
    state = state.apply_gradients(grads=grads)
    # Return state and any other value we might want
    return state, loss, acc
```

The JAX CNN evaluation step applies the metrics function to the test data and returns the loss and accuracy.

```
@jax.jit  # Jit the function for efficiency
def eval_step(state, batch):
    # Determine the loss and accuracy
    loss, acc = calculate_loss_acc(state, state.params, batch)
    return loss, acc
```

Training the JAX CNN model involves the following steps:

- Apply the `train_step` to the entire training dataset
- Obtain the metrics for each batch
- Compute the mean metrics for each epoch from the batch metrics
- Repeat the same for the evaluation step
- Save the metrics for plotting later
- Print the metrics to the screen

```
training_accuracy = []
training_loss = []
testing_loss = []
testing_accuracy = []

def train_model(state, train_loader, test_loader, num_epochs=30):
    # Training loop
    for epoch in tqdm(range(num_epochs)):
        train_batch_loss, train_batch_accuracy = [], []
        val_batch_loss, val_batch_accuracy = [], []
        for train_batch in train_loader:
            state, loss, acc = train_step(state, train_batch)
            train_batch_loss.append(loss)
            train_batch_accuracy.append(acc)
        for val_batch in test_loader:
            val_loss, val_acc = eval_step(state, val_batch)
            val_batch_loss.append(val_loss)
            val_batch_accuracy.append(val_acc)
        # Loss for the current epoch
        epoch_train_loss = np.mean(train_batch_loss)
        epoch_val_loss = np.mean(val_batch_loss)
        # Accuracy for the current epoch
        epoch_train_acc = np.mean(train_batch_accuracy)
        epoch_val_acc = np.mean(val_batch_accuracy)
        testing_loss.append(epoch_val_loss)
        testing_accuracy.append(epoch_val_acc)
        training_loss.append(epoch_train_loss)
        training_accuracy.append(epoch_train_acc)
        print(
            f"Epoch: {epoch + 1}, loss: {epoch_train_loss:.2f}, acc: {epoch_train_acc:.2f} val loss: {epoch_val_loss:.2f} val acc {epoch_val_acc:.2f}"
        )
    return state

trained_model_state = train_model(model_state, training_data, validation_data)
```

You can save the metrics in a Pandas DataFrame and plot them using Matplotlib.

```
metrics_df = pd.DataFrame(np.array(training_accuracy), columns=["accuracy"])
metrics_df["val_accuracy"] = np.array(testing_accuracy)
metrics_df["loss"] = np.array(training_loss)
metrics_df["val_loss"] = np.array(testing_loss)
metrics_df[["loss", "val_loss"]].plot()
metrics_df[["accuracy", "val_accuracy"]].plot()
```

You may also want to save the trained model for later use. This is done by storing the model checkpoints in a folder:

```
from flax.training import checkpoints

checkpoints.save_checkpoint(
    ckpt_dir="/content/my_checkpoints/",  # Folder to save checkpoint in
    target=trained_model_state,  # What to save. To only save parameters, use model_state.params
    step=100,  # Training step or other metric to save best model on
    prefix="my_model",  # Checkpoint file name prefix
    overwrite=True,  # Overwrite existing checkpoint files
)
```

Load the checkpoint:

```
loaded_model_state = checkpoints.restore_checkpoint(
    ckpt_dir="/content/my_checkpoints/",  # Folder with the checkpoints
    target=model_state,  # (optional) matching object to rebuild state in
    prefix="my_model",  # Checkpoint file name prefix
)
```


In this article, you have learned how to define and train convolutional neural networks in JAX. You have also covered:

- How to apply data augmentation for computer vision problems in JAX
- Loading image data in JAX
- How to sanity check the augmented images by visualizing them with Matplotlib
- How to take advantage of `jax.jit` to make operations faster
- Defining a function that runs on a single example and making it run on a batch using `jax.vmap`
- Defining training loops in JAX
- How to evaluate the performance of CNN models in JAX

…to mention a few.


**First off:** If you are familiar with NumPy arrays, understanding TensorFlow Tensors will be easy. Start by importing TensorFlow:

```
import tensorflow as tf
print(tf.__version__) # check version
# 2.14.0
```

💡

The examples in this article use TensorFlow v2.x, so concepts deprecated in or designed for TensorFlow v1.x are not covered (for example, `tf.placeholder` is unnecessary in TF2 since eager execution is enabled by default).

**But what is TensorFlow?**

TensorFlow is an open-source machine-learning platform designed to facilitate the development and deployment of machine-learning models, especially in deep learning. Its name is derived from one of its core constructs: **Tensors**.

In TensorFlow, all the computations involve Tensors. That makes working with Tensors essential knowledge before your next venture into building deep machine-learning models. **This article is dedicated to exploring Tensors**, and by the end of it, we aim to make you ready to work with them.

Tensors are multidimensional arrays with a **uniform type** (see the supported dtypes). They form the fundamental building block for data representation and manipulation in TensorFlow. **Tensors can be scalars, vectors, matrices, or higher-dimensional arrays**, which hold the numerical data necessary for representing input data, model parameters, and output predictions in TensorFlow-based machine learning models.

Tensors resemble NumPy arrays. However, **Tensors are immutable**, which means that once we have created a Tensor, we can not modify or change it. **This feature ensures consistency and avoids unintended side effects** during the construction and execution of machine-learning models.

TensorFlow offers us several functions and methods for creating tensors. Most of the tensors we will create are also called **Dense tensors** since they have fixed shapes along all dimensions. We will also look at **special types of tensors**.

We will look at:

- Creating tensors with `tf.constant`, `tf.Variable`, `tf.zeros`, and `tf.ones`
- Creating tensors from NumPy arrays (`tf.convert_to_tensor`)
- Functions to create random tensors
- Ragged (`tf.ragged.constant`) and Sparse (`tf.SparseTensor`) tensors (special tensors)

`tf.constant()` is TensorFlow's most basic method for creating tensors. This function is vital as it allows us to create tensors with constant values. Tensors created with this function are immutable.

Before exploring other functions, we will use this function to explain most tensor concepts like **ranks, shape**, and **dtypes**.

```
tf.constant(value, dtype=None, shape=None, name='Const')
'''
value: A constant value or list of n dimensions to define the tensor
dtype: The type of the elements in the output tensor: Optional: Inferred
from the value if not specified
shape: The intended dimensions of the resulting tensor: Optional: If
specified, the value is reshaped to match
name: Name of the tensor: Optional
'''
```

Example 1:

```
rank_0_tensor = tf.constant(4)
print(rank_0_tensor)
```

Output:

Notice that we named the example tensor above "*rank_0_tensor*". That means the resulting tensor is a scalar with a single value and zero dimensions. We can check the number of dimensions with the `Tensor.ndim` attribute.

`print(f"rank_0_tensor has {rank_0_tensor.ndim} dimensions")`

Output:

We can create tensors of n dimensions. A vector, for instance, will have 1 dimension (rank 1 tensor), a matrix will have 2 dimensions (rank 2 tensor), while a higher-dimensional tensor will have n dimensions (rank n tensor).

Example 2: Creating a rank 1 tensor(vector) - A list of values:

```
rank_1_tensor = tf.constant([20, 100])
print(rank_1_tensor)
print(f"\nTensor rank: {tf.rank(rank_1_tensor)}")
print(f"rank_1_tensor has {rank_1_tensor.ndim} dimension")
```

Output:

Example 3: Creating a rank 2 tensor(matrix) - A list of lists:

```
rank_2_tensor = tf.constant([[20, 10],
                             [15, 30],
                             [45, 35]])
print(rank_2_tensor)
print(f"\nTensor rank: {tf.rank(rank_2_tensor)}")
print(f"rank_2_tensor has {rank_2_tensor.ndim} dimensions")
```

Output:

Example 4: Creating a rank 3 tensor(n-dimensional):

```
rank_3_tensor = tf.constant([
    [[0, 1, 2],
     [3, 4, 5]],
    [[6, 7, 8],
     [9, 10, 11]],
    [[12, 13, 14],
     [15, 16, 17]]])
print(rank_3_tensor)
print(f"\nTensor rank: {tf.rank(rank_3_tensor)}")
print(f"rank_3_tensor has {rank_3_tensor.ndim} dimensions")
```

Output:

Aside from the rank and the dimensions, **tensors have two other very important attributes: shape and dtype.**

- **shape**: returns the size of the tensor along each of its axes or dimensions.
- **dtype**: returns the type of all the elements in the tensor.

As you may have noticed, when we print a tensor, the result returns its value, shape, and dtype. For instance, looking at the rank_3_tensor, the result is represented as below:
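If you are following along, you can confirm what that printout contains by checking the `shape` and `dtype` attributes of the `rank_3_tensor` from the earlier example:

```python
import tensorflow as tf

rank_3_tensor = tf.constant([
    [[0, 1, 2],
     [3, 4, 5]],
    [[6, 7, 8],
     [9, 10, 11]],
    [[12, 13, 14],
     [15, 16, 17]]])
print(rank_3_tensor.shape)  # (3, 2, 3)
print(rank_3_tensor.dtype)  # <dtype: 'int32'>
```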

Sometimes, we may want to know or retrieve a tensor's shape after computation for operations like:

- Reshaping
- Slicing and indexing
- Model debugging

We can retrieve the shape using the `tf.shape` function:

```
# retrieve the shape of a tensor
print(f"Rank 0 tensor shape: {tf.shape(rank_0_tensor)}")
print(f"Rank 1 tensor shape: {tf.shape(rank_1_tensor)}")
print(f"Rank 2 tensor shape: {tf.shape(rank_2_tensor)}")
print(f"Rank 3 tensor shape: {tf.shape(rank_3_tensor)}")
```

Output:

We can also check the shape with `Tensor.shape`. However, this does not return a tensor.

```
print(rank_2_tensor.shape)
# (3, 2)
```

Returning a tensor shape as a tensor may have the following benefits:

- When performing operations based on the shape of a tensor, having the shape as a tensor allows us to use it in computations.
- We can dynamically reshape a tensor's shape.
- We can query and validate the shape of the output when creating functions or layers that operate on tensors.
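As a small illustration of the first two points, the tensor returned by `tf.shape` can feed straight into other operations, for example, a dynamic flatten:

```python
import tensorflow as tf

t = tf.constant([[1, 2, 3], [4, 5, 6]])
# tf.shape returns a tensor, so its result can be used in computations
n_elements = tf.reduce_prod(tf.shape(t))  # 2 * 3 = 6
flattened = tf.reshape(t, [n_elements])   # reshape driven by a computed shape
print(flattened.numpy())  # [1 2 3 4 5 6]
```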

A tensor can have any dtype (data type) listed in `tf.dtypes`. However, remember that a tensor must have a specific data type, and all elements within that tensor must conform to it. Therefore, a single tensor cannot have two different data types. For instance, we cannot create a tensor with both integer and floating-point elements. If you need to work with different data types, you would typically create separate tensors for each type.
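A short check of how dtype inference behaves under the default conversion rules (note how a mixed numeric list is unified into a single dtype rather than keeping two):

```python
import tensorflow as tf

# With no dtype argument, TensorFlow infers one dtype for all elements
int_tensor = tf.constant([1, 2, 3])      # integers -> int32
float_tensor = tf.constant([1.0, 2.5])   # floats -> float32
mixed_tensor = tf.constant([1, 2.5])     # the int is widened to float32
print(int_tensor.dtype, float_tensor.dtype, mixed_tensor.dtype)
```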

We can specify the tensor `dtype` while creating it:

```
# Float tensor
float32_tensor = tf.constant([20.5, 30.0, 4.3], dtype='float32')
# Integer tensor
int64_tensor = tf.constant([[1, 2, 3], [4, 5, 6]], dtype='int64')
# String tensor
string_tensor = tf.constant(["TensorFlow", "tensors", "dtypes"])
print(float32_tensor)
print(int64_tensor)
print(string_tensor)
```

Output:

**Aside:** The `b'...'` notation on string tensors indicates that they are byte strings. As you will also learn later, especially when dealing with Natural Language Processing (NLP) tasks, string tensors can have elements of variable lengths, unlike numeric tensors.
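For example, `tf.strings.length` reports a different byte length for each element of a string tensor:

```python
import tensorflow as tf

words = tf.constant(["TensorFlow", "tensors", "dtypes"])
# each string element has its own length, unlike numeric tensors
print(tf.strings.length(words).numpy())  # [10  7  6]
```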

While we can not have tensors with elements of different types, TensorFlow provides `tf.cast`, with which we can convert tensors between various data types:

```
float32_tensor_as_int32_tensor = tf.cast(float32_tensor, dtype='int32')
int64_tensor_as_float16_tensor = tf.cast(int64_tensor, dtype='float16')
print(float32_tensor_as_int32_tensor)
print(int64_tensor_as_float16_tensor)
```

Output:

`tf.zeros` and `tf.ones` are commonly used in deep learning (especially when building neural network models) to initialize certain tensors to specific values. By default, a tensor initialized with `tf.zeros` will contain only zeros, while one initialized with `tf.ones` will contain only ones.

Both functions share the signature below:

```
tf.zeros/tf.ones(
    shape,
    dtype=tf.dtypes.float32,
    name=None,
    layout=None
)
'''
shape: a list or tuple of integers or a 1D tensor
dtype: dtype of the elements
'''
```

```
# Initialize tensors with ones
ones_tensor1 = tf.ones(shape=(2), dtype='float32')   # rank 1 tensor of ones
ones_tensor2 = tf.ones(shape=[2, 3], dtype='int32')  # rank 2 tensor of ones
tensor_1d = tf.constant(value=[1, 2])
ones_tensor3 = tf.ones(shape=tensor_1d)  # takes shape from the 1d tensor's values
print(ones_tensor1)
print(ones_tensor2)
print(ones_tensor3)  # 1 row and two columns
```

Output:

```
# Initialize tensors with zeros
zeros_tensor1 = tf.zeros(shape=(3), dtype='float32')      # rank 1 tensor of zeros
zeros_tensor2 = tf.zeros(shape=[2, 3, 3], dtype='int32')  # rank 3 tensor of zeros
tensor_1d = tf.constant(value=[3, 2])
zeros_tensor3 = tf.zeros(shape=tensor_1d)  # takes shape from the 1d tensor's values
print(zeros_tensor1)
print(zeros_tensor2)
print(zeros_tensor3)  # 3 rows and 2 columns
```

Output:

When building a model with TensorFlow, `tf.zeros` and `tf.ones` can come in handy, for instance, when initializing a model's weights, biases, variables (with `tf.Variable`), or other parameters. For example, we can build a sample model where we initialize the weights and biases with `tf.zeros` to give you a taste of building a model. *Do not worry if you do not understand some of the things below. This is intended to break the monotony so far and excite you for the future.*

```
import numpy as np

# Dummy data for training
np.random.seed(42)
x_train = np.random.random((100, 10))        # 100 samples with 10 features each
y_train = np.random.randint(2, size=(100,))  # Integer labels

# Define the model
class SimpleModel(tf.keras.Model):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleModel, self).__init__()
        # Initialize weights and biases with tf.zeros
        self.weights_hidden = tf.Variable(
            tf.zeros(shape=(input_size, hidden_size)))
        self.biases_hidden = tf.Variable(
            tf.zeros(shape=(hidden_size,)))
        self.weights_output = tf.Variable(
            tf.zeros(shape=(hidden_size, output_size)))
        self.biases_output = tf.Variable(
            tf.zeros(shape=(output_size,)))

    def call(self, inputs):
        # Forward pass
        hidden_layer = tf.matmul(inputs, self.weights_hidden) + self.biases_hidden
        output_layer = tf.matmul(hidden_layer, self.weights_output) + self.biases_output
        return output_layer

# Instantiate the model
input_size = 10
hidden_size = 5
output_size = 2
model_zeros = SimpleModel(input_size, hidden_size, output_size)

# Compile the model
model_zeros.compile(optimizer='adam',
                    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                    metrics=['accuracy'])

# Train the model
model_zeros.fit(x_train, y_train, epochs=10, batch_size=32)

# Evaluate the model
loss, accuracy = model_zeros.evaluate(x_train, y_train)
print(f"Final Training Loss: {loss:.4f}, Accuracy: {accuracy:.4f}")
```

Output:

💡

As much as the model may work when the weights are initialized with tf.zeros and tf.ones, it is not recommended in practice: when every weight starts at the same value, the neurons in a layer compute identical outputs and receive identical gradient updates, so the network struggles to learn diverse features. Instead, we can randomly initialize weights or use higher-level APIs, such as tf.keras.layers.Dense, that come with sensible default initializers.

*I hope you liked the small exercise. Just take that lightly for now, at least!*

So far, we have explored how to create tensors with the `tf.constant` function. However, we mentioned that tensors created that way are immutable - that is, we can not modify their elements. To prove this point, let's see some examples below:

Suppose we had the tensor:

```
# tensor with tf.constant()
immutable_tensor = tf.constant([20, 30, 40]) # vector
print(immutable_tensor)
```

Output:

Let's try to modify the first element of the vector through indexing and assignment like we would do on NumPy arrays:

```
# Try to change the first element
immutable_tensor[0] = 100 # index first element(20) and assign new value
print(immutable_tensor)
```

Output:

💡

We can not modify elements of a tf.constant tensor once created. **That could be a challenge when creating models like neural networks, where trainable parameters like weights and biases must be adjustable during the training (optimization) process.** That's where TensorFlow variables come in handy!

**TensorFlow variables** - mutable tensors - are recommended to represent shared, persistent states your program manipulates (as defined here). They are created using the `tf.Variable` class. The class has some of the following use cases when building machine learning models:

- Creating trainable parameters like weights and biases. As evident in this example above:

```
self.weights_hidden = tf.Variable(
    tf.zeros(shape=(input_size, hidden_size)))
self.biases_hidden = tf.Variable(tf.zeros(shape=(hidden_size,)))
```

- Allow us to specify initial values like random initialization or zero initialization. Also evident in the example above.
- Useful in the automatic differentiation system of TensorFlow.
- Variables can be shared between different parts of a model or even between other models, enabling the reuse of learned representations.

Let's create our first variable. The `Variable()` constructor requires an initial value that can be a tensor of any type and shape. This initial value defines the type and shape of the variable. After construction, the type and shape of the variable are fixed.

```
example_variable = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
print(example_variable)
print(f"Shape: {tf.shape(example_variable)}")
print(f"Rank: {tf.rank(example_variable)}")
```

Output:

Variables can hold any type, just like tensors:

```
bool_variable = tf.Variable([False, True, True, False, False])
int32_variable = tf.Variable([20, 40, 15], dtype='int32')
print(bool_variable)
print(int32_variable)
```

Output:

We can mutate a variable tensor using the assign methods.

**`assign()` to reassign a tensor to a variable tensor:** Since tensors back variables, we can modify or re-assign a tensor to an existing variable using `tf.Variable.assign`. We call the `assign` method on the variable and allocate a new tensor.

```
int32_variable = tf.Variable([20, 40, 15], dtype='int32')
print(f"Old variable: {int32_variable.numpy()}")
# modify the variable elements
int32_variable = int32_variable.assign([12, 15, 18])
print(f"Mod variable: {int32_variable.numpy()}")
```

Output:

We can not assign a tensor with a different shape from the existing variable!
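We can verify that restriction with the same try/except pattern used elsewhere in this article; assigning a 2-element tensor to a 3-element variable fails:

```python
import tensorflow as tf

v = tf.Variable([20, 40, 15])
try:
    v.assign([12, 15])  # shape (2,) does not match the variable's shape (3,)
except Exception as e:
    print(f"{type(e).__name__}: {e}")
```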

**`assign_add()` method (counter/incrementing):** We can add a specific value to the current value of an existing variable (**increment**) with `tf.Variable.assign_add`. The method is particularly useful for implementing counters or variables that need to be incremented during execution, for instance, counting training steps (as we shall see later).

```
example_variable1 = tf.Variable(10)
example_variable2 = tf.Variable([10,30])
print(f"Current variable1 value: {example_variable1.numpy()}")
print(f"Current variable2 values: {example_variable2.numpy()}")
# increment values in the variables
example_variable1 = example_variable1.assign_add(5) # add 5 to current value
example_variable2 = example_variable2.assign_add([200, 100]) # add 200 & 100 to current values
print(f"Updated variable1 value: {example_variable1.numpy()}")
print(f"Updated variable2 values: {example_variable2.numpy()}")
```

Output:

The shapes of the variable tensors must match for that to work!

You will notice that we mentioned that one of the major use cases of variables is to store trainable parameters. They can also hold gradients computed during backpropagation on a neural network model. However, **not all model variables (for instance, counters or constant values) need to be trainable or have gradients**. They can be part of the model but don't need to be updated through optimization.

We can specify whether or not a variable needs updating during the training process or its gradients computed during backpropagation by setting `trainable=False` on the variable. For instance:

```
# set a non-trainable variable to count training steps
train_step = tf.Variable(initial_value=0, trainable=False)
```

From the code above, we could use the untrainable train step counter in a loop as below. Don't be intimidated by what you see. This will all make sense as you advance:

```
# SAMPLE MODEL HERE
...
# set a non-trainable variable to count training steps
train_step = tf.Variable(initial_value=0, trainable=False)
...
# Example training loop
for epoch in range(5):  # Run for 5 epochs
    for step in range(len(x_train)):
        # Forward pass
        with tf.GradientTape() as tape:
            pass
        # Perform backward pass and optimization here
        ...
        # Update the train step variable here
        train_step.assign_add(1)
    print(f"Epoch {epoch + 1}, train step: {train_step.numpy()}")

# Show the final training step value
print("Final Global Step:", train_step.numpy())
```

Output:

*You see, the `train_step` variable is incremented at each training step. Since we have set `trainable=False` on it, it will not affect any model's weights or gradients.*

We can easily create tensors from NumPy arrays using `tf.convert_to_tensor` or by calling `tf.constant` on the particular NumPy array. Let's see how:

```
np_array = np.arange(0, 12).reshape(3, 4)
print(np_array, type(np_array))
# convert to tensor
np_array_to_tensor = tf.convert_to_tensor(np_array)
print()
print(np_array_to_tensor)
# Alternatively, we can call tf.constant on the array
np_array_to_tensor = tf.constant(np_array, dtype='float32')
print()
print(np_array_to_tensor)
```

Output:

💡

TensorFlow tensors and NumPy arrays have several similarities. However, NumPy arrays primarily run on CPUs, while tensors take advantage of hardware accelerations like GPUs and TPUs. That makes tensors faster in certain computations.
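Conversion works in both directions, by the way; every eager tensor exposes a `.numpy()` method that returns a plain NumPy array:

```python
import numpy as np
import tensorflow as tf

np_array = np.array([[1, 2], [3, 4]])
tensor = tf.constant(np_array)   # NumPy array -> tensor
back = tensor.numpy()            # tensor -> NumPy array
print(type(back).__name__)       # ndarray
print(np.array_equal(np_array, back))  # True
```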

TensorFlow also has graph execution mode, which constructs a **computational graph** for optimization, resulting in speedier execution for certain tensor operations.

There are various ways of generating random tensors. We will explore the following:

- `tf.random.normal`
- `tf.random.uniform`
- `tf.random.shuffle`
- `tf.random.set_seed`

`tf.random.normal` creates tensors with random values drawn from a normal (Gaussian) distribution. A normal distribution is characterized by two parameters: a mean and a standard deviation. That means we can set these two parameters for the random tensor.

The syntax:

```
tf.random.normal(
    shape,
    mean=0.0,
    stddev=1.0,
    dtype=tf.dtypes.float32,
    seed=None,
    name=None
)
'''
mean: optional and default is 0
stddev: Standard deviation - optional and default is 1
'''
```

Example:

```
# generate a 3x2 tensor containing random values
# sampled from a normal distribution with a mean of 0.0
# and a standard deviation of 1.0
random_normal_tensor = tf.random.normal(shape=(3, 2),
                                        mean=0, stddev=1.0,
                                        dtype='float32')
print(random_normal_tensor)
```

Output:

**Note that your** **result will differ every time you run the above code**. In the next section, we will learn how to produce the same random tensor each time(**random seed**).

The `tf.random.uniform` function creates tensors with random values from a uniform distribution. A uniform distribution is a probability distribution where all values in the range have an equal probability of being sampled. That means that every value in the specified interval has the same likelihood of being chosen. So, we can generate a random tensor with a set range (min_value, max_value).

The syntax:

```
tf.random.uniform(
    shape,
    minval=0,
    maxval=None,
    dtype=tf.dtypes.float32,
    seed=None,
    name=None
)
'''
minval: optional, and default is 0. The minimum value of the distribution.
maxval: Optional, and default is 1 - The maximum value of the distribution.
'''
```

Example:

```
# generate a 2x3x2 tensor containing random values
# sampled from a uniform distribution with a min_value of 0.0
# and a max_value of 1.5
random_uniform_tensor = tf.random.uniform(shape=[2, 3, 2],
                                          minval=0,
                                          maxval=1.5)
print(random_uniform_tensor)
```

Output:

**Note that your** **result will differ every time you run the above code**. In the next section, we will learn how to produce the same random tensor each time(**random seed**).

It is a common practice in machine learning to shuffle data, especially while training a model, to ensure **randomness** in the data. The data could be presented as tensors, and thus, TensorFlow has a function to help in the shuffling.

The `tf.random.shuffle` function in TensorFlow is used to randomly shuffle the elements along the first dimension of a tensor. For a 2D or rank 2 tensor, this means shuffling the rows.

Example:

```
intial_tensor_data = tf.constant([[10, 15], [20, 30], [40, 50]])
print("Original:")
print(intial_tensor_data.numpy())
# shuffle the elements
shuffled_intial_tensor_data = tf.random.shuffle(intial_tensor_data)
print("\nShuffled:")
print(shuffled_intial_tensor_data.numpy())
```

Output:

**Note that your** **shuffle** **result will differ every time you run the above code**. In the next section, we will learn how to produce the same random tensor each time(**random seed**).

As you advance, you may encounter shuffling tensors in some use cases like:

- Data augmentation in image classification tasks to increase the diversity/randomness of the training dataset.
- In cross-validation, before splitting the data into folds.
- In Natural Language Processing.

In the examples above, we have noticed that the code we have written generates new random tensor elements each time we run it. While building models, we often need consistent results where the same sequence of random numbers is generated each time we run the model.

Tensor reproducibility can come in handy in cases like when initializing weights, shuffling datasets, or in cases of data augmentation and other tasks that require randomized tensors.

TensorFlow has two ways we can set the random seed for random tensor generation:

- Global-level random seed setting
- Operation-level random seed setting

We set the global seed with `tf.random.set_seed(seed=integer_value)`. A global seed is shared by all TensorFlow operations in the script, which means that any random operation will be affected by it. Typically, we set the global seed at the beginning of the script or notebook.

Example:

```
# set the global random seed
tf.random.set_seed(42)
random_tensor = tf.random.uniform(shape=[1, 3, 2])
print(random_tensor)
```

Output:

Notice that the same tensor elements are reproduced each time you run the code. If you generate the same random tensor with the same seed value (42) in another notebook cell, the result is the same.

However, the results will differ if you change the seed value to another integer value(say 123). For example:

```
# set a different global random seed value
tf.random.set_seed(123)
random_tensor = tf.random.uniform(shape=[1, 3, 2])
print(random_tensor)
```

Output:

💡

Operation-level random seeds are currently the most recommended way to ensure the reproducibility of random tensors. We set the seed with the `tf.random.Generator` class.

Each random tensor generated with this class has its own random seed, thus unique randomness. This can be helpful when we want different parts of the code to have independent random sequences.

Example:

```
# Create two instances of tf.random.Generator
# with different seeds
random_generator1 = tf.random.Generator.from_seed(42)
random_generator2 = tf.random.Generator.from_seed(123)
# use the generators for random tensor generation
random_tensor1 = random_generator1.normal(shape=[4, 2])
random_tensor2 = random_generator2.uniform(shape=[1, 3, 3])
print(random_tensor1)
print(random_tensor2)
```

Output:

✍️ You can try shuffling a tensor with `tf.random.shuffle` while the random seed is set and observe the behavior!

These tensors differ from dense tensors: they have special characteristics and are viable for specific use cases in deep learning. We will look at two of these tensors:

- Ragged tensors
- Sparse tensors

A ragged tensor is mainly used to represent sequences of variable lengths. While dense tensors have dimensions of fixed (uniform) sizes, dimensions in ragged tensors can vary in size (non-uniform). They can be helpful in tasks like NLP, where sentences are sequences with different numbers of words.

A ragged tensor would be represented like:

We create a ragged tensor using `tf.ragged.constant`:

```
# sample array of varying dimension sizes
ragged_array = [
    [1.5, 3.0, 2.3],
    [4.5, 0.5],
    [0.8]]
# we can not convert it to a dense tensor
try:
    tensor = tf.constant(ragged_array)
except Exception as e:
    print(f"{type(e).__name__}: {e}")
# instead we can convert it to a ragged tensor
ragged_tensor = tf.ragged.constant(ragged_array)
print("\nConverted to ragged tensor:")
print(ragged_tensor)
```

Output:

You notice that the ragged tensor is not presented like normal tensors. Instead, it is encoded such that its variable-length rows are concatenated into a flattened list. The flattened list has row partitions that indicate the row divisions:

That encoding gives us more ways in which we can construct ragged tensors. We can pair flat *value* tensors with *row-partitioning* tensors, indicating how those values should be divided into rows. We can use the following methods:

- `value_rowids` partitioning tensor: `tf.RaggedTensor.from_value_rowids`
- `row_lengths` partitioning tensor: `tf.RaggedTensor.from_row_lengths`
- `row_splits` partitioning tensor: `tf.RaggedTensor.from_row_splits`

**`value_rowids` partitioning tensor:** You can create a tensor if you know which row each value belongs to. Creating a tensor this way is **handy when you have a set of values and a corresponding row assignment for each value, and you want to create a ragged tensor where each row represents a group of values assigned to the same row.**

It is also an efficient way of storing ragged tensors with many empty rows since the size of the tensor depends only on the total number of values.

Example:

```
# tf.RaggedTensor.from_value_rowids
values = tf.constant([20, 30, 40, 50, 60, 70])  # must be a vector
row_ids = tf.constant([0, 0, 0, 1, 1, 2])  # integer vector specifying the
                                           # row index for each value
ragged_tensor = tf.RaggedTensor.from_value_rowids(values=values,
                                                  value_rowids=row_ids)
print("Values:", values.numpy())
print("Row ids:", row_ids.numpy())
print("Ragged tensor from value_row_ids:\n", ragged_tensor)
```

Output:

**`row_lengths` partitioning tensor:** You can create a tensor if you know how long each row is. Creating a tensor this way is **handy when concatenating ragged tensors**, since row lengths do not change when two tensors are concatenated together.

Example:

```
# tf.RaggedTensor.from_row_lengths
values = tf.constant([20, 30, 40, 50, 60, 70])  # must be a vector
row_lengths = tf.constant([3, 2, 1])  # integer vector specifying the length of each row
# ragged tensor
ragged_tensor2 = tf.RaggedTensor.from_row_lengths(values=values,
                                                  row_lengths=row_lengths)
print("Values:", values.numpy())
print("Row lengths:", row_lengths.numpy())
print("Ragged tensor from row lengths:\n", ragged_tensor2)
```

Output:

**`row_splits` partitioning tensor:** You can create a tensor if you know the index where each row starts and ends. The `row_splits` **enable quick indexing and slicing into ragged tensors**, since TensorFlow can quickly determine each row's starting and ending indices.

Example:

```
# tf.RaggedTensor.from_row_splits
values = tf.constant([20, 30, 40, 50, 60, 70])  # must be a vector
row_splits = tf.constant([0, 3, 5, 6])  # integer vector specifying
                                        # the split points between rows
ragged_tensor3 = tf.RaggedTensor.from_row_splits(values=values,
                                                 row_splits=row_splits)
print("Values:", values.numpy())
print("Row splits:", row_splits.numpy())
print("Ragged tensor from row_splits:\n", ragged_tensor3)
```

Output:

The outermost dimension of a ragged tensor is always uniform (it has the same length) since it consists of a single slice (`ragged_tensor3.shape[0]`). The remaining dimensions can be ragged or uniform.

We can view the shape of a tensor with the `shape` attribute:

```
# shape of the ragged_tensor3 above
print(ragged_tensor3.shape)
# the outer dimension is ragged_tensor3.shape[0]
# Returns
# (3, None)
```

The above code gives the **static shape of the ragged tensor**. The outer dimension is indicated by 3, representing the total number of rows. The ragged dimension is always represented by `None`, which indicates the rows have varying lengths.

Viewing the **dynamic shape** with `tf.shape` gives more details about the lengths of the ragged tensor dimensions:

```
# dynamic shape of the ragged_tensor3 above
print(tf.shape(ragged_tensor3))
# Returns
# <DynamicRaggedShape lengths=[3, (3, 2, 1)] num_row_partitions=1>
```

The results show that the ragged tensor has 3 rows with lengths 3, 2, and 1.

**How do we describe the shape of a ragged tensor?** For instance, for a ragged tensor that will store the word embeddings for each word in a batch of sentences?

When describing a ragged tensor, we enclose the ragged dimensions in parentheses. For example, we can write `[num_sentences, (num_words), embedding_size]`. That conveys that the size of those dimensions can vary across different rows.

Sparse tensors are tensors that contain a lot of zero values. When you have tensors with many zero values, storing them in a sparse tensor improves space and time on computations. These tensors are common in areas like NLP for data preprocessing and computer vision.

We construct a sparse tensor by supplying the following components to `tf.sparse.SparseTensor`:

- `indices`: A 2-D int64 tensor of shape `[N, rank]` (number of values, number of dimensions), which specifies the indices of the elements in the sparse tensor that contain nonzero values.
- `values`: A 1-D tensor of any type and shape `[N]` with all the nonzero values of the tensor.
- `dense_shape`: A 1-D int64 tensor of shape `[rank]`, specifying the dense shape of the sparse tensor.

Example:

```
# indices of non-zero values
indices = tf.constant([[0, 1], [1, 2], [2, 0]], dtype=tf.int64)
# the nonzero values in the tensor
values = tf.constant([15, 25, 35], dtype=tf.float32)
# define the tensor's shape
shape = tf.constant([4,3], dtype=tf.int64)
sparse_tensor = tf.sparse.SparseTensor(indices=indices,
                                       values=values,
                                       dense_shape=shape)
'''Results
Printing the result will return the components: Visualize below.
'''
```

The sparse tensor has 4 rows and 3 columns (shape `[4, 3]`). We can then visualize the tensor represented by this sparse tensor by **converting the sparse tensor to a dense tensor with tf.sparse.to_dense**:

```
# convert sparse tensor to dense tensor
sparse_to_dense_tensor = tf.sparse.to_dense(sparse_tensor)
print(sparse_to_dense_tensor)
''' Results
tf.Tensor(
[[ 0. 15. 0.]
[ 0. 0. 25.]
[35. 0. 0.]
[ 0. 0. 0.]], shape=(4, 3), dtype=float32)
'''
```

✍️ To **convert the dense tensor back to sparse**, use `tf.sparse.from_dense`.

Indexing tensors follows the basic Python and NumPy indexing rules, which include:

- Indexes start at `0`
- Negative indices count backward from the end.
- Colons, `:`, are used for slices: `start:stop:step`

**Example single index indexing and slicing:**

```
tensor = tf.constant([20, 30, 40, 50, 15, 45, 100, 120])
print(f"Tensor: {tensor.numpy()}")
# return everything
print(f"Return everything: {tensor[:]}")
print(f"First 3 elements: {tensor[:3]}")
print(f"All elements after first 3 elements: {tensor[3:]}")
print(f"Every other item: {tensor[::2]}")
print(f"Fourth to sixth elements: {tensor[3:6]}")
print(f"Reversing: {tensor[::-1]}")
```

Output:

We index higher dimensional tensors by passing multiple indices.

```
rank_2tensor = tf.constant([[20, 30, 40], [50, 15, 45], [100, 120, 150]])
print(f"Tensor:\n{rank_2tensor.numpy()}")
print(f"Second row:, {rank_2tensor[1, :].numpy()}")
print(f"Second column:, {rank_2tensor[:, 1].numpy()}")
print(f"Last row:, {rank_2tensor[-1, :].numpy()}")
print(f"First item in last column:, {rank_2tensor[0, -1].numpy()}")
print(f"Second row onwards:\n {rank_2tensor[1:, :].numpy()}")
```

Output:

**Slicing tensors with `tf.slice`:**

`tf.slice` takes `begin` and `size` parameters. `begin` specifies the start index for the slicing, while `size` specifies the number of elements to slice.

Example slicing rank 1 tensor:

```
tensor = tf.constant([20, 30, 40, 50, 15, 25, 60])
print(f"Tensor: {tensor.numpy()}")
# slice with tf.slice
begin = [2] # begin at index 2
size = [3] # number of elements to slice starting from begin index
t_slice = tf.slice(tensor, begin = begin, size=size)
print(f"\nSlice: {t_slice.numpy()}") # similar to tensor[2:5]
```

Output:

Example 2 slicing rank 3 tensor:

```
r3_tensor = tf.constant([
    [[20, 30, 40, 15],
     [50, 15, 25, 60]],
    [[5, 16, 21, 17],
     [9, 11, 35, 13]]])
r3_slice = tf.slice(r3_tensor, begin=[1, 1, 1], size=[1, 1, 2])
print(f"Tensor: {r3_tensor.numpy()}")
print(f"\nr3_slice: {r3_slice.numpy()}")
'''
Tensor:
[[[20 30 40 15]
  [50 15 25 60]]
 [[ 5 16 21 17]
  [ 9 11 35 13]]]
r3_slice: [[[11 35]]]
'''
```

**Slicing tensors with `tf.gather`:**

`tf.gather` extracts specific `indices` from a single axis/dimension of a tensor. The indices must be an integer tensor of any dimension but primarily 1D.

Example 1. Gather elements from rank 1 tensor:

```
print(f"r1_tensor: {tensor.numpy()}")
print(f"Gathered: {tf.gather(tensor, indices=[1, 5])}")  # take index 1 and 5
'''
r1_tensor: [20 30 40 50 15 25 60]
Gathered: [30 25]
'''
```

Example 2. Using `batch_dims`:

```
tensor_2d = tf.constant([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15]])
indices = tf.constant([
    [2, 4],
    [0, 4],
    [1, 3]])
print(tensor_2d.numpy())
print(tf.gather(tensor_2d, indices=indices, batch_dims=1, axis=1).numpy())
```

Output:

`batch_dims` helps gather different items from each batch element along a specified axis by looping over the first axis of the tensor and the indices.
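One way to see what `batch_dims=1` does is to compare it with a manual per-row gather, using the same tensors as above:

```python
import tensorflow as tf

tensor_2d = tf.constant([[1, 2, 3, 4, 5],
                         [6, 7, 8, 9, 10],
                         [11, 12, 13, 14, 15]])
indices = tf.constant([[2, 4],
                       [0, 4],
                       [1, 3]])
# batch_dims=1 gathers row i of the tensor using row i of the indices
batched = tf.gather(tensor_2d, indices=indices, batch_dims=1, axis=1)
# ...which is equivalent to looping over the first axis ourselves:
manual = tf.stack([tf.gather(tensor_2d[i], indices[i]) for i in range(3)])
print(batched.numpy())  # [[ 3  5] [ 6 10] [12 14]]
```

The loop version makes the semantics obvious; `batch_dims` simply does this in a single vectorized op.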

**Note that ragged tensors can also be indexed. However, we cannot index on the ragged dimensions since a value may exist in some rows but not in others.**
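A quick sketch of what does and does not work, using a toy ragged tensor assumed for illustration:

```python
import tensorflow as tf

rt = tf.ragged.constant([[1, 2, 3], [4, 5]])
print(rt[1].numpy())     # indexing the uniform (row) dimension works: [4 5]
print(rt[1, 0].numpy())  # a fully specified index into existing values works: 4
# rt[:, 1] raises an error: we cannot slice across the ragged dimension,
# because column 1 may not exist in every row
```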

Tensors need to meet specific requirements for various machine-learning models. Knowing how to manipulate them to a particular structure or shape ensures we can handle diverse data formats and ensure their compatibility throughout the model development process. In this section, we will look at the following **operations for reshaping and manipulating tensors**:

- Reshaping with `tf.reshape`
- Swapping dimensions with `tf.transpose`
- Reducing dimensions with `tf.squeeze`
- Expanding dimensions with `tf.expand_dims`
- Joining tensors with `tf.concat`
Reshaping tensors is a critical concept employed in preprocessing data or in situations where the shape of a tensor needs to be adjusted to meet the requirements of a particular operation or model (for instance, when preparing the input data for a neural network).

`tf.reshape` enables us to reshape a tensor without altering its data. It does not change the order or total number of elements in the tensor.

Example 1:

```
tensor = tf.constant([[1, 2, 3, 4],
[5, 6, 7, 8]])
print(f"Tensor:\n{tensor.numpy()} \nOld shape: {tensor.shape}")
# reshape
tensor = tf.reshape(tensor, [2, 2, 2])
print(f"Reshaped:\n{tensor.numpy()} \nNew shape: {tensor.shape}")
```

Output:

Example 2:

```
var_tensor = tf.Variable([[[1, 2, 3, 4],
[5, 6, 7, 8]],
[[9, 10, 11, 12],
[13, 14, 15, 16]]])
print(f"Tensor:\n{var_tensor.numpy()} \nOld shape: {var_tensor.shape}")
# reshape
var_tensor = tf.reshape(var_tensor, [4, 4])
print(f"Reshaped:\n{var_tensor.numpy()} \nNew shape: {var_tensor.shape}")
```

Output:

💡

Note that we cannot reshape a tensor beyond its total number of elements. For instance, if a tensor is 3x2 (6 elements), we cannot reshape it to 4x2 (8 elements).

We can **flatten a tensor (to rank 1/1D)** by specifying -1 as the shape.

```
# flatten the variable tensor
print(f"Tensor:\n{var_tensor.numpy()}")
var_tensor = tf.reshape(var_tensor, [-1])
print(f"Flattened:\n{var_tensor.numpy()} \nNew shape: {var_tensor.shape}")
```

Output:

Transposing a tensor means reordering its axes. For a rank 2 tensor, that means swapping its rows and columns. We achieve this using `tf.transpose`.

Example 1:

```
var_tensor = tf.reshape(var_tensor, [2, 2, 4])
print(f"Tensor:\n{var_tensor.numpy()} shape before: {tf.shape(var_tensor)}")
# transpose
var_tensor = tf.transpose(var_tensor) # row => columns, columns => rows
print(f"Transposed:\n{var_tensor.numpy()}\
shape after: {tf.shape(var_tensor)}")
```

Output:

Notice that the rows become the columns and vice versa.
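By default, `tf.transpose` reverses the order of all axes. For rank 3+ tensors, the `perm` argument lets us pick an explicit axis order instead, which is often what we actually want. A small sketch:

```python
import tensorflow as tf

t = tf.reshape(tf.range(24), [2, 3, 4])
# default: all axes reversed, (2, 3, 4) -> (4, 3, 2)
print(tf.transpose(t).shape)
# perm chooses an explicit order; here we swap only the last two axes
print(tf.transpose(t, perm=[0, 2, 1]).shape)  # (2, 4, 3)
```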

In certain cases where we have tensors with singleton dimensions, for instance, batches of size one, we may want to remove them to have a more concise representation of the data. Squeezing does not change the data in the tensor but only modifies its shape. We can achieve that with `tf.squeeze`.

Example 1:

```
tensor = tf.constant([[[10], [15], [30]]])
print(f"Tensor:\n{tensor} => shape: {tf.shape(tensor)}")
# squeeze
tensor = tf.squeeze(tensor)
print(f"\nSqueezed: {tensor.numpy()} => New shape: {tf.shape(tensor)}")
```

Output:

Example 2: Specifying the axis if you do not want to remove all size 1 dimensions

```
tensor3 = tf.constant([[[10]],
[[11]],
[[9]]])
print(f"Tensor:\n{tensor3} => shape: {tf.shape(tensor3)}")
# specify the axis(squeeze axis 2)
tensor3 = tf.squeeze(tensor3, axis=[2])
print("\nSqueezed:")
print(f"{tensor3.numpy()} => New shape: {tf.shape(tensor3)}")
```

Output:

**Be aware** that you **must** specify the axis when squeezing a ragged tensor!

Expanding dimensions involves adding size 1 dimensions to a tensor. That increases the rank of the tensor by one. It is the opposite of squeezing a tensor. Using `tf.expand_dims`, we can specify the axis on which to add the dimension.

Expanding dimensions is a common practice, for instance when:

- Adding an outer "batch" dimension to a tensor, for instance, a tensor of shape `(height, width, channels)` storing image data
- Broadcasting for arithmetic operations with tensors of different shapes

For example, we can add an outer batch to a tensor:

```
tensor = tf.constant([10, 20, 30])
print(f"Tensor:\n{tensor} => shape: {tf.shape(tensor)}")
# expand dimensions
tensor_expanded = tf.expand_dims(tensor, axis=0)
print("Tensor expanded:")
print(f"{tensor_expanded} => shape: {tf.shape(tensor_expanded)}")
```

Output:

Specifying a negative axis will add an innermost dimension:

```
tensor_expanded = tf.expand_dims(tensor, axis=-1) # innermost dim
print("Tensor expanded(axis=-1):")
print(f"{tensor_expanded} => shape: {tf.shape(tensor_expanded)}")
```

Output:

We can join two tensors along a particular dimension. For that to work, the **tensors must have the same rank, and their shapes must match in every dimension except the one being joined**. `tf.concat` helps us achieve that.

Example:

```
tensor_1 = tf.constant([[1, 2, 3],
[4, 5, 6]])
tensor_2 = tf.constant([[7, 8, 9],
[10, 11, 12]])
print(f"Tensor 1:\n{tensor_1}")
print(f"Tensor 2:\n{tensor_2}")
# concatenate along axis=0
tensor1_tensor2 = tf.concat([tensor_1, tensor_2], axis=0)
print("tensor_1 and tensor_2 joined(axis=0):")
print(f"{tensor1_tensor2} => shape: {tf.shape(tensor1_tensor2)}")
```

Output:

Concatenating along axis 1:

```
# concatenate along axis=1
tensor1_tensor2 = tf.concat([tensor_1, tensor_2], axis=1)
print("tensor_1 and tensor_2 joined(axis=1):")
print(f"{tensor1_tensor2} => shape: {tf.shape(tensor1_tensor2)}")
```

Output:

✍️ Consider exploring `tf.stack` and `tf.tile` as an exercise for this section!
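As a starting point for that exercise, here is a minimal sketch of the two ops:

```python
import tensorflow as tf

a = tf.constant([1, 2, 3])
b = tf.constant([4, 5, 6])
# tf.stack joins tensors along a NEW axis (unlike tf.concat, which reuses one)
stacked = tf.stack([a, b], axis=0)  # shape (2, 3)
# tf.tile repeats a tensor along existing axes
tiled = tf.tile(tf.expand_dims(a, 0), [2, 1])  # shape (2, 3): two copies of a
print(stacked.numpy())
print(tiled.numpy())
```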

Broadcasting tensors is a concept very similar to NumPy's broadcasting concept. It allows operations to be performed on tensors of different shapes. The smaller tensor is stretched to match the shape of the larger tensor, enabling seamless elementwise operations.

For instance, if we multiply a tensor by a scalar, the scalar is stretched to match the shape of the tensor:

```
tensor = tf.constant([5, 10, 15])
#multiply by scalar 5
print(tensor * 5)
''' Results
tf.Tensor([25 50 75], shape=(3,), dtype=int32)
'''
```

To understand broadcasting in tensors, we can review NumPy's **broadcasting rules** but with tensors in mind:

- Rule 1: If the two tensors vary in their number of dimensions, the shape of the one with fewer dimensions is *padded* with ones on its left side.
- Rule 2: If the shape of the two tensors does not match in any dimension, the tensor with a shape of 1 in that dimension is stretched to match the other shape.
- Rule 3: If, in any dimension, the sizes differ and neither is 1, an error is raised.
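The "stretching" in rule 2 can be made explicit with `tf.broadcast_to`:

```python
import tensorflow as tf

small = tf.constant([1, 2, 3])              # shape (3,)
# rule 1 pads the shape to (1, 3); rule 2 stretches it to (2, 3)
stretched = tf.broadcast_to(small, [2, 3])
print(stretched.numpy())  # [[1 2 3] [1 2 3]]
```

In practice, broadcasting performs this stretching logically without materializing the copies in memory.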

Let's understand the rules with a few examples:

Example 1: Adding a rank two tensor to a rank one tensor:

```
rank1_t = tf.constant([1, 2 , 3])
rank2_t = tf.constant([[5, 10, 15],
[20, 25, 30]])
# shapes
print(f"rank1_t shape: {rank1_t.shape}")
print(f"rank2_t shape: {rank2_t.shape}")
'''
rank1_t shape: (3,)
rank2_t shape: (2, 3)
'''
```

- The above tensors have different shapes. By rule 1, `rank1_t` has fewer dimensions, so it is padded with ones on the left. The new shapes are now `rank1_t shape: (1, 3)` and `rank2_t shape: (2, 3)`.
- Next, we see that the shapes in their first dimension differ. By rule 2, we stretch `rank1_t` - since its first dimension is of size 1 - to match the shape of `rank2_t`. The new shapes are now `rank1_t shape: (2, 3)` and `rank2_t shape: (2, 3)`.
- Since the shapes now match, we can add the two tensors. The shape of the resulting tensor will be `(2, 3)`.

```
# add
r2_plus_r1 = rank2_t + rank1_t
print(f"r2_plus_r1:\n {r2_plus_r1} => shape{r2_plus_r1.shape}")
```

Visualize:

Example 2: Broadcasting two tensors

```
t1 = tf.constant([1, 2 , 3, 4])
t2 = tf.constant([[10],
[20],
[30],
[40]])
# shapes
print(f"t1 shape: {t1.shape}")
print(f"t2 shape: {t2.shape}")
'''Result
t1 shape: (4,)
t2 shape: (4, 1)
'''
```

- The above tensors have different shapes. By rule 1, `t1` has fewer dimensions, so it is padded with ones on the left. The new shapes are now `t1 shape: (1, 4)` and `t2 shape: (4, 1)`.
- Next, we see that their shapes still differ. By rule 2, we stretch both to match the other - since each has a dimension of size 1. The new shapes are now `t1 shape: (4, 4)` and `t2 shape: (4, 4)`.
- Since the shapes now match, we can add the two tensors. The shape of the resulting tensor will be `(4, 4)`.

```
# add (both tensors broadcast to shape (4, 4))
t2_plus_t1 = t2 + t1
print("t2_plus_t1:")
print(f"{t2_plus_t1} => shape: {t2_plus_t1.shape}")
```

Output:

Visualize:

✍️ There are instances where broadcasting fails. For example, try adding tensors of shape `(4, 3)` and shape `(4,)` and analyze why they are incompatible while referring to the rules!

We can perform various basic mathematical operations on tensors.

Example tensors:

```
# Example tensors
tensor_a = tf.constant([[1, 2], [3, 4]])
tensor_b = tf.constant([[5, 6], [7, 8]])
print("Tensor A:")
print(tensor_a)
print("Tensor B:")
print(tensor_b)
```

Element-wise tensor addition operation.

```
# Addition
result_addition = tensor_a + tensor_b
print("Addition Result:")
print(result_addition.numpy())
```

Element-wise tensor subtraction operation.

```
# Subtraction
result_subtraction = tensor_a - tensor_b
print("Subtraction Result:")
print(result_subtraction.numpy())
```

We can perform **element-wise multiplication** with `tf.multiply` or **matrix multiplication** with `tf.matmul`.

**Element-wise multiplication** involves multiplying the corresponding elements of two tensors or matrices. That means the element in the first row and first column of the resultant tensor is the product of the elements in the first row and first column of the input tensors; the same holds for every other row and column of the result. It is the `a * b` operation in Python, but done with `tf.multiply`.

**Matrix multiplication** involves finding the **dot product** of each row of matrix *A* with each column of matrix *B*. Each element of the resulting matrix is the sum of the products of the corresponding elements in the selected row of *A* and column of *B*. **The only requirement for matrix multiplication** is that the number of columns in the first matrix must equal the number of rows in the second matrix.

Visualize matrix multiplication here!
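To make the shape requirement concrete, here is a small sketch multiplying a `(2, 3)` matrix by a `(3, 2)` matrix; the inner dimensions (3 and 3) match, so the result has shape `(2, 2)`:

```python
import tensorflow as tf

m1 = tf.constant([[1, 2, 3],
                  [4, 5, 6]])   # shape (2, 3)
m2 = tf.constant([[7, 8],
                  [9, 10],
                  [11, 12]])    # shape (3, 2)
# element [0, 0] of the result = 1*7 + 2*9 + 3*11 = 58, etc.
print(tf.matmul(m1, m2).numpy())  # [[ 58  64] [139 154]]
```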

Example:

```
# Element-wise multiplication
result_elementwise_multiplication = tf.multiply(tensor_a, tensor_b)
# Matrix multiplication
result_matrix_multiplication = tf.matmul(tensor_a, tensor_b)
print("Element-wise Multiplication Result:")
print(result_elementwise_multiplication.numpy())
print("Matrix Multiplication Result:")
print(result_matrix_multiplication.numpy())
```

Element-wise tensor division.

```
# Element-wise division
result_elementwise_division = tf.divide(tensor_a, tensor_b)
print("Element-wise Division Result:")
print(result_elementwise_division.numpy())
```

Aggregation means deriving a reduced summary of a tensor's information, such as its mean, sum, maximum, and minimum values, or other statistical measures. In this section, we will look at the following aggregation functions:

- `tf.reduce_sum`
- `tf.reduce_mean`
- `tf.reduce_min` and `tf.reduce_max`
- `tf.argmax` and `tf.argmin`

```
example_t = tf.constant([[5, 10, 15],
[20, 25, 30]])
print(example_t.numpy())
```

We can compute the sum of elements across a tensor's axis with `tf.reduce_sum`:

```
sum_of_all_elems = tf.reduce_sum(example_t)
sum_axis_0 = tf.reduce_sum(example_t, axis = 0)
sum_axis_1 = tf.reduce_sum(example_t, axis = 1)
print(f"Tensor: \n{example_t.numpy()}")
print("Sum of all elements:", sum_of_all_elems.numpy())
print("Sum on axis 0:", sum_axis_0.numpy())
print("Sum on axis 1:", sum_axis_1.numpy())
```

We can compute the mean of elements across a tensor's axis with `tf.reduce_mean`:

```
tensor_mean = tf.reduce_mean(example_t)
mean_axis_0 = tf.reduce_mean(example_t, axis = 0)
mean_axis_1 = tf.reduce_mean(example_t, axis = 1)
print(f"Tensor: \n{example_t.numpy()}")
print("Mean of all elements:", tensor_mean.numpy())
print("Mean on axis 0:", mean_axis_0.numpy())
print("Mean on axis 1:", mean_axis_1.numpy())
```

We can compute the minimum and maximum elements across a tensor's axis with `tf.reduce_min` and `tf.reduce_max`.

```
# Along axis 0
max_axis_0 = tf.reduce_max(example_t, axis=0)
min_axis_0 = tf.reduce_min(example_t, axis=0)
# Along axis 1
max_axis_1 = tf.reduce_max(example_t, axis=1)
min_axis_1 = tf.reduce_min(example_t, axis=1)
print(f"Tensor: \n{example_t.numpy()}")
print("Max axis 0:", max_axis_0.numpy())
print("Min axis 0:", min_axis_0.numpy())
print("Max axis 1:", max_axis_1.numpy())
print("Min axis 1:", min_axis_1.numpy())
```

We can find the *index* of the smallest and largest value across a tensor dimension with `tf.argmin` and `tf.argmax`.

```
# Along axis 0
argmax_axis_0 = tf.argmax(example_t, axis=0)
argmin_axis_0 = tf.argmin(example_t, axis=0)
# Along axis 1
argmax_axis_1 = tf.argmax(example_t, axis=1)
argmin_axis_1 = tf.argmin(example_t, axis=1)
print(f"Tensor: \n{example_t.numpy()}")
print("Index of max value axis 0:", argmax_axis_0.numpy())
print("Index of min value axis 0:", argmin_axis_0.numpy())
print("Index of max value axis 1:", argmax_axis_1.numpy())
print("Index of min value axis 1:", argmin_axis_1.numpy())
```

TensorFlow tensors form the foundation of numerical computation and data representation within the TensorFlow framework. They are versatile data structures that enable efficient handling of multidimensional data, making them essential for machine learning and deep learning tasks.

This article has given you a solid understanding of the basics of creating, manipulating, and aggregating tensors, which is crucial for building robust and effective machine-learning models.

As you delve into the world of TensorFlow, a solid grasp of tensors and their operations will undoubtedly enhance your ability to design and implement sophisticated machine learning algorithms. Keep exploring the extensive capabilities of TensorFlow tensors to unlock the full potential of your data-driven applications.

- Implementing Transformer decoder for text generation in Keras and TensorFlow
- Object detection with TensorFlow 2 Object detection API
- How to train deep learning models on Apple Silicon GPU
- How to build CNN in TensorFlow(examples, code, and notebooks)
- How to build artificial neural networks with Keras and TensorFlow
- Custom training loops in Keras and TensorFlow
- Flax vs. TensorFlow
- How to build TensorFlow models with the Keras Functional API


The recent wave of generative language models is the culmination of years of research starting with the seminal "Attention is All You Need" paper. The paper introduced the **Transformer** architecture that would later be used as the backbone for numerous language models. These text generation language models are **autoregressive**, meaning that they predict one token at a time: each new token is generated conditioned on all the tokens that came before it.

In this blog, we will take a step back and build a text generation model using Keras and TensorFlow. This involves building the following components:

- The position encoding layer
- The embedding layer
- The Transformer decoder layer
- The Transformer decoder Keras model
- The Keras module for text generation

This piece assumes you know how to build artificial neural networks with Keras and TensorFlow. Check out the notebook with the entire code at the end of the post.

Training a Transformer-based model is compute-intensive and requires a GPU accelerator. Ensure that you have one by running this command:

`nvidia-smi`

If no GPU shows up, ensure that you have a GPU and have installed all the required GPU drivers and libraries. Otherwise, you may still be able to train the Transformer but it will be extremely slow.

You can access GPUs for free on Google Colab by clicking on Runtime and changing the Hardware accelerator to T4 GPU.

Kaggle Notebooks also gives access to GPUs for free.

Follow the instructions on the Install TensorFlow with pip page to install TensorFlow locally if you have a GPU.

`python3 -m pip install tensorflow[and-cuda]`

Confirm that TensorFlow can access the GPU:

```
import tensorflow as tf
tf.config.list_physical_devices("GPU")
```

We will use the 190k+ Medium Articles dataset to train the Transformer. Import the packages needed for this project and load the dataset using Pandas.

```
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import seaborn as sns
import pandas as pd
df = pd.read_csv(
"/kaggle/input/medium-articles/medium_articles.csv",
)
df = df[["text"]]
df.head()
```

We will train the Transformer using the first 90% of the samples and use the rest for validation. Split the data using this criterion:

```
n = int(0.9 * len(df)) # first 90% will be train, rest val
train_examples = df[:n]
val_examples = df[n:]
```

Next, ensure that the data is in the tf.data format for easy caching and batch creation.

```
train_examples = tf.data.Dataset.from_tensor_slices((train_examples))
val_examples = tf.data.Dataset.from_tensor_slices((val_examples))
```

Up to this point, the data is still in text form. We need to convert it into a numerical form before we can pass it to the Transformer model. The TextVectorization layer maps the text into integers. Some of the parameters it accepts are:

- `standardize`, in this case `lower_and_strip_punctuation`, to lowercase the text and remove punctuation
- `max_tokens` to determine the maximum size of the vocabulary

Call `adapt` once the `TextVectorization` layer has been initialized to create the vocabulary.

```
max_features = 5000 # Maximum vocab size
BATCH_SIZE = 32
MAX_TOKENS = 128
vectorize_layer = tf.keras.layers.TextVectorization(
standardize="lower_and_strip_punctuation",
max_tokens=max_features,
)
vectorize_layer.adapt(train_examples, batch_size=None)
```

Create a vocabulary variable that we will use for converting the predicted token IDs to words.

```
vocabulary = vectorize_layer.get_vocabulary()
```

Instead of feeding the samples one by one into the Transformer, we create batches because it is more efficient.

```
def prepare_batch(data):
    x = vectorize_layer(data)
    x = x[:, :(MAX_TOKENS)]  # Trim to MAX_TOKENS
    X_train = x[:, :-1]  # Shift by one
    y_train = x[:, 1:]  # Shift by one
    return (X_train, y_train)
```

On lines 4 and 5 above we shift the values by one to ensure that the Transformer decoder doesn't see a future word when trying to predict it. Check out the NumPy tutorial to learn more about indexing arrays.
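The shifting can be illustrated with a tiny NumPy example (toy token IDs, not from the dataset):

```python
import numpy as np

tokens = np.array([[7, 2, 9, 4]])  # one tokenized sequence
x_in = tokens[:, :-1]   # model input:  [[7 2 9]]
y_out = tokens[:, 1:]   # targets:      [[2 9 4]]
# at each position the model must predict the NEXT token:
# sees 7 -> predicts 2, sees 7 2 -> predicts 9, sees 7 2 9 -> predicts 4
print(x_in, y_out)
```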

Setting `prefetch` allows the data to be fetched ahead of time, meaning that fetching data is not a bottleneck in the training process. Setting `tf.data.AUTOTUNE` means that the buffer size for the prefetching will be set dynamically. You can, however, set this value manually.

```
BUFFER_SIZE = 20000  # shuffle buffer size (not defined in the excerpt; 20,000 is a common choice)
def make_batches(ds):
    return (
        ds.shuffle(BUFFER_SIZE)
        .batch(BATCH_SIZE)
        .map(prepare_batch, tf.data.AUTOTUNE)
        .prefetch(buffer_size=tf.data.AUTOTUNE)
    )
```

Next, run the functions to create training and validation batches:

```
# Create training and validation set batches
train_batches = make_batches(train_examples)
val_batches = make_batches(val_examples)
```

Check the shapes for future reference:

```
for X_train, y_train in train_batches.take(1):
    break
print(X_train.shape)
print(y_train.shape)
"""
(32, 127)
(32, 127)
"""
```

Grab one batch for testing various components of the Transformer decoder:

```
for x_batch, y_batch in train_batches.take(1):
    break
```

Recurrent Neural Networks (RNNs) were a popular way of dealing with sequence data before the introduction of the Transformer architecture. Transformers are better than RNNs because they:

- Can run in parallel, making them more computationally efficient on accelerators such as GPUs
- Are better at modeling long-range relationships and can thus learn longer connections
- Are great at modeling sequence data

The Transformer we will build is adapted from the official TensorFlow tutorial, which was built for machine translation. We will modify it for text generation.

Our *attention* will be on the right side of the Transformer architecture:

We will use the Keras Embedding Layer to convert the tokens we created to vectors when passing them to the decoder. However, since the Transformer network has no recurrent layers, all the positional information would be lost. This is solved by introducing positional encoding into the network. In practice, this is done using a set of sines and cosines at different frequencies. In the original paper, the proposed formulae were:

```
def positional_encoding(length, depth):
    depth = depth / 2
    positions = np.arange(length)[:, np.newaxis]  # (seq, 1)
    depths = np.arange(depth)[np.newaxis, :] / depth  # (1, depth)
    angle_rates = 1 / (10000**depths)  # (1, depth)
    angle_rads = positions * angle_rates  # (pos, depth)
    pos_encoding = np.concatenate([np.sin(angle_rads), np.cos(angle_rads)], axis=-1)
    return tf.cast(pos_encoding, dtype=tf.float32)
```

This can be visualized using Matplotlib as follows:

```
pos_encoding = positional_encoding(length=2048, depth=512)
# Check the shape.
print(pos_encoding.shape)
# Plot the dimensions.
plt.pcolormesh(pos_encoding.numpy().T, cmap="RdBu")
plt.ylabel("Depth")
plt.xlabel("Position")
plt.colorbar()
plt.show()
```

With position encoding set up, we can proceed to create the position embedding layer. The objective is to get the embedding vector of a token and add its position vector. That way, position information will not be lost.

```
class PositionalEmbedding(tf.keras.layers.Layer):
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.d_model = d_model
        self.embedding = tf.keras.layers.Embedding(vocab_size, d_model, mask_zero=True)
        self.pos_encoding = positional_encoding(length=2048, depth=d_model)

    def compute_mask(self, *args, **kwargs):
        return self.embedding.compute_mask(*args, **kwargs)

    def call(self, x):
        length = tf.shape(x)[1]
        x = self.embedding(x)
        # This factor sets the relative scale of the embedding and positional_encoding.
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
        x = x + self.pos_encoding[tf.newaxis, :length, :]
        return x
```

On line 5 above, we set up the word embeddings. **Word embedding** is a technique used to represent documents with a dense vector representation. The vocabulary in these documents is mapped to real-number vectors. Semantically similar words are mapped close to each other in the vector space.

A word embedding represents the words in a text corpus with floating point values while considering the relationship between the different words. These relationships are learned when training the embeddings. The size of the embedding vector can be assigned manually. The Embedding layer is used for learning word embeddings in TensorFlow.
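As a minimal sketch (the layer sizes here are arbitrary, not the ones used in this post), the `Embedding` layer maps integer token IDs to dense, trainable vectors:

```python
import tensorflow as tf

# map a 5,000-token vocabulary to 8-dimensional vectors
emb = tf.keras.layers.Embedding(input_dim=5000, output_dim=8)
vectors = emb(tf.constant([[1, 42, 7]]))  # a batch of one 3-token sequence
print(vectors.shape)  # (1, 3, 8): one 8-dimensional vector per token
```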

Understanding attention is critical before we start using the building blocks provided by TensorFlow.

In the Transformer, attention is computed using **queries, keys,** and **values**. The output is a weighted sum of the values, where the weight assigned to each value is determined by how well the corresponding key matches the query.

The attention is computed using the formula `Attention(Q, K, V) = softmax(QKᵀ / √dk)V`, where:

- Q is the query matrix
- K and V are the key and value matrices
- `dk` is the dimension of the key vector (64 in the original paper, making the square root 8)
- Dividing by the square root of `dk` is a scaling factor that stabilizes gradients

To obtain the query, key, and value matrices, the input embeddings are multiplied by learned weight matrices.

**Self-attention** comes from the fact that each word in the sentence is scored against all the words, so a word can attend to itself. The scores are passed through a softmax function where they sum to 1, making them easy to interpret as probabilities.
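The computation above can be sketched in plain NumPy (a rough illustration, not the Keras implementation used later):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    d_k = k.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # query-key similarity, scaled by sqrt(d_k)
    # softmax over each row: the weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v  # weighted sum of the values

q = np.random.randn(3, 4)  # 3 positions, d_k = 4
out = scaled_dot_product_attention(q, q, q)  # self-attention: q = k = v
print(out.shape)  # (3, 4)
```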

Running multiple attention layers at the same time leads to **multi-head attention**: the attention heads run in parallel, their results are concatenated, and the concatenation is passed to the feedforward layer.

The original paper uses 8 parallel heads. After concatenation, the results are multiplied by another weight matrix to form a single matrix that is passed to the feed-forward network.

The attention layer is defined using:

- `MultiHeadAttention`, an implementation of the query, key, and value mechanism as defined in the Transformer paper
- `LayerNormalization` for efficient Transformer training
- `Add` for adding a layer's input to its output, popularly known as residual connections

The following defines self-attention since the query, key, and value are the same. Projecting it several times makes it multihead attention.

The decoder network is *autoregressive*, meaning that it generates one token at a time. It should, therefore, not see future tokens during training; otherwise, it would start memorizing them instead of learning. To make this possible, pass `use_causal_mask=True` to the `MultiHeadAttention` layer to mask future tokens. Setting `return_attention_scores` to True is important so that the scores are available for plotting after training.

```
class BaseAttention(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(**kwargs)
        self.layernorm = tf.keras.layers.LayerNormalization()
        self.add = tf.keras.layers.Add()

class CausalSelfAttention(BaseAttention):
    def call(self, x):
        attn_output, attn_scores = self.mha(
            query=x, value=x, key=x, return_attention_scores=True, use_causal_mask=True
        )
        # Cache the attention scores for plotting later.
        self.last_attn_scores = attn_scores
        x = self.add([x, attn_output])
        x = self.layernorm(x)
        return x
```

Test the attention layer:

```
# x_batch_emb is the sample batch after the positional embedding layer, e.g.:
# x_batch_emb = PositionalEmbedding(vocab_size=max_features, d_model=512)(x_batch)
sample_csa = CausalSelfAttention(num_heads=2, key_dim=512)
print(x_batch_emb.shape)
print(sample_csa(x_batch_emb).shape)
```

The Transformer decoder includes a feedforward network with a ReLU activation. The network has two linear layers and a dropout layer.

```
class FeedForward(tf.keras.layers.Layer):
    def __init__(self, d_model, dff, dropout_rate=0.1):
        super().__init__()
        self.seq = tf.keras.Sequential(
            [
                tf.keras.layers.Dense(dff, activation="relu"),
                tf.keras.layers.Dense(d_model),
                tf.keras.layers.Dropout(dropout_rate),
            ]
        )
        self.add = tf.keras.layers.Add()
        self.layer_norm = tf.keras.layers.LayerNormalization()

    def call(self, x):
        x = self.add([x, self.seq(x)])
        x = self.layer_norm(x)
        return x
```

The Transformer decoder layer will contain two main building blocks:

- The self-attention layer
- The feedforward network

```
class DecoderLayer(tf.keras.layers.Layer):
    def __init__(self, *, d_model, num_heads, dff, dropout_rate=0.1):
        super(DecoderLayer, self).__init__()
        self.causal_self_attention = CausalSelfAttention(
            num_heads=num_heads, key_dim=d_model, dropout=dropout_rate
        )
        self.ffn = FeedForward(d_model, dff)

    def call(self, x):
        x = self.causal_self_attention(x=x)
        # Cache the last attention scores for plotting later
        self.last_attn_scores = self.causal_self_attention.last_attn_scores
        x = self.ffn(x)  # Shape `(batch_size, seq_len, d_model)`.
        return x
```

To define the Transformer decoder in TensorFlow you need:

- The positional embedding layer
- A stack of decoder layers

```
class Decoder(tf.keras.layers.Layer):
    def __init__(
        self, *, num_layers, d_model, num_heads, dff, vocab_size, dropout_rate=0.1
    ):
        super(Decoder, self).__init__()
        self.d_model = d_model
        self.num_layers = num_layers
        self.pos_embedding = PositionalEmbedding(vocab_size=vocab_size, d_model=d_model)
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
        self.dec_layers = [
            DecoderLayer(
                d_model=d_model, num_heads=num_heads, dff=dff, dropout_rate=dropout_rate
            )
            for _ in range(num_layers)
        ]
        self.last_attn_scores = None

    def call(self, x):
        # `x` is token-IDs shape (batch, target_seq_len)
        x = self.pos_embedding(x)  # (batch_size, target_seq_len, d_model)
        x = self.dropout(x)
        for i in range(self.num_layers):
            x = self.dec_layers[i](x)
        self.last_attn_scores = self.dec_layers[-1].last_attn_scores
        # The shape of x is (batch_size, target_seq_len, d_model).
        return x
```

Test the decoder:

```
# Instantiate the decoder.
sample_decoder = Decoder(
num_layers=4, d_model=512, num_heads=8, dff=2048, vocab_size=8000
)
output = sample_decoder(x=x_batch)
# Print the shapes.
print(x_batch.shape)
print(x_batch_emb.shape)
print(output.shape)
```

We now have all the building blocks required to define the Keras Transformer decoder. The final step is to put them together and add a final dense layer to output final predictions from the network as logits.

```
class Transformer(tf.keras.Model):
    def __init__(
        self, *, num_layers, d_model, num_heads, dff, input_vocab_size, dropout_rate=0.1
    ):
        super().__init__()
        self.decoder = Decoder(
            num_layers=num_layers,
            d_model=d_model,
            num_heads=num_heads,
            dff=dff,
            vocab_size=input_vocab_size,
            dropout_rate=dropout_rate,
        )
        self.final_layer = tf.keras.layers.Dense(input_vocab_size)

    def call(self, inputs):
        # To use a Keras model with `.fit` you must pass all your inputs in the
        # first argument.
        x = inputs
        x = self.decoder(x)  # (batch_size, target_len, d_model)
        # Final linear layer output.
        logits = self.final_layer(x)  # (batch_size, target_len, target_vocab_size)
        try:
            # Drop the keras mask, so it doesn't scale the losses/metrics.
            # b/250038731
            del logits._keras_mask
        except AttributeError:
            pass
        # Return the final output logits.
        return logits
```

Before we can train the model we have to get some settings out of the way.
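The excerpt never shows the `transformer` instance being built, and the learning-rate schedule below needs `d_model` to exist. A purely illustrative configuration (these hyperparameter values are assumptions, not taken from the article; `max_features` is the vocabulary size from the `TextVectorization` step) might look like:

```python
# Illustrative hyperparameters (assumed values; smaller than the original paper's)
num_layers = 4
d_model = 128
num_heads = 8
dff = 512

transformer = Transformer(
    num_layers=num_layers,
    d_model=d_model,
    num_heads=num_heads,
    dff=dff,
    input_vocab_size=max_features,
)
```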

The original paper proposed training the Transformer with the Adam optimizer with a custom scheduler:

```
class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model, warmup_steps=4000):
        super().__init__()
        self.d_model = d_model
        self.d_model = tf.cast(self.d_model, tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        step = tf.cast(step, dtype=tf.float32)
        arg1 = tf.math.rsqrt(step)
        arg2 = step * (self.warmup_steps**-1.5)
        return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)

learning_rate = CustomSchedule(d_model)
optimizer = tf.keras.optimizers.Adam(
    learning_rate, beta_1=0.9, beta_2=0.98, epsilon=1e-9
)
```

Define masked loss and accuracy:

```
def masked_loss(label, pred):
    mask = label != 0
    loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction="none"
    )
    loss = loss_object(label, pred)
    mask = tf.cast(mask, dtype=loss.dtype)
    loss *= mask
    loss = tf.reduce_sum(loss) / tf.reduce_sum(mask)
    return loss

def masked_accuracy(label, pred):
    pred = tf.argmax(pred, axis=2)
    label = tf.cast(label, pred.dtype)
    match = label == pred
    mask = label != 0
    match = match & mask
    match = tf.cast(match, dtype=tf.float32)
    mask = tf.cast(mask, dtype=tf.float32)
    return tf.reduce_sum(match) / tf.reduce_sum(mask)
```

Train the model and save the history variable for easy plotting:

```
transformer.compile(loss=masked_loss, optimizer=optimizer, metrics=[masked_accuracy])
history = transformer.fit(train_batches, epochs=10, validation_data=val_batches)
```

Plot the model's accuracy and loss using Matplotlib:

```
metrics_df = pd.DataFrame(history.history)
metrics_df[["loss", "val_loss"]].plot()
metrics_df[["masked_accuracy", "val_masked_accuracy"]].plot()
```

The accuracy is not the best. We will provide suggestions for improvement at the end, but this is a great start.

Next, try to predict a single token. The process is as follows:

- Provide a sentence, in this case, "Python"
- Vectorize the sentence
- Expand the dimensions to add a batch dimension
- Perform prediction
- Select the last token from the predictions
- Decode the prediction, in this case using `argmax` to pick the token with the highest score
- Convert the token to a word using `StringLookup`

```
sentence = "Python"
x = vectorize_layer(sentence)
x = tf.expand_dims(x, axis=0)
predictions = transformer(x)
# Select the last token from the `seq_len` dimension.
predictions = predictions[:, -1:, :]
predicted_id = tf.argmax(predictions, axis=-1)
id_to_word = tf.keras.layers.StringLookup(
    vocabulary=vocabulary, mask_token="", oov_token="[UNK]", invert=True
)
predicted_word = id_to_word(predicted_id)
```

In this case, the Transformer predicted "is" as the next likely word:

The word "is" has the highest score, but there are other words the model could have chosen if we had picked a different decoding strategy. Dump all the words and their scores into a Pandas DataFrame, sort them by the scores, and plot them using Seaborn. This will allow us to see the other words that the Transformer predicted.

Interestingly, you can see that the Transformer was able to associate Python with other related terms such as data, libraries, Pandas, and programming.

Now generating one word is not fun. We need to be able to generate many words. To make that possible you would need to append the word that was just predicted to the sentence so that it can use that to generate the next word. The process would repeat until you get the maximum number of words you are interested in.

Here is how to append the generated token to the previous sentence:

```
x_concat = tf.experimental.numpy.append(x, predicted_id[0], axis=None)
```

Next, define a class that will generate as many tokens as you would like. The more tokens you generate the longer it will take to get the final output.

```
class Generator(tf.Module):
    def __init__(
        self,
        tokenizer,
        vocabulary,
        transformer,
        max_new_tokens,
        temperature=0.0,
    ):
        self.tokenizer = tokenizer
        self.transformer = transformer
        self.vocabulary = vocabulary
        self.max_new_tokens = max_new_tokens
        self.temperature = temperature

    def __call__(self, sentence, max_length=MAX_TOKENS):
        sentence = self.tokenizer(sentence)
        sentence = tf.expand_dims(sentence, axis=0)
        encoder_input = sentence
        # `tf.TensorArray` is required here (instead of a Python list), so that the
        # dynamic loop can be traced by `tf.function`.
        output_array = tf.TensorArray(dtype=tf.int64, size=0, dynamic_size=True)
        print(f"Generating {self.max_new_tokens} tokens")
        for i in tf.range(self.max_new_tokens):
            output = tf.transpose(output_array.stack())
            predictions = self.transformer(encoder_input, training=False)
            # Select the last token from the `seq_len` dimension.
            predictions = predictions[:, -1:, :]  # Shape `(batch_size, 1, vocab_size)`.
            if self.temperature == 0.0:
                # Greedy sampling: the output is always the same.
                predicted_id = tf.argmax(predictions, axis=-1)
            else:
                predictions = predictions / self.temperature
                predicted_id = tf.random.categorical(predictions[0], num_samples=1)
            # Concatenate the `predicted_id` to the output, which is given to the
            # decoder as its input.
            output_array = output_array.write(i + 1, predicted_id[0])
            encoder_input = tf.experimental.numpy.append(encoder_input, predicted_id[0])
            encoder_input = tf.expand_dims(encoder_input, axis=0)
        output = tf.transpose(output_array.stack())
        # The output shape is `(1, tokens)`.
        id_to_word = tf.keras.layers.StringLookup(
            vocabulary=self.vocabulary, mask_token="", oov_token="[UNK]", invert=True
        )
        print(f"Using temperature of {self.temperature}")
        text = id_to_word(output)
        tokens = output
        # `tf.function` prevents us from using the attention weights that were
        # calculated on the last iteration of the loop, so recalculate them
        # outside the loop.
        self.transformer(output[:, :-1], training=False)
        attention_weights = self.transformer.decoder.last_attn_scores
        return text, tokens, attention_weights
```

In the Generator above we define temperature as a decoding strategy. When you set a non-zero temperature, the model doesn't simply pick the token with the highest score. Instead, the Transformer uses `tf.random.categorical` to draw one sample from the categorical distribution.

Decoding using temperature works by scaling the logits produced by the decoder. A value of 1 has no effect. Lowering the temperature is ideal for factual applications where you want the model to be more confident in its responses. You can increase the temperature for creative applications to make the responses from the Transformer more random and hence more creative. However, the model can start making mistakes as you increase the temperature.
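The effect of temperature can be sketched in plain NumPy. This is a simplified stand-in for the `tf.random.categorical` call in the Generator above; the helper name `sample_with_temperature` is ours, not part of any library:

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Pick a token id from raw logits; temperature 0 means greedy decoding."""
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0.0:
        return int(np.argmax(logits))
    # Low temperature sharpens the distribution; high temperature flattens it.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # softmax, shifted for stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
logits = [4.0, 2.0, 0.5]
greedy = sample_with_temperature(logits, 0.0, rng)   # always the top token
sampled = sample_with_temperature(logits, 1.5, rng)  # may vary between runs
```

With temperature 0 the call always returns the highest-scoring token; as the temperature grows, lower-scoring tokens get sampled more often.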

Other Transformer decoding strategies include:

In Top K sampling, the number of words to sample from is fixed in advance. For example, if K is 80, the model will sample from the top 80 words, meaning that lower-probability words won't get a chance to be selected. The problem with this strategy is that you have to manually select the value of K.

In Top P sampling, the words are chosen dynamically: the model keeps the smallest set of words whose cumulative probability reaches p. For example, if the desired probability is 0.9, the model can keep words with probabilities 0.5 + 0.3 + 0.1. This is a better strategy because the number of candidate words adjusts dynamically.
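As a rough sketch of the two strategies, here are NumPy helpers (the function names are ours, and they operate on an already-normalized probability vector rather than logits):

```python
import numpy as np

def top_k_filter(probs, k):
    # Keep only the k highest-probability tokens, then renormalize.
    probs = np.asarray(probs, dtype=np.float64)
    keep = np.argsort(probs)[-k:]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

def top_p_filter(probs, p):
    # Keep the smallest set of tokens whose cumulative probability reaches p.
    probs = np.asarray(probs, dtype=np.float64)
    order = np.argsort(probs)[::-1]          # tokens from most to least likely
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1     # first index where cum >= p
    keep = order[:cutoff]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

probs = np.array([0.5, 0.3, 0.1, 0.06, 0.04])
after_top_k = top_k_filter(probs, k=2)   # only the two best tokens survive
after_top_p = top_p_filter(probs, p=0.9) # keeps 0.5 + 0.3 + 0.1, as in the example
```

With p = 0.9 the nucleus contains three tokens here (0.5 + 0.3 + 0.1), matching the example above, while top-k with k = 2 always keeps exactly two regardless of how probability mass is spread.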

Generate 50 new tokens using a temperature of 0.92:

```
max_new_tokens = 50
temperature = 0.92
generator = Generator(
    vectorize_layer, vocabulary, transformer, max_new_tokens, temperature,
)


def print_generation(sentence, generated_text):
    print(f'{"Input:":15s}: {sentence}')
    print(f'{"Generation":15s}: {generated_text}')


sentence = "Machine learning"
generated_text, generated_tokens, attention_weights = generator(sentence)
print_generation(sentence, generated_text)
```

From the above output, you can see that given the prompt "machine learning" the Transformer was able to generate some related text such as models and artificial intelligence. However, there is a lot of repetition of words such as "models". Strategies for solving this include training a better model and introducing a repetition penalty, where the model is penalized for repeating certain phrases and words. This is particularly useful in creative writing.
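One common formulation of the repetition penalty (introduced in the CTRL paper) divides the logits of already-generated tokens by a constant factor. This NumPy sketch illustrates the idea; the helper name `penalize_repeats` is ours:

```python
import numpy as np

def penalize_repeats(logits, generated_ids, penalty=1.2):
    """Discourage tokens that already appear in the generated sequence."""
    logits = np.asarray(logits, dtype=np.float64).copy()
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= penalty  # shrink positive logits
        else:
            logits[token_id] *= penalty  # push negative logits further down
    return logits

logits = np.array([2.0, 1.0, -0.5])
adjusted = penalize_repeats(logits, generated_ids=[0, 2])
```

Tokens 0 and 2 were already generated, so their logits are pushed down, while token 1 is untouched; a penalty of 1.0 disables the effect entirely.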

You can create attention plots because the Transformer returns attention weights:

```
sentence = "Python"


def plot_attention_weights(sentence, generated_tokens, attention_heads):
    in_tokens = vectorize_layer([sentence])
    fig = plt.figure(figsize=(16, 8))
    for h, head in enumerate(attention_heads):
        ax = fig.add_subplot(2, 4, h + 1)
        plot_attention_head(in_tokens, generated_tokens, head)
        ax.set_xlabel(f"Head {h+1}")
    plt.tight_layout()
    plt.show()


generated_text, generated_tokens, attention_weights = generator(sentence)
print_generation(sentence, generated_text)
plot_attention_weights(sentence, generated_tokens, attention_weights[0])
```

Transformers are compute-intensive to train. It's unlikely that you will be training one from scratch unless you are a researcher, in which case you will most likely have the resources to do so. The main aim of this project was to learn how these Transformer models are built. However, if you'd like to improve this model's performance, there are several things you can try:

- Get better text generation data from Hugging Face
- Source more text generation data
- Create a better network
- Train longer
- Use a pre-trained network (Best choice for production)

In this article, you have learned how to build a text generation model using Keras and TensorFlow with the Transformer decoder. You have seen how the various building blocks come together to build an end-to-end system that can generate text. However, since the model was trained briefly and with limited compute, it wasn't able to generate coherent text. You can attempt to improve this by using a bigger dataset and training for longer. For production use cases, check out MakerSuite, which provides simple APIs that you can use to fine-tune and build your generative AI model in various languages.


**BERT** is a popular masked language model: some of the input words are hidden, and the model is trained to predict them. The model is bidirectional, meaning it has access to the words to the left and right, making it a good choice for tasks such as text classification.

Training BERT can quickly become complicated, but not with KerasNLP, which provides a simple Keras API for training and fine-tuning natural language processing (NLP) models. KerasNLP provides preprocessors and tokenizers for various NLP models, including BERT, GPT-2, and OPT. You can even use the library to train a Transformer from scratch.

In this article, you will use KerasNLP to train a text classification model to classify sentiment.

Let's dive in.

Join the newsletter to receive the technical deep dives in your inbox.

Install KerasNLP:

`pip install keras-nlp --upgrade`

Import the required packages:

```
import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow import keras
import keras_nlp
from sklearn.model_selection import train_test_split
```

You can follow along using this Kaggle Notebook.

Next, download the training dataset from Google Drive.

```
wget --no-check-certificate https://drive.google.com/uc?id=13ySLC_ue6Umt9RJYSeM2t-V0kCv-4C-P -O /tmp/sentiment.csv
```

Read the dataset and split it into a training and testing set.

```
df = pd.read_csv('/tmp/sentiment.csv')
X = df['text']
y = df['sentiment']
X_train, X_test , y_train, y_test = train_test_split(X, y , test_size = 0.20)
```

Convert the labels to the categorical format as expected by Keras.

```
y_train = tf.keras.utils.to_categorical(y_train, num_classes=2, dtype='float32')
y_test = tf.keras.utils.to_categorical(y_test, num_classes=2, dtype='float32')
```

KerasNLP provides various NLP models to choose from. In this case, let's use `bert_tiny_en_uncased_sst2`, which has been fine-tuned for sentiment analysis. The model is loaded using `BertClassifier` with the following arguments:

- The model name
- The number of classes, here 2
- Whether to load the pre-trained weights, true by default
- The type of activation. We choose sigmoid because it's a binary classification problem.

💡

You will need to process the model output to get probabilities if you don't specify an activation function at this point because the model will output logits. To make the results interpretable, you will need to pass them through the sigmoid activation function when running inference.
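To make the logits-versus-probabilities point concrete, here is a plain-Python sketch of the conversion you would perform yourself if the classifier were loaded without an activation (the helper name `logit_to_probability` is ours, not a KerasNLP API):

```python
import math

def logit_to_probability(logit):
    # The sigmoid maps any real-valued logit to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))

p_neutral = logit_to_probability(0.0)   # 0.5: the model is maximally uncertain
p_positive = logit_to_probability(2.0)  # ~0.88: fairly confident positive
```

Specifying `activation='sigmoid'` up front simply bakes this step into the model so `predict` returns probabilities directly.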

```
model_name = "bert_tiny_en_uncased_sst2"
# Pretrained classifier.
classifier = keras_nlp.models.BertClassifier.from_preset(
    model_name,
    num_classes=2,
    load_weights=True,
    activation='sigmoid',
)
```

The next step is to compile and train the model. Set the trainable parameter of the model to false so that you are not training the model from scratch. The objective is to use the pre-trained model and finetune it on your dataset, a process known as transfer learning.

```
classifier.compile(
    loss=keras.losses.BinaryCrossentropy(),
    optimizer=keras.optimizers.Adam(),
    jit_compile=True,
    metrics=["accuracy"],
)
# Access the backbone programmatically (e.g., to change `trainable`).
classifier.backbone.trainable = False
# Fit again.
classifier.fit(x=X_train, y=y_train, validation_data=(X_test, y_test), batch_size=32)
```

Evaluating the model on the test set gives us an accuracy of 87% which is not bad considering that you have used the tiny version of the BERT model.

`classifier.evaluate(X_test, y_test,batch_size=32)`

Join the newsletter to receive the technical deep dives in your inbox.

Test the BERT model to see how it performs on new data samples.

```
# Predict two new examples.
classifier.predict(["What an amazing movie!", "A total waste of my time."])
```

The model predicts that the first sample is positive with 95% confidence, while the second one is negative with 95% confidence.

You can also make the results more interpretable by passing the predictions through the class names of the training data. Here is an example with a sample from the test set:

```
print(list(X_test)[10])
class_names = ["negative", "positive"]
scores = classifier.predict([list(X_test)[10]])
scores
f"{class_names[np.argmax(scores)]} with a { (100 * np.max(scores)).round(2) } percent confidence."
```

In the previous example, you trained a BERT model by passing raw strings. Notice that we didn't perform the standard NLP processing, such as:

- Removing punctuation
- Removing stop words
- Creating a vocabulary
- Converting the text to a numerical representation

All these were done by the model automatically. However, in some cases, you may want more control over that process. KerasNLP provides `BertPreprocessor` for this purpose. Every model has its own preprocessor class. For this illustration, load `BertPreprocessor` with a sequence length of 128.

```
preprocessor = keras_nlp.models.BertPreprocessor.from_preset(
    model_name,
    sequence_length=128,
)
```

Manually map this `preprocessor` to the training and testing set. Convert the data to a `tf.data` format to make this possible. Notice the use of:

- `cache` to cache the dataset. Pass a file name to this function if your dataset can't fit into memory.
- `AUTOTUNE` to let `tf.data` automatically tune the prefetch buffer size.

```
training_data = tf.data.Dataset.from_tensor_slices(([X_train], [y_train]))
validation_data = tf.data.Dataset.from_tensor_slices(([X_test], [y_test]))
train_cached = (
    training_data.map(preprocessor, tf.data.AUTOTUNE).cache().prefetch(tf.data.AUTOTUNE)
)
test_cached = (
    validation_data.map(preprocessor, tf.data.AUTOTUNE).cache().prefetch(tf.data.AUTOTUNE)
)
```

Next, define the BERT model and train it.

```
# Pretrained classifier.
classifier = keras_nlp.models.BertClassifier.from_preset(
    model_name,
    preprocessor=None,
    num_classes=2,
    load_weights=True,
    activation='sigmoid',
)
classifier.compile(
    loss=keras.losses.BinaryCrossentropy(),
    optimizer=keras.optimizers.Adam(),
    jit_compile=True,
    metrics=["accuracy"],
)
classifier.fit(train_cached, validation_data=test_cached, epochs=10)
```

You can run some predictions on new data by first passing it through the BERT preprocessor to ensure that it's in the format the model expects.

```
test_data = preprocessor([list(X_test)[10]])
print(list(X_test)[10])
scores = classifier.predict(test_data)
scores
f"{class_names[np.argmax(scores)]} with a { (100 * np.max(scores)).round(2) } percent confidence."
```

Join the newsletter to receive the technical deep dives in your inbox.

You have seen how easy it is to train NLP models with KerasNLP, which removes much of the complexity, enabling you to train the model faster. KerasNLP is an excellent choice for training NLP models with TensorFlow using the Keras API you are already familiar with. You can explore further by switching the dataset and model used in this article. Check out the KerasNLP website for more tutorials and guides.

**Whenever you're ready, there are 2 ways I can help you:**

If you're looking to accelerate your career, I'd recommend starting with an affordable ebook:

**→** **Writing for Data Scientists:** The exact path I followed to get technical work that pays between $250 and $500 from machine learning companies such as Comet, Neptune, cnvrg, Paperspace, Layer, Neural Magic, Determined, Activeloop, and many more. **Get your copy**.

**→ Data Science and Machine Learning Ebook**: I offer numerous free and paid data science and machine learning ebooks to help you in your data science and machine learning career.



This blog post will explore 20 powerful and unique Pandas functions that can significantly enhance your data analysis workflow. We will be using the famous Iris dataset as an example to demonstrate each function.

The Iris dataset contains four features: Sepal Length, Sepal Width, Petal Length, and Petal Width, along with their corresponding Iris species. We'll also use another dataset towards the end to show some datetime Pandas features.

All code and files will be hosted on my GitHub, which can be found here.

Before we begin, ensure you have Python installed; if not, here's how to do it. Also, you need to install `pandas`, which you can do by going to your terminal and typing:

`pip3 install pandas`

If you want to avoid all this hassle, go to colab.research.google.com, and you can start coding with Python and Pandas straightaway, as everything you need is pre-installed!

Now we can get started with expanding our Pandas toolkit! But first, let's import Pandas and set up the dataset.

```
import pandas as pd # Importing Pandas
# Load the Iris dataset
iris_df = pd.read_csv('iris.csv')
```

The `nunique()` function is used in Pandas to count the number of unique values in a Series or DataFrame column. It helps us understand the diversity and variety of values present in a dataset, allowing us to gain insights into the uniqueness of the data.

This function is handy for data quality analysis, identifying duplicates, and understanding the distribution of distinct values.

Let's illustrate the usage of `nunique()` with a logical and real-life code example using the Iris dataset:

```
# Count unique species in the dataset
num_unique_species = iris_df['Species'].nunique()
print(num_unique_species)
```

We apply the `nunique()` function on the '*Species*' column. The result, stored in the `num_unique_species` variable, represents the count of unique species in the Iris dataset. By printing this value, we can easily observe the diversity and variety of species in the dataset. This information is valuable for understanding the composition of the dataset and can guide further analysis or decision-making.

💡

In simple terms, the `nunique()` function helps us find out how many different types of species are present in the Iris dataset. By applying this function, we can obtain a single number that tells us the count of unique species, allowing us to comprehend the variety of species within the dataset.
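One detail worth knowing: `nunique()` ignores missing values by default; pass `dropna=False` to count `NaN` as its own value. A quick sketch on a toy Series (not the Iris file itself):

```python
import pandas as pd

# A small Series with one missing value
species = pd.Series(['setosa', 'virginica', 'setosa', None])
n = species.nunique()                    # 2 distinct species, NaN excluded
n_with_na = species.nunique(dropna=False)  # 3 once NaN counts as a value
```

This matters for data-quality checks, where a hidden `NaN` "category" is often exactly what you are looking for.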

The `map()` function in Pandas is used to transform values in a Series or DataFrame column. It enables us to replace existing values with new values based on a mapping dictionary, another Series, or a custom function.

This function is particularly useful when we want to perform value mapping or transformation on specific columns.

*Learn more about Series and DataFrames.*

Let's explore the usage of `map()` with a logical and real-life code example:

```
# Create a mapping dictionary for flower colors
color_mapping = {
'setosa': 'blue',
'versicolor': 'orange',
'virginica': 'purple'
}
# Map flower colors using the mapping dictionary
iris_df['Flower Color'] = iris_df['Species'].map(color_mapping)
print(iris_df[['Species', 'Flower Color']].head())
```

In the code above, we start by creating a mapping dictionary `color_mapping`, where the keys represent the original species names ('*setosa*', '*versicolor*', '*virginica*'), and the values represent the corresponding color codes ('*blue*', '*orange*', '*purple*') we want to map them to.

Next, we use the `map()` function on the '*Species*' column to transform the species names into their corresponding color codes using the `color_mapping` dictionary.

The resulting '*Flower Color*' column is added to the `iris_df` DataFrame, containing the mapped color values for each species. By printing the '*Species*' and '*Flower Color*' columns using `head()`, we can observe the transformed values side by side.

💡

In simple terms, the `map()` function allows us to convert the species names in the 'Species' column of the Iris dataset into corresponding color codes. By providing a mapping dictionary, we can easily transform the values and create a new column ('*Flower Color*') that represents the mapped colors.

This function is helpful when we need to replace existing values with new values based on specific mapping rules or transformations.
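One caveat worth knowing: any value without a key in the mapping dictionary becomes `NaN`, so misspelled or unexpected species silently drop out. A small sketch on toy data:

```python
import pandas as pd

species = pd.Series(['setosa', 'versicolor', 'unknown'])
colors = species.map({'setosa': 'blue', 'versicolor': 'orange'})
# 'unknown' has no entry in the dictionary, so it maps to NaN
```

If you want unmapped values left unchanged instead, `replace()` is usually the better tool.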

The `groupby()` function in Pandas groups data based on one or more columns in a DataFrame.

It enables us to perform operations on each group separately, which is particularly useful for aggregation and summarization tasks.

This function allows us to split the data into groups based on a specified column or columns and then apply functions or calculations to each group.

Let's explore the usage of `groupby()` with a logical and real-life code example:

```
# Group the data by species and calculate the mean of sepal length
species_grouped = iris_df.groupby('Species')['sepal_length'].mean()
pd.DataFrame(species_grouped)
```

In the code above, we apply the `groupby()` function on the DataFrame, specifying the column '*Species*' to group the data by.

Next, we select the '*sepal_length*' column and use the `mean()` function to calculate the average sepal length for each species. The result is a Series object, `species_grouped`, where the species names are the index and the corresponding mean sepal length values are the values.

By printing the `species_grouped` Series, we can observe the average sepal length for each species. This information allows us to compare the average sepal length across different species in the Iris dataset, providing insights into their characteristic differences.

💡

In simple terms, the `groupby()` function helps us group the data in the Iris dataset based on the species column. By specifying the column to group by, we create distinct groups for each species. We then calculate the mean sepal length within each group using the `mean()` function. The result allows us to see the average sepal length for each species, aiding in the comparison and analysis of sepal length across different species.
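The same grouping can compute several statistics at once with `agg()`. Here is a small self-contained sketch on toy data (not the full Iris file), so the numbers can be checked by hand:

```python
import pandas as pd

df = pd.DataFrame({
    'Species': ['setosa', 'setosa', 'virginica', 'virginica'],
    'sepal_length': [5.0, 5.2, 6.3, 6.5],
})
# One groupby, three summary statistics per group
stats = df.groupby('Species')['sepal_length'].agg(['mean', 'min', 'max'])
print(stats)
```

This avoids three separate `groupby` passes and returns a tidy DataFrame with one row per species.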

The `.pivot_table()` function in Pandas is used to create a pivot table based on a DataFrame. It allows you to summarize and aggregate data based on two or more columns, providing a compact representation of the data.

Pivot tables are particularly useful for analyzing and visualizing data from multiple perspectives.

*If you don't know what Pivot Tables are, read more about them here *➡️

Let's explore the usage of `.pivot_table()` with a nice example:

```
# Create a pivot table to calculate the average petal length for each species based on sepal width
pivot_table = pd.pivot_table(
    iris_df,
    values='petal_length',
    index='Species',
    columns='sepal_width',
    aggfunc='mean')
print(pivot_table)
```

In the code above, we use the `.pivot_table()` function to create a pivot table based on the DataFrame.

Within the `.pivot_table()` function, we specify:

- the column to be aggregated (`values='petal_length'`),
- the column to be used as row labels (`index='Species'`), and
- the column to be used as column labels (`columns='sepal_width'`).

Additionally, we specify the aggregation function (`aggfunc='mean'`) to calculate the average petal length for each species and sepal width combination.

The resulting pivot table, stored in the `pivot_table` variable, presents a compact representation of the data. The rows represent the species, the columns represent the sepal widths, and the values represent the average petal lengths corresponding to each combination of species and sepal width.

By printing the `pivot_table`, we can observe the average petal length for each species based on different sepal widths. This information provides a comprehensive view of how petal length varies across species and sepal width categories.

💡

In simple terms, the `.pivot_table()` function allows us to summarize and aggregate data in a tabular format. We can specify which columns to use as row and column labels, which column to aggregate, and the aggregation function to apply.

The resulting pivot table provides a condensed representation of the data, making it easier to analyze and compare values across different categories.
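To see the mechanics on data small enough to check by hand, here is a toy version of the same call (three rows instead of the full Iris file):

```python
import pandas as pd

df = pd.DataFrame({
    'Species': ['setosa', 'setosa', 'virginica'],
    'sepal_width': [3.5, 3.0, 3.0],
    'petal_length': [1.4, 1.3, 5.1],
})
# Rows: species; columns: sepal widths; cells: mean petal length
pt = pd.pivot_table(df, values='petal_length', index='Species',
                    columns='sepal_width', aggfunc='mean')
print(pt)
```

Cells with no matching (species, width) pair come out as `NaN`, which is the pivot table's way of saying "no observations here".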

The `cut()` function in Pandas is used to divide continuous data into bins or intervals. It is beneficial when transforming a numerical column into categorical bins, allowing for better analysis and visualization.

This function helps in discretizing data and creating meaningful categories based on specific ranges or criteria.

Let's explore the usage of `cut()` with an example:

```
# Create three bins for sepal length: Short, Medium, and Long
sepal_length_bins = pd.cut(
    iris_df['sepal_length'],
    bins=[0, 5, 6.5, 10],
    labels=['Short', 'Medium', 'Long'])
pd.DataFrame(sepal_length_bins.head())
```

In the code above, we use the `cut()` function to divide the '*sepal_length*' column into three bins: '*Short*', '*Medium*', and '*Long*'.

Within the `cut()` function, we specify the column to be binned (`iris_df['sepal_length']`), the bin edges (`bins=[0, 5, 6.5, 10]`), and the labels for each bin (`labels=['Short', 'Medium', 'Long']`).

In this example, sepal lengths up to 5 will be categorized as '*Short*', lengths between 5 and 6.5 as '*Medium*', and lengths above 6.5 as '*Long*'.

The resulting `sepal_length_bins` Series contains the bin labels corresponding to each sepal length value in the '*sepal_length*' column. By printing the `head()` of this Series, we can observe the transformed values, where each sepal length is assigned to the respective bin category.

💡

In simple terms, the `cut()` function helps us create categories or bins for numerical data. In this example, we divide the sepal length values in the Iris dataset into three bins: '*Short*', '*Medium*', and '*Long*'. By specifying the bin edges and labels, we can assign each sepal length value to its appropriate bin.

This transformation allows us to analyze and visualize the data in terms of these meaningful categories instead of continuous values.
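One detail to watch: by default `pd.cut` treats the right edge of each bin as inclusive, so the bins above are the intervals (0, 5], (5, 6.5], and (6.5, 10]. A quick sketch on boundary values:

```python
import pandas as pd

lengths = pd.Series([5.0, 5.1, 6.5, 6.6])
binned = pd.cut(lengths,
                bins=[0, 5, 6.5, 10],
                labels=['Short', 'Medium', 'Long'])
# 5.0 falls in (0, 5] -> 'Short'; 6.5 falls in (5, 6.5] -> 'Medium'
```

Pass `right=False` if you prefer left-inclusive intervals like [0, 5).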

The `melt()` function in Pandas transforms a DataFrame from a wide format to a long format by "unpivoting" the data. It is handy when converting columns into rows, making the data more suitable for analysis and visualization.

This function helps in restructuring data by gathering multiple columns into key-value pairs.

Let's explore the usage of `melt()` with an example:

```
# Convert the DataFrame from wide to long format
melted_df = pd.melt(
    iris_df,
    id_vars='Species',
    value_vars=[
        'sepal_length',
        'sepal_width',
        'petal_length',
        'petal_width'])
print(melted_df.head())
```

The Iris dataset is in a wide format, where each attribute (*Sepal Length, Sepal Width, Petal Length, Petal Width*) has its own column.

*If you're wondering what a wide format is, learn about it here* ➡️

We then use the `melt()` function to transform the `iris_df` DataFrame from a wide format to a long format. We specify the `id_vars` parameter as '*Species*' to indicate that we want to keep the '*Species*' column as an identifier. The `value_vars` parameter lists the columns we want to unpivot or melt, which are ['*sepal_length*', '*sepal_width*', '*petal_length*', '*petal_width*'] in this example.

The resulting `melted_df` DataFrame contains the unpivoted data, where each row represents a unique combination of '*Species*' and an attribute column. The '*variable*' column indicates the attribute name and the '*value*' column contains the corresponding attribute values.

By printing the `head()` of `melted_df`, we can observe the transformed data, where the attribute columns are converted into key-value pairs. *This long format is often more suitable for further analysis, as it allows for easier aggregation, filtering, and visualization of the data.*

💡

In simple terms, the `melt()` function helps us convert a DataFrame from a wide format to a long format. It gathers multiple columns and stacks them into key-value pairs, with each row representing a unique combination of identifiers and attributes.

This transformation is useful when we want to analyze or visualize the data in a more structured and organized manner, especially when dealing with data that has attributes spread across different columns.
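By default the melted columns are named '*variable*' and '*value*'; the `var_name` and `value_name` parameters let you pick friendlier names. A toy sketch (column names are our choice):

```python
import pandas as pd

wide = pd.DataFrame({'Species': ['setosa'],
                     'sepal_length': [5.1],
                     'sepal_width': [3.5]})
# Rename the generated key/value columns at melt time
long = pd.melt(wide, id_vars='Species',
               var_name='measurement', value_name='cm')
print(long)
```

Naming these columns up front saves a `rename()` call and makes downstream `groupby('measurement')` code read naturally.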

The `apply()` function in Pandas is used to apply a function to each element or row in a Series or DataFrame. It provides a flexible way to perform custom operations on your data.

This function allows you to process data in a more granular and personalized manner, as you can define your own function or use built-in functions.

Let's explore the usage of `apply()` with a useful example:

```
# Apply a lambda function to calculate the square of each Sepal Length value
iris_df['sepal_length_squared'] = iris_df['sepal_length'].apply(lambda x: x**2)
print(iris_df[['sepal_length', 'sepal_length_squared']].head())
```

We use the `apply()` function to apply a lambda function to each value in the '*sepal_length*' column. The lambda function calculates the square of each value by raising it to the power of 2.

The resulting values are stored in a new column called '*sepal_length_squared*'. By printing the `head()` of the `iris_df` DataFrame, we can observe both the original '*sepal_length*' column and the newly added '*sepal_length_squared*' column, which contains the squared values.

💡

In simple terms, the `apply()` function allows us to perform a custom operation on each element in a Series or DataFrame. In this example, we use a lambda function to calculate the square of each sepal length value in the '*sepal_length*' column.

This transformation is useful when we want to derive new values or perform calculations based on existing data. The result is a modified DataFrame with the original column and the newly created column reflecting the applied function.
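Worth noting: for simple arithmetic like squaring, a vectorized expression gives the same result as `apply()` and is usually much faster, since it avoids calling a Python function once per element. A quick comparison:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])
via_apply = s.apply(lambda x: x**2)  # one Python call per element
vectorized = s**2                    # same values, computed in bulk
```

Reserve `apply()` for logic that genuinely cannot be expressed with built-in vectorized operations.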

The `sort_values()` function in Pandas is used to sort a DataFrame or Series by one or more columns. It allows you to arrange your data in ascending or descending order, making it easier to analyze and visualize.

This function is particularly useful when you want to explore the data in a specific order or identify the top or bottom values based on certain criteria.

Let's explore the usage of `sort_values()` with a quick example:

```
# Sort the DataFrame by Sepal Length in descending order
sorted_df = iris_df.sort_values('sepal_length', ascending=False)
print(sorted_df.head())
```

We use the `sort_values()` function to sort the DataFrame based on the '*sepal_length*' column. By passing the column name ('*sepal_length*') and setting the `ascending` parameter to `False`, we arrange the data in descending order of sepal length.

The resulting `sorted_df` DataFrame contains the rows sorted based on the sepal length values in descending order. By printing the `head()` of `sorted_df`, we can observe the top rows with the longest sepal lengths.

💡

In simple terms, the

In this example, we sort the Iris DataFrame by sepal length in descending order to identify the flowers with the longest sepals.

Sorting the data allows us to explore the dataset in a specific order and identify patterns or outliers based on the sorted criteria.
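Because `sort_values()` accepts a list of column names, ties in the first key can be broken by a second one. A small sketch with hypothetical values:

```python
import pandas as pd

df = pd.DataFrame({'sepal_length': [5.1, 4.9, 5.1],
                   'sepal_width': [3.8, 3.0, 3.5]})

# Sort by sepal length descending; break ties with sepal width ascending
tidy = df.sort_values(['sepal_length', 'sepal_width'], ascending=[False, True])
print(tidy)
```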


The **value_counts()** function in Pandas is used to count the occurrences of unique values in a Series.

This function allows us to quickly determine the frequency of each unique value in a column, providing insights into the data's composition.

Let's explore the usage of `value_counts()` with a useful example:

```
# Count the occurrences of each species
species_counts = iris_df['Species'].value_counts()
print(species_counts)
```

We use the **value_counts()** function on the '*Species*' column to count the occurrences of each unique species. The resulting **species_counts** Series contains the counts of each species, with the species names as the index and the corresponding frequencies as the values.

By printing **species_counts**, we can observe the number of occurrences for each species in the dataset. This information helps us understand species distribution and identify any imbalances or biases in the data.

💡

In simple terms, the **value_counts()** function allows us to count the occurrences of each unique value in a Series.

In this example, we count the occurrences of each species in the 'Species' column of the Iris dataset.

The result provides a frequency count for each species, helping us analyze the distribution of species in the dataset and gain insights into the composition of the data.
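Relatedly, passing `normalize=True` turns the counts into proportions, which is handy when checking for class imbalance. A sketch with a hypothetical Series:

```python
import pandas as pd

species = pd.Series(['setosa', 'setosa', 'versicolor', 'virginica'])

# normalize=True returns the relative frequency of each value
proportions = species.value_counts(normalize=True)
print(proportions)
```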


The `fillna()` function in Pandas is used to fill missing or NaN (Not a Number) values in a DataFrame or Series. It allows you to replace missing values with specific values or methods like *forward-fill or backward-fill*.

This function helps in handling missing data and ensuring the completeness of the dataset.

Let's explore the usage of `fillna()` with a practical example:

```
# Fill missing values in the 'sepal_width' column with the mean value
iris_df['sepal_width'] = iris_df['sepal_width'].fillna(
    iris_df['sepal_width'].mean())
print(iris_df['sepal_width'].isnull().sum())
```

We then use the **fillna()** function to fill any missing values in the '*sepal_width*' column with the mean value of that column.

By calling **iris_df['sepal_width'].mean()**, we calculate the mean value of the '*sepal_width*' column. The **fillna()** function replaces any missing values with this calculated mean value.

After filling in the missing values, we check for remaining null values by calling **iris_df['sepal_width'].isnull().sum()**. This expression returns the sum of null values in the '*sepal_width*' column. If the output is zero, all missing values have been successfully filled.

💡

In simple terms, the **fillna()** function helps us handle missing values in a DataFrame or Series.

In this example, we fill the missing values in the '*sepal_width*' column of the Iris dataset with the mean value of that column.

By doing so, we ensure that there are no missing values, allowing us to work with complete and reliable data.
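The forward-fill and backward-fill methods mentioned above propagate neighboring values instead of a statistic. A sketch with hypothetical values:

```python
import pandas as pd
import numpy as np

s = pd.Series([3.5, np.nan, 3.0, np.nan])

# Forward-fill copies the last observed value into each gap
filled = s.ffill()
print(filled.tolist())
```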


The `astype()` function in Pandas is used to change the data type of a column in a DataFrame or a Series. It allows you to convert a column from one data type to another, such as from integer to float, string to datetime, or vice versa.

This function helps in ensuring the appropriate data type for analysis and computation.

Let's explore the usage of `astype()` with a practical example:

```
# Convert the 'sepal_length' column to float
iris_df['sepal_length'] = iris_df['sepal_length'].astype(float)
print(iris_df['sepal_length'].dtype)
```

We use the **astype()** function to convert the '*sepal_length*' column from its original data type to float. By calling **iris_df['sepal_length'].astype(float)**, we specify the desired data type as float and apply the conversion to the '*sepal_length*' column.

After the conversion, we check the data type of the '*sepal_length*' column by calling **iris_df['sepal_length'].dtype**. This expression returns the data type of the column. If the output is float, it indicates that the conversion was successful.

💡

In simple terms, the **astype()** function allows us to change the data type of a column in a DataFrame or Series.

In this example, we convert the '*sepal_length*' column in the Iris dataset from its original data type to float.

This conversion ensures that the values in the column are treated as floating-point numbers, which might be necessary for certain calculations or operations.
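`astype()` also accepts Pandas-specific dtypes; converting a repetitive string column to `'category'`, for example, can reduce memory use. A sketch with hypothetical values:

```python
import pandas as pd

species = pd.Series(['setosa', 'versicolor', 'setosa'])

# Store repeated strings as a categorical for a more compact representation
cat_species = species.astype('category')
print(cat_species.dtype)
```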


The `duplicated()` function in Pandas is used to identify duplicate rows in a DataFrame.

It helps in detecting and handling any duplicate entries in your dataset, allowing you to ensure data integrity and identify potential issues.

Let's explore the usage of `duplicated()` with an example:

```
# Check for duplicate rows based on all columns
duplicates = iris_df.duplicated()
print(duplicates.sum())
```

We then use the **duplicated()** function on the **iris_df** DataFrame to check for duplicate rows. By calling **iris_df.duplicated()**, the function returns a boolean Series where `True` marks a row that duplicates an earlier row and `False` marks a unique row.

To count the total number of duplicate rows, we use the **sum()** function on the `duplicates` Series. By calling **duplicates.sum()**, we get the sum of **True** values, which represents the count of duplicate rows.

💡

In simple terms, the **duplicated()** function helps us find duplicate rows in a DataFrame.

In this example, we check for duplicate rows in the Iris dataset by applying **duplicated()** on the **iris_df** DataFrame.

By counting the number of **True** values in the resulting boolean Series, we can determine the total count of duplicate rows. This information is valuable for data quality analysis and ensuring the uniqueness of the data.

The `drop_duplicates()` function in Pandas is used to remove duplicate rows from a DataFrame.

It helps in cleaning and ensuring the uniqueness of the data by eliminating any redundant entries.

Let's explore the usage of `drop_duplicates()` with a simple example:

```
# Remove duplicate rows based on all columns
deduplicated_df = iris_df.drop_duplicates()
print(deduplicated_df.shape[0])
```

We use the **drop_duplicates()** function on the **iris_df** DataFrame to remove duplicate rows. By calling **iris_df.drop_duplicates()**, the function returns a new DataFrame with duplicate rows removed. The original **iris_df** DataFrame remains unchanged.

We then use the **shape** attribute to check the number of rows in the deduplicated DataFrame. By calling **deduplicated_df.shape[0]**, we get the number of rows, which represents the count of unique rows after removing duplicates.

💡

In simple terms, the **drop_duplicates()** function helps us eliminate duplicate rows from a DataFrame.

In this example, we remove duplicate rows in the Iris dataset by applying **drop_duplicates()** on the **iris_df** DataFrame.

The resulting **deduplicated_df** DataFrame contains only the unique rows, ensuring the uniqueness of the data. This operation is useful for data cleaning and maintaining the integrity of the dataset.

The `str.contains()` function in Pandas is used to check whether each element of a string column contains a specific pattern or substring. It is used when searching for a specific pattern within the values of a string column.

This function helps in identifying rows that contain a particular pattern, allowing us to filter or analyze the data based on specific criteria.

Let's explore the usage of `str.contains()` with a quick example:

```
# Check if the 'Species' column contains the pattern 'versi'
contains_versi = iris_df['Species'].str.contains('versi')
print(contains_versi.head())
```

Next, we use the **str.contains()** function on the '*Species*' column to check whether each value contains the pattern '*versi*'.

By calling **iris_df['Species'].str.contains('versi')**, the function returns a boolean Series indicating whether the pattern is present (**True**) or not (**False**) in each value of the '*Species*' column.

The resulting **contains_versi** Series holds `True` or `False` for each row. By printing **contains_versi.head()**, we can observe the boolean values corresponding to the first few rows of the '*Species*' column.

💡

In simple terms, the **str.contains()** function allows us to search for a specific pattern within the values of a string column.

In this example, we use it to check whether the '*Species*' column in the Iris dataset contains the pattern '*versi*'.

By doing so, we obtain a boolean Series that indicates which rows have the specified pattern. This function is useful when filtering or analyzing the data based on specific patterns or substrings within the string values.
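The boolean Series can be used directly to filter rows, and `case=False` makes the match case-insensitive. A sketch with hypothetical values:

```python
import pandas as pd

species = pd.Series(['Iris-versicolor', 'Iris-setosa', 'Iris-VERSIcolor'])

# case=False ignores capitalization; na=False treats missing values as no match
mask = species.str.contains('versi', case=False, na=False)
print(species[mask].tolist())
```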


The `str.replace()` function in Pandas is used to replace occurrences of a pattern or substring with a new value in a string column. It is used when we want to modify or update specific parts of the string values within a column.

This function helps in performing string replacement operations, which can be useful for data cleaning, standardization, or transforming the data to a desired format.

Let's explore the usage of `str.replace()` with a useful example:

```
# Replace 'setosa' with 'SETOSA' in the 'Species' column
replaced_species = iris_df['Species'].str.replace('setosa', 'SETOSA')
print(replaced_species.head())
```

Next, we use the **str.replace()** function on the '*Species*' column to replace all occurrences of '*setosa*' with '*SETOSA*'. By calling **iris_df['Species'].str.replace('setosa', 'SETOSA')**, the function performs the replacement operation, resulting in a new Series called *replaced_species*.

The resulting **replaced_species** Series contains the modified values, where all instances of '*setosa*' are replaced with '*SETOSA*'. By printing **replaced_species.head()**, we can observe the updated values of the '*Species*' column.

💡

In simple terms, the **str.replace()** function allows us to replace specific patterns or substrings within the string values of a column.

In this example, we use it to replace all occurrences of '*setosa*' with '*SETOSA*' in the '*Species*' column of the Iris dataset. By doing so, we obtain a new Series with the modified values.
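`str.replace()` can also treat the pattern as a regular expression when `regex=True` is passed. A sketch with hypothetical values:

```python
import pandas as pd

species = pd.Series(['Iris-setosa', 'Iris-versicolor'])

# With regex=True, strip a leading 'Iris-' prefix from every value
stripped = species.str.replace(r'^Iris-', '', regex=True)
print(stripped.tolist())
```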


The `str.extract()` function in Pandas extracts substrings from a string column using regular expressions. It is used when we want to extract specific segments or patterns from the values within a string column.

This function helps in retrieving specific information from the strings, such as extracting numbers, dates, or other structured patterns.

Let's explore the usage of `str.extract()` with an example:

```
# Extract the numeric part from the 'Species' column
numeric_species = iris_df['Species'].str.extract('(\d+)')
print(numeric_species.head())
```

Next, we use the **str.extract()** function on the '*Species*' column to extract the numeric part from each string value.

By calling **iris_df['Species'].str.extract('(\d+)')**, we specify the regular expression pattern **(\d+)** to capture one or more digits from the strings.

The resulting **numeric_species** DataFrame contains the extracted values. Each value is extracted based on the provided regular expression pattern; rows without a matching digit are returned as NaN. By printing **numeric_species.head()**, we can observe the extracted values corresponding to the first few rows of the '*Species*' column.

💡

In simple terms, the **str.extract()** function allows us to extract specific substrings or patterns from the values of a string column.

In this example, we use it to extract the numeric part from the '*Species*' column in the Iris dataset.

The extracted values are then stored in a new DataFrame.
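Since the Iris species names contain no digits, the extraction above yields NaN for every row; a column that does contain digits shows the behavior more clearly. A sketch with hypothetical labels:

```python
import pandas as pd

labels = pd.Series(['plot_12', 'plot_7', 'greenhouse'])

# Capture one or more digits; rows without a match yield NaN
numbers = labels.str.extract(r'(\d+)')
print(numbers)
```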


The `get_dummies()` function in Pandas converts categorical variables into dummy or indicator variables. It is used when we want to transform categorical columns into numerical representations that can be used in machine learning models or further analysis.

This function helps in handling categorical data by creating binary columns to represent each category.

Let's explore the usage of `get_dummies()` with an example:

```
# Convert the 'Species' column into dummy variables
dummy_species = pd.get_dummies(iris_df['Species'])
print(dummy_species.head())
```

Next, we use the **get_dummies()** function on the '*Species*' column to convert it into dummy variables. By calling **pd.get_dummies(iris_df['Species'])**, the function generates binary columns for each unique category in the '*Species*' column.

The resulting **dummy_species** DataFrame contains the transformed data, where each category in the '*Species*' column is represented by a separate binary column. The value of each binary column is 1 if the original value matches the category, and 0 otherwise. By printing **dummy_species.head()**, we can observe the dummy variables corresponding to the first few rows of the '*Species*' column.

💡

In simple terms, the **get_dummies()** function allows us to convert categorical variables into binary columns.

In this example, we use it to transform the '*Species*' column in the Iris dataset into dummy variables. Each unique category in the column is represented by a separate binary column, where a value of 1 indicates the presence of that category, and zero indicates its absence.

This transformation is useful for handling categorical data in machine learning models or performing further analysis with numerical representations.
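When the dummy variables feed a linear model, one column is redundant; `drop_first=True` removes it to avoid collinearity. A sketch with hypothetical values:

```python
import pandas as pd

species = pd.Series(['setosa', 'versicolor', 'setosa'])

# drop_first=True drops the first category's column
dummies = pd.get_dummies(species, drop_first=True)
print(dummies.columns.tolist())
```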


To showcase this function, we'll need to switch our dataset. For this example, I'll be using an electricity-production dataset, which you'll be able to find in the GitHub repo (LINK).

The `dt.year` accessor in Pandas is used to extract the year component from a datetime series. It is used when we want to extract only the year information from dates in a datetime column.

This accessor allows us to focus on the year component of the dates and perform analysis or operations based on the year.

Let's explore the usage of `dt.year` with a quick example:

```
# Load the electric_production dataset
electric_df = pd.read_csv('electric_production.csv')
# Convert the 'Date' column to datetime and extract the year
electric_df['Date'] = pd.to_datetime(electric_df['Date'])
electric_df['Year'] = electric_df['Date'].dt.year
print(electric_df[['Date', 'Year']].head())
```

In the code above, we start by loading the electric_production dataset using Pandas' **read_csv()** function and storing it in the **electric_df** DataFrame. The dataset contains two columns: '*Date*' representing the date of electric production and '*Production*' representing the production value on that date.

Next, we convert the '*Date*' column to a datetime data type using **pd.to_datetime()**. By calling **electric_df['Date'] = pd.to_datetime(electric_df['Date'])**, we ensure that the 'Date' column is recognized as a datetime column.

We then use the **dt.year** accessor on the 'Date' column to extract the year component. By calling **electric_df['Date'].dt.year**, we extract only the year information from each date and create a new column called '*Year*' in the electric_df DataFrame.

The resulting DataFrame contains the original '*Date*' and the newly created '*Year*' column. By printing **electric_df[['Date', 'Year']].head()**, we can observe the first few rows with both the original dates and the extracted year values.

💡

In simple terms, the **dt.year** accessor allows us to focus on the year component of dates in a datetime column.

In this example, we use it to extract the year from the '*Date*' column in the electric_production dataset. By doing so, we create a new column called '*Year*' containing only the year information.

This extraction can be helpful for analyzing or aggregating data based on yearly trends, patterns, or comparisons.
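The `dt` accessor exposes the other date components the same way. A self-contained sketch with hypothetical dates:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['1985-01-01', '1990-06-15']))

# Sibling accessors pull out other components of each date
print(dates.dt.year.tolist())
print(dates.dt.month.tolist())
print(dates.dt.day.tolist())
```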


The `resample()` function in Pandas is used to resample time series data to a different frequency. It is used when we want to change the time intervals of our data, either by upsampling (increasing the frequency) or downsampling (decreasing the frequency).

This function helps in aggregating or summarizing data over different time intervals, such as daily, monthly, or yearly.

Let's explore the usage of `resample()` with a useful example:

```
# Convert the 'Date' column to datetime and set it as the index
electric_df['Date'] = pd.to_datetime(electric_df['Date'])
electric_df.set_index('Date', inplace=True)
# Resample the data to monthly frequency and calculate the mean
monthly_mean = electric_df.resample('M').mean()
print(monthly_mean.head())
```

First, we convert the '*Date*' column to a datetime data type using `pd.to_datetime()`. By calling **electric_df['Date'] = pd.to_datetime(electric_df['Date'])**, we ensure that the 'Date' column is recognized as a datetime column.

We then set the '*Date*' column as the index of the **electric_df** DataFrame using **electric_df.set_index('Date', inplace=True)**. Setting the index as the datetime column allows us to perform time-based operations and resampling.

Next, we use the **resample()** function on the **electric_df** DataFrame to resample the data to a monthly frequency. By calling **electric_df.resample('M')**, we specify '*M*' as the frequency, which stands for monthly. This operation aggregates the data over monthly intervals.

Finally, we calculate the mean of each month's data using the `mean()` function. By calling **electric_df.resample('M').mean()**, we calculate the mean production value for each month.

The resulting **monthly_mean** DataFrame contains the resampled data, where each row represents the mean production value for a specific month. By printing **monthly_mean.head()**, we can observe the resampled data for the first few months.

💡

In simple terms, the **resample()** function allows us to change the time intervals of our time series data. In this example, we resample the electric_production dataset to a monthly frequency.

By doing so, we aggregate the data over monthly intervals and calculate the mean production value for each month.

This resampling is helpful for analyzing and summarizing time series data at different frequencies, enabling us to observe patterns, trends, or seasonality on a larger time scale.
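The same pattern can be verified on synthetic data without the CSV file; this sketch uses the same `'M'` (month-end) frequency as the example above, with hypothetical daily values:

```python
import pandas as pd

# Hypothetical daily production values over January and February 2024
idx = pd.date_range('2024-01-01', periods=60, freq='D')
daily = pd.Series(range(60), index=idx)

# Aggregate each calendar month and take the mean
monthly_mean = daily.resample('M').mean()
print(monthly_mean)
```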


The `.to_csv()` function in Pandas is used to export a DataFrame to a CSV (Comma-Separated Values) file. It is used when we want to save our DataFrame as a CSV file, which is a common file format for storing tabular data.

This function helps in saving the data for future use, sharing it with others, or using it in other applications that accept CSV files as input.

Let's create a dummy dataset on flowers and export it as a CSV file using the `.to_csv()` function:

```
# Create a dummy dataset on flowers
flowers_data = {
    'Name': ['Rose', 'Lily', 'Tulip', 'Sunflower', 'Daisy'],
    'Color': ['Red', 'White', 'Pink', 'Yellow', 'White'],
    'Petals': [5, 6, 4, 10, 8],
    'Fragrance': ['Yes', 'Yes', 'No', 'No', 'Yes']
}
flowers_df = pd.DataFrame(flowers_data)
# Export the DataFrame as a CSV file
flowers_df.to_csv('flowers_dataset.csv', index=False)
```

In the code above, we first create a dummy dataset on flowers using a Python dictionary called **flowers_data**. The dataset contains information about each flower's name, color, number of petals, and fragrance.

Next, we create a DataFrame called **flowers_df** using the `pd.DataFrame()` function, passing the `flowers_data` dictionary as input.

To export the DataFrame as a CSV file, we use the **.to_csv()** function. By calling `flowers_df.to_csv('flowers_dataset.csv', index=False)`, we specify `'flowers_dataset.csv'` as the file name and set `index=False` to exclude the DataFrame index from the output file.

After executing this code, a CSV file named **'flowers_dataset.csv'** will be created in the same directory as your Python script or notebook. This file will contain the data from the **flowers_df** DataFrame, with each row representing a flower and each column representing a specific attribute.

💡

In simple terms, the `.to_csv()` function allows us to export a DataFrame as a CSV file.

In this example, we create a dummy dataset on flowers and store it in the `flowers_df` DataFrame. By using `.to_csv()`, we save this DataFrame as a CSV file named `'flowers_dataset.csv'`.

This file can be used for future analysis, shared with others, or imported into other applications that accept CSV files as input.
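A quick way to confirm the export worked is to read the file back with `read_csv()`; this round-trip sketch writes to a temporary directory, and the file name here is hypothetical:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'Name': ['Rose', 'Lily'], 'Petals': [5, 6]})

# Write to a temporary location, then read it back and compare
path = os.path.join(tempfile.gettempdir(), 'flowers_roundtrip.csv')
df.to_csv(path, index=False)
restored = pd.read_csv(path)
print(restored.equals(df))
```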


In conclusion, we have explored several powerful functions in Pandas for data manipulation and analysis.

Throughout this blog post, we have seen how these functions can be used to perform various tasks such as counting unique values, transforming values, grouping data, creating pivot tables, binning data, merging DataFrames, unpivoting data, applying functions, sorting data, counting values, filling missing values, changing data types, finding duplicates, removing duplicates, pattern matching, string replacement, substring extraction, extracting year from dates, and resampling time series data.

While it may seem overwhelming to remember all of these functions, it's important to note that the Pandas documentation is a valuable resource. It provides detailed explanations, examples, and usage guidelines for each function. Rather than memorizing all the functions, it's more efficient to understand their capabilities and consult the documentation as needed. This allows you to leverage the full potential of Pandas and apply the appropriate functions based on your data analysis requirements.

For further learning, I recommend the book "**Python for Data Analysis**" by Wes McKinney, the creator of Pandas. This book provides in-depth coverage of data analysis techniques using Python and Pandas. It covers various topics, including data manipulation, data cleaning, visualization, and more. You can read the e-book on Wes McKinney's website here.

I hope this blog post has provided valuable insights into the power and versatility of Pandas functions. Feel free to explore the Pandas documentation and continue your journey in mastering data manipulation with Python!

If you have any further questions or need assistance, feel free to contact me on Twitter or connect with me on LinkedIn.

Happy coding and data analysis!

You can now use Generative AI Studio on Vertex AI to prompt, tune, and deploy Google's foundational models, including PaLM 2, Imagen, Codey, and Chirp. You can easily design and fine-tune your prompt and copy the code required to deploy the solution.

Leveraging a foundational model is a no-brainer because of the time, complexity, and computational requirements of training these large language models (LLMs) from scratch. Deploying LLM applications through APIs, even for open-source models, is often easier because the size of these models makes them hard to serve yourself.

There are numerous large language models; both closed and open-source. LangChain has become a popular solution for building LLMs applications because it reduces complexity and makes switching from one model to another easy.

In this article, you will discover how to use the PaLM API with LangChain to build LLM applications.

Join the newsletter to receive the technical deep dives in your inbox.

You will learn how to use the PaLM API with two applications:

- Chatting with YouTube videos
- Chatting with PDFs

The first step is to install all the required libraries, including:

- LangChain
- `google-cloud-aiplatform` for the Vertex AI PaLM API
- `pypdf` for reading PDF files
- Whisper for transcription
- Pytube for downloading YouTube videos

`pip install google-cloud-aiplatform google-api-python-client pypdf langchain pytube git+https://github.com/openai/whisper.git`

💡

Are you interested in digging deeper into building applications with Google's large language models? Check out the Generative AI learning path from Google. It contains numerous curated resources to get you started. You can also sign up for Google Cloud and get free credits for trying the foundational models at no cost.

You can follow along using this and this Kaggle notebook.

Chatting with YouTube videos is done in the following steps:

- Download the video using PyTube
- Convert the video to audio
- Transcribe the video using Whisper
- Split the transcribed text into chunks because the LLMs have a maximum number of tokens they can accept
- Create word embeddings for each chunk
- Store the embeddings in a vector database
- Create an embedding for the question
- Compare the question's embedding to the embeddings in the vector store
- Return the top similar embeddings
- Pass these embeddings instead of the entire text to the LLM
- Get a response from the LLM

Let's look at the code implementation. First, import all the required packages:

```
import pandas as pd

# Utils
import time
from typing import List

from pydantic import BaseModel
from google.cloud import aiplatform
from langchain.chat_models import ChatVertexAI
from langchain.embeddings import VertexAIEmbeddings
from langchain.llms import VertexAI
from langchain.vectorstores import Chroma
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

import whisper
from pytube import YouTube
```

Using the PaLM API requires that you authenticate your Google account:

```
import vertexai
PROJECT_ID = "PROJECT_ID" # @param {type:"string"}
vertexai.init(project=PROJECT_ID, location="us-central1")
```

💡

You might need to set up a Google Service account and give it the required permissions for this to work.

Next, use PyTube to download the video and Whisper to transcribe it. Save the result in a CSV file.

```
import os
import tempfile

YOUTUBE_VIDEOS = ["https://www.youtube.com/watch?v=Ibjm2KHfymo"]

def transcribe(youtube_url, model):
    youtube = YouTube(youtube_url)
    audio = youtube.streams.filter(only_audio=True).first()
    with tempfile.TemporaryDirectory() as tmpdir:
        file = audio.download(output_path=tmpdir)
        title = os.path.basename(file)[:-4]
        result = model.transcribe(file, fp16=False)
    return title, youtube_url, result["text"].strip()

transcriptions = []
model = whisper.load_model("base")
for youtube_url in YOUTUBE_VIDEOS:
    transcriptions.append(transcribe(youtube_url, model))

df = pd.DataFrame(transcriptions, columns=["title", "url", "text"])
df.to_csv("text.csv")
```

LangChain provides various data loaders. In this case, we are interested in the `CSVLoader`.

```
file = "text.csv"
loader = CSVLoader(file_path=file)
docs = loader.load()
```

The following is a utility function for using the Vertex AI Embeddings API with rate limiting. Check the Console Quotas page for the allowed requests per minute.

```
# Utility functions for Embeddings API with rate limiting
def rate_limit(max_per_minute):
    period = 60 / max_per_minute
    print("Waiting")
    while True:
        before = time.time()
        yield
        after = time.time()
        elapsed = after - before
        sleep_time = max(0, period - elapsed)
        if sleep_time > 0:
            print(".", end="")
            time.sleep(sleep_time)

class CustomVertexAIEmbeddings(VertexAIEmbeddings, BaseModel):
    requests_per_minute: int
    num_instances_per_batch: int

    # Overriding embed_documents method
    def embed_documents(self, texts: List[str]):
        limiter = rate_limit(self.requests_per_minute)
        results = []
        docs = list(texts)
        while docs:
            # Working in batches because the API accepts a maximum of 5
            # documents per request to get embeddings
            head, docs = (
                docs[: self.num_instances_per_batch],
                docs[self.num_instances_per_batch :],
            )
            chunk = self.client.get_embeddings(head)
            results.extend(chunk)
            next(limiter)
        return [r.values for r in results]

# Embedding
EMBEDDING_QPM = 100
EMBEDDING_NUM_BATCH = 5
embeddings = CustomVertexAIEmbeddings(
    requests_per_minute=EMBEDDING_QPM,
    num_instances_per_batch=EMBEDDING_NUM_BATCH,
)
```

There are various open-source vector databases for storing word embeddings. Chromadb is a common choice among developers.

Next, set up a retriever to fetch documents and pass them to the LLM.

```
db = Chroma.from_documents(docs, embeddings)
retriever = db.as_retriever()
```

To answer questions from the video, you need to set up the VertexAI LLM and the `RetrievalQA` chain from LangChain.

```
llm = VertexAI(
    model_name="text-bison@001",
    max_output_tokens=256,
    temperature=0,
    top_p=0.8,
    top_k=40,
    verbose=True,
)
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True,
)
query = "How is Java mounting a comeback?"
response = qa_stuff.run(query)
# Response: "Java is mounting a comeback because it is a well-balanced language
# that is performing enough for most things. It is relatively easy to use, and
# most importantly, has a huge ecosystem of stable libraries and frameworks."
```


The process of chatting with PDFs is very similar to the video application. The only difference is reading in the PDF with LangChain.

LangChain provides various utilities for loading a PDF. Let's use the `PyPDFLoader`.

```
loader = PyPDFLoader("yourpdf.pdf")
documents = loader.load()
```

We used a very short video from the Fireship YouTube channel in the video example. However, in some cases, the text will be too long to fit the LLM's context. In such a case, you have to split it into chunks. LangChain provides tools for doing that.

```
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_documents(documents)
```
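Conceptually, the `chunk_size` and `chunk_overlap` parameters slide a window over the text. `RecursiveCharacterTextSplitter` is smarter than this (it prefers to break on separators such as newlines and sentences), but a plain character-window sketch captures the idea:

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=50):
    """Naive character splitter: fixed-size windows that share chunk_overlap chars."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = split_with_overlap("a" * 1200, chunk_size=500, chunk_overlap=50)
print([len(c) for c in chunks])  # → [500, 500, 300]
```

Consecutive chunks share 50 characters, so a sentence cut at a boundary still appears whole in at least one chunk.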

Save the text embeddings using Chromadb as done previously.

```
db = Chroma.from_documents(texts, embeddings)
retriever = db.as_retriever()
```

Next, use `ChatVertexAI` and start chatting with the PDF.

```
llm = ChatVertexAI()
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True,
)
```

In this case, let's have the LLM summarize the contents of the PDF as a list.

```
from langchain.schema import HumanMessage

# Prepare a template for the prompt
template = """You are an advanced AI assistant that summarizes online articles into bulleted lists.
Here's the article you need to summarize.
==================
Title: {article_title}
{article_text}
==================
Now, provide a summarized version of the article in a bulleted list format.
"""
# Format the prompt with the actual document text, not the string "texts"
article_text = " ".join(doc.page_content for doc in texts)
prompt = template.format(article_title="Activation functions in JAX and Flax", article_text=article_text)
# Generate the summary
summary = llm([HumanMessage(content=prompt)])
print(summary.content)
```

The model summarized the PDF in a list format, as instructed.


These two applications provide a glimpse into the world of building applications using Google's foundational models. What you build using them is only limited by your imagination. What will you build? Let me know in the comments below or by replying to this email.

**Whenever you're ready, there are 2 ways I can help you:**

If you're looking to accelerate your career, I'd recommend starting with an affordable ebook:

**→** **Writing for Data Scientists:** The exact path I followed to get technical work that pays between $250-$500 from machine learning companies such as Comet, Neptune, cnvrg, Paperspace, Layer, Neural Magic, Determined, Activeloop, and many more. **Get your copy**.

**→ Data Science and Machine Learning Ebook**: I offer numerous free and paid data science and machine learning ebooks to help you in your data science and machine learning career.

Training computer vision models with little data can lead to poor model performance. This problem can be solved by generating new data samples from the existing images. For example, you can create new images by flipping and rotating the existing ones. Generating new image samples from existing ones is known as **image augmentation**.

Image augmentation improves the model's performance by creating images in various angles, lighting, etc. Performing these transformations also prevents the model from memorizing and overfitting the training data. In this article, you will discover how to perform image augmentation using KerasCV.
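Before reaching for a library, it helps to see that the simplest augmentations are plain array operations. Here is a NumPy sketch of a horizontal flip and a 90° rotation; KerasCV wraps these and many more as composable layers:

```python
import numpy as np

image = np.arange(12).reshape(2, 2, 3)       # toy 2x2 RGB image
flipped = image[:, ::-1, :]                  # horizontal flip: reverse the width axis
rotated = np.rot90(image, k=1, axes=(0, 1))  # 90-degree rotation in the spatial plane
print(flipped[0, 0].tolist())  # → [3, 4, 5]
```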


We will use the cats and dogs dataset from Kaggle for this project. Download the dataset, extract it, and move the images to their corresponding folders. You can follow along with this Kaggle Notebook.

```
import os
import wget  # pip install wget
import zipfile
import shutil

wget.download("https://ml.machinelearningnuggets.com/train.zip")
with zipfile.ZipFile('train.zip', 'r') as zip_ref:
    zip_ref.extractall('.')
filenames = os.listdir('train')
for filename in filenames:
    category = filename.split('.')[0]
    if category == 'dog':
        shutil.move(f'train/{filename}', f'animals/dog/{filename}')
    else:
        shutil.move(f'train/{filename}', f'animals/cat/{filename}')
```

Next, import all the required packages and load the dataset using TensorFlow.

```
from PIL import Image
import matplotlib.pyplot as plt
from tensorflow import keras
import pandas as pd
from tensorflow.keras import layers
import tensorflow as tf
import keras_cv  # pip install keras_cv

base_dir = 'animals'  # the folder the images were moved into above
batch_size = 32
img_height = 128
img_width = 128

training_set = tf.keras.utils.image_dataset_from_directory(
    base_dir,
    validation_split=0.2,
    subset="training",
    seed=100,
    image_size=(img_height, img_width),
    batch_size=batch_size)

validation_set = tf.keras.utils.image_dataset_from_directory(
    base_dir,
    validation_split=0.2,
    subset="validation",
    seed=100,
    image_size=(img_height, img_width),
    batch_size=batch_size)
```

Next, set the class names and number of classes.

```
class_names = training_set.class_names
num_classes = len(class_names)
```

Here's a sample of the dataset visualized using Matplotlib.

```
plt.figure(figsize=(10, 10))
for images, labels in training_set.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")
```

Prefetching data when training a machine learning model prevents data loading from becoming the bottleneck in the training process. Prefetching is done using the `prefetch` function, which allows you to set the buffer size manually or have it set automatically at runtime by passing `tf.data.AUTOTUNE`.

```
AUTOTUNE = tf.data.AUTOTUNE
training_ds = training_set.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
validation_ds = validation_set.cache().prefetch(buffer_size=AUTOTUNE)
```
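Conceptually, prefetching runs data preparation in the background so the next batch is ready by the time a training step finishes. Here is a minimal thread-based sketch of the idea (not how `tf.data` implements it internally):

```python
import threading
import queue

def prefetch(generator, buffer_size=2):
    """Run `generator` in a background thread, keeping up to buffer_size items ready."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def producer():
        for item in generator:
            q.put(item)
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            break
        yield item

batches = prefetch((i for i in range(5)), buffer_size=2)
print(list(batches))  # → [0, 1, 2, 3, 4]
```

While the consumer processes one item, the producer thread is already filling the buffer with the next ones.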


We will train an image classification model with and without image augmentation and compare the results. Training is done under the following conditions:

- An image size of 128; you can increase this to 224 if you have the memory and GPU required for this large dataset
- Training on Kaggle Notebooks using the P100 GPU
- Visualizing the results using Matplotlib, but you can use something more powerful like TensorBoard
- Training the model for 100 epochs with early stopping at a patience of 5

The results may differ from what you get because of the random initialization of the weights and biases of the model.

First, let's define the Keras model.

Start by training the model without any image augmentation.

```
epochs = 100
model = keras.Sequential([
    layers.Rescaling(1./255),
    layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Dropout(0.25),
    layers.Conv2D(filters=64, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.25),
    # softmax so the class probabilities sum to one
    layers.Dense(len(class_names), activation='softmax')])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

log_folder = "logs"
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_folder, histogram_freq=1, write_graph=True, write_images=True, update_freq='epoch')
earlystopping = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=5)
history = model.fit(training_ds, validation_data=validation_ds, epochs=epochs, callbacks=[earlystopping, tensorboard_callback])
```

Check out the How to Build CNN in TensorFlow tutorial to learn more about how to build CNN models and how they work.

Load the model metrics with Pandas and visualize them using Matplotlib.

```
metrics_df = pd.DataFrame(history.history)
loss, accuracy = model.evaluate(validation_set)
metrics_df[["loss","val_loss"]].plot();
metrics_df[["accuracy","val_accuracy"]].plot();
```

The model achieves an accuracy of 86% but is overfitting because the validation loss is higher than the training loss.
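This diagnosis can be automated with a simple (hypothetical) heuristic: flag overfitting when training loss keeps falling while validation loss rises over the last few epochs:

```python
def is_overfitting(train_loss, val_loss, window=3):
    """Flag overfitting when, over the last `window` epochs,
    training loss falls while validation loss rises."""
    t, v = train_loss[-window:], val_loss[-window:]
    return t[-1] < t[0] and v[-1] > v[0]

# Training loss keeps dropping while validation loss climbs
print(is_overfitting([0.9, 0.5, 0.3, 0.2], [0.8, 0.6, 0.65, 0.7]))  # → True
```

In practice you would run this over `history.history["loss"]` and `history.history["val_loss"]`.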

Next, train the model with RandAugment image augmentation. RandAugment applies a random series of augmentations to each training image. Define the augmentation in a Keras Sequential layer so it can be included in the Keras Sequential model.

```
data_augmentation = keras.Sequential(
    [
        keras_cv.layers.RandAugment(
            value_range=(0, 255),
            augmentations_per_image=3,
            magnitude=0.3,
            magnitude_stddev=0.2,
            rate=0.5,
        )
    ])
```

You can visualize the output of RandAugment to see the augmented images.

```
plt.figure(figsize=(10, 10))
for images, _ in training_set.take(1):
    for i in range(9):
        augmented_images = data_augmentation(images)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[0].numpy().astype("uint8"))
        plt.axis("off")
```


Include the augmentation layer as part of the Keras model.

```
model = keras.Sequential([
    data_augmentation,
    layers.Rescaling(1./255),
    layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Dropout(0.25),
    layers.Conv2D(filters=64, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.25),
    # softmax so the class probabilities sum to one
    layers.Dense(len(class_names), activation='softmax')])
```

Next, train the model with this augmentation.

```
model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics=['accuracy'])
history = model.fit(training_ds,validation_data=validation_ds,epochs=epochs, callbacks=[earlystopping,tensorboard_callback])
```

With RandAugment, we get an accuracy of 86%; the curves are a bit smoother, but there is still some overfitting.

```
metrics_df = pd.DataFrame(history.history)
loss, accuracy = model.evaluate(validation_set)
metrics_df[["loss","val_loss"]].plot();
metrics_df[["accuracy","val_accuracy"]].plot();
```

The CutMix augmentation cuts a random patch from one image and pastes it onto another, preventing the model from depending on any particular feature. MixUp blends two images together. CutMix and MixUp augmentation prevent a model from overfitting on the training data. These augmentations also help the model perform better on test data drawn from a distribution different from the training data.
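The MixUp idea is easy to state in NumPy: blend two images and their one-hot labels with the same mixing weight. KerasCV samples the weight from a Beta distribution; it is fixed here for illustration:

```python
import numpy as np

def mixup(image_a, label_a, image_b, label_b, lam=0.7):
    """MixUp: convex combination of two images and their one-hot labels."""
    image = lam * image_a + (1 - lam) * image_b
    label = lam * label_a + (1 - lam) * label_b
    return image, label

img_a = np.zeros((2, 2, 3)); img_b = np.ones((2, 2, 3))
lab_a = np.array([1.0, 0.0]); lab_b = np.array([0.0, 1.0])
img, lab = mixup(img_a, lab_a, img_b, lab_b, lam=0.7)
print(lab)  # → [0.7 0.3]
```

Because the blended label is soft, the model is penalized for being overconfident about either class.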

Define the CutMix and MixUp augmentations. The `to_dict` function ensures that the data is in the format the augmentation layers expect.

```
def to_dict(image, label):
    image = tf.cast(image, tf.float32)
    label = tf.one_hot(label, num_classes)
    return {"images": image, "labels": label}

AUTOTUNE = tf.data.AUTOTUNE
training_ds = training_set.shuffle(1000).map(to_dict)
validation_ds = validation_set.map(to_dict, num_parallel_calls=AUTOTUNE)

cut_mix = keras_cv.layers.CutMix()
mix_up = keras_cv.layers.MixUp()

def cut_mix_and_mix_up(samples):
    samples = cut_mix(samples, training=True)
    samples = mix_up(samples, training=True)
    return samples

tada = training_ds.map(cut_mix_and_mix_up)
```

Visualize some of the augmented images.

```
image_iterator = iter(training_set)
image_batch, labels_batch = image_iterator.get_next()
# CutMix and MixUp expect one-hot labels
output = cut_mix_and_mix_up({"images": image_batch, "labels": tf.one_hot(labels_batch, num_classes)})

plt.figure(figsize=(10, 10))
for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(output["images"][i].numpy().astype("uint8"))
    plt.axis("off")
```

Next, add the augmentation layer to the model and train it.

```
model = keras.Sequential([
    data_augmentation,
    layers.Rescaling(1./255),
    layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Dropout(0.25),
    layers.Conv2D(filters=64, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.25),
    # softmax so the class probabilities sum to one
    layers.Dense(len(class_names), activation='softmax')])

def preprocess_for_model(inputs):
    images, labels = inputs["images"], inputs["labels"]
    images = tf.cast(images, tf.float32)
    return images, labels

train_dataset = tada.map(preprocess_for_model, num_parallel_calls=AUTOTUNE)
train_dataset = train_dataset.prefetch(AUTOTUNE)
test_dataset = validation_ds.map(preprocess_for_model, num_parallel_calls=AUTOTUNE)
test_dataset = test_dataset.prefetch(AUTOTUNE)

model.compile(optimizer='adam', loss=tf.keras.losses.CategoricalCrossentropy(), metrics=['accuracy'])
# Assign to history so the metrics can be plotted later
history = model.fit(train_dataset, validation_data=test_dataset, epochs=epochs, callbacks=[earlystopping, tensorboard_callback])
```

With CutMix and MixUp, training stopped after 8 epochs, and the curves look similar to what we have seen before. The accuracy failed to reach even 50% after 8 epochs, meaning that CutMix and MixUp might not be best suited for this dataset.

```
metrics_df = pd.DataFrame(history.history)
loss, accuracy = model.evaluate(test_dataset)
metrics_df[["loss","val_loss"]].plot();
metrics_df[["accuracy","val_accuracy"]].plot();
```


You can also apply multiple KerasCV augmentations such as RandAugment, Equalization, and Posterization. Define them in a Keras Sequential layer.

```
data_augmentation = keras.Sequential(
    [
        keras_cv.layers.RandAugment(
            value_range=(0, 255),
            augmentations_per_image=3,
            magnitude=0.3,
            magnitude_stddev=0.2,
            rate=0.5,
        ),
        keras_cv.layers.Equalization(value_range=[0, 255]),
        keras_cv.layers.Posterization(bits=4, value_range=[0, 255]),
    ])
```

Visualize some augmented images:

```
plt.figure(figsize=(10, 10))
for images, _ in training_set.take(1):
    for i in range(9):
        augmented_images = data_augmentation(images)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[0].numpy().astype("uint8"))
        plt.axis("off")
```

Next, add the augmentation layers to the model and train it.

```
model = keras.Sequential([
    data_augmentation,
    layers.Rescaling(1./255),
    layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Dropout(0.25),
    layers.Conv2D(filters=64, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.25),
    # softmax so the class probabilities sum to one
    layers.Dense(len(class_names), activation='softmax')])

model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics=['accuracy'])
history = model.fit(training_ds, validation_data=validation_ds, epochs=epochs, callbacks=[earlystopping, tensorboard_callback])
```

The results with the multiple augmentations are quite similar to the previous ones, with a slight drop in accuracy.

```
metrics_df = pd.DataFrame(history.history)
loss, accuracy = model.evaluate(validation_set)
metrics_df[["loss","val_loss"]].plot();
metrics_df[["accuracy","val_accuracy"]].plot();
```

Even after trying multiple augmentations, it still looks like we are overfitting on the training images: the validation loss curve goes up while the training loss goes down. You can try a pretrained model instead of designing a CNN from scratch to see if the results differ.

💡 Learn how to build machine learning applications using Gradio with this free comprehensive Gradio Guide.



LangChain is an open-source tool for building large language model (LLM) applications. It supports a variety of open-source and closed models, making it easy to create these applications with one tool. Some of the modules in LangChain include:

- **Models** for supported models and integrations
- **Prompts** for making it easy to manage prompts
- **Memory** for managing the memory between different model calls
- **Indexes** for loading, querying, and updating external data
- **Chains** for creating subsequent calls to an LLM
- **Agents** to develop applications where the LLM model can direct itself
- **Callbacks** for logging and streaming the intermediate steps in a chain

You will see the use of these modules in this article as you build an application to transcribe YouTube videos and ask them questions.

Let's dive in.

Join the newsletter to receive the technical deep dives in your inbox.

Install the required packages:

- `langchain`
- `openai`
- `python-dotenv` for reading environment variables
- `pinecone-client` to store embeddings
- `pytube` for downloading YouTube videos
- `whisper` for transcribing the video

```
pip install langchain docarray openai pytube python-dotenv tiktoken pinecone-client git+https://github.com/openai/whisper.git
```

Next, create a `.env` file and add your OpenAI and Pinecone keys. Pinecone is a vector database for storing embeddings. This is particularly important for real applications where you want to persist the embeddings rather than process them in memory. Use this notebook to follow along.

```
OPENAI_API_KEY=OPENAI_API_KEY
PINECONE_API_KEY=PINECONE_API_KEY
PINECONE_API_ENV=PINECONE_API_ENV
```

Import the required packages and load the environment variables.

```
import os
import whisper
import tiktoken
import openai
import pinecone
import tempfile
import numpy as np
import pandas as pd
from pytube import YouTube
from uuid import uuid4
from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from IPython.display import display, Markdown
from langchain.indexes import VectorstoreIndexCreator
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
```

The first step is to download YouTube videos and transcribe them. For example, use some of Lex's videos, which are over two hours long. The following code was provided in one of ML School's sessions.

```
YOUTUBE_VIDEOS = ["https://www.youtube.com/watch?v=Z3_PwvvfxIU",
                  "https://www.youtube.com/watch?v=DxREm3s1scA"]

def transcribe(youtube_url, model):
    youtube = YouTube(youtube_url)
    audio = youtube.streams.filter(only_audio=True).first()
    with tempfile.TemporaryDirectory() as tmpdir:
        file = audio.download(output_path=tmpdir)
        title = os.path.basename(file)[:-4]
        result = model.transcribe(file, fp16=False)
    return title, youtube_url, result["text"].strip()

transcriptions = []
model = whisper.load_model("base")
for youtube_url in YOUTUBE_VIDEOS:
    transcriptions.append(transcribe(youtube_url, model))

df = pd.DataFrame(transcriptions, columns=["title", "url", "text"])
df.to_csv("text.csv")
df.head()
```

Store the text in a CSV file so you don't have to keep transcribing the same videos.


Large language models have a maximum number of tokens they can accept. You, therefore, can't pass the entire text from the transcribed video as the context when asking a question. To get around this, you have to:

- Split the text into smaller pieces with a few tokens each
- Create embeddings for each piece
- Save the embeddings in a vector store such as Pinecone
- Pass the relevant embeddings instead of the entire transcript embedding to OpenAI when someone asks a question

When a question comes in, create an embedding for it. Next, find the embeddings in your vector store most similar to the question's embedding and pick the top few, say 4. Send the text behind these four embeddings to the large language model instead of the whole transcript, since passing everything would exceed the model's context length.
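The similarity search can be sketched with plain cosine similarity over raw vectors. Real vector stores use approximate indexes for speed, but the ranking idea is the same; the 2-D embeddings here are toys for illustration:

```python
import numpy as np

def top_k_similar(question_emb, stored_embs, k=4):
    """Return indices of the k stored embeddings most similar to the question."""
    stored = np.asarray(stored_embs, dtype=float)
    q = np.asarray(question_emb, dtype=float)
    # Cosine similarity: dot products divided by the vector norms
    sims = stored @ q / (np.linalg.norm(stored, axis=1) * np.linalg.norm(q))
    return np.argsort(sims)[::-1][:k]

# Toy 2-D embeddings: vectors 0 and 2 point roughly in the question's direction
embs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(top_k_similar([1.0, 0.05], embs, k=2))
```

The indices returned are then used to look up the chunk texts to send to the model.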

Split the text and save them in a CSV file:

```
MAX_TOKENS = 500
tokenizer = tiktoken.get_encoding("cl100k_base")
df = pd.read_csv("text.csv", index_col=0)
df["tokens"] = df.text.apply(lambda x: len(tokenizer.encode(x)))

def split_into_many(text, max_tokens):
    # Split the text into sentences
    sentences = text.split('. ')
    # Get the number of tokens for each sentence
    n_tokens = [len(tokenizer.encode(" " + sentence)) for sentence in sentences]
    chunks = []
    tokens_so_far = 0
    chunk = []
    # Loop through the sentences and tokens joined together in a tuple
    for sentence, token in zip(sentences, n_tokens):
        # If the number of tokens so far plus the number of tokens in the
        # current sentence is greater than the max number of tokens, then add
        # the chunk to the list of chunks and reset the chunk and tokens so far
        if tokens_so_far + token > max_tokens:
            chunks.append(". ".join(chunk) + ".")
            chunk = []
            tokens_so_far = 0
        # If the number of tokens in the current sentence is greater than the
        # max number of tokens, go to the next sentence
        if token > max_tokens:
            continue
        # Otherwise, add the sentence to the chunk and add the number of tokens to the total
        chunk.append(sentence)
        tokens_so_far += token + 1
    # Add the last chunk to the list of chunks
    if chunk:
        chunks.append(". ".join(chunk) + ".")
    return chunks

data = []
for row in df.iterrows():
    title = row[1]["title"]
    url = row[1]["url"]
    text = row[1]["text"]
    tokens = row[1]["tokens"]
    if tokens <= MAX_TOKENS:
        data.append((title, url, text))
    else:
        for chunk in split_into_many(text, MAX_TOKENS):
            data.append((title, url, chunk))

df = pd.DataFrame(data, columns=["title", "url", "text"])
df["tokens"] = df.text.apply(lambda x: len(tokenizer.encode(x)))
df.to_csv("video_text.csv", index=False)
```

Load the data using LangChain's `CSVLoader`.

```
file = "video_text.csv"
loader = CSVLoader(file_path=file)
docs = loader.load()
```

The Pinecone Index is useful for storing vector data and serving queries. You will compare the question embeddings to those stored in this index and return the most similar embeddings.

You can create a free index on Pinecone with the desired:

- Dimensions
- Name of the index
- Metadata you'd like to store
- Metric for performing similarity search, cosine is a common choice

```
# Read the keys from the .env file created earlier
load_dotenv()
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_API_ENV = os.getenv("PINECONE_API_ENV")

pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_API_ENV)
PINECONE_INDEX = "another-tube"
embedding_dimension = 1536
if PINECONE_INDEX not in pinecone.list_indexes():
    pinecone.create_index(
        PINECONE_INDEX,
        dimension=embedding_dimension,
        metric="cosine",
        metadata_config={"indexed": ["title", "url"]},
    )
index = pinecone.Index(PINECONE_INDEX)
index.describe_index_stats()
```


You have everything needed to start querying the transcribed YouTube video. Declare a database using the Pinecone Index, the docs created from the transcription, and OpenAI embeddings:

```
embeddings = OpenAIEmbeddings()
db = Pinecone.from_documents(docs, embeddings, index_name=PINECONE_INDEX)
```

The `from_documents` method initializes a vector store from the documents and stores all the embeddings.

Create a query and check the number of documents similar to that query in the vector store:

```
query = "What does Mr Beast say about succeeding on YouTube?"
docs = db.similarity_search(query)
len(docs)
# 4
```

There are multiple ways to get the answer. Let's use the `RetrievalQA` chain here. This chain expects:

- The LLM model, the OpenAI chat model in this case
- The type of chain; `stuff` dumps all the documents into the context and makes one call to the LLM
- The retriever for fetching documents and passing them to the LLM

```
retriever = db.as_retriever()
llm = ChatOpenAI(temperature=0.0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True,
)
response = qa.run(query)
```

Setting the LLM's temperature to 0 minimizes randomness in the answer generation.
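Temperature works by rescaling the logits before they are turned into probabilities; as it approaches 0, the distribution collapses onto the most likely token. A schematic sketch (an illustration of the concept, not OpenAI's internals):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by the temperature."""
    z = np.asarray(logits) / temperature
    z -= z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.5]
print(np.round(softmax_with_temperature(logits, 1.0), 2))   # spread across tokens
print(np.round(softmax_with_temperature(logits, 0.1), 2))   # nearly all mass on the top token
```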

In this post, you have seen how to build large language model applications with LangChain and OpenAI. The beauty of using LangChain is that you can use different LLMs. For instance, you can swap the OpenAI LLM for another model supported by LangChain.

I am working on a web interface for this application. Reply to this email or comment below if you'd like to see it and learn more about the stack used to develop it.

Check out ML School if you are interested in these machine learning applications, particularly how to deploy them for real-world usage.



Image generation models are causing a sensation worldwide, particularly the powerful Stable Diffusion technique. With Stable Diffusion, you can generate images with your laptop, which was previously impossible.

Here's how diffusion models work in plain English:

**1. Generating images involves two processes**

Diffusion gradually adds noise to the image until it's unrecognizable, and a reverse diffusion process removes the noise.

The models then try to generate new images from the noisy image.

**2. Denoising**

Denoising is done using convolutional neural networks such as UNet.

A UNet comprises an encoder for creating the latent representation of the image and a decoder for creating an image from the low-level image representation.

**3. Gradual noise removal**

Noise is not removed from the image at once but is done gradually for the defined number of steps.

Removing noise step-by-step makes the process of generating images from pure noise easier.

Therefore, the goal is to improve upon the previous step.
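The gradual schedule has a convenient closed form: the noisy image at any step t can be sampled directly from the original using the cumulative noise schedule. A minimal NumPy sketch of the standard DDPM forward process (variable names are illustrative):

```python
import numpy as np

def forward_diffusion(x0, t, betas):
    """Sample x_t from q(x_t | x_0): sqrt(alpha_bar)*x0 + sqrt(1 - alpha_bar)*noise."""
    alphas = 1.0 - betas
    alpha_bar = np.prod(alphas[: t + 1])  # cumulative product up to step t
    noise = np.random.randn(*x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return xt, noise

# A typical linear schedule of 1000 steps; at the last step, alpha_bar is
# almost 0, so x_t is essentially pure noise
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.zeros((4, 4))  # toy "image"
xt, noise = forward_diffusion(x0, t=999, betas=betas)
```

The denoiser is trained to predict `noise` given `xt` and `t`, which is what makes step-by-step removal possible.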

**4. Generating the image in one step leads to a noisy image**

At each time step, a fraction of the noise and not the entire noise is removed.

The same concept is used in text-to-image generation, where you inject the textual information gradually instead of at once.

**5. Add textual information**

The text information is added by concatenating the text representation from a language model with the image input and also through cross-attention.

Cross-attention enables the CNN attention layers to attend to the text tokens.

**6. Train with small images**

Diffusion models are compute-intensive because of the number of steps involved in the denoising process.

This can be solved by training the network with small images and adding a network to upsample the result to larger images.

**7. Generate images in the latent space**

Latent diffusion models (LDMs) solve this problem by generating the image in the latent space instead of the image space.

LDMs create a low-dimensional image representation by passing the image through an encoder network. Noise is then applied to this representation instead of the image itself.
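To see the savings, compare element counts: a 512×512 RGB image versus Stable Diffusion's 64×64×4 latent:

```python
image_elems = 512 * 512 * 3   # pixels x RGB channels = 786,432 values
latent_elems = 64 * 64 * 4    # latent grid x latent channels = 16,384 values
print(image_elems / latent_elems)  # → 48.0
```

The UNet denoises a tensor 48× smaller, which is what makes laptop-scale generation practical.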

**8. Reverse diffusion**

The reverse diffusion process works with the low-dimensional image representation instead of the image itself.

This is a less compute-intensive process because the model is not working with the entire image. As a result, you can perform image generation on your laptop.

DreamBooth is a technique for fine-tuning diffusion models with a few images while getting good results. This is a game-changer because training diffusion models from scratch requires a lot of images and is computationally expensive.

In this article, we will fine-tune Stable Diffusion with DreamBooth using KerasCV and TensorFlow.

```
pip install -q -U keras_cv
pip install -q -U tensorflow
```

By fine-tuning Stable Diffusion with DreamBooth, you can show the model a few images and have it generate similar images in various settings and locations.

DreamBooth uses **prior preservation** to ensure that the generated images are similar to concepts provided during fine-tuning. During fine-tuning, you'll need to provide:

- A unique **class** that describes the object you are fine-tuning, e.g., dog, person, etc.
- An **identifier** that comes before the unique class, e.g., sks
- An **instance prompt** that describes the concept you are fine-tuning, e.g., "a photo of sks person"
- A **class prompt** to describe the prompt without the unique identifier, e.g., "a photo of a person"
- **Instance images** representing the unique class, mostly 3-5 images, but more will give better results, especially when fine-tuning on faces
- **Class images** to represent images generated using the class prompt

Class images are used for prior preservation when fine-tuning DreamBooth. 200-300 images are usually sufficient. You can provide these images or generate them using the Stable Diffusion model.

Create some instance images of the concept you'd like to fine-tune, and then let's get going. For example, you can create an `instance_images` folder with 5 images of yourself.

```
from imutils import paths

instance_images_root = 'instance-images'
class_images_root = 'class-images'
instance_image_paths = list(paths.list_images(instance_images_root))
class_image_paths = list(paths.list_images(class_images_root))
```

I downloaded images of the class `fantasy_world`. Here's what they look like:

```
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

def load_images(image_paths):
    images = []
    for path in image_paths:
        image = Image.open(path)
        images.append(np.array(image))
    return images

def plot_images(images, title=None):
    plt.figure(figsize=(20, 20))
    for i in range(len(images)):
        ax = plt.subplot(1, len(images), i + 1)
        if title is not None:
            plt.title(title)
        plt.imshow(images[i])
        plt.axis("off")

plot_images(load_images(instance_image_paths[:5]))
```

Next, import all the libraries and modules needed for this process:

```
import tensorflow as tf
from keras_cv.models.stable_diffusion.clip_tokenizer import SimpleTokenizer
from keras_cv.models.stable_diffusion.diffusion_model import DiffusionModel
from keras_cv.models.stable_diffusion.image_encoder import ImageEncoder
from keras_cv.models.stable_diffusion.noise_scheduler import NoiseScheduler
from keras_cv.models.stable_diffusion.text_encoder import TextEncoder
```

Prepare the datasets in the format expected by the DreamBooth model. Hugging Face has provided the following scripts.

Preparing the captions:

```
# Since we're using prior preservation, we need to match the number
# of instance images we're using. We just repeat the instance image paths
# to do so.
new_instance_image_paths = []
for index in range(len(class_image_paths)):
    instance_image = instance_image_paths[index % len(instance_image_paths)]
    new_instance_image_paths.append(instance_image)

# We just repeat the prompts / captions per image.
unique_id = "sks"
class_label = "fantasy_world"
instance_prompt = f"a photo of {unique_id} {class_label}"
instance_prompts = [instance_prompt] * len(new_instance_image_paths)
class_prompt = f"a photo of {class_label}"
class_prompts = [class_prompt] * len(class_image_paths)

import numpy as np
import itertools

# The padding token and maximum prompt length are specific to the text encoder.
# If you're using a different text encoder, be sure to change them accordingly.
padding_token = 49407
max_prompt_length = 77

# Load the tokenizer.
tokenizer = SimpleTokenizer()

# Method to tokenize and pad the tokens.
def process_text(caption):
    tokens = tokenizer.encode(caption)
    tokens = tokens + [padding_token] * (max_prompt_length - len(tokens))
    return np.array(tokens)

# Collate the tokenized captions into an array.
tokenized_texts = np.empty(
    (len(instance_prompts) + len(class_prompts), max_prompt_length)
)
for i, caption in enumerate(itertools.chain(instance_prompts, class_prompts)):
    tokenized_texts[i] = process_text(caption)

# We also pre-compute the text embeddings to save some memory during training.
POS_IDS = tf.convert_to_tensor([list(range(max_prompt_length))], dtype=tf.int32)
text_encoder = TextEncoder(max_prompt_length)

gpus = tf.config.list_logical_devices("GPU")

# Ensure the computation takes place on a GPU.
with tf.device(gpus[0].name):
    embedded_text = text_encoder(
        [tf.convert_to_tensor(tokenized_texts), POS_IDS], training=False
    ).numpy()

# Free the GPU memory occupied by the text encoder.
del text_encoder
```
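The pad-to-length logic inside `process_text` can be sketched standalone. This is a minimal illustration with made-up token values; 49407 is the CLIP padding token the article assumes:

```python
padding_token = 49407
max_prompt_length = 77

def pad_tokens(tokens, pad=padding_token, length=max_prompt_length):
    # Append the padding token until the sequence reaches the fixed length.
    return tokens + [pad] * (length - len(tokens))

padded = pad_tokens([101, 202, 303])
print(len(padded))  # 77
print(padded[:4])   # [101, 202, 303, 49407]
```

Every caption therefore becomes a fixed-length sequence, which is what lets the tokenized captions stack into one array.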

Preparing the images:

```
import keras_cv

resolution = 512
auto = tf.data.AUTOTUNE

augmenter = keras_cv.layers.Augmenter(
    layers=[
        keras_cv.layers.CenterCrop(resolution, resolution),
        keras_cv.layers.RandomFlip(),
        tf.keras.layers.Rescaling(scale=1.0 / 127.5, offset=-1),
    ]
)

def process_image(image_path, tokenized_text):
    image = tf.io.read_file(image_path)
    image = tf.io.decode_png(image, 3)
    image = tf.image.resize(image, (resolution, resolution))
    return image, tokenized_text

def apply_augmentation(image_batch, embedded_tokens):
    return augmenter(image_batch), embedded_tokens

def prepare_dict(instance_only=True):
    def fn(image_batch, embedded_tokens):
        if instance_only:
            batch_dict = {
                "instance_images": image_batch,
                "instance_embedded_texts": embedded_tokens,
            }
            return batch_dict
        else:
            batch_dict = {
                "class_images": image_batch,
                "class_embedded_texts": embedded_tokens,
            }
            return batch_dict
    return fn

def assemble_dataset(image_paths, embedded_texts, instance_only=True, batch_size=1):
    dataset = tf.data.Dataset.from_tensor_slices((image_paths, embedded_texts))
    dataset = dataset.map(process_image, num_parallel_calls=auto)
    dataset = dataset.shuffle(5, reshuffle_each_iteration=True)
    dataset = dataset.batch(batch_size)
    dataset = dataset.map(apply_augmentation, num_parallel_calls=auto)
    prepare_dict_fn = prepare_dict(instance_only=instance_only)
    dataset = dataset.map(prepare_dict_fn, num_parallel_calls=auto)
    return dataset
```

Assembling the dataset:

```
instance_dataset = assemble_dataset(
    new_instance_image_paths,
    embedded_text[: len(new_instance_image_paths)],
)
class_dataset = assemble_dataset(
    class_image_paths,
    embedded_text[len(new_instance_image_paths):],
    instance_only=False,
)
train_dataset = tf.data.Dataset.zip((instance_dataset, class_dataset))
```
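Conceptually, `tf.data.Dataset.zip` pairs one instance batch with one class batch per training step, much like Python's built-in `zip`. A minimal sketch, with stand-in dictionaries instead of real batches:

```python
# Stand-in "datasets": lists of batch dictionaries (illustrative names only).
instance_batches = [{"instance_images": f"inst_batch_{i}"} for i in range(3)]
class_batches = [{"class_images": f"class_batch_{i}"} for i in range(3)]

# Zipping yields (instance_batch, class_batch) tuples, which is exactly the
# structure the DreamBooth trainer's train_step unpacks.
for inst, cls in zip(instance_batches, class_batches):
    print(inst["instance_images"], cls["class_images"])
```

This pairing is why prior preservation works: every gradient step sees both the subject images and the generic class images.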

Hugging Face provides the DreamBooth training loop; the trainer below is adapted from their `train_dreambooth.py` script. It only fine-tunes the UNet, not the text encoder.

```
import tensorflow.experimental.numpy as tnp

class DreamBoothTrainer(tf.keras.Model):
    # Reference:
    # https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py
    def __init__(
        self,
        diffusion_model,
        vae,
        noise_scheduler,
        use_mixed_precision=False,
        prior_loss_weight=1.0,
        max_grad_norm=1.0,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.diffusion_model = diffusion_model
        self.vae = vae
        self.noise_scheduler = noise_scheduler
        self.prior_loss_weight = prior_loss_weight
        self.max_grad_norm = max_grad_norm
        self.use_mixed_precision = use_mixed_precision
        self.vae.trainable = False

    def train_step(self, inputs):
        instance_batch = inputs[0]
        class_batch = inputs[1]

        instance_images = instance_batch["instance_images"]
        instance_embedded_text = instance_batch["instance_embedded_texts"]
        class_images = class_batch["class_images"]
        class_embedded_text = class_batch["class_embedded_texts"]

        images = tf.concat([instance_images, class_images], 0)
        embedded_texts = tf.concat([instance_embedded_text, class_embedded_text], 0)
        batch_size = tf.shape(images)[0]

        with tf.GradientTape() as tape:
            # Project image into the latent space and sample from it.
            latents = self.sample_from_encoder_outputs(self.vae(images, training=False))
            # Know more about the magic number here:
            # https://keras.io/examples/generative/fine_tune_via_textual_inversion/
            latents = latents * 0.18215

            # Sample noise that we'll add to the latents.
            noise = tf.random.normal(tf.shape(latents))

            # Sample a random timestep for each image.
            timesteps = tnp.random.randint(
                0, self.noise_scheduler.train_timesteps, (batch_size,)
            )

            # Add noise to the latents according to the noise magnitude at each
            # timestep (this is the forward diffusion process).
            noisy_latents = self.noise_scheduler.add_noise(
                tf.cast(latents, noise.dtype), noise, timesteps
            )

            # Get the target for the loss depending on the prediction type --
            # just the sampled noise for now.
            target = noise  # noise_schedule.predict_epsilon == True

            # Predict the noise residual and compute the loss.
            timestep_embedding = tf.map_fn(
                lambda t: self.get_timestep_embedding(t), timesteps, dtype=tf.float32
            )
            model_pred = self.diffusion_model(
                [noisy_latents, timestep_embedding, embedded_texts], training=True
            )
            loss = self.compute_loss(target, model_pred)
            if self.use_mixed_precision:
                loss = self.optimizer.get_scaled_loss(loss)

        # Update parameters of the diffusion model.
        trainable_vars = self.diffusion_model.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        if self.use_mixed_precision:
            gradients = self.optimizer.get_unscaled_gradients(gradients)
        gradients = [tf.clip_by_norm(g, self.max_grad_norm) for g in gradients]
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        return {m.name: m.result() for m in self.metrics}

    def get_timestep_embedding(self, timestep, dim=320, max_period=10000):
        half = dim // 2
        log_max_period = tf.math.log(tf.cast(max_period, tf.float32))
        freqs = tf.math.exp(
            -log_max_period * tf.range(0, half, dtype=tf.float32) / half
        )
        args = tf.convert_to_tensor([timestep], dtype=tf.float32) * freqs
        embedding = tf.concat([tf.math.cos(args), tf.math.sin(args)], 0)
        return embedding

    def sample_from_encoder_outputs(self, outputs):
        mean, logvar = tf.split(outputs, 2, axis=-1)
        logvar = tf.clip_by_value(logvar, -30.0, 20.0)
        std = tf.exp(0.5 * logvar)
        sample = tf.random.normal(tf.shape(mean), dtype=mean.dtype)
        return mean + std * sample

    def compute_loss(self, target, model_pred):
        # Chunk the noise and model_pred into two parts and compute the loss
        # on each part separately. Since the first half of the inputs has
        # instance samples and the second half has class samples, we do the
        # chunking accordingly.
        model_pred, model_pred_prior = tf.split(
            model_pred, num_or_size_splits=2, axis=0
        )
        target, target_prior = tf.split(target, num_or_size_splits=2, axis=0)

        # Compute the instance loss.
        loss = self.compiled_loss(target, model_pred)

        # Compute the prior loss.
        prior_loss = self.compiled_loss(target_prior, model_pred_prior)

        # Add the prior loss to the instance loss.
        loss = loss + self.prior_loss_weight * prior_loss
        return loss

    def save_weights(self, filepath, overwrite=True, save_format=None, options=None):
        # Overriding this method allows us to use the `ModelCheckpoint`
        # callback directly with this trainer class. In this case, it only
        # checkpoints the `diffusion_model`, since that's what we're training
        # during fine-tuning.
        self.diffusion_model.save_weights(
            filepath=filepath,
            overwrite=overwrite,
            save_format=save_format,
            options=options,
        )
```
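The sinusoidal timestep embedding in `get_timestep_embedding` can be checked in isolation. This plain-Python sketch mirrors the TensorFlow version above (same `dim` and `max_period` defaults):

```python
import math

def timestep_embedding(timestep, dim=320, max_period=10000):
    # Plain-Python mirror of get_timestep_embedding: frequencies decay
    # geometrically, and the embedding is [cosines, sines] of timestep * freq.
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    args = [timestep * f for f in freqs]
    return [math.cos(a) for a in args] + [math.sin(a) for a in args]

emb = timestep_embedding(0)
print(len(emb))   # 320
print(emb[0])     # 1.0 (cos(0))
print(emb[160])   # 0.0 (sin(0))
```

Each timestep thus maps to a fixed 320-dimensional vector the UNet can condition on.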

Next, train the model:

```
# Comment this out if your GPU does not have tensor cores.
tf.keras.mixed_precision.set_global_policy("mixed_float16")
use_mp = True  # Set to False if you're not using a GPU with tensor cores.

image_encoder = ImageEncoder(resolution, resolution)
dreambooth_trainer = DreamBoothTrainer(
    diffusion_model=DiffusionModel(resolution, resolution, max_prompt_length),
    # Remove the top layer from the encoder, which cuts off the variance and
    # only returns the mean.
    vae=tf.keras.Model(
        image_encoder.input,
        image_encoder.layers[-2].output,
    ),
    noise_scheduler=NoiseScheduler(),
    use_mixed_precision=use_mp,
)

# These hyperparameters come from this tutorial by Hugging Face:
# https://github.com/huggingface/diffusers/tree/main/examples/dreambooth
lr = 2e-6
beta_1, beta_2 = 0.9, 0.999
weight_decay = 1e-2
epsilon = 1e-08

optimizer = tf.keras.optimizers.experimental.AdamW(
    learning_rate=lr,
    weight_decay=weight_decay,
    beta_1=beta_1,
    beta_2=beta_2,
    epsilon=epsilon,
)
dreambooth_trainer.compile(optimizer=optimizer, loss="mse")

import math

num_update_steps_per_epoch = train_dataset.cardinality()
max_train_steps = 1200
epochs = math.ceil(max_train_steps / num_update_steps_per_epoch)
print(f"Training for {epochs} epochs.")

ckpt_path = "dreambooth-unet.h5"
ckpt_callback = tf.keras.callbacks.ModelCheckpoint(
    ckpt_path,
    save_weights_only=True,
    monitor="loss",
    mode="min",
)
dreambooth_trainer.fit(train_dataset, epochs=epochs, callbacks=[ckpt_callback])
```
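The epoch count above follows directly from the step budget: divide the total training steps by the steps per epoch and round up. For example, with a hypothetical 100 update steps per epoch (the real number comes from your dataset's cardinality):

```python
import math

max_train_steps = 1200
num_update_steps_per_epoch = 100  # hypothetical value for illustration
epochs = math.ceil(max_train_steps / num_update_steps_per_epoch)
print(epochs)  # 12
```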

Once training is complete, it's vital to save the model so you don't have to repeat the lengthy training run.

Host the model on Hugging Face for free:

```
from huggingface_hub import notebook_login
from huggingface_hub import push_to_hub_keras

# Initialize a new Stable Diffusion model.
dreambooth_model = keras_cv.models.StableDiffusion(
    img_width=resolution, img_height=resolution, jit_compile=True
)
dreambooth_model.diffusion_model.load_weights(ckpt_path)

notebook_login()

config = dreambooth_model.diffusion_model.get_config()
repo_id = "mwitiderrick/fantasy_dreambooth_diffusion_model"
push_to_hub_keras(dreambooth_model.diffusion_model, repo_id, config=config)
```

You can generate new images once the model is saved to your Hugging Face account.

```
from huggingface_hub import from_pretrained_keras

sd_dreambooth_model = keras_cv.models.StableDiffusion(
    img_width=resolution, img_height=resolution, jit_compile=True
)
loaded_diffusion_model = from_pretrained_keras(
    "mwitiderrick/fantasy_dreambooth_diffusion_model"
)
sd_dreambooth_model._diffusion_model = loaded_diffusion_model

# Note how the unique identifier and the class have been used in the prompt.
prompt = f"A photo of {unique_id} {class_label}"
num_imgs_to_gen = 3
generated_img = sd_dreambooth_model.text_to_image(
    prompt, batch_size=num_imgs_to_gen, num_steps=100
)
plot_images(generated_img, prompt)
```

Getting good images when fine-tuning people's faces is challenging. Here are some tips for getting good results:

1. **Training steps**

When fine-tuning on a faces dataset, use more training steps such as 800-1200 at a batch size of two and a learning rate of 1e-6 to 2e-6.

2. **Use prior preservation**

Use prior preservation to prevent overfitting on the training faces.

Prior preservation reduces overfitting by training on images of the person together with other images from the class "person". Stable Diffusion itself can generate these class images.

3. **Use more images**

Use 20-25 images of the same person in different angles, postures, and backgrounds.

Don't use images containing multiple persons.

4. **Tune prompts**

Using the right positive and negative prompts will make a world of difference between good-looking and bad-looking images.

Apart from fine-tuning DreamBooth from scratch, you can also use no-code platforms created for this purpose, for example, Leap API.

Training models like Stable Diffusion is 5% of the work. The rest is deployment. Very few people know how to build production-ready machine learning systems because they are difficult to deploy, monitor and maintain.

Check out ML school if you want to build end-to-end ML systems.


VAEs work as follows:

- Map an input into a distribution over the latent space
- Pick a point from the distribution in the latent space
- Decode the sampled point and compute the reconstruction and KL Divergence errors.

The reconstruction error is the same as the one used in standard autoencoders. The KL divergence term measures how far the learned latent distribution drifts from the standard normal prior.

Because the latent space of a VAE is constrained to approximate a normal distribution during training, you can pick a point from the normal distribution, and the network will create a new image based on the training data.

Let's illustrate how to build a VAE model in Keras using the Fruits and Vegetables Image Recognition Dataset.

First, let's get the usual imports out of the way.

```
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from keras import backend as K
from keras.losses import mse
from keras.models import Model
from tensorflow.keras.layers import (
    Input, Dense, Lambda, Conv2D, Flatten, Reshape, Conv2DTranspose,
    BatchNormalization, LeakyReLU, Dropout,
)
```

Next, load the training and validation set.

```
base_dir = '/kaggle/input/fruit-and-vegetable-image-recognition/train'
batch_size = 32
img_size = 128
training_set = tf.keras.utils.image_dataset_from_directory(
base_dir,
label_mode="int",
validation_split=0.02,
subset="training",
seed=100,
image_size=(img_size, img_size),
batch_size=batch_size)
validation_set = tf.keras.utils.image_dataset_from_directory(
base_dir,
validation_split=0.2,
subset="validation",
seed=100,
image_size=(img_size, img_size),
batch_size=batch_size)
```

Since we don't need the target variables, we create training and validation data without them.

```
x_train = np.array([])
for x, y in training_set:
x_train = np.concatenate([x])
x_test = np.array([])
for x, y in validation_set:
x_test = np.concatenate([x])
```

Here's what the dataset looks like.

```
class_names = training_set.class_names
print(class_names)
plt.figure(figsize=(10, 10))
for images, labels in training_set.take(1):
for i in range(9):
ax = plt.subplot(3, 3, i + 1)
plt.imshow(images[i].numpy().astype("uint8"))
plt.title(class_names[labels[i]])
plt.axis("off")
```

Finally, let's normalize the data as required when training deep learning models.

```
# Normalize pixel values between 0 and 1
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
```

The next step is to create the building blocks for the VAE model.


The VAE encoder outputs a mean and variance. As you can see below, it's a normal encoder defined using the Keras Functional API.

`latent_dim` dictates the number of dimensions in the latent space. You can tweak this number to see how it affects the model's performance.

`shape_before_flattening` captures the shape of the tensor `x`, which the decoder network later uses to reshape the flattened tensor back to the original shape of the feature maps.

The output `z_mean` represents the mean of the normal distribution that generates the latent representation `z`, while `z_log_var` represents the log variance of that distribution.

```
# Define input shape and latent dimension
latent_dim = 2
input_shape = (img_size, img_size, 3)
# Encoder network
inputs = Input(shape=input_shape)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(inputs)
x = Conv2D(32, (3, 3), activation='relu', strides=(2, 2), padding='same')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
shape_before_flattening = K.int_shape(x)
x = Flatten()(x)
z_mean = Dense(latent_dim)(x)
z_log_var = Dense(latent_dim)(x)
```

As noted earlier, we need a way to sample from the normal distribution. This is the purpose of the sampling function.

```
# Sampling function
def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim))
    return z_mean + K.exp(z_log_var / 2) * epsilon

# Reparameterization trick
z = Lambda(sampling)([z_mean, z_log_var])
```

Epsilon is sampled randomly from a standard normal distribution. Since it's random, it's not trained; the learned parameters are the mean and the log variance.

The reparameterization trick is what makes gradient computation possible: gradients cannot flow through a stochastic sampling step, so the randomness is moved into epsilon, leaving a deterministic path from the learned parameters to `z`. The Lambda layer wraps the sampling function so it can be used as a Keras layer.
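A minimal numeric sketch of the reparameterization trick, with values chosen purely for illustration: a fixed epsilon is scaled by the learned standard deviation and shifted by the learned mean, so the mapping from parameters to `z` is deterministic.

```python
import math

def reparameterize(z_mean, z_log_var, epsilon):
    # z = mu + sigma * eps, with sigma = exp(log_var / 2).
    return z_mean + math.exp(z_log_var / 2) * epsilon

# With log variance 0 (sigma = 1), z is just the mean shifted by epsilon.
z = reparameterize(1.0, 0.0, 0.5)
print(z)  # 1.5
```

Gradients with respect to `z_mean` and `z_log_var` flow through this expression normally, while the randomness lives entirely in `epsilon`.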

The decoder creates an image from the sampled latent vector. It performs upsampling of the low dimensional latent vector.

```
# Decoder network
decoder_input = Input(K.int_shape(z)[1:])
x = Dense(np.prod(shape_before_flattening[1:]), activation='relu')(decoder_input)
x = Reshape(shape_before_flattening[1:])(x)
x = Conv2DTranspose(128, (2, 2), activation='relu', padding='same')(x)
x = Conv2DTranspose(64, (2, 2), activation='relu', padding='same', strides=(2, 2))(x)
x = Conv2DTranspose(32, (2, 2), activation='relu', padding='same')(x)
x = Conv2DTranspose(16, (2, 2), activation='relu', padding='same')(x)
x = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
```

The decoder's input shape is the shape of the `z` tensor. The input is then passed to a dense layer whose size is a product. Let's examine what it means.

`shape_before_flattening` is the shape of the output tensor from the last convolutional layer in the encoder before flattening: `(batch_size, height, width, channels)`. `shape_before_flattening[1:]` corresponds to the dimensions `(height, width, channels)`. The `Dense` layer in the decoder network takes as input a tensor of shape `(batch_size, num_features)`, where `num_features` is the product of the dimensions `(height, width, channels)` of the last encoder output tensor. Therefore, `np.prod(shape_before_flattening[1:])` computes `num_features`, the number of units the `Dense` layer should output.

Let's take an intuitive example where the output tensor shape of the last convolutional layer in the encoder network is `(None, 8, 8, 64)`, where `None` is the batch size and 8, 8, 64 are the height, width, and number of channels, respectively. The number of features in this tensor is `np.prod(shape_before_flattening[1:])`: the product of all elements in `shape_before_flattening` except the batch size, which is `None`. `np.prod(shape_before_flattening[1:])` is, therefore, the same as:

```
num_features = shape_before_flattening[1] * shape_before_flattening[2] * shape_before_flattening[3]
# num_features = 8 * 8 * 64 = 4096
```

In this case, 4096 becomes the number of units in the dense layer of the decoder.

The tensor is then reshaped into the same shape as the output of the final convolutional layer in the encoder by `Reshape(shape_before_flattening[1:])(x)`. Using the same example, flattening in the encoder yields the shape `(None, 4096)`. The goal of the `Reshape` layer is to recover the 3D feature maps from before flattening, in this case `(8, 8, 64)`, so its output is `(None, 8, 8, 64)`. In other words, the `Reshape` layer unflattens the 1D representation back into 3D feature maps.

The original full-resolution image is obtained through a sequence of `Conv2DTranspose` layers that perform convolution and upsampling at the same time. The aim is a final output tensor of shape `(None, img_size, img_size, 3)`.
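The spatial growth through the decoder is easy to check by hand. With Keras `padding='same'`, a `Conv2DTranspose` layer multiplies the spatial size by its stride, and only the second transpose layer in this decoder uses `strides=(2, 2)`. Assuming `img_size = 128` as in this article, the encoder's last feature map is 64x64, which one stride-2 layer brings back to 128:

```python
def transpose_conv_output_size(input_size, stride):
    # For Conv2DTranspose with padding='same', the output spatial size
    # is input_size * stride.
    return input_size * stride

size = 64  # height of the encoder's last feature map when img_size = 128
for stride in [1, 2, 1, 1]:  # strides of the four Conv2DTranspose layers
    size = transpose_conv_output_size(size, stride)
print(size)  # 128
```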

With all the building blocks in place, the next step is to define the Keras VAE model. Passing an input image to the encoder produces the mean, standard deviation, and a sample from the latent space. The sample is passed to the decoder to obtain an image.

```
# Define the VAE model
encoder = Model(inputs, [z_mean, z_log_var, z], name='encoder')
decoder = Model(decoder_input, x, name='decoder')
outputs = decoder(encoder(inputs)[2])
vae = Model(inputs, outputs, name='vae')
```

`encoder(inputs)` produces the encoder's outputs: `z_mean`, `z_log_var`, and `z`. The decoder expects `z`, the latent representation of the input image; `encoder(inputs)[2]` gives `z` because it's the value at index 2. `z` is then passed to the decoder, producing `outputs`, an approximation of the original input tensor.

To visualize the VAE, you can use `vae.summary()` or `tf.keras.utils.plot_model(vae, "model.png", show_shapes=True)`.

The summary of the encoder:

The summary of the decoder:


The VAE loss function combines the reconstruction loss and the KL Divergence loss.

Let's define the two loss functions and add them to the VAE model.

```
# Define the VAE loss function
reconstruction_loss = mse(K.flatten(inputs), K.flatten(outputs))
reconstruction_loss *= input_shape[0] * input_shape[1] * input_shape[2]
kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=1)
B = 1000
vae_loss = K.mean(B * reconstruction_loss + kl_loss)
vae.add_loss(vae_loss)
```
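The `kl_loss` expression above is the closed-form KL divergence between the learned latent distribution and the standard normal prior. A quick pure-Python check of the per-dimension term:

```python
import math

def kl_term(z_mean, z_log_var):
    # Matches the per-dimension term inside the sum in kl_loss:
    # -0.5 * (1 + log_var - mu^2 - exp(log_var))
    return -0.5 * (1 + z_log_var - z_mean ** 2 - math.exp(z_log_var))

# When the latent distribution already matches the prior (mu=0, log_var=0),
# the divergence is zero; any deviation makes it positive.
print(kl_term(0.0, 0.0))  # 0.0
print(kl_term(1.0, 0.0))  # 0.5
```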

We can add the metrics in the same way.

```
vae.add_metric(kl_loss, name="kl_loss")
vae.add_metric(reconstruction_loss, name="reconstruction_loss")
```

The final step is to compile and train the VAE model.

```
vae.compile(optimizer='adam')
vae.fit(x_train, epochs=500, batch_size=batch_size, validation_data=(x_test, None))
```

Next, run some predictions using the test images.

```
import matplotlib.pyplot as plt

# Reconstruct the test images.
decoded_imgs = vae.predict(x_test)

# Display the original and reconstructed images.
n = 10  # number of images to display
plt.figure(figsize=(20, 4))
for i in range(n):
    # Display the original image.
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(img_size, img_size, 3))
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display the reconstructed image.
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(img_size, img_size, 3))
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
```

The network can generate new images from the test images.

You can also try feeding random noise images through the trained VAE to see what the network generates from them.

```
import matplotlib.pyplot as plt

# Generate random noise images (uniform noise) and pass them through the VAE.
num_samples = 10
random_noise_images = np.random.random((num_samples, img_size, img_size, 3))
decoded_imgs = vae.predict(random_noise_images)

# Display the noise inputs and the generated outputs.
n = 10  # number of images to display
plt.figure(figsize=(20, 4))
for i in range(n):
    # Display the noise input.
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(random_noise_images[i].reshape(img_size, img_size, 3))
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display the generated image.
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(img_size, img_size, 3))
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
```

Try tweaking the network parameters to see if it can generate different images from pure noise.


In this article, you have learned how to create a Variational AutoEncoder in Keras and generate images from pure noise. Check out the Kaggle notebook to play with the code and the references to dive deeper into the topic.

Auto-Encoding Variational Bayes

An Introduction to Variational Autoencoders

**Whenever you're ready, there are 2 ways I can help you:**

If you're looking for a way to build a career while writing about data science and machine learning, I'd recommend starting with an affordable ebook:

**→** **Writing for Data Scientists:** The exact path I followed to get technical work that pays between $250-$500 from machine learning companies such as Comet, Neptune, cnvrg, Paperspace, Layer, Neural Magic, Determined, Activeloop, and many more. Get your copy.

**→** **Data Science and Machine Learning Ebook:** I offer numerous free and paid data science and machine learning ebooks to help you in your data science career. Check them out.

The data you will use is provided by Hugging Face for the AI or Not image classification competition. You can follow along with this Kaggle Notebook. Select the accelerator option with 2 GPUs.

The first step is to load the data from the directory containing the images.

```
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing import image_dataset_from_directory

size = 224
training_set = image_dataset_from_directory(
    "/kaggle/input/aidata/train",
    shuffle=True,
    batch_size=32,
    image_size=(size, size),
)
val_dataset = image_dataset_from_directory(
    "/kaggle/input/aidata/val",
    shuffle=True,
    batch_size=32,
    image_size=(size, size),
)
```

Interested in learning more about image classification with TensorFlow? Check out our How to build CNN in TensorFlow tutorial.

Image augmentation helps improve the model's performance by exposing it to images at various angles and aspect ratios.

Let's perform some basic image augmentation.

```
data_augmentation = keras.Sequential(
    [
        keras.layers.RandomFlip("horizontal_and_vertical"),
        keras.layers.RandomRotation(0.2),
    ]
)
```

Visualize the images using Matplotlib based on the augmentations defined above.

```
import numpy as np
import matplotlib.pyplot as plt

for images, labels in training_set.take(1):
    plt.figure(figsize=(12, 12))
    first_image = images[0]
    for i in range(12):
        ax = plt.subplot(3, 4, i + 1)
        augmented_image = data_augmentation(tf.expand_dims(first_image, 0))
        plt.imshow(augmented_image[0].numpy().astype("int32"))
        plt.axis("off")
```

TensorFlow provides various strategies for distributed training. One of them is `MirroredStrategy`, which enables distributed training on multiple GPUs on a single machine. It creates one replica per GPU and mirrors all model variables across the replicas; together, the mirrored copies form a single conceptual variable called a `MirroredVariable`.

Apply an `EfficientNetV2M` via transfer learning. To train the model with the mirrored strategy, create a `mirrored_strategy.scope()` and define the model within that scope.

Apart from the model, the metrics and optimizer must also be defined within the scope. Creating the variables there makes them distributed variables.

After defining everything within the `mirrored_strategy` scope, train the model as usual. `MirroredStrategy` will run the training on all available GPUs, or on the ones you specify manually.

```
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    base_model = tf.keras.applications.EfficientNetV2M(
        weights='imagenet',
        input_shape=(size, size, 3),
        include_top=False,
    )
    base_model.trainable = False

    inputs = keras.Input(shape=(size, size, 3))
    x = data_augmentation(inputs)
    x = tf.keras.applications.efficientnet_v2.preprocess_input(x)
    x = base_model(x, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)
    x = keras.layers.Dropout(0.2)(x)
    outputs = keras.layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)

    accuracy = keras.metrics.BinaryAccuracy()
    optimizer = tf.keras.optimizers.Adam()
    model.compile(
        optimizer=optimizer,
        loss=tf.keras.losses.BinaryCrossentropy(),
        metrics=[accuracy],
    )

model.fit(training_set, epochs=10, validation_data=val_dataset)
```

Notice that the two GPUs on Kaggle are being used.

There are other training strategies you can try apart from the mirrored strategy; they include:

- `TPUStrategy` for training on TPUs
- `MultiWorkerMirroredStrategy` for distributed training across multiple workers
- `ParameterServerStrategy`, a data-parallel method for training on multiple machines
- `CentralStorageStrategy`, which performs synchronous training without mirroring variables, placing them on the CPU instead


I have created technical content for various companies over the last 5 years.

Educating developers is how technology companies grow their communities. Developers hate being sold to, so education is the best way to get them to use a company's product. The product should solve a problem for the developer to convince them to try it.

Companies also need content to create awareness about the company and its offering. At some point, the developer may be in a role where they need the enterprise version of the company's offering, and guess which product they will recommend: the one they have been using. This is where technical writing comes in.

A technical writing skill is a massive asset.

I have written over 200 articles in the last 5 years, while being paid $250 to $500 for each of them. Here are the 5 simple steps to get your first paid article, even if you have never written a single blog post before.

Join the newsletter to receive technical writing tips in your inbox.

A **technical writer** is a person who educates developers by simplifying various technical concepts. The aim is to educate developers on various topics through technical documentation, blog posts, guides, and tutorials. This content helps establish the company as an expert in their domain.

Technical writers are not the only people who write technical content. Other roles that involve writing technical content include:

- Developer relations
- Developer advocate
- Technical documentation manager
- Developer educator

The current demand for technical content makes technical writing a valuable skill in your repertoire. Whether you want to build a career in a technical writing role or not, writing is an important skill for various reasons:

- Gain a thorough understanding of the technical concepts
- Help developers who are a few steps behind you understand the technical concepts
- Build a personal brand in the topics you are writing about
- Be part of a community where you can also get assistance

Why should you write technical blog posts?

- You gain a better understanding: when you teach something, you understand it better.
- It forces you to pay more attention to details so you can explain them clearly to others.

Writing also prepares you for other opportunities in the future, for instance, authoring books. With a library of content, creating a digital product in the future becomes easier than starting from scratch. You are also practicing for potential job opportunities in the future.

Join the newsletter to receive the technical deep dives in your inbox.

In 2022, I wrote two books; it felt exhilarating to hold the printed versions.

Whether you want to build a technical writing career or write on the side, there are critical skills that you have to master:

- Coming up with ideas for technical content
- Researching the topics from various sources, such as books
- Tactics and techniques for promoting your work
- Formatting your work to make it skimmable
- SEO tactics that will enable your posts to rank on Google
- Ability to pitch companies to get technical writing work


A technical article is made up of 3 key parts:

- Introduction
- Body
- Conclusion

Let's explore some ideas for creating great intros and conclusions.

Each technical article you write will solve a specific problem for the reader. Here's a framework I use for writing introductions.

Let's look at it through the lens of the Accelerate Hugging Face Inference Endpoints with DeepSparse article.

**What is the problem?** Handling infrastructure

**Who is facing the problem?** Technical experts

**What is the consequence of this problem?** Long deployment times that impact the iteration rate on your model's journey to production

**What is the solution?** Hugging Face 🤗 Inference Endpoints

**Why should you use this solution?** Generates an inference endpoint in minutes

**Why is the solution better than the alternatives?** You don't have to manage servers

**What the article will cover.** How to quickly deploy a sentiment analysis pipeline as a Hugging Face Inference Endpoint

**Expected results after using this solution.** Implied above

I would then divide these points into 3 paragraphs:

**Paragraph 1:** The problem. Who is facing the problem? What is the consequence of this problem?

**Paragraph 2:** The solution. Why is the solution better than the alternatives?

**Paragraph 3:** What will this article cover? Expected results after using this solution.

The body is where you cover the meat of the article. Use headings and subheadings to separate different ideas. If the article is a how-to guide, you can use headings that look like this:

- Step 1: Installing Git LFS
- Step 2: Installing the package
- Step 3: Downloading Deployment Files from the SparseZoo
- Step 4: Adding DeepSparse Pipeline to the Endpoint Handler

The conclusion covers the summary of the article with a call to action.

Here is a framework for crafting a great outro from the same article used above:

**What was covered in the article?** Showed how easy it is to set up an HTTP endpoint using the Hugging Face Inference Endpoints platform with DeepSparse

**Advantages of using the solution proposed in the article.** Dramatic performance improvement and cost reduction

**Call to action and next steps.** View GitHub and join the Slack community


I get messages from people about what they should write about. When starting, writing about what you are currently doing will be easier. For instance, if you are taking a course on Pandas, you can write articles about that. However, since these are technical concepts, you'll have to spend time learning a topic if you are not already familiar with it. Here are some more approaches for finding and validating topics.

You might already have content to write about without knowing it. Look at the projects you are currently working on or projects you have done in the past. The beauty of this is that the content is ready, and the only thing left is to document it.

In the spirit of doubling the audacity and putting into practice the advice from @themwiti, 🙏 I just published my first article 'Fake News Detection Project' Using Machine Learning. Here is the link https://t.co/ltlchdfvSh @_jumaallan 🙏🙏

— Grace Musungu (@Musungutt) January 24, 2023

Other technical blogs can also inspire what to write. For instance, look at other web development or data science blogs. If you find an interesting topic, you can research it and write about it. If you use content from that blog, you should reference it.

You can also get ideas from technical books you have read or are currently reading. Mentioning the author when promoting the content can also help you reach a wider audience.

Writing something unique about a popular topic leads to many reads on your articles. Reddit is one great place for sourcing popular technical topics. Visit the subreddit related to the topic you want to write about and filter for top posts.

Writing about a popular topic can get your content trending, especially if you approach it from a different angle. For instance, ChatGPT is a hot topic at the moment.


When you are done writing, let the words sit for a few minutes, then come back to editing with a fresh perspective. Don't try to edit as you write; it will slow you down. Write to the end and treat that as a first draft. Next, edit that draft to make it better.

Here is what my current editing checklist looks like.

Use Grammarly to check for grammar and typos.

Check the article against the provided style guide, for example:

- Capitalize words in the title and subheadings
- Start heading and subheading with action verbs
- Start list items with a word that ends with *s* or *ing* whenever possible
- Proper abbreviation formatting, e.g., regions of interest (RoIs)
- No full stop at the end of list items

Some words in a sentence can be removed, and the sentence will still be understandable. For example:

- ~~You will be able to~~ / You can
- ~~You have to install the package~~ / Install the package
- ~~You should be sure~~ / Ensure
- ~~Which is~~
- ~~That is~~
- ~~will~~
- ~~Came up with~~ / Developed
- ~~You can use~~ / Use
- ~~Enables the generation~~ / Generates

Maintain consistency in naming. For example, if you start with ResNet50, use it throughout the post. Don't switch between ResNet-50, Res-Net50, etc.

Write in active voice using the first person “You”.

Sometimes a statement can be made less wordy by removing *of* and *in*. For instance:

- ~~reduces the cost of production~~ / reduces production cost
- ~~ensuring optimal usage of resource~~ / ensuring optimal resource usage

Check for proper in-line styling:

- Bold important words or phrases
- Use monospace font for in-line `code`

Delete or rewrite repeated information.

Use short paragraphs (3-5 sentences) to make the content skimmable.

Use descriptive link text:

Bad: To learn more about Keras, check this blog.

Bad: To learn more about Keras, click here.

Recommended: To learn more about Keras, check the Quickstart guide.



Technical writing principles can accelerate your career as a technical writer. Here are some principles to keep in mind:

**Don't start a blog**. When starting, you need to publish in places where you can be found. A personal blog is not that place. Instead, publish in places that already have readers, such as Medium.com. Apart from the fact that publications with a reader base will recommend your articles to their readers, you also stand a better chance of being found by people looking for writers.

**Don't be salesy**. Developers can smell sales copy from a mile away. Don't try to sell developers on the company's offerings. Instead, teach them and let them decide whether they will use the product.

**Go deep**. Don't skim over technical concepts. Explain them so that the reader understands them in depth.

**Visualization is king**. Show, don't tell. Use GIFs, screenshots, and videos to explain complex topics. It is much easier to understand a technical topic after looking at a visual illustration.

**Treat your ideas like cattle, not pets**. Don't be attached to your article ideas. If you are attached, you will be disappointed when a certain article doesn't perform as well as you thought. Instead, focus on churning out as much high-quality content as possible.

**Track views and repeat well-performing ideas**. Track how your articles perform and double down on what works. For instance, you can create more MLOps content if you realize such articles get more readership.

**Use plain English**. Avoid jargon. Write at the level of an 8th grader; no one wants to spend time googling words when reading an article.

**Provide all assets**. Developers like trying things for themselves. Provide all the assets they need to reproduce what they are learning.

**Write for a specific developer**. Know your developer. Writing for a specific audience dictates how you'll write the articles. For example, if you are writing for TensorFlow beginners, you have to explain the process of loading data, something you don't have to do when writing for advanced users.

**Make your content skimmable**. Avoid large walls of text. Write in short paragraphs that make it easy for the reader to skim and find what they are looking for. You can also make the content skimmable by writing clear headings and subheadings.

**Hit publish**. Your first pieces of content won't be the ones you are most proud of. Don't obsess over numerous editing iterations. Hit publish; you can always come back and edit the content in the future.

**Consistency breeds consistency**. The more you write, the more you write. Don't give up after publishing the first article. Keep writing if you want to keep writing.

**Fact check**. Since you are writing technical content, check that it's technically factual. You can do this by reading research papers or talking to experts on the topic.

**Use transitions**. Keep your content engaging by adding transitions from one section to the next.

**Check your grammar**. With free grammar tools such as Grammarly, it has never been easier to fix grammar issues in your writing. Articles with numerous grammatical issues and typos are hard to read, making this an important item to review in your technical articles.

Technical writing can lead to numerous opportunities, including getting full-time or freelance roles.

Opportunities you can get as a result of technical writing:

💡 Get paid to write

💡 Get technical jobs

💡 Speaking engagements

💡 Podcast appearances

💡 Join technical programs such as GDE

— Derrick Mwiti (@themwiti) January 27, 2023

I am convinced you will get technical writing and other technical jobs faster if you consistently create content. Your content is like a silent resume on the internet. The content will work for you even when you are asleep.

Single piece of advice I give to anyone trying to penetrate the data science and machine learning space is “Start writing.”

— Derrick Mwiti (@themwiti) January 19, 2023

It seems intimidating, but you get the hang of it as you keep writing. The biggest hurdle is starting. Once you get past that first article, you are set.

Before you start applying for technical writing jobs, I recommend you first take these 5 steps:

1. Pick 3-10 technical topics
2. Read and research these topics
3. Create 3-10 high-quality writing samples
4. Get an experienced writer to review them to ensure high quality
5. Submit your work to publications with numerous readers on and off Medium

Link your LinkedIn profile in every article you write and wait for the LinkedIn DMs asking whether you can create content for companies.


You don't need any fancy equipment to create technical content. However, some tools can improve your writing experience and help you create engaging content.

Here are the tools I use:

- **Grammarly** for checking typos and grammar issues
- **TinyWow** or **ezgif.com** to create GIFs from videos
- **Canva** for creating book covers and article feature images
- **Xnapper** for taking beautiful screenshots
- **DaVinci Resolve** to edit videos
- **Affinity Designer** and **Affinity Photo** for simple designs
- **Google Docs** for writing
- **Code Blocks** add-on for code formatting
- **Ghost** for writing a personal blog

Starting a blog is not advisable if you are just beginning your technical writing journey. Instead, write in publications with a large following to get your name seen. These publications also promote your content to their readers. Some great examples are:

- KDnuggets
- Hashnode
- Towards Data Science
- Dev.to
- Freecodecamp
- GeekCulture on Medium

Start by writing on a personal Medium account because you are unlikely to get accepted into these publications without a few writing samples.

To become a prolific technical writer, you must craft a routine for ideating, researching, and creating content. I suggest that you use the strategy below.

Technical Writing Routine

— Derrick Mwiti (@themwiti) January 9, 2023

**Whenever you're ready, there are 2 ways I can help you:**

If you're looking for a way to build a career while writing about data science and machine learning, I'd recommend starting with an affordable ebook:

**→** **Writing for Data Scientists:** The exact path I followed to get technical work that pays between $250-$500 from machine learning companies such as Comet, Neptune, cnvrg, Paperspace, Layer, Neural Magic, Determined, Activeloop, and many more. Get your copy.

**→ Data Science and Machine Learning Ebook**: I offer numerous free and paid data science and machine learning ebooks to help you in your data science career. Check them out.

I earned $300 for my first paid data science and machine learning article. I get paid between $250 and $500 for each data science article I write. In this ebook, I'll show you how you, too, can earn while writing about data science and machine learning.

- You have been learning about data science and machine learning
- You want to start writing to build a personal brand in data science
- You have or are yet to put out at least one piece of content
- You have no idea where to start
- You don't know which topics to write about
- You are wondering which publications you should submit your work to
- Apart from building a brand, you also want to be paid to write
- You want to create data science and machine learning content as a freelancer, contractor, or in a full-time role
- Maybe you are already blogging about data science but wondering where and how to get paid jobs
- Or how to monetize your data science and machine learning content

If you identify with any of the above, keep reading 👇

**What You'll Get**

You will get an ebook that shows you how to move from writing free data science articles to getting paid up to $500 per article. The ebook covers everything you need to know about writing in the data science and machine learning space from my 5+ years of experience.

**Topics:**

**✍️ How to write about data science:** We'll start with why you should consider writing. When you know how writing about data science and machine learning can change your life, you will be more motivated to dive in. Next, I'll show you how to write about data science and machine learning. What should you write about? You must create more writing samples to keep growing and getting more jobs. I'll show you ways of generating and validating data science article ideas.

**🙅‍♂️ How to deal with rejection when you start writing:** You will get some rejections when looking for data science writing jobs. How do you get past them until you get a job that pays? In this chapter, I explore some strategies that have worked for me.

**💸 How to make a full-time income writing about data science:** Can you make a full-time income writing about data science and machine learning? In this chapter, we do some math to see how to make it possible. I'll also show you how to monetize your data science content apart from being paid to write.

**🌎 SEO for data science writers:** Getting your data science and machine learning articles seen is crucial. Understand the techniques I have used to get my articles on the first page of Google.

**🗓 Where to find writing jobs:** Where are the machine learning and data science writing jobs? In this chapter, I share strategies that have helped me get a consistent flow of data science writing work. Understand how you can leverage your data science articles to get full-time roles such as developer advocate or developer relations.

**🏆 Promoting your work:** You can't rely only on SEO to promote your data science articles. In this chapter, I share the techniques and channels I use to share my data science and machine learning blog posts.

**📗 Writing templates:** In this section, I share templates that have worked for me, including Upwork sample proposals, LinkedIn notes and messages, and email templates. I also walk through real examples of writing a data science topic, from getting a topic, researching, creating the code, writing the outline, and drafting the article.

**Praise for Derrick's work**

Here are some messages I have received on LinkedIn. Imagine how you will feel when you start getting positive feedback on your writing. Let me teach you how!

"Nice article here! I'm the main contributor to the Neuraxle library." **Software Senior Principal Developer**

"Hi Derrick, I run into your useful post about data science learning paths, a fascinating topic I am trying to grasp and learn at the moment. I would like to have you part of my LinkedIn network and follow your posts. Thank you and have a great day" **IT HR Roadmap Lead**

"love your article on neptune about tensorboard! we are trying to integrate tensorboard with kubeflow and your tutorial was a great help" **Machine Learning Engineer**

"Great articles and content! Cheers" **Associate Manager, Statistical Genetics**

"Howdy! I don't think I have met anyone else who is a data scientist and a writer. It's nice to meet you. Have a spiffy day" **Data Scientist**

"Hey Derrick, I have just started exploring technical writing. Your expertise in the field of writing can really help me a lot. I would be more than grateful to have a productive conversation with you." **Senior Engineer**

"Dear Derrick. Thank you very much for your article on Python decorators, it was very useful for me. It is very well explained and the examples displayed show the concepts with simplicity. If it's ok with you I'd like to share this comment in Linkedin as well. Once again thank you" **Network Architect**

"I have read some of your articles on neptune and I've learned a lot from them. Writing is something that has been quite tricky for me, and I was wondering if you could mentor me, also since I am about starting a career in AI" **Research Intern**

"Hi Derrick, I found your article about Random Forest Regression on Neptune AI and found it very useful - thanks!" **Data Scientist**

"I found on the web your article that predicts employee turnover — it is brilliantly written. Very clear and easy to understand. Thank you for writing it" **Senior Human Resources Analyst**

"Hi Derrick, I found your articles really insightful and informative. I am also a freelance writer for neptune." **Python Developer**

"Hi Derrick, I read your Model Distillation blog post on Medium and found it really interesting. Thanks for sharing! Would love to connect." **Staff Machine Learning Engineer**

"Hi Derrick, I read your article in 'Towards Data Science' on blogging about data science. It was really helpful, ignited a spark in me to start blogging and contribute to this community. I'll be glad to connect with you and read more of your writings!" **Data Science Specialist**

"Saw an article that you wrote on Medium and wanted to connect to follow what you are doing with Data Science now and in the future. Thanks!" **Senior Software Engineer**

"Hi Derrick, I enjoyed your article on Neptune on Image Segmentation tips from Kaggle ! Would like to keep in touch" **Associate Director of Applied Machine Learning**

"You picked a good topic with your article "Best of Machine Learning in 2019: Reddit Edition". I want to add you to my LinkedIn network." **Financial Services Analyst**

"Hello Derrick,Came across your writing on towards data science. I'd like to connect with you to learn and be in your network. " **Data Scientist**

"Hi Derrick--read your blog on activation functions and would be grateful to connect to you." **Chief Data Scientist**

"Hello Derick, I am a junior data scientist, I would like to connect with relevant people. BTW, I came across you when I read your column in Medium about "blogging about Data Science" which is really helpful. Nice to meet you!" **Aeronautical QA Engineer**

"Thank you for a great article on the style transfer. I have some ideas regarding the practical application of this technology." **Senior Software Engineer**

"Derrick, great piece on predicting Employee Retention Using Keras and TensorFlow. I'm learning Python and having a lot of fun with this!" **Director of People Strategy & Analytics**

"Hi Derrick Was reading your articles on medium , really loved them . would like to connect" **Head Of AI & Innovations**

"Hi Derrick I read your lstm writeup on KD website. Very clear and one of the easiest to understand!" **Equity analyst**

"Thanks for the connect. I learnt a great deal from one of your articles on Medium, on Recommender Systems. Great read!" **Data Scientist**

"I found this article is so AMAZING, I think that is of great value to our Chinese readers, may I ask would you mind if I translate it into Chinese and reach our readers? Of course, the Chinese edition will add the URL and title of the original, if our readers need, then can back here by clicking the link!" **Editor of InfoQ China**

**Book Sample**

Here's a __sample from the book__. No email address necessary.

**30-Day Money-Back Guarantee**

If you do not like the book for any reason, you can request a full refund within 30 days of your purchase. No questions asked.
