How to Detect AI Generated Content With TensorFlow

Derrick Mwiti

With the plethora of open-source language models, it's incredibly difficult to determine whether a piece of text is AI generated. However, with a good dataset, you can train a model in TensorFlow to detect whether text was generated by a large language model. It's such an interesting problem that there is even a Kaggle competition dedicated to solving it.

In this blog post, we'll take a stab at solving this problem using TensorFlow.

Getting Started

We kick off by importing all the required modules:

  • Pandas to load the dataset
  • NumPy's array to convert the labels to a NumPy array
  • Scikit-learn's train_test_split to split the data
  • Matplotlib to plot the training and validation charts
  • TensorFlow utilities for building the network
from numpy import array
from sklearn.model_selection import train_test_split 

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM, Bidirectional

Load the Data

To train the model, we use the DAIGT V2 Train Dataset. We also load the test essays and sample submission from the Kaggle competition. Load the data with Pandas and drop rows with duplicate text from the training set.

test = pd.read_csv('/kaggle/input/llm-detect-ai-generated-text/test_essays.csv')
sub = pd.read_csv('/kaggle/input/llm-detect-ai-generated-text/sample_submission.csv')
org_train = pd.read_csv('/kaggle/input/llm-detect-ai-generated-text/train_essays.csv')

train = pd.read_csv("/kaggle/input/daigt-v2-train-dataset/train_v2_drcat_02.csv", sep=',')
train = train.drop_duplicates(subset=['text'])

You can display part of the training set with train.head().


For this exercise, we are interested in the text and label columns.

Split the Data

Next, split the data into a training and validation set using Scikit-learn. We'll use 80% for training and 20% for validation.

docs = train['text']
labels = array(train['label'])
X_train, X_test, y_train, y_test = train_test_split(docs, labels, test_size=0.2, random_state=0)
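
When the classes are not perfectly balanced, it can help to keep the same class ratio in both splits. Here is a minimal sketch of that idea using the stratify argument of train_test_split. The toy DataFrame is made up for illustration and is not the DAIGT dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the real data: 10 human-written (0) and 5 AI-generated (1) essays.
toy = pd.DataFrame({
    "text": [f"essay {i}" for i in range(15)],
    "label": [0] * 10 + [1] * 5,
})

# stratify keeps the 2:1 class ratio in both the training and validation parts.
toy_train, toy_val = train_test_split(
    toy, test_size=0.2, random_state=0, stratify=toy["label"]
)

print(len(toy_train), len(toy_val))
print(toy_val["label"].value_counts().to_dict())
```

In the real workflow you would pass stratify=labels alongside docs and labels.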

Text Vectorization in TensorFlow

Deep learning models don't understand raw text. We, therefore, have to convert the text to a numerical representation. In TensorFlow, this is done using the TextVectorization layer. Given text, the layer will create a sequence of integers. Some of the arguments you can pass to the layer are:

  • standardize to apply specific standardizations to the text, for example, lower_and_strip_punctuation will lowercase all the text and remove punctuation.
  • max_tokens to determine the vocabulary size.
  • output_mode dictates the output of the layer, for example, int will output integers.
  • output_sequence_length ensures that the text is padded or truncated to the maximum sequence length.
max_features = 150000  # Maximum vocab size.
batch_size = 32
max_len = 300 # Sequence length to pad the outputs to.
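vectorize_layer = tf.keras.layers.TextVectorization(standardize='lower_and_strip_punctuation',
                                                    max_tokens=max_features,
                                                    output_mode='int',
                                                    output_sequence_length=max_len)
# Build the vocabulary from the training text only.
vectorize_layer.adapt(X_train.to_numpy())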

vocab_size = vectorize_layer.vocabulary_size()
X_train_padded =  vectorize_layer(X_train)
X_test_padded =  vectorize_layer(X_test)
test_data = vectorize_layer(test['text'])
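
To get a feel for what the layer produces, here is a small sketch on a two-sentence corpus. The corpus, the tiny max_tokens value, and the toy_ names are all made up for illustration.

```python
import tensorflow as tf

# Tiny illustrative corpus (not from the dataset).
corpus = tf.constant(["The cat sat.", "The dog sat down!"])

toy_vectorizer = tf.keras.layers.TextVectorization(
    standardize="lower_and_strip_punctuation",
    max_tokens=10,
    output_mode="int",
    output_sequence_length=6,
)
toy_vectorizer.adapt(corpus)  # builds the vocabulary from the corpus

toy_vocab = toy_vectorizer.get_vocabulary()
toy_ids = toy_vectorizer(corpus)

print(toy_vocab)        # index 0 is the padding token, index 1 the out-of-vocabulary token
print(toy_ids.numpy())  # every row is padded or truncated to 6 integers
```

The first sentence has only three words, so its row ends with three zeros, which is the padding index.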

Create TensorFlow Dataset

Next, create a TensorFlow dataset and create batches. Setting up a TensorFlow dataset allows you to configure further data settings such as prefetching the data with an automatic buffer size.

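training_data = tf.data.Dataset.from_tensor_slices((X_train_padded, y_train))
validation_data = tf.data.Dataset.from_tensor_slices((X_test_padded, y_test))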
training_data = training_data.batch(batch_size)
validation_data = validation_data.batch(batch_size)
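
The prefetching mentioned above could look like the following sketch. The toy arrays stand in for the padded sequences and labels, and the shuffle step is an optional extra that is not part of the pipeline above.

```python
import numpy as np
import tensorflow as tf

# Illustrative stand-ins for the padded sequences and their labels.
toy_features = np.arange(40, dtype="int64").reshape(10, 4)
toy_labels = np.array([0, 1] * 5)

toy_ds = (
    tf.data.Dataset.from_tensor_slices((toy_features, toy_labels))
    .shuffle(buffer_size=10, seed=0)  # shuffle examples before batching
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)       # let tf.data pick the prefetch buffer size
)

for features, labels in toy_ds.take(1):
    print(features.shape, labels.shape)
```

With 10 examples and a batch size of 4, the dataset yields two full batches and one final batch of 2.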

Setup Pretrained Word Embeddings

A word embedding is a representation of text data in a vector space such that similar words appear close to each other. In this case, words that are more likely to be generated by an LLM may be close to each other. In TensorFlow, we can use the Embedding layer to achieve that. You can either train one from scratch or use a pre-trained one. In this case, we will do the latter.

We will use the pre-trained GloVe embeddings to initialize the Embedding layer. The process starts with loading the embeddings into an embeddings_index dictionary that maps each word to its vector.

embeddings_index = {}
with open('/kaggle/input/glove7b/glove.6B.300d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]  # The first token on each line is the word.
        coefs = np.asarray(values[1:], dtype='float32')  # The rest is its vector.
        embeddings_index[word] = coefs

print('Found %s word vectors.' % len(embeddings_index))

For example, here is part of the vector representation of the word word:

The next step is to create an embedding matrix by looking at every word in our vocabulary and fetching its embedding vector from the embeddings_index dictionary. If a word can't be found, it is represented by zeros.

vocabulary = vectorize_layer.get_vocabulary()
embedding_dim = 300  # Dimension of the glove.6B.300d vectors.
embedding_matrix = np.zeros((len(vocabulary) + 1, embedding_dim))
for i, word in enumerate(vocabulary):
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # Words not found in the embedding index keep their all-zeros row.
        embedding_matrix[i] = embedding_vector
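
The same two steps can be tried out on a tiny scale without downloading GloVe. In this sketch the three 4-dimensional vectors and the toy vocabulary are made up. Only the "word followed by numbers" line format mirrors the real file.

```python
import io
import numpy as np

# Three made-up 4-dimensional vectors in GloVe's "word v1 v2 ..." text format.
fake_glove = io.StringIO(
    "the 0.1 0.2 0.3 0.4\n"
    "cat 0.5 0.6 0.7 0.8\n"
    "sat 0.9 1.0 1.1 1.2\n"
)

toy_index = {}
for line in fake_glove:
    word, *coefs = line.split()
    toy_index[word] = np.asarray(coefs, dtype="float32")

# Hypothetical vocabulary in TextVectorization order: padding, OOV, then words.
toy_vocabulary = ["", "[UNK]", "the", "cat", "zebra"]
toy_dim = 4

toy_matrix = np.zeros((len(toy_vocabulary), toy_dim), dtype="float32")
for i, word in enumerate(toy_vocabulary):
    vector = toy_index.get(word)
    if vector is not None:
        toy_matrix[i] = vector  # row i corresponds to token id i

print(toy_matrix)
```

The rows for the padding token, the out-of-vocabulary token, and "zebra" stay at zero because none of them appear in the toy embedding file.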

Create TensorFlow Model

To create the network, we start by creating an embedding layer from the computed embedding matrix. We do this by initializing the Embedding layer and setting the weights to the embedding matrix. Setting trainable to False ensures that the layer is not trained again.

The first argument, input_dim, is set to vocab_size + 1, representing the size of the vocabulary. The second argument, output_dim, is the dimension of the dense embedding. input_length is the length of the input sequences.

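embedding_layer = Embedding(vocab_size + 1,
                            embedding_matrix.shape[1],  # 300 for glove.6B.300d
                            weights=[embedding_matrix],
                            input_length=max_len,
                            trainable=False)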
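
Depending on your Keras version, the weights and input_length arguments may be handled differently. As an alternative, here is a small sketch that loads fixed vectors through embeddings_initializer instead. The 3 x 2 toy_weights matrix is made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Made-up embedding matrix: 3 tokens, 2 dimensions (row index = token id).
toy_weights = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]], dtype="float32")

frozen = tf.keras.layers.Embedding(
    input_dim=toy_weights.shape[0],
    output_dim=toy_weights.shape[1],
    embeddings_initializer=tf.keras.initializers.Constant(toy_weights),
    trainable=False,  # keep the pretrained vectors fixed during training
)

# Looking up token ids 2, 1, 0 returns the matching rows of the matrix.
looked_up = frozen(tf.constant([[2, 1, 0]]))
print(looked_up.shape)
print(len(frozen.trainable_weights))
```

Because the layer is frozen, it contributes no trainable weights, so the optimizer never updates the pretrained vectors.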

Define the model using the Keras Sequential API:

model = Sequential([
    embedding_layer,
    Bidirectional(LSTM(256, return_sequences=True)),
    Bidirectional(LSTM(128, return_sequences=True)),
    # The last recurrent layer returns only its final output, so the
    # network produces a single prediction per essay.
    Bidirectional(LSTM(64)),
    Dense(300, activation='relu'),
    Dense(150, activation='relu'),
    Dense(75, activation='relu'),
    Dense(24, activation='relu'),
    Dense(1, activation='sigmoid')
])

The network uses bidirectional LSTM layers so that information flows through each sequence in both directions. Since it's a binary classification problem, the final layer has 1 unit and uses the sigmoid activation function.
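
To see what the bidirectional wrapper does to tensor shapes, here is a small sketch on random data. The layer sizes and the toy_batch tensor are arbitrary and only serve the illustration.

```python
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Bidirectional

# Random batch: 2 sequences, 5 time steps, 3 features per step.
toy_batch = tf.random.normal((2, 5, 3))

# With return_sequences=True the layer emits one vector per time step.
per_step = Bidirectional(LSTM(4, return_sequences=True))(toy_batch)
# Without it, the layer emits a single vector per sequence.
summary = Bidirectional(LSTM(4))(toy_batch)

print(per_step.shape)  # last dimension is 8: 4 forward units + 4 backward units
print(summary.shape)
```

By default the forward and backward outputs are concatenated, which is why the feature dimension doubles. A classifier that predicts one label per essay needs the per-sequence form before the final Dense layer.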

Train TensorFlow Model

Compile and train the model:

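# The loss follows from the sigmoid output; the optimizer here is an assumed choice.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
num_epochs = 20
history = model.fit(training_data, epochs=num_epochs, validation_data=validation_data)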

Plot Model Evaluation Charts

When training is complete, we can plot the training and validation charts using Matplotlib:

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.legend(['train', 'val'], loc='upper left')

Finally, you can run predictions on the test set.

final_preds = model.predict(test_data)
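
If you want to turn the predictions into a submission file, a sketch could look like this. It assumes the model returns one probability per essay with shape (n, 1), and that the submission uses id and generated columns. Check sample_submission.csv for the actual column names. The toy ids and scores are made up.

```python
import numpy as np
import pandas as pd

# Made-up ids and predictions standing in for `test` and `final_preds`.
toy_test = pd.DataFrame({"id": ["a1", "b2", "c3"]})
toy_preds = np.array([[0.91], [0.08], [0.55]])  # shape (n, 1), like a sigmoid output

toy_submission = pd.DataFrame({
    "id": toy_test["id"],
    "generated": toy_preds.ravel(),  # flatten (n, 1) to (n,)
})

# Passing a file path to to_csv would write the file instead of returning a string.
print(toy_submission.to_csv(index=False))
```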

Final Thoughts

Some of the things you could do to improve this model include:

  • Use a different metric because the dataset is slightly imbalanced
  • Try a Transformer network instead
  • Source a different and better dataset
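
For the first suggestion, one option is ROC AUC, which measures how well the model ranks AI-generated essays above human-written ones and does not depend on a fixed decision threshold. Here is a minimal sketch with Scikit-learn on made-up labels and scores:

```python
from sklearn.metrics import roc_auc_score

# Made-up labels (1 = AI generated) and predicted probabilities.
toy_true = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

# Fraction of (positive, negative) pairs where the positive gets the higher score.
toy_auc = roc_auc_score(toy_true, toy_scores)
print(toy_auc)
```

Three of the four positive/negative pairs are ranked correctly here, so the score is 0.75. During training, tf.keras.metrics.AUC can be passed to model.compile to track a similar metric.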

Kaggle notebook



Google Developer Expert - Machine Learning