How to Build LLM Applications With LangChain and OpenAI

LangChain is an open-source framework for building large language model (LLM) applications. It supports a variety of open-source and closed-source models, making it easy to create these applications with a single tool. Some of the modules in LangChain include:

  • Models for supported models and integrations
  • Prompts for managing and reusing prompts
  • Memory for managing the memory between different model calls
  • Indexes for loading, querying, and updating external data
  • Chains for creating sequences of calls to an LLM
  • Agents to develop applications where the LLM can direct itself
  • Callbacks for logging and streaming the intermediate steps in a chain

You will see these modules in use throughout this article as you build an application that transcribes YouTube videos and lets you ask questions about them.
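
As a quick taste of how the Prompts, Models, and Chains modules fit together, here is a minimal sketch (not part of the application you will build, and assuming the packages from the next section are already installed) that chains a prompt template to an OpenAI chat model:

from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# A prompt template with a single input variable
prompt = PromptTemplate.from_template("Summarize this transcript in one sentence: {transcript}")

# temperature=0.0 keeps the output as deterministic as possible
llm = ChatOpenAI(temperature=0.0)
chain = LLMChain(llm=llm, prompt=prompt)

# chain.run(transcript="...") returns the model's one-sentence summary as a string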

Let's dive in.


Installing Required Packages

Install the required packages:

  • LangChain
  • OpenAI
  • python-dotenv for reading environment variables
  • tiktoken for counting tokens
  • pinecone-client to store embeddings
  • pytube for downloading YouTube videos
  • Whisper (installed from the openai/whisper GitHub repository) for transcribing the videos

pip install langchain docarray openai pytube python-dotenv tiktoken pinecone-client git+https://github.com/openai/whisper.git

Next, create a .env file and add your OpenAI and Pinecone keys. Pinecone is a vector database for storing embeddings. This is particularly important for real applications, where you want to persist the embeddings rather than process them in memory. Use this notebook to follow along.

OPENAI_API_KEY=OPENAI_API_KEY
PINECONE_API_KEY=PINECONE_API_KEY
PINECONE_API_ENV=PINECONE_API_ENV

Import Packages

Import the required packages and load the environment variables.

import os
import whisper
import tiktoken
import openai
import pinecone
import tempfile
import numpy as np
import pandas as pd
from pytube import YouTube
from uuid import uuid4
from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from IPython.display import display, Markdown
from langchain.indexes import VectorstoreIndexCreator
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

# Load the keys from the .env file; the OpenAI classes read OPENAI_API_KEY from the environment
load_dotenv()
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_API_ENV = os.getenv("PINECONE_API_ENV")

Transcribe YouTube Videos

The first step is to download the YouTube videos and transcribe them. For example, use some of Lex's videos, which are over two hours long. The following code was provided in one of ML School's sessions.

YOUTUBE_VIDEOS = ["https://www.youtube.com/watch?v=Z3_PwvvfxIU",
                  "https://www.youtube.com/watch?v=DxREm3s1scA"]
def transcribe(youtube_url, model):
    youtube = YouTube(youtube_url)

    # Grab the audio-only stream to keep the download small
    audio = youtube.streams.filter(only_audio=True).first()

    with tempfile.TemporaryDirectory() as tmpdir:
        file = audio.download(output_path=tmpdir)
        # Strip the ".mp4" extension to recover the video title
        title = os.path.basename(file)[:-4]
        # fp16=False avoids a warning when transcribing on CPU
        result = model.transcribe(file, fp16=False)

    return title, youtube_url, result["text"].strip()


transcriptions = []
model = whisper.load_model("base")

for youtube_url in YOUTUBE_VIDEOS:
    transcriptions.append(transcribe(youtube_url, model))

df = pd.DataFrame(transcriptions, columns=["title", "url", "text"])
df.to_csv("text.csv")

df.head()

Store the text in a CSV file so you don't have to keep transcribing the same videos.

Split the Text into Chunks

Large language models have a maximum number of tokens that they can accept. You therefore can't pass the entire text of the transcribed video as context when asking a question. To work around this, you have to:

  • Split the text into smaller pieces of a few hundred tokens each
  • Create embeddings for each piece
  • Save the embeddings in a vector store such as Pinecone
  • Pass only the relevant pieces, instead of the entire transcript, to OpenAI when someone asks a question

When a question comes in, create an embedding for it. Next, find the embeddings in your vector store that are most similar to the question's embedding and pick the top few, say 4. Send the text behind these four embeddings to the large language model instead of the whole transcript, since passing everything would exceed the model's context length.
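
For instance, here is a minimal sketch of what embedding a question looks like. It uses OpenAIEmbeddings (which defaults to OpenAI's text-embedding-ada-002 model); the 1536-dimensional vector it returns is why the Pinecone index later in this article uses a dimension of 1536:

embeddings = OpenAIEmbeddings()
question_vector = embeddings.embed_query("What does Mr Beast say about succeeding on YouTube?")
len(question_vector)
# 1536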

Split the text into chunks and save them in a CSV file:

MAX_TOKENS = 500
tokenizer = tiktoken.get_encoding("cl100k_base")
df = pd.read_csv("text.csv", index_col=0)
df["tokens"] = df.text.apply(lambda x: len(tokenizer.encode(x)))

def split_into_many(text, max_tokens):
    # Split the text into sentences
    sentences = text.split('. ')

    # Get the number of tokens for each sentence
    n_tokens = [len(tokenizer.encode(" " + sentence)) for sentence in sentences]
    
    chunks = []
    tokens_so_far = 0
    chunk = []

    # Loop through the sentences and tokens joined together in a tuple
    for sentence, token in zip(sentences, n_tokens):

        # If the number of tokens so far plus the number of tokens in the current sentence is greater 
        # than the max number of tokens, then add the chunk to the list of chunks and reset
        # the chunk and tokens so far
        if tokens_so_far + token > max_tokens:
            chunks.append(". ".join(chunk) + ".")
            chunk = []
            tokens_so_far = 0

        # If the number of tokens in the current sentence is greater than the max number of 
        # tokens, go to the next sentence
        if token > max_tokens:
            continue

        # Otherwise, add the sentence to the chunk and add the number of tokens to the total
        chunk.append(sentence)
        tokens_so_far += token + 1
        
    # Add the last chunk to the list of chunks
    if chunk:
        chunks.append(". ".join(chunk) + ".")

    return chunks


data = []
for _, row in df.iterrows():
    title = row["title"]
    url = row["url"]
    text = row["text"]
    tokens = row["tokens"]

    if tokens <= MAX_TOKENS:
        data.append((title, url, text))
    else:
        for chunk in split_into_many(text, MAX_TOKENS):
            data.append((title, url, chunk))

df = pd.DataFrame(data, columns=["title", "url", "text"])
df["tokens"] = df.text.apply(lambda x: len(tokenizer.encode(x)))
df.to_csv("video_text.csv", index=False)

Loading Data With LangChain

Load the data using LangChain's CSVLoader.

file = "video_text.csv"

loader = CSVLoader(file_path=file)
docs = loader.load()

Create a Pinecone Index

The Pinecone index is useful for storing vector data and serving queries. You will compare the question's embedding to the embeddings stored in this index and return the most similar ones.

You can create a free index on Pinecone with the desired:

  • Dimensions
  • Name of the index
  • Metadata you'd like to store
  • Metric for performing similarity search; cosine is a common choice

pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_API_ENV)

PINECONE_INDEX = "another-tube"
# 1536 matches the dimension of OpenAI's text-embedding-ada-002 embeddings
embedding_dimension = 1536
if PINECONE_INDEX not in pinecone.list_indexes():
    pinecone.create_index(
        PINECONE_INDEX,
        dimension=embedding_dimension,
        metric="cosine",
        metadata_config={"indexed": ["title", "url"]},
    )

index = pinecone.Index(PINECONE_INDEX)
index.describe_index_stats()

Question Answering With Pinecone and OpenAI

You have everything needed to start querying the transcribed YouTube videos. Declare a database using the Pinecone index, the docs created from the transcription, and OpenAI embeddings:

embeddings = OpenAIEmbeddings()
db = Pinecone.from_documents(docs, embeddings, index_name=PINECONE_INDEX)

The from_documents method initializes a vector store from the documents, creates their embeddings, and stores them in the Pinecone index.

Create a query and check how many similar documents the vector store returns for it:

query = "What does Mr Beast say about succeeding on YouTube?"
docs = db.similarity_search(query)
len(docs)
# 4
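
Each result is a LangChain Document. To sanity-check the retrieval, you can peek at the text of the top match (the slice just keeps the output short):

docs[0].page_content[:300]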

There are multiple ways to get the answer. Let's use the RetrievalQA chain here. This chain expects:

  • The LLM, an OpenAI chat model in this case
  • The chain type; "stuff" dumps all the retrieved documents into the context and makes a single call to the LLM
  • The retriever for fetching documents and passing them to the LLM

retriever = db.as_retriever()
llm = ChatOpenAI(temperature=0.0)
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
response = qa.run(query)

Setting the LLM's temperature to 0 minimizes randomness in the answer generation, so responses are as deterministic as possible.
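
Since display and Markdown were imported earlier, you can render the answer nicely in a notebook:

display(Markdown(response))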


In this post, you have seen how to build large language model applications with LangChain and OpenAI. The beauty of using LangChain is that you can use different LLMs. For instance, you can swap the OpenAI LLM for another model supported by LangChain.
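
For example, here is a minimal sketch of swapping in a model from the Hugging Face Hub; it assumes you have a HUGGINGFACEHUB_API_TOKEN in your environment, and the rest of the chain stays exactly the same:

from langchain.llms import HuggingFaceHub

# Any LLM supported by LangChain can be used in place of ChatOpenAI
llm = HuggingFaceHub(repo_id="google/flan-t5-xl", model_kwargs={"temperature": 0.1})
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
response = qa.run(query)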

I am working on a web interface for this application. Reply to this email or comment below if you'd like to see it and learn more about the stack used to develop it.

Check out ML School if you are interested in machine learning applications like these, particularly how to deploy them for real-world usage.