How to Build Large Language Model Applications with PaLM API and LangChain

How to Build Large Language Model Applications with PaLM API and LangChain

You can now use Generative AI Studio on Vertex AI to prompt, tune and deploy Google's foundational models, including PaLM 2, Imagen, Codey, and Chirp. You can easily design and fine-tune your prompt and copy the code required to deploy the solution.

Leveraging a foundational model is a no-brainer because of the time, complexity, and computational requirements for training these language models (LLMs) from scratch. Deploying large language applications through APIs, even open-source ones, is easier because the size of these models makes them harder to deploy.

There are numerous large language models; both closed and open-source. LangChain has become a popular solution for building LLMs applications because it reduces complexity and makes switching from one model to another easy.

In this article, you will discover how to use the PaLM API with LangChain to build LLM applications.

Getting Started

You will learn how to use the PaLM API with two applications:

  • Chatting with YouTube videos
  • Chatting with PDFs

The first step is to install all the required libraries, including:

  • LangChain
  • google-cloud-aiplatform
  • pypdf for reading PDF files
  • Whisper for transcription
  • Pytube for downloading YouTube videos
pip install google-cloud-aiplatform  google-api-python-client pypdf  langchain pytube git+
Are you interested in digging deeper into building applications with Google's large language models? Check out the Generative AI learning path from Google. It contains numerous curated resources to get you started. You can also sign up for Google Cloud and get free credits for trying the foundational models at no cost.

You can follow along using this and this Kaggle notebook.

Chat With YouTube Video

Chatting with YouTube videos is done in the following steps:

  • Download the video using PyTube
  • Convert the video to audio
  • Transcribe the video using Whisper
  • Split the transcribed text into chunks because the LLMs have a maximum number of tokens they can accept
  • Create word embeddings for each chunk
  • Store the embeddings in a vector database
  • Create an embedding for the question
  • Compare the questions embedding's to the embeddings in the vector store
  • Return the top similar embeddings
  • Pass these embeddings instead of the entire text to the LLM
  • Get a response from the LLM

Let's look at the code implementation. First, import all the required packages:

import pandas as pd
# Utils
import time
from typing import List
from pydantic import BaseModel
from import aiplatform
from langchain.chat_models import ChatVertexAI
from langchain.embeddings import VertexAIEmbeddings
from langchain.llms import VertexAI
from langchain.vectorstores import Chroma
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.vectorstores import Chroma
import whisper
from pytube import YouTube
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

Authenticate Your Google Cloud Account

Using the PaLM API requires that you authenticate your Google account:

import vertexai
PROJECT_ID = "PROJECT_ID"  # @param {type:"string"}
vertexai.init(project=PROJECT_ID, location="us-central1")
You might need to set up a Google Service account and give it the required permissions for this to work.

Transcribe Video

Next, use PyTube to download the video and Whisper to transcribe it. Save the result in a CSV file.

def transcribe(youtube_url, model):
    youtube = YouTube(youtube_url)

    audio = youtube.streams.filter(only_audio=True).first()

    with tempfile.TemporaryDirectory() as tmpdir:
        file =
        title = os.path.basename(file)[:-4]
        result = model.transcribe(file, fp16=False)

    return title, youtube_url, result["text"].strip()

transcriptions = []
model = whisper.load_model("base")

for youtube_url in YOUTUBE_VIDEOS:
    transcriptions.append(transcribe(youtube_url, model))

df = pd.DataFrame(transcriptions, columns=["title", "url", "text"])

Load Transcribe Text With LangChain

LangChain provides various data loaders. In this case, we are interested in the CSVLoader.

file = "text.csv"

loader = CSVLoader(file_path=file)

docs = loader.load()

Set Up Vertex AI Embeddings

The following is a utility function for using the Vertex AI Embedding with rate limiting. Check the Console Quoatas page for the allowed request per minute.

# Utility functions for Embeddings API with rate limiting
def rate_limit(max_per_minute):
    period = 60 / max_per_minute
    while True:
        before = time.time()
        after = time.time()
        elapsed = after - before
        sleep_time = max(0, period - elapsed)
        if sleep_time > 0:
            print(".", end="")

class CustomVertexAIEmbeddings(VertexAIEmbeddings, BaseModel):
    requests_per_minute: int
    num_instances_per_batch: int

    # Overriding embed_documents method
    def embed_documents(self, texts: List[str]):
        limiter = rate_limit(self.requests_per_minute)
        results = []
        docs = list(texts)

        while docs:
            # Working in batches because the API accepts maximum 5
            # documents per request to get embeddings
            head, docs = (
                docs[: self.num_instances_per_batch],
                docs[self.num_instances_per_batch :],
            chunk = self.client.get_embeddings(head)

        return [r.values for r in results]
        # Embedding
embeddings = CustomVertexAIEmbeddings(

Store the Text Embeddings in a Vector Database

There are various open-source vector databases for storing word embeddings. Chromadb is a common choice among developers.

Next, set up a retriever to fetch documents and pass them to the LLM.

db = Chroma.from_documents(docs, embeddings)
retriever = db.as_retriever()

Answering Questions from the YouTube Video

To answer questions from the video, you need to set up the VertexAI LLM and the RetrievalQA from LangChain.

llm = VertexAI(

qa_stuff = RetrievalQA.from_chain_type(

query =  "WHow is Java mounting a come back?"

response =

"""Java is mounting a comeback because it is a well-balanced language that is performing enough for most things. It is relatively easy to use, and most importantly, has a huge ecosystem of stable libraries and frameworks."""

Chat With PDFs

The process of chatting with PDFs is very similar to the video application. The only difference is reading in the PDF with LangChain.

Load PDF With LangChain

LangChain provides various utilities for loading a PDF. Let's use the PyPDFLoader.

loader = PyPDFLoader("yourpdf.pdf")
documents = loader.load()

Split the Text Into Chunks

We used a very short video from the Fireship YouTube channel in the video example. However, in some cases, the text will be too long to fit the LLM's context. In such a case, you have to split it into chunks. LangChain provides tools for doing that.

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_documents(documents)

Save the Embeddings of the PDF Text

Save the text embeddings using Chromadb as done previously.

db = Chroma.from_documents(texts, embeddings)
retriever = db.as_retriever()

Question Answering With the PDF Text

Next, use ChatVertexAI and start chatting with the PDF.

llm = ChatVertexAI()
qa_stuff = RetrievalQA.from_chain_type(

In this case, let's have the LLM summarize the contents of the PDF as a list.

from langchain.schema import (
# prepare template for prompt
template = """You are an advanced AI assistant that summarizes online articles into bulleted lists.

Here's the article you need to summarize.

Title: {article_title}


Now, provide a summarized version of the article in a bulleted list format.

# format prompt
prompt = template.format(article_title="Activation functions in JAX and Flax", article_text="texts")

# generate summary
summary = llm([HumanMessage(content=prompt)])

As you can see from the screenshot, the model summarized the PDF in a list format as instructed.

Final Thoughts

These two applications provide a glimpse into the world of building applications using Google's foundational models. What you build using them is only limited by your imagination. What will you build? Let me know in the comments below or by replying to this email.

Whenever you're ready, there is 2 way I can help you:

If you're looking to accelerate your career, I'd recommend starting with an affordable ebook:

Writing for Data Scientists: The exact path I followed to get technical work that pays between $250-$500 from machine learning companies such as Comet, Neptune, cnvrg, Paperspace, Layer, Neural Magic, Determined, Activeloop, and many more. Get your copy.

Data Science and Machine Learning Ebook: I offer numerous free and paid data science and machine learning ebooks to help you in your data science and machine learning career. Check them out.