Building a Context-Aware LLM Chatbot with 🦜LangChain

Rodrigo Luque
6 min read · Jul 12, 2024


Source: https://www.pinecone.io/learn/series/langchain/langchain-conversational-memory/

If you are a developer who is using, or wants to use, any of the 🤖LLM APIs to build a chatbot for a project, and you also want to dive a bit deeper into LangChain 🦜, then this article is for you.

Nowadays, developers working with LLM APIs such as OpenAI’s face a problem that does not seem like a big deal at first, but that can mean a lot of money once your project scales and grows larger. The problem I am referring to is keeping the context of a conversation when building these chatbots.

The issue is that Large Language Models are ‘stateless’

When you use these APIs, you usually pass an array of messages of one or more of these types:

  • 📖 System: Messages that give background or instructions to the LLM.
  • 🧔‍♂️ User: Human input messages.
  • 🤖 Assistant: Chatbot responses to the user messages.

The issue is that Large Language Models are ‘stateless’: each transaction is independent. The common way to deal with this is to give the chatbot some memory by providing the full conversation as ‘context’.

This array of messages usually starts either empty or with a system message that introduces the conversation. As the conversation goes on, you keep appending the new messages to this array and sending it to the API so that the chatbot knows how to continue. The longer the conversation, the more the array grows, which implies a lot of token costs.
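
To make this concrete, here is a minimal sketch of that naive approach using the official openai Node SDK. The model name, the ask helper and the hard-coded messages are just illustrative assumptions; the point is that the full array is re-sent on every call.

import OpenAI from "openai";

const openai = new OpenAI();

// The ever-growing conversation: system + user + assistant messages.
const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
  { role: "system", content: "You are a helpful assistant." },
];

// Illustrative helper: appends the user message, re-sends the whole
// history, and stores the assistant's reply back into the array.
async function ask(userInput: string) {
  messages.push({ role: "user", content: userInput });
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // any chat model; just an example
    messages, // the FULL history goes out on every request
  });
  const reply = completion.choices[0].message;
  messages.push({ role: "assistant", content: reply.content ?? "" });
  return reply.content;
}

await ask("Hi! I'm Jim.");
await ask("What's my name?"); // only works because everything was re-sent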

As I said, at first this may not be a problem for you, because you either have few users on your platform, or the conversations are not that long (yet), or you simply accept the costs.

Here is where 🦜LangChain comes into play. LangChain, for those who don’t know it, is an open-source library created by the AI community after seeing the same pain points across developers using these LLMs. They decided to create a set of helpers that simplify these repetitive tasks and wrap them up into a library.

The 4 Ways to Give “Memory” to the LLM

1. Buffer Memory

The idea behind Buffer Memory is pretty simple. It stores the conversation in a variable and inserts it into your prompt every time you call the model. This is a very straightforward solution and pretty much what we developers usually do to give context to the model.

import { OpenAI } from "@langchain/openai";
import { BufferMemory } from "langchain/memory";
import { ConversationChain } from "langchain/chains";

const model = new OpenAI({});
const memory = new BufferMemory(); // keeps the entire conversation verbatim
const chain = new ConversationChain({ llm: model, memory: memory });
const res1 = await chain.call({ input: "Hi! I'm Jim." });
const res2 = await chain.call({ input: "What's my name?" });
console.log({ res2 }); // {response: ' You said your name is Jim. Is there anything else you would like to talk about?'}
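
If you want to check what the buffer actually stores, you can inspect it with loadMemoryVariables (the same call used in the summary example further down). The exact wording of the AI replies will vary; the shape below is just what you can expect:

console.log(await memory.loadMemoryVariables({}));
// { history: "Human: Hi! I'm Jim.\nAI: ...\nHuman: What's my name?\nAI: ..." }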

Drawback: High token cost when the conversation gets longer.

2. Buffer Window Memory

This approach is similar to the previous one (BufferMemory), but this time the difference is that it only stores the last k interactions of the conversation. So, for example, if you just want to provide the last 5 interactions of the conversation as context, you set k = 5 as shown below:

import { OpenAI } from "@langchain/openai";
import { BufferWindowMemory } from "langchain/memory";
import { ConversationChain } from "langchain/chains";

const model = new OpenAI({});
const memory = new BufferWindowMemory({ k: 5 });
const chain = new ConversationChain({ llm: model, memory: memory });
const res1 = await chain.call({ input: "Hi! I'm Jim." });

Drawback: The k interactions you define may turn out to be too little context, or more than you actually need.
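
As a rough sketch of that trade-off, reusing the same imports and model as above (the variable names and the exact forgetting behavior are illustrative), a very small window simply drops the earliest exchanges:

// Only the single most recent interaction is kept in memory.
const forgetfulMemory = new BufferWindowMemory({ k: 1 });
const forgetfulChain = new ConversationChain({ llm: model, memory: forgetfulMemory });

await forgetfulChain.call({ input: "Hi! I'm Jim." });
await forgetfulChain.call({ input: "I live in Madrid." });
// The introduction has already fallen out of the window, so the model
// will most likely no longer know the name:
const res3 = await forgetfulChain.call({ input: "What's my name?" });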

3. Token Buffer Memory

This is very similar to option number 2, but the main difference is that instead of specifying the number of interactions to store, you specify the maximum number of tokens to keep as context. For example, you can set a limit of 100 tokens like below:

import { OpenAI } from "@langchain/openai";
import { ConversationTokenBufferMemory } from "langchain/memory";
import { ConversationChain } from "langchain/chains";

const model = new OpenAI({});
const memory = new ConversationTokenBufferMemory({ llm: model, maxTokenLimit: 100 }); // the llm is used to count tokens
const chain = new ConversationChain({ llm: model, memory: memory });
const res1 = await chain.call({ input: "Hi! I'm Jim." });

Drawback: The same as option 2: you can end up setting a context that is either larger than you need or not large enough.

4. Summary Memory

I consider this last solution to be the most practical one if the conversations on your platform are going to be long or you expect them to grow a lot. It is based on using the model itself to summarize the conversation, and then inserting that summary into the prompt. This way you give the LLM the “full” context in a summarized form while reducing token costs.

Here is how it works:

import { OpenAI } from "@langchain/openai";
import { ConversationSummaryMemory } from "langchain/memory";
import { LLMChain } from "langchain/chains";
import { PromptTemplate } from "@langchain/core/prompts";

export const run = async () => {
  const memory = new ConversationSummaryMemory({
    memoryKey: "chat_history",
    llm: new OpenAI({ model: "gpt-3.5-turbo", temperature: 0 }),
  });

  const model = new OpenAI({ temperature: 0.9 });
  const prompt =
    PromptTemplate.fromTemplate(`The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{chat_history}
Human: {input}
AI:`);
  const chain = new LLMChain({ llm: model, prompt, memory });

  const res1 = await chain.invoke({ input: "Hi! I'm Jim." });
  console.log({ res1, memory: await memory.loadMemoryVariables({}) });
  /*
    {
      res1: {
        text: " Hi Jim, I'm AI! It's nice to meet you. I'm an AI programmed to provide information about the environment around me. Do you have any specific questions about the area that I can answer for you?"
      },
      memory: {
        chat_history: 'Jim introduces himself to the AI and the AI responds, introducing itself as a program designed to provide information about the environment. The AI offers to answer any specific questions Jim may have about the area.'
      }
    }
  */

  const res2 = await chain.invoke({ input: "What's my name?" });
  console.log({ res2, memory: await memory.loadMemoryVariables({}) });
  /*
    {
      res2: { text: ' You told me your name is Jim.' },
      memory: {
        chat_history: 'Jim introduces himself to the AI and the AI responds, introducing itself as a program designed to provide information about the environment. The AI offers to answer any specific questions Jim may have about the area. Jim asks the AI what his name is, and the AI responds that Jim had previously told it his name.'
      }
    }
  */
};

Drawback: Under the hood, every interaction actually involves two LLM calls: one to summarize the conversation and build the context, and a second one for the actual prompt. Also, summarizing the conversation means that certain important specific details can get lost.

As I wrap up this article, it’s important to remember that each example we’ve talked about has its own limitations. How well a chatbot keeps track of conversations and manages costs really depends on the specific situation it’s used in. What might work great for one project might not be the best for another. So, when you’re planning your own project, think carefully about what you really need. That way, you can make smart choices that fit your goals and set your chatbot up for success.

