People often wonder how context is provided to OpenAI’s models. Each model has a limit on how many tokens of context it can work with (1 token ≈ 4 characters of English text), and the format for providing context can differ between models. These limitations apply regardless of whether you are using an OpenAI API endpoint, the ChatGPT UI, or the OpenAI Playground. Beyond the prompt context covered in this article, there are other methods of providing context, such as embeddings and fine-tuning.
The purpose of this article is to explain how context works so that you can best optimize your use of the models.
I am not sponsored by OpenAI and don’t get paid for your use of it in any way. I don’t recommend or endorse its use, but this article will focus on OpenAI tools. I suspect most people who are here have already used the free version of ChatGPT. You do need to create an account, and if you use the paid version you will have access to more models and more features, including the API and Playground, which are discussed below.
ChatGPT
It’s helpful to understand that ChatGPT is simply a very lightweight interface that allows you to better manage your interactions with OpenAI’s models. It is intentionally (and successfully) AI for the masses. ChatGPT Plus is very similar, except with prioritized access and access to additional models. Many other tools are being released that use a similar approach (e.g., Flux, which allows you to visualize and explore differing responses; note that an API key is required).
When you submit a prompt to ChatGPT, it appears that you are interacting with the system one prompt at a time. However, this is not the case. What is actually happening behind the scenes is that the entire conversation is being passed to the model with each prompt.
Well, not the entire conversation. What OpenAI doesn’t tell you is that ChatGPT will only remember the most recent parts of the conversation, up to the limit of the model. If you go over that limit, the tool will keep working, but it’s a sliding window that forgets more and more of your early conversation as you keep going. I suspect most conversations won’t reach this limit, and so most people won’t need the early context, but if you’ve ever felt that a ChatGPT conversation had become senile, it’s because of this limited ability to look back.
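To make the sliding window concrete, here is a minimal Python sketch of what a chat UI does behind the scenes. ChatGPT’s actual trimming logic isn’t public, so the ~4-characters-per-token estimate and the trim_history helper below are illustrative assumptions, not its real implementation:

```python
MAX_TOKENS = 4097  # e.g., the GPT-3.5 context window

def estimate_tokens(message: dict) -> int:
    """Rough token estimate: ~4 characters per token."""
    return max(1, len(message["content"]) // 4)

def trim_history(history: list[dict]) -> list[dict]:
    """Drop the oldest messages until the conversation fits the window."""
    while sum(estimate_tokens(m) for m in history) > MAX_TOKENS:
        history.pop(0)  # the earliest context silently falls away
    return history

history: list[dict] = []

def ask(prompt: str) -> list[dict]:
    """Append the new prompt; what the model sees is the trimmed history."""
    history.append({"role": "user", "content": prompt})
    return trim_history(history)
```

Every prompt drags the whole (trimmed) history along with it, which is why early instructions eventually fall out of scope.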
Here’s an example to show how this happens:
Start by giving it a prompt that seeds every future response. I asked ChatGPT to provide a random fact with every response, regardless of what the prompt was.
I then asked a variety of unrelated questions about cars, and ChatGPT dutifully included something interesting like the following before answering my prompt:
Random Fact: The lifespan of a single taste bud is about 10 days.
Then, I started feeding in some information from external sources, such as car-buying articles and guides like “SUV vs Minivan: Which is the Best for Families?“ and “Hatchback vs. Sedan: What Are the Differences?“
Everything was great, and it kept providing more interesting tidbits. For example, I learned that the shortest war in history lasted only 38 minutes between Britain and Zanzibar on August 27, 1896.
But it was clear that something had gone wrong when I provided an additional article that pushed the context past the max token limit. Instead of responding to my prompt question about cars, ChatGPT ONLY provided a fun fact, and it didn’t preface it with “Random Fact:” as it had for all the previous responses. Subsequent prompts received only standard responses, with no random facts at all.
This happened because of the max token limits that apply to each of the ChatGPT models. The ChatGPT UI does a cool thing by keeping each conversation separate, but once you get over those limits you’re out of luck! At the time of writing, ChatGPT offered two text-davinci-002 models (Legacy and Default GPT-3.5, maxing out at 4,097 tokens) and one gpt-4 model (GPT-4, maxing out at 8,192 tokens).
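If you want to see exactly where a conversation stands against those limits, OpenAI’s tiktoken library will count tokens for you. A quick sketch, where the sample text is just a stand-in for a pasted conversation:

```python
# pip install tiktoken
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")

conversation = (
    "Include a random fact with every response.\n"
    "SUV vs Minivan: Which is the Best for Families? ...\n"
    "Hatchback vs. Sedan: What Are the Differences? ..."
)

token_count = len(encoding.encode(conversation))
print(f"{token_count} tokens used of 8,192 available")
```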
There are some things you can do to optimize this. First, while ChatGPT’s UI is great for very basic interactions, all of the misfired questions and regenerated responses will ‘clog up’ your previous history. If you don’t need all of that history, you can start a new chat limited to only the topics you are interested in.
If you really want to make sure you’re getting the most out of your context, you can clean up things that you don’t like (related: I suspect the thumbs down 👎 removes that response from future context). You’ll know you’ve maxed out what is possible when you receive an error saying your message is too long; being just a few characters shy of that limit means you’ve included the most context possible. The adage “garbage in, garbage out” applies, so ideally the characters you provide are high value and not full of junk. The more care and attention you put into providing good context, the better the output will be. The simplest way to do this is to provide as much context as is relevant, and you can further optimize by providing context that mimics a few-shot approach.
Another benefit of this “context packing” is that you won’t burn your limited prompts on staging the conversation, because you can seed the entire context outside of the conversation in one go.
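As a sketch of what such a packed seed might look like (the examples below are hypothetical, not from my actual transcript), you could fold the instruction and a few few-shot examples into a single opening message:

```python
seed = [
    {"role": "system", "content": (
        "Preface every answer with a random fact, labeled 'Random Fact:'.\n\n"
        "Example 1:\n"
        "Q: What is a good family car?\n"
        "A: Random Fact: The lifespan of a single taste bud is about 10 days.\n"
        "   A minivan offers the most passenger and cargo space...\n\n"
        "Example 2:\n"
        "Q: Hatchback or sedan?\n"
        "A: Random Fact: The shortest war in history lasted only 38 minutes.\n"
        "   A hatchback trades trunk security for cargo flexibility..."
    )},
    # The first "real" prompt then rides along with the fully packed context.
    {"role": "user", "content": "Should I buy an SUV or a minivan?"},
]
```

The facts here are recycled from earlier in the article purely as filler; the point is that the instruction and the examples cost zero conversational turns.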
Now that we’ve covered how context works, let’s talk about how it is provided using other approaches.
OpenAI API
Behind the scenes, ChatGPT is basically formatting your prior conversation in a way that the model can understand and passing it to the OpenAI API. It’s not very exciting from a technical perspective, but it solves a big need, which is to provide a simple interface that pretty much everyone can use without having to know or understand what is actually happening. The good news is that the app is really simple; the bad news is that responses are limited, and it does undesirable things like quietly truncating your message history when it becomes too long.
The API basically uses two formats for context, depending on whether the model is based on GPT-3 or GPT-4. I won’t go in depth on the API here (but may in a separate article if there is interest), but the short version is that GPT-3’s completions endpoint wants the history and prompt in one block of text, whereas GPT-4 uses a chat format that separates prior history into “system” and “assistant” context and the current prompt into “user” context.
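As a rough illustration, here is what the two formats look like with the openai Python package as it existed around the time of writing (v0.x). The endpoints and message roles are real, but the prompt text is placeholder, and the sketch assumes OPENAI_API_KEY is set in your environment:

```python
import openai  # reads OPENAI_API_KEY from the environment

# GPT-3 style (completions endpoint): history and prompt packed into one string.
completion = openai.Completion.create(
    model="text-davinci-003",
    prompt="Q: What is a good family car?\nA: A minivan...\n\nQ: Hatchback or sedan?\nA:",
)

# GPT-4 style (chat endpoint): structured roles separate the context.
chat = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful car-buying assistant."},
        {"role": "user", "content": "What is a good family car?"},
        {"role": "assistant", "content": "A minivan..."},
        {"role": "user", "content": "Hatchback or sedan?"},
    ],
)
```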
However, you don’t have to learn the raw API to use the more advanced features. For most people, the paid approach with API access will be preferable to ChatGPT: although it is slightly more complex to learn, you will gain a greater understanding of, and far better control over, what is happening. The big reason for this is that the Playground makes it easy to visualize the approaches and figure out which format is needed. I think it will also better prepare you for going further down the rabbit hole if you are someone who wants to go deeper.
OpenAI Playground
The OpenAI Playground is basically a supercharged version of ChatGPT. If your goal is only to ask simple questions and explore the very basics of the latest OpenAI models, and you don’t really care about going deeper, then spending the $20/month for ChatGPT Plus is a straightforward way to get things done.
However, if you want to learn more, I would skip the simple UI and set up a pay-per-use account, which will give you access to the Playground. Not only will that unlock additional models, but you will have more control over the responses and the types of prompts you can provide. Yes, you will need to learn a little more (and I’ll provide a more in-depth article if there is interest), but if you’re not tinkering with anything it is a pretty short learning curve. If you have lower volumes of interaction it will also be less expensive, but if you’re a power user costs can add up quickly (for that reason, I recommend setting a monthly maximum spend on a paid account). For example, you could spend nearly $2 per prompt with maxed-out context and the latest/largest models! That’s awfully expensive for an API call and can add up fast even if you limit your interactions to the Playground.
Cost Considerations
As of April 2023, the cost of a maxed-out 8k-context API/Playground call using a gpt-4 model is $0.24, plus the cost of the response. Costs are of course much lower with less context, but it takes only 84 fully contextualized gpt-4 API/Playground calls to exceed the $20 monthly equivalent of the ChatGPT UI. However, with ChatGPT Plus there are throttling limits, and you don’t have access to the larger models.
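The arithmetic behind those figures is straightforward. A quick sketch using the April 2023 gpt-4 (8k) price of $0.03 per 1K prompt tokens; response tokens (billed separately at $0.06 per 1K) are excluded:

```python
PROMPT_RATE = 0.03 / 1000   # dollars per prompt token (gpt-4 8k, April 2023)
CONTEXT_TOKENS = 8_000      # the ~8k (8,192-token) window, rounded

call_cost = CONTEXT_TOKENS * PROMPT_RATE
print(f"Cost per maxed-out call: ${call_cost:.2f}")             # $0.24

# ChatGPT Plus is $20/month; how many maxed-out calls pass that mark?
print(f"Calls to exceed $20/month: {int(20 / call_cost) + 1}")  # 84
```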
In short, your specific needs will dictate the right approach, but I’ve found both to be useful. As a general rule, I use the UI for very simple/brief conversations and for larger contexts that don’t need programmatic access, and the API & Playground for higher volumes, additional models, and more curated needs. I am guessing that as more people figure this out, OpenAI will impose additional limits on the UI (maybe don’t share this article!), but in the meantime I think the pricing is low enough to justify using both.
OpenAI References
These same links are embedded within the article above, but I am providing them here as a quick reference: