People often wonder how context is provided to OpenAI's models. Each model has a limit on how many tokens of the prompt it will remember (1 token ≈ 4 characters of English text), and the context format can differ from model to model. These limits apply regardless of whether you are using an OpenAI API endpoint, the ChatGPT UI, or the OpenAI Playground. Beyond the prompt context covered in this article, there are other methods of providing context such as Embeddings and Fine Tuning.
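If you want to measure tokens precisely rather than relying on the 4-characters-per-token rule of thumb, you can count them yourself. Here is a minimal sketch using OpenAI's tiktoken library (the sample sentence is just an example):

```python
# Check actual token counts instead of estimating 4 characters per token.
import tiktoken

# cl100k_base is the encoding used by the gpt-3.5/gpt-4 family of models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Random Fact: The lifespan of a single taste bud is about 10 days."
tokens = enc.encode(text)
print(f"{len(text)} characters -> {len(tokens)} tokens")
```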
The purpose of this article is to give you an understanding of how context works, which will help you optimize your use of the models.
I am not sponsored by OpenAI and don't get paid for your use of it in any way. I don't specifically recommend or endorse its use, but this article will focus on OpenAI tools. I think most people here have probably already used the free version of ChatGPT. You do need to create an account, and if you use the paid version you will have access to more models and more features, including the API and Playground, which are discussed below.
ChatGPT
It's helpful to understand that ChatGPT is simply a very lightweight interface that allows you to better manage your interactions with OpenAI's models. It is intentionally (and successfully) AI for the masses. ChatGPT Plus is very similar except with prioritized access and access to additional models. Many other tools are being released which use a similar approach (e.g., Flux, which allows you to visualize and explore differing responses; note an API key is required).
When you submit a prompt to ChatGPT, it appears that the system is responding to one prompt at a time. However, this is not the case. What is actually happening behind the scenes is that the entire conversation is being passed along with each prompt.
Well, not the entire conversation. What OpenAI doesn't tell you is that ChatGPT will only remember the most recent parts of the conversation, up to the limit of the model. If you go over that limit, the tool will keep working, but it's a sliding window that forgets more and more of your early conversation as you keep going. I suspect most conversations won't reach this limit, and so most people won't need the early context, but if you've ever felt that a ChatGPT conversation had become senile, it's because of this limited ability to look back.
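To make the sliding window concrete, here is a minimal sketch of what a chat client has to do under the hood. The trim_to_fit helper is hypothetical (not part of any OpenAI library), and token counting is approximated with tiktoken:

```python
# A sketch of the sliding window a chat UI maintains: the full message list
# is re-sent with every prompt, and the oldest messages are silently dropped
# once the conversation exceeds the model's token limit.
import tiktoken

MAX_TOKENS = 4097  # the GPT-3.5 limit discussed in this article
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(messages):
    # Rough count; the real chat format adds a few tokens of overhead per message.
    return sum(len(enc.encode(m["content"])) for m in messages)

def trim_to_fit(messages, limit=MAX_TOKENS):
    # Hypothetical helper: drop the oldest messages until the rest fit.
    trimmed = list(messages)
    while trimmed and count_tokens(trimmed) > limit:
        trimmed.pop(0)  # the earliest context is forgotten first
    return trimmed

# Simulate a long conversation where the seed instruction is the first message.
history = [{"role": "user", "content": "Include a random fact with every response."}]
history += [{"role": "user", "content": "filler question " * 500} for _ in range(5)]

window = trim_to_fit(history)
print(f"{len(history)} messages in history, {len(window)} still fit in the window")
print("seed instruction survived:", window and window[0] is history[0])
```

This is exactly why the seed prompt in the experiment below eventually stops working: once the conversation outgrows the window, the seed instruction is the first thing to fall out.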
Here's an example to show how this happens:
Start by giving it a prompt that seeds every future response in the conversation. I asked ChatGPT to provide a random fact with every response, regardless of what the prompt was.
I then asked a variety of unrelated questions about cars, and ChatGPT dutifully included something interesting like the following before answering my prompt:
Random Fact: The lifespan of a single taste bud is about 10 days.
Then, I started feeding in some information from external sources, such as articles and guides related to buying cars: "SUV vs Minivan: Which is the Best for Families?" and "Hatchback vs. Sedan: What Are the Differences?"
Everything was great, and it kept providing more interesting tidbits. For example, I learned that the shortest war in history lasted only 38 minutes between Britain and Zanzibar on August 27, 1896.
But it was clear that something went wrong when I provided an additional article that caused the context to exceed the max token limit. Instead of responding to my prompt question about cars, ChatGPT ONLY provided a fun fact, and it didn't preface it with "Random Fact:" like it did for all the previous responses. Subsequent prompts received only standard responses, with no random facts at all.
This happened because of the max token limitations which are present for each of the ChatGPT models. The ChatGPT UI does a cool thing by keeping each conversation separate, but once you get over those limits you're out of luck! At the time of writing, ChatGPT offered two text-davinci-002 models (Legacy and Default GPT 3.5, maxing out at 4,097 tokens) and one gpt-4 model (GPT 4, maxing out at 8,192 tokens).
There are some things you can do to optimize this. First, while ChatGPT's UI is great for very basic interactions, all of the misfired questions and regenerated responses will "clog" up your previous history. If you don't need all of that other history, you can start a new chat which is limited to only the topics you are interested in.
If you really want to make sure you're getting the most out of context, you can clean up things that you don't like (related: I suspect the thumbs-down 👎 removes that response from future context). You'll know you've maxed out what is possible when you receive an error that says your message is too long; being just a few characters shy of that limit means you've included the most context possible. The adage "garbage in, garbage out" applies, so ideally the characters you provide are high value and not full of junk. The more care and attention you put into providing good context, the better the output will be. The simplest way to do this is to provide as much context as is relevant, and you can further optimize by providing context that mimics a few-shot approach, as shown in the sketch after the next paragraph.
Another benefit of this "context packing" is that you won't burn your limited prompts on staging the conversation, because you can seed the entire context outside of the conversation in one go.
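As an illustration, here is a hypothetical "packed" seed prompt that mimics a few-shot approach. The format and the worked examples are mine, not from OpenAI; the point is that the instruction and a couple of demonstrations go in as a single first message:

```python
# A hypothetical packed seed prompt: the standing instruction plus two
# worked examples are sent in one go, so no conversation turns are burned
# on staging. Paste this as the first message of a new chat.
seed_context = """You are helping me compare cars for a family purchase.
Begin every answer with a line formatted as "Random Fact: ...".

Example 1
Q: Is a minivan practical for two kids?
A: Random Fact: Honey never spoils.
   Yes - sliding doors and flexible seating make minivans very practical.

Example 2
Q: Are hatchbacks cheaper to insure than sedans?
A: Random Fact: Octopuses have three hearts.
   Often, but it varies by model and insurer.

Now answer my questions in exactly this format."""

print(seed_context)
```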
Now that we've covered how context works, let's talk about how it is provided using other approaches.
OpenAI API
Behind the scenes, ChatGPT is basically formatting your prior conversation in a way that a computer can understand and passing it to the OpenAI API. It's not very exciting from a technical perspective, but it solves a big need, which is to provide a simple interface that pretty much everyone can easily use without having to know or understand what is actually happening. The good news is that the app is really simple, but the bad news is that responses are limited, and it does undesirable things like quietly truncating your message history if it becomes too long.
The API basically uses two formats for context, depending on whether the models are based on GPT-3 or GPT-4. I won't go in depth on the API here (but may in a separate article if there is interest), but the short version is that GPT-3 wants the history and prompt in one go, whereas GPT-4 separates history into "System" context and the prompt into "User" context.
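Here is a sketch of the two formats side by side, using the openai Python package as it existed at the time of writing (the 0.x API; newer versions of the library have changed these call signatures). The model names and prompt text are examples:

```python
# Two context formats in the openai 0.x Python package. You would need a
# real API key for these calls to succeed.
import openai

openai.api_key = "sk-..."  # your key here

# GPT-3-style completion: history and prompt are packed into one string.
completion = openai.Completion.create(
    model="text-davinci-003",
    prompt=(
        "Include a random fact with every response.\n\n"
        "Q: Is a minivan practical for two kids?\nA:"
    ),
    max_tokens=200,
)

# GPT-4-style chat: history is split into role-tagged messages, with the
# standing instructions in "system" context and the prompt in "user" context.
chat = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Include a random fact with every response."},
        {"role": "user", "content": "Is a minivan practical for two kids?"},
    ],
)

print(completion.choices[0].text)
print(chat.choices[0].message["content"])
```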
However, you don't have to learn the API itself to use the more advanced features. For most people, the paid approach with API access will be preferable to ChatGPT. Although it is slightly more complex to learn, you will have a greater understanding of, and far better control over, what is happening. The big reason for this is that the Playground makes it easy to visualize the approaches and figure out which format is needed. I think it will also better prepare you for going further down the rabbit hole if you are someone who wants to go deeper.
OpenAI Playground
The OpenAI Playground is basically a supercharged version of ChatGPT. If your goal is only to ask simple questions and explore the very basics of the latest OpenAI models, and you don't really care about going deeper, then spending the $20/month for ChatGPT Plus will be a straightforward way to get things done.
However, if you want to learn more I would skip the simple UI and set up a pay-per-use account which will give you access to the Playground. Not only will that unlock additional models, but you will have more control over the responses and types of prompts you can provide. Yes, you will need to learn a little more (and I'll provide a more in-depth article if there is interest), but if you're not tinkering with anything it is a pretty short learning curve. If you have lower volumes of interaction it will also be less expensive, but if you're a power user costs can add up pretty quickly (and for that reason I would recommend setting a monthly maximum spend if you have a paid account). For example, you could spend nearly $2 per prompt with maxed-out context and the latest/largest models! That's awfully expensive for an API call and can add up fast even if you are limiting your interactions to the Playground.
Cost Considerations
As of April 2023, the cost of a maxed-out 8k-context API/Playground call using a gpt-4 model is $0.24, plus the cost of the response. Costs are of course much lower with less context, but it takes only 84 fully contextualized gpt-4 API/Playground calls to exceed the $20 monthly cost of the ChatGPT UI. However, with ChatGPT Plus there are throttling limits, and you don't have access to the larger models.
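For the arithmetic-inclined, here is the cost math above as a worked sketch, using the April 2023 gpt-4 pricing of $0.03 per 1K prompt tokens (response tokens were billed separately at $0.06 per 1K and are not included; prices may have changed since):

```python
import math

PROMPT_PRICE_PER_1K = 0.03    # USD, gpt-4 8k prompt pricing (April 2023)
CONTEXT_TOKENS = 8000         # a maxed-out ~8k prompt
CHATGPT_PLUS_MONTHLY = 20.00  # USD

cost_per_call = CONTEXT_TOKENS / 1000 * PROMPT_PRICE_PER_1K
calls_to_exceed = math.floor(CHATGPT_PLUS_MONTHLY / cost_per_call) + 1

print(f"Cost per maxed-out call: ${cost_per_call:.2f}")    # $0.24
print(f"Calls to exceed ChatGPT Plus: {calls_to_exceed}")  # 84
```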
In short, your specific needs will dictate what the right approach is, but I've found both to be useful. As a general rule, I am using the UI for very simple/brief conversations and larger contexts that don't need programmatic access, and the API & Playground for higher volumes/additional models/more curated needs. I am guessing that as more people figure this out OpenAI will impose additional limits on the UI (maybe don't share this article!), but in the meantime I think the pricing is low enough to justify the use of both.
OpenAI References
These same links are embedded within the conversation above, but I am providing them here as a quick reference: