Token limits in ChatGPT models
Token limits vary by model. As of February 2024, the limits for the GPT-4 family range from 8,192 to 128,000 tokens. The limit applies to the sum of prompt and completion tokens in a single API call: for the GPT-4-32K model, that sum cannot exceed 32,768 tokens, so if the prompt is 30,000 tokens, the response cannot be more than 2,768 tokens. GPT-4-Turbo, the most recent model as of February 2024, supports 128,000 tokens, which is close to 300 pages of text in a single prompt and completion. This is a massive context window compared to its predecessor models.
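The budget arithmetic above can be sketched in a few lines. This is a simple illustration, not an API call; the figures are the ones from the example:

```python
# For a model with a 32,768-token context window (e.g., GPT-4-32K),
# the completion budget is whatever the prompt leaves over.
context_window = 32_768
prompt_tokens = 30_000

max_completion_tokens = context_window - prompt_tokens
print(max_completion_tokens)  # 2768
```

In practice, you would pass a value no larger than this remainder as the completion limit when making the API call.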
Though this can be a technical limitation, there are creative ways to work around it, such as chunking and condensing your prompts. The chunking strategies discussed in Chapter 4 can help you stay within token limits.
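A minimal sketch of the chunking idea follows. It uses whitespace-separated words as a rough stand-in for tokens; in practice, you would count tokens with a tokenizer such as OpenAI's tiktoken library. The function name and parameters are illustrative, not from any specific API:

```python
def chunk_text(text, max_tokens=3000, overlap=200):
    """Split text into pieces that fit within a model's token limit.

    Words are used as a rough proxy for tokens here; a real
    implementation would count tokens with a tokenizer (e.g., tiktoken).
    The overlap carries some context across chunk boundaries.
    """
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + max_tokens, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # step back so adjacent chunks share context
    return chunks
```

Each chunk can then be sent in a separate API call, with the responses combined or summarized afterward.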
The following figure shows various models and token limits:
| Model | Token Limit |
| --- | --- |
| GPT-3.5-turbo | 4,096 |
| GPT-3.5-turbo-16k | 16,384 |
| GPT-3.5-turbo-0613 | 4,096 |
| GPT-3.5-turbo-16k-0613 | 16,384 |
| GPT-4 | 8,192 |
| GPT-4-0613 | 8,192 |
| GPT-4-32K | 32,768 |
| GPT-4-32K-0613 | 32,768 |
| GPT-4-Turbo | 128,000 |
Figure 5.4 – Models and associated Token Limits
For the latest updates on model limits for newer versions of models, please check the OpenAI website.
Tokens and cost considerations
The cost of using ChatGPT or similar models via an API is often tied to the number of tokens processed, encompassing both the input prompts and the model’s generated responses.
In terms of pricing, providers typically have a per-token charge, leading to a direct correlation between conversation length and cost; the more tokens processed, the higher the cost. The latest cost updates can be found on the OpenAI website.
From an optimization perspective, understanding this cost-token relationship can guide more efficient API usage. For instance, creating more succinct prompts and configuring the model for brief yet effective responses can help control token count and, consequently, manage expenses.
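The cost-token relationship can be sketched with a small helper. The per-1K-token rates below are hypothetical placeholders, not actual OpenAI prices; check the OpenAI website for current rates:

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_rate_per_1k, completion_rate_per_1k):
    """Estimate the cost of a single API call in dollars.

    Most providers price prompt and completion tokens separately,
    per 1,000 tokens. The rates passed in here are placeholders.
    """
    prompt_cost = (prompt_tokens / 1000) * prompt_rate_per_1k
    completion_cost = (completion_tokens / 1000) * completion_rate_per_1k
    return prompt_cost + completion_cost

# Using the earlier example (30,000-token prompt, 2,768-token response)
# with hypothetical rates of $0.01 and $0.03 per 1K tokens:
cost = estimate_cost(30_000, 2_768, 0.01, 0.03)
print(round(cost, 4))  # 0.383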
We hope you now have a good understanding of the key components of a prompt. You are now ready to learn about prompt engineering. In the next section, we will explore prompt engineering and effective strategies in detail, enabling you to maximize the potential of your prompts through one-shot and few-shot learning approaches.