What is a context window?
The simple definition
A context window is the maximum number of tokens a language model can hold in working memory during a single API call. Think of it as a desk. Everything you want the model to consider — your prompt, any system instructions, conversation history, attached documents, and the model's own response — must fit on that desk. When the desk is full, nothing else can be added. Anything that does not fit is either rejected or silently dropped.
Both input and output tokens count toward the limit. If a model has a 128,000 token context window and you send 120,000 tokens of input, the model can only generate up to 8,000 tokens in response. The context window is shared space.
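The shared-budget arithmetic can be sketched in a few lines. This is a minimal illustration of the figures above, not any provider's API; the `CONTEXT_WINDOW` constant and function name are assumptions for the example:

```python
# Input and output share one budget: whatever the input uses, the
# response cannot have. 128,000 is the GPT-4 Turbo figure from the text.
CONTEXT_WINDOW = 128_000

def max_output_tokens(input_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens remain for the model's response."""
    return max(context_window - input_tokens, 0)

print(max_output_tokens(120_000))  # 8000 tokens left for the response
```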
Why it matters
Context window size determines what you can accomplish in a single API call. A small context window forces you to break work into pieces — chunking documents, summarising earlier conversation turns, or using retrieval systems to fetch relevant excerpts. A large context window lets you process entire codebases, full legal contracts, or hour-long meeting transcripts in one pass without any intermediate steps.
For developers building applications, the context window is a hard engineering constraint. A chatbot that maintains conversation history will eventually hit the limit. A document analysis pipeline must check whether each document fits. A code review tool must know whether the full repository context can be included. The number matters every day.
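A document pipeline's "does this fit?" check can be sketched as follows. The 4-characters-per-token estimate is a rough English-text heuristic, and all names and limits here are illustrative assumptions rather than a real API:

```python
# Rough pre-flight check: estimate tokens from character count and
# reserve headroom for the model's reply before sending anything.
def estimate_tokens(text: str) -> int:
    return len(text) // 4  # coarse heuristic for English text

def fits_in_context(text: str, context_window: int, reserved_output: int = 1_000) -> bool:
    return estimate_tokens(text) + reserved_output <= context_window

doc = "word " * 4_000  # ~20,000 characters -> ~5,000 estimated tokens
print(fits_in_context(doc, context_window=8_192))  # True
print(fits_in_context(doc, context_window=4_096))  # False
```

A real pipeline would use the provider's tokeniser for exact counts; the heuristic only decides whether an exact count is worth computing.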
The context window expansion
In four years, context windows grew 244x. GPT-3 launched in June 2020 with a 4,096 token context — enough for roughly three pages of text. By February 2024, Gemini 1.5 Pro offered 1,000,000 tokens — enough for approximately ten full novels. The breakthrough moment came in July 2023 when Anthropic released Claude 2 with a 100,000 token context, proving that production-quality models could handle book-length inputs.
| Model | Date | Context (tokens) |
|---|---|---|
| GPT-3 | Jun 2020 | 4,096 |
| GPT-4 | Mar 2023 | 8,192 |
| GPT-3.5 Turbo 16k | Jun 2023 | 16,384 |
| Claude 2 | Jul 2023 | 100,000 |
| GPT-4 Turbo | Nov 2023 | 128,000 |
| Gemini 1.5 Pro | Feb 2024 | 1,000,000 |
| Claude 3 | Mar 2024 | 200,000 |
Practical token guide
Tokens are not words. As a rough guide, 1,000 tokens is approximately 750 words in English. Here is how common content types translate to token counts:
| Content type | Approximate tokens |
|---|---|
| Short email | ~200 |
| 10-page PDF | ~8,000 |
| 1-hour meeting transcript | ~10,000 |
| 100-page report | ~75,000 |
| Novel (80,000 words) | ~106,000 |
| Ten novels (Gemini 1.5 Pro's full window) | ~1,000,000 |
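The rule of thumb above (1,000 tokens ≈ 750 words of English) can be turned into a pair of quick converters. These are coarse estimates only; real counts depend on the model's tokeniser:

```python
# Words-to-tokens conversion using the ~0.75 words-per-token rule of thumb.
def words_to_tokens(words: int) -> int:
    return round(words / 0.75)

def tokens_to_words(tokens: int) -> int:
    return round(tokens * 0.75)

print(words_to_tokens(80_000))     # ~106,667 tokens, matching the novel row
print(tokens_to_words(1_000_000))  # ~750,000 words
```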
The attention quality caveat
A larger context window does not automatically mean better results. Research has consistently demonstrated a phenomenon called "lost in the middle" — models tend to attend well to content at the beginning and end of long inputs but pay less attention to content buried in the middle. This means that stuffing a 200,000 token context window with every piece of information you have may actually produce worse results than carefully selecting the most relevant 50,000 tokens.
The practical implication: context window size tells you the maximum capacity, not the optimal operating range. Treat it as a ceiling, not a target. For many tasks, retrieval-augmented generation (fetching only the relevant chunks) outperforms simply dumping everything into the context. The quality of what goes in matters as much as the quantity the model can accept.
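"Carefully selecting the most relevant tokens" can be sketched with naive keyword-overlap scoring. Real retrieval systems use embeddings and vector search; this toy version, with invented chunk text and an assumed 4-characters-per-token estimate, only shows the shape of the idea — rank chunks by relevance, then pack them into a token budget:

```python
# Naive relevance scoring: count how many of a chunk's words appear in the query.
def score(chunk: str, query: str) -> int:
    query_words = set(query.lower().split())
    return sum(word in query_words for word in chunk.lower().split())

def select_chunks(chunks: list[str], query: str, budget_tokens: int) -> list[str]:
    ranked = sorted(chunks, key=lambda c: score(c, query), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        cost = len(chunk) // 4  # rough token estimate
        if used + cost <= budget_tokens:
            picked.append(chunk)
            used += cost
    return picked

chunks = [
    "context window limits apply",
    "unrelated marketing copy",
    "token limits and windows",
]
# The irrelevant chunk is dropped once the budget runs out.
print(select_chunks(chunks, "context window token limit", budget_tokens=12))
```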
How sourc.dev tracks it
sourc.dev tracks the context window size in tokens for every model in the LLM directory. The value is verified against the provider's official documentation and recorded with a source URL and verification date. When a provider updates a model's context window — as has happened multiple times with GPT-4 variants and Claude models — the old value is preserved in the history table and the new value is recorded alongside it. Nothing is overwritten. The timeline shows exactly when each change happened and where the data came from.
Frequently asked questions
What happens when you exceed the context window?
The API will either return an error or silently truncate the oldest tokens in the conversation. The exact behaviour depends on the provider. OpenAI returns a 400 error with a message about maximum context length. Anthropic returns a similar error. Some wrapper libraries handle truncation automatically, but the underlying model cannot process more tokens than its context window allows. If you are building an application, you need to manage token counts proactively — do not wait for the error.
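Proactive management of a chatbot's history can be sketched as trimming the oldest turns before the estimated total exceeds the window, instead of waiting for the provider's error. The token estimate and the limits used are assumptions for illustration:

```python
# Drop the oldest conversation turns until the estimated total fits,
# keeping headroom reserved for the model's response.
def estimate_tokens(text: str) -> int:
    return len(text) // 4  # coarse heuristic for English text

def trim_history(messages: list[str], context_window: int, reserved_output: int) -> list[str]:
    budget = context_window - reserved_output
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from the most recent turn
        cost = estimate_tokens(message)
        if used + cost > budget:
            break  # everything older than this is discarded
        kept.append(message)
        used += cost
    return list(reversed(kept))
```

An exact implementation would count tokens with the provider's tokeniser, but the shape is the same: newest turns survive, oldest turns go first.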
Does a larger context window mean better quality?
Not necessarily. A larger context window means the model can accept more input, but the quality of attention across that input varies. Research has shown that many models exhibit a "lost in the middle" effect — they attend well to content at the beginning and end but pay less attention to content in the middle. sourc.dev tracks context window size as a capability metric, not a quality indicator.
How are context windows measured?
Context windows are measured in tokens. A token is a chunk of text — roughly 3-4 characters in English, or about 0.75 words. Different models use different tokenisers, so the same text may produce different token counts across models. Most providers offer tokeniser tools or libraries so you can count tokens before sending a request.
What is the difference between context window and memory?
A context window is the model's working memory for a single API call — everything must fit within the token limit. Memory refers to persistent storage across conversations. Standard LLM APIs are stateless: each call is independent, and the model has no memory of previous calls. Some providers offer memory features as additional layers on top of the base API, but the context window itself resets with each call.
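Statelessness means "memory" in a chat application is just the client resending prior turns inside each new request. A minimal sketch, where the request shape is illustrative rather than any provider's real schema:

```python
# Each request carries the full history -- the API remembers nothing
# between calls, and every resent turn counts against the context window.
history: list[dict] = []

def build_request(user_message: str) -> dict:
    history.append({"role": "user", "content": user_message})
    return {"messages": list(history)}  # whole history, every call

def record_reply(reply: str) -> None:
    history.append({"role": "assistant", "content": reply})

req = build_request("What is a context window?")
print(len(req["messages"]))  # 1 message in the first call
record_reply("The model's working memory for one call.")
req = build_request("And how big is it?")
print(len(req["messages"]))  # 3 messages: the history grows with each turn
```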
Which model has the largest context window?
As of early 2025, Google's Gemini 1.5 Pro holds the record at 1,000,000 tokens — approximately 750,000 words or ten full novels. Anthropic's Claude 3 models support 200,000 tokens. OpenAI's GPT-4 Turbo and GPT-4o support 128,000 tokens. The trend is clearly upward — context windows have grown 244x in four years.
Can I extend a model's context window?
You cannot extend the context window of a hosted API model — the limit is set by the provider. However, there are strategies to work within the limit: chunking (splitting documents and processing separately), summarisation (condensing earlier conversation turns), and retrieval-augmented generation (RAG), which stores content externally and retrieves relevant sections on demand. For open-weight models, researchers have explored positional encoding extensions, but these require retraining and may degrade quality.
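The chunking strategy can be sketched as splitting a document into pieces that each fit a per-chunk token budget, again using the rough 4-characters-per-token estimate as an assumption:

```python
# Split a long text into fixed-size chunks sized by an estimated token budget.
def chunk_text(text: str, max_tokens: int) -> list[str]:
    max_chars = max_tokens * 4  # rough chars-per-token heuristic
    return [text[start:start + max_chars] for start in range(0, len(text), max_chars)]

document = "x" * 10_000                          # ~2,500 estimated tokens
pieces = chunk_text(document, max_tokens=1_000)  # 4,000 characters per chunk
print(len(pieces))  # 3 chunks: 4,000 + 4,000 + 2,000 characters
```

A production pipeline would split on sentence or paragraph boundaries rather than raw character offsets, but the budget arithmetic is the same.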