What is a context window?

The simple definition

A context window is the maximum number of tokens a language model can hold in working memory during a single API call. Think of it as a desk. Everything you want the model to consider — your prompt, any system instructions, conversation history, attached documents, and the model's own response — must fit on that desk. When the desk is full, nothing else can be added. Anything that does not fit is either rejected or silently dropped.

Both input and output tokens count toward the limit. If a model has a 128,000 token context window and you send 120,000 tokens of input, the model can only generate up to 8,000 tokens in response. The context window is shared space between input and output.
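The arithmetic is worth making explicit. A minimal sketch, using the 128,000-token figure from the example above (the constant and function name here are illustrative, not part of any provider's API):

```python
# Both input and output share the context window: the response budget is
# whatever the input leaves behind.
CONTEXT_WINDOW = 128_000  # example figure from the text, not a fixed standard

def max_output_tokens(input_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the model's response after the input is counted."""
    return max(context_window - input_tokens, 0)

print(max_output_tokens(120_000))  # 8000 tokens remain for the response
```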

Why it matters

Context window size determines what you can accomplish in a single API call. A small context window forces you to break work into pieces — chunking documents, summarising earlier conversation turns, or using retrieval systems to fetch relevant excerpts. A large context window lets you process entire codebases, full legal contracts, or hour-long meeting transcripts in one pass without any intermediate steps.

For developers building applications, the context window is a hard engineering constraint. A chatbot that maintains conversation history will eventually hit the limit. A document analysis pipeline must check whether each document fits. A code review tool must know whether the full repository context can be included. The number matters every day.
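A chatbot that maintains history typically trims the oldest turns to stay under the limit. A minimal sketch, assuming a crude character-based token estimate (a stand-in for a real tokeniser) and a hypothetical `(role, text)` message format:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token in English. A real application
    # would use the provider's tokeniser instead.
    return max(1, len(text) // 4)

def trim_history(messages: list[tuple[str, str]], budget: int) -> list[tuple[str, str]]:
    """Drop the oldest messages until the conversation fits the token budget.

    `messages` is a list of (role, text) tuples, oldest first.
    """
    kept, total = [], 0
    for role, text in reversed(messages):   # walk newest-first
        cost = estimate_tokens(text)
        if total + cost > budget:
            break                            # everything older is dropped
        kept.append((role, text))
        total += cost
    return list(reversed(kept))              # restore chronological order
```

Walking newest-first means the most recent turns survive, which matches how most chat applications behave when they hit the limit.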

The context window expansion

In four years, context windows grew 244x. GPT-3 launched in June 2020 with a 4,096 token context — enough for roughly three pages of text. By February 2024, Gemini 1.5 Pro offered 1,000,000 tokens — enough for approximately ten full novels. The breakthrough moment came in July 2023 when Anthropic released Claude 2 with a 100,000 token context, proving that production-quality models could handle book-length inputs.

Model            Date        Context (tokens)
GPT-3            Jun 2020    4,096
GPT-3.5 Turbo    Mar 2023    16,384
GPT-4            Mar 2023    8,192
Claude 2         Jul 2023    100,000
GPT-4 Turbo      Nov 2023    128,000
Gemini 1.5 Pro   Feb 2024    1,000,000
Claude 3         Mar 2024    200,000
[Chart: Context window timeline — horizontal bar chart showing context window growth from GPT-3 (4,096 tokens, ≈3 pages) to Gemini 1.5 Pro (1,000,000 tokens, ≈10 novels), a 244x increase over four years.]

Practical token guide

Tokens are not words. As a rough guide, 1,000 tokens is approximately 750 words in English. Here is how common content types translate to token counts:

Content type                 Approximate tokens
Short email                  ~200
10-page PDF                  ~8,000
1-hour meeting transcript    ~10,000
100-page report              ~75,000
Novel (80,000 words)         ~106,000
Gemini 1.5 Pro (1M tokens)   ~10 novels simultaneously
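The 1,000 tokens ≈ 750 words rule of thumb can be turned into a quick back-of-envelope converter. A sketch only: actual counts depend on the model's tokeniser, and these helper names are illustrative:

```python
# Rough words/tokens conversion using the 0.75 words-per-token rule of thumb.
# Real counts vary by tokeniser and by language.
def words_to_tokens(words: int) -> int:
    return round(words / 0.75)

def tokens_to_words(tokens: int) -> int:
    return round(tokens * 0.75)

print(words_to_tokens(80_000))  # 106667 — close to the ~106,000 novel row above
```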

The attention quality caveat

A larger context window does not automatically mean better results. Research has consistently demonstrated a phenomenon called "lost in the middle" — models tend to attend well to content at the beginning and end of long inputs but pay less attention to content buried in the middle. This means that stuffing a 200,000 token context window with every piece of information you have may actually produce worse results than carefully selecting the most relevant 50,000 tokens.

The practical implication: context window size tells you the maximum capacity, not the optimal operating range. Treat it as a ceiling, not a target. For many tasks, retrieval-augmented generation (fetching only the relevant chunks) outperforms simply dumping everything into the context. The quality of what goes in matters as much as the quantity the model can accept.
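The "select the relevant chunks" idea can be illustrated with a toy retriever. Production systems score chunks with embeddings; this sketch uses simple word overlap, which is an assumption made purely to keep the example self-contained:

```python
# Toy retrieval: rank chunks by word overlap with the query and keep the top k,
# instead of stuffing every chunk into the context window.
def overlap_score(query: str, chunk: str) -> int:
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    return sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)[:k]

docs = [
    "context windows limit tokens",
    "pricing per million tokens",
    "lost in the middle attention",
]
print(top_chunks("how do context windows limit attention", docs, k=2))
```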

How sourc.dev tracks it

sourc.dev tracks the context window size in tokens for every model in the LLM directory. The value is verified against the provider's official documentation and recorded with a source URL and verification date. When a provider updates a model's context window — as has happened multiple times with GPT-4 variants and Claude models — the old value is preserved in the history table and the new value is recorded alongside it. Nothing is overwritten. The timeline shows exactly when each change happened and where the data came from.

Frequently asked questions

What happens when you exceed the context window?

The API will either return an error or silently truncate the oldest tokens in the conversation. The exact behaviour depends on the provider. OpenAI returns a 400 error with a message about maximum context length. Anthropic returns a similar error. Some wrapper libraries handle truncation automatically, but the underlying model cannot process more tokens than its context window allows. If you are building an application, you need to manage token counts proactively — do not wait for the error.
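Proactive management can be as simple as a preflight check before every request. A sketch under two stated assumptions: a crude characters-per-token heuristic standing in for a real tokeniser, and a caller-chosen headroom reserved for the response:

```python
def estimate_tokens(text: str) -> int:
    # Crude ~4 characters/token heuristic; swap in the provider's tokeniser
    # for accurate counts.
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, context_window: int, reserved_output: int) -> bool:
    """Check that the prompt plus the reserved response budget fits the window."""
    return estimate_tokens(prompt) + reserved_output <= context_window

# Example: an 8,192-token model with 1,000 tokens held back for the response.
print(fits_in_context("Summarise this report.", context_window=8_192, reserved_output=1_000))
```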

Does a larger context window mean better quality?

Not necessarily. A larger context window means the model can accept more input, but the quality of attention across that input varies. Research has shown that many models exhibit a "lost in the middle" effect — they attend well to content at the beginning and end but pay less attention to content in the middle. sourc.dev tracks context window size as a capability metric, not a quality indicator.

How are context windows measured?

Context windows are measured in tokens. A token is a chunk of text — roughly 3-4 characters in English, or about 0.75 words. Different models use different tokenisers, so the same text may produce different token counts across models. Most providers offer tokeniser tools or libraries so you can count tokens before sending a request.

What is the difference between context window and memory?

A context window is the model's working memory for a single API call — everything must fit within the token limit. Memory refers to persistent storage across conversations. Standard LLM APIs are stateless: each call is independent, and the model has no memory of previous calls. Some providers offer memory features as additional layers on top of the base API, but the context window itself resets with each call.

Which model has the largest context window?

As of early 2025, Google's Gemini 1.5 Pro holds the record at 1,000,000 tokens — approximately 750,000 words or ten full novels. Anthropic's Claude 3 models support 200,000 tokens. OpenAI's GPT-4 Turbo and GPT-4o support 128,000 tokens. The trend is clearly upward — context windows have grown 244x in four years.

Can I extend a model's context window?

You cannot extend the context window of a hosted API model — the limit is set by the provider. However, there are strategies to work within the limit: chunking (splitting documents and processing separately), summarisation (condensing earlier conversation turns), and retrieval-augmented generation (RAG), which stores content externally and retrieves relevant sections on demand. For open-weight models, researchers have explored positional encoding extensions, but these require retraining and may degrade quality.
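The chunking strategy can be sketched in a few lines. This version splits on paragraph boundaries and packs paragraphs greedily into a token budget; the token estimator is the same rough heuristic assumed above, not a real tokeniser:

```python
def estimate_tokens(text: str) -> int:
    # Rough ~4 characters/token heuristic; illustrative only.
    return max(1, len(text) // 4)

def chunk_document(text: str, budget: int) -> list[str]:
    """Split a document into paragraph-aligned chunks that each fit the budget."""
    chunks, current, used = [], [], 0
    for para in text.split("\n\n"):
        cost = estimate_tokens(para)
        if current and used + cost > budget:
            chunks.append("\n\n".join(current))   # close the full chunk
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can then be sent in its own API call, with the per-chunk results merged or summarised afterwards.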

See all LLMs tracked by context window →