#2 of 25

Context window

It determines what your product can actually do — and what it cannot

What is a context window

Think of it as the model's desk.

Everything on the desk, the model can see and work with. Your instructions, the conversation history, the document you uploaded, the code you pasted — if it is on the desk, the model uses it. The moment the desk is full, something has to come off before anything new can go on.

That is a context window: the maximum amount of text a model can hold in a single request, input and output combined.
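A back-of-envelope sketch of that budget. This assumes the common rough rule of about 4 English characters per token; a real tokenizer gives exact counts, and the function names here are hypothetical:

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English text."""
    return len(text) // 4

def fits_in_window(prompt: str, max_output_tokens: int,
                   window: int = 200_000) -> bool:
    """Input and output share one window, so reserve room for the reply."""
    return estimate_tokens(prompt) + max_output_tokens <= window

# A 22,000-character prompt plus a 4,000-token reply fits comfortably
# inside a 200,000-token window.
print(fits_in_window("Summarise this report. " * 1000, max_output_tokens=4_000))
```

The point of the check is the sum: the reply you ask for comes out of the same budget as the prompt you send.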

The number that makes it real

Claude 3.5 Sonnet has a 200,000 token context window — roughly 150,000 words, or two full novels, in a single request. GPT-4o has 128,000. Gemini 1.5 Pro has 1,000,000.

In 2020, the largest context window available was 2,048 tokens. About three pages of text. That number has grown nearly 500× since. The desk got dramatically bigger.

Why this matters to you

Context window size is not a spec sheet detail. It is the ceiling of what you can build.

Small context window: you summarise documents before sending them. You split long conversations. Your product forgets things. You build workarounds for limitations the model imposes.

Large context window: you send the whole document. The model holds the full conversation. You let it reason across an entire codebase at once. The workarounds disappear.
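The small-window workaround above — splitting a long document into pieces that each fit — can be sketched in a few lines. Same ~4 characters-per-token assumption as before; the function name is hypothetical:

```python
def chunk_text(text: str, max_tokens: int,
               chars_per_token: int = 4) -> list[str]:
    """Split text into pieces that each fit a small context window."""
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# A 50,000-character document against a 2020-era 2,048-token window:
# seven chunks, seven separate requests, no shared memory between them.
pieces = chunk_text("x" * 50_000, max_tokens=2_048)
print(len(pieces))
```

With a 200,000-token window the same document is one chunk, one request. That is the workaround disappearing.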

The difference between a chatbot that forgets what you said four messages ago and one that holds everything — that is a context window decision.

How to use this

Bigger is not always better. Performance can degrade at extreme context lengths, and you pay for every token either way. A 200,000 token window does not mean you should fill it. Precise context — only what the model needs — outperforms stuffed context almost every time.
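"Precise context" in practice often means ranking candidate snippets and keeping only the best ones that fit a deliberate budget. A minimal sketch, assuming relevance scores come from some retrieval step (not shown) and the usual ~4 characters-per-token estimate; all names are hypothetical:

```python
def build_context(snippets: list[tuple[float, str]],
                  budget_tokens: int) -> str:
    """Keep the most relevant snippets that fit the token budget.

    `snippets` pairs a relevance score (from retrieval, assumed here)
    with its text. Greedily takes highest-scored snippets first.
    """
    chosen, used = [], 0
    for score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = len(text) // 4  # rough ~4 chars/token estimate
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return "\n\n".join(chosen)

# The relevant paragraph makes the cut; the long, barely related
# appendix does not, even though the window could physically hold it.
context = build_context(
    [(0.9, "relevant paragraph"), (0.2, "barely related appendix " * 200)],
    budget_tokens=100,
)
```

The budget here is a choice, not the window size: you pick it as small as the task allows.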

Think of it as the desk again. A cluttered desk slows you down too.

1 million token context window. That is approximately 10 novels. Unfortunately it still cannot remember where you left your keys.

Verified March 2026 · Source: Anthropic, OpenAI, Google model documentation

← All terms