Context window
Maximum tokens the model can hold in context — input + output combined
What is a context window?
The context window is the maximum number of tokens an LLM can process in a single call, counting input and output together. Think of it as the model's desk: everything it can see at once. Gemini 1.5 Pro has a 1,000,000-token context window, roughly 750,000 words or about 1,500 pages. GPT-4o has 128,000 tokens. The difference determines whether you can process an entire codebase in one call or must split it into chunks.
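The arithmetic behind these figures can be sketched as follows. Two assumptions are involved. The first is the 0.75 words-per-token heuristic used elsewhere on this page. The second is about 500 words per printed page, which is what makes 750,000 words come out to 1,500 pages. The function names are illustrative and do not belong to any provider's API.

```python
def fits_in_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the tokens reserved for the reply fit in the window.

    Input and generated output share the same limit, so both are counted.
    """
    return input_tokens + max_output_tokens <= context_window


def tokens_to_words(tokens: int, words_per_token: float = 0.75) -> int:
    # Rough heuristic for English text: about 0.75 words per token.
    return round(tokens * words_per_token)


def words_to_pages(words: int, words_per_page: int = 500) -> int:
    # Assumption: roughly 500 words per printed page.
    return round(words / words_per_page)


GEMINI_15_PRO_WINDOW = 1_000_000
GPT_4O_WINDOW = 128_000

words = tokens_to_words(GEMINI_15_PRO_WINDOW)
print(words, words_to_pages(words))  # 750000 1500
# A 120,000-token prompt plus a 16,000-token reply reserve needs 136,000 tokens in total.
print(fits_in_context(120_000, 16_000, GPT_4O_WINDOW))  # False
```

The same 120,000-token prompt would fit within a 128,000-token window if the reply reserve were reduced to 8,000 tokens. Whatever you request as maximum output has to come out of the same budget.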
Why it matters
If you need to process a 50-page legal contract in one call, you need roughly 40,000 tokens of context. A model with an 8,192-token window cannot do it. You would have to split the contract into chunks and summarise each one, losing information at every step. A model with 128,000 or more tokens handles it in one pass. The context window is a hard constraint that shapes your application architecture. sourc.dev tracks context_window_tokens across all models, and the leaderboard at /leaderboards/largest-context-window shows who leads.
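The sketch below shows how many calls the 40,000-token contract would need, and how a token sequence can be split into window-sized pieces. It assumes 1,024 tokens are reserved for each reply; that figure is illustrative. It also splits on raw token boundaries for simplicity. Real pipelines often split on clause or paragraph boundaries instead, so that a sentence is not cut in half.

```python
import math
from typing import List, Sequence


def chunk_count(doc_tokens: int, context_window: int, reserved_output: int) -> int:
    """Calls needed if each call gets (window - reserved output) tokens of input."""
    usable = context_window - reserved_output
    if usable <= 0:
        raise ValueError("reserved_output must be smaller than the context window")
    return math.ceil(doc_tokens / usable)


def chunk_tokens(tokens: Sequence[int], chunk_size: int) -> List[Sequence[int]]:
    """Split a token sequence into consecutive, non-overlapping chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# About 40,000 tokens against an 8,192-token window: 7,168 usable tokens per call.
print(chunk_count(40_000, 8_192, 1_024))    # 6
# The same contract against a 128,000-token window.
print(chunk_count(40_000, 128_000, 1_024))  # 1
```

Each of the six chunks would be summarised on its own, and the summaries would then be combined. That combining step is where information from one part of the contract can lose touch with related material in another part.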
Where models stand
Data available for 34 of 271 tracked entities.
How sourc.dev tracks this
sourc.dev tracks context window size through its automated monitoring pipeline. Data is collected on a regular schedule and compared against previous values. Any change is recorded in the history table with full provenance: source URL, effective date, and verification timestamp. Nothing is overwritten. The pipeline keeps this attribute current without manual intervention.
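An append-only history of this kind can be sketched as follows. The schema (`HistoryRow`) and the helpers (`latest_value`, `record_observation`) are hypothetical, as are the model ID and URL. They illustrate the "compare, then append, never overwrite" pattern and are not sourc.dev's actual implementation.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class HistoryRow:
    """One immutable history row (hypothetical schema)."""
    model_id: str
    context_window_tokens: int
    source_url: str
    effective_date: date
    verified_at: datetime


def latest_value(history: List[HistoryRow], model_id: str) -> Optional[int]:
    """Most recently recorded value for a model, or None if it has never been seen."""
    for row in reversed(history):
        if row.model_id == model_id:
            return row.context_window_tokens
    return None


def record_observation(
    history: List[HistoryRow],
    model_id: str,
    tokens: int,
    source_url: str,
    effective_date: date,
) -> bool:
    """Append a new row only if the observed value differs from the latest one.

    Existing rows are never modified or deleted, so every past value stays
    available with its provenance. Returns True when a row was appended.
    """
    if latest_value(history, model_id) == tokens:
        return False
    history.append(
        HistoryRow(
            model_id=model_id,
            context_window_tokens=tokens,
            source_url=source_url,
            effective_date=effective_date,
            verified_at=datetime.now(timezone.utc),
        )
    )
    return True


history: List[HistoryRow] = []
url = "https://example.com/model-docs"  # placeholder source URL
record_observation(history, "example-model", 128_000, url, date(2024, 5, 1))  # appended
record_observation(history, "example-model", 128_000, url, date(2024, 6, 1))  # unchanged, skipped
record_observation(history, "example-model", 200_000, url, date(2024, 9, 1))  # appended
print([row.context_window_tokens for row in history])  # [128000, 200000]
```

Frozen dataclasses and append-only writes make the "nothing is overwritten" property structural rather than a matter of convention. A production pipeline might also log checks that found no change. This sketch skips them.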
What happens if a request exceeds the context window?
Depending on the provider, the API either returns an error or silently truncates the oldest tokens. OpenAI returns a 400 error, and Anthropic returns a similar error. If you are building an application, count tokens before each call and trim or chunk the input yourself rather than relying on the API to reject it.
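One way to manage token counts proactively is to drop the oldest messages on the client side until the rest fit. In the sketch below, `trim_to_budget` is a hypothetical helper. `rough_count` is a placeholder heuristic of about 4 characters per token; in practice you would pass in a counter backed by the provider's tokeniser.

```python
from typing import Callable, List


def trim_to_budget(
    messages: List[str],
    context_window: int,
    reserved_output: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Keep the newest messages whose total tokens, plus the output reserve, fit the window.

    Older messages are dropped first, mirroring oldest-first truncation, but on the
    client side so the request is not rejected for length.
    """
    budget = context_window - reserved_output
    kept: List[str] = []
    used = 0
    for message in reversed(messages):  # newest first
        cost = count_tokens(message)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def rough_count(text: str) -> int:
    # Placeholder heuristic: about 4 characters per token, rounded up.
    return -(-len(text) // 4)


conversation = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each under rough_count
print(trim_to_budget(conversation, context_window=30, reserved_output=5, count_tokens=rough_count))
```

With a 25-token input budget, only the two newest messages survive. Chat applications often pin the system prompt and apply this kind of trimming only to the conversation turns.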
Is a larger context window always better?
Not necessarily. Research has shown a "lost in the middle" effect: models attend well to the beginning and end of long inputs but less well to content in the middle. A model with a 200K context window may not use all of those tokens with equal quality.
How is the context window measured?
In tokens. A token is roughly 3-4 characters of English text, or about 0.75 words. Different models use different tokenisers, so the same text can produce different token counts. Most providers offer tokeniser libraries or token-counting endpoints so you can count tokens before sending a request.
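The rules of thumb above can be turned into quick estimates. The sketch below uses only the standard library, and the function names are illustrative. For exact counts on OpenAI models, the `tiktoken` package is one option: call `tiktoken.encoding_for_model("gpt-4o")`, then `len(enc.encode(text))`. Other providers ship their own counters.

```python
import math
from typing import Tuple


def tokens_from_chars(text: str) -> Tuple[int, int]:
    """(low, high) token estimate assuming 4 and 3 characters per token respectively."""
    n = len(text)
    return math.ceil(n / 4), math.ceil(n / 3)


def tokens_from_words(text: str, words_per_token: float = 0.75) -> int:
    """Token estimate assuming about 0.75 whitespace-separated words per token."""
    return math.ceil(len(text.split()) / words_per_token)


sample = "The context window is measured in tokens, not words."
print(tokens_from_chars(sample))  # (13, 18)
print(tokens_from_words(sample))  # 12
```

The two heuristics disagree even on one sentence, and real tokenisers will differ again. Estimates like these are best used as a safety margin, not as an exact count.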