What is a context window?
The simple definition
A context window is the maximum number of tokens a language model can hold in working memory during a single API call. Think of it as a desk. Everything you want the model to consider — your prompt, any system instructions, conversation history, attached documents, and the model's own response — must fit on that desk. When the desk is full, nothing else can be added. Anything that does not fit is either rejected or silently dropped.
Both input and output tokens count toward the limit. If a model has a 128,000 token context window and you send 120,000 tokens of input, the model can only generate up to 8,000 tokens in response. The context window is shared space.
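The shared-budget arithmetic can be sketched in a few lines. This is a minimal illustration of the figures above, not any provider's API; the `CONTEXT_WINDOW` constant and function name are assumptions for the example:

```python
# Input and output share one budget: whatever the input uses, the
# response cannot have. 128,000 is the GPT-4 Turbo figure from the text.
CONTEXT_WINDOW = 128_000

def max_output_tokens(input_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens remain for the model's response."""
    return max(context_window - input_tokens, 0)

print(max_output_tokens(120_000))  # 8000 tokens left for the response
```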
Why it matters
Context window size determines what you can accomplish in a single API call. A small context window forces you to break work into pieces — chunking documents, summarising earlier conversation turns, or using retrieval systems to fetch relevant excerpts. A large context window lets you process entire codebases, full legal contracts, or hour-long meeting transcripts in one pass without any intermediate steps.
For developers building applications, the context window is a hard engineering constraint. A chatbot that maintains conversation history will eventually hit the limit. A document analysis pipeline must check whether each document fits. A code review tool must know whether the full repository context can be included. The number matters every day.
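A document pipeline's "does this fit?" check can be sketched as follows. The 4-characters-per-token estimate is a rough English-text heuristic, and all names and limits here are illustrative assumptions rather than a real API:

```python
# Rough pre-flight check: estimate tokens from character count and
# reserve headroom for the model's reply before sending anything.
def estimate_tokens(text: str) -> int:
    return len(text) // 4  # coarse heuristic for English text

def fits_in_context(text: str, context_window: int, reserved_output: int = 1_000) -> bool:
    return estimate_tokens(text) + reserved_output <= context_window

doc = "word " * 4_000  # ~20,000 characters -> ~5,000 estimated tokens
print(fits_in_context(doc, context_window=8_192))  # True
print(fits_in_context(doc, context_window=4_096))  # False
```

A real pipeline would use the provider's tokeniser for exact counts; the heuristic only decides whether an exact count is worth computing.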
The context window expansion
In four years, context windows grew 244x. GPT-3 launched in June 2020 with a 4,096 token context — enough for roughly three pages of text. By February 2024, Gemini 1.5 Pro offered 1,000,000 tokens — enough for approximately ten full novels. The breakthrough moment came in July 2023 when Anthropic released Claude 2 with a 100,000 token context, proving that production-quality models could handle book-length inputs.
| Model | Date | Context (tokens) |
|---|---|---|
| GPT-3 | Jun 2020 | 4,096 |
| GPT-4 | Mar 2023 | 8,192 |
| GPT-3.5 Turbo 16k | Jun 2023 | 16,384 |
| Claude 2 | Jul 2023 | 100,000 |
| GPT-4 Turbo | Nov 2023 | 128,000 |
| Gemini 1.5 Pro | Feb 2024 | 1,000,000 |
| Claude 3 | Mar 2024 | 200,000 |
Practical token guide
Tokens are not words. As a rough guide, 1,000 tokens is approximately 750 words in English. Here is how common content types translate to token counts:
| Content type | Approximate tokens |
|---|---|
| Short email | ~200 |
| 10-page PDF | ~8,000 |
| 1-hour meeting transcript | ~10,000 |
| 100-page report | ~75,000 |
| Novel (80,000 words) | ~106,000 |
| Ten novels (Gemini 1.5 Pro's full window) | ~1,000,000 |
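The rule of thumb above (1,000 tokens ≈ 750 words of English) can be turned into a pair of quick converters. These are coarse estimates only; real counts depend on the model's tokeniser:

```python
# Words-to-tokens conversion using the ~0.75 words-per-token rule of thumb.
def words_to_tokens(words: int) -> int:
    return round(words / 0.75)

def tokens_to_words(tokens: int) -> int:
    return round(tokens * 0.75)

print(words_to_tokens(80_000))     # ~106,667 tokens, matching the novel row
print(tokens_to_words(1_000_000))  # ~750,000 words
```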
The attention quality caveat
A larger context window does not automatically mean better results. Research has consistently demonstrated a phenomenon called "lost in the middle" — models tend to attend well to content at the beginning and end of long inputs but pay less attention to content buried in the middle. This means that stuffing a 200,000 token context window with every piece of information you have may actually produce worse results than carefully selecting the most relevant 50,000 tokens.
The practical implication: context window size tells you the maximum capacity, not the optimal operating range. Treat it as a ceiling, not a target. For many tasks, retrieval-augmented generation (fetching only the relevant chunks) outperforms simply dumping everything into the context. The quality of what goes in matters as much as the quantity the model can accept.
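"Carefully selecting the most relevant tokens" can be sketched with naive keyword-overlap scoring. Real retrieval systems use embeddings and vector search; this toy version, with invented chunk text and an assumed 4-characters-per-token estimate, only shows the shape of the idea — rank chunks by relevance, then pack them into a token budget:

```python
# Naive relevance scoring: count how many of a chunk's words appear in the query.
def score(chunk: str, query: str) -> int:
    query_words = set(query.lower().split())
    return sum(word in query_words for word in chunk.lower().split())

def select_chunks(chunks: list[str], query: str, budget_tokens: int) -> list[str]:
    ranked = sorted(chunks, key=lambda c: score(c, query), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        cost = len(chunk) // 4  # rough token estimate
        if used + cost <= budget_tokens:
            picked.append(chunk)
            used += cost
    return picked

chunks = [
    "context window limits apply",
    "unrelated marketing copy",
    "token limits and windows",
]
# The irrelevant chunk is dropped once the budget runs out.
print(select_chunks(chunks, "context window token limit", budget_tokens=12))
```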
How sourc.dev tracks it
sourc.dev tracks the context window size in tokens for every model in the LLM directory. The value is verified against the provider's official documentation and recorded with a source URL and verification date. When a provider updates a model's context window — as has happened multiple times with GPT-4 variants and Claude models — the old value is preserved in the history table and the new value is recorded alongside it. Nothing is overwritten. The timeline shows exactly when each change happened and where the data came from.
Frequently asked questions
What happens when you exceed the context window?
The API will either return an error or silently truncate the oldest tokens in the conversation. The exact behaviour depends on the provider. OpenAI returns a 400 error with a message about maximum context length. Anthropic returns a similar error. Some wrapper libraries handle truncation automatically, but the underlying model cannot process more tokens than its context window allows. If you are building an application, you need to manage token counts proactively — do not wait for the error.
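Proactive management of a chatbot's history can be sketched as trimming the oldest turns before the estimated total exceeds the window, instead of waiting for the provider's error. The token estimate and the limits used are assumptions for illustration:

```python
# Drop the oldest conversation turns until the estimated total fits,
# keeping headroom reserved for the model's response.
def estimate_tokens(text: str) -> int:
    return len(text) // 4  # coarse heuristic for English text

def trim_history(messages: list[str], context_window: int, reserved_output: int) -> list[str]:
    budget = context_window - reserved_output
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from the most recent turn
        cost = estimate_tokens(message)
        if used + cost > budget:
            break  # everything older than this is discarded
        kept.append(message)
        used += cost
    return list(reversed(kept))
```

An exact implementation would count tokens with the provider's tokeniser, but the shape is the same: newest turns survive, oldest turns go first.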
Does a larger context window mean better quality?
Not necessarily. A larger context window means the model can accept more input, but the quality of attention across that input varies. Research has shown that many models exhibit a "lost in the middle" effect — they attend well to content at the beginning and end but pay less attention to content in the middle. sourc.dev tracks context window size as a capability metric, not a quality indicator.
How are context windows measured?
Context windows are measured in tokens. A token is a chunk of text — roughly 3-4 characters in English, or about 0.75 words. Different models use different tokenisers, so the same text may produce different token counts across models. Most providers offer tokeniser tools or libraries so you can count tokens before sending a request.
What is the difference between context window and memory?
A context window is the model's working memory for a single API call — everything must fit within the token limit. Memory refers to persistent storage across conversations. Standard LLM APIs are stateless: each call is independent, and the model has no memory of previous calls. Some providers offer memory features as additional layers on top of the base API, but the context window itself resets with each call.
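Statelessness means "memory" in a chat application is just the client resending prior turns inside each new request. A minimal sketch, where the request shape is illustrative rather than any provider's real schema:

```python
# Each request carries the full history -- the API remembers nothing
# between calls, and every resent turn counts against the context window.
history: list[dict] = []

def build_request(user_message: str) -> dict:
    history.append({"role": "user", "content": user_message})
    return {"messages": list(history)}  # whole history, every call

def record_reply(reply: str) -> None:
    history.append({"role": "assistant", "content": reply})

req = build_request("What is a context window?")
print(len(req["messages"]))  # 1 message in the first call
record_reply("The model's working memory for one call.")
req = build_request("And how big is it?")
print(len(req["messages"]))  # 3 messages: the history grows with each turn
```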
Which model has the largest context window?
As of early 2025, Google's Gemini 1.5 Pro holds the record at 1,000,000 tokens — approximately 750,000 words or ten full novels. Anthropic's Claude 3 models support 200,000 tokens. OpenAI's GPT-4 Turbo and GPT-4o support 128,000 tokens. The trend is clearly upward — context windows have grown 244x in four years.
Can I extend a model's context window?
You cannot extend the context window of a hosted API model — the limit is set by the provider. However, there are strategies to work within the limit: chunking (splitting documents and processing separately), summarisation (condensing earlier conversation turns), and retrieval-augmented generation (RAG), which stores content externally and retrieves relevant sections on demand. For open-weight models, researchers have explored positional encoding extensions, but these require retraining and may degrade quality.
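The chunking strategy can be sketched as splitting a document into pieces that each fit a per-chunk token budget, again using the rough 4-characters-per-token estimate as an assumption:

```python
# Split a long text into fixed-size chunks sized by an estimated token budget.
def chunk_text(text: str, max_tokens: int) -> list[str]:
    max_chars = max_tokens * 4  # rough chars-per-token heuristic
    return [text[start:start + max_chars] for start in range(0, len(text), max_chars)]

document = "x" * 10_000                          # ~2,500 estimated tokens
pieces = chunk_text(document, max_tokens=1_000)  # 4,000 characters per chunk
print(len(pieces))  # 3 chunks: 4,000 + 4,000 + 2,000 characters
```

A production pipeline would split on sentence or paragraph boundaries rather than raw character offsets, but the budget arithmetic is the same.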