What is input price per million tokens?

The simple definition

Input price per million tokens is the cost charged by an API provider for every one million tokens you send to the model. This is the cost of asking the question. Every character of your prompt, system instructions, conversation history, and attached documents is tokenised and counted. The provider multiplies your total input token count by the published rate and charges accordingly.

The unit "per million tokens" became the industry standard because individual tokens cost fractions of a cent. Expressing the price per million makes the numbers human-readable: $5.00 per million tokens is easier to reason about than $0.000005 per token.
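The arithmetic is simple enough to sketch in a few lines. The $5.00 rate below is illustrative (roughly GPT-4o-class input pricing at the time of writing), not a quote:

```python
# Cost of the input side of one request at a per-million-token rate.
PRICE_PER_MILLION = 5.00  # USD per 1M input tokens (illustrative rate)

def input_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for `tokens` input tokens at `price_per_million` USD/1M."""
    return tokens / 1_000_000 * price_per_million

# A 10,000-token prompt at $5.00 per million tokens costs about five cents.
print(f"${input_cost(10_000, PRICE_PER_MILLION):.2f}")  # $0.05
```

The division by one million is the whole trick: the published rate is per million tokens, so a single prompt almost always costs a fraction of a cent to a few cents.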

Why it matters

For applications that process large volumes of text — document analysis, classification, search indexing, customer support automation — input costs are often the dominant API expense. When you send a 10,000 token document to a model and receive a 500 token summary, 95% of your tokens (and a large share of your cost) are on the input side. At scale, a 10x difference in input price between two models with comparable quality can mean the difference between a viable product and an unprofitable one.

Input pricing also affects architecture decisions. Expensive input tokens incentivise shorter prompts, aggressive caching, and retrieval-augmented generation (sending only relevant chunks rather than full documents). Cheap input tokens make it viable to include more context, longer system prompts, and richer conversation histories. The price shapes the engineering.

The price collapse

LLM input prices have undergone one of the fastest price collapses in technology history. GPT-3 launched in June 2020 at $60.00 per million input tokens. By December 2024, DeepSeek V3 offered comparable general-purpose capability at $0.27 per million tokens — a 97% reduction in under five years. The deflation was not gradual. It came in sharp steps, each driven by a new model generation or a new competitor entering the market.

Model           Date       Input price / 1M tokens
GPT-3           Jun 2020   $60.00
GPT-3.5 Turbo   Mar 2023   $2.00
GPT-4           Mar 2023   $30.00
Claude 2        Jul 2023   $8.00
GPT-4 Turbo     Nov 2023   $10.00
GPT-4o          May 2024   $5.00
GPT-4o mini     Jul 2024   $0.15
DeepSeek V3     Dec 2024   $0.27
[Chart: Input price per million tokens — price collapse timeline. Log-scale bar chart from GPT-3 at $60.00 in June 2020 to DeepSeek V3 at $0.27 in December 2024, a 97% reduction in under five years.]

Cost calculator example

To make pricing concrete, here is a worked example. Imagine an application making 10,000 API calls per day, with an average of 500 input tokens and 200 output tokens per call. That produces 5 million input tokens and 2 million output tokens daily.

Model                                  Daily cost   Monthly cost
GPT-4o      ($5.00 in / $15.00 out)    $55.00       ~$1,650
GPT-4o mini ($0.15 in / $0.60 out)     $1.95        ~$60
DeepSeek V3 ($0.27 in / $1.10 out)     $3.55        ~$105
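The daily and monthly figures can be reproduced with a short script. The rates are copied from the example and will drift as providers reprice, so treat the output as a snapshot, not a live quote:

```python
# Daily cost for 10,000 calls/day averaging 500 input and 200 output tokens:
# 5M input tokens and 2M output tokens per day.
INPUT_MTOK, OUTPUT_MTOK = 5, 2  # millions of tokens per day

# (input price, output price) in USD per 1M tokens, as in the example above.
PRICES = {
    "GPT-4o": (5.00, 15.00),
    "GPT-4o mini": (0.15, 0.60),
    "DeepSeek V3": (0.27, 1.10),
}

for model, (p_in, p_out) in PRICES.items():
    daily = INPUT_MTOK * p_in + OUTPUT_MTOK * p_out
    print(f"{model}: ${daily:.2f}/day, ~${daily * 30:,.0f}/month")
```

Swapping in current rates for any model in the directory gives an instant like-for-like comparison for the same workload shape.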

Same workload, same call volume, same token counts — but the monthly bill ranges from $60 to $1,650 depending on which model you choose. This is why input pricing is one of the most important attributes in the LLM directory.

Input vs output — why the difference

Output tokens are typically 3-5x more expensive than input tokens. The reason is computational: processing input tokens can be parallelised efficiently — the model reads all input tokens at once. Generating output tokens is sequential — each new token depends on all previous tokens, requiring a separate forward pass through the model for every token produced. This sequential generation is the bottleneck, and the higher output price reflects the higher compute cost per token.

For cost optimisation, this asymmetry matters. Applications that send long prompts but receive short responses (classification, extraction, scoring) are input-heavy and benefit most from low input prices. Applications that generate long-form content (writing, code generation, analysis) are output-heavy and should pay close attention to the output price. Most real workloads are a mix of both.
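A quick way to feel the asymmetry is to price two hypothetical workload shapes at the same illustrative rates ($5.00 in / $15.00 out, assumed for the example):

```python
# Blended per-call cost at illustrative rates (not a quote from any provider).
P_IN, P_OUT = 5.00, 15.00  # USD per million tokens

def call_cost(in_tokens: int, out_tokens: int) -> float:
    """USD cost of one call given its input and output token counts."""
    return (in_tokens * P_IN + out_tokens * P_OUT) / 1_000_000

# Input-heavy: classify a 4,000-token document into a 10-token label.
classification = call_cost(4_000, 10)
# Output-heavy: a 200-token prompt that generates a 2,000-token draft.
generation = call_cost(200, 2_000)

print(f"classification: ${classification:.5f}")  # input tokens dominate
print(f"generation:     ${generation:.5f}")      # output tokens dominate
```

For the classification call, over 99% of the cost sits on the input side despite output tokens being three times the price; for the generation call, the ratio flips. This is why the two prices should be weighed against your actual token mix, not compared in isolation.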

How sourc.dev tracks it

sourc.dev checks the pricing page of every provider in the LLM directory daily. When a price changes, we record the new value, the date, and the source URL. The old value is preserved in the history table — we never overwrite. This means you can see the complete pricing timeline for any model: when it launched, when it was discounted, when a cheaper successor replaced it. The timeline is the asset.

Frequently asked questions

What is the difference between input price and output price?

Input price is what you pay to send text to the model — your prompt, system instructions, conversation history, and any documents. Output price is what you pay for the text the model generates in response. Output tokens are typically 3-5x more expensive than input tokens because generating text requires sequential computation that cannot be parallelised.

How do I calculate my monthly API cost?

Estimate your daily call volume, average input tokens per call, and average output tokens per call. Multiply to get daily token volumes for input and output separately. Divide each by one million and multiply by the respective price per million tokens. Add the two together for your daily cost, then multiply by 30 for monthly. Most providers offer usage dashboards that show actual consumption.
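The steps above can be written as a single function. The example call uses the same hypothetical workload as earlier (10,000 calls/day, 500 input / 200 output tokens) at GPT-4o-style rates:

```python
# Monthly cost estimate from call volume, average token counts, and rates.
def monthly_cost(calls_per_day: int,
                 avg_in_tokens: int, avg_out_tokens: int,
                 price_in: float, price_out: float,
                 days: int = 30) -> float:
    """Estimated monthly USD cost; prices are USD per 1M tokens."""
    daily_in_mtok = calls_per_day * avg_in_tokens / 1_000_000
    daily_out_mtok = calls_per_day * avg_out_tokens / 1_000_000
    daily = daily_in_mtok * price_in + daily_out_mtok * price_out
    return daily * days

print(monthly_cost(10_000, 500, 200, 5.00, 15.00))  # 1650.0
```

Treat the result as an estimate: real bills also reflect retries, prompt growth over time, and any per-feature surcharges, so reconcile against the provider's usage dashboard.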

Why do prices vary so much between providers?

Price reflects model size, infrastructure costs, provider margin, and competitive strategy. Open-weight models like DeepSeek and Llama can be self-hosted, which puts downward pressure on hosted pricing. Frontier models command premium prices because they offer capabilities that smaller models cannot match. The market is intensely competitive and prices shift frequently.

What caused the 97% price reduction?

Three forces drove the collapse. First, hardware efficiency — newer GPU generations deliver more inference throughput per dollar. Second, model architecture improvements — techniques like grouped-query attention, mixture-of-experts, and quantisation reduce compute per token. Third, competition — open-weight models from Meta, Mistral, and DeepSeek forced proprietary providers to cut prices aggressively.

Are there hidden costs beyond token pricing?

Yes. Token pricing is the direct API cost, but total cost of ownership includes: fine-tuning costs, embedding costs for RAG pipelines, storage for conversation logs, engineering time to manage rate limits and retries, and evaluation and testing costs. Some providers charge differently for features like function calling, image inputs, or batch processing. Always check the full pricing page.

Will LLM prices continue to fall?

Historical data strongly suggests continued price deflation for equivalent capability. Every generation of hardware and model architecture delivers more performance per dollar. However, frontier models will likely continue to launch at premium prices before being undercut by the next generation. The pattern resembles semiconductor pricing: last year's cutting edge becomes this year's commodity.

See all LLMs ranked by input price →