sourc.dev
Claude 3.5 Sonnet input $3.00/1M ↓ -50%
GPT-4o input $2.50/1M
Gemini 1.5 Pro input $1.25/1M
Mistral Large input $2.00/1M ↓ -33%
DeepSeek V3 input $0.27/1M
synced 2026-04-05

Output price (per 1M tokens)

Cost in USD to receive 1 million output tokens

What is output price (per 1M tokens)?

Output price is what you pay per million tokens the model generates, that is, its response to you. Output is almost always priced higher than input because generation is sequential (each token depends on the previous one), while input can be read in parallel. A typical model charges 3–5× more for output than for input. If your application generates long responses, such as code, articles, or analysis, output cost dominates your bill, so controlling output length is often the single most effective cost optimisation.
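A minimal sketch of how input and output cost add up for one request. The prices are illustrative assumptions (output set at 4× input, inside the 3–5× range above) and are not quotes for any specific model; the helper names are hypothetical.

```python
# Split one request's cost into its input and output parts.
# Prices are assumed for illustration: $2.50/1M input, $10.00/1M output.

PER_MILLION = 1_000_000


def token_cost(tokens: int, price_per_1m: float) -> float:
    """Cost in USD for `tokens` tokens at a price quoted per 1M tokens."""
    return tokens * price_per_1m / PER_MILLION


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1m: float, output_price_per_1m: float) -> dict:
    """Return input, output and total cost, plus the output share of the total."""
    input_cost = token_cost(input_tokens, input_price_per_1m)
    output_cost = token_cost(output_tokens, output_price_per_1m)
    total = input_cost + output_cost
    return {
        "input_usd": input_cost,
        "output_usd": output_cost,
        "total_usd": total,
        "output_share": output_cost / total if total else 0.0,
    }


# A 1,000-token prompt that produces a 1,500-token answer.
breakdown = request_cost(1_000, 1_500, input_price_per_1m=2.50, output_price_per_1m=10.00)
for key, value in breakdown.items():
    print(f"{key}: {value:.4f}")
```

With these assumed prices the prompt and the answer are of similar length, yet roughly 86% of the request's cost comes from output.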

Why it matters

A chain-of-thought response might generate 2,000 output tokens. At $15/1M (Claude 3 Opus output), that single response costs $0.03. At $0.60/1M (GPT-4o-mini output), it costs $0.0012. If your agent makes 8 calls per query, the output cost per query is $0.24 vs $0.0096. At 10,000 queries per day, that is $2,400/day vs $96/day. If both models can handle the task, that entire difference comes from model choice. sourc.dev tracks output_price_per_1m so you can make this comparison with verified numbers.
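The same arithmetic written out as a short script, so the per-call, per-query, and per-day figures can be rechecked or rerun with other prices. Token counts, call counts, and prices are the ones stated in the paragraph; the variable names are only for illustration.

```python
# Output cost of a multi-call agent at two output prices (USD per 1M output tokens).

PER_MILLION = 1_000_000

OUTPUT_TOKENS_PER_CALL = 2_000
CALLS_PER_QUERY = 8
QUERIES_PER_DAY = 10_000

prices = {
    "Claude 3 Opus": 15.00,
    "GPT-4o-mini": 0.60,
}

results = {}
for model, price in prices.items():
    per_call = OUTPUT_TOKENS_PER_CALL * price / PER_MILLION
    per_query = per_call * CALLS_PER_QUERY
    per_day = per_query * QUERIES_PER_DAY
    results[model] = (per_call, per_query, per_day)
    print(f"{model}: ${per_call:.4f}/call, ${per_query:.4f}/query, ${per_day:,.2f}/day")
```

It prints $0.24 per query and $2,400/day for the first model, and $0.0096 per query and $96/day for the second.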

Where models stand

#1
#2  75 USD
#3  o1  60 USD
#4  60 USD

Data available for 34 of 271 tracked entities.

How sourc.dev tracks this

sourc.dev tracks output price (per 1M tokens) through its automated monitoring pipeline. Data is collected on a regular schedule and compared against previous values. Any change is recorded in the history table with full provenance: source URL, effective date, and verification timestamp. Nothing is overwritten, and the pipeline keeps this attribute current without manual intervention.
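A hypothetical sketch of an append-only price history like the one described: a new row is added only when the price changes, and earlier rows are never edited. The class, field, and function names are assumptions for illustration, not sourc.dev's actual schema or code, and the sketch assumes observations arrive in chronological order.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class PriceObservation:
    """One observed output price for one tracked entity (hypothetical schema)."""
    entity: str
    output_price_per_1m: float
    source_url: str
    effective_date: date
    verified_at: datetime


def record_if_changed(history: list[PriceObservation], new: PriceObservation) -> bool:
    """Append `new` only if its price differs from the latest row for that entity.

    Assumes rows arrive in chronological order. Existing rows are never edited
    or deleted; a change is always a new row. Returns True if a row was added.
    """
    latest = next((obs for obs in reversed(history) if obs.entity == new.entity), None)
    if latest is not None and latest.output_price_per_1m == new.output_price_per_1m:
        return False  # price unchanged: nothing new to record
    history.append(new)
    return True


# Usage with made-up values and a placeholder URL.
history: list[PriceObservation] = []
source = "https://example.com/pricing"
now = datetime.now(timezone.utc)
record_if_changed(history, PriceObservation("model-a", 15.0, source, date(2025, 1, 1), now))
record_if_changed(history, PriceObservation("model-a", 15.0, source, date(2025, 2, 1), now))
record_if_changed(history, PriceObservation("model-a", 7.5, source, date(2025, 3, 1), now))
print(len(history))  # 2: the unchanged February observation was not recorded
```

Because rows are only ever appended, the full price history and its provenance stay queryable after every change.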

Frequently asked questions
Why is output more expensive than input?

Output generation is computationally heavier. Reading input is parallel — the model processes all tokens at once. Generating output is sequential — each token depends on the previous one. This sequential computation costs more GPU time.
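A toy illustration (not a real model) of why output is sequential: the number of back-to-back model calls grows with the number of generated tokens, not with the prompt length. `toy_forward` is a hypothetical stand-in for a forward pass. Real inference servers cache intermediate state (the KV cache) so each decoding step only processes the new token; this toy re-reads the whole context for simplicity.

```python
def toy_forward(tokens: list[int]) -> int:
    """Stand-in for a forward pass: reads every token in `tokens` and returns a next token."""
    return (sum(tokens) + len(tokens)) % 50_000


def generate(prompt: list[int], n_new: int) -> tuple[list[int], int]:
    """Generate `n_new` tokens; return them plus the number of sequential calls made."""
    context = list(prompt)
    generated: list[int] = []
    sequential_calls = 0
    for _ in range(n_new):
        # The first call reads the whole prompt at once (the "input" side).
        # Every later call needs the token produced by the call before it,
        # so the calls have to run one after another.
        next_token = toy_forward(context)
        sequential_calls += 1
        context.append(next_token)
        generated.append(next_token)
    return generated, sequential_calls


# A long prompt with a short answer needs few sequential calls...
_, calls_short_answer = generate(list(range(1_000)), n_new=1)
# ...while a short prompt with a long answer needs one call per output token.
_, calls_long_answer = generate([1, 2, 3], n_new=500)
print(calls_short_answer, calls_long_answer)  # 1 500
```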

How can I reduce output costs?

Limit max_tokens in your API calls, use concise system prompts that encourage brief responses, and consider whether a smaller model can handle the task. For structured output, JSON mode often produces shorter responses than free-form text.
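A minimal sketch of choosing a max_tokens value from a per-call output budget, assuming the provider bills every generated token at the output rate. The helper names are hypothetical, and the exact name of the length-limit parameter varies by provider.

```python
import math

PER_MILLION = 1_000_000


def max_tokens_for_budget(budget_usd: float, output_price_per_1m: float) -> int:
    """Largest output-token cap whose worst-case output cost fits in budget_usd."""
    if output_price_per_1m <= 0:
        raise ValueError("output_price_per_1m must be positive")
    return math.floor(budget_usd * PER_MILLION / output_price_per_1m)


def worst_case_output_cost(max_tokens: int, output_price_per_1m: float) -> float:
    """Most a single call can spend on output if it uses every allowed token."""
    return max_tokens * output_price_per_1m / PER_MILLION


# Cap each call at one cent of output spend, at two different output prices.
for price in (15.00, 0.60):
    cap = max_tokens_for_budget(0.01, price)
    print(f"${price:.2f}/1M -> max_tokens={cap}, "
          f"worst case ${worst_case_output_cost(cap, price):.4f}")
```

Rounding down keeps the worst-case spend at or under the budget; the same budget allows far more tokens on a cheaper model.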

Do thinking tokens count toward output cost?

For models with chain-of-thought or "thinking" modes, the thinking tokens typically count as output tokens and are billed at the output rate. This can significantly increase costs for reasoning-heavy tasks.
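A rough sketch of how thinking tokens change the bill, assuming (as the answer above says is typical) that they are billed at the output rate. The $60/1M price matches the o1 figure in the ranking above; the token counts are hypothetical.

```python
PER_MILLION = 1_000_000


def response_output_cost(visible_tokens: int, thinking_tokens: int,
                         output_price_per_1m: float) -> float:
    """Output cost when thinking tokens are billed at the same rate as visible output."""
    billed_tokens = visible_tokens + thinking_tokens
    return billed_tokens * output_price_per_1m / PER_MILLION


# A 500-token answer, with and without 4,000 hidden thinking tokens.
visible_only = response_output_cost(500, 0, 60.0)
with_thinking = response_output_cost(500, 4_000, 60.0)
print(f"visible only: ${visible_only:.4f}")             # visible only: $0.0300
print(f"with thinking: ${with_thinking:.4f}")           # with thinking: $0.2700
print(f"multiplier: {with_thinking / visible_only:.1f}x")  # multiplier: 9.0x
```

The user sees the same 500-token answer in both cases, but the billed output is nine times larger when the model thinks first.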