sourc.dev
Claude 3.5 Sonnet input $3.00/1M ↓ -50%
GPT-4o input $2.50/1M
Gemini 1.5 Pro input $1.25/1M
Mistral Large input $2.00/1M ↓ -33%
DeepSeek V3 input $0.27/1M
synced 2026-04-05

Glossary

50 terms. No jargon. Real numbers. Every explanation grounded in data from sourc.dev.

Token You already know what a token is. You just don't know you know.
5 related
Context window Think of it as the model's desk.
4 related
Input price Input price is what you pay to send text to a model.
4 related
Output price Output price is what you pay for the model's response.
4 related
What does "per 1M tokens" mean? It is a pricing unit. Like kilowatt-hours for electricity, or per-kilometre for fuel. A way to compare costs at a standa…
4 related
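A worked example of the unit: divide the token count by one million, then multiply by the listed price. This uses the GPT-4o input price from the ticker at the top of the page; the 200,000-token request is a hypothetical figure chosen for round numbers.

$$
\text{cost} = \frac{\text{tokens}}{1{,}000{,}000} \times \text{price per 1M} = \frac{200{,}000}{1{,}000{,}000} \times \$2.50 = 0.2 \times \$2.50 = \$0.50
$$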
Function calling You have used a phone menu. Press 1 for billing. Press 2 for support. Press 3 for account. The menu does not answer your…
4 related
Vision / image input Before vision, you described things to the model. Now you can show it.
4 related
Streaming You send a request. You wait. Nothing happens. Five seconds. Eight seconds. Twelve seconds. A full response appears at o…
3 related
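A minimal sketch of what a streaming client does: print each chunk the moment it arrives instead of waiting for the whole reply. Real provider SDKs expose streams in their own ways; here `fake_stream` is a hypothetical generator standing in for the network, yielding one word at a time.

```python
import time
from typing import Iterator

def fake_stream(text: str, delay: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for a streaming API: yields the reply word by word."""
    for word in text.split(" "):
        time.sleep(delay)  # simulate generation time per chunk
        yield word + " "

def consume(stream: Iterator[str]) -> str:
    """Show each chunk immediately, and also collect the full text."""
    parts = []
    for chunk in stream:
        print(chunk, end="", flush=True)  # the user sees text before the reply is done
        parts.append(chunk)
    print()
    return "".join(parts).rstrip()

full = consume(fake_stream("Streaming shows partial output while the rest is generated.", delay=0.05))
```

The total time is the same either way; streaming only changes when the first words become visible.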
MCP — Model Context Protocol For most of AI's recent history, every tool was an island.
4 related
Free tier A free tier is a usage allowance that costs nothing. Most major model providers offer one — a number of requests, tokens…
3 related
EU data residency Think of it as keeping your filing cabinet inside the building.
3 related
MMLU MMLU stands for Massive Multitask Language Understanding. It is a benchmark — a standardised test for language models — …
2 related
HumanEval HumanEval is a coding benchmark. 164 programming problems, each with a function signature, a docstring describing what t…
3 related
API Every time you check the weather on your phone, your phone does not go outside.
4 related
REST API Imagine a restaurant.
4 related
API key It is a password. A long one, impossible to guess, that identifies your application when it calls an API.
3 related
SDK SDK stands for Software Development Kit. It is a library — code someone else wrote — that wraps an API and makes it easi…
3 related
Rate limit A rate limit is a ceiling on how many requests you can make in a given time window.
4 related
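One way a client can stay under a requests-per-window ceiling is a sliding-window log: remember when recent requests were sent and refuse new ones once the window is full. This is a sketch under that assumption; providers enforce their own limits (often on tokens as well as requests), and `SlidingWindowLimiter` is a hypothetical name.

```python
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests in any `window`-second span."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # send times of requests still inside the window

    def allow(self, now: float) -> bool:
        # Forget requests that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False  # over the limit: wait and retry (a server would answer HTTP 429)

limiter = SlidingWindowLimiter(limit=3, window=60.0)
print([limiter.allow(t) for t in (0, 1, 2, 3, 61)])  # [True, True, True, False, True]
```

The fourth request at t=3 is refused because three requests already sit inside the last 60 seconds; by t=61 two of them have expired.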
LLM LLM stands for Large Language Model. The name tells you three things: it works with language, the model is large, and it…
4 related
Open weights When a model is released with open weights, the trained model itself is publicly available. You can download it, run it …
4 related
Hallucination A language model hallucinates when it generates text that is confident, fluent, and wrong.
4 related
RAG Think of it as the difference between an open-book exam and a closed-book exam.
3 related
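The open-book idea in code: find the most relevant passages, then paste them into the prompt above the question. Production RAG systems usually rank passages with embeddings and a vector index; this sketch swaps in plain word overlap so it stays self-contained. The function names and sample documents are hypothetical.

```python
import re

def words(text: str) -> set[str]:
    """Lower-cased word set: a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents that share the most words with the query."""
    q = words(query)
    return sorted(docs, key=lambda d: len(q & words(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

docs = [
    "GPT-4o input costs $2.50 per 1M tokens.",
    "DeepSeek V3 input costs $0.27 per 1M tokens.",
    "Gemini 1.5 Pro input costs $1.25 per 1M tokens.",
]
print(build_prompt("What does DeepSeek V3 input cost?", docs))
```

The model then answers from the pasted source rather than from memory alone.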
Fine-tuning The difference between hiring someone already trained and training someone yourself.
4 related
Latency Latency is the time between sending a request and getting the answer back: the model receives your request → processes tokens → generates a response → sends it.
3 related
What does 70B mean? When you see Llama 3.3 70B, the 70B is the parameter count. 70 billion parameters. It is not a version number. It is not…
4 related
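The parameter count translates directly into memory: each parameter takes a fixed number of bits, so bytes = parameters × bits ÷ 8. This sketch assumes 1 GB = 10^9 bytes and counts only the weights; activations, the KV cache, and runtime overhead come on top.

```python
def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"70B at {bits}-bit: {weight_memory_gb(70e9, bits):.0f} GB")
# 70B at 16-bit: 140 GB
# 70B at 8-bit: 70 GB
# 70B at 4-bit: 35 GB
```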
Context caching Context caching stores previously sent prompt tokens on the provider side so they do not need to be reprocessed on subse…
3 related
Batch pricing Batch pricing is a discounted rate offered by model providers for API requests submitted in bulk with no latency guarant…
3 related
Price per request Price per request is the total cost of a single API call — input tokens plus output tokens combined. It is computed from…
3 related
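A minimal sketch of the arithmetic, assuming prices quoted per 1M tokens as on this page. The GPT-4o input price is the one shown in the ticker; the $10.00/1M output price and the token counts are assumptions for illustration, so check the model's own page for current figures.

```python
def price_per_request(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one API call: input and output billed separately, then added."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical call: 2,000 tokens in, 500 tokens out.
cost = price_per_request(2_000, 500, input_price_per_m=2.50, output_price_per_m=10.00)
print(f"${cost:.4f}")  # $0.0100
```

Output tokens are usually priced higher than input, so a short prompt with a long answer can cost more than the reverse.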
Overage Overage is the additional cost incurred when API usage exceeds a plan's included quota. Most SaaS and API products inclu…
2 related
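Overage only charges for the part of usage above the quota. In this sketch the plan (10,000 included requests, $0.002 per extra request) is hypothetical; real plans may meter tokens, seats, or credits instead.

```python
def overage_cost(used: int, included: int, rate_per_unit: float) -> float:
    """Charge only for usage beyond the plan's included quota."""
    return max(0, used - included) * rate_per_unit

print(overage_cost(12_500, 10_000, 0.002))  # 2,500 extra requests
print(overage_cost(8_000, 10_000, 0.002))   # under quota: no overage
```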
Max output tokens Max output tokens is the maximum number of tokens a model can generate in a single response. It is a hard limit — the mo…
3 related
Tool use Tool use — also called function calling — is the ability of a language model to invoke external functions during a conve…
3 related
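The model does not run the function itself. It returns the name of a tool and its arguments, your code runs it, and the result goes back to the model. The sketch below shows that dispatch step only. The `{"name": ..., "arguments": "<json string>"}` shape is an assumption (each provider formats tool calls slightly differently), and `get_input_price` is a hypothetical tool using prices from the ticker.

```python
import json

PRICES = {"gpt-4o": 2.50, "deepseek-v3": 0.27}  # input $/1M, illustrative

def get_input_price(model: str) -> float:
    """Hypothetical tool the model is allowed to call."""
    return PRICES[model]

TOOLS = {"get_input_price": get_input_price}  # registry of callable tools

def run_tool_call(call: dict) -> str:
    """Execute one tool call and serialise the result for the model."""
    func = TOOLS[call["name"]]            # the function the model chose
    args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return json.dumps({"result": func(**args)})

# A made-up tool call, shaped like one a model might return.
call = {"name": "get_input_price", "arguments": '{"model": "deepseek-v3"}'}
print(run_tool_call(call))  # {"result": 0.27}
```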
System prompt A system prompt is a set of instructions sent to the model at the beginning of every conversation. It defines the model'…
3 related
Grounding Grounding is the practice of connecting a language model's responses to specific, verifiable source material. Instead of…
3 related
Prompt engineering Prompt engineering is the practice of structuring input text to get more accurate, useful, and consistent outputs from a…
3 related
Temperature Temperature is a parameter that controls the randomness of a language model's output. A temperature of 0 makes the model…
2 related
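Under the hood, temperature typically divides the model's raw scores (logits) before they are turned into probabilities with softmax: values below 1 sharpen the distribution, values above 1 flatten it. This sketch treats temperature 0 as greedy decoding (all probability on the top token), a common convention; the logits are made-up numbers.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Probabilities over candidate tokens at a given temperature."""
    if temperature == 0:
        # Greedy: pick the highest-scoring token with certainty.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0, 0.5, 1.0, 2.0):
    print(t, [round(p, 2) for p in softmax_with_temperature(logits, t)])
```

At higher temperatures the less likely tokens get sampled more often, which is where the extra variety comes from.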
API endpoint An API endpoint is a specific URL that accepts requests and returns responses from a service. For language model APIs, t…
3 related
Webhook A webhook is a URL that receives an HTTP POST request when a specific event occurs. Instead of your application repeated…
3 related
Throughput Throughput is the number of tokens or requests a model API can process per unit of time. It is measured in tokens per se…
3 related
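Throughput is a ratio of work done to time taken. A minimal sketch, assuming you have logged the output-token count of each response and the wall-clock time for the whole run; the numbers in the example are hypothetical.

```python
def throughput(token_counts: list[int], elapsed_seconds: float) -> tuple[float, float]:
    """Aggregate throughput for a run: (tokens per second, requests per minute)."""
    tokens_per_s = sum(token_counts) / elapsed_seconds
    requests_per_min = len(token_counts) / elapsed_seconds * 60
    return tokens_per_s, requests_per_min

# Three responses of 400, 600 and 500 tokens completed in 10 seconds.
tps, rpm = throughput([400, 600, 500], 10.0)
print(f"{tps:.0f} tokens/s, {rpm:.0f} requests/min")  # 150 tokens/s, 18 requests/min
```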
Agents Think of the difference between a calculator and a spreadsheet. A calculator does what you ask, once. A spreadsheet watc…
4 related
Multimodal A multimodal model processes more than one type of input — text, images, audio, or video. GPT-4 was text-only. GPT-4V ad…
3 related
Structured output Structured output means the model returns data in a predictable, machine-readable format — typically JSON — instead of f…
3 related
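Even when a model is asked for JSON, it is worth checking the reply before your code relies on it. This sketch parses the reply and checks that the fields you need exist with the right types; the two-field schema is hypothetical, and real projects often use a schema library instead.

```python
import json

REQUIRED = {"model": str, "input_price": float}  # hypothetical schema

def parse_structured(raw: str) -> dict:
    """Parse a JSON reply and verify the fields we depend on."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    for key, expected_type in REQUIRED.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"field {key!r} missing or not {expected_type.__name__}")
    return data

reply = '{"model": "Mistral Large", "input_price": 2.0}'
print(parse_structured(reply)["input_price"])  # 2.0
```

Note that `"input_price": 2` (no decimal point) parses as an int and would fail this strict check; loosen the type to `(int, float)` if that matters.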
Reasoning models Reasoning models generate internal "thinking" tokens before producing their final answer. OpenAI's o1 (September 2024) w…
3 related
AI benchmarks AI benchmarks are standardised tests that measure model performance on defined tasks. MMLU tests general knowledge acros…
3 related
Model family A model family is a group of language models that share a base architecture but differ in size, capability, and price. C…
3 related
Quantisation Quantisation reduces the numerical precision of a model's weights — from 32-bit floating point to 8-bit, 4-bit, or even …
3 related
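One simple scheme, symmetric per-tensor int8 quantisation, maps each weight to an integer in [-127, 127] using a single scale factor, so w ≈ q × scale. This is a sketch of that one technique only; real libraries typically quantise per channel or per group and support 4-bit formats. Function names and weights are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric int8: the largest |w| maps to 127, everything else scales with it."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate weights from the integers and the scale."""
    return [x * scale for x in q]

w = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(w)
print(q)                      # [42, -127, 0, 90]
print(dequantize(q, scale))   # close to w; 0.003 rounds away to 0.0
```

Each weight now needs 8 bits instead of 32 or 16, at the cost of a rounding error of at most half a scale step.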
Batch API The batch API is a mode offered by model providers where you submit a file of requests and receive results within 24 hou…
3 related
Async vs sync Synchronous (sync) API calls block until the response is complete — your code waits. Asynchronous (async) calls return i…
3 related
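The difference shows up when you make several calls. Here two simulated requests of 0.2 seconds each are run both ways: blocking calls one after another, and non-blocking calls with Python's `asyncio` so both are in flight at once. `sync_call` and `async_call` are hypothetical stand-ins for real API requests.

```python
import asyncio
import time

def sync_call(name: str, seconds: float) -> str:
    time.sleep(seconds)  # blocks the thread until the "response" arrives
    return name

async def async_call(name: str, seconds: float) -> str:
    await asyncio.sleep(seconds)  # yields control so other calls can proceed
    return name

def run_sync() -> list[str]:
    return [sync_call("a", 0.2), sync_call("b", 0.2)]  # ~0.4 s total

async def run_async() -> list[str]:
    # gather() starts both calls and waits for all of them: ~0.2 s total
    return list(await asyncio.gather(async_call("a", 0.2), async_call("b", 0.2)))

for label, runner in (("sync", run_sync), ("async", lambda: asyncio.run(run_async()))):
    start = time.perf_counter()
    result = runner()
    print(label, result, f"{time.perf_counter() - start:.2f}s")
```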
Cost per query Cost per query is the total cost of a single end-to-end interaction with your AI application — including all model calls…
4 related
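A single user query can trigger several model calls (routing, retrieval, the final answer), so its cost is the sum of all of them. This sketch covers model calls only. The two-step pipeline is hypothetical, and the output prices ($1.10/1M for DeepSeek V3, $10.00/1M for GPT-4o) are assumptions; the input prices come from the ticker.

```python
from dataclasses import dataclass

@dataclass
class ModelCall:
    input_tokens: int
    output_tokens: int
    input_price_per_m: float   # $ per 1M input tokens
    output_price_per_m: float  # $ per 1M output tokens

    def cost(self) -> float:
        return (self.input_tokens * self.input_price_per_m
                + self.output_tokens * self.output_price_per_m) / 1_000_000

def cost_per_query(calls: list[ModelCall]) -> float:
    """Total model spend for one end-to-end interaction."""
    return sum(call.cost() for call in calls)

calls = [
    ModelCall(1_000, 50, 0.27, 1.10),    # cheap routing step on DeepSeek V3
    ModelCall(3_000, 600, 2.50, 10.00),  # main answer on GPT-4o
]
print(f"${cost_per_query(calls):.6f}")  # $0.013825
```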
GDPR The General Data Protection Regulation is a European Union law that governs how personal data is collected, processed, a…
2 related
Benchmark gaming Benchmark gaming is the practice of optimising a model's training or evaluation process to achieve higher benchmark scor…
3 related