Claude 3.5 Sonnet input $3.00/1M ↓ -50%

GPT-4o input $2.50/1M

Gemini 1.5 Pro input $1.25/1M

Mistral Large input $2.00/1M ↓ -33%

DeepSeek V3 input $0.27/1M

synced 2026-04-05

Glossary

50 terms. No jargon. Real numbers. Every explanation grounded in data from sourc.dev.

Token You already know what a token is. You just don't know you know.

Context window Think of it as the model's desk.

Input price Input price is what you pay to send text to a model.

Output price Output price is what you pay for the model's response.

What does "per 1M tokens" mean? It is a pricing unit. Like kilowatt-hours for electricity, or per-kilometre for fuel. A way to compare costs at a standa…

Function calling You have used a phone menu. Press 1 for billing. Press 2 for support. Press 3 for account. The menu does not answer your…

Vision / image input Before vision, you described things to the model. Now you can show it.

Streaming You send a request. You wait. Nothing happens. Five seconds. Eight seconds. Twelve seconds. A full response appears at o…

MCP — Model Context Protocol For most of AI's recent history, every tool was an island.

Free tier A free tier is a usage allowance that costs nothing. Most major model providers offer one — a number of requests, tokens…

EU data residency Think of it as keeping your filing cabinet inside the building.

MMLU MMLU stands for Massive Multitask Language Understanding. It is a benchmark — a standardised test for language models — …

HumanEval HumanEval is a coding benchmark. 164 programming problems, each with a function signature, a docstring describing what t…

API Every time you check the weather on your phone, your phone did not go outside.

REST API Imagine a restaurant.

API key It is a password. A long one, impossible to guess, that identifies your application when it calls an API.

SDK SDK stands for Software Development Kit. It is a library — code someone else wrote — that wraps an API and makes it easi…

Rate limit A rate limit is a ceiling on how many requests you can make in a given time window.

LLM LLM stands for Large Language Model. The name tells you three things: it works with language, the model is large, and it…

Open weights When a model is released with open weights, the trained model itself is publicly available. You can download it, run it …

Hallucination A language model hallucinates when it generates text that is confident, fluent, and wrong.

RAG Think of it as the difference between an open-book exam and a closed-book exam.

Fine-tuning The difference between hiring someone already trained and training someone yourself.

Latency Receives your request → processes tokens → generates a response → sends it.

What does 70B mean? When you see Llama 3.3 70B, the 70B is the parameter count. 70 billion parameters. It is not a version number. It is not…

Context caching Context caching stores previously sent prompt tokens on the provider side so they do not need to be reprocessed on subse…

Batch pricing Batch pricing is a discounted rate offered by model providers for API requests submitted in bulk with no latency guarant…

Price per request Price per request is the total cost of a single API call — input tokens plus output tokens combined. It is computed from…

Overage Overage is the additional cost incurred when API usage exceeds a plan's included quota. Most SaaS and API products inclu…

Max output tokens Max output tokens is the maximum number of tokens a model can generate in a single response. It is a hard limit — the mo…

Tool use Tool use — also called function calling — is the ability of a language model to invoke external functions during a conve…

System prompt A system prompt is a set of instructions sent to the model at the beginning of every conversation. It defines the model'…

Grounding Grounding is the practice of connecting a language model's responses to specific, verifiable source material. Instead of…

Prompt engineering Prompt engineering is the practice of structuring input text to get more accurate, useful, and consistent outputs from a…

Temperature Temperature is a parameter that controls the randomness of a language model's output. A temperature of 0 makes the model…

API endpoint An API endpoint is a specific URL that accepts requests and returns responses from a service. For language model APIs, t…

Webhook A webhook is a URL that receives an HTTP POST request when a specific event occurs. Instead of your application repeated…

Throughput Throughput is the number of tokens or requests a model API can process per unit of time. It is measured in tokens per se…

Agents Think of the difference between a calculator and a spreadsheet. A calculator does what you ask, once. A spreadsheet watc…

Multimodal A multimodal model processes more than one type of input — text, images, audio, or video. GPT-4 was text-only. GPT-4V ad…

Structured output Structured output means the model returns data in a predictable, machine-readable format — typically JSON — instead of f…

Reasoning models Reasoning models generate internal "thinking" tokens before producing their final answer. OpenAI's o1 (September 2024) w…

AI benchmarks AI benchmarks are standardised tests that measure model performance on defined tasks. MMLU tests general knowledge acros…

Model family A model family is a group of language models that share a base architecture but differ in size, capability, and price. C…

Quantisation Quantisation reduces the numerical precision of a model's weights — from 32-bit floating point to 8-bit, 4-bit, or even …

Batch API The batch API is a mode offered by model providers where you submit a file of requests and receive results within 24 hou…

Async vs sync Synchronous (sync) API calls block until the response is complete — your code waits. Asynchronous (async) calls return i…

Cost per query Cost per query is the total cost of a single end-to-end interaction with your AI application — including all model calls…

GDPR The General Data Protection Regulation is a European Union law that governs how personal data is collected, processed, a…

Benchmark gaming Benchmark gaming is the practice of optimising a model's training or evaluation process to achieve higher benchmark scor…
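
Worked example: from "per 1M tokens" to price per request. Several entries above (input price, output price, "per 1M tokens", price per request) come down to the same arithmetic: divide the token count by one million, multiply by the listed price, and add the input and output sides together. The sketch below shows that calculation in Python. The input prices are the ones in the ticker at the top of this page. The function names, the token counts, and the output price are assumptions made up for illustration. They are not data from sourc.dev or from any provider.

```python
# Prices are quoted per 1M tokens, so a token count is first scaled by 1,000,000.
TOKENS_PER_UNIT = 1_000_000


def token_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of `tokens` tokens at a price quoted per 1M tokens."""
    return tokens / TOKENS_PER_UNIT * price_per_million


def price_per_request(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
) -> float:
    """Total cost of one call: input side plus output side."""
    return token_cost(input_tokens, input_price) + token_cost(output_tokens, output_price)


# Input prices in dollars per 1M tokens, as shown in the page ticker.
INPUT_PRICES = {
    "Claude 3.5 Sonnet": 3.00,
    "GPT-4o": 2.50,
    "Gemini 1.5 Pro": 1.25,
    "Mistral Large": 2.00,
    "DeepSeek V3": 0.27,
}

# ASSUMPTION: a hypothetical output price, for illustration only.
# The page excerpt lists input prices only.
EXAMPLE_OUTPUT_PRICE = 10.00

# ASSUMPTION: a hypothetical request of 2,000 input tokens and 500 output tokens.
cost = price_per_request(2_000, 500, INPUT_PRICES["GPT-4o"], EXAMPLE_OUTPUT_PRICE)
print(f"${cost:.4f}")  # prints $0.0100
```

With these made-up numbers the input side and the output side each come to half a cent, even though the request sends four times as many tokens as it receives. That is why the glossary lists input price and output price as separate terms.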