#18 of 25

Rate limit

Hit this in production at 2am and you will never forget it exists — check it before you launch

What is a rate limit

A rate limit is a ceiling on how many requests you can make in a given time window.

Typical formats: requests per minute, tokens per minute, requests per day. Hit the ceiling and the API returns an error — usually a 429 status code — instead of a response. Your application gets nothing. Your user gets nothing.

The limit exists because shared infrastructure needs to be fair. If one application could consume unlimited requests, it would affect every other application on the same infrastructure. Rate limits are the mechanism that prevents that.

The numbers that make it real

On free tiers, rate limits are tight. A typical free tier might allow 3–5 requests per minute. On paid tiers, limits scale with your plan — commonly 500–3,500 requests per minute depending on the model and provider. Enterprise tiers can be significantly higher.

Current rate limits for all models tracked on sourc.dev are available on each entity page (verified March 2026). They change. Check again before you plan production traffic.

Why this matters to you

Rate limits are easy to ignore during development. Development traffic is low. You send a request, get a response, move on. The limit never triggers.

Production traffic is different. A product with 1,000 active users sending 10 requests each in the same minute is 10,000 requests per minute. If your rate limit is 500, 95% of those requests fail. Your users get errors. You get support tickets. At 2am on a Saturday.
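The arithmetic above is worth turning into a quick sanity check. A minimal sketch, using the example figures from this section (1,000 users, 10 requests each, a 500 RPM limit) rather than any provider's actual numbers:

```python
def peak_rpm(active_users: int, requests_per_user_per_minute: int) -> int:
    """Expected peak requests per minute if all users hit the API in the same minute."""
    return active_users * requests_per_user_per_minute

def failure_rate(expected_rpm: int, rate_limit_rpm: int) -> float:
    """Fraction of requests rejected once the per-minute ceiling is hit."""
    if expected_rpm <= rate_limit_rpm:
        return 0.0
    return (expected_rpm - rate_limit_rpm) / expected_rpm

peak = peak_rpm(1_000, 10)            # 10,000 requests per minute
print(failure_rate(peak, 500))        # 0.95 — 95% of requests would fail
```

Run this with your own user counts and your tier's published limit before launch, not after.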

How to plan for this

1. Before you build: look up the rate limit for your intended model and tier. Calculate your expected peak requests per minute at your target user volume.

2. If the limit is below your expected peak: upgrade to a higher tier, implement request queuing in your application, or evaluate whether a different model with higher limits fits your use case.
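One common shape for the queuing option is a token bucket: refill permission tokens at the allowed rate, and make each outgoing request spend one. A minimal sketch, not tied to any SDK (the clock is injectable so the behavior can be tested without sleeping):

```python
import time

class TokenBucket:
    """Admit at most `rate_per_minute` requests per minute, with bursts up to `capacity`."""

    def __init__(self, rate_per_minute: float, capacity: float, clock=time.monotonic):
        self.rate_per_second = rate_per_minute / 60.0
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill proportionally to the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # Caller should queue or delay this request.
```

Requests that fail `try_acquire` wait in your queue and go out when tokens refill, instead of being fired at the API and bounced with a 429.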

3. Build retry logic with exponential backoff. When a rate limit error arrives, wait, then try again — with increasing delays between attempts. Most SDKs have this built in.
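The retry loop looks roughly like this. A sketch only: `RateLimitError` and the callable you pass in are stand-ins for whatever your SDK actually raises and exposes, and as noted above most SDKs ship this logic already:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the 429 error your SDK raises."""

def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on rate limit errors, doubling the wait each attempt plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the error to the caller.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)  # waits 1s, 2s, 4s, 8s, ... plus a little jitter
```

The jitter matters: if every client retries on the same schedule, they all hit the ceiling again at the same instant.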

4. Monitor your rate limit usage in production. Most providers surface this in their dashboards. Watch it before it becomes a problem.

Verified March 2026 · Source: Anthropic, OpenAI rate limit documentation
