What is API rate limiting?

Rate limiting caps how many requests a client can make in a time window. It protects your API from abuse and overload, ensures fair access across customers, and lets you differentiate paid tiers by offering higher limits.

What does the token bucket algorithm do?

A token bucket refills tokens at a steady rate and spends one token per request. It permits short bursts up to the bucket size, then throttles to the refill rate. This is why the tool separates a sustained rate from a burst allowance.
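The refill-and-spend behavior can be sketched in a few lines of Python. This is a minimal single-process illustration, not the builder's implementation; the class name TokenBucket, its parameters, and the optional now argument (used to make the timing testable) are all assumptions made for this example.

```python
import time
from typing import Optional


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate            # sustained rate (tokens per second)
        self.capacity = capacity    # burst allowance (bucket size)
        self.tokens = capacity      # start full so an initial burst is allowed
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Spend one token if available; return False when the request should be throttled."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the elapsed time, never beyond the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With rate=10 and capacity=50, a client could send 50 requests at once, after which it is held to roughly 10 requests per second until the bucket refills.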

Which response headers should a rate-limited API send?

Send RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset on every response so clients can self-throttle. On a 429, also send Retry-After telling the client how many seconds to wait before retrying.
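As a rough sketch, the three headers could be assembled like this. The function name rate_limit_headers and its parameters are hypothetical, and the example assumes RateLimit-Reset is expressed as whole seconds until the window resets; check which convention your API documents.

```python
import math
from typing import Dict


def rate_limit_headers(limit: int, remaining: int, reset_seconds: float) -> Dict[str, str]:
    """Build the three RateLimit-* headers to attach to a response."""
    return {
        "RateLimit-Limit": str(limit),
        # Never report a negative remaining quota.
        "RateLimit-Remaining": str(max(0, remaining)),
        # Assumption: reset is given as whole seconds until the window resets.
        "RateLimit-Reset": str(math.ceil(max(0.0, reset_seconds))),
    }
```

For example, rate_limit_headers(600, 42, 12.3) describes a 600-request limit with 42 requests left and a reset in 13 seconds.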

What status code should I return when a limit is exceeded?

Return HTTP 429 Too Many Requests with a JSON error body and a Retry-After header. 429 is the standard status code for this case (defined in RFC 6585), and most HTTP clients and SDKs recognize it, so well-behaved clients can back off automatically.
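One possible shape for that response, sketched in a framework-neutral way. The function name too_many_requests, the error code string, and the JSON layout are assumptions for illustration; the builder's generated example may differ.

```python
import json
import math
from typing import Dict, Tuple


def too_many_requests(retry_after_seconds: float) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a throttled request."""
    # Round up so the client never retries slightly too early.
    wait = max(1, math.ceil(retry_after_seconds))
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(wait),
    }
    body = json.dumps({
        "error": {
            "code": "rate_limit_exceeded",  # hypothetical error code
            "message": f"Rate limit exceeded. Retry in {wait} seconds.",
        }
    })
    return 429, headers, body
```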

How should clients handle being rate limited?

Clients should read RateLimit-Remaining and slow down before it reaches zero. On a 429 they should honor Retry-After and use exponential backoff with jitter, so that throttled clients do not all retry at the same instant.
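A client-side delay calculation might look like the sketch below. It uses the "full jitter" variant of exponential backoff and treats Retry-After as a minimum wait; the function name retry_delay and the base and cap defaults are assumptions chosen for the example, not values the policy prescribes.

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[float] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Full jitter: pick uniformly between 0 and the capped exponential ceiling.
    backoff = random.uniform(0.0, min(cap, base * (2 ** attempt)))
    if retry_after is not None:
        # Honor the server's Retry-After as a floor, then add jitter on top
        # so that clients throttled together do not retry together.
        return retry_after + backoff
    return backoff
```

A client would sleep for retry_delay(attempt, retry_after) seconds after each 429, incrementing attempt until a request succeeds or a retry budget runs out.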

API Rate Limit Policy Builder

Make your limits predictable, not mysterious

The fastest way to frustrate developers is to throttle their requests without telling them the rules. A clear rate limit policy turns an opaque 429 into something a client can plan around. This builder generates a complete policy section for your API docs: the algorithm, per-tier limits, the response headers you send, a concrete 429 example, and the backoff behavior you expect from clients.

How it works

You pick a limiting algorithm and a scope. With a token-bucket or leaky-bucket model, the tool distinguishes a sustained rate (the steady refill — e.g. 600 requests per minute) from a burst allowance (a short spike the bucket absorbs before throttling kicks in). Scope decides what the limit is counted against: an API key, a client IP, or an authenticated user.

For each tier you define the limit, window, and burst. The policy then documents the RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset headers sent on every response, plus an optional Retry-After header on 429s, and renders a realistic 429 response with a JSON error body.
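To make the inputs concrete, here is one way the per-tier settings and the scope could be modeled. Everything in it is illustrative: the Tier class, the bucket_key helper, the scope names, and the numbers are assumptions, not output from the builder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    limit: int           # requests allowed per window
    window_seconds: int  # length of the window in seconds
    burst: int           # short spike the bucket absorbs before throttling

    @property
    def sustained_per_second(self) -> float:
        """Steady refill rate implied by the limit and window."""
        return self.limit / self.window_seconds


# Illustrative numbers only.
TIERS = [
    Tier("Free", limit=60, window_seconds=60, burst=10),
    Tier("Pro", limit=600, window_seconds=60, burst=100),
    Tier("Enterprise", limit=6000, window_seconds=60, burst=1000),
]


def bucket_key(scope: str, identifier: str) -> str:
    """Key the counter by scope so usage is tracked per API key, IP, or user."""
    if scope not in {"api_key", "ip", "user"}:
        raise ValueError(f"unknown scope: {scope}")
    return f"{scope}:{identifier}"
```

Here the Pro tier's 600 requests per 60 seconds works out to a sustained rate of 10 requests per second, with a burst allowance of 100.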

Tips and example

Always surface remaining quota in headers so well-built clients can slow down on their own before they hit the limit.
Separate sustained and burst rates. Allowing a small burst makes the API feel responsive without letting clients sustain an abusive rate.
Document the backoff you expect: honor Retry-After, then exponential backoff with jitter. Without jitter, every throttled client retries in lockstep and re-overloads you.
Tie tiers to plans (Free, Pro, Enterprise) so higher limits become a tangible reason to upgrade.
