
Rate limits and exponential backoff

2026-06-24 · 5 min read

Every serious API has a bouncer. Ask for too much too fast and it says "slow down" — usually with a 429 Too Many Requests. What you do next is the difference between a system that stays healthy and one that digs its own hole.

Why limits exist

Rate limits protect a service from being overwhelmed and keep things fair between clients. They're not an insult — they're the API telling you its capacity. The mistake is treating a 429 as "try again immediately," because a wall of instant retries is exactly what a struggling server can least afford. That's how a small blip snowballs into an outage.

Back off — exponentially


Exponential backoff means each retry waits about twice as long as the one before: 1 second, then 2, then 4, then 8. The gaps grow quickly, so a client that's being turned away naturally eases off instead of piling on.
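To make the schedule concrete, here is a minimal sketch of the arithmetic. The function name `backoff_delays` and its parameters (`base`, `factor`, `retries`) are made up for illustration and don't come from any particular library.

```python
def backoff_delays(base=1.0, factor=2.0, retries=4):
    """Return the wait (in seconds) before each retry.

    Retry 0 waits `base`, retry 1 waits `base * factor`, and so on.
    """
    return [base * factor ** attempt for attempt in range(retries)]


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0]
```

With the defaults this reproduces the 1, 2, 4, 8 second sequence above. Changing `base` scales the whole schedule without changing its shape.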

Two refinements make it production-grade:

Add jitter. If a thousand clients all back off on the exact same schedule, they'll retry in synchronised waves — a "thundering herd." Sprinkle in some randomness so everyone's retries spread out.
Respect Retry-After. Many APIs tell you how long to wait in the Retry-After response header, given either as a number of seconds or as an HTTP date. If it's there, honour it; it beats guessing.
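Both refinements fit in a few lines. The sketch below is one possible take, not a reference implementation: `full_jitter` picks a random wait anywhere between zero and the exponential delay (one common jitter variant among several), and `parse_retry_after` turns a header value into seconds. The function names and the 60-second `cap` are assumptions for illustration.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def full_jitter(attempt, base=1.0, cap=60.0, rng=random):
    """Pick a random wait between 0 and the capped exponential delay."""
    ceiling = min(cap, base * 2 ** attempt)  # cap is an assumed safety limit
    return rng.uniform(0, ceiling)


def parse_retry_after(value, now=None):
    """Convert a Retry-After header value to seconds, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    try:
        # Form 1: a whole number of seconds, e.g. "120"
        return max(0.0, float(int(value)))
    except ValueError:
        pass
    try:
        # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # neither form: let the caller fall back to backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller would typically try `parse_retry_after` first and fall back to `full_jitter` only when it returns `None`. A date already in the past comes back as 0.0, which means "retry now".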

Rule of thumb

Got a 429? Don't retry immediately. Wait, then double the wait each time, add a little randomness — and stop after a sensible number of tries.
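Put together, the rule of thumb might look something like this. It's a sketch under stated assumptions: `RateLimited` is a hypothetical exception standing in for however your HTTP client reports a 429, and the defaults (`max_tries=5`, `cap=60.0`, up to 25% extra jitter) are arbitrary choices, not recommendations from any specific API.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by a client when it receives HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds, if the server provided them


def call_with_backoff(request, max_tries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call request(); on RateLimited, wait and retry up to max_tries times."""
    for attempt in range(max_tries):
        try:
            return request()
        except RateLimited as err:
            if attempt == max_tries - 1:
                raise  # out of tries: surface the failure to the caller
            if err.retry_after is not None:
                wait = err.retry_after  # the server's hint beats guessing
            else:
                wait = min(cap, base * 2 ** attempt)  # 1, 2, 4, 8, ...
                wait += random.uniform(0, wait * 0.25)  # a little randomness
            sleep(wait)
```

The `sleep` parameter exists so the waiting can be swapped out in tests. In normal use you'd leave it alone and pass only the function that makes the request.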

The instinct under pressure is to push harder. With rate limits, the reliable move is the opposite: back off, spread out, and let the system breathe.

