API Rate Limiting: Token Bucket, Sliding Window, and Redis Patterns
Every public API needs rate limiting, but the algorithm you choose shapes the user experience and the failure modes. Here's how each approach works and when to use it.
Anurag Verma
An API that returns HTTP 500 under heavy load is broken. An API that returns HTTP 429 with a clear Retry-After header is working correctly. Rate limiting is the difference between those two behaviors. It protects your infrastructure, keeps costs predictable, and gives clients a recoverable error rather than a silent failure.
The interesting question isn’t whether to rate limit, but which algorithm fits the behavior you want.
The Four Algorithms
Fixed Window
Count requests per user per time window. Reset the counter at the window boundary.
Window: 60 seconds
Limit: 100 requests
User makes request at t=0: counter=1
User makes request at t=59: counter=100 (limit reached)
Window resets at t=60: counter=0
User makes 100 more requests at t=60–61: all allowed
The implementation is simple, but the edge case is painful: a user can send 200 requests in 2 seconds by straddling the window boundary (100 at t=59, 100 at t=61). That can briefly push load on your backend to twice the rate the limit was meant to allow.
async function fixedWindowAllow(userId: string, limit: number, windowSec: number): Promise<boolean> {
  const now = Math.floor(Date.now() / 1000);
  // Align to the start of the current window so every request in it shares one key
  const windowStart = Math.floor(now / windowSec) * windowSec;
  const key = `rl:${userId}:${windowStart}`;
  const count = await redis.incr(key);
  if (count === 1) {
    // First request in this window: set a TTL so stale keys clean themselves up
    await redis.expire(key, windowSec * 2); // TTL double the window
  }
  return count <= limit;
}
Use this when simplicity matters more than precision and the burst problem is acceptable. Internal tooling, low-frequency APIs, simple dashboards.
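The boundary burst is easy to reproduce without Redis. The following is a minimal in-memory sketch, not the Redis implementation: `FixedWindowCounter` is a hypothetical class introduced for illustration, and the current time is passed in explicitly so the result is deterministic.

```typescript
// In-memory fixed-window counter, for illustration only (hypothetical helper, no Redis).
class FixedWindowCounter {
  private readonly counts = new Map<string, number>();
  private readonly limit: number;
  private readonly windowSec: number;

  constructor(limit: number, windowSec: number) {
    this.limit = limit;
    this.windowSec = windowSec;
  }

  // nowSec is injected instead of read from the clock so the example is repeatable.
  allow(userId: string, nowSec: number): boolean {
    const windowStart = Math.floor(nowSec / this.windowSec) * this.windowSec;
    const key = `${userId}:${windowStart}`;
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);
    return count <= this.limit;
  }
}

const limiter = new FixedWindowCounter(100, 60);
let allowed = 0;
// 100 requests at the end of the first window...
for (let i = 0; i < 100; i++) if (limiter.allow("user1", 59)) allowed++;
// ...and 100 more just after the boundary.
for (let i = 0; i < 100; i++) if (limiter.allow("user1", 61)) allowed++;
console.log(allowed); // 200
```

All 200 requests pass even though the configured limit is 100 per 60 seconds, because the two batches land in different windows.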
Sliding Window
Track requests over the last N seconds at any point in time, not just since the last window boundary. This eliminates the boundary-burst problem.
A practical implementation uses sorted sets in Redis: store each request as a scored entry (score = timestamp) and count entries newer than now - windowSec.
async function slidingWindowAllow(userId: string, limit: number, windowSec: number): Promise<boolean> {
  const now = Date.now();
  const windowStart = now - windowSec * 1000;
  const key = `rl:${userId}`;
  const pipeline = redis.pipeline();
  pipeline.zremrangebyscore(key, 0, windowStart); // remove old entries
  pipeline.zadd(key, now, `${now}-${Math.random()}`); // add current request (random suffix keeps members unique)
  pipeline.zcard(key); // count remaining
  pipeline.expire(key, windowSec + 1);
  const results = await pipeline.exec();
  // Each result is an [error, reply] pair; index 2 is the ZCARD reply.
  // A missing reply becomes NaN, so the comparison below fails closed.
  const count = Number(results?.[2]?.[1]);
  // Note: rejected requests are recorded too, so a client that keeps retrying stays blocked
  return count <= limit;
}
The sorted set approach has O(log n) writes, which is fine for typical limits (100-1000 requests/window). For very high-traffic endpoints, the memory footprint grows with requests per window.
The sorted set version is often called a “sliding window log” because it stores every timestamp. There’s also a hybrid: the “sliding window counter” uses two fixed windows to approximate a sliding window with O(1) operations. The counter variant trades exact accuracy for speed and memory:
// Approximate sliding window using two counters
async function slidingWindowCounterAllow(
  userId: string, limit: number, windowSec: number
): Promise<boolean> {
  const now = Math.floor(Date.now() / 1000);
  const currentWindow = Math.floor(now / windowSec) * windowSec;
  const prevWindow = currentWindow - windowSec;
  const elapsed = (now % windowSec) / windowSec; // fraction into current window
  const prevKey = `rl:${userId}:${prevWindow}`;
  const currentKey = `rl:${userId}:${currentWindow}`;
  const [prevCount, currentCount] = await redis.mget(prevKey, currentKey);
  const prev = parseInt(prevCount ?? '0', 10);
  const current = parseInt(currentCount ?? '0', 10);
  // Weighted estimate of requests in the sliding window
  const estimated = prev * (1 - elapsed) + current;
  if (estimated >= limit) return false;
  // The check and the increment are separate round trips, so concurrent
  // requests can slightly overshoot the limit
  await redis.pipeline()
    .incr(currentKey)
    .expire(currentKey, windowSec * 2)
    .exec();
  return true;
}
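To make the weighting concrete, here is the estimate worked through as a small pure function. `estimateWindowCount` is a hypothetical name for the single line of arithmetic in the counter variant, and the numbers are made up for illustration.

```typescript
// Weighted estimate used by the sliding window counter.
// elapsedFraction is how far we are into the current fixed window, from 0 to 1.
function estimateWindowCount(
  prevCount: number,
  currentCount: number,
  elapsedFraction: number
): number {
  // The previous window contributes only the share that still overlaps
  // the sliding window; the current window contributes in full.
  return prevCount * (1 - elapsedFraction) + currentCount;
}

// 60-second window, 15 seconds into the current one:
// 80 requests in the previous window, 30 so far in this one.
const estimate = estimateWindowCount(80, 30, 15 / 60);
console.log(estimate); // 90, i.e. 80 * 0.75 + 30
```

With a limit of 100 this request would be allowed. The estimate assumes the previous window's requests were spread evenly, which is where the approximation error comes from: if all 80 arrived in the last few seconds of that window, the true count is higher than 90.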
Token Bucket
The bucket holds tokens up to a maximum capacity. Each request consumes one token. Tokens refill at a constant rate. When the bucket is empty, requests are rejected until tokens refill.
This model suits clients that send occasional bursts followed by idle periods. The bucket fills up during idle time, and stored tokens can be spent on a burst when needed.
async function tokenBucketAllow(
  userId: string,
  maxTokens: number,
  refillRatePerSec: number
): Promise<boolean> {
  const key = `tb:${userId}`;
  const now = Date.now() / 1000;
  // Note: this read-modify-write is not atomic. Two concurrent requests can
  // read the same state and both spend the same token; wrap the logic in a
  // Lua script or a MULTI/WATCH transaction if that matters.
  const data = await redis.hgetall(key);
  // A missing key means a new user: start with a full bucket
  let tokens = parseFloat(data.tokens ?? String(maxTokens));
  const lastRefill = parseFloat(data.lastRefill ?? String(now));
  // Add tokens earned since last request
  const elapsed = now - lastRefill;
  tokens = Math.min(maxTokens, tokens + elapsed * refillRatePerSec);
  const allowed = tokens >= 1;
  if (allowed) tokens -= 1;
  // Persist updated state even on rejection
  await redis.hset(key, { tokens: tokens.toFixed(4), lastRefill: now });
  // Expire once the bucket would have refilled completely, plus a margin
  await redis.expire(key, Math.ceil(maxTokens / refillRatePerSec) + 60);
  return allowed;
}
Token bucket sits behind several well-known rate limiters: AWS API Gateway and Stripe both describe their throttling in these terms. The X-RateLimit-Remaining and X-RateLimit-Reset headers map naturally to token count and refill time.
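The mapping from bucket state to header values can be sketched with an in-memory bucket. This is an illustrative sketch under stated assumptions, not the Redis implementation: `TokenBucket`, `BucketDecision`, and `take` are hypothetical names, and the clock is passed in so the behavior is repeatable.

```typescript
interface BucketDecision {
  allowed: boolean;
  remaining: number;     // whole tokens left after this request
  retryAfterSec: number; // seconds until one token is available; 0 when allowed
}

// In-memory token bucket (hypothetical helper for illustration).
class TokenBucket {
  private tokens: number;
  private lastRefillSec: number;
  private readonly maxTokens: number;
  private readonly refillRatePerSec: number;

  constructor(maxTokens: number, refillRatePerSec: number, nowSec: number) {
    this.maxTokens = maxTokens;
    this.refillRatePerSec = refillRatePerSec;
    this.tokens = maxTokens; // a new bucket starts full
    this.lastRefillSec = nowSec;
  }

  take(nowSec: number): BucketDecision {
    // Credit tokens earned since the last call, capped at capacity.
    const elapsed = Math.max(0, nowSec - this.lastRefillSec);
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRatePerSec);
    this.lastRefillSec = Math.max(this.lastRefillSec, nowSec);

    if (this.tokens < 1) {
      // Time needed to earn the missing fraction of a token.
      const retryAfterSec = Math.ceil((1 - this.tokens) / this.refillRatePerSec);
      return { allowed: false, remaining: 0, retryAfterSec };
    }
    this.tokens -= 1;
    return { allowed: true, remaining: Math.floor(this.tokens), retryAfterSec: 0 };
  }
}

// 5-token burst capacity, refilling at 1 token per second.
const bucket = new TokenBucket(5, 1, 0);
const burst = [1, 2, 3, 4, 5, 6].map(() => bucket.take(0));
console.log(burst.map((d) => d.allowed)); // [true, true, true, true, true, false]
console.log(burst[5].retryAfterSec);      // 1
```

`remaining` is the natural value for a remaining-requests header, and `retryAfterSec` is what a 429 response would send as Retry-After.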
Leaky Bucket
Requests enter a queue and are processed at a fixed rate. If the queue is full, excess requests are dropped. This smooths out bursts so that no spikes reach your backend, which makes it useful for protecting a resource that can only handle a constant throughput.
Leaky bucket is less common for API rate limiting because it adds latency (queuing) rather than just rejecting. It’s more appropriate for background job processing or webhook delivery where you need to throttle a downstream service.
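A minimal in-memory sketch of the queue-and-drain behavior follows. `LeakyBucket`, `offer`, and `drain` are hypothetical names chosen for this example; a real worker would call `drain` on a timer and hand the released items to the downstream service.

```typescript
// In-memory leaky bucket: a bounded queue drained at a fixed rate (illustrative sketch).
class LeakyBucket<T> {
  private readonly queue: T[] = [];
  private lastLeakSec: number;
  private readonly capacity: number;
  private readonly leakRatePerSec: number;

  constructor(capacity: number, leakRatePerSec: number, nowSec: number) {
    this.capacity = capacity;
    this.leakRatePerSec = leakRatePerSec;
    this.lastLeakSec = nowSec;
  }

  // Returns false when the queue is full and the item is dropped.
  offer(item: T, nowSec: number): boolean {
    if (this.queue.length >= this.capacity) return false;
    // An empty queue earns no credit while idle, so restart the leak clock.
    if (this.queue.length === 0) this.lastLeakSec = Math.max(this.lastLeakSec, nowSec);
    this.queue.push(item);
    return true;
  }

  // Releases the items that are due by nowSec, oldest first.
  drain(nowSec: number): T[] {
    const due = Math.floor((nowSec - this.lastLeakSec) * this.leakRatePerSec);
    if (due <= 0) return [];
    // Advance the clock only by the time that paid for whole items.
    this.lastLeakSec += due / this.leakRatePerSec;
    return this.queue.splice(0, due);
  }
}

// Capacity 3, draining 2 items per second.
const webhooks = new LeakyBucket<string>(3, 2, 0);
const accepted = ["a", "b", "c", "d"].map((id) => webhooks.offer(id, 0));
console.log(accepted);            // [true, true, true, false]
console.log(webhooks.drain(0.5)); // ["a"]
console.log(webhooks.drain(1.5)); // ["b", "c"]
```

The fourth item is dropped because the queue is full, and the accepted items leave at a steady two per second regardless of how quickly they arrived.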
Response Headers
Rate limit headers give clients the information they need to retry intelligently. Return them on every response, not just on 429:
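function setRateLimitHeaders(res: Response, info: {
  limit: number;
  remaining: number;
  reset: number; // Unix timestamp (seconds) when the window resets
}) {
  // The IETF draft defines RateLimit-Reset as seconds remaining, not a timestamp
  const secondsUntilReset = Math.max(0, info.reset - Math.floor(Date.now() / 1000));
  res.setHeader('RateLimit-Limit', info.limit);
  res.setHeader('RateLimit-Remaining', Math.max(0, info.remaining));
  res.setHeader('RateLimit-Reset', secondsUntilReset);
  res.setHeader('RateLimit-Policy', `${info.limit};w=60`); // IETF draft format; assumes a 60-second window
}
// On 429:
const secondsUntilReset = Math.max(0, info.reset - Math.floor(Date.now() / 1000));
res.setHeader('Retry-After', secondsUntilReset);
res.status(429).json({
  error: 'rate_limit_exceeded',
  message: `Too many requests. Try again in ${secondsUntilReset} seconds.`,
});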
The Retry-After header on 429 is the most important. Well-behaved clients use it to back off correctly. Without it, many clients retry immediately or on a fixed short interval, making the problem worse.
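On the client side, honoring the header mostly comes down to parsing it. Retry-After may carry either a number of seconds or an HTTP-date, so a client needs to handle both. The following is a sketch of that parsing step only; `retryAfterToMs` and its `fallbackMs` parameter are hypothetical names for this example.

```typescript
// Converts a Retry-After header value into a delay in milliseconds.
// headerValue: the raw header, or null if the server did not send one.
// nowMs: current time in ms, injected so the function is easy to test.
// fallbackMs: delay to use when the header is missing or unparseable.
function retryAfterToMs(headerValue: string | null, nowMs: number, fallbackMs: number): number {
  if (headerValue === null) return fallbackMs;
  const trimmed = headerValue.trim();
  if (trimmed === "") return fallbackMs;

  // Form 1: a non-negative integer number of seconds.
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;

  // Form 2: an HTTP-date; wait until that moment (never a negative delay).
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return fallbackMs;
  return Math.max(0, dateMs - nowMs);
}

console.log(retryAfterToMs("30", Date.now(), 1000)); // 30000
console.log(retryAfterToMs(null, Date.now(), 1000)); // 1000
```

A retry loop would sleep for the returned delay before re-issuing the request, ideally with a cap on the number of attempts.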
Where to Apply Rate Limits
Rate limits should live as close to the edge as possible:
- CDN / proxy layer (Cloudflare, Nginx): cheapest to enforce, can shed load before your app server is touched. Good for IP-based limiting.
- API gateway: per-API-key limits, no code changes required. Good for products with a public API.
- Middleware in your app: most flexible, can enforce per-user business rules.
Many production systems combine all three layers:
Request
→ Cloudflare (IP-based: 1000 req/min, block known bot IPs)
→ API gateway (per-API-key: tiered limits based on plan)
→ App middleware (per-user: resource-specific limits)
Tiered Limits by Plan
SaaS products typically have different limits for different plans. Storing the plan name on the user record and looking up its limits in a single table keeps the logic clean:
const PLAN_LIMITS: Record<string, { limit: number; windowSec: number }> = {
  free: { limit: 100, windowSec: 3600 }, // 100/hour
  starter: { limit: 1000, windowSec: 3600 }, // 1000/hour
  pro: { limit: 10000, windowSec: 3600 }, // 10000/hour
  enterprise: { limit: 100000, windowSec: 3600 }, // 100000/hour
};
async function checkRateLimit(req: Request, res: Response): Promise<boolean> {
  const user = req.user;
  // Unknown plans fall back to the most restrictive tier
  const planConfig = PLAN_LIMITS[user.plan] ?? PLAN_LIMITS.free;
  const allowed = await slidingWindowAllow(user.id, planConfig.limit, planConfig.windowSec);
  if (!allowed) {
    // The boolean result carries no reset time, so advertise the full window
    // as a conservative upper bound
    res.setHeader('Retry-After', planConfig.windowSec);
    res.status(429).json({ error: 'rate_limit_exceeded', plan: user.plan });
    return false;
  }
  return true;
}
Testing Rate Limiters
Rate limiters are easy to test incorrectly. A test that mocks Redis and just checks counter increments isn’t testing whether the algorithm actually rejects at the right threshold.
Test with real Redis (or a Redis-compatible in-memory test double like ioredis-mock):
describe('sliding window rate limiter', () => {
  beforeEach(() => redis.flushdb());

  it('allows requests under the limit', async () => {
    for (let i = 0; i < 10; i++) {
      expect(await slidingWindowAllow('user1', 10, 60)).toBe(true);
    }
  });

  it('blocks the 11th request', async () => {
    for (let i = 0; i < 10; i++) {
      await slidingWindowAllow('user1', 10, 60);
    }
    expect(await slidingWindowAllow('user1', 10, 60)).toBe(false);
  });

  it('allows again after the window expires', async () => {
    // use jest fake timers or manipulate Redis timestamps
    for (let i = 0; i < 10; i++) {
      await slidingWindowAllow('user1', 10, 1); // 1-second window
    }
    await new Promise(r => setTimeout(r, 1100)); // wait for window
    expect(await slidingWindowAllow('user1', 10, 1)).toBe(true);
  });
});
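Sleeping in tests is slow and can be flaky. One alternative is to inject the clock so a test can move time forward instantly. The sketch below uses a hypothetical in-memory `SlidingWindowLog` rather than Redis to show the idea; unlike the sorted set version, it records only accepted requests, which is an assumption made here to keep the example small.

```typescript
type Clock = () => number; // returns the current time in milliseconds

// In-memory sliding window log with an injectable clock (illustrative sketch).
class SlidingWindowLog {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly clock: Clock;

  constructor(limit: number, windowSec: number, clock: Clock) {
    this.limit = limit;
    this.windowMs = windowSec * 1000;
    this.clock = clock;
  }

  allow(userId: string): boolean {
    const now = this.clock();
    const cutoff = now - this.windowMs;
    // Keep only timestamps that are still inside the window.
    const recent = (this.hits.get(userId) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now); // assumption: only accepted requests are recorded
    this.hits.set(userId, recent);
    return allowed;
  }
}

let fakeNow = 0;
const limiter = new SlidingWindowLog(10, 1, () => fakeNow);

const firstTen = Array.from({ length: 10 }, () => limiter.allow("user1"));
const eleventh = limiter.allow("user1");
fakeNow = 1001; // jump just past the 1-second window, no real waiting
const afterExpiry = limiter.allow("user1");

console.log(firstTen.every(Boolean), eleventh, afterExpiry); // true false true
```

The same idea carries over to the Redis-backed functions if they accept the current time as a parameter instead of calling Date.now() directly.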
Choosing an Algorithm
| Situation | Algorithm |
|---|---|
| Simple internal API, low risk of abuse | Fixed window |
| Public API with burst-sensitive clients | Sliding window |
| API where user experience matters (allow natural bursts) | Token bucket |
| Background worker or webhook delivery throttle | Leaky bucket |
For most web APIs, sliding window is the right default: it’s smooth, has no boundary artifacts, and the Redis sorted set implementation is straightforward. Token bucket is worth adding when you want to reward users who aren’t constantly active, since their stored tokens let them send a batch of requests without hitting the limit.
Whatever algorithm you pick, add the response headers. They cost nothing to implement and turn a frustrating 429 into a recoverable situation for your API clients.