API Rate Limiting: Token Bucket, Sliding Window, and Redis Patterns

An API that returns HTTP 500 under heavy load is broken. An API that returns HTTP 429 with a clear Retry-After header is working correctly. Rate limiting is the difference between those two behaviors. It protects your infrastructure, keeps costs predictable, and gives clients a recoverable error rather than a silent failure.

The interesting question isn’t whether to rate limit, but which algorithm fits the behavior you want.

The Four Algorithms

Fixed Window

Count requests per user per time window. Reset the counter at the window boundary.

Window: 60 seconds
Limit: 100 requests

User makes request at t=0: counter=1
User makes request at t=59: counter=100 (limit reached)
Window resets at t=60: counter=0
User makes 100 more requests at t=60–61: all allowed

The implementation is simple, but the edge case is painful: a user can send 200 requests in about 2 seconds by straddling the window boundary (100 at t=59, 100 more at t=60–61). That can push load on your backend to twice the expected rate.

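import Redis from 'ioredis';

// Assumes the ioredis client and a Redis server on localhost:6379.
// The later snippets reuse this `redis` instance.
const redis = new Redis();

async function fixedWindowAllow(userId: string, limit: number, windowSec: number): Promise<boolean> {
  const now = Math.floor(Date.now() / 1000);
  const windowStart = Math.floor(now / windowSec) * windowSec;
  const key = `rl:${userId}:${windowStart}`;

  // INCR creates the key with value 1 if it does not exist yet
  const count = await redis.incr(key);
  if (count === 1) {
    // INCR and EXPIRE are separate commands: a crash between them
    // leaves a counter key with no expiry.
    await redis.expire(key, windowSec * 2); // TTL double the window
  }
  return count <= limit;
}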

Use this when simplicity matters more than precision and the burst problem is acceptable. Internal tooling, low-frequency APIs, simple dashboards.
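The boundary burst described above can be reproduced without Redis. The sketch below is a hypothetical in-memory counter (the `FixedWindowCounter` class and `sendBurst` helper are illustrative names, not part of the Redis code) that takes the current time as an argument so the result is deterministic:

```typescript
// In-memory fixed-window counter for illustration only (no Redis involved).
// The current time is passed in so the run is deterministic.
class FixedWindowCounter {
  private counts = new Map<string, number>();
  private limit: number;
  private windowSec: number;

  constructor(limit: number, windowSec: number) {
    this.limit = limit;
    this.windowSec = windowSec;
  }

  allow(userId: string, nowSec: number): boolean {
    const windowStart = Math.floor(nowSec / this.windowSec) * this.windowSec;
    const key = `${userId}:${windowStart}`;
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);
    return count <= this.limit;
  }
}

// Send `n` requests at the same instant and return how many were accepted.
function sendBurst(limiter: FixedWindowCounter, userId: string, nowSec: number, n: number): number {
  let accepted = 0;
  for (let i = 0; i < n; i++) {
    if (limiter.allow(userId, nowSec)) accepted++;
  }
  return accepted;
}

const burstLimiter = new FixedWindowCounter(100, 60);
const acceptedBefore = sendBurst(burstLimiter, 'user1', 59, 100); // end of window [0, 60)
const acceptedAfter = sendBurst(burstLimiter, 'user1', 61, 100);  // start of window [60, 120)
console.log(acceptedBefore + acceptedAfter); // 200 accepted within a 2-second span
```

Both bursts pass because each lands in a different window, even though they arrive two seconds apart.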

Sliding Window

Track requests over the last N seconds at any point in time, not just since the last window boundary. This eliminates the boundary-burst problem.

A practical implementation uses sorted sets in Redis: store each request as a scored entry (score = timestamp) and count entries newer than now - windowSec.

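async function slidingWindowAllow(userId: string, limit: number, windowSec: number): Promise<boolean> {
  const now = Date.now();
  const windowStart = now - windowSec * 1000;
  const key = `rl:${userId}`;

  // Queue all four commands and send them in one round trip
  const pipeline = redis.pipeline();
  pipeline.zremrangebyscore(key, 0, windowStart);      // remove old entries
  // Note: the request is recorded before the check, so rejected
  // requests also count toward the window.
  pipeline.zadd(key, now, `${now}-${Math.random()}`); // add current request
  pipeline.zcard(key);                                  // count remaining
  pipeline.expire(key, windowSec + 1);

  // exec() resolves to one [error, result] pair per queued command
  const results = await pipeline.exec();
  if (!results) throw new Error('rate limit pipeline returned no results');
  const count = results[2][1] as number;
  return count <= limit;
}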

The sorted set approach has O(log n) writes, which is fine for typical limits (100-1000 requests/window). For very high-traffic endpoints, the memory footprint grows with requests per window.
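One design choice worth noticing in the function above is that it records every request, including rejected ones, so a client that keeps retrying stays blocked. As a point of comparison, here is a minimal in-memory sketch of the same log idea that records only accepted requests. The `SlidingWindowLog` class is hypothetical and takes the clock as an argument; it is not a drop-in replacement for the Redis version:

```typescript
// In-memory sliding window log: one timestamp per accepted request.
class SlidingWindowLog {
  private log = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(userId: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only timestamps strictly newer than the cutoff
    const recent = (this.log.get(userId) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs); // rejected requests are not recorded
    this.log.set(userId, recent);
    return allowed;
  }
}

const windowLog = new SlidingWindowLog(3, 1000);
const logResults = [0, 100, 200, 300, 1001, 1050].map((t) => windowLog.allow('user1', t));
console.log(logResults);
```

The array grows with the number of accepted requests in the window, which is the same memory trade-off the sorted set has.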

There are two common variants. The sorted-set version above is a “sliding window log”: it stores one timestamp per request. The “sliding window counter” instead keeps two fixed-window counters and approximates a sliding window with O(1) operations. The counter variant trades exact accuracy for speed and memory:

// Approximate sliding window using two counters
async function slidingWindowCounterAllow(
  userId: string, limit: number, windowSec: number
): Promise<boolean> {
  const now = Math.floor(Date.now() / 1000);
  const currentWindow = Math.floor(now / windowSec) * windowSec;
  const prevWindow = currentWindow - windowSec;
  const elapsed = (now % windowSec) / windowSec; // fraction into current window

  const prevKey = `rl:${userId}:${prevWindow}`;
  const currentKey = `rl:${userId}:${currentWindow}`;

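  const [prevCount, currentCount] = await redis.mget(prevKey, currentKey);
  const prev = parseInt(prevCount ?? '0', 10);
  const current = parseInt(currentCount ?? '0', 10);

  // Weighted estimate of requests in the sliding window:
  // the previous window counts only for the part that still overlaps
  const estimated = prev * (1 - elapsed) + current;

  // The check and the increment below are separate steps, so concurrent
  // requests can slightly overshoot the limit
  if (estimated >= limit) return false;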

  await redis.pipeline()
    .incr(currentKey)
    .expire(currentKey, windowSec * 2)
    .exec();

  return true;
}
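To make the weighting concrete, here is the estimate pulled out into a pure function with a worked example. The function name and the sample numbers are made up for illustration; the formula is the one used above, and it assumes the previous window's requests were spread evenly across that window:

```typescript
// Weighted estimate used by the sliding window counter.
// elapsedFraction is how far we are into the current window, from 0 to 1.
function estimateSlidingCount(prev: number, current: number, elapsedFraction: number): number {
  return prev * (1 - elapsedFraction) + current;
}

// 60-second window, 80 requests in the previous window, 30 so far in this one.
const earlyEstimate = estimateSlidingCount(80, 30, 15 / 60); // 80 * 0.75 + 30 = 90
const lateEstimate = estimateSlidingCount(80, 30, 45 / 60);  // 80 * 0.25 + 30 = 50

console.log(earlyEstimate, lateEstimate);
```

With a limit of 100, the same counters leave 10 requests of headroom 15 seconds into the window and 50 requests of headroom 45 seconds in, because the previous window's contribution fades as time passes.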

Token Bucket

The bucket holds tokens up to a maximum capacity. Each request consumes one token. Tokens refill at a constant rate. When the bucket is empty, requests are rejected until tokens refill.

This is the best model for users who do occasional bursts followed by idle periods. The bucket fills up during idle time, and they can spend stored tokens on a burst when needed.

async function tokenBucketAllow(
  userId: string,
  maxTokens: number,
  refillRatePerSec: number
): Promise<boolean> {
  const key = `tb:${userId}`;
  const now = Date.now() / 1000;

  // Note: this read-modify-write is not atomic. Concurrent requests for the
  // same user can read the same state and overspend; a Lua script or
  // WATCH/MULTI transaction would close that gap.
  const data = await redis.hgetall(key);
  let tokens = parseFloat(data.tokens ?? String(maxTokens)); // new bucket starts full
  const lastRefill = parseFloat(data.lastRefill ?? String(now));

  // Add tokens earned since last request, capped at bucket capacity
  const elapsed = Math.max(0, now - lastRefill);
  tokens = Math.min(maxTokens, tokens + elapsed * refillRatePerSec);

  // Spend one token if available
  const allowed = tokens >= 1;
  if (allowed) tokens -= 1;

  // Persist updated state even on rejection; let idle buckets expire
  // shortly after they would have refilled completely
  await redis.hset(key, { tokens: tokens.toFixed(4), lastRefill: now });
  await redis.expire(key, Math.ceil(maxTokens / refillRatePerSec) + 60);
  return allowed;
}

Token bucket is the algorithm behind several well-known rate limiters; AWS API Gateway and Stripe both describe their throttling in token-bucket terms. The X-RateLimit-Remaining header maps naturally to the current token count, and X-RateLimit-Reset to the time at which the bucket will have refilled.
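The burst-then-idle behavior is easier to see with small numbers. This is a minimal in-memory sketch, assuming a hypothetical `TokenBucket` class that takes the clock as an argument; it mirrors the refill arithmetic of the Redis function but is not a replacement for it:

```typescript
// In-memory token bucket with an explicit clock, for illustration only.
class TokenBucket {
  private tokens: number;
  private lastRefillSec: number;
  private maxTokens: number;
  private refillRatePerSec: number;

  constructor(maxTokens: number, refillRatePerSec: number, nowSec: number) {
    this.maxTokens = maxTokens;
    this.refillRatePerSec = refillRatePerSec;
    this.tokens = maxTokens; // a new bucket starts full
    this.lastRefillSec = nowSec;
  }

  allow(nowSec: number): boolean {
    // Credit tokens earned since the last call, capped at capacity
    const elapsed = Math.max(0, nowSec - this.lastRefillSec);
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRatePerSec);
    this.lastRefillSec = Math.max(this.lastRefillSec, nowSec);

    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // Whole tokens currently available, e.g. for a "remaining" header
  remaining(): number {
    return Math.floor(this.tokens);
  }
}

// Capacity 5, refilling 1 token per second
const bucket = new TokenBucket(5, 1, 0);
const burstAtZero = [1, 2, 3, 4, 5, 6].map(() => bucket.allow(0)); // sixth is rejected
const afterTwoSeconds = [1, 2, 3].map(() => bucket.allow(2));      // two tokens refilled
bucket.allow(100);                                                 // long idle: bucket is capped at 5
const remainingAfterIdle = bucket.remaining();                     // 5 - 1 = 4

console.log(burstAtZero, afterTwoSeconds, remainingAfterIdle);
```

The long idle period does not bank more than `maxTokens`, which is what bounds the size of any single burst.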

Leaky Bucket

Requests enter a queue and are processed at a fixed rate. If the queue is full, excess requests are dropped. This smooths out bursts. No spikes reach your backend. It’s useful for protecting a resource that can only handle a constant throughput.

Leaky bucket is less common for API rate limiting because it adds latency (queuing) rather than just rejecting. It’s more appropriate for background job processing or webhook delivery where you need to throttle a downstream service.
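For comparison with the other three, here is a minimal in-memory sketch of the queue behavior. The `LeakyBucket` class is hypothetical; in a real worker something like `setInterval` would call `leak()` at the fixed processing rate, which is left out here so the example stays deterministic:

```typescript
// Bounded queue that accepts items until full and drains one item per tick.
class LeakyBucket<T> {
  private queue: T[] = [];
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Returns false when the queue is full and the item is dropped
  offer(item: T): boolean {
    if (this.queue.length >= this.capacity) return false;
    this.queue.push(item);
    return true;
  }

  // Removes and returns the oldest item, or undefined if the queue is empty.
  // Intended to be called once per tick by a fixed-rate timer.
  leak(): T | undefined {
    return this.queue.shift();
  }

  size(): number {
    return this.queue.length;
  }
}

const jobQueue = new LeakyBucket<string>(3);
const offered = ['a', 'b', 'c', 'd', 'e'].map((job) => jobQueue.offer(job)); // burst of 5 into capacity 3
const firstOut = jobQueue.leak();            // one tick drains the oldest job
const offeredAfterLeak = jobQueue.offer('f'); // the freed slot accepts a new job

console.log(offered, firstOut, offeredAfterLeak);
```

Accepted items wait in the queue rather than failing fast, which is the latency cost mentioned above.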

Response Headers

Rate limit headers give clients the information they need to retry intelligently. Return them on every response, not just on 429:

function setRateLimitHeaders(res: Response, info: {
  limit: number;
  remaining: number;
  reset: number;     // seconds until the window resets (the IETF draft uses delta-seconds, not a Unix timestamp)
  windowSec: number; // window length, reported in the policy header
}) {
  res.setHeader('RateLimit-Limit', info.limit);
  res.setHeader('RateLimit-Remaining', Math.max(0, info.remaining));
  res.setHeader('RateLimit-Reset', Math.max(0, Math.ceil(info.reset)));
  res.setHeader('RateLimit-Policy', `${info.limit};w=${info.windowSec}`); // IETF draft format
}

// On 429:
res.setHeader('Retry-After', secondsUntilReset);
res.status(429).json({
  error: 'rate_limit_exceeded',
  message: `Too many requests. Try again in ${secondsUntilReset} seconds.`,
});

The Retry-After header on 429 is the most important. Well-behaved clients use it to back off correctly. Without it, many clients retry immediately or guess at a delay, making the problem worse.
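On the client side, honoring the header takes only a few lines. The helper below is a hypothetical sketch (`retryDelayMs` and its fallback parameter are illustrative names); it handles both forms HTTP allows for Retry-After, a number of seconds or an HTTP-date, and falls back to a caller-supplied delay when the header is missing or unparseable:

```typescript
// Convert a Retry-After header value into a delay in milliseconds.
// The value is either a whole number of seconds or an HTTP-date.
function retryDelayMs(retryAfter: string | null, nowMs: number, fallbackMs: number): number {
  if (retryAfter === null) return fallbackMs;

  const value = retryAfter.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000; // delay in seconds

  const dateMs = Date.parse(value); // HTTP-date form
  if (Number.isNaN(dateMs)) return fallbackMs;
  return Math.max(0, dateMs - nowMs); // never negative if the date is in the past
}

const fromSeconds = retryDelayMs('30', Date.now(), 1000);
const fromMissing = retryDelayMs(null, Date.now(), 1000);
console.log(fromSeconds, fromMissing);
```

A caller would typically sleep for the returned delay before retrying, and may want to add jitter on top so that many clients do not all retry at the same instant.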

Where to Apply Rate Limits

Rate limits should live as close to the edge as possible:

CDN / proxy layer (Cloudflare, Nginx): cheapest to enforce, can shed load before your app server is touched. Good for IP-based limiting.
API gateway: per-API-key limits, no code changes required. Good for products with a public API.
Middleware in your app: most flexible, can enforce per-user business rules.

Larger production systems often combine all three layers:

Request
  → Cloudflare (IP-based: 1000 req/min, block known bot IPs)
  → API gateway (per-API-key: tiered limits based on plan)
  → App middleware (per-user: resource-specific limits)

Tiered Limits by Plan

SaaS products typically have different limits for different plans. Storing the plan name on the user record and looking up its limits in a single table keeps the logic clean:

const PLAN_LIMITS: Record<string, { limit: number; windowSec: number }> = {
  free:       { limit: 100,   windowSec: 3600 },  // 100/hour
  starter:    { limit: 1000,  windowSec: 3600 },  // 1000/hour
  pro:        { limit: 10000, windowSec: 3600 },  // 10000/hour
  enterprise: { limit: 100000, windowSec: 3600 }, // 100000/hour
};

async function checkRateLimit(req: Request, res: Response): Promise<boolean> {
  const user = req.user;
  const planConfig = PLAN_LIMITS[user.plan] ?? PLAN_LIMITS.free;
  const allowed = await slidingWindowAllow(user.id, planConfig.limit, planConfig.windowSec);

  if (!allowed) {
    res.status(429).json({ error: 'rate_limit_exceeded', plan: user.plan });
    return false;
  }
  return true;
}

Testing Rate Limiters

Rate limiters are easy to test incorrectly. A test that mocks Redis and just checks counter increments isn’t testing whether the algorithm actually rejects at the right threshold.

Test with real Redis (or a Redis-compatible in-memory test double like ioredis-mock):

describe('sliding window rate limiter', () => {
  beforeEach(() => redis.flushdb());

  it('allows requests under the limit', async () => {
    for (let i = 0; i < 10; i++) {
      expect(await slidingWindowAllow('user1', 10, 60)).toBe(true);
    }
  });

  it('blocks the 11th request', async () => {
    for (let i = 0; i < 10; i++) {
      await slidingWindowAllow('user1', 10, 60);
    }
    expect(await slidingWindowAllow('user1', 10, 60)).toBe(false);
  });

  it('allows again after the window expires', async () => {
    // A real 1.1 s sleep keeps this simple; a fake clock would make it faster
    for (let i = 0; i < 10; i++) {
      await slidingWindowAllow('user1', 10, 1); // 1-second window
    }
    await new Promise(r => setTimeout(r, 1100)); // wait for window
    expect(await slidingWindowAllow('user1', 10, 1)).toBe(true);
  });
});
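If the real sleep in the last test becomes a problem, one option is to have the limiter read time from an injected clock so the test can advance it directly. The sketch below shows the pattern with a hypothetical in-memory limiter; `createFakeClock` and `createWindowLimiter` are illustrative names and not part of the Redis code above:

```typescript
// A clock is any function that returns the current time in milliseconds.
type Clock = () => number;

// Test double whose time only moves when the test says so
function createFakeClock(startMs: number) {
  let nowMs = startMs;
  return {
    now: (): number => nowMs,
    advance: (ms: number): void => {
      nowMs += ms;
    },
  };
}

// Hypothetical sliding window limiter that reads time from the injected clock
function createWindowLimiter(limit: number, windowMs: number, clock: Clock): () => boolean {
  let timestamps: number[] = [];
  return function allow(): boolean {
    const now = clock();
    timestamps = timestamps.filter((t) => t > now - windowMs);
    if (timestamps.length >= limit) return false;
    timestamps.push(now);
    return true;
  };
}

const fakeClock = createFakeClock(0);
const allowRequest = createWindowLimiter(10, 1000, fakeClock.now);

const firstTen = Array.from({ length: 10 }, () => allowRequest());
const eleventh = allowRequest();   // over the limit, same instant
fakeClock.advance(1100);           // no real waiting
const afterWindow = allowRequest(); // window has expired

console.log(firstTen.every(Boolean), eleventh, afterWindow);
```

In production code the same limiter would receive `Date.now` as its clock.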

Choosing an Algorithm

Situation	Algorithm
Simple internal API, low risk of abuse	Fixed window
Public API with burst-sensitive clients	Sliding window
API where user experience matters (allow natural bursts)	Token bucket
Background worker or webhook delivery throttle	Leaky bucket

For most web APIs, sliding window is the right default: it’s smooth, has no boundary artifacts, and the Redis sorted set implementation is straightforward. Token bucket is worth adding when you want to reward users who aren’t constantly active. Their stored tokens let them send a batch of requests without hitting the limit.

Whatever algorithm you pick, add the response headers. They cost nothing to implement and turn a frustrating 429 into a recoverable situation for your API clients.
