Medium · HLD · 20 min read

Designing a Rate Limiter

A comprehensive guide to designing a rate limiter system - covering Token Bucket, Sliding Window, and distributed implementations.

Asked at: Google, Amazon, Stripe, Uber
Published: December 20, 2024
Rate Limiter System Design

Problem Statement

Design a rate limiter that can:

  • Limit the number of requests a client can make in a given time window
  • Work in a distributed environment
  • Handle millions of requests per second
  • Provide configurable rules per client/API

Requirements

Functional Requirements

  1. Limit requests based on user ID, IP, or API key
  2. Support different rate limits for different APIs
  3. Return appropriate response when limit exceeded (HTTP 429)
  4. Allow burst traffic within limits

Non-Functional Requirements

  1. Low latency (<10ms for rate check)
  2. High availability
  3. Distributed (works across multiple servers)
  4. Accurate (no significant over/under counting)

Algorithms

1. Token Bucket

Bucket capacity: N tokens
Refill rate: R tokens/second

For each request:
  tokens = min(N, tokens + elapsed_time * R)   # refill since last check
  if tokens >= 1:
    tokens -= 1
    allow request
  else:
    reject request (429)

Pros: Allows bursts, simple to implement
Cons: Memory overhead per user
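The token bucket pseudocode above can be sketched as a small in-memory class (a minimal single-process sketch; distributed state comes later in this post):

```python
import time

class TokenBucket:
    """Token bucket: capacity N tokens, refilled at R tokens/second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity          # start full, so bursts are allowed
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A burst of 4 requests against a bucket of capacity 3: the fourth is rejected.
bucket = TokenBucket(capacity=3, refill_rate=1.0)
results = [bucket.allow() for _ in range(4)]
```

Note the bucket starts full, which is what lets a fresh client burst up to capacity immediately.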

2. Sliding Window Log

Store timestamp of each request in a sorted set:

For each request at time T:
  Remove entries older than T - window_size
  if count < limit:
    Add timestamp T
    allow request
  else:
    reject request

Pros: Accurate
Cons: Memory intensive (stores a timestamp per request)
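The sliding window log above can be sketched in-process with a deque standing in for the sorted set (in production the post's Redis sorted set plays this role; timestamps are injectable here for determinism):

```python
from collections import deque
import time

class SlidingWindowLog:
    """Keep a log of request timestamps; allow if fewer than `limit`
    fall inside the trailing `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log = deque()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Evict entries older than the window (the "Remove entries" step).
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

# limit=2 per 10s window: third request at t=2 is rejected,
# but by t=11 the old entries have expired.
limiter = SlidingWindowLog(limit=2, window=10.0)
decisions = [limiter.allow(t) for t in (0.0, 1.0, 2.0, 11.0)]
```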

3. Sliding Window Counter

Hybrid approach:

estimated_count = previous_window_count * overlap_percentage + current_window_count

where overlap_percentage is the fraction of the previous window still covered by the sliding window.

Pros: Memory efficient, smooths window edges
Cons: Approximate count
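The hybrid estimate works out like this (the counts below are made-up numbers for illustration):

```python
def estimated_count(prev_count: int, curr_count: int,
                    elapsed_in_window: float, window: float) -> float:
    """Weight the previous window's count by how much of it still
    overlaps the sliding window, then add the current window's count."""
    overlap = 1.0 - (elapsed_in_window / window)
    return prev_count * overlap + curr_count

# 15s into a 60s window: 75% of the previous window still overlaps,
# so 80 previous + 30 current requests estimate to 80*0.75 + 30 = 90.
est = estimated_count(prev_count=80, curr_count=30,
                      elapsed_in_window=15, window=60)
```

If the limit were 100, this request would still be allowed even though the raw sum of both windows (110) exceeds it.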

High-Level Architecture

┌─────────┐     ┌──────────────┐     ┌─────────────┐
│ Client  │────▶│ Rate Limiter │────▶│ API Server  │
└─────────┘     └──────────────┘     └─────────────┘
                       │
                       ▼
                ┌──────────────┐
                │    Redis     │
                │  (Counters)  │
                └──────────────┘

Distributed Implementation

Using Redis

import redis as redis_lib

redis = redis_lib.Redis()  # shared client

def is_rate_limited(user_id: str, limit: int, window: int) -> bool:
    key = f"rate:{user_id}"
    current = redis.get(key)

    if current is None:
        # First request in this window: create the counter with TTL = window.
        redis.setex(key, window, 1)
        return False

    if int(current) >= limit:
        return True

    # Note: the GET above and this INCR are separate commands, so two
    # concurrent requests can both pass the check (see next section).
    redis.incr(key)
    return False

Race Condition Handling

Use Lua scripts for atomic operations:

local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])

local current = redis.call('GET', key)
if current and tonumber(current) >= limit then
    return 1  -- Rate limited
end

if current then
    redis.call('INCR', key)
else
    redis.call('SETEX', key, window, 1)
end
return 0  -- Allowed
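From Python, the script can be wired up with redis-py's `register_script`, which loads it once and runs it atomically via EVALSHA on each call (the helper names `make_limiter` and `is_allowed` here are illustrative, not from the post):

```python
# The Lua script from above, embedded as a string.
RATE_LIMIT_SCRIPT = """
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])

local current = redis.call('GET', key)
if current and tonumber(current) >= limit then
    return 1
end

if current then
    redis.call('INCR', key)
else
    redis.call('SETEX', key, window, 1)
end
return 0
"""

def make_limiter(client):
    """`client` is a redis.Redis instance. register_script returns a
    callable that executes the script atomically on the server."""
    script = client.register_script(RATE_LIMIT_SCRIPT)

    def is_allowed(user_id: str, limit: int, window: int) -> bool:
        return script(keys=[f"rate:{user_id}"], args=[limit, window]) == 0

    return is_allowed
```

Because the whole check-and-increment runs inside Redis, no other command can interleave between the GET and the INCR/SETEX.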

API Response

When rate limited:

{
  "error": "Rate limit exceeded",
  "retry_after": 30,
  "limit": 100,
  "remaining": 0,
  "reset": 1703123456
}

Headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1703123456
Retry-After: 30
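A small helper can derive both the body and the headers from the same state, keeping them consistent (a sketch; the function name and signature are illustrative):

```python
import time

def rate_limit_response(limit: int, reset_epoch: int, now=None):
    """Build the 429 JSON body and headers shown above.
    `reset_epoch` is the Unix time at which the window resets."""
    now = int(time.time()) if now is None else now
    retry_after = max(0, reset_epoch - now)
    body = {
        "error": "Rate limit exceeded",
        "retry_after": retry_after,
        "limit": limit,
        "remaining": 0,
        "reset": reset_epoch,
    }
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(reset_epoch),
        "Retry-After": str(retry_after),
    }
    return body, headers

body, headers = rate_limit_response(limit=100, reset_epoch=1703123456,
                                    now=1703123426)
```

Deriving `Retry-After` from `reset - now` means clients that honor either field behave the same way.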

Scaling Considerations

  1. Multiple Redis instances - Use consistent hashing
  2. Local cache - Cache rate limit decisions briefly
  3. Async logging - Don’t block on analytics
  4. Graceful degradation - If Redis is down, fail open (allow requests) or fall back to per-server local limits
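The graceful-degradation point can be sketched as a fallback wrapper (a minimal sketch; `primary_check` and `local_check` are hypothetical callables returning True when the request is allowed):

```python
def check_with_fallback(primary_check, local_check, fail_open: bool = True):
    """Try the distributed (Redis-backed) check first; if it is
    unreachable, either fail open or fall back to a local limiter."""
    try:
        return primary_check()
    except ConnectionError:
        # Redis is down: pick the degradation policy.
        return True if fail_open else local_check()
```

Failing open favors availability (some over-admission during an outage); falling back to a local limiter keeps a rough per-server cap at the cost of accuracy.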

Key Takeaways

  • Token Bucket is most commonly used (allows bursts)
  • Use Redis for distributed rate limiting
  • Always handle race conditions with atomic operations
  • Include rate limit headers in responses
  • Plan for Redis failures

Tags

system-design, api, distributed-systems, scalability
