Medium · HLD · 20 min read

Designing a Rate Limiter

A comprehensive guide to designing a rate limiter system - covering Token Bucket, Sliding Window, and distributed implementations.

Asked at: Google, Amazon, Stripe, Uber
Published: December 20, 2024
Rate Limiter System Design

Problem Statement

Design a rate limiter that can:

  • Limit the number of requests a client can make in a given time window
  • Work in a distributed environment
  • Handle millions of requests per second
  • Provide configurable rules per client/API

Requirements

Functional Requirements

  1. Limit requests based on user ID, IP, or API key
  2. Support different rate limits for different APIs
  3. Return appropriate response when limit exceeded (HTTP 429)
  4. Allow burst traffic within limits

Non-Functional Requirements

  1. Low latency (<10ms for rate check)
  2. High availability
  3. Distributed (works across multiple servers)
  4. Accurate (no significant over/under counting)

Algorithms

1. Token Bucket

Bucket capacity: N tokens
Refill rate: R tokens/second

For each request:
  tokens = min(N, tokens + elapsed_time * R)   # refill since last check
  if tokens >= 1:
    tokens -= 1
    allow request
  else:
    reject request (429)

Pros: Allows bursts, simple to implement
Cons: Memory overhead per user
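The token bucket pseudocode above can be sketched as a small in-memory class (a minimal single-process sketch; distributed state comes later in this post):

```python
import time

class TokenBucket:
    """Token bucket: capacity N tokens, refilled at R tokens/second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity          # start full, so bursts are allowed
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A burst of 4 requests against a bucket of capacity 3: the fourth is rejected.
bucket = TokenBucket(capacity=3, refill_rate=1.0)
results = [bucket.allow() for _ in range(4)]
```

Note the bucket starts full, which is what lets a fresh client burst up to capacity immediately.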

2. Sliding Window Log

Store timestamp of each request in a sorted set:

For each request at time T:
  Remove entries older than T - window_size
  if count < limit:
    Add timestamp T
    allow request
  else:
    reject request

Pros: Accurate
Cons: Memory intensive (stores a timestamp per request)
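The sliding window log above can be sketched in-process with a deque standing in for the sorted set (in production the post's Redis sorted set plays this role; timestamps are injectable here for determinism):

```python
from collections import deque
import time

class SlidingWindowLog:
    """Keep a log of request timestamps; allow if fewer than `limit`
    fall inside the trailing `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log = deque()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Evict entries older than the window (the "Remove entries" step).
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

# limit=2 per 10s window: third request at t=2 is rejected,
# but by t=11 the old entries have expired.
limiter = SlidingWindowLog(limit=2, window=10.0)
decisions = [limiter.allow(t) for t in (0.0, 1.0, 2.0, 11.0)]
```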

3. Sliding Window Counter

Hybrid approach:

estimated_count = previous_window_count * overlap_percentage + current_window_count

where overlap_percentage is the fraction of the previous window still covered by the sliding window.

Pros: Memory efficient, smooths window edges
Cons: Approximate count
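The hybrid estimate works out like this (the counts below are made-up numbers for illustration):

```python
def estimated_count(prev_count: int, curr_count: int,
                    elapsed_in_window: float, window: float) -> float:
    """Weight the previous window's count by how much of it still
    overlaps the sliding window, then add the current window's count."""
    overlap = 1.0 - (elapsed_in_window / window)
    return prev_count * overlap + curr_count

# 15s into a 60s window: 75% of the previous window still overlaps,
# so 80 previous + 30 current requests estimate to 80*0.75 + 30 = 90.
est = estimated_count(prev_count=80, curr_count=30,
                      elapsed_in_window=15, window=60)
```

If the limit were 100, this request would still be allowed even though the raw sum of both windows (110) exceeds it.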

High-Level Architecture

┌─────────┐     ┌──────────────┐     ┌─────────────┐
│ Client  │────▶│ Rate Limiter │────▶│ API Server  │
└─────────┘     └──────────────┘     └─────────────┘
                       │
                       ▼
                ┌──────────────┐
                │    Redis     │
                │  (Counters)  │
                └──────────────┘

Distributed Implementation

Using Redis

import redis as redis_lib

redis = redis_lib.Redis()  # shared client

def is_rate_limited(user_id: str, limit: int, window: int) -> bool:
    key = f"rate:{user_id}"
    current = redis.get(key)

    if current is None:
        # First request in this window: create the counter with TTL = window.
        redis.setex(key, window, 1)
        return False

    if int(current) >= limit:
        return True

    # Note: the GET above and this INCR are separate commands, so two
    # concurrent requests can both pass the check (see next section).
    redis.incr(key)
    return False

Race Condition Handling

Use Lua scripts for atomic operations:

local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])

local current = redis.call('GET', key)
if current and tonumber(current) >= limit then
    return 1  -- Rate limited
end

if current then
    redis.call('INCR', key)
else
    redis.call('SETEX', key, window, 1)
end
return 0  -- Allowed
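From Python, the script can be wired up with redis-py's `register_script`, which loads it once and runs it atomically via EVALSHA on each call (the helper names `make_limiter` and `is_allowed` here are illustrative, not from the post):

```python
# The Lua script from above, embedded as a string.
RATE_LIMIT_SCRIPT = """
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])

local current = redis.call('GET', key)
if current and tonumber(current) >= limit then
    return 1
end

if current then
    redis.call('INCR', key)
else
    redis.call('SETEX', key, window, 1)
end
return 0
"""

def make_limiter(client):
    """`client` is a redis.Redis instance. register_script returns a
    callable that executes the script atomically on the server."""
    script = client.register_script(RATE_LIMIT_SCRIPT)

    def is_allowed(user_id: str, limit: int, window: int) -> bool:
        return script(keys=[f"rate:{user_id}"], args=[limit, window]) == 0

    return is_allowed
```

Because the whole check-and-increment runs inside Redis, no other command can interleave between the GET and the INCR/SETEX.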

API Response

When rate limited:

{
  "error": "Rate limit exceeded",
  "retry_after": 30,
  "limit": 100,
  "remaining": 0,
  "reset": 1703123456
}

Headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1703123456
Retry-After: 30
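A small helper can derive both the body and the headers from the same state, keeping them consistent (a sketch; the function name and signature are illustrative):

```python
import time

def rate_limit_response(limit: int, reset_epoch: int, now=None):
    """Build the 429 JSON body and headers shown above.
    `reset_epoch` is the Unix time at which the window resets."""
    now = int(time.time()) if now is None else now
    retry_after = max(0, reset_epoch - now)
    body = {
        "error": "Rate limit exceeded",
        "retry_after": retry_after,
        "limit": limit,
        "remaining": 0,
        "reset": reset_epoch,
    }
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(reset_epoch),
        "Retry-After": str(retry_after),
    }
    return body, headers

body, headers = rate_limit_response(limit=100, reset_epoch=1703123456,
                                    now=1703123426)
```

Deriving `Retry-After` from `reset - now` means clients that honor either field behave the same way.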

Scaling Considerations

  1. Multiple Redis instances - Use consistent hashing
  2. Local cache - Cache rate limit decisions briefly
  3. Async logging - Don’t block on analytics
  4. Graceful degradation - If Redis is down, fail open (allow requests) or fall back to per-server local limits
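The graceful-degradation point can be sketched as a fallback wrapper (a minimal sketch; `primary_check` and `local_check` are hypothetical callables returning True when the request is allowed):

```python
def check_with_fallback(primary_check, local_check, fail_open: bool = True):
    """Try the distributed (Redis-backed) check first; if it is
    unreachable, either fail open or fall back to a local limiter."""
    try:
        return primary_check()
    except ConnectionError:
        # Redis is down: pick the degradation policy.
        return True if fail_open else local_check()
```

Failing open favors availability (some over-admission during an outage); falling back to a local limiter keeps a rough per-server cap at the cost of accuracy.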

Key Takeaways

  • Token Bucket is most commonly used (allows bursts)
  • Use Redis for distributed rate limiting
  • Always handle race conditions with atomic operations
  • Include rate limit headers in responses
  • Plan for Redis failures

Tags

system-design, api, distributed-systems, scalability
