Medium · HLD · 20 min read
Designing a Rate Limiter
A comprehensive guide to designing a rate limiter system - covering Token Bucket, Sliding Window, and distributed implementations.
Asked at: Google, Amazon, Stripe, Uber
Published: December 20, 2024
Rate Limiter System Design
Problem Statement
Design a rate limiter that can:
- Limit the number of requests a client can make in a given time window
- Work in a distributed environment
- Handle millions of requests per second
- Provide configurable rules per client/API
Requirements
Functional Requirements
- Limit requests based on user ID, IP, or API key
- Support different rate limits for different APIs
- Return appropriate response when limit exceeded (HTTP 429)
- Allow burst traffic within limits
Non-Functional Requirements
- Low latency (<10ms for rate check)
- High availability
- Distributed (works across multiple servers)
- Accurate (no significant over/under counting)
Algorithms
1. Token Bucket
Bucket capacity: N tokens
Refill rate: R tokens/second

For each request:
    refill: tokens = min(N, tokens + R * seconds_since_last_refill)
    if tokens >= 1:
        tokens -= 1
        allow request
    else:
        reject request (429)

Pros: Allows bursts, simple to implement
Cons: Memory overhead per user (one bucket per key)
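As a concrete reference, here is a minimal single-process token bucket with lazy refill. It is an illustrative sketch (the TokenBucket class is not from any library), and a production version would also need locking or per-key storage.

import time

class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity          # N: maximum tokens the bucket holds
        self.refill_rate = refill_rate    # R: tokens added per second
        self.tokens = capacity            # start full so bursts are allowed immediately
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        # Lazy refill: top up tokens based on time elapsed since the last check
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True    # allow request
        return False       # reject request (429)

# Example: 100-token bucket refilled at 10 tokens/second
bucket = TokenBucket(capacity=100, refill_rate=10)
print(bucket.allow())  # True while tokens remain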
2. Sliding Window Log
Store the timestamp of each request in a sorted set:

For each request at time T:
    Remove entries older than T - window_size
    if count < limit:
        Add timestamp T
        allow request
    else:
        reject request

Pros: Accurate
Cons: Memory intensive (one entry per request)
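As a sketch of how the log maps onto Redis, the timestamps can live in a sorted set scored by request time; the key prefix and UUID member format below are illustrative assumptions, not a standard.

import time
import uuid
import redis

r = redis.Redis()  # assumed shared client

def allow_request(user_id: str, limit: int, window: int) -> bool:
    key = f"ratelog:{user_id}"
    now = time.time()
    pipe = r.pipeline()
    # Drop timestamps that have fallen out of the window, then count what remains
    pipe.zremrangebyscore(key, 0, now - window)
    pipe.zcard(key)
    _, count = pipe.execute()
    if count >= limit:
        return False                          # reject request
    # Record this request; the uuid keeps concurrent requests from colliding
    r.zadd(key, {f"{now}:{uuid.uuid4()}": now})
    r.expire(key, window)                     # let idle keys expire
    return True                               # allow request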
3. Sliding Window Counter
Hybrid approach: weight the previous fixed window's count by the fraction of it that still overlaps the sliding window:

estimated_count = previous_window_count * overlap_percentage + current_window_count

Pros: Memory efficient, smooths out window-boundary spikes
Cons: Approximate
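A small sketch of the weighted estimate, assuming the fixed per-window counters are already tracked elsewhere (for example in Redis):

import time
from typing import Optional

def estimated_count(prev_window_count: int, curr_window_count: int,
                    window: int, now: Optional[float] = None) -> float:
    # Approximate the number of requests in the sliding window ending at `now`
    now = time.time() if now is None else now
    elapsed = now % window                # seconds into the current fixed window
    overlap = 1 - (elapsed / window)      # fraction of the previous window still covered
    return prev_window_count * overlap + curr_window_count

# Example: 84 requests in the previous minute, 36 so far, 15s into the current minute:
# 84 * 0.75 + 36 = 99 estimated requests in the last 60 seconds
print(estimated_count(84, 36, window=60, now=15))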
High-Level Architecture
┌─────────┐     ┌──────────────┐     ┌─────────────┐
│ Client  │────▶│ Rate Limiter │────▶│ API Server  │
└─────────┘     └──────────────┘     └─────────────┘
                       │
                       ▼
                ┌──────────────┐
                │    Redis     │
                │  (Counters)  │
                └──────────────┘
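To make the diagram concrete, here is a minimal sketch of the request gate: the limiter is consulted before the request ever reaches the API server. The check_rate_limit and forward callables are placeholders; a concrete Redis-backed check follows in the next section.

from typing import Callable, Dict

def handle_request(client_id: str, forward: Callable[[], Dict],
                   check_rate_limit: Callable[[str], bool]) -> Dict:
    # Consult the rate limiter first and reject at the edge if necessary
    if not check_rate_limit(client_id):
        return {"status": 429, "error": "Rate limit exceeded"}
    return forward()  # pass the request through to the API server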
Distributed Implementation
Using Redis
import redis

r = redis.Redis()  # shared Redis client

def is_rate_limited(user_id: str, limit: int, window: int) -> bool:
    key = f"rate:{user_id}"
    current = r.get(key)
    if current is None:
        # First request in this window: create the counter with a TTL
        r.setex(key, window, 1)
        return False
    if int(current) >= limit:
        return True
    r.incr(key)
    return False
Race Condition Handling
Use Lua scripts for atomic operations:
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current = redis.call('GET', key)

if current and tonumber(current) >= limit then
    return 1  -- Rate limited
end

if current then
    redis.call('INCR', key)
else
    redis.call('SETEX', key, window, 1)
end

return 0  -- Allowed
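One way to run that script from Python is redis-py's register_script, which loads it into Redis on first use and then executes it via EVALSHA, so the whole check-and-increment happens as a single atomic command. The wrapper name and client setup below are illustrative.

import redis

r = redis.Redis()

RATE_LIMIT_LUA = """
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current = redis.call('GET', key)
if current and tonumber(current) >= limit then
    return 1
end
if current then
    redis.call('INCR', key)
else
    redis.call('SETEX', key, window, 1)
end
return 0
"""

rate_limit_script = r.register_script(RATE_LIMIT_LUA)

def is_rate_limited_atomic(user_id: str, limit: int, window: int) -> bool:
    # Runs the script server-side; 1 means rate limited, 0 means allowed
    return rate_limit_script(keys=[f"rate:{user_id}"], args=[limit, window]) == 1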
API Response
When rate limited:
{
  "error": "Rate limit exceeded",
  "retry_after": 30,
  "limit": 100,
  "remaining": 0,
  "reset": 1703123456
}
Headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1703123456
Retry-After: 30
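A framework-agnostic sketch of assembling that response; the helper name and (status, headers, body) return shape are illustrative, and limit/reset would come from the limiter's state.

import json
import time
from typing import Dict, Tuple

def rate_limited_response(limit: int, reset_at: int) -> Tuple[int, Dict[str, str], str]:
    # Build the 429 status, headers, and JSON body for a rejected request
    retry_after = max(0, reset_at - int(time.time()))
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(reset_at),
        "Retry-After": str(retry_after),
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "error": "Rate limit exceeded",
        "retry_after": retry_after,
        "limit": limit,
        "remaining": 0,
        "reset": reset_at,
    })
    return 429, headers, body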
Scaling Considerations
- Multiple Redis instances - Use consistent hashing
- Local cache - Cache rate limit decisions briefly
- Async logging - Don’t block on analytics
- Graceful degradation - If Redis is down, fail open (allow requests) or fall back to local limits; see the sketch after this list
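As a sketch of the fail-open variant, the Redis-backed check from the implementation section can be wrapped so a Redis outage never blocks traffic; whether to fail open or fall back to a local in-process limiter is a policy decision.

import redis

def is_rate_limited_safe(user_id: str, limit: int, window: int) -> bool:
    try:
        # is_rate_limited is the Redis-backed check defined earlier; give its
        # client a short socket_timeout so a dead Redis cannot stall requests
        return is_rate_limited(user_id, limit, window)
    except redis.RedisError:
        return False  # fail open: do not reject traffic because the limiter is down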
Key Takeaways
- Token Bucket is the most commonly used algorithm (it allows bursts)
- Use Redis for distributed rate limiting
- Always handle race conditions with atomic operations
- Include rate limit headers in responses
- Plan for Redis failures
Tags
system-design, api, distributed-systems, scalability