Medium · HLD · 25 min read
Designing a URL Shortener (like bit.ly)
System design for a URL shortening service - handling billions of URLs, high availability, and analytics.
Asked at: Google, Microsoft, Amazon, Meta
Published: December 15, 2024
URL Shortener System Design
Problem Statement
Design a URL shortening service like bit.ly that:
- Converts long URLs to short URLs
- Redirects short URLs to original URLs
- Tracks click analytics
- Handles high traffic
Requirements
Functional
- Generate short URL from long URL
- Redirect short URL to original
- Custom short URLs (optional)
- URL expiration
- Analytics (click count, referrers)
Non-Functional
- High availability (99.99%)
- Low latency redirects (<100ms)
- Scale to billions of URLs
- Short URLs should not be predictable
Capacity Estimation
Assumptions:
- 100M new URLs/month
- Read:Write ratio = 100:1
- URL stored for 5 years
Storage:
- 100M × 12 × 5 = 6 billion URLs
- Each URL ~500 bytes
- Total: 6B × 500 = 3TB
Traffic:
- Writes: 100M / (30 × 24 × 3600) ≈ 40 URLs/sec
- Reads: 40 × 100 = 4000 redirects/sec
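As a sanity check, the arithmetic above can be reproduced in a few lines (constants are taken from the assumptions; the 30-day month is an approximation):

```python
# Back-of-the-envelope check for the capacity estimates above.
URLS_PER_MONTH = 100_000_000
YEARS = 5
BYTES_PER_URL = 500
READ_WRITE_RATIO = 100

total_urls = URLS_PER_MONTH * 12 * YEARS              # 6 billion URLs over 5 years
storage_tb = total_urls * BYTES_PER_URL / 1e12        # ~3 TB
writes_per_sec = URLS_PER_MONTH / (30 * 24 * 3600)    # ~40 URLs/sec
reads_per_sec = writes_per_sec * READ_WRITE_RATIO     # ~4000 redirects/sec

print(total_urls, round(storage_tb), round(writes_per_sec), round(reads_per_sec))
```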
Short URL Generation
Approach 1: Base62 Encoding
Characters: a-z, A-Z, 0-9 (62 chars)
| Length | Combinations |
|---|---|
| 6 | 56.8 billion |
| 7 | 3.5 trillion |
7 characters is sufficient for our scale.
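Both generation approaches below assume a base62 codec. A minimal sketch (the 0-9a-zA-Z alphabet ordering here is a choice, not something the table above mandates):

```python
# 62 characters: digits, lowercase, uppercase (ordering is an arbitrary choice).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first

def base62_decode(s: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Note that the largest 7-character code decodes to 62^7 - 1, which is the ~3.5 trillion ceiling from the table.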
Approach 2: Hash + Truncate
import hashlib

def generate_short_url(long_url: str) -> str:
    hash_obj = hashlib.md5(long_url.encode())
    hash_hex = hash_obj.hexdigest()
    # Take the first 12 hex chars (48 bits), convert to base62, keep 7 chars
    return base62_encode(int(hash_hex[:12], 16))[:7]
Problem: different long URLs can truncate to the same 7 characters (collisions).
Solution: check the DB on insert; on a collision, append a counter to the long URL and re-hash.
Approach 3: Counter-based (Recommended)
Use a distributed counter service:
Counter: 1000000001
Base62: 15FTGh (using the alphabet 0-9a-zA-Z)
Pros: no collisions, bounded code length.
Cons: sequential codes are guessable, which conflicts with the non-predictability requirement (scramble the ID before encoding), and a single counter is a bottleneck (use range-based allocation, where each server leases a block of IDs).
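Range-based allocation can be sketched with a single in-process coordinator standing in for a real counter service such as ZooKeeper (class and parameter names here are illustrative, not from the original design):

```python
import threading

class RangeAllocator:
    """Central coordinator handing out disjoint counter ranges.
    In production this role is played by a service like ZooKeeper."""
    def __init__(self, block_size: int = 1000):
        self._next = 1_000_000_000  # starting counter value (assumption)
        self._block = block_size
        self._lock = threading.Lock()

    def next_range(self):
        with self._lock:
            start = self._next
            self._next += self._block
            return start, start + self._block

class ShortCodeGenerator:
    """Each API server draws a range once, then issues IDs locally
    with no further coordination -- collision-free by construction."""
    def __init__(self, allocator: RangeAllocator):
        self._alloc = allocator
        self._iter = iter(range(0, 0))  # empty; forces a lease on first use

    def next_id(self) -> int:
        try:
            return next(self._iter)
        except StopIteration:
            lo, hi = self._alloc.next_range()
            self._iter = iter(range(lo, hi))
            return next(self._iter)
```

Two generators sharing one allocator never emit the same ID, and the allocator is contacted only once per block rather than once per URL.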
Database Design
Schema
CREATE TABLE urls (
    id BIGINT PRIMARY KEY,
    short_code VARCHAR(10) UNIQUE NOT NULL,
    long_url TEXT NOT NULL,
    user_id BIGINT,
    created_at TIMESTAMP,
    expires_at TIMESTAMP,
    click_count BIGINT DEFAULT 0
);
-- Note: the UNIQUE constraint already creates an index on short_code
-- in most RDBMSs, so a separate CREATE INDEX is redundant.
Database Choice
- Write-heavy creation: NoSQL (Cassandra, DynamoDB)
- Read-heavy redirects: Cache + SQL
- Recommendation: PostgreSQL + Redis cache
High-Level Architecture
┌─────────┐     ┌─────────────┐     ┌─────────────┐
│ Client  │────▶│    Load     │────▶│ API Servers │
└─────────┘     │  Balancer   │     └──────┬──────┘
                └─────────────┘            │
                                           ├─▶ Redis Cache
                                           │
                                           └─▶ Database
API Design
Create Short URL
POST /api/shorten
{
"url": "https://example.com/very/long/url",
"custom_alias": "mylink", // optional
"expires_at": "2025-12-31" // optional
}
Response:
{
"short_url": "https://short.ly/abc123",
"expires_at": "2025-12-31"
}
Redirect
GET /{short_code}
Response: 301 redirect to long_url. Browsers cache permanent redirects, which reduces load on our servers; use 302 instead if you need to see (and count) every click.
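The two endpoints can be sketched framework-free, with an in-memory dict standing in for the urls table and itertools.count for the counter service (the function names and the 409 conflict response are assumptions for illustration):

```python
import itertools

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    s = ""
    while n:
        n, r = divmod(n, 62)
        s = ALPHABET[r] + s
    return s or "0"

# In-memory stand-ins for the counter service and the urls table.
_counter = itertools.count(1_000_000_000)
_urls = {}

def shorten(url, custom_alias=None, expires_at=None):
    """POST /api/shorten -- returns (response body, HTTP status)."""
    code = custom_alias or base62_encode(next(_counter))
    if code in _urls:
        return {"error": "alias already taken"}, 409
    _urls[code] = {"long_url": url, "expires_at": expires_at}
    return {"short_url": f"https://short.ly/{code}", "expires_at": expires_at}, 201

def redirect(short_code):
    """GET /{short_code} -- returns (HTTP status, Location header)."""
    row = _urls.get(short_code)
    if row is None:
        return 404, None
    return 301, row["long_url"]
```

In a real service these would be handlers behind the load balancer, with the dict replaced by the cache-then-DB lookup described in the next section.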
Caching Strategy
def get_long_url(short_code: str) -> str:
    # Check cache first
    cached = redis.get(f"url:{short_code}")
    if cached:
        return cached
    # Cache miss - query DB
    url = db.query("SELECT long_url FROM urls WHERE short_code = ?", short_code)
    if url:
        redis.setex(f"url:{short_code}", 3600, url)  # Cache for 1 hour
    return url
Analytics
Track asynchronously:
- Push click events to Kafka
- Process with Spark/Flink
- Store aggregated data in ClickHouse
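A toy version of that pipeline, with a queue.Queue standing in for the Kafka topic and a Counter for the Spark/Flink aggregation job (function names are illustrative):

```python
from collections import Counter
from queue import Queue

# Stand-in for the Kafka topic: redirect handlers enqueue, a consumer drains.
click_events = Queue()

def record_click(short_code, referrer):
    """Called on the redirect path: fire-and-forget, never blocks the redirect."""
    click_events.put({"short_code": short_code, "referrer": referrer})

def aggregate_clicks():
    """Consumer side -- in the real pipeline this is Spark/Flink reading Kafka
    and writing aggregates to ClickHouse."""
    counts = Counter()
    while not click_events.empty():
        event = click_events.get()
        counts[event["short_code"]] += 1
    return counts
```

The key property is decoupling: the redirect path only enqueues, so a slow or failed analytics consumer never adds latency to the <100ms redirect budget.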
Key Takeaways
- Base62 encoding with 7 chars handles billions of URLs
- Use distributed counter for collision-free generation
- Cache aggressively - redirects are read-heavy
- Track analytics asynchronously
- Consider CDN for global low latency
Tags
system-design, database, distributed-systems, hashing