Medium · HLD · 25 min read
Designing a URL Shortener (like bit.ly)
System design for a URL shortening service - handling billions of URLs, high availability, and analytics.
Asked at: Google, Microsoft, Amazon, Meta
Published: December 15, 2024
URL Shortener System Design
Problem Statement
Design a URL shortening service like bit.ly that:
- Converts long URLs to short URLs
- Redirects short URLs to original URLs
- Tracks click analytics
- Handles high traffic
Requirements
Functional
- Generate short URL from long URL
- Redirect short URL to original
- Custom short URLs (optional)
- URL expiration
- Analytics (click count, referrers)
Non-Functional
- High availability (99.99%)
- Low latency redirects (<100ms)
- Scale to billions of URLs
- Short URLs should not be predictable
Capacity Estimation
Assumptions:
- 100M new URLs/month
- Read:Write ratio = 100:1
- URL stored for 5 years
Storage:
- 100M × 12 × 5 = 6 billion URLs
- Each URL ~500 bytes
- Total: 6B × 500 = 3TB
Traffic:
- Writes: 100M / (30 × 24 × 3600) ≈ 40 URLs/sec
- Reads: 40 × 100 = 4000 redirects/sec
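As a sanity check, the arithmetic above can be reproduced in a few lines (constants are taken from the assumptions; the 30-day month is an approximation):

```python
# Back-of-the-envelope check for the capacity estimates above.
URLS_PER_MONTH = 100_000_000
YEARS = 5
BYTES_PER_URL = 500
READ_WRITE_RATIO = 100

total_urls = URLS_PER_MONTH * 12 * YEARS              # 6 billion URLs over 5 years
storage_tb = total_urls * BYTES_PER_URL / 1e12        # ~3 TB
writes_per_sec = URLS_PER_MONTH / (30 * 24 * 3600)    # ~40 URLs/sec
reads_per_sec = writes_per_sec * READ_WRITE_RATIO     # ~4000 redirects/sec

print(total_urls, round(storage_tb), round(writes_per_sec), round(reads_per_sec))
```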
Short URL Generation
Approach 1: Base62 Encoding
Characters: a-z, A-Z, 0-9 (62 chars)
| Length | Combinations |
|---|---|
| 6 | 56.8 billion |
| 7 | 3.5 trillion |
7 characters is sufficient for our scale.
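Both generation approaches below assume a base62 codec. A minimal sketch (the 0-9a-zA-Z alphabet ordering here is a choice, not something the table above mandates):

```python
# 62 characters: digits, lowercase, uppercase (ordering is an arbitrary choice).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first

def base62_decode(s: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Note that the largest 7-character code decodes to 62^7 - 1, which is the ~3.5 trillion ceiling from the table.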
Approach 2: Hash + Truncate
import hashlib

def generate_short_url(long_url: str) -> str:
    hash_obj = hashlib.md5(long_url.encode())
    hash_hex = hash_obj.hexdigest()
    # Take the first 12 hex chars (48 bits), convert to base62, keep 7 chars
    return base62_encode(int(hash_hex[:12], 16))[:7]
Problem: different long URLs can truncate to the same 7 characters (collisions).
Solution: check the DB on insert; on a collision, append a counter to the long URL and re-hash.
Approach 3: Counter-based (Recommended)
Use a distributed counter service:
Counter: 1000000001
Base62: 15FTGh (using the alphabet 0-9a-zA-Z)
Pros: no collisions, bounded code length.
Cons: sequential codes are guessable, which conflicts with the non-predictability requirement (scramble the ID before encoding), and a single counter is a bottleneck (use range-based allocation, where each server leases a block of IDs).
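Range-based allocation can be sketched with a single in-process coordinator standing in for a real counter service such as ZooKeeper (class and parameter names here are illustrative, not from the original design):

```python
import threading

class RangeAllocator:
    """Central coordinator handing out disjoint counter ranges.
    In production this role is played by a service like ZooKeeper."""
    def __init__(self, block_size: int = 1000):
        self._next = 1_000_000_000  # starting counter value (assumption)
        self._block = block_size
        self._lock = threading.Lock()

    def next_range(self):
        with self._lock:
            start = self._next
            self._next += self._block
            return start, start + self._block

class ShortCodeGenerator:
    """Each API server draws a range once, then issues IDs locally
    with no further coordination -- collision-free by construction."""
    def __init__(self, allocator: RangeAllocator):
        self._alloc = allocator
        self._iter = iter(range(0, 0))  # empty; forces a lease on first use

    def next_id(self) -> int:
        try:
            return next(self._iter)
        except StopIteration:
            lo, hi = self._alloc.next_range()
            self._iter = iter(range(lo, hi))
            return next(self._iter)
```

Two generators sharing one allocator never emit the same ID, and the allocator is contacted only once per block rather than once per URL.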
Database Design
Schema
CREATE TABLE urls (
    id BIGINT PRIMARY KEY,
    short_code VARCHAR(10) UNIQUE NOT NULL,
    long_url TEXT NOT NULL,
    user_id BIGINT,
    created_at TIMESTAMP,
    expires_at TIMESTAMP,
    click_count BIGINT DEFAULT 0
);
-- Note: the UNIQUE constraint already creates an index on short_code
-- in most RDBMSs, so a separate CREATE INDEX is redundant.
Database Choice
- Write-heavy creation: NoSQL (Cassandra, DynamoDB)
- Read-heavy redirects: Cache + SQL
- Recommendation: PostgreSQL + Redis cache
High-Level Architecture
┌─────────┐     ┌─────────────┐     ┌─────────────┐
│ Client  │────▶│    Load     │────▶│ API Servers │
└─────────┘     │  Balancer   │     └──────┬──────┘
                └─────────────┘            │
                                           ├─▶ Redis Cache
                                           │
                                           └─▶ Database
API Design
Create Short URL
POST /api/shorten
{
"url": "https://example.com/very/long/url",
"custom_alias": "mylink", // optional
"expires_at": "2025-12-31" // optional
}
Response:
{
"short_url": "https://short.ly/abc123",
"expires_at": "2025-12-31"
}
Redirect
GET /{short_code}
Response: 301 redirect to long_url. Browsers cache permanent redirects, which reduces load on our servers; use 302 instead if you need to see (and count) every click.
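The two endpoints can be sketched framework-free, with an in-memory dict standing in for the urls table and itertools.count for the counter service (the function names and the 409 conflict response are assumptions for illustration):

```python
import itertools

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    s = ""
    while n:
        n, r = divmod(n, 62)
        s = ALPHABET[r] + s
    return s or "0"

# In-memory stand-ins for the counter service and the urls table.
_counter = itertools.count(1_000_000_000)
_urls = {}

def shorten(url, custom_alias=None, expires_at=None):
    """POST /api/shorten -- returns (response body, HTTP status)."""
    code = custom_alias or base62_encode(next(_counter))
    if code in _urls:
        return {"error": "alias already taken"}, 409
    _urls[code] = {"long_url": url, "expires_at": expires_at}
    return {"short_url": f"https://short.ly/{code}", "expires_at": expires_at}, 201

def redirect(short_code):
    """GET /{short_code} -- returns (HTTP status, Location header)."""
    row = _urls.get(short_code)
    if row is None:
        return 404, None
    return 301, row["long_url"]
```

In a real service these would be handlers behind the load balancer, with the dict replaced by the cache-then-DB lookup described in the next section.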
Caching Strategy
def get_long_url(short_code: str) -> str:
    # Check cache first
    cached = redis.get(f"url:{short_code}")
    if cached:
        return cached
    # Cache miss - query DB
    url = db.query("SELECT long_url FROM urls WHERE short_code = ?", short_code)
    if url:
        redis.setex(f"url:{short_code}", 3600, url)  # Cache for 1 hour
    return url
Analytics
Track asynchronously:
- Push click events to Kafka
- Process with Spark/Flink
- Store aggregated data in ClickHouse
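A toy version of that pipeline, with a queue.Queue standing in for the Kafka topic and a Counter for the Spark/Flink aggregation job (function names are illustrative):

```python
from collections import Counter
from queue import Queue

# Stand-in for the Kafka topic: redirect handlers enqueue, a consumer drains.
click_events = Queue()

def record_click(short_code, referrer):
    """Called on the redirect path: fire-and-forget, never blocks the redirect."""
    click_events.put({"short_code": short_code, "referrer": referrer})

def aggregate_clicks():
    """Consumer side -- in the real pipeline this is Spark/Flink reading Kafka
    and writing aggregates to ClickHouse."""
    counts = Counter()
    while not click_events.empty():
        event = click_events.get()
        counts[event["short_code"]] += 1
    return counts
```

The key property is decoupling: the redirect path only enqueues, so a slow or failed analytics consumer never adds latency to the <100ms redirect budget.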
Key Takeaways
- Base62 encoding with 7 chars handles billions of URLs
- Use distributed counter for collision-free generation
- Cache aggressively - redirects are read-heavy
- Track analytics asynchronously
- Consider CDN for global low latency
Tags
system-design, database, distributed-systems, hashing