Medium · HLD · 25 min read

Designing a URL Shortener (like bit.ly)

System design for a URL shortening service - handling billions of URLs, high availability, and analytics.

Asked at: Google, Microsoft, Amazon, Meta
Published: December 15, 2024

URL Shortener System Design

Problem Statement

Design a URL shortening service like bit.ly that:

  • Converts long URLs to short URLs
  • Redirects short URLs to original URLs
  • Tracks click analytics
  • Handles high traffic

Requirements

Functional

  1. Generate short URL from long URL
  2. Redirect short URL to original
  3. Custom short URLs (optional)
  4. URL expiration
  5. Analytics (click count, referrers)

Non-Functional

  1. High availability (99.99%)
  2. Low latency redirects (<100ms)
  3. Scale to billions of URLs
  4. Short URLs should not be predictable

Capacity Estimation

Assumptions:

  • 100M new URLs/month
  • Read:Write ratio = 100:1
  • URL stored for 5 years

Storage:

  • 100M × 12 × 5 = 6 billion URLs
  • Each URL ~500 bytes
  • Total: 6B × 500 = 3TB

Traffic:

  • Writes: 100M / (30 × 24 × 3600) ≈ 40 URLs/sec
  • Reads: 40 × 100 = 4000 redirects/sec
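
A quick sanity check of the arithmetic above (the 500-byte average record size is the assumption stated earlier):

SECONDS_PER_MONTH = 30 * 24 * 3600

new_urls_per_month = 100_000_000
years_retained = 5
avg_record_bytes = 500

total_urls = new_urls_per_month * 12 * years_retained    # 6,000,000,000
storage_bytes = total_urls * avg_record_bytes            # 3 TB
writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH  # ~38.6, round to ~40
reads_per_sec = writes_per_sec * 100                     # ~4,000

print(total_urls, storage_bytes / 1e12, round(writes_per_sec), round(reads_per_sec))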

Short URL Generation

Approach 1: Base62 Encoding

Characters: a-z, A-Z, 0-9 (62 chars)

Length   Combinations
6        62^6 ≈ 56.8 billion
7        62^7 ≈ 3.5 trillion

7 characters is sufficient for our scale.
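
A minimal base62 encoder/decoder sketch. The alphabet ordering below (digits, then lowercase, then uppercase) is an arbitrary choice; any fixed ordering of the 62 characters works, as long as encode and decode agree.

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(num: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if num == 0:
        return BASE62[0]
    chars = []
    while num > 0:
        num, rem = divmod(num, 62)
        chars.append(BASE62[rem])
    return "".join(reversed(chars))

def base62_decode(code: str) -> int:
    num = 0
    for ch in code:
        num = num * 62 + BASE62.index(ch)
    return num

# 7 characters cover 62**7 ≈ 3.5 trillion codes - plenty for 6 billion URLs
assert base62_decode(base62_encode(6_000_000_000)) == 6_000_000_000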

Approach 2: Hash + Truncate

import hashlib

def generate_short_url(long_url: str) -> str:
    # MD5 the long URL and take the first 12 hex chars (48 bits)
    hash_hex = hashlib.md5(long_url.encode()).hexdigest()
    # Convert to base62 (encoder from the sketch above) and keep 7 chars
    return base62_encode(int(hash_hex[:12], 16))[:7]

Problem: different long URLs can hash to the same 7-character code (collisions). Solution: check the DB and re-hash with an appended salt until the code is free, as sketched below.
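
A hedged sketch of that check; url_exists is a hypothetical helper that queries the urls table by short_code:

def generate_unique_short_url(long_url: str, url_exists) -> str:
    """url_exists(code) -> bool is assumed to query the urls table."""
    code = generate_short_url(long_url)
    salt = 0
    while url_exists(code):
        # On collision, re-hash the URL with a salt appended
        salt += 1
        code = generate_short_url(f"{long_url}#{salt}")
    return code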

Approach 3: Distributed Counter

Use a distributed counter service:

Counter: 1000000001
Base62:  15FTGh   (with the 0-9a-zA-Z alphabet; the exact string depends on alphabet order)

Pros: no collisions, simple and deterministic. Cons: a single counter is a bottleneck, and sequential codes are guessable, which conflicts with the non-predictability requirement; mitigate with range-based allocation (each server reserves a block of counter values) and, if needed, by obfuscating the ID before encoding. A range-allocation sketch follows.
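
A sketch of range-based allocation under stated assumptions: the coordination service is faked here with an in-process block generator, where a real system might use ZooKeeper or a database sequence to hand out blocks.

import itertools
import threading

class RangeCounter:
    """Serves IDs from a locally reserved block; reserves a new block when the current one runs out."""

    def __init__(self, reserve_block, block_size: int = 100_000):
        self._reserve_block = reserve_block  # callable(block_size) -> start of a fresh block
        self._block_size = block_size
        self._lock = threading.Lock()
        self._next = 0
        self._end = 0

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._end:
                start = self._reserve_block(self._block_size)
                self._next, self._end = start, start + self._block_size
            value = self._next
            self._next += 1
            return value

# Usage sketch: a fake coordinator handing out consecutive blocks of 100,000 IDs
blocks = itertools.count(1_000_000_000, 100_000)
counter = RangeCounter(lambda size: next(blocks))
short_code = base62_encode(counter.next_id())  # base62_encode from the earlier sketch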

Database Design

Schema

CREATE TABLE urls (
    id BIGINT PRIMARY KEY,
    short_code VARCHAR(10) UNIQUE NOT NULL,
    long_url TEXT NOT NULL,
    user_id BIGINT,
    created_at TIMESTAMP,
    expires_at TIMESTAMP,
    click_count BIGINT DEFAULT 0
);

-- Note: the UNIQUE constraint on short_code already creates an index in PostgreSQL/MySQL,
-- so this explicit index is redundant there.
CREATE INDEX idx_short_code ON urls(short_code);

Database Choice

  • Write-heavy creation: NoSQL (Cassandra, DynamoDB)
  • Read-heavy redirects: Cache + SQL
  • Recommendation: PostgreSQL + Redis cache

High-Level Architecture

┌─────────┐     ┌─────────────┐     ┌─────────────┐
│ Client  │────▶│ Load        │────▶│ API Servers │
└─────────┘     │ Balancer    │     └─────────────┘
                └─────────────┘            │
                                           ├─▶ Redis Cache
                                           │
                                           └─▶ Database

API Design

Create Short URL

POST /api/shorten
{
  "url": "https://example.com/very/long/url",
  "custom_alias": "mylink",  // optional
  "expires_at": "2025-12-31"  // optional
}

Response:
{
  "short_url": "https://short.ly/abc123",
  "expires_at": "2025-12-31"
}

Redirect

GET /{short_code}
Response: 301 Redirect to long_url
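
A minimal sketch of both endpoints, assuming FastAPI; the in-memory dict stands in for the real generator, database, and cache, and short.ly is a placeholder domain:

import secrets

from fastapi import FastAPI, HTTPException
from fastapi.responses import RedirectResponse
from pydantic import BaseModel

app = FastAPI()

# In-memory store standing in for the database + cache (an assumption for this sketch)
URLS: dict[str, str] = {}

class ShortenRequest(BaseModel):
    url: str
    custom_alias: str | None = None
    expires_at: str | None = None

@app.post("/api/shorten")
def shorten(req: ShortenRequest):
    # secrets.token_urlsafe is a stand-in for the real short-code generator
    code = req.custom_alias or secrets.token_urlsafe(5)[:7]
    URLS[code] = req.url
    return {"short_url": f"https://short.ly/{code}", "expires_at": req.expires_at}

@app.get("/{short_code}")
def redirect(short_code: str):
    long_url = URLS.get(short_code)
    if long_url is None:
        raise HTTPException(status_code=404, detail="Unknown short code")
    # 301 is cacheable by clients; use 302 if every click must reach the server for analytics
    return RedirectResponse(long_url, status_code=301)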

Caching Strategy

def get_long_url(short_code: str) -> str | None:
    # `redis` and `db` are assumed to be pre-configured client handles
    # Check the cache first - hot redirects never touch the database
    cached = redis.get(f"url:{short_code}")
    if cached:
        return cached

    # Cache miss - fall back to the database
    url = db.query("SELECT long_url FROM urls WHERE short_code = ?", short_code)

    if url:
        redis.setex(f"url:{short_code}", 3600, url)  # cache for 1 hour

    return url

Analytics

Track clicks asynchronously, off the redirect hot path (a producer sketch follows the list):

  1. Push click events to Kafka
  2. Process with Spark/Flink
  3. Store aggregated data in ClickHouse
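
A sketch of step 1 using the kafka-python client; the broker address, topic name, and event fields are illustrative assumptions:

import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

def record_click(short_code: str, referrer: str | None = None) -> None:
    # Fire-and-forget: the redirect handler only enqueues the event
    producer.send("url_clicks", {
        "short_code": short_code,
        "referrer": referrer,
        "ts": time.time(),
    })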

Key Takeaways

  • Base62 encoding with 7 chars handles billions of URLs
  • Use distributed counter for collision-free generation
  • Cache aggressively - redirects are read-heavy
  • Track analytics asynchronously
  • Consider CDN for global low latency

Tags

system-design, database, distributed-systems, hashing
