PE Hackathon 2026 — URL Shortener: Scalability Quest

Inspiration

The scalability challenge immediately caught my attention. Most developers can build a URL shortener in an afternoon — but can it handle 500 concurrent users hammering it at once without breaking a sweat? That's a fundamentally different engineering problem. I wanted to build something that could demonstrate real, measurable performance under load, not just a toy that works in development.

The constraint of using the MLH template stack (Flask + Peewee + PostgreSQL) made it more interesting — I couldn't just reach for the most performant framework. I had to make this stack fast.

What I Built

A production-grade URL shortener with a Redis caching layer, deployed on Railway with a full CI/CD pipeline. The system can handle 500 concurrent virtual users with zero errors, serving redirects in under 100ms on warm cache paths.

Core features:

  • POST /shorten — create short URLs with optional custom codes and titles
  • GET /<code> — redirect to original URL (cache-first, sub-1ms on hits)
  • GET /stats/<code> — click count and metadata for any short URL
  • GET /health — live database and Redis connectivity check
  • Full seed data pipeline loading 400 users, 2000 URLs, and 3422 events from the MLH platform CSVs

How I Built It

Architecture

The key insight was that a URL shortener has an extremely skewed read/write ratio — roughly 95% of traffic is redirects, 5% is creation. That means the hot path (GET /<code>) needs to be as fast as possible, while writes can afford a little latency.

The architecture has three layers:

  1. Redis cache — every active URL is pre-warmed into Redis on startup using a pipeline batch. On a cache hit, the redirect is served with a single Redis GET (~1ms). No database touch.

  2. PostgreSQL — only hit on cache misses (new URLs not yet cached) and writes. Peewee's ORM maps exactly to the seed data schema.

  3. Gunicorn with gthread workers — 4 workers × 4 threads = 16 concurrent request handlers per container, keeping CPU utilization high without thread contention.
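The cache-first lookup in layer 1 and 2 can be sketched as follows. This is an illustrative stand-in, not the project's actual code: `FakeRedis` is a tiny in-memory substitute for a real Redis client, and `db_lookup` stands in for the Peewee query on a miss.

```python
class FakeRedis:
    """In-memory stand-in for a Redis client (illustration only)."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def set(self, key, value, ex=None):
        self._store[key] = value

def resolve_code(cache, db_lookup, code):
    """Cache-first resolution: try Redis, fall back to the database on a miss."""
    cached = cache.get(f"url:{code}")
    if cached is not None:
        return cached                  # hot path: single cache GET, no DB touch
    url = db_lookup(code)              # cold path: query PostgreSQL
    if url is not None:
        cache.set(f"url:{code}", url)  # populate cache for subsequent hits
    return url

# Usage: the lambda stands in for a Peewee SELECT by short code.
cache = FakeRedis()
url = resolve_code(cache, lambda c: {"abc123": "https://example.com"}.get(c), "abc123")
```

A second call for the same code never reaches `db_lookup`, which is what makes the skewed read/write ratio work in the cache's favor.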

Scalability Decisions

  • Cache warming on startup: instead of waiting for the first 2000 requests to populate Redis, all active URLs are bulk-loaded via a Redis pipeline before the first request arrives. Cold start penalty is eliminated.

  • 404 caching: non-existent short codes are cached for 60 seconds as a sentinel value. Repeated probes for missing codes never touch the database.

  • Connection keep-alive: gunicorn is configured with --keep-alive 5 to reuse TCP connections from Railway's edge proxy, reducing handshake overhead at scale.

  • Schema matches seed data exactly: the urls table uses original_url (not long_url) and includes title, updated_at, and click_count — matching the MLH CSVs field-for-field so the judge evaluator finds exactly what it expects.

CI/CD

GitHub Actions runs pytest on every push with a live PostgreSQL and Redis service container. Coverage threshold is enforced at 50%. Railway auto-deploys from the main branch on green builds.

Challenges

The localhost Crash

The first Railway deployment crashed immediately with a Postgres connection refused error. Railway's Postgres service exposes an internal hostname (postgres.railway.internal) that is only reachable from inside Railway's network — not from a local Mac — so the locally-run seed script fell back to connecting to localhost. The fix required discovering the DATABASE_PUBLIC_URL variable, extracting the public proxy hostname and port from it, and running the seed script with those values explicitly set.
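The hostname/port extraction is a one-liner with the standard library. The URL below is a made-up placeholder, not a real Railway endpoint or credential:

```python
from urllib.parse import urlparse

# Hypothetical value of DATABASE_PUBLIC_URL (placeholder, not a real endpoint)
public_url = "postgresql://user:secret@proxy.rlwy.example:12345/railway"

parsed = urlparse(public_url)
host, port = parsed.hostname, parsed.port  # proxy hostname and port for the seed script
```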

Schema Mismatch Discovery

Halfway through building, I discovered the seed CSV uses original_url but my model was using long_url. The judges' evaluator would query for original_url and find nothing. This required updating the model, all routes, and the cache serialization simultaneously — while keeping the app backward-compatible with both field names in the POST body.

Free Tier CPU Throttling

Under 500 concurrent users, Railway's free tier throttles CPU aggressively. The p95 latency hit 9.5 seconds — not because the architecture was wrong, but because the shared vCPU couldn't process requests fast enough. The cache hits were fast; the bottleneck was pure compute. This highlighted an important lesson: caching solves latency, but hardware solves throughput.
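Little's Law makes that throughput ceiling concrete. Using the observed p95 as a rough stand-in for mean latency (a simplification that somewhat overstates it):

```python
# Little's Law: concurrency L = throughput λ × latency W, so λ = L / W
concurrent_users = 500
latency_s = 9.5  # observed p95 under load, used here as a rough mean

throughput = concurrent_users / latency_s
print(f"~{throughput:.0f} req/s")  # roughly 53 requests per second at saturation
```

However fast each individual cache hit is, a throttled vCPU caps how many of them complete per second — hence "caching solves latency, hardware solves throughput."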

Connection Pool Regression

Adding PooledPostgresqlDatabase to replace the default PostgresqlDatabase broke the /shorten endpoint entirely — all 667 POST requests in the load test failed. The template's DatabaseProxy pattern doesn't compose cleanly with Peewee's pool implementation. Rolled back to the standard database and accepted the connection overhead rather than debug ORM internals during a hackathon.

What I Learned

  • Measure before you optimize. The first instinct was to add more workers and tune gunicorn. The actual bottleneck was cold cache misses and 404s hitting the database. k6 metrics made this visible.

  • Schema is a contract. In a judge-evaluated system, your data model must match the evaluator's expectations exactly. A single renamed field (long_url vs original_url) is the difference between 0% and 100% on a test suite.

  • Cache warming is as important as caching. A cache that starts cold is useless for a load test that starts immediately. Pre-loading 2000 URLs into Redis via a pipeline batch on startup meant every redirect was a cache hit from request number one.

  • Hardware matters. No amount of Redis caching overcomes a throttled shared vCPU under 500 concurrent users. Scalable architecture requires scalable infrastructure.

Tech Stack

  • Language: Python 3.13
  • Framework: Flask + Peewee ORM
  • Database: PostgreSQL 16
  • Cache: Redis 7 (LRU eviction, 256MB)
  • Server: Gunicorn (gthread, 4 workers × 4 threads)
  • Deployment: Railway (Docker)
  • Load Testing: k6 (500 VUs, 3m30s)
  • CI: GitHub Actions
  • Package Manager: uv
