Inspiration
"Make it handle the entire internet." The Scalability Quest challenged us to take an app that works for one user and make it survive 500 people hitting it in the same second. We wanted to experience the full production engineering journey: measuring the baseline, finding the breaking point, and engineering past it.
What it does
snip.it is a URL shortener API with a live dashboard: shorten URLs, redirect users, and track every click with full analytics. The real project, though, is the infrastructure underneath: load balancing across 3 Flask instances, Redis caching so hot paths skip the database, Docker Compose for the full stack, and a dashboard that runs 17 automated smoke tests to verify every quest requirement in real time.
How we built it
We started with a single Flask server and hit it with k6 to get our Bronze baseline: 60 VUs, 32ms p95, 0% errors. Then we scaled out, using Docker Compose to run 3 Flask/Gunicorn instances behind Nginx as a round-robin load balancer. That got us to Silver: 250 VUs, 70ms, 0% errors. For Gold, we added Redis to cache URL redirect lookups with a 1-hour TTL, so we stop hitting PostgreSQL on every click. Final result: 500 concurrent users, 64ms p95, and a 0.00% error rate across 58,621 iterations.
We also built a frontend dashboard served directly from Flask (no build step, no CORS) with live metrics, a URL shortener, and an automated test runner that validates the Bronze, Silver, and Gold requirements with cascading green checkmarks.
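The Gold-tier change follows the cache-aside pattern: check the cache, fall back to the database on a miss, and store the result with a TTL. Below is a minimal sketch under stated assumptions, not the project's code. To stay runnable without a Redis server, `InMemoryCache` is a hypothetical stand-in exposing only `get` and `setex`, the two redis-py methods the pattern needs; like redis-py with default settings, its `get` returns bytes, so the caller has to decode (or cast) cached values. `resolve_target`, the `url:` key prefix, and the `db_lookup` callable (standing in for the Peewee/PostgreSQL query) are also assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 3600  # the 1-hour TTL described above


class InMemoryCache:
    """Hypothetical stand-in for a Redis client (only get/setex)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # key -> (value, expiry)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave as a miss
            return None
        return value

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        # Stored as bytes, mirroring what redis-py returns by default
        self._store[key] = (value.encode("utf-8"), self._clock() + ttl_seconds)


def resolve_target(
    short_code: str,
    cache: InMemoryCache,
    db_lookup: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Cache-aside lookup: serve from the cache, else query the DB and cache it."""
    key = f"url:{short_code}"
    cached = cache.get(key)
    if cached is not None:
        return cached.decode("utf-8")  # cache hit: no database query
    target = db_lookup(short_code)  # cache miss: one database query
    if target is not None:
        cache.setex(key, CACHE_TTL_SECONDS, target)
    return target
```

With this pattern, a deactivated URL would also need its cache entry deleted; otherwise a cached redirect could outlive the deactivation until the TTL expires.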
Challenges we ran into
Beyond the scalability engineering, we had to solve the Oracle's Whispers, six hidden challenges baked into the automated test runner. Among them:
- The Twin's Paradox required every URL creation to generate a unique short code, even for duplicate URLs.
- The Unseen Observer required an event to be logged automatically on every redirect, without being explicitly called.
- The Slumbering Guide demanded that deactivated URLs block redirects AND leave no event footprint.
- The Fractured Vessel tested that bare strings and arrays sent as JSON bodies are rejected.
- The Deceitful Scroll probed how the API handles nameless users and oversized inputs.
We went through 21 PRs over 20 hours to crack all of them: reverting aggressive caching that broke stability, fixing integer casting on cached data, hardening query parameter validation, and normalizing timestamps across the system.
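The rules named above can be modeled in plain Python. This is a hypothetical sketch, not the project's Flask/Peewee code: `validate_user_payload`, `UrlStore`, the `username` field, the 6-character code length, the 255-character limit, and answering a deactivated URL with 404 are all assumptions chosen for illustration. Short codes come from Python's `secrets` module and are retried on collision, so two requests for the same URL always get distinct codes.

```python
import secrets
import string
from typing import Any, Dict, List, Optional, Tuple

ALPHABET = string.ascii_letters + string.digits
CODE_LENGTH = 6         # hypothetical short-code length
MAX_FIELD_LENGTH = 255  # hypothetical limit for "oversized" inputs


def validate_user_payload(body: Any) -> Optional[str]:
    """Return an error message, or None if the parsed JSON body is acceptable."""
    if not isinstance(body, dict):  # bare strings and arrays are rejected
        return "request body must be a JSON object"
    name = body.get("username")
    if not isinstance(name, str) or not name.strip():  # nameless users
        return "username must be a non-empty string"
    if len(name) > MAX_FIELD_LENGTH:  # oversized inputs
        return "username is too long"
    return None


class UrlStore:
    """Hypothetical in-memory model of URLs and their click events."""

    def __init__(self) -> None:
        self.urls: Dict[str, Dict[str, Any]] = {}
        self.events: List[Tuple[str, str]] = []

    def create(self, original_url: str) -> str:
        # Always mint a fresh code, even if original_url was shortened before
        while True:
            code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))
            if code not in self.urls:  # retry on the rare collision
                break
        self.urls[code] = {"original_url": original_url, "is_active": True}
        return code

    def deactivate(self, code: str) -> None:
        self.urls[code]["is_active"] = False

    def redirect(self, code: str) -> Tuple[int, Optional[str]]:
        url = self.urls.get(code)
        if url is None or not url["is_active"]:
            return 404, None  # block the redirect and record no event
        # Logging happens inside the redirect path itself, not in the caller
        self.events.append(("click", code))
        return 302, url["original_url"]
```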
Accomplishments that we're proud of
A 0.00% error rate across all three tiers: Bronze, Silver, and Gold. 500 concurrent users at 64ms p95 latency with zero interrupted iterations. A dashboard that visualizes the entire system: live API health, k6 load test results for all three tiers side by side, real-time event feeds, and a smoke test runner that proves every feature works on demand. And graceful failure recovery: kill any app container and the system keeps serving while Docker automatically restarts the dead instance.
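The failover behavior can be pictured as round-robin with retry: send each request to the next backend in rotation, and if that backend refuses the connection, move on to the one after it. Nginx does something comparable by default, passing a request to the next upstream server on a connection error (its `proxy_next_upstream` directive defaults to `error timeout`). The sketch below is a simplified, hypothetical model of that idea, not Nginx's implementation; the backend names `app1` to `app3` and the `send` callable are placeholders.

```python
from typing import Callable, List


class RoundRobinBalancer:
    """Hypothetical model: rotate through backends, skipping ones that refuse connections."""

    def __init__(self, backends: List[str]) -> None:
        self.backends = backends
        self._next = 0  # index of the backend that gets the next request

    def handle(self, request: str, send: Callable[[str, str], str]) -> str:
        last_error: Exception = RuntimeError("no backends configured")
        # Try each backend at most once per request, starting at the rotation point
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            try:
                return send(backend, request)
            except ConnectionError as exc:
                last_error = exc  # backend is down: fall through to the next one
        raise last_error
```

While one backend is down, its share of requests is absorbed by the others; once it restarts, it rejoins the rotation with no reconfiguration.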
What we learned
The fastest query is the one you don't make. Before Redis, every redirect hit PostgreSQL; with caching, every request after the first for a given short code is served from memory until its TTL expires. For us, horizontal scaling beat vertical scaling: adding a third Flask instance was simpler than tuning one. Input validation isn't just about security; the Oracle's Whispers taught us that edge cases like empty strings, wrong types, and malformed JSON are first-class requirements. And 21 PRs in 20 hours taught us that reverting is not failing: it's how you maintain stability while iterating fast.
What's next for snip.it
Rate limiting per IP, connection pooling on PostgreSQL for higher concurrency, Prometheus and Grafana for real-time monitoring under load, and pushing past 1000 VUs to find the next breaking point.
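For the planned per-IP rate limiting, one common technique is a token bucket: each client IP gets a bucket that refills at a fixed rate up to a capacity, and each request spends one token. The sketch below is a hypothetical, single-process illustration with an injectable clock; `PerIpTokenBucket` and its parameters are assumptions. Because snip.it runs 3 Flask instances, a real deployment would need shared state (for example in Redis) or limiting at the Nginx layer, which provides the `limit_req` directive for this purpose.

```python
import time
from typing import Callable, Dict, Tuple


class PerIpTokenBucket:
    """Allow roughly `rate` requests per second per IP, with bursts up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        # ip -> (tokens remaining, time of last refill)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, ip: str) -> bool:
        now = self.clock()
        # New IPs start with a full bucket
        tokens, last = self._buckets.get(ip, (self.capacity, now))
        # Refill in proportion to elapsed time, never beyond capacity
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[ip] = (tokens - 1, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False
```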
Built With
- docker
- docker-compose
- flask
- gunicorn
- html
- javascript
- k6
- nginx
- peewee-orm
- postgresql
- python
- redis