Inspiration
In the real world, code that works on your laptop often collapses under production load. We wanted to build something that doesn't just function — it survives. The MLH Production Engineering Hackathon challenged us to think like SREs: what happens when 500 users hit it at once? What happens at 3 AM when the app crashes? Kadi is our answer.
What it does
Kadi is a production-grade URL shortener. You give it a long URL, it gives you a short code. Click the short code, get redirected instantly. Under the hood it handles user management, click analytics, and event tracking.
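The write-up doesn't say how Kadi derives its short codes; a common scheme, sketched here purely as an assumption, is base62-encoding the database row ID so every URL gets a compact, unique slug:

```python
import string

# 0-9, a-z, A-Z: 62 symbols, so codes stay short even for large IDs.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode(n: int) -> str:
    """Encode a database row ID as a short base62 code."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def decode(code: str) -> int:
    """Invert encode: map a short code back to the row ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Because the code is just the primary key in another base, the redirect handler can decode it and do a single indexed lookup.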
How we built it
- Flask + Peewee ORM for the API layer
- PostgreSQL for persistent storage
- Redis for caching hot redirects (eliminating DB reads on repeat clicks)
- Nginx as a reverse proxy load-balancing across 3 app instances
- Docker Compose to orchestrate the full stack
- Prometheus + Grafana for metrics and dashboards
- Alertmanager + Discord for real-time alerting
- pytest + GitHub Actions for CI with 76% code coverage
- k6 for load testing up to 500 concurrent users
Challenges we ran into
- Docker restart policy — `docker kill` sends SIGKILL, which Docker treats as a manual stop and doesn't trigger a restart. We had to simulate real crashes with `kill -TERM 1` inside the container to demonstrate auto-recovery.
- Alertmanager Discord integration — the native `discord_configs` in Alertmanager v0.31 had breaking changes, so we built a lightweight Flask bridge that converts Alertmanager webhooks to Discord's format.
- PostgreSQL sequence drift — after bulk-loading CSV seed data with explicit IDs, the auto-increment sequences were out of sync. New inserts collided with existing IDs until we reset the sequences post-import.
- Redis caching with inactive URLs — we had to invalidate the cache when a URL is deactivated, so stale cached entries don't serve redirects for dormant routes.
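The sequence-drift fix comes down to realigning each serial sequence with `setval` after the import. A sketch of the statement as a small helper; the table name `urls` and Peewee usage shown are assumptions, not Kadi's actual schema:

```python
def reset_sequence_sql(table: str, column: str = "id") -> str:
    """Build the PostgreSQL statement that realigns a serial sequence
    with the current MAX(id) after a bulk load that used explicit IDs."""
    return (
        f"SELECT setval(pg_get_serial_sequence('{table}', '{column}'), "
        f"COALESCE((SELECT MAX({column}) FROM {table}), 1));"
    )

# Assumed Peewee usage, one call per seeded table:
#   database.execute_sql(reset_sequence_sql("urls"))
```

`pg_get_serial_sequence` resolves the sequence that backs the column, so the same helper works for every seeded table without hard-coding sequence names.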
Accomplishments that we're proud of
- 29/29 automated tests passing, including all hidden edge-case challenges
- 0% error rate at 500 concurrent users with 143 req/sec throughput
- Full observability stack — metrics, structured JSON logs, alerting, and a live Grafana dashboard all wired together
- Sub-40ms p95 latency at 50 users — the Redis cache makes redirects nearly instant after the first hit
- Complete documentation: API reference, runbook, decision log, capacity plan, failure modes, and deploy guide
What we learned
- Production engineering is less about writing code and more about anticipating failure
- Caching is the single highest-leverage optimization — Redis cut our DB load by ~50% under heavy traffic
- Observability isn't optional — without Prometheus and Grafana we would have been flying blind during load tests
- Docker restart policies only trigger on non-zero exit codes from natural crashes, not from `docker kill`
What's next for Kadi
- Custom short codes — let users choose their own slug
- Link expiry — TTL-based deactivation
- Dashboard UI — a frontend for managing links and viewing analytics
- Async event logging — write click events to a Redis queue and flush to DB in background to reduce write latency under load
- Auto-scaling — move to ECS or Kubernetes for dynamic horizontal scaling based on traffic
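The async event logging idea above amounts to enqueueing click events on the hot path and bulk-writing them later. A minimal sketch using a stdlib queue as a stand-in for the Redis queue (the real design would push to Redis and bulk-insert into PostgreSQL; all names here are illustrative):

```python
import queue

events: queue.Queue = queue.Queue()
flushed: list = []  # stands in for batched DB writes

def record_click(code: str) -> None:
    """Hot path: enqueue the click event instead of writing to the DB inline."""
    events.put({"code": code})

def flush_once(batch_size: int = 100) -> list:
    """Drain up to batch_size queued events and write them as one batch.
    A background worker (e.g. a daemon thread) would call this on a timer."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(events.get_nowait())
        except queue.Empty:
            break
    if batch:
        flushed.append(batch)  # real code: one bulk INSERT per batch
    return batch
```

The redirect handler only pays the cost of a queue push; the expensive write amortizes across the whole batch in the background.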
Built With
- amazon-web-services
- docker
- flask
- grafana
- prometheus
- redis