Inspiration

We noticed that most applications fail badly: services crash, users see cryptic errors, and nobody knows what went wrong. The Reliability Engineering track challenged us to build the opposite: a service that doesn't just work, it survives. We chose a URL shortener because it's the perfect testbed for production-grade reliability patterns.

What it does

A bulletproof URL shortener with automatic recovery, graceful error handling, and round-the-clock uptime. Every bad request returns clean JSON; the service never crashes. The health endpoint verifies database connectivity. Kill the container? It auto-resurrects. Send garbage data? You get a polite error message. This is reliability engineering in action.
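A framework-agnostic sketch of the two behaviors above: the health check and the "every bad request returns clean JSON" rule. The names check_db, health_endpoint, and safe_handler are illustrative assumptions, not the project's actual code.

```python
import json

def check_db():
    # Placeholder for a real connectivity probe (e.g. SELECT 1 via Peewee).
    return True

def health_endpoint():
    # Reports service vitality based on database reachability.
    ok = check_db()
    body = {"status": "ok" if ok else "degraded"}
    return json.dumps(body), 200 if ok else 503

def safe_handler(handler, *args):
    # Wrapper that turns any unhandled exception into clean JSON, never a crash.
    try:
        return handler(*args)
    except Exception as exc:
        return json.dumps({"error": "internal_error", "message": str(exc)}), 500
```

In a real web framework the wrapper would live in a global error handler rather than around each route, but the contract is the same: a status code plus a JSON body on every path.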

How we built it

We built a centralized error-handling system with custom APIError classes that return consistent JSON responses, and implemented a health check endpoint that validates database connectivity. We wrote 40+ pytest tests achieving 78% coverage, then containerized everything with Docker auto-restart policies and environment-based configuration for easy deployment.
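A hedged sketch of what an APIError hierarchy with a consistent JSON payload can look like; the class names, fields, and to_dict shape are assumptions, not the project's exact code.

```python
class APIError(Exception):
    """Base class: every API error carries a status code and a message."""
    status_code = 500
    message = "Internal server error"

    def __init__(self, message=None, status_code=None):
        super().__init__(message or self.message)
        if message is not None:
            self.message = message
        if status_code is not None:
            self.status_code = status_code

    def to_dict(self):
        # Single serialization point: every error looks the same on the wire.
        return {"error": type(self).__name__, "message": self.message}

class ValidationError(APIError):
    status_code = 400
    message = "Invalid request payload"

class NotFoundError(APIError):
    status_code = 404
    message = "Resource not found"
```

One framework-level handler registered for the APIError base class can then serialize every subclass the same way, which is what makes the responses consistent.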

Challenges we ran into

Mocking Peewee's DatabaseProxy was complex, so we created a test-only endpoint for chaos testing instead. Docker port conflicts required investigating container lifecycle management. Securing credentials in docker-compose.yml meant migrating to .env files. Increasing health check coverage from 60% to 70%+ required additional edge-case tests.
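The chaos-testing workaround above can be pictured like this: instead of mocking Peewee's DatabaseProxy, a test-only switch forces the data layer to fail so the error path can be exercised end to end. All names here (ChaosSwitch, resolve, the store dict) are hypothetical stand-ins for the real code.

```python
class ChaosSwitch:
    """Test-only toggle; in production this flag is never set."""
    def __init__(self):
        self.db_down = False

chaos = ChaosSwitch()

class DatabaseUnavailable(Exception):
    pass

def get_link(slug, store):
    # Data-access helper consults the chaos switch before touching the store.
    if chaos.db_down:
        raise DatabaseUnavailable("simulated outage")
    return store[slug]

def resolve(slug, store):
    # Handler degrades gracefully: a JSON-shaped error payload, not a crash.
    try:
        return {"url": get_link(slug, store)}, 200
    except DatabaseUnavailable:
        return {"error": "service_unavailable"}, 503
```

Flipping chaos.db_down in a test then verifies the 503 path without any database mocking.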

Accomplishments that we're proud of

78% code coverage (85% on error handling). 40+ tests, all passing. Auto-restart verified: we killed the container and watched it resurrect automatically. Every error path tested: graceful degradation under database failure, duplicate user rejection, invalid JSON handling. Zero crashes. Production-ready and published to GitHub for team collaboration.
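The invalid-JSON error path mentioned above can be tested in the style sketched below; parse_payload is an assumed helper, and the real suite's names differ.

```python
import json

def parse_payload(raw):
    # Returns (data, error): malformed JSON never raises to the caller.
    try:
        return json.loads(raw), None
    except json.JSONDecodeError:
        return None, {"error": "invalid_json", "message": "Body must be valid JSON"}

def test_invalid_json_returns_clean_error():
    # pytest-style check: garbage in, polite structured error out.
    data, err = parse_payload("{not json")
    assert data is None
    assert err["error"] == "invalid_json"
```

Tests like this pin down the contract that every failure produces a structured response, which is what makes the "zero crashes" claim verifiable.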

What we learned

Error handling is architecture, not an afterthought. Container resilience can come from single-line configurations like restart: always. Testing forces better design decisions. Chaos engineering caught issues before they reached production. Centralized error handlers cut our debugging time by an estimated 70%.
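The single-line resilience configuration mentioned above sits in docker-compose.yml; the service name here is assumed, not the project's actual one.

```yaml
services:
  api:
    build: .
    restart: always   # Docker restarts the container whenever it exits
    env_file: .env    # credentials stay out of version control
```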

What's next for Front to Back

Deploy to DigitalOcean with managed PostgreSQL. Implement Redis caching for sub-100 ms redirects (scalability track). Add rate limiting to prevent abuse. Set up GitHub Actions CI/CD to block failing tests before merge. Scale from local development to production with zero code changes.
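The planned CI gate could start as a workflow like this hypothetical .github/workflows/ci.yml: run the pytest suite on every push and pull request so that, with branch protection enabled, failing tests block a merge.

```yaml
name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest
```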

Built With

docker, peewee, pytest, python
