1. Implementing the URL shortener

  • Template: Started from a Flask + peewee + PostgreSQL template.
  • Initial Setup: Ran PostgreSQL in Docker.
  • Testing: Wrote unit and integration tests with pytest.
  • Documentation: Added Swagger docs for every endpoint (long-time FastAPI user habit! :D).

2. Database Connection Hardening

  • DB Pooling: Noticed the app was opening a connection, running the query, and closing it every time, so I implemented connection pooling.
  • Timeouts: Added timeouts everywhere (query timeout, connection-wait timeout, stale-connection timeout, etc.) so queries cannot hang.
  • Load Testing: Added a Locust file that simulates user flows against weighted endpoints.
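The pooling and timeout ideas above can be sketched in plain Python. This is a toy pool for illustration, not the app's actual code: peewee already ships pooling in `playhouse.pool` (`PooledPostgresqlDatabase` accepts `max_connections`, `stale_timeout`, and `timeout`). The class name `TinyPool`, the factory callable, and all default values are hypothetical.

```python
import contextlib
import queue
import threading
import time


class PoolTimeout(Exception):
    """Raised when no connection frees up within the wait timeout."""


class TinyPool:
    """Hypothetical toy pool: reuse connections, cap how many exist, expire old ones."""

    def __init__(self, connect, max_connections=4, wait_timeout=2.0, stale_timeout=300.0):
        self._connect = connect                # factory that opens a real connection
        self._wait_timeout = wait_timeout      # max seconds to wait for a free slot
        self._stale_timeout = stale_timeout    # connections older than this are recycled
        self._idle = queue.LifoQueue()         # idle connections, most recently used first
        self._created = {}                     # id(conn) -> creation time
        self._slots = threading.BoundedSemaphore(max_connections)  # caps checked-out connections

    def acquire(self):
        # Wait for a free slot, but give up after wait_timeout instead of hanging.
        if not self._slots.acquire(timeout=self._wait_timeout):
            raise PoolTimeout(f"no connection free after {self._wait_timeout}s")
        try:
            while True:
                try:
                    conn = self._idle.get_nowait()
                except queue.Empty:
                    conn = self._connect()  # nothing idle: open a new connection
                    self._created[id(conn)] = time.monotonic()
                    return conn
                if time.monotonic() - self._created[id(conn)] < self._stale_timeout:
                    return conn  # reuse a warm connection
                self._discard(conn)  # too old: close it and keep looking
        except BaseException:
            self._slots.release()  # don't leak the slot if connecting failed
            raise

    def release(self, conn):
        self._idle.put(conn)
        self._slots.release()

    @contextlib.contextmanager
    def connection(self):
        conn = self.acquire()
        try:
            yield conn
        finally:
            self.release(conn)

    def _discard(self, conn):
        self._created.pop(id(conn), None)
        close = getattr(conn, "close", None)
        if callable(close):
            close()
```

Reusing the most recently returned connection (LIFO) keeps a small warm set busy, while the semaphore timeout turns pool exhaustion into a fast, visible error rather than a hung request.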

3. Noticed a Bottleneck

  • Observation: During load testing, the database was overloaded with connections, tanking performance.
  • Solution: Added Redis caching as a simple key-value layer between the app and the database.

Why Redis? Dead simple, lightweight, and very fast. It's the industry standard and a total no-brainer for this use case.
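A cache layer like this is usually the cache-aside pattern: check the cache, fall back to the database on a miss, then store the result with a TTL. The sketch below uses a dict-backed `TTLCache` as a hypothetical stand-in for Redis; its `get` / `set(..., ex=...)` shape mirrors redis-py's `Redis.get` and `Redis.set(name, value, ex=seconds)`, so a real client could be swapped in (redis-py returns `bytes` unless `decode_responses=True`). The `short:` key prefix, the TTL, and the `db_lookup` callable are illustrative assumptions.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Hypothetical in-memory stand-in for Redis GET / SET with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: dict = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired keys on read
            return None
        return value

    def set(self, key: str, value: str, ex: float) -> None:
        self._data[key] = (self._clock() + ex, value)


class ShortUrlResolver:
    """Cache-aside lookup with hit/miss counters (hypothetical names)."""

    def __init__(self, cache, db_lookup: Callable[[str], Optional[str]], ttl: float = 3600.0):
        self.cache = cache
        self.db_lookup = db_lookup
        self.ttl = ttl
        self.hits = 0
        self.misses = 0

    def resolve(self, code: str) -> Optional[str]:
        key = f"short:{code}"
        cached = self.cache.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        url = self.db_lookup(code)  # only reached on a miss
        if url is not None:
            self.cache.set(key, url, ex=self.ttl)  # populate for the next reader
        return url

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

The `hits` / `misses` counters are the kind of numbers a hit-rate panel is built from. In this sketch, lookups for codes that do not exist are never cached, so every such lookup counts as a miss.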

4. Logging

  • Level-based logs: Included extra metadata like method, status code, and duration.
  • SQL Monitoring: Recorded SQL execution time specifically to detect N+1 queries.
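Execution time is one input for spotting N+1 queries; the other common signal is a single request issuing many small queries. A minimal stdlib sketch, assuming a hypothetical `QueryTracker` that the DB layer would call around each SQL execution, with JSON log lines carrying the request metadata (method, path, status, duration) through logging's `extra=` mechanism:

```python
import json
import logging
import time
from contextlib import contextmanager


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line, including metadata passed via `extra=`."""

    FIELDS = ("method", "path", "status", "duration_ms", "query_count", "sql_ms")

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "logger": record.name, "msg": record.getMessage()}
        for name in self.FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload)


class QueryTracker:
    """Hypothetical per-request SQL bookkeeping (create one per request)."""

    def __init__(self, logger: logging.Logger, slow_ms: float = 100.0, max_queries: int = 10):
        self.logger = logger
        self.slow_ms = slow_ms          # a single query slower than this -> warning
        self.max_queries = max_queries  # more queries than this in one request -> likely N+1
        self.count = 0
        self.total_ms = 0.0

    @contextmanager
    def timed(self, sql: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.count += 1
            self.total_ms += elapsed_ms
            if elapsed_ms > self.slow_ms:
                self.logger.warning("slow query: %s", sql,
                                    extra={"duration_ms": round(elapsed_ms, 2)})

    def finish(self, method: str, path: str, status: int, duration_ms: float) -> None:
        # Escalate to WARNING when one request ran suspiciously many queries.
        level = logging.WARNING if self.count > self.max_queries else logging.INFO
        self.logger.log(level, "request complete", extra={
            "method": method, "path": path, "status": status,
            "duration_ms": round(duration_ms, 2),
            "query_count": self.count, "sql_ms": round(self.total_ms, 2),
        })
```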

5. Prometheus + Grafana

  • Infrastructure: Added Prometheus and Grafana to Docker; exposed the /metrics endpoint.
  • Instrumentation: Added metric recording (Histogram, Counter, etc.) inside the logging logic.
  • Processing: Recorded metrics in before/after request hooks, sorting request processing times into histogram buckets.
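The bucketing step follows Prometheus histogram semantics: an observation counts toward every bucket whose upper bound (`le`) is at least the value, and the exporter reports cumulative counts plus `_sum` and `_count`. In the app, `prometheus_client.Histogram(...).observe()` does this inside Flask's `before_request` / `after_request` hooks. The class below is a hypothetical pure-Python re-implementation for illustration, including a quantile estimate that roughly mirrors PromQL's `histogram_quantile` (linear interpolation inside the bucket that contains the target rank).

```python
import bisect
import math


class Histogram:
    """Hypothetical Prometheus-style histogram with fixed upper bounds (`le`)."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds) + [math.inf]  # implicit +Inf bucket catches everything
        self._per_bucket = [0] * len(self.bounds)   # stored non-cumulatively, summed on export
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # bisect_left returns the first bound >= value, i.e. the smallest bucket with value <= le.
        self._per_bucket[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

    def cumulative(self):
        """(le, count of observations <= le) pairs, as exposed in `_bucket{le="..."}` series."""
        out, running = [], 0
        for bound, n in zip(self.bounds, self._per_bucket):
            running += n
            out.append((bound, running))
        return out

    def quantile(self, q: float) -> float:
        """Estimate the q-quantile; assumes non-negative observations (first bucket starts at 0)."""
        if self.count == 0:
            return math.nan
        rank = q * self.count
        lower, below = 0.0, 0
        for bound, cum in self.cumulative():
            if cum >= rank:
                in_bucket = cum - below
                if math.isinf(bound) or in_bucket == 0:
                    return lower  # can't interpolate into +Inf (or an empty bucket)
                return lower + (bound - lower) * (rank - below) / in_bucket
            lower, below = bound, cum
        return lower
```

Because only bucket counts are kept, a p95 computed this way is an estimate whose precision depends on where the bucket boundaries sit relative to the latencies you care about.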

Why Prometheus + Grafana? It's the best open-source combo: very effective and free!

What are some alternatives? OpenTelemetry is a strong contender (enterprise-grade metrics, logging, and monitoring). However, given the hardware constraints (2-core CPU, 2 GB RAM), OpenTelemetry would be overkill for this app.

6. Alerting Rules

  • Alertmanager: Configured to ping me about anomalies (high 4xx/5xx error rates, high p95 latency, degraded DB, etc.).
  • Rules Logic: Kept the rules selective (e.g., alert when p95 stays high for 5 minutes rather than on single spikes).
  • Integration: Added a Discord webhook as the point of contact.
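The "sustained, not spiky" rule corresponds to the `for:` clause on a Prometheus alerting rule: the alert is pending while the condition holds, fires only once it has held for the full window, and resets to inactive if the condition clears in between. Below is a hypothetical pure-Python model of that state machine, plus a helper showing the JSON shape of a Discord webhook POST (a `content` field). The threshold, window, and webhook URL are placeholders, not the project's configuration.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedAlert:
    """Models a rule like `expr > threshold` with `for: <window>` (hypothetical helper)."""
    threshold: float
    for_seconds: float
    pending_since: Optional[float] = None

    def evaluate(self, value: float, now: float) -> str:
        if value <= self.threshold:
            self.pending_since = None  # condition cleared: back to inactive, window restarts
            return "inactive"
        if self.pending_since is None:
            self.pending_since = now   # first breach: start the clock
        if now - self.pending_since >= self.for_seconds:
            return "firing"
        return "pending"


def discord_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a Discord webhook POST carrying a plain-text message."""
    body = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
```

Sending would be `urllib.request.urlopen(req, timeout=5)`. A single spike moves the alert to "pending" and then back to "inactive", so nobody gets paged for it.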

7. CI/CD & Deployment

  • Host: Deployed to a DigitalOcean Droplet.
  • Containerization: Containerized all components, performed local dry runs, and included a deploy.sh script.
  • Workflows: Wrote ci.yml and deploy.yml files.
  • Discovery: Containerized the NGINX proxy for the first time and was surprised by how easy it was!
  • Effort: Spent a significant amount of time here, even though this wasn't my "first rodeo."

8. Miscellaneous

  • Deployment Strategy: Attempted a blue-green deployment aiming for 99.99% uptime. It failed miserably without Kubernetes, so in this setup I stuck with in-place deployment, which gave lower downtime in practice.
  • Log Aggregation: Added Loki + Promtail to view logs without SSH-ing into the server.
  • Resource Monitoring: Added cAdvisor to monitor container resource usage on the VPS; it showed I could safely run 4 Gunicorn workers instead of 2.
  • Security: Used a Docker bridge network so that only client-facing services (server, Grafana) publish ports, while internal services stay hidden from port discovery.

Why Loki + Promtail? It took longer to set up than a simple tool like Dozzle, but the built-in Grafana dashboard integration and container-based querying make it much more powerful than a basic logging snapshot.
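Gunicorn reads its settings from a Python config file, so the worker count that the cAdvisor data supported can be pinned there. A minimal sketch of a `gunicorn.conf.py`: only the cap of 4 workers comes from this writeup. The (2 × cores) + 1 starting point is Gunicorn's own documented rule of thumb, and every other value is a hypothetical placeholder.

```python
# gunicorn.conf.py (sketch): only the 4-worker cap comes from the writeup;
# bind, timeouts, and recycling values are hypothetical placeholders.
import multiprocessing

# Gunicorn's docs suggest (2 x cores) + 1 as a starting point. On a 2-core / 2 GB
# droplet, memory is the tighter limit, so cap the count at what monitoring showed was safe.
_cores = multiprocessing.cpu_count()
workers = min(4, 2 * _cores + 1)

bind = "0.0.0.0:8000"        # hypothetical port behind the NGINX proxy
timeout = 30                 # kill a worker stuck on one request for longer than this
graceful_timeout = 30        # time allowed to finish in-flight requests on restart
max_requests = 1000          # recycle workers periodically to bound memory growth
max_requests_jitter = 100    # stagger recycling so workers don't restart together
```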

9. What’s Next

  • Scalability: The "elephant in the room." The app struggles to maintain sub-5s p95 latency with 50+ concurrent users.
  • Cache Optimization: The dashboard shows only a 50% cache hit rate, which should be much higher for a URL shortener.
  • Task Queue: Implement a proper distributed worker for the /users/bulk endpoint, with a DLQ (dead-letter queue) for failed jobs and SSE (Server-Sent Events) to notify clients instead of polling.
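The planned task queue can be sketched with in-process stand-ins: `queue.Queue` plays the role of a real broker, failed jobs are retried up to a limit and then parked in a dead-letter queue, and progress is emitted as Server-Sent Events messages (an `event:` line, a `data:` line, then a blank line). `Job`, `run_worker`, `sse_event`, and `max_attempts` are hypothetical names, not the project's code.

```python
import json
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    job_id: str
    payload: dict
    attempts: int = 0


def sse_event(event: str, data: dict) -> str:
    """Format one SSE message; json.dumps without indent keeps `data` on a single line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def run_worker(jobs: "queue.Queue[Job]", dead_letters: "queue.Queue[Job]",
               handler: Callable[[dict], None], notify: Callable[[str], None],
               max_attempts: int = 3) -> None:
    """Drain the queue: retry failures, park jobs that keep failing in the DLQ."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return  # nothing left to process
        job.attempts += 1
        try:
            handler(job.payload)
        except Exception as exc:
            if job.attempts >= max_attempts:
                dead_letters.put(job)  # give up: keep it for inspection or replay
                notify(sse_event("failed", {"job_id": job.job_id, "error": str(exc)}))
            else:
                jobs.put(job)  # re-enqueue for another attempt
            continue
        notify(sse_event("done", {"job_id": job.job_id}))
```

In Flask, a generator yielding these strings can be returned as `Response(gen(), mimetype="text/event-stream")`, and the browser's `EventSource` receives them as they arrive instead of polling.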
