🚀 Project Story: Scaling a Flask URL Shortener to Production-Grade Performance

🧩 Overview

We built and systematically evolved a Flask-based URL shortener from a single-instance prototype into a horizontally scaled, production-ready system capable of handling 500 concurrent users with an error rate below 5%.

Rather than jumping straight to scaling, we approached this as an engineering experiment, identifying bottlenecks at each stage and validating improvements through structured load testing using k6.

🏗️ Architecture Evolution

🔹 Tier 1: Single Instance (Baseline)

- Stack: Flask + Gunicorn + PostgreSQL
- Deployment: Single VM
- Gunicorn config: 4 workers × 8 threads = 32 concurrent handlers
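Gunicorn reads its settings from a config file that is plain Python, so the Tier 1 worker layout can be written down directly. This is a minimal sketch, not the project's actual file: the filename `gunicorn.conf.py` and the bind address `0.0.0.0:8000` are our assumptions. `bind`, `workers`, `threads`, and `worker_class` are standard Gunicorn settings, and Gunicorn uses the threaded `gthread` worker whenever `threads` is greater than 1.

```python
# gunicorn.conf.py -- hypothetical config mirroring the Tier 1 layout.
# Gunicorn executes this file as ordinary Python and reads module-level names.

bind = "0.0.0.0:8000"      # assumption: the port is not stated in the write-up
workers = 4                # separate processes, each with its own interpreter
threads = 8                # threads per worker process
worker_class = "gthread"   # threaded worker type needed for threads > 1

# Not a Gunicorn setting; it only documents the concurrency arithmetic.
max_concurrent_handlers = workers * threads  # 4 x 8 = 32
```

It would be started with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a hypothetical module and variable name for the Flask application.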

Performance @ 50 Concurrent Users:

- Throughput: ~83 req/s
- p95 latency: 455ms
- Error rate: 0%

Observations:

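The system performs reliably under moderate load. Tail latency (p90 and above) rises because of:

- CPU-bound request handling
- Limited parallelism within each worker process (its threads share one Python GIL, so CPU-bound work inside a worker runs one thread at a time)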

👉 Conclusion: The application layer is stable but not optimized for high concurrency.

🔹 Tier 2: Horizontal Scaling (Silver)

- Stack additions: multiple Flask instances (3 replicas), load balancing via NGINX
- Database: single PostgreSQL instance
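By default NGINX spreads requests across an `upstream` group with weighted round-robin, which is plain round-robin when all weights are equal. The toy model below is ordinary Python, not NGINX itself, and the replica addresses are hypothetical. It shows that three identical replicas each receive a third of the traffic, while the single database behind them still sees every request.

```python
from __future__ import annotations

from collections import Counter
from itertools import cycle

# Hypothetical replica addresses standing in for an NGINX upstream block.
REPLICAS = ["app1:8000", "app2:8000", "app3:8000"]


def route(num_requests: int, replicas: list[str]) -> Counter:
    """Assign requests to replicas in strict round-robin order."""
    picker = cycle(replicas)
    return Counter(next(picker) for _ in range(num_requests))


per_replica = route(300, REPLICAS)

# Each redirect still issues one SELECT, so the shared database receives
# the full request count regardless of how many replicas sit in front of it.
db_queries = sum(per_replica.values())
```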

Performance @ 200 Concurrent Users:

- Throughput: ~93 req/s
- Error rate: 0%
- p95 latency: significantly higher than Tier 1's 455ms

Key Bottleneck Identified:

Database saturation:

- Every redirect triggers a SELECT query
- ~160 concurrent DB reads at peak load
- Connection pool exhaustion
- Increased disk I/O latency
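Connection pool exhaustion is easy to reproduce in miniature. The sketch below models a fixed-size pool with the standard library's `queue.Queue`: once every connection is checked out, the next request waits up to a timeout and then fails, which is how surplus concurrent reads surface as latency spikes or errors. `ConnectionPool` and `PoolExhausted` are hypothetical names, not a real driver's API; production pools such as SQLAlchemy's `QueuePool` follow the same checkout-with-timeout pattern but add overflow and connection recycling.

```python
import queue


class PoolExhausted(Exception):
    """Raised when no connection is returned before the timeout."""


class ConnectionPool:
    """Toy fixed-size pool; the 'connections' are just labelled strings."""

    def __init__(self, size: int) -> None:
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(f"conn-{i}")

    def acquire(self, timeout: float) -> str:
        try:
            # Block until a connection is released, or give up after `timeout`.
            return self._idle.get(timeout=timeout)
        except queue.Empty:
            raise PoolExhausted(f"no connection within {timeout}s") from None

    def release(self, conn: str) -> None:
        self._idle.put(conn)


pool = ConnectionPool(size=2)
a = pool.acquire(timeout=0.1)
b = pool.acquire(timeout=0.1)
try:
    pool.acquire(timeout=0.05)  # a third concurrent read with only two connections
    exhausted = False
except PoolExhausted:
    exhausted = True
pool.release(a)
c = pool.acquire(timeout=0.1)  # succeeds once a connection has been returned
```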

Insight: Horizontal scaling raised throughput only modestly (~83 → ~93 req/s, about 12%, while concurrency grew 4× from 50 to 200 users) and shifted the bottleneck to the database layer.

👉 Conclusion: Stateless app scaling alone is insufficient for read-heavy systems.

🔹 Tier 3: Cached & Optimized (Gold)

- Stack enhancements: Redis caching layer, cache-aside strategy for URL lookups
- Retained the load-balanced multi-instance architecture

Optimization Strategy:

- Cache frequently accessed short URLs in Redis
- Reduce redundant database reads
- Serve hot-path requests directly from memory

Performance @ 500 Concurrent Users:

- Throughput: significantly increased
- Error rate: <5%
- Latency: stabilized despite a 10× load increase over Tier 1 (50 → 500 users)

Impact:

- Eliminated the majority of repeated DB queries
- Reduced database load dramatically
- Improved response-time consistency
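How much load the cache removes depends on its hit ratio, which the write-up does not report. As a back-of-the-envelope illustration only: if every cache miss costs one SELECT and every hit costs none, database reads scale with the miss ratio. The hit ratios below are hypothetical and are applied to the ~160 concurrent reads measured at the Tier 2 peak.

```python
def db_load(requests: float, hit_ratio: float) -> float:
    """Requests that still reach the database under cache-aside.

    Assumes each cache miss costs exactly one SELECT and hits cost none.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return requests * (1.0 - hit_ratio)


# Hypothetical hit ratios applied to ~160 concurrent reads (the Tier 2 peak).
for ratio in (0.0, 0.5, 0.9, 0.99):
    print(f"hit ratio {ratio:.0%}: ~{db_load(160, ratio):.0f} concurrent DB reads")
```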

👉 Conclusion: Introducing caching transformed the system from DB-bound to memory-optimized, enabling true scalability.

📊 Testing Methodology

All tiers were evaluated using:

- k6 for consistent, scriptable load generation
- Incremental concurrency testing (50 → 200 → 500 users)
- Metrics tracked: throughput (req/s), p95 latency, and error rate
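k6 computes these metrics itself (for example, p95 appears in the end-of-test summary for `http_req_duration`). To make the definitions concrete, here is a small stdlib sketch that derives throughput, p95 latency, and error rate from raw request samples. It uses the nearest-rank percentile, which may differ slightly from k6's own percentile calculation; the `Sample` type and the sample data are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Sample:
    latency_ms: float
    ok: bool  # False for failed requests (e.g. 5xx responses or timeouts)


def summarize(samples: list[Sample], duration_s: float) -> dict[str, float]:
    """Throughput, nearest-rank p95 latency, and error rate for one run."""
    if not samples or duration_s <= 0:
        raise ValueError("need at least one sample and a positive duration")
    latencies = sorted(s.latency_ms for s in samples)
    rank = math.ceil(0.95 * len(latencies))  # nearest-rank method, 1-based
    return {
        "throughput_rps": len(samples) / duration_s,
        "p95_ms": latencies[rank - 1],
        "error_rate": sum(not s.ok for s in samples) / len(samples),
    }


# Hypothetical run: 100 requests over 2 s, latencies 1..100 ms, 3 failures.
run = [Sample(latency_ms=float(i), ok=i > 3) for i in range(1, 101)]
stats = summarize(run, duration_s=2.0)
```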

This ensured data-driven validation of every architectural decision.

🧠 Key Learnings

1. Scaling Isn't Linear

Adding more application instances doesn't guarantee better performance; it often exposes deeper bottlenecks.
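A simple capacity model illustrates why. If the app tier and the database form a pipeline, end-to-end throughput is capped by the slower stage, so adding replicas only helps until the database becomes the limit. The per-replica and database capacities below are made-up numbers for illustration, not measurements from this project.

```python
def system_throughput(replicas: int, per_replica_rps: float, db_rps: float) -> float:
    """Pipeline throughput is bounded by its slowest stage."""
    app_capacity = replicas * per_replica_rps
    return min(app_capacity, db_rps)


# Hypothetical capacities: each replica could serve 80 req/s, the DB ~100 req/s.
for n in (1, 2, 3, 6):
    print(n, "replica(s) ->", system_throughput(n, per_replica_rps=80, db_rps=100), "req/s")
```

Past two replicas the database caps the result, so extra app instances add cost without adding throughput; raising `db_rps` (for example by serving reads from a cache) is what moves the ceiling.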

2. Databases Are the First Breaking Point

Read-heavy workloads can quickly overwhelm a single relational database without:

- Connection pooling strategies
- Query optimization
- Caching layers

3. Caching Is a Force Multiplier

Introducing Redis:

- Reduced database dependency
- Improved latency consistency
- Enabled horizontal scalability

4. Measure Everything

Without systematic testing, bottlenecks remain invisible. Load testing was critical in:

- Identifying failure points
- Validating improvements
- Guiding architectural decisions

🏁 Final Architecture

                  ┌──────────────┐
                  │    Client    │
                  └──────┬───────┘
                         │
                  ┌──────▼───────┐
                  │   NGINX LB   │
                  └──────┬───────┘
          ┌──────────────┼──────────────┐
          │              │              │
    ┌─────▼──────┐ ┌─────▼──────┐ ┌─────▼──────┐
    │ Flask App  │ │ Flask App  │ │ Flask App  │
    │ (Gunicorn) │ │ (Gunicorn) │ │ (Gunicorn) │
    └─────┬──────┘ └─────┬──────┘ └─────┬──────┘
          │              │              │
          └──────┬───────┴───────┬──────┘
                 │               │
           ┌─────▼──────┐  ┌─────▼──────┐
           │    Redis   │  │ PostgreSQL │
           │   (Cache)  │  │ (Primary)  │
           └────────────┘  └────────────┘

💡 What Makes This Project Stand Out

- Real-world scalability journey, not just a static implementation
- Clear demonstration of system design trade-offs
- Data-backed performance improvements
- Practical use of industry-standard tools (NGINX, Redis, PostgreSQL)
- Strong focus on observability and bottleneck analysis
