Project Story: Scaling a Flask URL Shortener to Production-Grade Performance
Overview
We built and systematically evolved a Flask-based URL shortener from a single-instance prototype into a horizontally scaled, production-ready system capable of handling 500 concurrent users with <5% error rates.
Rather than jumping straight to scaling, we approached this as an engineering experiment, identifying bottlenecks at each stage and validating improvements through structured load testing using k6.
Architecture Evolution
Tier 1: Single Instance (Baseline)
- Stack: Flask + Gunicorn + PostgreSQL
- Deployment: Single VM
- Gunicorn Config: 4 workers × 8 threads = 32 concurrent handlers
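The Tier 1 handler count can be written down as a Gunicorn configuration file, which is plain Python. This is a minimal sketch, not the project's actual file: the `bind` address and the `concurrent_handlers` helper name are assumptions added for illustration, and only the worker and thread counts come from the numbers above.

```python
# gunicorn.conf.py: a sketch matching the Tier 1 numbers, not the project's real config.
workers = 4                 # worker processes
threads = 8                 # threads per worker process
worker_class = "gthread"    # threaded worker; Gunicorn selects it when threads > 1
bind = "0.0.0.0:8000"       # assumed address and port

# Illustrative only: upper bound on requests one instance can handle at once.
# Gunicorn ignores names it does not recognise as settings.
concurrent_handlers = workers * threads
```

With these values a single instance can work on at most 32 requests at a time; anything beyond that waits in the listen backlog.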
Performance @ 50 Concurrent Users:
- Throughput: ~83 req/s
- p95 Latency: 455 ms
- Error Rate: 0%
Observations:
- System performs reliably under moderate load.
- Tail latency (p90+) increases due to:
  - CPU-bound request handling
  - Limited parallelism within worker processes
Conclusion: The application layer is stable, but not optimized for high concurrency.
Tier 2: Horizontal Scaling (Silver)
Stack Additions:
- Multiple Flask instances (3 replicas)
- Load balancing via NGINX
- Database: Single PostgreSQL instance
Performance @ 200 Concurrent Users:
- Throughput: ~93 req/s
- Error Rate: 0%
- p95 Latency: Significantly increased
Key Bottleneck Identified:
Database saturation:
- Every redirect triggers a SELECT query
- ~160 concurrent DB reads at peak load
- Connection pool exhaustion
- Increased disk I/O latency
Insight: Horizontal scaling improved throughput but shifted the bottleneck to the database layer.
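A rough way to see the shift is Little's Law: in a steady-state closed-loop test, the average time each virtual user spends per iteration equals the number of users divided by throughput. The sketch below applies it to the reported Tier 1 and Tier 2 figures. It assumes each iteration issues one request and that the reported throughput is an average over the run; the result includes any think time in the test script, so it is an estimate of iteration time, not a measured latency.

```python
def iteration_time_s(virtual_users: int, throughput_rps: float) -> float:
    """Little's Law for a closed-loop test: seconds per user iteration."""
    return virtual_users / throughput_rps


tier1 = iteration_time_s(50, 83)     # Tier 1: roughly 0.60 s per iteration
tier2 = iteration_time_s(200, 93)    # Tier 2: roughly 2.15 s per iteration

slowdown = tier2 / tier1             # each user cycle takes about 3.6x longer
throughput_gain = 93 / 83 - 1        # throughput grew by only about 12%

print(f"tier 1: {tier1:.2f} s, tier 2: {tier2:.2f} s")
print(f"slowdown: {slowdown:.1f}x, throughput gain: {throughput_gain:.0%}")
```

Under these assumptions, quadrupling the user count bought about 12% more throughput while each user waited several times longer, which is the signature of a saturated shared resource behind the app tier.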
Conclusion: Stateless app scaling alone is insufficient for read-heavy systems.
Tier 3: Cached & Optimized (Gold)
Stack Enhancements:
- Redis caching layer
- Cache-aside strategy for URL lookups
- Retained load-balanced multi-instance architecture
Optimization Strategy:
- Cache frequently accessed short URLs in Redis
- Reduce redundant database reads
- Serve hot-path requests directly from memory
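The cache-aside flow for a redirect lookup can be sketched as follows. This is an illustration rather than the project's code: a plain dictionary stands in for Redis, `FakeDatabase` stands in for PostgreSQL, and the names `resolve`, `select_long_url`, and the sample short code are all hypothetical.

```python
from typing import Dict, Optional


class FakeDatabase:
    """Stand-in for PostgreSQL that counts how many lookups reach it."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = rows
        self.reads = 0

    def select_long_url(self, code: str) -> Optional[str]:
        self.reads += 1
        return self.rows.get(code)


def resolve(code: str, cache: Dict[str, str], db: FakeDatabase) -> Optional[str]:
    """Cache-aside: try the cache, fall back to the database, then fill the cache."""
    cached = cache.get(code)
    if cached is not None:
        return cached                # hit: no database work at all
    long_url = db.select_long_url(code)
    if long_url is not None:
        cache[code] = long_url       # miss: remember it for later requests
    return long_url


db = FakeDatabase({"abc123": "https://example.com/some/long/path"})
cache: Dict[str, str] = {}

first = resolve("abc123", cache, db)    # miss, reads the database once
second = resolve("abc123", cache, db)   # hit, served from the cache
print(first == second, db.reads)
```

With a real Redis client the dictionary reads and writes would become `get` and `set` calls, typically with an expiry on each entry. The source does not state an expiry or invalidation policy, so none is modelled here.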
Performance @ 500 Concurrent Users:
- Throughput: Significantly increased
- Error Rate: <5%
- Latency: Stabilized despite 10× load increase
Impact:
- Eliminated the majority of repeated DB queries
- Reduced database load dramatically
- Improved response time consistency
Conclusion: Introducing caching transformed the system from DB-bound to memory-optimized, enabling true scalability.
Testing Methodology
All tiers were evaluated using:
- k6 for consistent, scriptable load generation
- Incremental concurrency testing (50 → 200 → 500 users)
- Metrics tracked:
  - Throughput (req/s)
  - p95 latency
  - Error rates
This ensured data-driven validation of every architectural decision.
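For readers unfamiliar with the three metrics, the sketch below shows one way to derive them from raw request samples. It is not how k6 computes its summary: the percentile uses the simple nearest-rank method, so k6's own figure may differ slightly, and the function names and sample data are made up for illustration.

```python
from typing import Dict, List


def p95(latencies_ms: List[float]) -> float:
    """Nearest-rank 95th percentile of the given latencies."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) using integers
    return ordered[rank - 1]


def summarize(latencies_ms: List[float], failed: int, duration_s: float) -> Dict[str, float]:
    """Throughput, p95 latency, and error rate for one test run."""
    total = len(latencies_ms)
    return {
        "throughput_rps": total / duration_s,
        "p95_ms": p95(latencies_ms),
        "error_rate": failed / total,
    }


# Made-up data: 100 requests with latencies 1..100 ms, 3 failures, over 2 seconds.
samples = [float(ms) for ms in range(1, 101)]
summary = summarize(samples, failed=3, duration_s=2.0)
print(summary)
```

Tracking the same three numbers at every concurrency level is what makes the tiers comparable with each other.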
Key Learnings
- Scaling Isn't Linear
Adding more application instances doesn't guarantee better performance; it often exposes deeper bottlenecks.
- Databases Are the First Breaking Point
Read-heavy workloads can quickly overwhelm a single relational database without:
  - Connection pooling strategies
  - Query optimization
  - Caching layers
- Caching Is a Force Multiplier
Introducing Redis:
  - Reduced database dependency
  - Improved latency consistency
  - Enabled horizontal scalability
- Measure Everything
Without systematic testing, bottlenecks remain invisible. Load testing was critical in:
  - Identifying failure points
  - Validating improvements
  - Guiding architectural decisions
Final Architecture
               ┌─────────────┐
               │   Client    │
               └──────┬──────┘
                      │
               ┌──────▼──────┐
               │  NGINX LB   │
               └──────┬──────┘
       ┌──────────────┼──────────────┐
       │              │              │
┌──────▼─────┐ ┌──────▼─────┐ ┌──────▼─────┐
│ Flask App  │ │ Flask App  │ │ Flask App  │
│ (Gunicorn) │ │ (Gunicorn) │ │ (Gunicorn) │
└──────┬─────┘ └──────┬─────┘ └──────┬─────┘
       │              │              │
       └──────┬───────┴───────┬──────┘
              │               │
       ┌──────▼─────┐  ┌──────▼─────┐
       │   Redis    │  │ PostgreSQL │
       │  (Cache)   │  │ (Primary)  │
       └────────────┘  └────────────┘
What Makes This Project Stand Out
- Real-world scalability journey, not just a static implementation
- Clear demonstration of system design trade-offs
- Data-backed performance improvements
- Practical use of industry-standard tools (NGINX, Redis, PostgreSQL)
- Strong focus on observability and bottleneck analysis