ScaleX

A brief outline of the whole production project.

🚀 Project Story: Scaling a Flask URL Shortener to Production-Grade Performance

🧩 Overview

We built and systematically evolved a Flask-based URL shortener from a single-instance prototype into a horizontally scaled, production-ready system capable of handling 500 concurrent users with <5% error rates.

Rather than jumping straight to scaling, we approached this as an engineering experiment, identifying bottlenecks at each stage and validating improvements through structured load testing using k6.

🏗️ Architecture Evolution

🔹 Tier 1: Single Instance (Baseline)

- Stack: Flask + Gunicorn + PostgreSQL
- Deployment: Single VM
- Gunicorn Config: 4 workers × 8 threads = 32 concurrent handlers
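The worker and thread counts above map onto a Gunicorn configuration file, which is plain Python. The following is a minimal sketch, not the project's actual file: the `gthread` worker class and the bind address are assumptions, since this README only states 4 workers × 8 threads.

```python
# gunicorn.conf.py
# Hypothetical file: only the worker and thread counts come from this README.

# Number of worker processes.
workers = 4

# Threads per worker process. Each worker handles up to `threads`
# requests at once, so one instance handles workers * threads = 32.
threads = 8

# Threaded worker type (assumed; set explicitly here for clarity).
worker_class = "gthread"

# Assumed bind address; adjust to match the actual deployment.
bind = "0.0.0.0:8000"
```

Assuming the Flask application object is named `app` inside `app.py` (hypothetical names), this could be started with `gunicorn -c gunicorn.conf.py app:app`.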

Performance @ 50 Concurrent Users:

- Throughput: ~83 req/s
- p95 Latency: 455ms
- Error Rate: 0%

Observations:

- System performs reliably under moderate load.
- Tail latency (p90+) increases due to:
  - CPU-bound request handling
  - Limited parallelism within worker processes

👉 Conclusion: The application layer is stable, but not optimized for high concurrency.

🔹 Tier 2: Horizontal Scaling (Silver)

Stack Additions:

- Multiple Flask instances (3 replicas)
- Load balancing via NGINX
- Database: Single PostgreSQL instance

Performance @ 200 Concurrent Users:

- Throughput: ~93 req/s
- Error Rate: 0%
- p95 Latency: Significantly increased
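One way to put a rough number on the latency increase is Little's Law for a closed-loop load test: concurrent users = throughput × mean time per request cycle. The sketch below applies it to the Tier 1 and Tier 2 figures reported in this README. The result is the mean time a virtual user spends per cycle (response time plus any think time in the k6 script, which is not stated here), so these are estimates under that assumption, not measurements.

```python
def mean_cycle_time(concurrent_users: int, throughput_rps: float) -> float:
    """Little's Law rearranged: cycle time (s) = users / throughput (req/s)."""
    if throughput_rps <= 0:
        raise ValueError("throughput must be positive")
    return concurrent_users / throughput_rps


# Figures taken from the Tier 1 and Tier 2 results in this README.
tier1 = mean_cycle_time(50, 83.0)
tier2 = mean_cycle_time(200, 93.0)

print(f"Tier 1: {tier1:.2f} s, Tier 2: {tier2:.2f} s, ratio: {tier2 / tier1:.1f}x")
# prints "Tier 1: 0.60 s, Tier 2: 2.15 s, ratio: 3.6x"
```

Under these assumptions, quadrupling the users raised throughput by about 12% while each request cycle took roughly 3.6× longer, which is consistent with requests queuing behind a saturated resource.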

Key Bottleneck Identified:

Database saturation:

- Every redirect triggers a SELECT query
- ~160 concurrent DB reads at peak load
- Connection pool exhaustion
- Increased disk I/O latency

Insight: Horizontal scaling improved throughput only modestly (~83 → ~93 req/s while users rose from 50 to 200) and shifted the bottleneck to the database layer.

👉 Conclusion: Stateless app scaling alone is insufficient for read-heavy systems.

🔹 Tier 3: Cached & Optimized (Gold)

Stack Enhancements:

- Redis caching layer
- Cache-aside strategy for URL lookups
- Retained load-balanced multi-instance architecture

Optimization Strategy:

- Cache frequently accessed short URLs in Redis
- Reduce redundant database reads
- Serve hot-path requests directly from memory
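The cache-aside strategy described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the function name, the `url:` key prefix, the one-hour TTL, and the `db_lookup` callable are all hypothetical. The cache object only needs `get(key)` and `setex(key, ttl, value)`, which the redis-py client provides; a small in-memory stand-in is used here so the example runs without a Redis server.

```python
def resolve_short_url(code, cache, db_lookup, ttl_seconds=3600):
    """Cache-aside: try the cache, fall back to the database, then fill the cache."""
    key = f"url:{code}"  # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        # redis-py returns bytes unless the client uses decode_responses=True.
        return cached.decode() if isinstance(cached, bytes) else cached

    target = db_lookup(code)  # cache miss: one database read
    if target is not None:
        cache.setex(key, ttl_seconds, target)  # populate for later requests
    return target


class InMemoryCache:
    """Stand-in exposing the two methods used above (TTL is ignored here)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def setex(self, key, ttl_seconds, value):
        self._data[key] = value


# Demo with a fake database that records every lookup.
db_calls = []


def fake_db_lookup(code):
    db_calls.append(code)
    return {"abc123": "https://example.com"}.get(code)


cache = InMemoryCache()
first = resolve_short_url("abc123", cache, fake_db_lookup)   # miss, reads the DB
second = resolve_short_url("abc123", cache, fake_db_lookup)  # hit, served from cache
```

After the first request for a code, repeated redirects for it no longer reach the database until the entry expires, which is how the hot path avoids redundant SELECT queries.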

Performance @ 500 Concurrent Users:

- Throughput: Significantly increased
- Error Rate: <5%
- Latency: Stabilized despite a 10× increase in concurrent users over the Tier 1 baseline (50 → 500)

Impact:

- Eliminated the majority of repeated DB queries
- Reduced database load dramatically
- Improved response time consistency

👉 Conclusion: Introducing caching moved hot-path lookups from the database into memory, so the system was no longer DB-bound and could sustain 500 concurrent users.

📊 Testing Methodology

All tiers were evaluated using:

- k6 for consistent, scriptable load generation
- Incremental concurrency testing (50 → 200 → 500 users)
- Metrics tracked:
  - Throughput (req/s)
  - p95 latency
  - Error rates
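To make the three tracked metrics concrete, the sketch below computes them from raw per-request data. It is not part of the project's tooling: the `summarize` function and its inputs are hypothetical, and the p95 uses the nearest-rank method, which may differ slightly from the percentile calculation k6 reports.

```python
def summarize(latencies_ms, failed_count, duration_s):
    """Compute throughput, p95 latency, and error rate for one test run.

    latencies_ms: one latency per completed request (successful or failed).
    failed_count: how many of those requests failed.
    duration_s:   wall-clock length of the run in seconds.
    """
    total = len(latencies_ms)
    if total == 0 or duration_s <= 0:
        raise ValueError("need at least one request and a positive duration")

    ordered = sorted(latencies_ms)
    # Nearest-rank p95: smallest value with at least 95% of samples at or below it.
    rank = (95 * total + 99) // 100  # integer ceiling of 0.95 * total
    return {
        "throughput_rps": total / duration_s,
        "p95_ms": ordered[rank - 1],
        "error_rate": failed_count / total,
    }


# Synthetic data: 100 requests with latencies 1..100 ms, 3 failures, 10 s run.
result = summarize(list(range(1, 101)), failed_count=3, duration_s=10.0)
meets_error_target = result["error_rate"] < 0.05  # the <5% target from this README
```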

This ensured data-driven validation of every architectural decision.

🧠 Key Learnings

Scaling Isn’t Linear

Adding more application instances doesn't guarantee better performance; it often exposes deeper bottlenecks.

Databases Are the First Breaking Point

Read-heavy workloads can quickly overwhelm a single relational database without:

- Connection pooling strategies
- Query optimization
- Caching layers

Caching Is a Force Multiplier

Introducing Redis:

- Reduced database dependency
- Improved latency consistency
- Enabled horizontal scalability

Measure Everything

Without systematic testing, bottlenecks remain invisible. Load testing was critical in:

- Identifying failure points
- Validating improvements
- Guiding architectural decisions

🏁 Final Architecture

                   ┌──────────────┐
                   │    Client    │
                   └──────┬───────┘
                          │
                   ┌──────▼───────┐
                   │   NGINX LB   │
                   └──────┬───────┘
           ┌──────────────┼──────────────┐
           │              │              │
    ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
    │ Flask App   │ │ Flask App   │ │ Flask App   │
    │ (Gunicorn)  │ │ (Gunicorn)  │ │ (Gunicorn)  │
    └──────┬──────┘ └──────┬──────┘ └──────┬──────┘
           │              │              │
           └──────┬───────┴───────┬──────┘
                  │               │
           ┌──────▼──────┐  ┌──────▼──────┐
           │    Redis    │  │ PostgreSQL  │
           │   (Cache)   │  │  (Primary)  │
           └─────────────┘  └─────────────┘

💡 What Makes This Project Stand Out

- Real-world scalability journey, not just a static implementation
- Clear demonstration of system design trade-offs
- Data-backed performance improvements
- Practical use of industry-standard tools (NGINX, Redis, PostgreSQL)
- Strong focus on observability and bottleneck analysis