[PERFORMANCE]: Global Rate Limiting for Gateway Protection #1854
Labels
SHOULD · P2: Important but not vital; high-value items that are not crucial for the immediate release · enhancement: New feature or request · performance: Performance related items
Description
Summary
Implement configurable application-level rate limiting middleware to protect the gateway from traffic overload, ensure fair resource allocation, and provide graceful degradation under high load. This complements the existing nginx and plugin-level rate limiting with application-aware controls.
Current State Analysis
What Already Exists
| Layer | Component | Scope | Status |
|---|---|---|---|
| Infrastructure | Nginx `limit_req_zone` | Per-IP, global | Active |
| Infrastructure | Nginx `limit_conn_zone` | Per-IP connections | Active |
| Plugin | `RateLimiterPlugin` | Per-user/tenant/tool | Optional |
| Config | `tool_rate_limit` | Tool invocations | Defined, partially used |
| Config | `tool_concurrent_limit` | Concurrent tool calls | Defined, partially used |
Gaps Identified
| Gap | Impact |
|---|---|
| No application-level global rate limiting | All endpoints unprotected without nginx |
| No distributed rate limiting | Multi-replica deployments have no shared state |
| No per-user/API-key limits at REST layer | Users behind same IP share limits unfairly |
| No rate limit response headers | Clients cannot adapt to limits |
| No endpoint-specific limits | Critical endpoints (auth) have same limits as bulk operations |
| No adaptive/dynamic limits | No backpressure when system is under stress |
| No rate limit metrics | Cannot monitor or alert on rate limit events |
Proposed Solution
Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ RATE LIMITING LAYERS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ │
│ │ Client │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ Layer 1: Infrastructure │
│ │ Nginx │ - Per-IP rate limiting │
│ │ Rate Limit │ - Connection limiting │
│ │ (3000 r/s) │ - DDoS protection │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ Layer 2: Application (NEW) │
│ │ FastAPI │ - Per-user/API-key limits │
│ │ Middleware │ - Per-endpoint limits │
│ │ (Global Rate │ - Redis-backed (distributed) │
│ │ Limiter) │ - Adaptive limits │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ Layer 3: Business Logic (Existing) │
│ │ Plugin │ - Per-tool limits │
│ │ Rate Limiter │ - Per-tenant limits │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Handler │ │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Implementation Components
- **Rate Limit Middleware** (`mcpgateway/middleware/rate_limit.py`)
  - Token bucket algorithm with Redis backend
  - Per-user, per-API-key, and global rate limiting
  - Endpoint-specific rate limits
  - Adaptive rate limiting based on system health
- **Configuration Settings** (`mcpgateway/config.py`)
  - `RATE_LIMIT_ENABLED` - Enable/disable rate limiting
  - `RATE_LIMIT_GLOBAL_RPS` - Global requests per second (unauthenticated)
  - `RATE_LIMIT_USER_RPS` - Per-user requests per second
  - `RATE_LIMIT_APIKEY_RPS` - Per-API-key requests per second
  - `RATE_LIMIT_BURST_MULTIPLIER` - Burst allowance factor
  - `RATE_LIMIT_BACKEND` - Storage backend (redis/memory)
- **Rate Limit Response Headers**
  - `X-RateLimit-Limit` - Maximum requests allowed
  - `X-RateLimit-Remaining` - Requests remaining
  - `X-RateLimit-Reset` - Unix timestamp when limit resets
  - `Retry-After` - Seconds until retry (on 429)
- **Prometheus Metrics**
  - `rate_limit_requests_total` - Total requests by status
  - `rate_limit_tokens_remaining` - Token bucket state
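The token bucket algorithm named above can be prototyped without Redis. A minimal in-memory sketch (the class and method names are illustrative, not the actual middleware API): each bucket refills continuously at `rate` tokens per second up to a capacity of `rate * burst_multiplier`, and a request is admitted only if a whole token is available.

```python
import time


class TokenBucket:
    """In-memory token bucket: refills at `rate` tokens/sec,
    capacity = rate * burst_multiplier (illustrative sketch)."""

    def __init__(self, rate: float, burst_multiplier: float = 1.5):
        self.rate = rate
        self.capacity = rate * burst_multiplier
        self.tokens = self.capacity  # start full
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then try to spend `cost` tokens."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=2.0, burst_multiplier=1.5)  # capacity: 3 tokens
results = [bucket.allow() for _ in range(4)]
print(results)  # first 3 pass (burst allowance), 4th is rejected
```

The same refill arithmetic can be ported to a Redis Lua script for the distributed backend, keeping the read-refill-spend sequence atomic.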
Environment Variables
RATE_LIMIT_ENABLED=true
RATE_LIMIT_GLOBAL_RPS=1000
RATE_LIMIT_USER_RPS=100
RATE_LIMIT_APIKEY_RPS=200
RATE_LIMIT_BURST_MULTIPLIER=1.5
RATE_LIMIT_BACKEND=redis
RATE_LIMIT_ADAPTIVE_ENABLED=true
RATE_LIMIT_ADAPTIVE_THRESHOLD=0.8
RATE_LIMIT_ADAPTIVE_REDUCTION=0.5
Defense in Depth
Layer 1: Nginx (Infrastructure)
├── Purpose: DDoS protection, connection limiting
├── Scope: Per-IP
├── Limit: 3000 r/s sustained, 6000 burst
└── When: Before request reaches application
Layer 2: Application Middleware (NEW)
├── Purpose: Fair usage, user/tenant isolation
├── Scope: Per-user, per-API-key, per-endpoint
├── Limit: Configurable (default 100/s per user)
└── When: After authentication, before handlers
Layer 3: Plugin (Existing, Optional)
├── Purpose: Business logic limits (per-tool, per-tenant)
├── Scope: Tool invocations, prompt fetches
├── Limit: Configured per-plugin
└── When: During tool/prompt operations
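Layer 2 above could be wired as a plain ASGI middleware so it works regardless of framework internals. A minimal sketch under assumed names (`RateLimitMiddleware`, a fixed per-key `allowance` standing in for the real token bucket):

```python
import asyncio
import json


class RateLimitMiddleware:
    """Minimal ASGI middleware sketch: rejects with 429 once a per-key
    allowance is exhausted. `allowance` is a hypothetical stand-in for
    the Redis-backed token bucket."""

    def __init__(self, app, allowance: int = 2):
        self.app = app
        self.allowance = allowance
        self.remaining = {}  # key -> requests left

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            return await self.app(scope, receive, send)
        key = scope.get("client") or "global"
        left = self.remaining.setdefault(key, self.allowance)
        if left <= 0:
            body = json.dumps({"detail": "rate limit exceeded"}).encode()
            await send({"type": "http.response.start", "status": 429,
                        "headers": [(b"retry-after", b"1"),
                                    (b"content-type", b"application/json")]})
            await send({"type": "http.response.body", "body": body})
            return
        self.remaining[key] = left - 1
        await self.app(scope, receive, send)


async def ok_app(scope, receive, send):
    """Trivial downstream app that always answers 200."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def demo():
    app = RateLimitMiddleware(ok_app, allowance=2)
    statuses = []

    async def send(msg):
        if msg["type"] == "http.response.start":
            statuses.append(msg["status"])

    for _ in range(3):
        await app({"type": "http", "client": ("127.0.0.1", 1234)}, None, send)
    return statuses


print(asyncio.run(demo()))  # [200, 200, 429]
```

In FastAPI this would be registered in `mcpgateway/main.py` after the authentication middleware, so the rate-limit key can be the resolved user or API key rather than the client address.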
Files to Create
| File | Purpose |
|---|---|
| `mcpgateway/middleware/rate_limit.py` | Rate limiting middleware |
| `mcpgateway/middleware/__init__.py` | Middleware package init |
| `tests/unit/mcpgateway/middleware/test_rate_limit.py` | Unit tests |
| `tests/integration/test_rate_limiting.py` | Integration tests |
Files to Modify
| File | Changes |
|---|---|
| `mcpgateway/config.py` | Add rate limit configuration settings |
| `mcpgateway/main.py` | Register rate limit middleware |
| `.env.example` | Document rate limit environment variables |
| `docker-compose.yml` | Add default rate limit configuration |
| `charts/mcp-stack/values.yaml` | Add Helm values for rate limiting |
| `docs/docs/manage/configuration.md` | Document rate limiting configuration |
| `docs/docs/architecture/security-features.md` | Add rate limiting section |
Acceptance Criteria
- Rate limit middleware implemented with Redis backend
- Memory backend fallback for single-instance deployments
- Per-user, per-API-key, and global rate limiting working
- Standard rate limit headers in all responses
- 429 responses include `Retry-After` header
- Prometheus metrics for rate limit events
- Adaptive rate limiting based on system load
- Configuration via environment variables
- Unit tests with >90% coverage
- Integration tests for multi-replica scenarios
- Documentation updated
- Performance impact < 1ms per request
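The header criteria above can be derived directly from bucket state. A sketch (the helper name and its inputs are hypothetical): `limit` is the bucket capacity, `remaining` the tokens left, and `refill_rate` the tokens restored per second.

```python
import math
import time


def rate_limit_headers(limit: int, remaining: float, refill_rate: float) -> dict:
    """Build standard rate-limit headers from token bucket state
    (illustrative helper, not the actual middleware API)."""
    # Seconds until the bucket is full again (0 if already full).
    reset_in = max(0.0, (limit - remaining) / refill_rate)
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(int(remaining)),
        "X-RateLimit-Reset": str(int(time.time() + math.ceil(reset_in))),
    }
    if remaining < 1:
        # On a 429, tell the client when one token will be available.
        headers["Retry-After"] = str(math.ceil((1 - remaining) / refill_rate))
    return headers


h = rate_limit_headers(limit=100, remaining=0.0, refill_rate=100.0)
print(h["Retry-After"])  # "1"
```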
Performance Considerations
- Redis round-trip adds ~0.5-1ms latency per request
- Memory backend has negligible latency
- Use Redis pipeline for atomic token bucket operations
- Consider local caching with short TTL to reduce Redis calls
- Skip rate limiting for health check endpoints
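The local-caching point above can be sketched as a short-TTL decision cache in front of the backend check (names are hypothetical; `backend_check` stands in for the Redis token bucket, and a real implementation must budget for the staleness this introduces):

```python
import time


class CachedDecision:
    """Cache allow/deny decisions per key for `ttl` seconds so hot keys
    hit the backend at most once per TTL window (hypothetical sketch)."""

    def __init__(self, backend_check, ttl: float = 0.1, clock=time.monotonic):
        self.backend_check = backend_check  # the expensive Redis round-trip
        self.ttl = ttl
        self.clock = clock  # injectable for deterministic testing
        self.cache = {}     # key -> (decision, expires_at)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hit = self.cache.get(key)
        if hit and hit[1] > now:
            return hit[0]  # fresh cached decision, no backend call
        decision = self.backend_check(key)
        self.cache[key] = (decision, now + self.ttl)
        return decision


calls = []

def fake_redis_check(key):
    calls.append(key)
    return True

fake_now = [0.0]
limiter = CachedDecision(fake_redis_check, ttl=0.1, clock=lambda: fake_now[0])
for _ in range(5):
    limiter.allow("user:42")  # 5 requests inside one TTL window
fake_now[0] = 0.2             # window expired
limiter.allow("user:42")
print(len(calls))  # 2 backend round-trips for 6 requests
```

With a 100 ms TTL, a key receiving 100 r/s sees roughly a 10x reduction in Redis calls, at the cost of admitting up to one TTL window of extra requests after the bucket empties.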