[PERFORMANCE]: Global Rate Limiting for Gateway Protection #1854

@crivetimihai

Description

Summary

Implement configurable application-level rate limiting middleware to protect the gateway from traffic overload, ensure fair resource allocation, and provide graceful degradation under high load. This complements the existing nginx rate limiting and plugin-level rate limiting with application-aware controls.

Current State Analysis

What Already Exists

| Layer | Component | Scope | Status |
|---|---|---|---|
| Infrastructure | Nginx `limit_req_zone` | Per-IP, global | Active |
| Infrastructure | Nginx `limit_conn_zone` | Per-IP connections | Active |
| Plugin | RateLimiterPlugin | Per-user/tenant/tool | Optional |
| Config | `tool_rate_limit` | Tool invocations | Defined, partially used |
| Config | `tool_concurrent_limit` | Concurrent tool calls | Defined, partially used |

Gaps Identified

| Gap | Impact |
|---|---|
| No application-level global rate limiting | All endpoints unprotected without nginx |
| No distributed rate limiting | Multi-replica deployments have no shared state |
| No per-user/API-key limits at REST layer | Users behind the same IP share limits unfairly |
| No rate limit response headers | Clients cannot adapt to limits |
| No endpoint-specific limits | Critical endpoints (auth) have the same limits as bulk operations |
| No adaptive/dynamic limits | No backpressure when the system is under stress |
| No rate limit metrics | Cannot monitor or alert on rate limit events |

Proposed Solution

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                         RATE LIMITING LAYERS                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────────────┐                                                       │
│   │     Client      │                                                       │
│   └────────┬────────┘                                                       │
│            │                                                                │
│            ▼                                                                │
│   ┌─────────────────┐     Layer 1: Infrastructure                           │
│   │      Nginx      │     - Per-IP rate limiting                            │
│   │   Rate Limit    │     - Connection limiting                             │
│   │   (3000 r/s)    │     - DDoS protection                                 │
│   └────────┬────────┘                                                       │
│            │                                                                │
│            ▼                                                                │
│   ┌─────────────────┐     Layer 2: Application (NEW)                        │
│   │    FastAPI      │     - Per-user/API-key limits                         │
│   │   Middleware    │     - Per-endpoint limits                             │
│   │  (Global Rate   │     - Redis-backed (distributed)                      │
│   │    Limiter)     │     - Adaptive limits                                 │
│   └────────┬────────┘                                                       │
│            │                                                                │
│            ▼                                                                │
│   ┌─────────────────┐     Layer 3: Business Logic (Existing)                │
│   │    Plugin       │     - Per-tool limits                                 │
│   │  Rate Limiter   │     - Per-tenant limits                               │
│   └────────┬────────┘                                                       │
│            │                                                                │
│            ▼                                                                │
│   ┌─────────────────┐                                                       │
│   │    Handler      │                                                       │
│   └─────────────────┘                                                       │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Implementation Components

  1. Rate Limit Middleware (mcpgateway/middleware/rate_limit.py)

    • Token bucket algorithm with Redis backend
    • Per-user, per-API-key, and global rate limiting
    • Endpoint-specific rate limits
    • Adaptive rate limiting based on system health
  2. Configuration Settings (mcpgateway/config.py)

    • RATE_LIMIT_ENABLED - Enable/disable rate limiting
    • RATE_LIMIT_GLOBAL_RPS - Global requests per second (unauthenticated)
    • RATE_LIMIT_USER_RPS - Per-user requests per second
    • RATE_LIMIT_APIKEY_RPS - Per-API-key requests per second
    • RATE_LIMIT_BURST_MULTIPLIER - Burst allowance factor
    • RATE_LIMIT_BACKEND - Storage backend (redis/memory)
  3. Rate Limit Response Headers

    • X-RateLimit-Limit - Maximum requests allowed
    • X-RateLimit-Remaining - Requests remaining
    • X-RateLimit-Reset - Unix timestamp when limit resets
    • Retry-After - Seconds until retry (on 429)
  4. Prometheus Metrics

    • rate_limit_requests_total - Total requests by status
    • rate_limit_tokens_remaining - Token bucket state

Environment Variables

RATE_LIMIT_ENABLED=true
RATE_LIMIT_GLOBAL_RPS=1000
RATE_LIMIT_USER_RPS=100
RATE_LIMIT_APIKEY_RPS=200
RATE_LIMIT_BURST_MULTIPLIER=1.5
RATE_LIMIT_BACKEND=redis
RATE_LIMIT_ADAPTIVE_ENABLED=true
RATE_LIMIT_ADAPTIVE_THRESHOLD=0.8
RATE_LIMIT_ADAPTIVE_REDUCTION=0.5
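The parsing and defaults implied by the variables above can be sketched with a stdlib-only dataclass (the real `mcpgateway/config.py` uses its own settings machinery; this is just an illustration, and the names beyond the environment variables are assumptions):

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Parse a boolean environment variable, falling back to `default`."""
    return os.environ.get(name, str(default)).lower() in ("1", "true", "yes")


@dataclass(frozen=True)
class RateLimitSettings:
    enabled: bool
    global_rps: int
    user_rps: int
    apikey_rps: int
    burst_multiplier: float
    backend: str  # "redis" or "memory"
    adaptive_enabled: bool
    adaptive_threshold: float
    adaptive_reduction: float

    @classmethod
    def from_env(cls) -> "RateLimitSettings":
        env = os.environ.get
        return cls(
            enabled=_env_bool("RATE_LIMIT_ENABLED", True),
            global_rps=int(env("RATE_LIMIT_GLOBAL_RPS", "1000")),
            user_rps=int(env("RATE_LIMIT_USER_RPS", "100")),
            apikey_rps=int(env("RATE_LIMIT_APIKEY_RPS", "200")),
            burst_multiplier=float(env("RATE_LIMIT_BURST_MULTIPLIER", "1.5")),
            backend=env("RATE_LIMIT_BACKEND", "redis"),
            adaptive_enabled=_env_bool("RATE_LIMIT_ADAPTIVE_ENABLED", True),
            adaptive_threshold=float(env("RATE_LIMIT_ADAPTIVE_THRESHOLD", "0.8")),
            adaptive_reduction=float(env("RATE_LIMIT_ADAPTIVE_REDUCTION", "0.5")),
        )
```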

Defense in Depth

Layer 1: Nginx (Infrastructure)
├── Purpose: DDoS protection, connection limiting
├── Scope: Per-IP
├── Limit: 3000 r/s sustained, 6000 burst
└── When: Before request reaches application

Layer 2: Application Middleware (NEW)
├── Purpose: Fair usage, user/tenant isolation
├── Scope: Per-user, per-API-key, per-endpoint
├── Limit: Configurable (default 100/s per user)
└── When: After authentication, before handlers

Layer 3: Plugin (Existing, Optional)
├── Purpose: Business logic limits (per-tool, per-tenant)
├── Scope: Tool invocations, prompt fetches
├── Limit: Configured per-plugin
└── When: During tool/prompt operations
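The adaptive behaviour controlled by the `RATE_LIMIT_ADAPTIVE_*` variables reduces to one small rule. This sketch assumes a normalized load signal in `[0, 1]` (for example CPU utilisation or event-loop lag), which is an assumption and not something the gateway currently exposes:

```python
def adaptive_limit(base_rps: float, load: float,
                   threshold: float = 0.8, reduction: float = 0.5) -> float:
    """Scale the configured limit down when system load crosses the threshold.

    `load` is a health signal normalized to [0, 1]; with the defaults above
    (threshold 0.8, reduction 0.5), limits halve once load exceeds 80%.
    """
    return base_rps * reduction if load > threshold else base_rps
```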

Files to Create

| File | Purpose |
|---|---|
| `mcpgateway/middleware/rate_limit.py` | Rate limiting middleware |
| `mcpgateway/middleware/__init__.py` | Middleware package init |
| `tests/unit/mcpgateway/middleware/test_rate_limit.py` | Unit tests |
| `tests/integration/test_rate_limiting.py` | Integration tests |

Files to Modify

| File | Changes |
|---|---|
| `mcpgateway/config.py` | Add rate limit configuration settings |
| `mcpgateway/main.py` | Register rate limit middleware |
| `.env.example` | Document rate limit environment variables |
| `docker-compose.yml` | Add default rate limit configuration |
| `charts/mcp-stack/values.yaml` | Add Helm values for rate limiting |
| `docs/docs/manage/configuration.md` | Document rate limiting configuration |
| `docs/docs/architecture/security-features.md` | Add rate limiting section |

Acceptance Criteria

  • Rate limit middleware implemented with Redis backend
  • Memory backend fallback for single-instance deployments
  • Per-user, per-API-key, and global rate limiting working
  • Standard rate limit headers in all responses
  • 429 responses include Retry-After header
  • Prometheus metrics for rate limit events
  • Adaptive rate limiting based on system load
  • Configuration via environment variables
  • Unit tests with >90% coverage
  • Integration tests for multi-replica scenarios
  • Documentation updated
  • Performance impact < 1ms per request

Performance Considerations

  • Redis round-trip adds ~0.5-1ms latency per request
  • Memory backend has negligible latency
  • Use Redis pipeline for atomic token bucket operations
  • Consider local caching with short TTL to reduce Redis calls
  • Skip rate limiting for health check endpoints

Metadata

Labels

  • SHOULD
  • P2: Important but not vital; high-value items that are not crucial for the immediate release
  • enhancement: New feature or request
  • performance: Performance related items
