[PERFORMANCE]: Global Rate Limiting for Gateway Protection #1854
Labels
SHOULD · P2: Important but not vital; high-value items that are not crucial for the immediate release · enhancement: New feature or request · performance: Performance related items
Description
Summary
Implement configurable application-level rate limiting middleware to protect the gateway from traffic overload, ensure fair resource allocation, and provide graceful degradation under high load. This complements the existing nginx and plugin-level rate limiting with application-aware controls.
Current State Analysis
What Already Exists
| Layer | Component | Scope | Status |
|---|---|---|---|
| Infrastructure | Nginx `limit_req_zone` | Per-IP, global | Active |
| Infrastructure | Nginx `limit_conn_zone` | Per-IP connections | Active |
| Plugin | `RateLimiterPlugin` | Per-user/tenant/tool | Optional |
| Config | `tool_rate_limit` | Tool invocations | Defined, partially used |
| Config | `tool_concurrent_limit` | Concurrent tool calls | Defined, partially used |
Gaps Identified
| Gap | Impact |
|---|---|
| No application-level global rate limiting | All endpoints unprotected without nginx |
| No distributed rate limiting | Multi-replica deployments have no shared state |
| No per-user/API-key limits at REST layer | Users behind same IP share limits unfairly |
| No rate limit response headers | Clients cannot adapt to limits |
| No endpoint-specific limits | Critical endpoints (auth) have same limits as bulk operations |
| No adaptive/dynamic limits | No backpressure when system is under stress |
| No rate limit metrics | Cannot monitor or alert on rate limit events |
Proposed Solution
Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ RATE LIMITING LAYERS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ │
│ │ Client │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ Layer 1: Infrastructure │
│ │ Nginx │ - Per-IP rate limiting │
│ │ Rate Limit │ - Connection limiting │
│ │ (3000 r/s) │ - DDoS protection │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ Layer 2: Application (NEW) │
│ │ FastAPI │ - Per-user/API-key limits │
│ │ Middleware │ - Per-endpoint limits │
│ │ (Global Rate │ - Redis-backed (distributed) │
│ │ Limiter) │ - Adaptive limits │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ Layer 3: Business Logic (Existing) │
│ │ Plugin │ - Per-tool limits │
│ │ Rate Limiter │ - Per-tenant limits │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Handler │ │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Implementation Components
- **Rate Limit Middleware** (`mcpgateway/middleware/rate_limit.py`)
  - Token bucket algorithm with Redis backend
  - Per-user, per-API-key, and global rate limiting
  - Endpoint-specific rate limits
  - Adaptive rate limiting based on system health
- **Configuration Settings** (`mcpgateway/config.py`)
  - `RATE_LIMIT_ENABLED` - Enable/disable rate limiting
  - `RATE_LIMIT_GLOBAL_RPS` - Global requests per second (unauthenticated)
  - `RATE_LIMIT_USER_RPS` - Per-user requests per second
  - `RATE_LIMIT_APIKEY_RPS` - Per-API-key requests per second
  - `RATE_LIMIT_BURST_MULTIPLIER` - Burst allowance factor
  - `RATE_LIMIT_BACKEND` - Storage backend (redis/memory)
- **Rate Limit Response Headers**
  - `X-RateLimit-Limit` - Maximum requests allowed
  - `X-RateLimit-Remaining` - Requests remaining
  - `X-RateLimit-Reset` - Unix timestamp when limit resets
  - `Retry-After` - Seconds until retry (on 429)
- **Prometheus Metrics**
  - `rate_limit_requests_total` - Total requests by status
  - `rate_limit_tokens_remaining` - Token bucket state
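The token bucket algorithm named above can be prototyped without Redis. A minimal in-memory sketch (the class and method names are illustrative, not the actual middleware API): each bucket refills continuously at `rate` tokens per second up to a capacity of `rate * burst_multiplier`, and a request is admitted only if a whole token is available.

```python
import time


class TokenBucket:
    """In-memory token bucket: refills at `rate` tokens/sec,
    capacity = rate * burst_multiplier (illustrative sketch)."""

    def __init__(self, rate: float, burst_multiplier: float = 1.5):
        self.rate = rate
        self.capacity = rate * burst_multiplier
        self.tokens = self.capacity  # start full
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then try to spend `cost` tokens."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=2.0, burst_multiplier=1.5)  # capacity: 3 tokens
results = [bucket.allow() for _ in range(4)]
print(results)  # first 3 pass (burst allowance), 4th is rejected
```

The same refill arithmetic can be ported to a Redis Lua script for the distributed backend, keeping the read-refill-spend sequence atomic.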
Environment Variables
RATE_LIMIT_ENABLED=true
RATE_LIMIT_GLOBAL_RPS=1000
RATE_LIMIT_USER_RPS=100
RATE_LIMIT_APIKEY_RPS=200
RATE_LIMIT_BURST_MULTIPLIER=1.5
RATE_LIMIT_BACKEND=redis
RATE_LIMIT_ADAPTIVE_ENABLED=true
RATE_LIMIT_ADAPTIVE_THRESHOLD=0.8
RATE_LIMIT_ADAPTIVE_REDUCTION=0.5
Defense in Depth
Layer 1: Nginx (Infrastructure)
├── Purpose: DDoS protection, connection limiting
├── Scope: Per-IP
├── Limit: 3000 r/s sustained, 6000 burst
└── When: Before request reaches application
Layer 2: Application Middleware (NEW)
├── Purpose: Fair usage, user/tenant isolation
├── Scope: Per-user, per-API-key, per-endpoint
├── Limit: Configurable (default 100/s per user)
└── When: After authentication, before handlers
Layer 3: Plugin (Existing, Optional)
├── Purpose: Business logic limits (per-tool, per-tenant)
├── Scope: Tool invocations, prompt fetches
├── Limit: Configured per-plugin
└── When: During tool/prompt operations
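Layer 2 above could be wired as a plain ASGI middleware so it works regardless of framework internals. A minimal sketch under assumed names (`RateLimitMiddleware`, a fixed per-key `allowance` standing in for the real token bucket):

```python
import asyncio
import json


class RateLimitMiddleware:
    """Minimal ASGI middleware sketch: rejects with 429 once a per-key
    allowance is exhausted. `allowance` is a hypothetical stand-in for
    the Redis-backed token bucket."""

    def __init__(self, app, allowance: int = 2):
        self.app = app
        self.allowance = allowance
        self.remaining = {}  # key -> requests left

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            return await self.app(scope, receive, send)
        key = scope.get("client") or "global"
        left = self.remaining.setdefault(key, self.allowance)
        if left <= 0:
            body = json.dumps({"detail": "rate limit exceeded"}).encode()
            await send({"type": "http.response.start", "status": 429,
                        "headers": [(b"retry-after", b"1"),
                                    (b"content-type", b"application/json")]})
            await send({"type": "http.response.body", "body": body})
            return
        self.remaining[key] = left - 1
        await self.app(scope, receive, send)


async def ok_app(scope, receive, send):
    """Trivial downstream app that always answers 200."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def demo():
    app = RateLimitMiddleware(ok_app, allowance=2)
    statuses = []

    async def send(msg):
        if msg["type"] == "http.response.start":
            statuses.append(msg["status"])

    for _ in range(3):
        await app({"type": "http", "client": ("127.0.0.1", 1234)}, None, send)
    return statuses


print(asyncio.run(demo()))  # [200, 200, 429]
```

In FastAPI this would be registered in `mcpgateway/main.py` after the authentication middleware, so the rate-limit key can be the resolved user or API key rather than the client address.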
Files to Create
| File | Purpose |
|---|---|
| `mcpgateway/middleware/rate_limit.py` | Rate limiting middleware |
| `mcpgateway/middleware/__init__.py` | Middleware package init |
| `tests/unit/mcpgateway/middleware/test_rate_limit.py` | Unit tests |
| `tests/integration/test_rate_limiting.py` | Integration tests |
Files to Modify
| File | Changes |
|---|---|
| `mcpgateway/config.py` | Add rate limit configuration settings |
| `mcpgateway/main.py` | Register rate limit middleware |
| `.env.example` | Document rate limit environment variables |
| `docker-compose.yml` | Add default rate limit configuration |
| `charts/mcp-stack/values.yaml` | Add Helm values for rate limiting |
| `docs/docs/manage/configuration.md` | Document rate limiting configuration |
| `docs/docs/architecture/security-features.md` | Add rate limiting section |
Acceptance Criteria
- Rate limit middleware implemented with Redis backend
- Memory backend fallback for single-instance deployments
- Per-user, per-API-key, and global rate limiting working
- Standard rate limit headers in all responses
- 429 responses include `Retry-After` header
- Prometheus metrics for rate limit events
- Adaptive rate limiting based on system load
- Configuration via environment variables
- Unit tests with >90% coverage
- Integration tests for multi-replica scenarios
- Documentation updated
- Performance impact < 1ms per request
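The header criteria above can be derived directly from bucket state. A sketch (the helper name and its inputs are hypothetical): `limit` is the bucket capacity, `remaining` the tokens left, and `refill_rate` the tokens restored per second.

```python
import math
import time


def rate_limit_headers(limit: int, remaining: float, refill_rate: float) -> dict:
    """Build standard rate-limit headers from token bucket state
    (illustrative helper, not the actual middleware API)."""
    # Seconds until the bucket is full again (0 if already full).
    reset_in = max(0.0, (limit - remaining) / refill_rate)
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(int(remaining)),
        "X-RateLimit-Reset": str(int(time.time() + math.ceil(reset_in))),
    }
    if remaining < 1:
        # On a 429, tell the client when one token will be available.
        headers["Retry-After"] = str(math.ceil((1 - remaining) / refill_rate))
    return headers


h = rate_limit_headers(limit=100, remaining=0.0, refill_rate=100.0)
print(h["Retry-After"])  # "1"
```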
Performance Considerations
- Redis round-trip adds ~0.5-1ms latency per request
- Memory backend has negligible latency
- Use Redis pipeline for atomic token bucket operations
- Consider local caching with short TTL to reduce Redis calls
- Skip rate limiting for health check endpoints
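The local-caching point above can be sketched as a short-TTL decision cache in front of the backend check (names are hypothetical; `backend_check` stands in for the Redis token bucket, and a real implementation must budget for the staleness this introduces):

```python
import time


class CachedDecision:
    """Cache allow/deny decisions per key for `ttl` seconds so hot keys
    hit the backend at most once per TTL window (hypothetical sketch)."""

    def __init__(self, backend_check, ttl: float = 0.1, clock=time.monotonic):
        self.backend_check = backend_check  # the expensive Redis round-trip
        self.ttl = ttl
        self.clock = clock  # injectable for deterministic testing
        self.cache = {}     # key -> (decision, expires_at)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hit = self.cache.get(key)
        if hit and hit[1] > now:
            return hit[0]  # fresh cached decision, no backend call
        decision = self.backend_check(key)
        self.cache[key] = (decision, now + self.ttl)
        return decision


calls = []

def fake_redis_check(key):
    calls.append(key)
    return True

fake_now = [0.0]
limiter = CachedDecision(fake_redis_check, ttl=0.1, clock=lambda: fake_now[0])
for _ in range(5):
    limiter.allow("user:42")  # 5 requests inside one TTL window
fake_now[0] = 0.2             # window expired
limiter.allow("user:42")
print(len(calls))  # 2 backend round-trips for 6 requests
```

With a 100 ms TTL, a key receiving 100 r/s sees roughly a 10x reduction in Redis calls, at the cost of admitting up to one TTL window of extra requests after the bucket empties.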