[TESTING][PLUGINS]: Test, harden, and document the Rate Limiter plugin

### 🔧 Summary

Deliver complete test coverage, load test validation, documentation, and hardening for the Rate Limiter plugin as part of #3735. The plugin currently has solid unit tests for core logic but lacks integration tests, edge case coverage, bypass resistance validation, and production-readiness hardening.

---

### 🧱 Area Affected

- `plugins/rate_limiter/`
- `tests/unit/mcpgateway/plugins/plugins/rate_limiter/`
- `tests/integration/test_rate_limiter.py`

---

### ⚙️ Context / Rationale

Initial testing revealed that while the rate limiter's enforcement logic, multi-dimensional checking, and RFC-compliant headers are well-implemented, several production-readiness gaps exist:

| Gap | Impact |
|---|---|
| No shared state across instances | In-memory `_store` is per-process — 3 gateway instances behind nginx give `3× effective limit`, limits are never reliably enforced |
| No store eviction | `_store` grows unboundedly with unique user/tool combinations — memory leak under sustained load |
| No thread safety | `dict` not atomic under concurrent async access — counter can exceed configured limit |
| Fixed window burst | A user can make `2× limit` requests across a window boundary |
| `prompt_pre_fetch` ignores `by_tool` | Tool-level limits configured in `by_tool` are silently skipped for prompt hooks |
| No observability | No metrics or structured log events emitted on violations — throttled users are invisible in production |
| No per-server limits | Cannot configure `tool X on server Y: N/m` — `server_id` dimension is missing |

**Test:** 125 concurrent users, 60s, 3 gateway instances behind nginx, rate limiter set to `30/m` in `enforce` mode.

**Expected behaviour:** significantly more failures (429s) as users exceed their 30/m limit.

**Results:**

| Metric | Python baseline | With rate limiter (30/m, enforce) |
|---|---|---|
| RPS | 512 | 904 |
| Avg latency | 167ms | 74ms |
| Failures | 24 (0.08%) | 2 (0.00%) |

⚠️ These results look better with the rate limiter on — that is the bug, not the fix.

**Why:** nginx round-robins requests across 3 gateway instances, each with its own independent `_store`. Each instance only sees ~1/3 of a user's traffic (~10 requests/instance over 60s), which never reaches the `30/m` threshold on any single instance. No user was ever reliably blocked. The apparent improvement in RPS and latency is a side effect of occasional fast 429s (sub-millisecond) on whichever instance briefly hit its local counter — these inflate RPS and lower average latency without providing real enforcement.

**Effective limit with 3 instances = 3 × 30 = 90/m, not 30/m.**

---

### 📋 Acceptance Criteria

#### Testing
- [ ] Unit tests covering all identified gaps (multi-instance isolation, store growth, concurrency, window burst, prompt hook `by_tool` dimension)
- [ ] Edge case tests — empty user identity, `None` tenant, unicode user IDs, malformed rate strings
- [ ] Graceful degradation test — plugin exception does not crash the gateway
- [ ] Bypass resistance — tests prove each bypass vector exists and is documented

#### Load Testing
- [ ] Latency overhead measured per hook (`tool_pre_invoke`, `prompt_pre_fetch`) under 1000 iterations via `tests/performance/test_plugins_performance.py`
- [ ] Stress test results documented — 125 concurrent users, plugin active, single instance vs multi-instance comparison
- [ ] Memory behavior confirmed — `_store` growth measured under sustained unique-user load

#### Documentation
- [ ] Plugin README with description, configuration reference, known limitations, and examples
- [ ] Limitations section explicitly documents in-memory constraints and multi-instance behaviour
- [ ] `docs/` updated if plugin behaviour affects system-level guarantees

#### Hardening
- [ ] Input validation — malformed rate strings handled gracefully
- [ ] Error handling — no unhandled exceptions propagating to caller
- [ ] Logging — violations logged at appropriate level, no sensitive data (user identifiers) in log output
- [ ] Configuration defaults reviewed — current defaults (`60/m` user, `600/m` tenant) assessed for sane production behaviour

---

### 🧩 Additional Context

- Part of #3735
- Rate Limiter is Priority 2 in #3735 — basic integration tests and documentation are the minimum bar before GA
- The multi-instance shared state gap is the highest priority fix — Redis is already available in the stack and is the recommended solution
- Manual test plan for gaps that cannot be covered by automated tests (observability, per-server limits, multi-instance Docker proof) to be documented in the PR

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TESTING][PLUGINS]: Test, harden, and document the Rate Limiter plugin #3740

🔧 Summary

🧱 Area Affected

⚙️ Context / Rationale

📋 Acceptance Criteria

Testing

Load Testing

Documentation

Hardening

🧩 Additional Context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Gap	Impact
No shared state across instances	In-memory `_store` is per-process — 3 gateway instances behind nginx give `3× effective limit`, limits are never reliably enforced
No store eviction	`_store` grows unboundedly with unique user/tool combinations — memory leak under sustained load
No thread safety	`dict` not atomic under concurrent async access — counter can exceed configured limit
Fixed window burst	A user can make `2× limit` requests across a window boundary
`prompt_pre_fetch` ignores `by_tool`	Tool-level limits configured in `by_tool` are silently skipped for prompt hooks
No observability	No metrics or structured log events emitted on violations — throttled users are invisible in production
No per-server limits	Cannot configure `tool X on server Y: N/m` — `server_id` dimension is missing

[TESTING][PLUGINS]: Test, harden, and document the Rate Limiter plugin #3740

Description

🔧 Summary

🧱 Area Affected

⚙️ Context / Rationale

📋 Acceptance Criteria

Testing

Load Testing

Documentation

Hardening

🧩 Additional Context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions