-
Notifications
You must be signed in to change notification settings - Fork 615
[TESTING][PLUGINS]: Test, harden, and document the Rate Limiter plugin #3740
Description
🔧 Summary
Deliver complete test coverage, load test validation, documentation, and hardening for the Rate Limiter plugin as part of #3735. The plugin currently has solid unit tests for core logic but lacks integration tests, edge case coverage, bypass resistance validation, and production-readiness hardening.
🧱 Area Affected
plugins/rate_limiter/tests/unit/mcpgateway/plugins/plugins/rate_limiter/tests/integration/test_rate_limiter.py
⚙️ Context / Rationale
Initial testing revealed that while the rate limiter's enforcement logic, multi-dimensional checking, and RFC-compliant headers are well-implemented, several production-readiness gaps exist:
| Gap | Impact |
|---|---|
| No shared state across instances | In-memory _store is per-process — 3 gateway instances behind nginx give 3× effective limit, limits are never reliably enforced |
| No store eviction | _store grows unboundedly with unique user/tool combinations — memory leak under sustained load |
| No thread safety | dict not atomic under concurrent async access — counter can exceed configured limit |
| Fixed window burst | A user can make 2× limit requests across a window boundary |
prompt_pre_fetch ignores by_tool |
Tool-level limits configured in by_tool are silently skipped for prompt hooks |
| No observability | No metrics or structured log events emitted on violations — throttled users are invisible in production |
| No per-server limits | Cannot configure tool X on server Y: N/m — server_id dimension is missing |
Test: 125 concurrent users, 60s, 3 gateway instances behind nginx, rate limiter set to 30/m in enforce mode.
Expected behaviour: significantly more failures (429s) as users exceed their 30/m limit.
Results:
| Metric | Python baseline | With rate limiter (30/m, enforce) |
|---|---|---|
| RPS | 512 | 904 |
| Avg latency | 167ms | 74ms |
| Failures | 24 (0.08%) | 2 (0.00%) |
Why: nginx round-robins requests across 3 gateway instances, each with its own independent _store. Each instance only sees ~1/3 of a user's traffic (~10 requests/instance over 60s), which never reaches the 30/m threshold on any single instance. No user was ever reliably blocked. The apparent improvement in RPS and latency is a side effect of occasional fast 429s (sub-millisecond) on whichever instance briefly hit its local counter — these inflate RPS and lower average latency without providing real enforcement.
Effective limit with 3 instances = 3 × 30 = 90/m, not 30/m.
📋 Acceptance Criteria
Testing
- Unit tests covering all identified gaps (multi-instance isolation, store growth, concurrency, window burst, prompt hook
by_tooldimension) - Edge case tests — empty user identity,
Nonetenant, unicode user IDs, malformed rate strings - Graceful degradation test — plugin exception does not crash the gateway
- Bypass resistance — tests prove each bypass vector exists and is documented
Load Testing
- Latency overhead measured per hook (
tool_pre_invoke,prompt_pre_fetch) under 1000 iterations viatests/performance/test_plugins_performance.py - Stress test results documented — 125 concurrent users, plugin active, single instance vs multi-instance comparison
- Memory behavior confirmed —
_storegrowth measured under sustained unique-user load
Documentation
- Plugin README with description, configuration reference, known limitations, and examples
- Limitations section explicitly documents in-memory constraints and multi-instance behaviour
-
docs/updated if plugin behaviour affects system-level guarantees
Hardening
- Input validation — malformed rate strings handled gracefully
- Error handling — no unhandled exceptions propagating to caller
- Logging — violations logged at appropriate level, no sensitive data (user identifiers) in log output
- Configuration defaults reviewed — current defaults (
60/muser,600/mtenant) assessed for sane production behaviour
🧩 Additional Context
- Part of [CHORE][PLUGINS]: Test, load test, document, and harden security and resilience plugins #3735
- Rate Limiter is Priority 2 in [CHORE][PLUGINS]: Test, load test, document, and harden security and resilience plugins #3735 — basic integration tests and documentation are the minimum bar before GA
- The multi-instance shared state gap is the highest priority fix — Redis is already available in the stack and is the recommended solution
- Manual test plan for gaps that cannot be covered by automated tests (observability, per-server limits, multi-instance Docker proof) to be documented in the PR