Skip to content

[TESTING][PLUGINS]: Test, harden, and document the Rate Limiter plugin #3740

@gandhipratik203

Description

@gandhipratik203

🔧 Summary

Deliver complete test coverage, load test validation, documentation, and hardening for the Rate Limiter plugin as part of #3735. The plugin currently has solid unit tests for core logic but lacks integration tests, edge case coverage, bypass resistance validation, and production-readiness hardening.


🧱 Area Affected

  • plugins/rate_limiter/
  • tests/unit/mcpgateway/plugins/plugins/rate_limiter/
  • tests/integration/test_rate_limiter.py

⚙️ Context / Rationale

Initial testing revealed that while the rate limiter's enforcement logic, multi-dimensional checking, and RFC-compliant headers are well-implemented, several production-readiness gaps exist:

Gap Impact
No shared state across instances In-memory _store is per-process — 3 gateway instances behind nginx give 3× effective limit, limits are never reliably enforced
No store eviction _store grows unboundedly with unique user/tool combinations — memory leak under sustained load
No thread safety dict not atomic under concurrent async access — counter can exceed configured limit
Fixed window burst A user can make 2× limit requests across a window boundary
prompt_pre_fetch ignores by_tool Tool-level limits configured in by_tool are silently skipped for prompt hooks
No observability No metrics or structured log events emitted on violations — throttled users are invisible in production
No per-server limits Cannot configure tool X on server Y: N/mserver_id dimension is missing

Test: 125 concurrent users, 60s, 3 gateway instances behind nginx, rate limiter set to 30/m in enforce mode.

Expected behaviour: significantly more failures (429s) as users exceed their 30/m limit.

Results:

Metric Python baseline With rate limiter (30/m, enforce)
RPS 512 904
Avg latency 167ms 74ms
Failures 24 (0.08%) 2 (0.00%)

⚠️ These results look better with the rate limiter on — that is the bug, not the fix.

Why: nginx round-robins requests across 3 gateway instances, each with its own independent _store. Each instance only sees ~1/3 of a user's traffic (~10 requests/instance over 60s), which never reaches the 30/m threshold on any single instance. No user was ever reliably blocked. The apparent improvement in RPS and latency is a side effect of occasional fast 429s (sub-millisecond) on whichever instance briefly hit its local counter — these inflate RPS and lower average latency without providing real enforcement.

Effective limit with 3 instances = 3 × 30 = 90/m, not 30/m.


📋 Acceptance Criteria

Testing

  • Unit tests covering all identified gaps (multi-instance isolation, store growth, concurrency, window burst, prompt hook by_tool dimension)
  • Edge case tests — empty user identity, None tenant, unicode user IDs, malformed rate strings
  • Graceful degradation test — plugin exception does not crash the gateway
  • Bypass resistance — tests prove each bypass vector exists and is documented

Load Testing

  • Latency overhead measured per hook (tool_pre_invoke, prompt_pre_fetch) under 1000 iterations via tests/performance/test_plugins_performance.py
  • Stress test results documented — 125 concurrent users, plugin active, single instance vs multi-instance comparison
  • Memory behavior confirmed — _store growth measured under sustained unique-user load

Documentation

  • Plugin README with description, configuration reference, known limitations, and examples
  • Limitations section explicitly documents in-memory constraints and multi-instance behaviour
  • docs/ updated if plugin behaviour affects system-level guarantees

Hardening

  • Input validation — malformed rate strings handled gracefully
  • Error handling — no unhandled exceptions propagating to caller
  • Logging — violations logged at appropriate level, no sensitive data (user identifiers) in log output
  • Configuration defaults reviewed — current defaults (60/m user, 600/m tenant) assessed for sane production behaviour

🧩 Additional Context

Metadata

Metadata

Labels

SHOULDP2: Important but not vital; high-value items that are not crucial for the immediate releasebugSomething isn't workingpluginstestingTesting (unit, e2e, manual, automated, etc)wxowxo integration

Type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions