fix(rate-limiter): shared state, eviction, thread safety, config validation (#3750)

crivetimihai merged 5 commits into `main`
Conversation
Thanks @gandhipratik203. This is a thorough and well-documented fix — correctly addresses 7 of 8 gaps from #3740 with the Redis backend, eviction, thread safety, config validation, and PII fix. Test coverage is strong (70 unit + 16 integration + multi-instance load test). Two notes: (1) the …
Closes #3740

## What changed

### Plugin fixes (`plugins/rate_limiter/rate_limiter.py`)

- Config validation at `__init__`: `_validate_config()` parses all rate strings at startup — bad config raises immediately, not mid-request
- Graceful degradation: both hooks wrapped in try/except; unexpected errors are logged and the request is allowed through (permissive)
- `prompt_pre_fetch` now enforces `by_tool` limits using `prompt_id` as key
- `MemoryBackend`: `asyncio.Lock` makes counter increments atomic
- `MemoryBackend`: background TTL sweep evicts expired windows (0.5s interval)
- `RedisBackend`: atomic INCR+EXPIRE via Lua script; shared state across all gateway instances; native TTL expiry; falls back to memory on error

### Test additions (`tests/unit/.../test_rate_limiter.py`)

- Gap tests: 4 xfail -> pass (shared state, eviction, prompt `by_tool`, graceful degradation); 1 xfail remains (fixed window burst, deferred)
- Edge case tests: malformed/unsupported config raises at init (not request time); runtime errors degrade gracefully via mock injection
- Redis backend test uses injected FakeRedis — no live server required

### Config changes

- `plugins/config.yaml`: RateLimiterPlugin enabled with enforce mode
- `tests/performance/plugins/config.yaml`: RateLimiterPlugin set to permissive for inclusion in cProfile benchmark runs

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…config.yaml

Switch the default stack config from in-memory to Redis-backed rate limiting. This ensures the 30/m per-user limit is enforced as a true shared limit across all gateway instances rather than 30/m per process.

Validated via Redis MONITOR: all 3 gateway instances atomically increment the same `rl:user:<id>:60` counter via the Lua INCR+EXPIRE script.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
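For illustration, a Redis-backed plugin entry in the stack config might look roughly like this (key names and module path are assumptions for the sketch, not the exact schema):

```yaml
plugins:
  - name: RateLimiterPlugin
    mode: enforce            # or "permissive" to log without blocking
    config:
      backend: redis         # "memory" (per-process) or "redis" (shared)
      by_user: "30/m"        # enforced as one shared counter across instances
```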
…stance correctness

Adds `locustfile_rate_limiter.py` and a `make benchmark-rate-limiter` target to demonstrate the multi-instance rate limit enforcement gap and its fix. The test sends 1 req/s (60 req/min = 2x the 30/m limit) through 3 gateway instances. With a memory backend each instance only sees ~20 req/min and never fires the limiter (~0% failures). With the Redis backend the shared counter reaches 30/min and blocks ~50% of requests — clearly showing the fix works across instances.

Expected results:

- Memory backend: ~0% blocked (each instance sees 20 req/min < 30/m limit)
- Redis backend: ~50% blocked (shared counter: 60 req/min > 30/m limit)

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…and updated docs

- Add 22 new unit tests (70 passed total, 4 xfailed):
  - Permissive vs enforce mode through PluginExecutor
  - Redis fallback: memory takeover when Redis is down, limit still enforced, no-fallback graceful degradation
  - Cross-tenant isolation: independent counters, no counter bleed between tenants
  - Header accuracy: Retry-After bounds, X-RateLimit-Reset future/consistency, Remaining decrement
  - Bypass resistance: None/whitespace user identity, tool name case sensitivity and whitespace (documented as xfail gaps)
  - PII: violation description must not contain user or tenant identifiers
- Fix PII leak in violation description: remove user/tenant from description string in both `prompt_pre_fetch` and `tool_pre_invoke` — identifiers appeared in log output via permissive-mode manager warning and enforce-mode `PluginViolationError` message
- Rewrite `plugins/rate_limiter/README.md`: was describing the old pre-fix implementation (in-memory only, no Redis, Redis as TODO). Now documents both backends, full config reference, response headers, examples, and accurate limitations table
- Update `plugin-manifest.yaml` description to reflect Redis backend support

Closes #3740

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
… validation

- Remove unused `_allow()` module-level function (dead code — plugin uses `self._rate_backend.allow()` directly)
- Fix `test_graceful_degradation` test: was patching `_allow()` which is never called by the plugin; now patches `backend.allow()` via `patch.object` so the try/except error path is actually exercised
- Add `prompt_pre_fetch` graceful degradation test (was only tested for `tool_pre_invoke`)
- Fix inconsistent `by_tool` lookup in `tool_pre_invoke`: remove unnecessary `hasattr(__contains__)` guard, align with `prompt_pre_fetch` pattern
- Add backend validation to `_validate_config()`: a typo like `'reddis'` now raises `ValueError` at startup instead of silently falling back to memory
- Add test for malformed `by_tool` rate string validation
- Add test for invalid backend name validation
- Change default config mode from enforce to permissive for safety (consistent with all other security plugins in the default config)

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
crivetimihai left a comment
Review: Approved with fixes applied
Solid PR addressing real gaps in the rate limiter plugin. The Redis backend with atomic Lua INCR+EXPIRE, memory eviction sweep, asyncio.Lock, and config validation are all well-designed and consistent with the codebase's plugin patterns.
Fixes applied in review commit
Bug: Dead code removed — _allow() module-level function was never called by the plugin (which uses self._rate_backend.allow()). Removed.
Bug: False-positive test fixed — test_graceful_degradation_unexpected_runtime_error_does_not_crash_caller patched _allow() (never called), so the try/except error path was never exercised. The test passed vacuously. Fixed to use patch.object(plugin._rate_backend, "allow", ...) which actually triggers the error handling.
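The fix can be illustrated with a toy model (class and method names here are illustrative stand-ins, not the plugin's real code): patching the method the plugin actually calls is what exercises the except path.

```python
import asyncio
from unittest.mock import patch


class FakeBackend:
    async def allow(self, *args, **kwargs):
        return True


class ToyPlugin:
    """Toy stand-in for the plugin, showing why the patch target matters."""

    def __init__(self):
        self._rate_backend = FakeBackend()

    async def tool_pre_invoke(self):
        try:
            await self._rate_backend.allow()
            return "allowed"
        except Exception:
            return "degraded"  # fail-open path under test


plugin = ToyPlugin()
# Patching the attribute the plugin actually calls triggers the except branch;
# patching an unused module-level helper would leave this path untested.
with patch.object(plugin._rate_backend, "allow", side_effect=RuntimeError("boom")):
    result = asyncio.run(plugin.tool_pre_invoke())
print(result)  # -> degraded
```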
Bug: Inconsistent by_tool lookup — tool_pre_invoke had an unnecessary hasattr(by_tool_config, "__contains__") guard; prompt_pre_fetch didn't. Since by_tool is always Optional[Dict] from Pydantic, the hasattr check is redundant. Aligned both hooks.
Enhancement: Backend validation — A typo like backend: "reddis" silently fell back to memory with no error. Added validation in _validate_config() to fail fast at startup.
Config: Mode changed to permissive — All other security/safety plugins in the default plugins/config.yaml ship as permissive. Switching rate limiter to enforce would be inconsistent and could break existing deployments on update. Changed to permissive for safety — operators can opt into enforce.
Tests added — Added 3 new tests: prompt_pre_fetch graceful degradation, invalid backend validation, malformed by_tool rate validation. Coverage at 97% (remaining 3% is defensive code for no-event-loop and real Redis client init — covered by integration tests).
Verified
- 73 unit tests pass, 4 xfail (documented gaps)
- 1508 plugin tests pass with no regressions
- Architecture consistent with existing plugin patterns (Config model, init, error handling)
- No security concerns: PII removed from violation descriptions, fail-open on errors, no sensitive data in headers
- Redis Lua script is standard atomic INCR+EXPIRE pattern
- xfail tests properly document known gaps (sliding window, whitespace bypass, case sensitivity)
Summary
Addresses gaps documented in issue #3740 (sub-task of #3735). The rate limiter plugin had several correctness issues that made it silently ineffective in multi-instance deployments.
Gaps closed
Gap 1 (HIGH) — No shared state across instances: each gateway process had its own `_store` dict, making the effective per-user limit `N × configured_limit` where N = number of instances. Fixed with `RedisBackend` using an atomic Lua INCR+EXPIRE script — all instances share the same counter.
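The shared-counter pattern can be sketched as follows (a minimal sketch, assuming an async Redis client with `register_script`; class and method names are illustrative, not the plugin's exact code). The Lua script runs atomically inside Redis, so concurrent gateway instances cannot race between the INCR and the EXPIRE:

```python
# Atomic INCR+EXPIRE: EXPIRE is set only on the first hit in a window,
# so the key's TTL is the window length and expiry is handled by Redis.
INCR_EXPIRE_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""


class RedisBackend:
    def __init__(self, redis_client):
        self._redis = redis_client
        self._script = redis_client.register_script(INCR_EXPIRE_LUA)

    async def allow(self, key: str, limit: int, window_seconds: int) -> bool:
        # e.g. key = "rl:user:alice@example.com:60" — same key on every instance
        count = await self._script(keys=[key], args=[window_seconds])
        return count <= limit
```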
Gap 2 (MEDIUM) — No store eviction: the in-memory `_store` dict grew unboundedly as unique users accumulated. Fixed with a background asyncio sweep task in `MemoryBackend` that evicts expired windows every 0.5s.
Gap 3 (MEDIUM) — No thread safety: dict access was not atomic under concurrent asyncio tasks. Fixed with `asyncio.Lock` in `MemoryBackend.allow()`.
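Gaps 2 and 3 together can be sketched like this (field names and structure are assumptions for illustration, not the plugin's exact code): a lock makes the read-modify-write atomic across asyncio tasks, and a background task bounds `_store` growth.

```python
import asyncio
import time


class MemoryBackend:
    """Sketch of an in-memory counter with a lock and a TTL eviction sweep."""

    def __init__(self, sweep_interval: float = 0.5):
        self._store = {}  # key -> (count, window_expiry)
        self._lock = asyncio.Lock()
        self._sweep_interval = sweep_interval
        self._sweep_task = None

    async def allow(self, key: str, limit: int, window_seconds: int) -> bool:
        now = time.monotonic()
        async with self._lock:  # read-modify-write is atomic across tasks
            count, expiry = self._store.get(key, (0, now + window_seconds))
            if now >= expiry:  # window rolled over: reset the counter
                count, expiry = 0, now + window_seconds
            self._store[key] = (count + 1, expiry)
            return count + 1 <= limit

    async def _sweep(self):
        # Background task: evict expired windows so _store cannot grow unboundedly.
        while True:
            await asyncio.sleep(self._sweep_interval)
            now = time.monotonic()
            async with self._lock:
                expired = [k for k, (_, exp) in self._store.items() if now >= exp]
                for k in expired:
                    del self._store[k]

    def start(self):
        self._sweep_task = asyncio.get_running_loop().create_task(self._sweep())
```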
Gap 5 (LOW) — `prompt_pre_fetch` silently ignored `by_tool` config. Fixed by adding the same `by_tool` lookup to the prompt hook.
Gap 6 (MEDIUM) — Malformed config (`"abc/m"`) raised `ValueError` at request time. Fixed with `_validate_config()` called at `init` — fails fast at startup.
Gap 7 (LOW) — Unsupported rate unit (`"60/d"`) raised `ValueError` at request time. Fixed by `_parse_rate()` raising early during `_validate_config()`.
Gap 8 (MEDIUM) — Any unexpected exception in the plugin propagated to the caller. Fixed with `try/except Exception` wrapping both hooks; logs and returns permissive result on unexpected error.
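The fail-open pattern can be sketched as a decorator (decorator form and return shape are illustrative; per the PR the plugin wraps the hook bodies in try/except directly):

```python
import logging

logger = logging.getLogger(__name__)


def fail_open(hook):
    """Sketch of the Gap 8 pattern: unexpected errors in a hook are
    logged and the request is allowed through, never propagated."""

    async def wrapper(*args, **kwargs):
        try:
            return await hook(*args, **kwargs)
        except Exception:
            logger.exception("rate limiter hook failed; allowing request")
            return {"continue_processing": True}  # permissive result (shape assumed)

    return wrapper
```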
Gap 4 (sliding window) intentionally deferred — the fixed window burst-at-boundary is a known trade-off. The `test_fixed_window_burst_at_boundary` xfail test documents the gap.
Additional hardening
PII fix — violation descriptions previously included `user` and `tenant` identifiers in the human-readable string (e.g. `"Rate limit exceeded for tool search, user alice@example.com, or tenant acme-corp"`). This string is logged directly in permissive mode and embedded in `PluginViolationError` messages in enforce mode. Fixed to only include the tool/prompt name — structured metadata remains in the `details` dict.
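The before/after split can be sketched like this (function names are illustrative; the point is which string gets logged):

```python
def violation_description(kind: str, name: str) -> str:
    """Post-fix: only the tool/prompt name appears in the human-readable
    string that reaches logs and PluginViolationError messages."""
    return f"Rate limit exceeded for {kind} {name}"


def violation_details(user: str, tenant: str, limit: str) -> dict:
    # Identifiers live only in structured metadata, not the logged string.
    return {"user": user, "tenant": tenant, "limit": limit}
```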
Bypass resistance documented — three bypass vectors identified and tested as xfail gaps: `None`/whitespace user identity, tool name case sensitivity, and tool name whitespace.
README rewritten — the previous README described the old pre-fix implementation (in-memory only, Redis listed as a TODO). Now documents both backends, full configuration reference, response headers, multi-instance deployment guidance, and an accurate limitations table.
Test results
Unit tests: 70 passed, 4 xfailed
Integration tests: 16/16 passed (`tests/integration/test_rate_limiter.py`, `--with-integration`)
Plugin framework tests: 53/53 passed
Performance profiling (`tests/performance/test_plugins_performance.py`, 1000 iterations):
Redis adds ~0.075ms per hook call — well under 1ms.
Multi-instance load test (`make benchmark-rate-limiter`, 3 gateway instances, 1 user at 60 req/min = 2× the 30/m limit, 120s):
Observed block rate was ~40% with the Redis backend vs ~50% theoretical: the first 30 requests in each 60-second window are always allowed before blocking starts. Over a longer run it converges toward ~50%.
Redis MONITOR confirmed all 3 instances write to the same atomic counter:
```
[172.18.0.8] EVAL "rl:user:admin@example.com:60" "60"
[172.18.0.9] EVAL "rl:user:admin@example.com:60" "60"
[172.18.0.10] EVAL "rl:user:admin@example.com:60" "60"
```
Closes #3740