[FIX][AUTH]: Cross-replica cache invalidation for auth caches by marekdano · Pull Request #3309 · IBM/mcp-context-forge

marekdano · 2026-02-27T09:35:45Z

🐛 Bug-fix PR

📌 Summary

AuthCache published invalidation messages to mcpgw:auth:invalidate, but CacheInvalidationSubscriber only listened on mcpgw:cache:invalidate. In multi-replica deployments, when one replica invalidated auth data (e.g., after a role change or token revocation), other replicas never received the message and continued serving stale in-memory auth caches.

💡 Fix Description

Subscribe to both channels — Added _channels list containing both mcpgw:cache:invalidate and mcpgw:auth:invalidate. Updated start() to subscribe to all channels via await self._pubsub.subscribe(*self._channels) and stop() to unsubscribe from all.
Handle auth invalidation messages — Added 7 new message handlers in _process_invalidation():
- user:{email} — clears user cache, context cache, and team cache entries for the user
- revoke:{jti} — adds JTI to revoked set, clears revocation cache and matching context cache entries
- team:{email} — clears team cache and context cache for the user
- role:{email}:{team_id} — clears specific role cache entry
- team_roles:{team_id} — clears all role cache entries for a team
- teams:{email} — clears teams list cache entries for the user
- membership:{email} — clears team membership cache entries for the user

The startswith check ordering ensures team_roles: and teams: are matched before team:.

Tests: `tests/unit/mcpgateway/cache/test_cache_invalidation_subscriber.py`

Updated test_subscriber_initialization to assert _channels includes both channels
Fixed FakePubSub.subscribe/unsubscribe signatures to accept *_channels (varargs)
Added create_mock_auth_cache() helper
Added TestAuthCacheInvalidationSubscriber class with 11 tests covering all auth invalidation message types, channel subscription verification, and an end-to-end regression test

🧪 Verification

Check	Command	Status
Lint suite	`make lint`	✅
Unit tests	`make test`	✅
Coverage ≥ 80 %	`make coverage`	✅
Manual regression no longer fails	steps / screenshots

📐 MCP Compliance (if relevant)

Matches current MCP spec
No breaking change to MCP clients

✅ Checklist

Code formatted (make black isort pre-commit)
No secrets/credentials committed

crivetimihai · 2026-02-27T11:36:06Z

Thanks @marekdano. Critical fix for multi-replica deployments — the channel mismatch (mcpgw:auth:invalidate vs mcpgw:cache:invalidate) meant auth cache changes weren't propagated across replicas. The 7 new message handlers cover the key invalidation scenarios. Good test coverage.

jonpspri

Review: Approve

Reviewed the full diff (6 commits after rebase onto main), ran all linters and tests.

What was checked

Check	Result
Black / isort	Clean
flake8	Clean
pylint	10.00/10
bandit	0 issues
Unit tests (49)	All pass

Assessment

Core fix is correct. AuthCache publishes to mcpgw:auth:invalidate but the subscriber only listened on mcpgw:cache:invalidate — the channel mismatch meant auth invalidations never propagated across replicas. Subscribing to both channels and adding the 7 auth message handlers resolves the gap.

Security:

Channel-origin guard ensures auth-prefixed messages are only accepted from mcpgw:auth:invalidate, preventing spoofing via the cache channel.
Final commit closes a subtle bypass where channel="" (empty/falsy) would short-circuit the guard — now rejects unless channel is exactly mcpgw:auth:invalidate.
_MAX_REVOKED_JTIS cap (100K) prevents unbounded memory growth from pubsub flooding.

Correctness:

All 7 message types published by auth_cache (user:, revoke:, team:, role:, team_roles:, teams:, membership:) have matching handlers.
_AUTH_PREFIXES tuple orders team_roles: and teams: before team: to prevent prefix collision in startswith().
threading.Lock usage in sync _process_auth_invalidation called from async context is safe — only fast dict ops, no await while held.

Test coverage:

11 new tests covering all 7 handler types, channel guard deny/accept paths, _MAX_REVOKED_JTIS cap, prefix collision regressions, empty identifiers, and identifiers with colons.
End-to-end cross-replica regression test included.

Signed-off-by: Marek Dano <Marek.Dano@ibm.com>

- Extract _process_auth_invalidation() from _process_invalidation() to reduce method complexity (pylint too-many-statements/locals) - Add _evict_keys() static helper to deduplicate key-eviction loops - Remove dead _channel backward-compat attribute - Run Black on test file to fix pre-existing formatting violations Signed-off-by: Jonathan Springer <jps@s390x.com>

Signed-off-by: Jonathan Springer <jps@s390x.com>

- Cap _revoked_jtis at 100k entries to prevent unbounded memory growth from compromised Redis pubsub channels (DoS mitigation) - Fix FakePubSub.subscribe signatures in two exception-path tests to accept varargs, matching the production *self._channels call - Add prefix-collision regression tests proving teams:/team_roles: messages don't fall through to the team: handler - Add edge-case tests for empty identifiers, colon-containing identifiers, and unusual email formats Signed-off-by: Jonathan Springer <jps@s390x.com>

Auth-prefixed invalidation messages (user:, revoke:, team:, etc.) are now only accepted when they arrive on the mcpgw:auth:invalidate channel. Messages with auth prefixes on other channels are logged and ignored, preventing cross-channel confusion. Also fixes the _MAX_REVOKED_JTIS docstring to reference the correct method name (_load_revoked_jtis in auth_cache). Signed-off-by: Jonathan Springer <jps@s390x.com>

The previous guard `if channel and channel != "mcpgw:auth:invalidate"` short-circuited to False when channel was empty, allowing auth messages to be processed without channel verification. Changed to `if channel != "mcpgw:auth:invalidate"` so auth messages are only accepted from the correct channel regardless of how the method is called. Signed-off-by: Jonathan Springer <jps@s390x.com>

Add 3 tests that exercise the full _listen_loop -> _process_invalidation path with channel data from fake pubsub messages: - test_listen_loop_forwards_channel_to_process_invalidation: verifies channel kwarg is forwarded from pubsub message to _process_invalidation - test_listen_loop_auth_message_evicts_cache_end_to_end: full path from pubsub message through auth cache eviction - test_listen_loop_auth_message_rejected_on_wrong_channel_end_to_end: deny-path proving channel guard works through the listener Also fixes _MAX_REVOKED_JTIS docstring to accurately describe the reconciliation path (Redis L2 fallback on L1 miss, not the unused sync_revoked_tokens method). Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

The if/elif chain comment said "Order matters: team_roles/teams must be checked before team" but there is no actual startswith() prefix collision between these prefixes (teams: != team:, team_roles: != team:). Reword the comment to describe correct dispatch and rename the two tests from "prefix_collision" to "dispatches_to_correct_handler" so they describe what they actually verify. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

crivetimihai

Review: Approve

Rebased onto current main (clean, no conflicts), reviewed the full implementation, ran linting, tests, and verified on a live 3-replica deployment with Redis.

Verification

Check	Result
Black / isort / flake8 / bandit	Clean
Unit tests (52, up from 49)	All pass
Full cache module tests (440)	All pass
Live deployment (3 replicas + Redis + nginx LB)	All endpoints healthy
Redis pubsub `mcpgw:auth:invalidate` subscribers	72 (matches `mcpgw:cache:invalidate`)
All 7 auth invalidation message types via Redis	72 subscribers received each
Channel-origin guard (auth msgs on wrong channel)	Correctly rejected
Admin UI (login, dashboard, tools, tokens)	Working

Changes made during review (2 commits)

1. test: add listener-path regression tests for channel forwarding

Added 3 tests that exercise the full _listen_loop → _process_invalidation path with channel data from fake pubsub messages, closing a coverage gap identified during review:

test_listen_loop_forwards_channel_to_process_invalidation — verifies channel kwarg is forwarded from pubsub message
test_listen_loop_auth_message_evicts_cache_end_to_end — full path from pubsub message through auth cache eviction
test_listen_loop_auth_message_rejected_on_wrong_channel_end_to_end — deny-path proving channel guard works through the listener

Also fixed _MAX_REVOKED_JTIS docstring to accurately describe the reconciliation path (Redis L2 fallback on L1 miss via is_token_revoked() / get_auth_context(), not the unused sync_revoked_tokens() method).

2. refactor: clarify auth dispatch comment and test names

The comment "Order matters: team_roles/teams must be checked before team" was misleading — there is no actual startswith() prefix collision between teams:, team_roles:, and team: (the 5th character differs: s/_ vs :). Reworded the comment and renamed two test methods from prefix_collision to dispatches_to_correct_handler.

Assessment

Core fix is correct. All 7 publish calls in auth_cache.py have matching handlers that replicate the exact L1 eviction logic.
Security is sound. Channel-origin guard rejects auth messages on wrong channel; empty/missing channel fails closed; _MAX_REVOKED_JTIS caps memory growth.
Thread safety matches existing patterns. Short lock scopes with no await under lock.
Test coverage is comprehensive. 52 tests including listener-path, channel guard deny/accept, cap enforcement, dispatch correctness, edge cases, and end-to-end regression.

Note for follow-up

invalidate_team() in auth_cache.py:508 does _team_cache.pop(email, None) with a bare email key, but production code writes _team_cache keys as f"{user_email}:{sorted_teams}". This means the pop never matches a real entry. This is pre-existing behavior (not introduced by this PR) and the subscriber correctly mirrors it. Worth a separate fix.

* fix: cross-replica cache invalidation for auth caches Signed-off-by: Marek Dano <Marek.Dano@ibm.com> * refactor: extract auth invalidation into helper and fix formatting - Extract _process_auth_invalidation() from _process_invalidation() to reduce method complexity (pylint too-many-statements/locals) - Add _evict_keys() static helper to deduplicate key-eviction loops - Remove dead _channel backward-compat attribute - Run Black on test file to fix pre-existing formatting violations Signed-off-by: Jonathan Springer <jps@s390x.com> * fix: add missing docstring params in _evict_keys Signed-off-by: Jonathan Springer <jps@s390x.com> * fix: harden cross-replica revoke handler and improve test coverage - Cap _revoked_jtis at 100k entries to prevent unbounded memory growth from compromised Redis pubsub channels (DoS mitigation) - Fix FakePubSub.subscribe signatures in two exception-path tests to accept varargs, matching the production *self._channels call - Add prefix-collision regression tests proving teams:/team_roles: messages don't fall through to the team: handler - Add edge-case tests for empty identifiers, colon-containing identifiers, and unusual email formats Signed-off-by: Jonathan Springer <jps@s390x.com> * fix: enforce channel-origin guard on auth invalidation messages Auth-prefixed invalidation messages (user:, revoke:, team:, etc.) are now only accepted when they arrive on the mcpgw:auth:invalidate channel. Messages with auth prefixes on other channels are logged and ignored, preventing cross-channel confusion. Also fixes the _MAX_REVOKED_JTIS docstring to reference the correct method name (_load_revoked_jtis in auth_cache). Signed-off-by: Jonathan Springer <jps@s390x.com> * fix: close channel-guard bypass for empty channel in auth invalidation The previous guard `if channel and channel != "mcpgw:auth:invalidate"` short-circuited to False when channel was empty, allowing auth messages to be processed without channel verification. Changed to `if channel != "mcpgw:auth:invalidate"` so auth messages are only accepted from the correct channel regardless of how the method is called. Signed-off-by: Jonathan Springer <jps@s390x.com> * test: add listener-path regression tests for channel forwarding Add 3 tests that exercise the full _listen_loop -> _process_invalidation path with channel data from fake pubsub messages: - test_listen_loop_forwards_channel_to_process_invalidation: verifies channel kwarg is forwarded from pubsub message to _process_invalidation - test_listen_loop_auth_message_evicts_cache_end_to_end: full path from pubsub message through auth cache eviction - test_listen_loop_auth_message_rejected_on_wrong_channel_end_to_end: deny-path proving channel guard works through the listener Also fixes _MAX_REVOKED_JTIS docstring to accurately describe the reconciliation path (Redis L2 fallback on L1 miss, not the unused sync_revoked_tokens method). Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * refactor: clarify auth dispatch comment and test names The if/elif chain comment said "Order matters: team_roles/teams must be checked before team" but there is no actual startswith() prefix collision between these prefixes (teams: != team:, team_roles: != team:). Reword the comment to describe correct dispatch and rename the two tests from "prefix_collision" to "dispatches_to_correct_handler" so they describe what they actually verify. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Marek Dano <Marek.Dano@ibm.com> Signed-off-by: Jonathan Springer <jps@s390x.com> Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> Co-authored-by: Marek Dano <Marek.Dano@ibm.com> Co-authored-by: Jonathan Springer <jps@s390x.com> Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>

* fix: cross-replica cache invalidation for auth caches Signed-off-by: Marek Dano <Marek.Dano@ibm.com> * refactor: extract auth invalidation into helper and fix formatting - Extract _process_auth_invalidation() from _process_invalidation() to reduce method complexity (pylint too-many-statements/locals) - Add _evict_keys() static helper to deduplicate key-eviction loops - Remove dead _channel backward-compat attribute - Run Black on test file to fix pre-existing formatting violations Signed-off-by: Jonathan Springer <jps@s390x.com> * fix: add missing docstring params in _evict_keys Signed-off-by: Jonathan Springer <jps@s390x.com> * fix: harden cross-replica revoke handler and improve test coverage - Cap _revoked_jtis at 100k entries to prevent unbounded memory growth from compromised Redis pubsub channels (DoS mitigation) - Fix FakePubSub.subscribe signatures in two exception-path tests to accept varargs, matching the production *self._channels call - Add prefix-collision regression tests proving teams:/team_roles: messages don't fall through to the team: handler - Add edge-case tests for empty identifiers, colon-containing identifiers, and unusual email formats Signed-off-by: Jonathan Springer <jps@s390x.com> * fix: enforce channel-origin guard on auth invalidation messages Auth-prefixed invalidation messages (user:, revoke:, team:, etc.) are now only accepted when they arrive on the mcpgw:auth:invalidate channel. Messages with auth prefixes on other channels are logged and ignored, preventing cross-channel confusion. Also fixes the _MAX_REVOKED_JTIS docstring to reference the correct method name (_load_revoked_jtis in auth_cache). Signed-off-by: Jonathan Springer <jps@s390x.com> * fix: close channel-guard bypass for empty channel in auth invalidation The previous guard `if channel and channel != "mcpgw:auth:invalidate"` short-circuited to False when channel was empty, allowing auth messages to be processed without channel verification. Changed to `if channel != "mcpgw:auth:invalidate"` so auth messages are only accepted from the correct channel regardless of how the method is called. Signed-off-by: Jonathan Springer <jps@s390x.com> * test: add listener-path regression tests for channel forwarding Add 3 tests that exercise the full _listen_loop -> _process_invalidation path with channel data from fake pubsub messages: - test_listen_loop_forwards_channel_to_process_invalidation: verifies channel kwarg is forwarded from pubsub message to _process_invalidation - test_listen_loop_auth_message_evicts_cache_end_to_end: full path from pubsub message through auth cache eviction - test_listen_loop_auth_message_rejected_on_wrong_channel_end_to_end: deny-path proving channel guard works through the listener Also fixes _MAX_REVOKED_JTIS docstring to accurately describe the reconciliation path (Redis L2 fallback on L1 miss, not the unused sync_revoked_tokens method). Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * refactor: clarify auth dispatch comment and test names The if/elif chain comment said "Order matters: team_roles/teams must be checked before team" but there is no actual startswith() prefix collision between these prefixes (teams: != team:, team_roles: != team:). Reword the comment to describe correct dispatch and rename the two tests from "prefix_collision" to "dispatches_to_correct_handler" so they describe what they actually verify. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Marek Dano <Marek.Dano@ibm.com> Signed-off-by: Jonathan Springer <jps@s390x.com> Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> Co-authored-by: Marek Dano <Marek.Dano@ibm.com> Co-authored-by: Jonathan Springer <jps@s390x.com> Co-authored-by: Mihai Criveti <crivetimihai@gmail.com> Signed-off-by: Yosief Eyob <yosiefogbazion@gmail.com>

marekdano requested review from crivetimihai, kevalmahajan and madhav165 as code owners February 27, 2026 09:35

crivetimihai added this to the Release 1.0.0-RC2 milestone Feb 27, 2026

crivetimihai changed the title ~~fix: cross-replica cache invalidation for auth caches~~ [FIX][AUTH]: Cross-replica cache invalidation for auth caches Feb 27, 2026

crivetimihai added bug Something isn't working security Improves security MUST P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe labels Feb 27, 2026

marekdano mentioned this pull request Feb 27, 2026

[FIX][UI]: Display CRUD changes on Users and Teams pages without full reload #3157

Closed

4 tasks

jonpspri force-pushed the fix-cross-replica-cache-invalidation branch from c38981b to ecbf5c3 Compare March 1, 2026 00:41

jonpspri previously approved these changes Mar 1, 2026

View reviewed changes

crivetimihai added the merge-queue Rebased and ready to merge label Mar 3, 2026

jonpspri force-pushed the fix-cross-replica-cache-invalidation branch from ecbf5c3 to 6ab1700 Compare March 3, 2026 11:46

crivetimihai self-assigned this Mar 7, 2026

crivetimihai added the release-fix Critical bugfix required for the release label Mar 7, 2026

Marek Dano and others added 8 commits March 8, 2026 17:34

fix: cross-replica cache invalidation for auth caches

c3ed039

Signed-off-by: Marek Dano <Marek.Dano@ibm.com>

fix: add missing docstring params in _evict_keys

40f7ae2

Signed-off-by: Jonathan Springer <jps@s390x.com>

crivetimihai dismissed jonpspri’s stale review via 423a78f March 8, 2026 18:49

crivetimihai force-pushed the fix-cross-replica-cache-invalidation branch from 9a4b78c to 423a78f Compare March 8, 2026 18:49

crivetimihai approved these changes Mar 8, 2026

View reviewed changes

crivetimihai merged commit f3d4849 into main Mar 8, 2026
40 of 41 checks passed

crivetimihai deleted the fix-cross-replica-cache-invalidation branch March 8, 2026 19:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FIX][AUTH]: Cross-replica cache invalidation for auth caches#3309

[FIX][AUTH]: Cross-replica cache invalidation for auth caches#3309
crivetimihai merged 8 commits intomainfrom
fix-cross-replica-cache-invalidation

marekdano commented Feb 27, 2026

Uh oh!

crivetimihai commented Feb 27, 2026

Uh oh!

jonpspri left a comment

Uh oh!

crivetimihai left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

marekdano commented Feb 27, 2026

🐛 Bug-fix PR

📌 Summary

💡 Fix Description

Tests: tests/unit/mcpgateway/cache/test_cache_invalidation_subscriber.py

🧪 Verification

📐 MCP Compliance (if relevant)

✅ Checklist

Uh oh!

crivetimihai commented Feb 27, 2026

Uh oh!

jonpspri left a comment

Choose a reason for hiding this comment

Review: Approve

What was checked

Assessment

Uh oh!

crivetimihai left a comment

Choose a reason for hiding this comment

Review: Approve

Verification

Changes made during review (2 commits)

Assessment

Note for follow-up

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Tests: `tests/unit/mcpgateway/cache/test_cache_invalidation_subscriber.py`