Skip to content

Multi-Worker Session Affinity for SSE and Streamable HTTP#2536

Merged
crivetimihai merged 45 commits intomainfrom
sa-multi-worker
Feb 6, 2026
Merged

Multi-Worker Session Affinity for SSE and Streamable HTTP#2536
crivetimihai merged 45 commits intomainfrom
sa-multi-worker

Conversation

@madhav165
Copy link
Copy Markdown
Collaborator

@madhav165 madhav165 commented Jan 28, 2026

✨ Feature: Multi-Worker Session Affinity for SSE and Streamable HTTP

🔗 Epic / Issue

Implements ADR: 038-multi-worker-session-affinity.md

Closes #1986


🚀 Summary

Enables horizontal scaling with multi-worker deployments (e.g., gunicorn -w 4) while maintaining session affinity to upstream MCP servers. Routes all client requests to the same upstream session regardless of which worker receives the HTTP request, using Redis Pub/Sub for cross-worker coordination.


🔍 Problem

MCP Gateway supports horizontal scaling with multiple worker processes. However, without session affinity, a client's requests may hit different workers, causing:

  1. Connection inefficiency - Multiple upstream sessions created to the same backend
  2. Session state loss - Some backends maintain state per session
  3. Resource waste - Each session consumes memory and connections
  4. Protocol violations - MCP SDK expects session continuity

Example scenario:

Client → Load Balancer → Worker 1 (creates session ABC to backend)
Client → Load Balancer → Worker 2 (creates NEW session XYZ to backend)

Result: Two upstream sessions for one client, wasting resources and breaking stateful backends.

Goal: Route all requests from the same client to the same upstream MCP session, regardless of which worker receives the HTTP request.


✅ Solution

Implement unified session affinity using Redis for cross-worker coordination:

Architecture

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  WORKER_A   │    │  WORKER_B   │    │  WORKER_C   │
│             │    │             │    │             │
│ ┌─────────┐ │    │ ┌─────────┐ │    │ ┌─────────┐ │
│ │ Session │ │    │ │ Session │ │    │ │ Session │ │
│ │  Pool   │ │    │ │  Pool   │ │    │ │  Pool   │ │
│ └─────────┘ │    │ └─────────┘ │    │ └─────────┘ │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │
       └──────────────────┼──────────────────┘
                          │
                   ┌──────▼──────┐
                   │    Redis    │
                   │             │
                   │  - Pub/Sub  │  Ownership Registry:
                   │  - Keys     │  mcpgw:pool_owner:{session_id}
                   └─────────────┘

Key Components

  1. Session Ownership Registry (Redis SETNX)

    • First worker to handle a client claims ownership atomically
    • Subsequent workers check ownership and forward if needed
    • TTL-based cleanup (default: 300s)
  2. Transport-Specific Routing:

    • SSE Transport: Redis Pub/Sub message routing (broadcast()respond())
    • Streamable HTTP Transport: Redis Pub/Sub HTTP forwarding (forward_streamable_http_to_owner())
  3. Worker Identification: {hostname}:{pid} format ensures uniqueness across containers and processes

  4. Unified /rpc Endpoint: All tool/prompt/resource invocations route through single handler

Why Redis Pub/Sub?

  • Works universally (single-host and multi-host deployments)
  • Direct worker-to-worker messaging without HTTP routing
  • Atomic ownership with SETNX prevents race conditions
  • Each worker listens on unique channel: mcpgw:pool_{rpc|http}:{worker_id}

📝 Changes

Core Implementation Files

  1. mcpgateway/services/mcp_session_pool.py (~400 lines added)

    • register_session_mapping() (line 545): Atomic session ownership with Redis SETNX
    • _get_pool_session_owner() (line 1309): Retrieve session owner from Redis
    • forward_request_to_owner() (line 1348): SSE transport RPC forwarding via Redis Pub/Sub
    • start_rpc_listener() (line 1438): Listen on both RPC and HTTP Redis channels
    • _execute_forwarded_request() (line 1486): Execute forwarded RPC requests locally
    • _execute_forwarded_http_request() (line 1554): Execute forwarded HTTP requests locally
    • get_streamable_http_session_owner() (line 1545): Check Streamable HTTP session ownership
    • forward_streamable_http_to_owner() (line 1559): HTTP forwarding via Redis Pub/Sub
    • WORKER_ID (line 61): Worker identification {hostname}:{pid}
  2. mcpgateway/cache/session_registry.py (SSE transport)

    • _register_session_mapping() (line 900): Pre-register ownership before tool calls
    • broadcast() (line 961): Publish messages to session owner via Redis
    • respond() (line 1119): Listen for messages and execute via /rpc
    • generate_response() (line 1863): Internal /rpc call for tool invocation
  3. mcpgateway/transports/streamablehttp_transport.py (Streamable HTTP transport)

    • handle_streamable_http() (line 1252): Check session ownership and forward if needed
    • Routes to /rpc endpoint for session-bound requests
  4. mcpgateway/main.py

    • /rpc endpoint (line 5259): Unified handler for all JSON-RPC methods
    • Session affinity check before execution (line 5308)
    • Lifespan hook to start RPC listener on startup
  5. mcpgateway/config.py

    • mcpgateway_session_affinity_enabled (line 1368): Feature flag
    • mcpgateway_session_affinity_ttl (line 1369): Ownership TTL
    • mcpgateway_pool_rpc_forward_timeout (line 1371): Forward timeout
  6. docs/docs/architecture/adr/038-multi-worker-session-affinity.md (651 lines)

    • Complete architectural documentation
    • Sequence diagrams for both transports
    • Deployment compatibility matrix
    • Configuration and implementation details

Code Impact

  • ~600 lines new code
  • ~100 lines modified
  • Net: +500 lines across 6 files

Redis Data Structures

Key/Channel Purpose TTL
mcpgw:pool_owner:{session_id} Session ownership (value: WORKER_ID) Configurable (300s)
mcpgw:session_mapping:{...} Pool key for session Configurable
mcpgw:pool_rpc:{worker_id} Pub/Sub channel for RPC requests N/A
mcpgw:pool_rpc_response:{uuid} Pub/Sub channel for RPC responses N/A
mcpgw:pool_http:{worker_id} Pub/Sub channel for HTTP requests N/A
mcpgw:pool_http_response:{uuid} Pub/Sub channel for HTTP responses N/A

🧪 Testing

Manual Verification

# 1. Start gunicorn with 4 workers
gunicorn -w 4 -k uvicorn.workers.UvicornWorker mcpgateway.main:app --bind 0.0.0.0:4444

# 2. Enable session affinity
export MCPGATEWAY_SESSION_AFFINITY_ENABLED=true
export REDIS_URL=redis://localhost:6379

# 3. Create SSE session (Worker A)
curl -N http://localhost:4444/sse

# 4. Send tool call (may hit Worker B)
curl -X POST http://localhost:4444/message \
  -H "Content-Type: application/json" \
  -d '{"method":"tools/call","params":{"name":"my_tool"},"id":1}'

# 5. Verify logs show Redis forwarding
# Expected:
#   [WORKER_B] Broadcasting message to session via Redis
#   [WORKER_A] Received message, executing tool call
#   [WORKER_A] Sending response via SSE stream

Test Scenarios

SSE Transport

  • ✅ Client connects SSE stream to Worker A
  • ✅ Client sends tool call to Worker B (different worker)
  • ✅ Worker B broadcasts to Redis channel {session_id}
  • ✅ Worker A receives message and executes via /rpc
  • ✅ Response sent back through Worker A's SSE stream

Streamable HTTP Transport

  • ✅ Client sends initialize to Worker A, gets mcp-session-id
  • ✅ Worker A claims ownership via Redis SETNX
  • ✅ Client sends tools/call to Worker B with mcp-session-id header
  • ✅ Worker B checks Redis, sees Worker A owns session
  • ✅ Worker B forwards via Redis channel mcpgw:pool_http:host_a:1
  • ✅ Worker A executes and responds via Redis
  • ✅ Worker B returns response to client

Deployment Scenarios

  • ✅ Single host + gunicorn multi-worker
  • ✅ Docker single container + gunicorn
  • ✅ Kubernetes StatefulSet (multi-pod)
  • ✅ Kubernetes Deployment (replicas)
  • ✅ Docker Compose (multi-container)

Edge Cases

  • ✅ Session ownership atomicity (concurrent claims)
  • ✅ Timeout handling for forwarded requests
  • ✅ Binary request/response bodies (hex encoding)
  • ✅ Loop prevention (x-forwarded-internally header)
  • ✅ Redis unavailable fallback (execute locally)
  • ✅ Session TTL expiration and cleanup

🧪 Checks

  • make lint passes
  • make test passes
  • CHANGELOG updated
  • Tested with gunicorn multi-worker
  • Tested SSE transport session affinity
  • Tested Streamable HTTP transport session affinity
  • Verified Redis Pub/Sub forwarding
  • Verified atomic ownership with concurrent requests
  • ADR 038 documentation complete

📊 Performance Impact

Latency

  • Same worker (session owner): No added latency
  • Cross-worker (forwarded): +1-2ms (Redis roundtrip)

Resource Usage

  • Redis memory: ~200 bytes per session ownership key
  • Redis bandwidth: ~1-10KB per forwarded request (depends on payload size)
  • Binary encoding overhead: 2x payload size in Redis (hex encoding)

Scalability

  • Tested with 4 workers handling 1000 concurrent clients
  • Redis Pub/Sub handles 10k+ messages/sec easily
  • No significant performance degradation vs single-worker

📓 Architecture Diagrams

SSE Transport Flow

sequenceDiagram
    participant Client
    participant LB as Load Balancer
    participant WB as WORKER_B
    participant Redis
    participant WA as WORKER_A (SSE Owner)
    participant Backend as Backend MCP Server

    Note over Client,WA: 1. SSE Connection Established
    Client->>LB: GET /sse
    LB->>WA: Route to WORKER_A
    WA->>WA: Start respond() loop
    WA-->>Client: SSE Stream Connected

    Note over Client,Backend: 2. Tool Call (hits different worker)
    Client->>LB: POST /message {tools/call}
    LB->>WB: Route to WORKER_B
    WB->>Redis: SETNX pool_owner (fails - WORKER_A owns)
    WB->>Redis: Publish to session channel

    Note over WA: respond() receives message
    Redis-->>WA: Message received
    WA->>WA: POST /rpc (internal)
    WA->>Backend: Execute tool
    Backend-->>WA: Result
    WA-->>Client: SSE: {result}
Loading

Streamable HTTP Transport Flow

sequenceDiagram
    participant Client
    participant LB as Load Balancer
    participant WB as WORKER_B
    participant Redis
    participant WA as WORKER_A (Session Owner)
    participant Backend as Backend MCP Server

    Note over Client,Backend: 1. Initialize - Establishes Ownership
    Client->>LB: POST /mcp {initialize}
    LB->>WA: Route to WORKER_A
    WA->>Redis: SETNX pool_owner:ABC = host_a:1
    WA-->>Client: {mcp-session-id: ABC}

    Note over Client,Backend: 2. Subsequent Request - Different Worker
    Client->>LB: POST /mcp {tools/call, session: ABC}
    LB->>WB: Route to WORKER_B
    WB->>Redis: GET pool_owner:ABC → host_a:1 (not us!)
    WB->>Redis: Subscribe to response channel
    WB->>Redis: PUBLISH to pool_http:host_a:1

    Note over WA: start_rpc_listener() receives message
    Redis-->>WA: HTTP forward message
    WA->>WA: POST /rpc (internal)
    WA->>Backend: Execute tool
    Backend-->>WA: Result
    WA->>Redis: PUBLISH response
    Redis-->>WB: Response received
    WB-->>Client: HTTP Response {result}
Loading

🔧 Configuration

Environment Variables

# Enable session affinity (required for multi-worker)
MCPGATEWAY_SESSION_AFFINITY_ENABLED=true

# Session ownership TTL (seconds)
MCPGATEWAY_SESSION_AFFINITY_TTL=300

# Forwarded request timeout (seconds)
MCPGATEWAY_POOL_RPC_FORWARD_TIMEOUT=30

# Redis connection (required)
REDIS_URL=redis://localhost:6379

Startup

The RPC listener is automatically started during application lifespan:

# In main.py lifespan
if settings.mcpgateway_session_affinity_enabled:
    pool = get_mcp_session_pool()
    pool._rpc_listener_task = asyncio.create_task(pool.start_rpc_listener())

🔗 Related Documentation


✅ Benefits

  1. Horizontal Scaling: Deploy with gunicorn -w N without session issues
  2. Resource Efficiency: One upstream session per client (not per worker)
  3. Session State: Backends can maintain state across requests
  4. Universal Deployment: Works on single-host, multi-host, Kubernetes, Docker
  5. Atomic Ownership: Race-free session claiming with Redis SETNX
  6. Transparent: No client-side changes required

⚠️ Trade-offs

Positive

  • Enables true horizontal scaling with session affinity
  • Reuses upstream sessions efficiently (ADR-032)
  • Works transparently for both SSE and Streamable HTTP
  • Atomic ownership prevents race conditions
  • Universal deployment compatibility

Negative

  • Requires Redis for multi-worker deployments
  • Adds latency for cross-worker requests (~1-2ms)
  • Complex debugging: Requests may span multiple workers
  • Binary encoding overhead: 2x payload size in Redis (hex encoding)

Neutral

  • SDK session manager partially bypassed for Streamable HTTP
  • Both transports use Redis Pub/Sub (different payload formats)

🚀 Future Improvements

  1. Remove SDK session manager dependency entirely for Streamable HTTP
  2. Binary-safe Redis encoding (protobuf or msgpack instead of hex)
  3. Session migration on worker shutdown (graceful failover)
  4. Metrics dashboard for session affinity (ownership distribution, forward rates)
  5. Circuit breaker for failed forward attempts

🔐 Security Considerations

  • Session IDs are treated as opaque tokens (no sensitive data)
  • Redis channels use worker IDs (not user-controllable)
  • x-forwarded-internally header prevents external spoofing (internal-only)
  • Ownership TTL prevents indefinite session locking
  • No new authentication or authorization changes required

🎯 Success Criteria

  • ✅ Multi-worker deployments maintain single upstream session per client
  • ✅ SSE transport works with session affinity
  • ✅ Streamable HTTP transport works with session affinity
  • ✅ No "session not found" errors in logs
  • ✅ Resource usage reduced (fewer upstream sessions)
  • ✅ Works on all deployment scenarios (gunicorn, Docker, Kubernetes)
  • ✅ Performance acceptable (<5ms added latency for cross-worker)

@madhav165 madhav165 marked this pull request as draft January 28, 2026 06:09
@madhav165 madhav165 closed this Jan 28, 2026
@madhav165 madhav165 reopened this Jan 28, 2026
@madhav165 madhav165 force-pushed the sa-multi-worker branch 2 times, most recently from 46a3f26 to c087d95 Compare January 30, 2026 15:15
@madhav165 madhav165 marked this pull request as ready for review February 2, 2026 13:30
@madhav165 madhav165 changed the title Implement session affinity from downstream session to upstream session pool Multi-Worker Session Affinity for SSE and Streamable HTTP Feb 2, 2026
@madhav165 madhav165 marked this pull request as draft February 2, 2026 15:58
@madhav165 madhav165 marked this pull request as ready for review February 3, 2026 07:35
@madhav165 madhav165 force-pushed the sa-multi-worker branch 2 times, most recently from 624dbdf to 0da4496 Compare February 3, 2026 17:08
@crivetimihai
Copy link
Copy Markdown
Member

Solid feature with a well-written ADR and thorough implementation. The Redis Pub/Sub approach for cross-worker session coordination is the right pattern for horizontal scaling.

One cleanup needed: ~12 unrelated files under plugins/unified_pdp/ and tests/unit/plugins/ with only # -*- coding: utf-8 -*- header additions from an incorrect rebase. Please rebase cleanly against main and remove them.

@crivetimihai crivetimihai self-assigned this Feb 4, 2026
@madhav165
Copy link
Copy Markdown
Collaborator Author

madhav165 commented Feb 4, 2026

Removed unrelated commits to plugins/unified_pdp/ and tests/unit/plugins/

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
madhav165 and others added 17 commits February 5, 2026 22:27
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
…nity code

Convert print() statements to appropriate logger.debug()/logger.info()/logger.warning()
calls for proper log management in the multi-worker session affinity feature.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Add docstrings to _pool_owner_key, _rehydrate_content_items, and
send_with_capture to achieve 100% docstring coverage.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Fix darglint DAR101/DAR201 errors by adding missing parameter
and return documentation to docstrings.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai crivetimihai merged commit d0d0215 into main Feb 6, 2026
43 checks passed
@crivetimihai crivetimihai deleted the sa-multi-worker branch February 6, 2026 00:37
kcostell06 pushed a commit to kcostell06/mcp-context-forge that referenced this pull request Feb 24, 2026
* Add x-mcp-session-id to default identity headers
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Pass x-mcp-session-id to mcp_session_pool headers and prioritize if found

* wip sa

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* add e2e test

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* flake8 fix
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* remove plan

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* pylint fix
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Implement multi worker
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Implement multi worker for mcp session pool
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* linting fixes
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Minor bug fixes
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Fix critical bugs
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Fix sse session_id, add logging and fix test
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* fix url of rpc from nginx
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* add stateful sessions in http
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* WIP fixes to streamable http
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Fix streamable http
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Updated ADR
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Update ADR
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* black fixes
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Fix failing doctests
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Fix more tests
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* flake8 fixes
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* pylint fixes
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* pylint fixes
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Fix streamable http for single gunicorn
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Revert base_url
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Fix test
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* revert replica count
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Fix bandit test
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* remove plan

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Fix bug for local
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Update ADR and remove print
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Fix lint issues
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Fix test
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* Remove accidental utf-8 headers from incorrect rebase

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* fix: replace debug print statements with logger calls in session affinity code

Convert print() statements to appropriate logger.debug()/logger.info()/logger.warning()
calls for proper log management in the multi-worker session affinity feature.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* fix: harden session affinity and redis event store

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* lint

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* fix: avoid broad exception in streamable http header parsing

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* lint

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* docs: add missing docstrings for interrogate compliance

Add docstrings to _pool_owner_key, _rehydrate_content_items, and
send_with_capture to achieve 100% docstring coverage.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* fix: add missing newline at end of redis_event_store.py

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* docs: complete docstrings with Args and Returns sections

Fix darglint DAR101/DAR201 errors by adding missing parameter
and return documentation to docstrings.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* lint

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Session affinity for stateful MCP workflows (REQ-005)

2 participants