Enable sticky sessions on operator-created Services #3986
MCP servers use stateful session protocols (SSE, streamable-http). When replicas > 1, Kubernetes round-robin routing breaks sessions. Set SessionAffinity: ClientIP on all operator-created Services (MCPServer, MCPRemoteProxy, VirtualMCPServer) so requests from the same client consistently reach the same backend pod.

Also add drift detection in serviceNeedsUpdate() and copy SessionAffinity in the ensureService update paths so existing Services get reconciled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Document that all operator-created Services use SessionAffinity: ClientIP to support stateful MCP sessions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
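The drift detection and update-path copy described in the commit message can be sketched as follows. This is a minimal illustration, not the operator's code: `Service` is a simplified stand-in for `corev1.Service`, and `applyDesired` and `equalPorts` are hypothetical helper names. Only `serviceNeedsUpdate` is a name taken from the PR, and its real signature may differ.

```go
package main

import "fmt"

// Service is a simplified stand-in for corev1.Service (an assumption for
// this sketch). It holds only the fields compared below.
type Service struct {
	Type            string
	Ports           []int32
	SessionAffinity string
}

// equalPorts compares two port lists element by element.
func equalPorts(a, b []int32) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// serviceNeedsUpdate reports whether the live Service has drifted from the
// desired one in any managed field, now including SessionAffinity.
func serviceNeedsUpdate(existing, desired Service) bool {
	return existing.Type != desired.Type ||
		!equalPorts(existing.Ports, desired.Ports) ||
		existing.SessionAffinity != desired.SessionAffinity
}

// applyDesired (hypothetical) copies the managed fields onto the live
// Service, as an update path would before sending the update to the API.
func applyDesired(existing *Service, desired Service) {
	existing.Type = desired.Type
	existing.Ports = append([]int32(nil), desired.Ports...)
	existing.SessionAffinity = desired.SessionAffinity
}

func main() {
	// A Service created before this change has no client affinity.
	live := Service{Type: "ClusterIP", Ports: []int32{8080}, SessionAffinity: "None"}
	want := Service{Type: "ClusterIP", Ports: []int32{8080}, SessionAffinity: "ClientIP"}

	fmt.Println("needs update before copy:", serviceNeedsUpdate(live, want))
	applyDesired(&live, want)
	fmt.Println("needs update after copy:", serviceNeedsUpdate(live, want))
}
```

The point of pairing the two functions is that every field the drift check compares is also copied in the update path. Otherwise the check keeps reporting drift after each update.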
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3986      +/-   ##
==========================================
- Coverage   68.56%   68.51%   -0.05%
==========================================
  Files         437      437
  Lines       44662    44674      +12
==========================================
- Hits        30621    30609      -12
- Misses      11657    11682      +25
+ Partials     2384     2383       -1
Should we make this configurable by the user?

@eleftherias we could, though I don't know in what case you wouldn't want this.

If the server is stateless, then stickiness isn't needed and isn't the best way to distribute the load. Also, if they use an external load balancer, this could conflict with the policy they've configured.

@eleftherias gotcha. In most cases the MCP server is stateful, though, and this is the case for our proxy as well. Should we then default to having this enabled and add a flag to optionally disable it?

Yes, that makes sense.

@eleftherias should I add the configurability in a separate PR or in this one?

A separate PR works.
PR #3986 hardcoded SessionAffinity: ClientIP on all operator-created Services. Review feedback identified that this should be configurable: stateless MCP servers don't need stickiness, and external load balancers may conflict.

Add a sessionAffinity field (enum: ClientIP/None, default: ClientIP) to MCPServerSpec, MCPRemoteProxySpec, and VirtualMCPServerSpec. Wire the field through service creation and drift detection in all three controllers.

Also fix a pre-existing bug in the MCPServer controller where the service update path did not copy Labels/Annotations, which could cause infinite reconciliation when those fields drifted.

EmbeddingServer is intentionally excluded: embedding servers are stateless inference endpoints, not MCP protocol services.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
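The field semantics described above (two allowed values, with ClientIP used when the field is unset) can be sketched in plain Go. This is a hedged illustration: `resolveSessionAffinity` and the constant names are hypothetical. In a kubebuilder-based operator the enum and default would more likely be declared on the CRD type with validation markers, which the source does not show.

```go
package main

import "fmt"

// Allowed values for the sessionAffinity field, as listed in the commit
// message. The constant names are assumptions made for this sketch.
const (
	affinityClientIP = "ClientIP"
	affinityNone     = "None"
)

// resolveSessionAffinity (hypothetical helper) maps the value from a spec to
// the value written on the Service: empty falls back to ClientIP, the two
// enum values pass through, and anything else is rejected.
func resolveSessionAffinity(specValue string) (string, error) {
	switch specValue {
	case "":
		return affinityClientIP, nil
	case affinityClientIP, affinityNone:
		return specValue, nil
	default:
		return "", fmt.Errorf("unsupported sessionAffinity %q: want %q or %q",
			specValue, affinityClientIP, affinityNone)
	}
}

func main() {
	for _, v := range []string{"", "ClientIP", "None", "Sticky"} {
		got, err := resolveSessionAffinity(v)
		fmt.Printf("spec=%q resolved=%q err=%v\n", v, got, err)
	}
}
```

Defaulting to ClientIP preserves the behavior from #3986 for existing resources. Stateless servers, or deployments behind an external load balancer, can opt out with None.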
Summary
- Set `SessionAffinity: ClientIP` on all three operator-created Service types (MCPServer, MCPRemoteProxy, VirtualMCPServer) so stateful MCP session protocols (SSE, streamable-http) work correctly when replicas > 1
- Add drift detection in `serviceNeedsUpdate()` for each controller so existing Services without `ClientIP` affinity get reconciled
- Copy `SessionAffinity` in the `ensureService` update paths alongside the existing `Ports`/`Type` copies

Test plan
- `task lint-fix`: 0 issues
- `go test -race ./cmd/thv-operator/controllers/...`: all pass
- `task test`: all pass (one pre-existing flaky test, unrelated: Flaky test: TestServer_HealthMonitoring_Enabled #3985)
- `kubectl get svc <name> -o jsonpath='{.spec.sessionAffinity}'` returns `ClientIP`

🤖 Generated with Claude Code
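The last check in the test plan can also be scripted. The sketch below assumes the Service is available as JSON (for example from `kubectl get svc <name> -o json`) and reads the same `spec.sessionAffinity` path that the jsonpath expression targets. `sessionAffinityOf` and `serviceDoc` are hypothetical names, and the sample document is hand-written for illustration, not captured from a cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceDoc models only the part of a Service manifest this check reads.
type serviceDoc struct {
	Spec struct {
		SessionAffinity string `json:"sessionAffinity"`
	} `json:"spec"`
}

// sessionAffinityOf (hypothetical helper) extracts spec.sessionAffinity
// from a Service serialized as JSON.
func sessionAffinityOf(raw []byte) (string, error) {
	var doc serviceDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", fmt.Errorf("decode Service JSON: %w", err)
	}
	return doc.Spec.SessionAffinity, nil
}

func main() {
	// Hand-written sample standing in for real kubectl output.
	sample := []byte(`{"kind":"Service","spec":{"type":"ClusterIP","sessionAffinity":"ClientIP"}}`)

	got, err := sessionAffinityOf(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("sessionAffinity is ClientIP:", got == "ClientIP")
}
```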