Skip to content

Enable sticky sessions on operator-created Services#3986

Merged
JAORMX merged 2 commits intomainfrom
feat/sticky-sessions
Mar 4, 2026
Merged

Enable sticky sessions on operator-created Services#3986
JAORMX merged 2 commits intomainfrom
feat/sticky-sessions

Conversation

@JAORMX
Copy link
Copy Markdown
Collaborator

@JAORMX JAORMX commented Mar 4, 2026

Summary

  • Set SessionAffinity: ClientIP on all three operator-created Service types (MCPServer, MCPRemoteProxy, VirtualMCPServer) so stateful MCP session protocols (SSE, streamable-http) work correctly when replicas > 1
  • Add drift detection in serviceNeedsUpdate() for each controller so existing Services without ClientIP affinity get reconciled
  • Copy SessionAffinity in the ensureService update paths alongside existing Ports/Type copies

Test plan

  • task lint-fix — 0 issues
  • go test -race ./cmd/thv-operator/controllers/... — all pass
  • task test — all pass (one pre-existing flaky test unrelated: Flaky test: TestServer_HealthMonitoring_Enabled #3985)
  • Deploy operator to cluster, create MCPServer/MCPRemoteProxy/VirtualMCPServer, verify kubectl get svc <name> -o jsonpath='{.spec.sessionAffinity}' returns ClientIP

🤖 Generated with Claude Code

MCP servers use stateful session protocols (SSE, streamable-http).
When replicas > 1, Kubernetes round-robin routing breaks sessions.
Set SessionAffinity: ClientIP on all operator-created Services
(MCPServer, MCPRemoteProxy, VirtualMCPServer) so requests from
the same client consistently reach the same backend pod.

Also add drift detection in serviceNeedsUpdate() and copy
SessionAffinity in the ensureService update paths so existing
Services get reconciled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the size/XS Extra small PR: < 100 lines changed label Mar 4, 2026
Document that all operator-created Services use
SessionAffinity: ClientIP to support stateful MCP sessions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Mar 4, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 4, 2026

Codecov Report

❌ Patch coverage is 62.50000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.51%. Comparing base (c46ae47) to head (2cc6eb6).
⚠️ Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
...-operator/controllers/mcpremoteproxy_controller.go 0.00% 2 Missing and 1 partial ⚠️
...d/thv-operator/controllers/mcpserver_controller.go 40.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3986      +/-   ##
==========================================
- Coverage   68.56%   68.51%   -0.05%     
==========================================
  Files         437      437              
  Lines       44662    44674      +12     
==========================================
- Hits        30621    30609      -12     
- Misses      11657    11682      +25     
+ Partials     2384     2383       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@eleftherias
Copy link
Copy Markdown
Member

Should we make this configurable by the user?

@JAORMX
Copy link
Copy Markdown
Collaborator Author

JAORMX commented Mar 4, 2026

@eleftherias we could, though I don't know in what case you wouldn't want this.

@eleftherias
Copy link
Copy Markdown
Member

If the server is stateless then the stickiness isn't needed and isn't the best way to distribute the load. Also if they use an external load balancer this could cause conflicts with the policy they've configured.

@JAORMX
Copy link
Copy Markdown
Collaborator Author

JAORMX commented Mar 4, 2026

@eleftherias gotcha. In most cases the MCP server is stateful, though. And this is the case for our proxy as well. Should we then default to havin this enabled and have a flag to optionally disable this?

@eleftherias
Copy link
Copy Markdown
Member

Yes that makes sense

@JAORMX
Copy link
Copy Markdown
Collaborator Author

JAORMX commented Mar 4, 2026

@eleftherias should I add the configurability to a separate PR or to this one?

@eleftherias
Copy link
Copy Markdown
Member

Separate PR works

@JAORMX JAORMX merged commit b7cf0f7 into main Mar 4, 2026
58 of 59 checks passed
@JAORMX JAORMX deleted the feat/sticky-sessions branch March 4, 2026 11:34
JAORMX added a commit that referenced this pull request Mar 4, 2026
PR #3986 hardcoded SessionAffinity: ClientIP on all operator-created
Services. Review feedback identified this should be configurable:
stateless MCP servers don't need stickiness and external load balancers
may conflict.

Add a sessionAffinity field (enum: ClientIP/None, default: ClientIP) to
MCPServerSpec, MCPRemoteProxySpec, and VirtualMCPServerSpec. Wire the
field through service creation and drift detection in all three
controllers.

Also fix a pre-existing bug in the MCPServer controller where the
service update path did not copy Labels/Annotations, which could cause
infinite reconciliation when those fields drifted.

EmbeddingServer is intentionally excluded — embedding servers are
stateless inference endpoints, not MCP protocol services.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
reyortiz3 pushed a commit that referenced this pull request Mar 4, 2026
* Enable sticky sessions on operator-created Services

MCP servers use stateful session protocols (SSE, streamable-http).
When replicas > 1, Kubernetes round-robin routing breaks sessions.
Set SessionAffinity: ClientIP on all operator-created Services
(MCPServer, MCPRemoteProxy, VirtualMCPServer) so requests from
the same client consistently reach the same backend pod.

Also add drift detection in serviceNeedsUpdate() and copy
SessionAffinity in the ensureService update paths so existing
Services get reconciled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update operator architecture docs for SessionAffinity

Document that all operator-created Services use
SessionAffinity: ClientIP to support stateful MCP sessions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
eleftherias pushed a commit that referenced this pull request Mar 4, 2026
PR #3986 hardcoded SessionAffinity: ClientIP on all operator-created
Services. Review feedback identified this should be configurable:
stateless MCP servers don't need stickiness and external load balancers
may conflict.

Add a sessionAffinity field (enum: ClientIP/None, default: ClientIP) to
MCPServerSpec, MCPRemoteProxySpec, and VirtualMCPServerSpec. Wire the
field through service creation and drift detection in all three
controllers.

Also fix a pre-existing bug in the MCPServer controller where the
service update path did not copy Labels/Annotations, which could cause
infinite reconciliation when those fields drifted.

EmbeddingServer is intentionally excluded — embedding servers are
stateless inference endpoints, not MCP protocol services.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XS Extra small PR: < 100 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants