You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Extend deploymentForMCPServer in mcpserver_controller.go to set Deployment.Spec.Replicas from spec.replicas with nil-passthrough (preserving HPA compatibility) and set terminationGracePeriodSeconds on the pod spec. Extend the reconciler to enforce the stdio transport hard cap (force to 1 and set a status condition when spec.transport == stdio && spec.replicas > 1) and emit a Warning status condition when spec.replicas > 1 but session storage is absent or set to memory. These changes implement RC-3 and RC-14 of epic THV-0047 for the MCPServer resource.
Context
Epic THV-0047 adds horizontal scaling support to the ToolHive operator. TASK-001 (#267) and TASK-002 (#269) added the Replicas, BackendReplicas, and SessionStorage fields to the CRD specs; TASK-003 (#275) regenerated the deepcopy functions and CRD YAML manifests, making the new fields available at compile time. This task wires those fields into the MCPServer reconciler and deployment builder so they are actually respected at runtime.
The existing deploymentForMCPServer in mcpserver_controller.go hardcodes replicas := int32(1) and never sets terminationGracePeriodSeconds. The existing reconciler at lines 392–407 handles the stdio cap for externally-driven scaling (HPA or kubectl scale), but has no path for spec-driven replica values. This task adds both the spec-driven replica path and the session storage validation warning.
Dependencies: #275 (TASK-003 — Regenerate deepcopy and CRD manifests after scaling type changes) Blocks: TASK-008 (Unit Tests)
Acceptance Criteria
deploymentForMCPServer sets Deployment.Spec.Replicas to spec.replicas when spec.replicas is non-nil; when spec.replicas is nil, Deployment.Spec.Replicas is left nil (not set to any default), allowing external scaling tools (HPA, KEDA, kubectl scale) to manage replica counts freely
deploymentForMCPServer sets TerminationGracePeriodSeconds on PodSpec using the value from a named constant or a documented default (e.g., 30 seconds); the field must appear in the generated Deployment even when spec.replicas is nil
The reconciler enforces the stdio hard cap on the spec-driven path: if spec.transport == "stdio" and spec.replicas != nil && *spec.replicas > 1, the reconciler forces replicas to 1 in the created Deployment, sets a Warning status condition with a descriptive message, and requeues
The existing externally-driven stdio cap at mcpserver_controller.go lines 392–407 continues to function correctly and is not broken by this change — the two paths (spec-driven vs externally-driven) are distinct
The reconciler emits a Warning status condition when spec.replicas != nil && *spec.replicas > 1 and either spec.sessionStorage is nil or spec.sessionStorage.provider is "memory" — the deployment proceeds (no hard rejection); the condition is cleared when the configuration is corrected
On config-driven deployment updates, Spec.Replicas is not overwritten (only Spec.Template, Spec.Selector, Labels, and Annotations are updated), following the existing pattern at lines 452–469 — this preserves the nil-passthrough invariant on update
New status condition type constants and reason constants for the stdio cap and session storage warning are defined in cmd/thv-operator/api/v1alpha1/mcpserver_types.go, following the existing naming convention
go build ./cmd/thv-operator/... passes
go test ./cmd/thv-operator/... (excluding integration tests) passes, including all pre-existing tests in mcpserver_replicas_test.go
Code reviewed and approved
Technical Approach
Recommended Implementation
1. Add new condition type and reason constants to mcpserver_types.go
Before touching the controller, define the new condition types and reasons. Follow the existing const block pattern in mcpserver_types.go:
// Condition type for stdio replica cap enforcementconst (
ConditionStdioReplicaCapped="StdioReplicaCapped"
)
const (
ConditionReasonStdioReplicaCapped="StdioTransportCapAt1"
)
// Condition type for session storage warningconst (
ConditionSessionStorageWarning="SessionStorageWarning"
)
const (
ConditionReasonSessionStorageMissing="SessionStorageMissingForReplicas"ConditionReasonSessionStorageOK="SessionStorageConfigured"
)
2. Extend deploymentForMCPServer in mcpserver_controller.go
Replace the hardcoded replicas := int32(1) at line 934 with spec-driven nil-passthrough logic, and add terminationGracePeriodSeconds to the pod spec.
For replicas, the logic is:
If spec.replicas != nil and spec.transport == "stdio" and *spec.replicas > 1: use int32Ptr(1) (the spec-driven stdio cap is enforced here in the builder; a status condition is set in the reconciler before this call)
If spec.replicas != nil: use spec.replicas directly (pass the pointer through)
If spec.replicas == nil: set Deployment.Spec.Replicas to nil (omit it entirely)
In the appsv1.DeploymentSpec struct literal, change Replicas: &replicas to use the resolved value, which may be nil. Add TerminationGracePeriodSeconds on the pod spec:
// In the Deployment Spec.Template.Spec:TerminationGracePeriodSeconds: int64Ptr(defaultTerminationGracePeriodSeconds),
3. Extend the reconciler's pre-creation / pre-update path
Insert two new validation steps in Reconcile(), after the existing image and template validation and before the deploymentForMCPServer call. These are distinct from the existing externally-driven cap at lines 392–407 (which runs after the deployment is fetched):
Step A — Spec-driven stdio cap: If spec.transport == "stdio" and spec.replicas != nil && *spec.replicas > 1:
Set a Warning status condition using meta.SetStatusCondition
Continue (allow deployment creation with replicas forced to 1 inside deploymentForMCPServer)
Step B — Session storage validation: If spec.replicas != nil && *spec.replicas > 1:
If spec.sessionStorage == nil or spec.sessionStorage.provider != "redis": set a Warning status condition; do not block deployment
Otherwise: clear/set the condition to indicate session storage is configured
4. Update the deployment update path
No changes are needed to deploymentNeedsUpdate or the deployment update logic at lines 452–469. Those lines already avoid overwriting Spec.Replicas on update, which is the correct behavior. Verify this is true after the deployment builder changes.
Patterns & Frameworks
Existing stdio cap pattern — mcpserver_controller.go lines 392–407: externally-driven cap. The new spec-driven cap is a separate code path added in deploymentForMCPServer itself and in reconciler pre-creation validation. Do not merge the two paths.
Status condition pattern — meta.SetStatusCondition(&mcpServer.Status.Conditions, metav1.Condition{...}) followed by r.Status().Update(ctx, mcpServer). See setCABundleRefCondition / validateCABundleRef in mcpserver_controller.go lines 553–616 for the exact pattern to follow for helper functions.
Nil-passthrough replicas — The deploymentNeedsUpdate function at lines 1505+ already avoids overwriting Spec.Replicas on updates; deploymentForMCPServer must produce a nil Spec.Replicas when spec.replicas == nil so this invariant holds on initial creation too.
int32Ptr helper — Already defined at line 1859 of mcpserver_controller.go. Add a companion int64Ptr for TerminationGracePeriodSeconds.
Condition update after r.Status().Update — Always call r.Status().Update immediately after meta.SetStatusCondition; log but do not return on status update errors (follow the pattern of existing condition setters).
Code Pointers
cmd/thv-operator/controllers/mcpserver_controller.go line 934 — Hardcoded replicas := int32(1); this is the primary change point in deploymentForMCPServer
cmd/thv-operator/controllers/mcpserver_controller.go lines 1191–1263 — The full Deployment struct literal in deploymentForMCPServer; Spec.Replicas and Spec.Template.Spec.TerminationGracePeriodSeconds are set here
cmd/thv-operator/controllers/mcpserver_controller.go lines 392–407 — Existing externally-driven stdio cap; new spec-driven logic must not disturb this block
cmd/thv-operator/controllers/mcpserver_controller.go lines 452–469 — Deployment update path that preserves Spec.Replicas; verify it remains correct after builder changes
cmd/thv-operator/controllers/mcpserver_controller.go lines 553–616 — setCABundleRefCondition and validateCABundleRef — pattern for helper condition-setter functions and status update calls
cmd/thv-operator/controllers/mcpserver_controller.go line 1859 — int32Ptr helper; add int64Ptr companion here
cmd/thv-operator/api/v1alpha1/mcpserver_types.go lines 12–69 — Existing condition type and reason constants; new constants go in new const blocks here following the same convention
cmd/thv-operator/controllers/mcpserver_replicas_test.go — Existing replica tests (HPA-scenario); new spec-driven tests for TASK-008 extend this file using the same fake.NewClientBuilder pattern
Component Interfaces
New condition type and reason constants (in mcpserver_types.go):
// ConditionStdioReplicaCapped indicates the stdio transport replica count// was capped at 1 because spec.replicas was set to a value greater than 1.constConditionStdioReplicaCapped="StdioReplicaCapped"const (
// ConditionReasonStdioReplicaCapped is set when spec.replicas > 1 for a stdio transport.ConditionReasonStdioReplicaCapped="StdioTransportCapAt1"
)
// ConditionSessionStorageWarning indicates that replicas > 1 but session storage// is not configured with a Redis backend.constConditionSessionStorageWarning="SessionStorageWarning"const (
// ConditionReasonSessionStorageMissing is set when replicas > 1 and no Redis session storage is configured.ConditionReasonSessionStorageMissing="SessionStorageMissingForReplicas"// ConditionReasonSessionStorageConfigured is set when replicas > 1 and Redis session storage is configured.ConditionReasonSessionStorageConfigured="SessionStorageConfigured"
)
Replica resolution logic in deploymentForMCPServer (illustrative shape, not prescriptive):
// resolveDeploymentReplicas returns the replica pointer to set on Deployment.Spec.Replicas.// Returns nil when spec.replicas is nil (hands-off mode for HPA/KEDA).// Enforces stdio cap at 1 in the builder as defense-in-depth.funcresolveDeploymentReplicas(transportstring, specReplicas*int32) *int32 {
ifspecReplicas==nil {
returnnil
}
iftransport=="stdio"&&*specReplicas>1 {
returnint32Ptr(1)
}
returnspecReplicas
}
Session storage validation call site in Reconcile() (shape only):
// After existing image/template validation, before deploymentForMCPServer call:r.validateSessionStorageForReplicas(ctx, mcpServer)
func (r*MCPServerReconciler) validateSessionStorageForReplicas(
ctx context.Context, mcpServer*mcpv1alpha1.MCPServer,
) {
ifmcpServer.Spec.Replicas==nil||*mcpServer.Spec.Replicas<=1 {
return
}
ifmcpServer.Spec.SessionStorage==nil||mcpServer.Spec.SessionStorage.Provider!="redis" {
meta.SetStatusCondition(&mcpServer.Status.Conditions, metav1.Condition{
Type: mcpv1alpha1.ConditionSessionStorageWarning,
Status: metav1.ConditionTrue,
Reason: mcpv1alpha1.ConditionReasonSessionStorageMissing,
Message: "replicas > 1 but sessionStorage.provider is not redis; sessions are not shared across replicas",
ObservedGeneration: mcpServer.Generation,
})
} else {
meta.SetStatusCondition(&mcpServer.Status.Conditions, metav1.Condition{
Type: mcpv1alpha1.ConditionSessionStorageWarning,
Status: metav1.ConditionFalse,
Reason: mcpv1alpha1.ConditionReasonSessionStorageConfigured,
Message: "Redis session storage is configured",
ObservedGeneration: mcpServer.Generation,
})
}
iferr:=r.Status().Update(ctx, mcpServer); err!=nil {
log.FromContext(ctx).Error(err, "Failed to update MCPServer status after session storage validation")
}
}
Testing Strategy
Tests for this task's behaviors are written in TASK-008. The existing mcpserver_replicas_test.go covers the externally-driven HPA scenario; TASK-008 extends it with spec-driven cases. For reference, the test pattern to follow:
Unit Tests
TestSpecDrivenReplicasNil — deploymentForMCPServer with spec.replicas == nil produces a Deployment with Spec.Replicas == nil
TestSpecDrivenStdioCapAt1 — reconciler with spec.transport = "stdio" and spec.replicas = 3 creates a Deployment with Spec.Replicas = 1 and sets the StdioReplicaCapped status condition
TestSessionStorageWarningConditionSet — reconciler with spec.replicas = 2 and no sessionStorage sets the SessionStorageWarning condition with status True
TestSessionStorageWarningConditionCleared — reconciler with spec.replicas = 2 and sessionStorage.provider = "redis" sets the SessionStorageWarning condition with status False
TestTerminationGracePeriodSet — deploymentForMCPServer produces a Deployment with Spec.Template.Spec.TerminationGracePeriodSeconds set to the expected default value
Integration Tests
Not in scope for this task; integration tests are covered in TASK-008
Edge Cases
Verify that the externally-driven stdio cap at lines 392–407 is not triggered when spec.replicas == nil and the live deployment has 1 replica (no spurious requeue)
Verify that a deployment update (config change with existing 3-replica deployment) does not overwrite Spec.Replicas when spec.replicas == nil
Verify that setting spec.replicas = nil after previously having spec.replicas = 2 does not set Deployment.Spec.Replicas to any value (nil passthrough on update)
Out of Scope
RunConfig ConfigMap population with backendReplicas and Redis config — covered by TASK-005
VirtualMCPServer deployment builder replica and terminationGracePeriodSeconds changes — covered by TASK-006
Redis password env var injection into the proxyrunner pod — covered by TASK-005 (as part of RunConfig) and TASK-006 (vMCP pod)
Application-level session storage implementation in proxyrunner — separate epic
Horizontal Pod Autoscaler (HPA) object creation or management
Redis deployment, provisioning, or lifecycle
CLI, UI, or any changes outside cmd/thv-operator/
Changes to mcpserver_runconfig.go, virtualmcpserver_deployment.go, or virtualmcpserver_vmcpconfig.go
Description
Extend
deploymentForMCPServerinmcpserver_controller.goto setDeployment.Spec.Replicasfromspec.replicaswith nil-passthrough (preserving HPA compatibility) and setterminationGracePeriodSecondson the pod spec. Extend the reconciler to enforce the stdio transport hard cap (force to 1 and set a status condition whenspec.transport == stdio && spec.replicas > 1) and emit aWarningstatus condition whenspec.replicas > 1but session storage is absent or set tomemory. These changes implement RC-3 and RC-14 of epic THV-0047 for the MCPServer resource.Context
Epic THV-0047 adds horizontal scaling support to the ToolHive operator. TASK-001 (#267) and TASK-002 (#269) added the
Replicas,BackendReplicas, andSessionStoragefields to the CRD specs; TASK-003 (#275) regenerated the deepcopy functions and CRD YAML manifests, making the new fields available at compile time. This task wires those fields into the MCPServer reconciler and deployment builder so they are actually respected at runtime.The existing
deploymentForMCPServerinmcpserver_controller.gohardcodesreplicas := int32(1)and never setsterminationGracePeriodSeconds. The existing reconciler at lines 392–407 handles the stdio cap for externally-driven scaling (HPA orkubectl scale), but has no path for spec-driven replica values. This task adds both the spec-driven replica path and the session storage validation warning.Dependencies: #275 (TASK-003 — Regenerate deepcopy and CRD manifests after scaling type changes)
Blocks: TASK-008 (Unit Tests)
Acceptance Criteria
deploymentForMCPServersetsDeployment.Spec.Replicastospec.replicaswhenspec.replicasis non-nil; whenspec.replicasis nil,Deployment.Spec.Replicasis left nil (not set to any default), allowing external scaling tools (HPA, KEDA,kubectl scale) to manage replica counts freelydeploymentForMCPServersetsTerminationGracePeriodSecondsonPodSpecusing the value from a named constant or a documented default (e.g., 30 seconds); the field must appear in the generatedDeploymenteven whenspec.replicasis nilspec.transport == "stdio"andspec.replicas != nil && *spec.replicas > 1, the reconciler forces replicas to 1 in the createdDeployment, sets aWarningstatus condition with a descriptive message, and requeuesmcpserver_controller.golines 392–407 continues to function correctly and is not broken by this change — the two paths (spec-driven vs externally-driven) are distinctWarningstatus condition whenspec.replicas != nil && *spec.replicas > 1and eitherspec.sessionStorageis nil orspec.sessionStorage.provideris"memory"— the deployment proceeds (no hard rejection); the condition is cleared when the configuration is correctedSpec.Replicasis not overwritten (onlySpec.Template,Spec.Selector,Labels, andAnnotationsare updated), following the existing pattern at lines 452–469 — this preserves the nil-passthrough invariant on updatecmd/thv-operator/api/v1alpha1/mcpserver_types.go, following the existing naming conventiongo build ./cmd/thv-operator/...passesgo test ./cmd/thv-operator/...(excluding integration tests) passes, including all pre-existing tests inmcpserver_replicas_test.goTechnical Approach
Recommended Implementation
1. Add new condition type and reason constants to
mcpserver_types.goBefore touching the controller, define the new condition types and reasons. Follow the existing
constblock pattern inmcpserver_types.go:2. Extend
deploymentForMCPServerinmcpserver_controller.goReplace the hardcoded
replicas := int32(1)at line 934 with spec-driven nil-passthrough logic, and addterminationGracePeriodSecondsto the pod spec.For replicas, the logic is:
spec.replicas != nilandspec.transport == "stdio"and*spec.replicas > 1: useint32Ptr(1)(the spec-driven stdio cap is enforced here in the builder; a status condition is set in the reconciler before this call)spec.replicas != nil: usespec.replicasdirectly (pass the pointer through)spec.replicas == nil: setDeployment.Spec.Replicasto nil (omit it entirely)In the
appsv1.DeploymentSpecstruct literal, changeReplicas: &replicasto use the resolved value, which may be nil. AddTerminationGracePeriodSecondson the pod spec:Define the constant and helper:
3. Extend the reconciler's pre-creation / pre-update path
Insert two new validation steps in
Reconcile(), after the existing image and template validation and before thedeploymentForMCPServercall. These are distinct from the existing externally-driven cap at lines 392–407 (which runs after the deployment is fetched):Step A — Spec-driven stdio cap: If
spec.transport == "stdio"andspec.replicas != nil && *spec.replicas > 1:Warningstatus condition usingmeta.SetStatusConditiondeploymentForMCPServer)Step B — Session storage validation: If
spec.replicas != nil && *spec.replicas > 1:spec.sessionStorage == nilorspec.sessionStorage.provider != "redis": set aWarningstatus condition; do not block deployment4. Update the deployment update path
No changes are needed to
deploymentNeedsUpdateor the deployment update logic at lines 452–469. Those lines already avoid overwritingSpec.Replicason update, which is the correct behavior. Verify this is true after the deployment builder changes.Patterns & Frameworks
mcpserver_controller.golines 392–407: externally-driven cap. The new spec-driven cap is a separate code path added indeploymentForMCPServeritself and in reconciler pre-creation validation. Do not merge the two paths.meta.SetStatusCondition(&mcpServer.Status.Conditions, metav1.Condition{...})followed byr.Status().Update(ctx, mcpServer). SeesetCABundleRefCondition/validateCABundleRefinmcpserver_controller.golines 553–616 for the exact pattern to follow for helper functions.deploymentNeedsUpdatefunction at lines 1505+ already avoids overwritingSpec.Replicason updates;deploymentForMCPServermust produce a nilSpec.Replicaswhenspec.replicas == nilso this invariant holds on initial creation too.int32Ptrhelper — Already defined at line 1859 ofmcpserver_controller.go. Add a companionint64PtrforTerminationGracePeriodSeconds.r.Status().Updateimmediately aftermeta.SetStatusCondition; log but do not return on status update errors (follow the pattern of existing condition setters).Code Pointers
cmd/thv-operator/controllers/mcpserver_controller.goline 934 — Hardcodedreplicas := int32(1); this is the primary change point indeploymentForMCPServercmd/thv-operator/controllers/mcpserver_controller.golines 1191–1263 — The fullDeploymentstruct literal indeploymentForMCPServer;Spec.ReplicasandSpec.Template.Spec.TerminationGracePeriodSecondsare set herecmd/thv-operator/controllers/mcpserver_controller.golines 392–407 — Existing externally-driven stdio cap; new spec-driven logic must not disturb this blockcmd/thv-operator/controllers/mcpserver_controller.golines 452–469 — Deployment update path that preservesSpec.Replicas; verify it remains correct after builder changescmd/thv-operator/controllers/mcpserver_controller.golines 553–616 —setCABundleRefConditionandvalidateCABundleRef— pattern for helper condition-setter functions and status update callscmd/thv-operator/controllers/mcpserver_controller.goline 1859 —int32Ptrhelper; addint64Ptrcompanion herecmd/thv-operator/api/v1alpha1/mcpserver_types.golines 12–69 — Existing condition type and reason constants; new constants go in newconstblocks here following the same conventioncmd/thv-operator/controllers/mcpserver_replicas_test.go— Existing replica tests (HPA-scenario); new spec-driven tests for TASK-008 extend this file using the samefake.NewClientBuilderpatternComponent Interfaces
New condition type and reason constants (in
mcpserver_types.go):Replica resolution logic in
deploymentForMCPServer(illustrative shape, not prescriptive):Session storage validation call site in
Reconcile()(shape only):Testing Strategy
Tests for this task's behaviors are written in TASK-008. The existing
mcpserver_replicas_test.gocovers the externally-driven HPA scenario; TASK-008 extends it with spec-driven cases. For reference, the test pattern to follow:Unit Tests
TestSpecDrivenReplicasNil—deploymentForMCPServerwithspec.replicas == nilproduces aDeploymentwithSpec.Replicas == nilTestSpecDrivenReplicas1—deploymentForMCPServerwithspec.replicas = 1producesSpec.Replicas = 1TestSpecDrivenReplicas3SseTransport—deploymentForMCPServerwithspec.replicas = 3,spec.transport = "sse"producesSpec.Replicas = 3TestSpecDrivenStdioCapAt1— reconciler withspec.transport = "stdio"andspec.replicas = 3creates aDeploymentwithSpec.Replicas = 1and sets theStdioReplicaCappedstatus conditionTestSessionStorageWarningConditionSet— reconciler withspec.replicas = 2and nosessionStoragesets theSessionStorageWarningcondition with statusTrueTestSessionStorageWarningConditionCleared— reconciler withspec.replicas = 2andsessionStorage.provider = "redis"sets theSessionStorageWarningcondition with statusFalseTestTerminationGracePeriodSet—deploymentForMCPServerproduces aDeploymentwithSpec.Template.Spec.TerminationGracePeriodSecondsset to the expected default valueIntegration Tests
Edge Cases
spec.replicas == niland the live deployment has 1 replica (no spurious requeue)Spec.Replicaswhenspec.replicas == nilspec.replicas = nilafter previously havingspec.replicas = 2does not setDeployment.Spec.Replicasto any value (nil passthrough on update)Out of Scope
backendReplicasand Redis config — covered by TASK-005terminationGracePeriodSecondschanges — covered by TASK-006cmd/thv-operator/mcpserver_runconfig.go,virtualmcpserver_deployment.go, orvirtualmcpserver_vmcpconfig.goReferences
cmd/thv-operator/controllers/mcpserver_controller.golines 392–407cmd/thv-operator/controllers/mcpserver_controller.golines 452–469cmd/thv-operator/controllers/mcpserver_controller.golines 553–616cmd/thv-operator/controllers/mcpserver_replicas_test.go