Skip to content

Add tests for horizontal scaling operator behaviors#4388

Merged
yrobla merged 3 commits intomainfrom
issue-4220
Mar 27, 2026
Merged

Add tests for horizontal scaling operator behaviors#4388
yrobla merged 3 commits intomainfrom
issue-4220

Conversation

@yrobla
Copy link
Copy Markdown
Contributor

@yrobla yrobla commented Mar 26, 2026

Summary

Adds test coverage for the horizontal scaling operator behaviors introduced in
the companion PRs (THV-0047). The existing unit tests were added incrementally
with each implementation task; this PR closes gaps at three levels:

Unit tests (cmd/thv-operator/controllers/):

  • TestConfigMapContent_SessionStorage: verifies ensureVmcpConfigConfigMap
    produces a ConfigMap YAML with sessionStorage populated when
    provider=redis, and absent when nil or memory.
  • TestDeploymentForVirtualMCPServer_WithRedisPassword: verifies
    deploymentForVirtualMCPServer injects THV_SESSION_REDIS_PASSWORD as a
    SecretKeyRef env var and never as a plaintext value.

Integration tests (envtest — cmd/thv-operator/test-integration/):

  • MCPServer: ScalingConfig (backend_replicas + session_redis) is written
    to the RunConfig ConfigMap when backendReplicas and Redis sessionStorage
    are set; omitted when absent.
  • VirtualMCPServer: spec.replicas=3 is reflected in Deployment.Spec.Replicas;
    nil spec.replicas preserves an externally-set replica count across
    reconciliations (HPA-compatible contract, gated on status.observedGeneration).

E2E tests (real Kind cluster — test/e2e/thv-operator/virtualmcp/):

  • virtualmcpserver_scaling_test.go: two ordered contexts covering the full
    cluster-level lifecycle:
    • Static: replicas=2 at creation → 2 ready pods, SessionStorageWarning
      condition set.
    • Scale lifecycle (1→2→1): Deployment replicas update in-place, 2 pods come
      up, SessionStorageWarning fires on scale-up and clears on scale-down.

Fixes #4220

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe): testing

Test plan

  • Unit tests (task test)
  • Integration tests (task operator-test-integration)
  • E2E tests (task thv-operator-e2e-test-run)
  • Linting (task lint-fix)

Does this introduce a user-facing change?

No.

Large PR Justification

This adds complete testing to the feature referenced. Cannot be split.

Add SessionStorageConfig to pkg/vmcp/config.Config and populate it in
the VirtualMCPServer converter when spec.sessionStorage.provider is
"redis". This allows vMCP pods to discover Redis connection parameters
(address, db, keyPrefix) at startup via the existing config injection
path. The Redis password is excluded from the config type and is
injected separately as the THV_SESSION_REDIS_PASSWORD environment
variable by the deployment builder.

Closes: #4214
@github-actions github-actions bot added the size/M Medium PR: 300-599 lines changed label Mar 26, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.49%. Comparing base (f7ab64a) to head (2d549d7).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4388   +/-   ##
=======================================
  Coverage   69.48%   69.49%           
=======================================
  Files         485      485           
  Lines       49576    49585    +9     
=======================================
+ Hits        34450    34460   +10     
+ Misses      12461    12457    -4     
- Partials     2665     2668    +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/M Medium PR: 300-599 lines changed labels Mar 26, 2026
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 26, 2026
@yrobla yrobla requested review from Copilot March 26, 2026 16:06
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 26, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds automated coverage around horizontal scaling behaviors in the ToolHive operator, focusing on VirtualMCPServer session storage configuration and Redis password injection, plus integration/e2e coverage for replica behavior and RunConfig scaling fields.

Changes:

  • Added VirtualMCPServer e2e scenario coverage for scaling (replicas 1↔2) and SessionStorageWarning condition behavior.
  • Added envtest integration coverage for VirtualMCPServer replica handling (explicit replicas + nil/HPA-compat behavior) and MCPServer RunConfig scaling config population.
  • Added controller-level unit tests verifying vMCP ConfigMap sessionStorage YAML content and Redis password SecretKeyRef env var injection.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/e2e/thv-operator/virtualmcp/virtualmcpserver_scaling_test.go New real-cluster e2e coverage for VMCP replica scaling and SessionStorageWarning behavior
cmd/thv-operator/test-integration/virtualmcp/virtualmcpserver_replicas_integration_test.go New envtest integration checks for VMCP replica passthrough + HPA-compat preservation
cmd/thv-operator/test-integration/mcp-server/mcpserver_runconfig_integration_test.go Adds integration assertions for RunConfig ScalingConfig presence/absence based on spec
cmd/thv-operator/controllers/virtualmcpserver_vmcpconfig_test.go Adds unit test for ConfigMap YAML sessionStorage section population/omission
cmd/thv-operator/controllers/virtualmcpserver_deployment_test.go Adds unit test ensuring Redis password is injected via SecretKeyRef env var (no plaintext)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@yrobla yrobla changed the title Add controller-level unit tests for horizontal scaling behaviors Add tests for horizontal scaling operator behaviors Mar 26, 2026
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 26, 2026
Unit tests (controllers package):
- TestConfigMapContent_SessionStorage: verifies that ensureVmcpConfigConfigMap
  produces a ConfigMap YAML with the sessionStorage section populated when
  provider=redis, and absent when nil or memory.
- TestDeploymentForVirtualMCPServer_WithRedisPassword: verifies that
  deploymentForVirtualMCPServer injects THV_SESSION_REDIS_PASSWORD as a
  SecretKeyRef env var and never as a plaintext value.

Integration tests (envtest with real CRDs + controller loop):
- MCPServer: ScalingConfig (backend_replicas + session_redis) is written to
  the RunConfig ConfigMap when backendReplicas and Redis sessionStorage are
  set; omitted when absent.
- VirtualMCPServer: spec.replicas is reflected in Deployment.Spec.Replicas;
  nil replicas produces a nil Deployment replicas field for HPA compatibility.

E2E tests (Kind cluster via thv-operator-e2e-test-run):
- VirtualMCPServer created with replicas=2 runs 2 ready pods and sets
  SessionStorageWarning condition when no Redis is configured.
- Scale lifecycle (1→2→1): verifies Deployment replicas update in-place,
  2 pods come up, SessionStorageWarning fires on scale-up and clears on
  scale-down.

Closes: #4220

Fix ignored AddToScheme errors in unit tests

Replace `_ = *.AddToScheme(scheme)` with `require.NoError(t, *.AddToScheme(scheme))`
in virtualmcpserver_deployment_test.go and virtualmcpserver_vmcpconfig_test.go so
that scheme registration failures cause an immediate, clear test failure rather than
a confusing error later with a partially configured scheme.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 26, 2026
…ation

Annotation changes do not increment metadata.generation on Kubernetes resources.
The previous trigger used `vmcp.Annotations["test/trigger-reconcile"] = "true"`,
which left generation unchanged, so `triggerGeneration = vmcp.Generation + 1`
was always one ahead of ObservedGeneration and the Eventually timed out.

Fix: trigger via `vmcp.Spec.ServiceType = "ClusterIP"` (the default value,
semantically a no-op) which IS a spec change and DOES increment Generation.
After the Update, controller-runtime populates the object in-place with the
server response, so `vmcp.Generation` is already the post-increment value.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions bot removed the size/L Large PR: 600-999 lines changed label Mar 26, 2026
@github-actions github-actions bot added the size/L Large PR: 600-999 lines changed label Mar 26, 2026
jerm-dro
jerm-dro previously approved these changes Mar 26, 2026
@jerm-dro jerm-dro self-requested a review March 26, 2026 20:46
Copy link
Copy Markdown
Contributor

@jerm-dro jerm-dro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: The tests verify that the infrastructure scales correctly (replica counts, pod readiness, SessionStorageWarning conditions), but they don't send any MCP traffic through the scaled-out vMCP. Is this an intentional gap, or is cross-pod request handling planned for a follow-up?

Without a request hitting a different pod than the one that created the session, we're testing that pods exist but not that horizontal scaling actually works end-to-end. A few approaches that would demonstrate "any pod can handle any request":

  • Scale to 2 with Redis, create a session via tools/list, delete the pod that handled it, then send tools/call on the same session and verify it succeeds on the surviving pod
  • Scale to 0 and back to 1, then send a request — proves a completely fresh pod with no in-memory state can serve traffic
  • Send N requests through the Service and verify responses come back successfully (doesn't prove session affinity, but proves all pods are functional)

No preference on which approach — just looking for something that closes the loop beyond infrastructure assertions.

Base automatically changed from issue-4214 to main March 27, 2026 08:16
@yrobla yrobla dismissed jerm-dro’s stale review March 27, 2026 08:16

The base branch was changed.

@yrobla yrobla requested a review from amirejaz as a code owner March 27, 2026 08:16
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 27, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions bot dismissed their stale review March 27, 2026 08:20

Large PR justification has been provided. Thank you!

@github-actions
Copy link
Copy Markdown
Contributor

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Mar 27, 2026
@yrobla yrobla merged commit a771117 into main Mar 27, 2026
43 of 45 checks passed
@yrobla yrobla deleted the issue-4220 branch March 27, 2026 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Write unit tests for horizontal scaling operator behaviors (THV-0047)

5 participants