Fix flaky E2E lifecycle tests: tool discovery race#4107

Merged
JAORMX merged 1 commit into main from fix/flaky-e2e-lifecycle-tool-discovery
Mar 11, 2026
Collaborator

@JAORMX JAORMX commented Mar 11, 2026

Summary

  • The "E2E Tests Lifecycle" workflow is flaky because tool discovery is session-scoped: when a backend isn't fully ready at session creation time, it's silently skipped, producing incomplete tool lists. Tests then fail because they assert on specific tool names outside the Eventually retry loop (or have no retry at all).
  • Move all tool-name assertions inside Eventually loops so each retry creates a fresh MCP session, triggering fresh backend discovery. Add reusable helper functions to reduce boilerplate across tests.
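The retry pattern described above can be sketched without any test framework. This is a minimal stand-in, not the PR's code: the real tests use Gomega's Eventually and a real MCP client, whereas `session`, `waitForTools`, `missingTools`, and `demoFlakyBackend` below are hypothetical names invented for illustration. The point it shows is that the session is created inside the loop, so every attempt gets a fresh discovery pass, and the tool-name check happens inside the same loop.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// session stands in for an MCP session (hypothetical type).
// Its tool list is fixed when the session is created.
type session struct{ tools []string }

// missingTools returns the expected names that are absent from got.
func missingTools(got, expected []string) []string {
	have := make(map[string]bool, len(got))
	for _, name := range got {
		have[name] = true
	}
	var missing []string
	for _, name := range expected {
		if !have[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

// waitForTools opens a NEW session on every attempt and only returns once
// that session lists every expected tool, or the timeout expires.
func waitForTools(newSession func() (*session, error), expected []string,
	timeout, interval time.Duration) ([]string, error) {
	deadline := time.Now().Add(timeout)
	var lastErr error
	for {
		s, err := newSession()
		if err != nil {
			lastErr = err
		} else if missing := missingTools(s.tools, expected); len(missing) == 0 {
			return s.tools, nil
		} else {
			lastErr = fmt.Errorf("missing tools: %v", missing)
		}
		if time.Now().After(deadline) {
			return nil, fmt.Errorf("timed out waiting for tools: %w", lastErr)
		}
		time.Sleep(interval)
	}
}

// demoFlakyBackend simulates a setup where the first session fails, the
// second one silently skips the "fetch" backend, and the third is complete.
func demoFlakyBackend() (tools []string, attempts int, err error) {
	newSession := func() (*session, error) {
		attempts++
		switch attempts {
		case 1:
			return nil, errors.New("proxy not reachable yet")
		case 2:
			return &session{tools: []string{"echo"}}, nil
		default:
			return &session{tools: []string{"echo", "fetch"}}, nil
		}
	}
	tools, err = waitForTools(newSession, []string{"echo", "fetch"},
		time.Second, time.Millisecond)
	return tools, attempts, err
}
```

Had the name check been placed after the loop, the second attempt would have counted as a success and the assertion on "fetch" would have failed, which is the flake the summary describes.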

Type of change

  • Bug fix

Test plan

  • Linting (task lint-fix)
  • Manual testing (describe below)

These are E2E operator tests that require a KIND cluster. The fix is structural (moving assertions inside retry loops) and does not change test semantics — only retry behavior. Full verification requires the "E2E Tests Lifecycle" CI workflow to pass across all K8s versions.

Changes

| File | Change |
| --- | --- |
| test/e2e/thv-operator/virtualmcp/wait_for_tools_helpers.go | New file: WaitForExpectedTools, WaitForExpectedToolsWithAuth, ToolsContainAll, ToolsContainSubstring, ToolsHavePrefix helpers |
| test/e2e/thv-operator/virtualmcp/virtualmcp_toolconfig_test.go | 4 It blocks: move tool-name assertions inside WaitForExpectedTools |
| test/e2e/thv-operator/virtualmcp/virtualmcp_conflict_resolution_test.go | 7 It blocks: wrap with WaitForExpectedTools (previously had NO retry) |
| test/e2e/thv-operator/virtualmcp/virtualmcp_optimizer_circuit_breaker_test.go | 1 It block: wrap callFindTool for echo + fetch inside Eventually |
| test/e2e/thv-operator/virtualmcp/virtualmcp_auth_discovery_test.go | 1 It block: replace InitializeMCPClientWithRetries + immediate assert with WaitForExpectedToolsWithAuth |
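For readers who have not opened the new helper file, the following is a rough guess at the kind of check that predicates such as ToolsContainAll, ToolsContainSubstring, and ToolsHavePrefix might perform. The real helpers' signatures and exact semantics are not shown in this PR description, so the functions below use different, hypothetical names and operate on plain tool-name strings. In particular, treating the prefix check as "every tool has the prefix" is an assumption.

```go
package main

import "strings"

// containsAll reports whether every expected name appears in names.
// Hypothetical analogue of a "contain all" predicate.
func containsAll(names []string, expected ...string) bool {
	have := make(map[string]bool, len(names))
	for _, n := range names {
		have[n] = true
	}
	for _, e := range expected {
		if !have[e] {
			return false
		}
	}
	return true
}

// anyContainsSubstring reports whether at least one name contains substr.
func anyContainsSubstring(names []string, substr string) bool {
	for _, n := range names {
		if strings.Contains(n, substr) {
			return true
		}
	}
	return false
}

// allHavePrefix reports whether every name starts with prefix.
// An empty list returns false so that a retry loop does not accept a
// session that discovered no tools at all (assumed behaviour).
func allHavePrefix(names []string, prefix string) bool {
	if len(names) == 0 {
		return false
	}
	for _, n := range names {
		if !strings.HasPrefix(n, prefix) {
			return false
		}
	}
	return true
}
```

Returning a plain bool rather than asserting keeps such predicates usable as the condition of a retry loop: a false result means "try again with a new session" instead of an immediate test failure.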

Does this introduce a user-facing change?

No

Special notes for reviewers

  • WaitForExpectedToolsWithAuth uses ginkgo.DeferCleanup to ensure the last MCP client is closed even if Eventually exhausts retries and Ginkgo panics before the caller can defer Close().
  • TestToolListingAndCall (in helpers.go) has the same single-shot pattern but is partially mitigated by Ordered suites where preceding It blocks warm up backends. Worth tracking as follow-up but not blocking for this PR.
  • Other test files (virtualmcp_aggregation_*.go, virtualmcp_optimizer_test.go, etc.) also use single-shot client creation. These haven't been observed to flake and can be addressed in a follow-up.
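The cleanup concern in the first note can be illustrated with a framework-free sketch. Everything here is a stand-in: `cleanupStack` mimics the role that ginkgo.DeferCleanup plays in the PR, `fakeClient` replaces the MCP client, and exhausted retries are reported as an error rather than a Ginkgo panic. The idea shown is that the cleanup is registered before the loop and closes whichever client was created last, so no client leaks when every attempt fails.

```go
package main

import "errors"

// fakeClient stands in for an MCP client that must be closed.
type fakeClient struct{ closed bool }

func (c *fakeClient) Close() { c.closed = true }

// cleanupStack is a hypothetical stand-in for a framework cleanup registry:
// registered functions run in reverse order at the end of the test.
type cleanupStack struct{ fns []func() }

func (s *cleanupStack) Defer(fn func()) { s.fns = append(s.fns, fn) }

func (s *cleanupStack) Run() {
	for i := len(s.fns) - 1; i >= 0; i-- {
		s.fns[i]()
	}
	s.fns = nil
}

// retryWithClient creates a new client per attempt and returns the first one
// that passes check. The most recent client is always closed by the
// registered cleanup, whether the loop succeeds or runs out of attempts.
func retryWithClient(cleanups *cleanupStack, attempts int,
	newClient func() *fakeClient, check func(*fakeClient) bool) (*fakeClient, error) {
	var last *fakeClient
	// Registered up front, so it still runs if the caller never gets a
	// chance to defer Close() itself.
	cleanups.Defer(func() {
		if last != nil {
			last.Close()
		}
	})
	for i := 0; i < attempts; i++ {
		if last != nil {
			last.Close() // release the client from the previous attempt
		}
		last = newClient()
		if check(last) {
			return last, nil
		}
	}
	return nil, errors.New("retries exhausted")
}

// demoExhaustedRetries runs three failing attempts, then runs the cleanups
// as a test framework would at the end of the spec.
func demoExhaustedRetries() (created []*fakeClient, err error) {
	cleanups := &cleanupStack{}
	newClient := func() *fakeClient {
		c := &fakeClient{}
		created = append(created, c)
		return c
	}
	_, err = retryWithClient(cleanups, 3, newClient,
		func(*fakeClient) bool { return false })
	cleanups.Run()
	return created, err
}
```

Without the registered cleanup, the third client in this demo would stay open, because the loop only closes a client when the next attempt begins.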

Generated with Claude Code

Move tool-name assertions inside Eventually retry loops so tests
retry with new MCP sessions until all backends are fully discovered.

The root cause is that tool discovery is session-scoped: when a
backend isn't ready when a session is created, it's silently skipped,
producing incomplete tool lists. Tests then fail because they assert
on specific tool names outside the retry loop (or have no retry at
all).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the size/M Medium PR: 300-599 lines changed label Mar 11, 2026
codecov bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.62%. Comparing base (144a4da) to head (9e82616).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4107      +/-   ##
==========================================
- Coverage   68.68%   68.62%   -0.07%     
==========================================
  Files         453      453              
  Lines       46011    46011              
==========================================
- Hits        31604    31575      -29     
- Misses      11967    11998      +31     
+ Partials     2440     2438       -2     

@JAORMX JAORMX merged commit de6e434 into main Mar 11, 2026
56 of 58 checks passed
@JAORMX JAORMX deleted the fix/flaky-e2e-lifecycle-tool-discovery branch March 11, 2026 17:15