Skip to content

Rebalance E2E test CI matrix from 4 to 8 buckets#4074

Merged
rdimitrov merged 2 commits intomainfrom
rebalance-e2e-test-matrix
Mar 10, 2026
Merged

Rebalance E2E test CI matrix from 4 to 8 buckets#4074
rdimitrov merged 2 commits intomainfrom
rebalance-e2e-test-matrix

Conversation

@JAORMX
Copy link
Copy Markdown
Collaborator

@JAORMX JAORMX commented Mar 10, 2026

Summary

  • The api E2E bucket had 191 test specs crammed into one 15m CI job — nearly 4x the ~53-test safe ceiling given compilation overhead. This caused flaky timeouts and random test kills.
  • Three test files (telemetry_metrics_validation, sse_endpoint_rewrite, network_isolation) had labels that didn't match any CI matrix filter, so they never ran in CI.
  • Splits the 4-bucket matrix into 8 balanced buckets (33–57 specs each), fixes the orphaned tests, and preserves backward compatibility so parent labels (mcp, api, core, etc.) still work for local development.

Type of change

  • Refactoring (no behavior change)

Test plan

  • Manual testing (describe below)

Verified all 21 modified test files compile (go build ./test/e2e/...). Confirmed no test file is orphaned by checking every Describe label against the CI matrix filters. Label changes are additive (sub-labels added alongside existing parent labels), so no test logic is affected.

Changes

File Change
.github/workflows/e2e-tests.yml Replace 4-entry matrix with 8 balanced buckets
test/e2e/telemetry_metrics_validation_e2e_test.go Add "middleware" label (was orphaned from CI)
test/e2e/sse_endpoint_rewrite_test.go Add "proxy" label (was orphaned from CI)
test/e2e/network_isolation_test.go Add "proxy" label (was orphaned from CI)
test/e2e/fetch_mcp_server_test.go Add "mcp-run" sub-label
test/e2e/osv_mcp_server_test.go Add "mcp-run" sub-label
test/e2e/osv_streamable_http_mcp_server_test.go Add "mcp-protocol" sub-label
test/e2e/remote_mcp_server_test.go Add "mcp-protocol" sub-label
test/e2e/protocol_builds_e2e_test.go Add "mcp-protocol" sub-label
test/e2e/inspector_test.go Add "mcp-protocol" sub-label
test/e2e/inspector_autocleanup_test.go Add "mcp-protocol" sub-label
test/e2e/api_registry_test.go Add "api-registry" sub-label
test/e2e/api_workloads_test.go Add "api-workloads" sub-label
test/e2e/api_workload_lifecycle_test.go Add "api-workloads" sub-label
test/e2e/api_clients_test.go Add "api-clients" sub-label
test/e2e/api_clients_validation_test.go Add "api-clients" sub-label
test/e2e/api_skills_test.go Add "api-clients" sub-label
test/e2e/api_discovery_test.go Add "api-misc" sub-label
test/e2e/api_groups_test.go Add "api-misc" sub-label
test/e2e/api_healthcheck_test.go Add "api-misc" sub-label
test/e2e/api_version_test.go Add "api-misc" sub-label
test/e2e/api_secrets_test.go Add "api-misc" sub-label
test/e2e/README.md Updated docs to reflect new 8-bucket structure

Special notes for reviewers

Bucket distribution:

Bucket Specs Before
core 57 57 (unchanged)
mcp-run 33 was part of mcp (70)
mcp-protocol 37 was part of mcp (70)
proxy-mw 53 42 + 11 orphaned
api-registry 41 was part of api (191)
api-workloads 56 was part of api (191)
api-clients 44 was part of api (191)
api-misc 50 was part of api (191)

CI cost tradeoff: 8 parallel jobs instead of 4 (more billable minutes from fixed overhead), but the longest job drops from 191 specs to 57 specs, so wall-clock time should decrease significantly.

Generated with Claude Code

@github-actions github-actions bot added the size/S Small PR: 100-299 lines changed label Mar 10, 2026
amirejaz
amirejaz previously approved these changes Mar 10, 2026
ChrisJBurns
ChrisJBurns previously approved these changes Mar 10, 2026
reyortiz3
reyortiz3 previously approved these changes Mar 10, 2026
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 10, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.49%. Comparing base (4bc341c) to head (9b466d1).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4074      +/-   ##
==========================================
- Coverage   68.56%   68.49%   -0.07%     
==========================================
  Files         446      446              
  Lines       45574    45574              
==========================================
- Hits        31246    31215      -31     
- Misses      11914    11947      +33     
+ Partials     2414     2412       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

JAORMX and others added 2 commits March 10, 2026 16:37
The `api` bucket had 191 specs (nearly 4x the ~53-test comfort zone for
the 15m timeout), and 3 test files were orphaned from CI entirely
because their labels didn't match any matrix filter.

Split the matrix into 8 balanced buckets (33-57 specs each), fix the
orphaned tests, and preserve backward compatibility so the original
parent labels (mcp, api, core, etc.) still work for local development.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The proxy-mw bucket was timing out at 15 minutes after 3 previously
orphaned test files were added to it. Split into two buckets (proxy
with ~25 specs, middleware+stability with ~28 specs) so each fits
comfortably within the timeout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@JAORMX JAORMX dismissed stale reviews from reyortiz3, ChrisJBurns, and amirejaz via 9b466d1 March 10, 2026 14:51
@JAORMX JAORMX force-pushed the rebalance-e2e-test-matrix branch from 901db47 to 9b466d1 Compare March 10, 2026 14:51
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 10, 2026
@rdimitrov rdimitrov merged commit c598eda into main Mar 10, 2026
115 of 124 checks passed
@rdimitrov rdimitrov deleted the rebalance-e2e-test-matrix branch March 10, 2026 17:20
@gkatz2
Copy link
Copy Markdown
Contributor

gkatz2 commented Mar 10, 2026

FYI — the new proxy E2E bucket is failing on main and on PRs (e.g., #4063) because sse_endpoint_rewrite_test.go uses CLI flags that don't exist:

  1. Lines 94, 224, 330: thv run --remote-url <url> — there is no --remote-url flag. Remote URLs are passed as a positional argument. Error: requires at least 1 arg(s), only received 0
  2. Line 418: thv list --registry — there is no --registry flag. Error: unknown flag: --registry

These tests were orphaned from CI before this PR brought them back, so the bugs have been there since the tests were written in January (#3210). Separate from the port allocation race being addressed in #4078.

ChrisJBurns added a commit that referenced this pull request Mar 10, 2026
The SSE endpoint rewrite tests were orphaned from CI until #4074
rebalanced the E2E buckets. Now that they run, all 4 fail:

- 3 tests used `--remote-url` which is not a CLI flag; the URL should
  be passed as the positional argument (the run command auto-detects
  URLs and treats them as remote servers).
- 1 test used `thv list --registry` which is not valid; the correct
  invocation is `thv registry list`.

Both flags were silently ignored because Cobra's UnknownFlags whitelist
is enabled on the run command, causing the "requires at least 1 arg"
error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ChrisJBurns
Copy link
Copy Markdown
Collaborator

@gkatz2 Yep, I had the PR in the works ready https://github.com/stacklok/toolhive/pull/4080/changes. Hopefully this removes the errors.

JAORMX added a commit that referenced this pull request Mar 11, 2026
These SSE endpoint rewrite tests were orphaned from CI until the matrix
rebalance (#4074), then had stale flags fixed (#4080), but have never
actually passed. Four root causes are addressed:

1. GenerateMCPServerURL omits /sse for remote SSE URLs with empty path,
   causing waitForInitializeSuccess to GET / instead of /sse (404).
2. getSSERewriteConfig reads X-Forwarded-* from the outbound request
   which has auto-injected headers from SetXForwarded(), converting
   relative endpoint URLs into absolute ones with the proxy's own host.
3. Test 3 mock server lacks a 404 fallback for non-/sse paths.
4. Test 4 (OSV + endpoint-prefix) fails due to #3372; skip it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
aponcedeleonch pushed a commit that referenced this pull request Mar 11, 2026
Fix deterministic E2E proxy SSE endpoint rewrite test failures

These SSE endpoint rewrite tests were orphaned from CI until the matrix
rebalance (#4074), then had stale flags fixed (#4080), but have never
actually passed. Four root causes are addressed:

1. GenerateMCPServerURL omits /sse for remote SSE URLs with empty path,
   causing waitForInitializeSuccess to GET / instead of /sse (404).
2. getSSERewriteConfig reads X-Forwarded-* from the outbound request
   which has auto-injected headers from SetXForwarded(), converting
   relative endpoint URLs into absolute ones with the proxy's own host.
3. Test 3 mock server lacks a 404 fallback for non-/sse paths.
4. Test 4 (OSV + endpoint-prefix) fails due to #3372; skip it.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Small PR: 100-299 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants