[wasm][coreclr] Batch CoreCLR library test suites on Helix #126157

Merged
radekdoulik merged 13 commits into main from batch-wasm-coreclr-library-tests on Apr 10, 2026
@radekdoulik
Member

@radekdoulik radekdoulik commented Mar 26, 2026

Note

This PR description was AI/Copilot-generated.

Summary

Reduce Helix queue pressure by grouping ~172 individual WASM CoreCLR library test work items into ~21 batched work items (88% reduction), saving 56% of total machine time.

Verified CI Results

All numbers from the latest CI run (build 1361877), compared against an unbatched baseline from the same queue:

| Metric | Unbatched | Batched | Change |
|---|---|---|---|
| Work items | 172 | 21 | -88% |
| Tests run | 259,527 | 259,527 | identical |
| Tests passed | 256,963 | 256,963 | identical |
| Tests failed | 0 | 0 | |
| Total machine time | 437m (7.3h) | 195m (3.2h) | -56% |
| Longest batch | | 30m13s | well within timeout (160m for 8-suite batch) |
| Queue slots consumed | 172 | 21 | -151 |

Per-Batch Breakdown (latest run)

| Batch | Suites | Duration | Batch | Suites | Duration |
|---|---|---|---|---|---|
| Batch-4 | 8 | 30m13s | Batch-14 | 9 | 7m16s |
| Batch-6 | 8 | 19m07s | Batch-1 | 8 | 7m12s |
| Batch-5 | 8 | 18m23s | Batch-12 | 9 | 7m06s |
| Batch-8 | 9 | 17m16s | Batch-9 | 9 | 6m57s |
| Batch--1 | 1 | 13m29s | Batch-15 | 9 | 5m59s |
| Batch-0 | 7 | 10m37s | Batch-3 | 8 | 5m42s |
| Batch-10 | 9 | 10m16s | Batch-2 | 8 | 5m40s |
| Batch-19 | 9 | 5m33s | Batch-17 | 9 | 4m30s |
| Batch-13 | 9 | 5m03s | Batch-7 | 8 | 3m38s |
| Batch-11 | 9 | 4m42s | Batch-18 | 9 | 3m19s |
| Batch-16 | 9 | 2m48s | | | |

Where the Savings Come From

The 56% machine time reduction (~245 minutes saved) comes largely from eliminating Helix per-work-item overhead; the itemized estimates below account for ~227 of those minutes:

| Overhead source | Per item | Items eliminated | Total saved |
|---|---|---|---|
| Helix scheduling + payload download | ~45s | 151 | ~113m |
| Artifact upload + reporting | ~30s | 151 | ~75m |
| Queue slot acquisition | ~15s | 151 | ~38m |
| Total | ~1.5m | 151 | ~227m |

Chrome and xharness are not re-downloaded per item; they are in the Helix correlation payload (shared per machine). Chrome restarts per suite within batches (~1s each), so reusing Chrome would only save ~3 minutes total (not worth xharness changes).
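
The totals in the overhead table can be reproduced with plain integer arithmetic. The sketch below is only a sanity check of the figures quoted above. The per-item costs are the PR's own estimates, and the helper name `fmt_min_sec` is made up for this illustration.

```shell
#!/usr/bin/env bash
# Format a number of seconds as XmYYs (hypothetical helper for this example).
fmt_min_sec() {
    local total=$1
    printf '%dm%02ds\n' $(( total / 60 )) $(( total % 60 ))
}

items_eliminated=$(( 172 - 21 ))   # work items removed by batching

# Per-item overhead estimates from the table, in seconds.
scheduling=45
upload=30
queue=15

fmt_min_sec $(( items_eliminated * scheduling ))                       # 113m15s
fmt_min_sec $(( items_eliminated * upload ))                           # 75m30s
fmt_min_sec $(( items_eliminated * queue ))                            # 37m45s
fmt_min_sec $(( items_eliminated * (scheduling + upload + queue) ))    # 226m30s

# Chrome restart cost: about 1s per suite across all 172 suites.
fmt_min_sec $(( 172 * 1 ))                                             # 2m52s
```

The exact products round to the table's approximate values, and the last line is where the "~3 minutes" Chrome estimate comes from.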

Changes

  • eng/testing/WasmBatchRunner.sh (new): Batch runner script that extracts and runs multiple test suites sequentially within a single Helix work item, with per-suite result isolation via separate HELIX_WORKITEM_UPLOAD_ROOT directories. Includes error handling for failed extractions (preserving actual unzip exit code) and restores HELIX_WORKITEM_UPLOAD_ROOT after the batch loop for Helix post-commands.
  • src/libraries/sendtohelix-browser.targets (modified):
    • WasmBatchLibraryTests property (defaults true for CoreCLR+Chrome, false otherwise)
    • _AddBatchedWorkItemsForLibraryTests target: stages batch directories, zips them (PayloadArchive), and constructs HelixCommand via Replace with a build-time validation guard
    • Sample apps excluded from batching, kept as individual work items
    • Original target gated on WasmBatchLibraryTests != true
  • src/tasks/HelixTestTasks/ (new project):
    • GroupWorkItems compiled MSBuild task: greedy bin-packing by file size, large suites (>50MB) stay solo
    • ComputeBatchTimeout compiled MSBuild task: 20 min/suite, 30 min minimum (accounts for WASM startup overhead + heaviest suites like Cryptography at ~17m)
    • Non-shipping task assembly, referenced via UsingTask from sendtohelix-browser.targets
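
As a rough illustration of the timeout rule described for ComputeBatchTimeout (20 minutes per suite, 30 minutes minimum), here is a minimal shell sketch. It is not the compiled MSBuild task; the function name `compute_batch_timeout_minutes` is an assumption made for this example.

```shell
#!/usr/bin/env bash
# Minutes allotted to a batch of N suites: 20 minutes per suite, never below 30.
# Illustrative only; the real logic lives in the ComputeBatchTimeout task.
compute_batch_timeout_minutes() {
    local suite_count=$1
    local per_suite=20 minimum=30
    local timeout=$(( suite_count * per_suite ))
    if (( timeout < minimum )); then
        timeout=$minimum
    fi
    echo "$timeout"
}

for n in 1 7 8 9; do
    echo "$n suite(s): $(compute_batch_timeout_minutes "$n")m"
done
```

For an 8-suite batch this gives 160 minutes, which matches the timeout quoted in the results table. A solo batch falls back to the 30-minute minimum.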

Design

The batch runner is a thin loop wrapper around each suite's existing RunTests.sh. It does not duplicate any test setup logic; all xharness/Chrome/WASM configuration stays inside each suite's generated runner script. The coupling surface is minimal:

  • HelixCommand prefix (env vars, dev-certs) is preserved via .Replace() on the ./RunTests.sh suffix
  • A build-time <Error> guard validates the replacement succeeded
  • Each suite's HELIX_WORKITEM_UPLOAD_ROOT is isolated to a per-suite subdirectory
  • Extracted suite directories are cleaned up between runs to free disk space
  • Stale batch staging directories are cleaned via <RemoveDir> before each build
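
The loop described above could look roughly like the following. This is a simplified sketch, not the contents of eng/testing/WasmBatchRunner.sh. The function names `run_batch` and `extract_suite`, and the default upload root used when HELIX_WORKITEM_UPLOAD_ROOT is unset, are assumptions made for this example. The RunTests.sh existence check is extra hardening that may not be in the real script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extraction is split out so it can be swapped or stubbed.
extract_suite() {
    local zip_file=$1 dest_dir=$2
    unzip -q -o "$zip_file" -d "$dest_dir"
}

# Run every suite ZIP found in the batch directory, one after another.
run_batch() {
    local batch_dir=$1
    # Assumption: fall back to a local directory when Helix has not set the variable.
    local original_upload_root=${HELIX_WORKITEM_UPLOAD_ROOT:-$batch_dir/upload}
    local failed=0 zip_file suite_name suite_dir rc

    for zip_file in "$batch_dir"/*.zip; do
        [[ -e "$zip_file" ]] || continue          # glob matched nothing
        suite_name=$(basename "$zip_file" .zip)
        suite_dir="$batch_dir/$suite_name"
        mkdir -p "$suite_dir"

        # Isolate this suite's results in its own upload subdirectory.
        export HELIX_WORKITEM_UPLOAD_ROOT="$original_upload_root/$suite_name"
        mkdir -p "$HELIX_WORKITEM_UPLOAD_ROOT"

        extract_suite "$zip_file" "$suite_dir"
        rc=$?                                      # keep the extractor's exit code
        if (( rc != 0 )); then
            echo "ERROR: extracting $zip_file failed with exit code $rc" >&2
        elif [[ ! -f "$suite_dir/RunTests.sh" ]]; then
            echo "ERROR: RunTests.sh not found in $suite_dir" >&2
            rc=1
        else
            (cd "$suite_dir" && chmod +x RunTests.sh && ./RunTests.sh)
            rc=$?
        fi

        if (( rc != 0 )); then
            failed=$(( failed + 1 ))
        fi
        rm -rf "$suite_dir"                        # free disk space before the next suite
    done

    # Restore the original upload root for Helix post-commands.
    export HELIX_WORKITEM_UPLOAD_ROOT="$original_upload_root"
    (( failed == 0 ))
}
```

Calling `run_batch "$PWD"` from the work item directory would run every suite even if an earlier one fails, and return non-zero if any suite failed.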

Opt-out

Disable with /p:WasmBatchLibraryTests=false to fall back to individual work items.

@radekdoulik radekdoulik added this to the Future milestone Mar 26, 2026
@radekdoulik radekdoulik added the NO-REVIEW (Experimental/testing PR, do NOT review it) and area-Infrastructure-coreclr labels Mar 26, 2026
Copilot AI review requested due to automatic review settings March 26, 2026 16:51
Contributor

Copilot AI left a comment

Pull request overview

This PR aims to reduce Helix queue pressure for WASM CoreCLR library testing by batching many individual test-suite work items into a smaller number of larger work items, using a batch runner to execute multiple suites sequentially with isolated result uploads.

Changes:

  • Add a WASM batch runner script to unzip and run multiple test suites sequentially inside one Helix work item.
  • Extend sendtohelix-browser.targets to optionally generate batched Helix work items via an MSBuild bin-packing step and per-batch timeout computation.
  • Adjust browser/CoreCLR Helix and xharness timeouts, and update the browser/CoreCLR test exclusion list.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| src/libraries/tests.proj | Updates the browser/CoreCLR disabled-test list (significant exclusion removals). |
| src/libraries/sendtohelixhelp.proj | Increases default Helix work item timeout for browser/CoreCLR. |
| src/libraries/sendtohelix-browser.targets | Adds batching mode, grouping/timeout tasks, and a new target to emit batched Helix work items. |
| eng/testing/tests.wasm.targets | Increases xharness timeout default for CoreCLR WASM test runs. |
| eng/testing/WasmBatchRunner.sh | New script to run multiple suite zips in one work item with per-suite upload directories. |

radekdoulik and others added 2 commits March 26, 2026 18:06
Reduce Helix queue pressure by grouping ~172 individual WASM CoreCLR
library test work items into ~23 batched work items (87% reduction).

Changes:
- Add eng/testing/WasmBatchRunner.sh: batch runner that extracts and
  runs multiple test suites sequentially within a single work item,
  with per-suite result isolation
- Add greedy bin-packing inline MSBuild task (_GroupWorkItems) that
  distributes test archives into balanced batches by file size
- Add _AddBatchedWorkItemsForLibraryTests target gated on
  WasmBatchLibraryTests property (defaults true for CoreCLR+Chrome)
- Sample apps excluded from batching, kept as individual work items
- Can be disabled with /p:WasmBatchLibraryTests=false

Expected impact:
- 172 → ~23 Helix work items (87% queue pressure reduction)
- ~6% machine time savings (~26 minutes)
- Longest batch ~18 minutes (well-balanced bin-packing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove unused EXECUTION_DIR variable from WasmBatchRunner.sh
- Use PayloadArchive (ZIP) instead of PayloadDirectory to pass
  sendtohelixhelp.proj validation
- Use HelixCommand with RunTests.sh→WasmBatchRunner.sh substitution
  to preserve env var setup and pre-commands

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@radekdoulik radekdoulik force-pushed the batch-wasm-coreclr-library-tests branch from cf67805 to a751466 Compare March 26, 2026 17:09

Batch--1 (1 item) and Batch-5 (8 items) timed out in CI because
the 2min/suite formula was too aggressive. System.IO.Compression
alone takes 11m, System.Security.Cryptography takes 17m, and
Microsoft.Bcl.Memory takes 6m. With 19/21 batches passing and
the longest at 17m24s, a 30m minimum provides adequate headroom.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 26, 2026 22:14
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

@radekdoulik
Member Author

/azp run runtime

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

- Restore HELIX_WORKITEM_UPLOAD_ROOT after batch loop for post-commands
- Clean up extracted suite directories to free disk between suites
- Remove stale batch staging directory before creating new batches
- Fix stale timeout comment to match actual values (20m/suite, 30m min)
- Remove dead V8/Firefox conditions (batching only runs for Chrome)
- Remove unused System.Linq import

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@radekdoulik
Member Author

Copilot's analysis:

● 🎉 All 21/21 batches PASSED!

  ┌─────────────────────┬────────────────────┬─────────────────┬─────────────────────┐
  │ Metric              │ This run           │ Previous run    │ Unbatched baseline  │
  ├─────────────────────┼────────────────────┼─────────────────┼─────────────────────┤
  │ Work items          │ 21                 │ 21              │ 172                 │
  ├─────────────────────┼────────────────────┼─────────────────┼─────────────────────┤
  │ Passed              │ 21/21              │ 19/21           │ —                   │
  ├─────────────────────┼────────────────────┼─────────────────┼─────────────────────┤
  │ Total machine time  │ 192m (3.2 hrs)     │ 183m (3.0 hrs)  │ 437m (7.3 hrs)      │
  ├─────────────────────┼────────────────────┼─────────────────┼─────────────────────┤
  │ Longest batch       │ Batch-5: 24m42s    │ Batch-8: 17m24s │ —                   │
  ├─────────────────────┼────────────────────┼─────────────────┼─────────────────────┤
  │ Wall-clock (approx) │ ~25m               │ ~25m            │ ~25m + queue spread │
  └─────────────────────┴────────────────────┴─────────────────┴─────────────────────┘

  Summary: All batches pass with the new timeout formula. Total machine time is 192 minutes — 56% savings vs unbatched (437m). Batch-5 is the longest at ~25m, well within the 30m minimum timeout. The previous
  run's 2 timeout failures are resolved.

@radekdoulik
Member Author

The test counts look good too:

● Here are the results across all 3 runs:

  ┌──────────────────┬────────────────────┬──────────────────┬─────────────────┐
  │                  │ Unbatched baseline │ Previous (19/21) │ Current (21/21) │
  ├──────────────────┼────────────────────┼──────────────────┼─────────────────┤
  │ Work items       │ 172                │ 21               │ 21              │
  ├──────────────────┼────────────────────┼──────────────────┼─────────────────┤
  │ Suites           │ 172                │ 169              │ 172             │
  ├──────────────────┼────────────────────┼──────────────────┼─────────────────┤
  │ Tests run        │ 259,527            │ 253,647          │ 259,527         │
  ├──────────────────┼────────────────────┼──────────────────┼─────────────────┤
  │ Passed           │ 256,963            │ 251,910          │ 256,963         │
  ├──────────────────┼────────────────────┼──────────────────┼─────────────────┤
  │ Failed           │ 0                  │ 0                │ 0               │
  ├──────────────────┼────────────────────┼──────────────────┼─────────────────┤
  │ Skipped          │ 2,564              │ 1,737            │ 2,564           │
  ├──────────────────┼────────────────────┼──────────────────┼─────────────────┤
  │ Machine time     │ 437m (7.3h)        │ 183m (3.0h)      │ 192m (3.2h)     │
  └──────────────────┴────────────────────┴──────────────────┴─────────────────┘

  Key takeaways:

   - Current run matches the unbatched baseline exactly: 259,527 tests run, 256,963 passed, 0 failed
   - Previous run was missing 3 suites (5,880 tests) due to 2 batch timeouts — now fully resolved
   - 56% machine time savings with zero test loss

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 1, 2026 19:26
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Copilot AI review requested due to automatic review settings April 7, 2026 13:49
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Copilot AI review requested due to automatic review settings April 9, 2026 21:37
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

@github-actions
Contributor

github-actions bot commented Apr 9, 2026

Note

This review was generated by Copilot (Claude Opus 4.6 + Claude Sonnet 4.6 multi-model review).

🤖 Copilot Code Review — PR #126157

Holistic Assessment

Motivation: Reducing Helix queue pressure from ~172 to ~21 work items (88% reduction) is a well-justified infrastructure improvement. Helix per-work-item overhead (scheduling, payload download, artifact upload, queue slot acquisition) is real and documented. The claimed 56% total machine time savings (~245 minutes) is plausible and supported by the per-batch breakdown in the PR description.

Approach: Greedy bin-packing (Longest Processing Time) by file size is a well-known heuristic for makespan minimization and is appropriate here. The WasmBatchRunner.sh wrapper is a thin loop around each suite's existing RunTests.sh, keeping the coupling surface minimal. The HelixCommand.Replace('./RunTests.sh', ...) approach correctly preserves the env var setup and dev-certs chain from the original command. Gating behind WasmBatchLibraryTests (defaulting true only for CoreCLR+Chrome) limits blast radius, and the opt-out via /p:WasmBatchLibraryTests=false is a good escape hatch.

Summary: ⚠️ Needs Human Review. The implementation is structurally sound and addresses all feedback from previous review rounds (timeout calibration, HELIX_WORKITEM_UPLOAD_ROOT restoration, stale batch staging cleanup). No blocking correctness bugs were found by the primary reviewer. However, one sub-agent flagged a potential timeout regression for large solo batches that warrants human verification, and a few robustness improvements are recommended. A human reviewer should confirm the timeout behavior for isolated large suites (>50 MB) against actual CI data.


Detailed Findings

⚠️ Timeout for Large Solo Batches — Verify Against Actual CI Data

Files: ComputeBatchTimeout.cs, GroupWorkItems.cs

Large items (>50 MB) are isolated into solo batches with negative BatchIds. Because each solo batch contains exactly one suite, ComputeBatchTimeout computes Math.Max(30, 1 * 20) = 30 minutes. The pre-batching per-item timeout for browser CoreCLR is 90 minutes (sendtohelixhelp.proj:42).

If any of the large solo suites currently needs more than 30 minutes (e.g., System.Security.Cryptography.Tests runs ~17 min, but CI variability exists), this is a regression. The PR description's per-batch breakdown shows Batch--1 at 13m29s — well under 30 min — but a human should verify this holds under CI load variability.

Consider either: passing the original _workItemTimeout to ComputeBatchTimeout for count == 1 solo batches, or verifying the 30-minute minimum is adequate for the largest suites with margin.

(Flagged by Claude Sonnet 4.6)


✅ HelixCommand Substitution — Correct

HelixCommand for non-Windows browser tests is constructed as <prefix> && dotnet dev-certs https && ./RunTests.sh (from sendtohelixhelp.proj:207-223). Since IncludeHelixCorrelationPayload is false in sendtohelix-browser.targets:57, there is no --runtime-path suffix — so HelixCommand always ends with exactly ./RunTests.sh. The .Replace('./RunTests.sh', 'chmod +x WasmBatchRunner.sh && ./WasmBatchRunner.sh') correctly targets this literal string. The build-time <Error> guard (line 347) validates the replacement succeeded.

(Confirmed by both Claude Opus 4.6 and Claude Sonnet 4.6)
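
To make the substitution and its guard concrete, here is a small shell analogue. The real change is an MSBuild `.Replace()` call followed by an `<Error>` check, so this sketch only mirrors the idea. The function name `to_batch_command` and the example prefix are assumptions. It swaps only a trailing `./RunTests.sh`, relying on the observation above that the command always ends with that string.

```shell
#!/usr/bin/env bash
# Swap the trailing ./RunTests.sh for the batch runner, keeping the prefix intact.
# Prints the new command, or fails if the expected suffix is missing (the guard).
to_batch_command() {
    local helix_command=$1
    local suffix='./RunTests.sh'
    local replacement='chmod +x WasmBatchRunner.sh && ./WasmBatchRunner.sh'

    if [[ "$helix_command" != *"$suffix" ]]; then
        echo "ERROR: HelixCommand does not end with $suffix" >&2
        return 1
    fi
    printf '%s%s\n' "${helix_command%"$suffix"}" "$replacement"
}

# Hypothetical prefix; the real one is generated by the build.
to_batch_command 'export EXAMPLE_VAR=1 && dotnet dev-certs https && ./RunTests.sh'
```

Failing loudly when the suffix is absent is what keeps a silent no-op replacement from shipping the unbatched command inside a batched work item.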


✅ Batch Runner Script — Well-Structured

WasmBatchRunner.sh correctly:

  • Preserves ORIGINAL_UPLOAD_ROOT before the loop and restores it after (line 80)
  • Isolates each suite's test results via per-suite HELIX_WORKITEM_UPLOAD_ROOT subdirectories
  • Captures unzip exit codes and handles extraction failures gracefully
  • Cleans up extracted suite directories after each run (rm -rf "$suiteDir") to manage disk pressure
  • Runs all suites even if some fail, then reports aggregate results
  • Exits 1 if any suite failed, 0 otherwise

The for zipFile in "$BATCH_DIR"/*.zip glob correctly matches only suite ZIPs (not WasmBatchRunner.sh itself).

(Confirmed by both reviewers)


✅ Bin-Packing Algorithm — Sound

GroupWorkItems.cs implements standard LPT greedy bin-packing: sort items by size descending, assign each to the batch with the smallest current total. Large items (>50 MB) are isolated into solo batches with negative IDs. The File.Exists check gracefully handles missing files (defaults to size 0). The defensive clamp on BatchSize and LargeThreshold prevents bad MSBuild property values from causing exceptions.
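
The LPT heuristic described here can be sketched in a few lines of shell. This is not GroupWorkItems.cs. The function name `group_work_items`, the input format ("size name" per line), the fixed batch count parameter, and the way solo IDs count down from -1 are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Usage: group_work_items BATCH_COUNT LARGE_THRESHOLD < lines of "size name"
# Prints "batch_id<TAB>name" per item. Oversized items get solo negative IDs.
group_work_items() {
    local batch_count=$1 large_threshold=$2
    local -a totals=()
    local i size name best solo_id=-1

    for (( i = 0; i < batch_count; i++ )); do
        totals[i]=0
    done

    # Largest first, then always place the item in the currently lightest batch.
    sort -k1,1nr | while read -r size name; do
        if (( size > large_threshold )); then
            printf '%s\t%s\n' "$solo_id" "$name"
            solo_id=$(( solo_id - 1 ))
            continue
        fi
        best=0
        for (( i = 1; i < batch_count; i++ )); do
            if (( totals[i] < totals[best] )); then
                best=$i
            fi
        done
        totals[best]=$(( totals[best] + size ))
        printf '%s\t%s\n' "$best" "$name"
    done
}

# Example with made-up sizes (arbitrary units) and a threshold of 50.
printf '%s\n' '5 C.Tests.zip' '60 Big.Tests.zip' '3 D.Tests.zip' \
              '10 A.Tests.zip' '8 B.Tests.zip' | group_work_items 2 50
```

In the example, the oversized archive lands alone in batch -1 and the remaining four are split so that both regular batches end up with a total size of 13.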


✅ Build Integration — Correct

The HelixTestTasksAssemblyPath in Directory.Build.props includes $(NetCoreAppToolCurrent) in the path, matching the default AppendTargetFrameworkToOutputPath=true behavior of the project. The pretest.proj conditionally builds it for TargetOS == 'browser'. The UsingTask declarations correctly reference the assembly path.


✅ Stale Staging Cleanup — Addressed

RemoveDir Directories="$(IntermediateOutputPath)helix-batches/" (line 329) cleans stale batch staging from previous runs before creating fresh directories. This prevents stale ZIPs from previous groupings from being repackaged.


💡 Add Existence Check for RunTests.sh

File: eng/testing/WasmBatchRunner.sh

If a suite ZIP is malformed or lacks RunTests.sh, the chmod +x RunTests.sh silently fails and ./RunTests.sh exits 127 with a cryptic shell error. Adding an explicit check would improve CI diagnostics:

if [[ ! -f "RunTests.sh" ]]; then
    echo "ERROR: RunTests.sh not found in $suiteDir after extraction"
    suiteExitCode=1
    # ... handle as failure
fi

This is a robustness improvement, not blocking — in practice all test suite ZIPs contain RunTests.sh.

(Flagged by Claude Sonnet 4.6)


💡 Consider Log.LogWarning for Missing ZIP Files

File: GroupWorkItems.cs

if (File.Exists(item.ItemSpec))
    size = new FileInfo(item.ItemSpec).Length;

A missing ZIP is silently treated as 0 bytes and packed into a batch. If a test archive wasn't built (e.g., due to a prior build failure), a Log.LogWarning when File.Exists returns false would make this visible at build time instead of at Helix runtime.

(Flagged by Claude Sonnet 4.6)


💡 Consider Explicit <Timeout> in HelixWorkItem

File: sendtohelix-browser.targets (lines 354-357)

<HelixWorkItem Include="@(_WasmTimedBatchItem)">
    <PayloadArchive>...</PayloadArchive>
    <Command>$(_WasmBatchHelixCommand)</Command>
    <!-- Timeout flows through from _WasmTimedBatchItem metadata -->
</HelixWorkItem>

The Timeout metadata flows through via MSBuild item metadata passthrough, which works correctly. However, every other HelixWorkItem in this file has an explicit <Timeout> child element. Adding <Timeout>%(Timeout)</Timeout> would make the intent explicit and match the surrounding style.

(Flagged by Claude Sonnet 4.6)

Generated by Code Review for issue #126157

@radekdoulik
Member Author

Let's get this in to relieve CI. I will do another batch of improvements in a follow-up PR, where I would also like to enable it on the Mono legs.

@radekdoulik radekdoulik merged commit 6adf63b into main Apr 10, 2026
170 of 176 checks passed
@radekdoulik radekdoulik deleted the batch-wasm-coreclr-library-tests branch April 10, 2026 10:14