ci: safely split slow CLI coverage suites

## Problem Statement

The `cli-tests` job in `CI / Pull Request` currently takes close to six minutes, with nearly all of that time spent in the `Run CLI coverage` step rather than checkout, dependency install, or build setup.

From PR #4887's passing run (`CI / Pull Request`, run `27055172682`, job `79858083592`):

- `Install dependencies`: about 11 seconds
- `Build TypeScript plugin`: effectively negligible
- `Run CLI coverage`: about 5 minutes 36 seconds
- Inside that shell step, clean + `npm run build:cli` + sourcemap validation took about 10 seconds, leaving roughly 5 minutes 25 seconds for `npx vitest run --project cli --coverage ...`

The `cli` Vitest project is broad: it currently covers about 505 test files and about 6,500 test cases. A local run without coverage still took about 4 minutes 28 seconds, even though that run hit some failures specific to the local environment. The cost appears to come from the subprocess-heavy and timeout/retry-heavy CLI test surface, plus V8 coverage overhead.

The biggest local timing hotspots were:

- `test/cli.test.ts`
- `test/sandbox-connect-inference.test.ts`
- `test/nemoclaw-start.test.ts`
- `test/policies.test.ts`
- `test/onboard-selection.test.ts`
- `test/onboard.test.ts`
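
A ranking like the one above could be reproduced in CI from Vitest's JSON reporter output. The sketch below assumes the report exposes a `testResults` array whose entries carry `name`, `startTime`, and `endTime` in milliseconds; the helper name `slowestFiles` and the sample values are hypothetical and are not measured timings.

```typescript
// Assumed subset of Vitest's JSON reporter output; only these fields are used.
interface FileResult {
  name: string; // test file path
  startTime: number; // epoch milliseconds
  endTime: number; // epoch milliseconds
}

interface JsonReport {
  testResults: FileResult[];
}

interface FileTiming {
  file: string;
  seconds: number;
}

// Rank test files by wall-clock duration, slowest first.
function slowestFiles(report: JsonReport, limit: number): FileTiming[] {
  return report.testResults
    .map((r) => ({ file: r.name, seconds: (r.endTime - r.startTime) / 1000 }))
    .sort((a, b) => b.seconds - a.seconds)
    .slice(0, limit);
}

// Placeholder data for illustration only; these are not measured timings.
const sample: JsonReport = {
  testResults: [
    { name: "test/a.test.ts", startTime: 0, endTime: 1500 },
    { name: "test/b.test.ts", startTime: 0, endTime: 42000 },
    { name: "test/c.test.ts", startTime: 1000, endTime: 9000 },
  ],
};

console.log(slowestFiles(sample, 2));
```

Because Vitest runs files in parallel workers, per-file durations overlap, so this ranking identifies candidates for a heavy shard rather than predicting the wall-clock time of any split.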

This is not a correctness regression from PR #4887, but the workflow split made the long-running CLI coverage job more visible.

## Proposed Design

Split or shard the slow CLI coverage work only in a way that preserves the current behavior contract. The goal is lower wall-clock time without changing which checks run, which tests are selected, or how coverage ratchets are enforced.

Suggested safe rollout:

1. Add measurement before changing behavior.
   - Split the current `Run CLI coverage` shell block into separately timed CI steps, or emit explicit timing markers for build, sourcemap validation, Vitest, and coverage ratchet.
   - Optionally upload Vitest JSON timing output as an artifact for the `cli` project.
   - Keep this as a no-behavior-change PR so baseline timing is visible on GitHub Actions.

2. Classify the expensive CLI test surface by ownership and failure mode.
   - Identify subprocess-heavy CLI command tests, retry/timeout simulation tests, and lower-level unit-style CLI tests.
   - Candidate heavy suites based on current timing: `test/cli.test.ts`, `test/sandbox-connect-inference.test.ts`, and `test/nemoclaw-start.test.ts`.
   - Avoid moving tests based only on filename convenience; preserve the current semantic coverage of CLI command behavior, sandbox lifecycle handling, inference preflights, and retry/timeout behavior.

3. Introduce explicit Vitest projects or CI shards.
   - Example shape: keep fast root/unit-style CLI tests in one project/job and move subprocess/timeout-heavy integration-style CLI tests into another project/job.
   - Alternatively use Vitest sharding if test selection remains deterministic and easy to audit.
   - Ensure every test currently selected by `--project cli` is selected exactly once unless a duplicate is intentionally removed.

4. Preserve aggregate coverage semantics.
   - Do not enforce separate per-shard coverage ratchets that could hide aggregate regressions.
   - Produce coverage reports from each shard and merge them before running the existing `scripts/check-coverage-ratchet.ts` checks, or otherwise prove the ratchet sees the same aggregate coverage data it sees today.
   - Keep both CLI coverage threshold files in force: `ci/coverage-threshold-cli-summary.json` and `ci/coverage-threshold-cli-files.json`.

5. Prove equivalence before deleting the old path.
   - On the migration PR, run the old monolithic `npx vitest run --project cli --coverage ...` path and the proposed split path at least once, then compare:
     - test file selection
     - total test count
     - failed/skipped test behavior
     - coverage summary
     - coverage ratchet result
   - After equivalence is demonstrated, remove the duplicate monolithic run so CI does not stay permanently more expensive.

6. Keep required-check behavior explicit.
   - Update the final PR aggregate `checks` job so all split CLI jobs are required when code changes are present.
   - Confirm docs-only PR routing is unchanged and does not start the CLI jobs.
   - Add or update workflow contract tests so the final aggregate cannot pass if one CLI shard is skipped, renamed, or omitted unexpectedly.
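
The "selected exactly once" requirement from steps 3 and 5 can be checked mechanically. The following is a minimal sketch, assuming the test file lists for the monolithic run and for each split project have already been collected as arrays of paths; the helper `compareSelection` and the shard names `cli-fast` and `cli-heavy` are hypothetical.

```typescript
interface SelectionReport {
  missing: string[]; // in the baseline, selected by no shard
  duplicated: string[]; // selected by more than one shard
  unexpected: string[]; // selected by a shard but absent from the baseline
}

// Compare the monolithic selection against the union of the split selections.
function compareSelection(
  baseline: readonly string[],
  shards: Readonly<Record<string, readonly string[]>>,
): SelectionReport {
  const counts = new Map<string, number>();
  for (const shardName of Object.keys(shards)) {
    // De-duplicate within a shard so only cross-shard overlap is counted.
    const unique = Array.from(new Set(shards[shardName]));
    for (const file of unique) {
      counts.set(file, (counts.get(file) ?? 0) + 1);
    }
  }

  const baselineSet = new Set(baseline);
  const selected = Array.from(counts.keys());

  return {
    missing: Array.from(baselineSet).filter((f) => !counts.has(f)).sort(),
    duplicated: selected.filter((f) => (counts.get(f) ?? 0) > 1).sort(),
    unexpected: selected.filter((f) => !baselineSet.has(f)).sort(),
  };
}

const report = compareSelection(
  ["test/cli.test.ts", "test/policies.test.ts", "test/onboard.test.ts"],
  {
    "cli-fast": ["test/policies.test.ts"],
    "cli-heavy": ["test/cli.test.ts", "test/policies.test.ts"],
  },
);

// Here test/onboard.test.ts is missing and test/policies.test.ts is duplicated.
console.log(report);
```

A migration PR could fail the check whenever any of the three arrays is non-empty, with deliberate duplicate removals listed in an explicit allowlist.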

## Alternatives Considered

- Only optimize individual slow tests. This may still be worthwhile, especially for timeout/retry simulations, but it will take longer to pay down and does not reduce wall-clock time as reliably as parallelizing independent work.
- Shard by test count alone. This is simpler, but can make ownership and coverage debugging harder if related CLI command tests move between shards unpredictably.
- Split coverage thresholds per shard. This is risky because a shard-local threshold can pass while aggregate CLI coverage behavior changes.
- Exclude slow suites from PR CI. This should not be done; the current behavior checks are valuable and should remain part of the code-change gate.
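
To illustrate the risk with per-shard thresholds, the sketch below uses a deliberately simplified coverage model: each shard maps a file to per-line hit counts, and merging sums the counts. This is an assumption for illustration, not the real V8 or Istanbul report format, and the names `mergeCoverage` and `linePct` are hypothetical.

```typescript
// Simplified model: file path -> line number -> hit count.
type LineHits = Record<number, number>;
type ShardCoverage = Record<string, LineHits>;

// Merge shards by summing hit counts per line.
function mergeCoverage(shards: readonly ShardCoverage[]): ShardCoverage {
  const merged: ShardCoverage = {};
  for (const shard of shards) {
    for (const file of Object.keys(shard)) {
      const target: LineHits = merged[file] ?? {};
      merged[file] = target;
      for (const [line, hits] of Object.entries(shard[file])) {
        const n = Number(line);
        target[n] = (target[n] ?? 0) + hits;
      }
    }
  }
  return merged;
}

// Percentage of lines with at least one hit.
function linePct(coverage: ShardCoverage): number {
  let total = 0;
  let covered = 0;
  for (const file of Object.keys(coverage)) {
    for (const hits of Object.values(coverage[file])) {
      total += 1;
      if (hits > 0) covered += 1;
    }
  }
  return total === 0 ? 100 : (covered / total) * 100;
}

const shardA: ShardCoverage = { "src/a.ts": { 1: 1, 2: 1, 3: 0, 4: 0 } };
const shardB: ShardCoverage = { "src/a.ts": { 1: 0, 2: 0, 3: 2, 4: 0 } };
// Shard B now exercises line 1 instead of line 3: its own percentage is unchanged.
const shardBAfter: ShardCoverage = { "src/a.ts": { 1: 1, 2: 0, 3: 0, 4: 0 } };

console.log(linePct(shardB), linePct(shardBAfter)); // 25 25
console.log(linePct(mergeCoverage([shardA, shardB]))); // 75
console.log(linePct(mergeCoverage([shardA, shardBAfter]))); // 50
```

In this model the shard-local percentage for shard B stays at 25 while the merged percentage drops from 75 to 50, which is the kind of aggregate regression a per-shard threshold would not catch.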

## Acceptance Criteria

- The split keeps all checks and tests that the current `cli-tests` job performs for code PRs.
- Each existing `cli` project test is either selected exactly once or explicitly documented as a deliberate duplicate removal.
- Aggregate CLI coverage ratchets still run against equivalent merged coverage data.
- The final PR aggregate `checks` job requires every new CLI shard/job for code changes.
- Docs-only PR behavior remains unchanged.
- A migration PR includes before/after timing for the same commit or equivalent commit pair.
- Wall-clock time for the CLI coverage portion is reduced without weakening the gate.
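
The criterion that the aggregate `checks` job requires every CLI shard could be backed by a workflow contract test. The sketch below assumes the workflow YAML has already been parsed into a plain object by a separate step, and that split jobs share the `cli-tests` id prefix; the job ids `cli-tests-fast` and `cli-tests-heavy` and the helper `missingCliNeeds` are hypothetical.

```typescript
// Minimal shape of a parsed GitHub Actions workflow; `needs` may be a string or a list.
interface Job {
  needs?: string | string[];
}

interface Workflow {
  jobs: Record<string, Job>;
}

// Return CLI job ids that the aggregate job does not list under `needs`.
function missingCliNeeds(
  workflow: Workflow,
  aggregateJob = "checks",
  cliPrefix = "cli-tests",
): string[] {
  const aggregate = workflow.jobs[aggregateJob];
  if (!aggregate) {
    throw new Error(`aggregate job "${aggregateJob}" not found`);
  }
  const needs = new Set(([] as string[]).concat(aggregate.needs ?? []));
  return Object.keys(workflow.jobs)
    .filter((id) => id.startsWith(cliPrefix) && !needs.has(id))
    .sort();
}

const workflow: Workflow = {
  jobs: {
    "cli-tests-fast": {},
    "cli-tests-heavy": {},
    checks: { needs: ["cli-tests-fast"] },
  },
};

// cli-tests-heavy is not required by the aggregate job in this example.
console.log(missingCliNeeds(workflow));
```

This only verifies the `needs` wiring. Whether the aggregate job fails when a required shard is skipped depends on how it inspects job results, which would need its own assertion.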

## Category

Testing

## Checklist

- [x] I searched existing issues and this is not a duplicate
- [x] This is a design proposal, not a "please build this" request

