Skip to content

test(e2e): retire docker-unreachable gateway script#5119

Merged
cv merged 4 commits into
mainfrom
codex/e2e-5113-docker-unreachable-retire
Jun 10, 2026
Merged

test(e2e): retire docker-unreachable gateway script#5119
cv merged 4 commits into
mainfrom
codex/e2e-5113-docker-unreachable-retire

Conversation

@cv

@cv cv commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Summary

Retires the legacy Docker-unreachable gateway-start shell regression now that the caller-level contract is covered by a focused Vitest process test. The new test drives startGateway() through a PATH-shimmed openshell binary and proves the fallback exits before health polling, status/info probes, diagnostics, or cleanup.

Related Issue

Closes #5113
Refs #5098
Refs #4355
Refs #2347

Changes

  • Adds test/onboard-gateway-docker-unreachable.test.ts with the former shell trace assertions as a focused Vitest process regression.
  • Removes the stale it.todo coverage-gap block from src/lib/onboard/gateway-start-failure-integration.test.ts and points helper-level coverage to the new caller-level test.
  • Deletes test/e2e/test-docker-unreachable-gateway-start.sh and removes its regression-e2e.yaml dispatch lane.
  • Removes the deleted shell script from legacy-inventory.json so the inventory continues to track current direct legacy entrypoints; the retirement rationale is captured in this PR and linked issue trail.
  • Adds a workflow-contract test proving the retired job is not advertised or selected.
  • Lowers the test/onboard.test.ts legacy line budget after moving the regression out.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npm run build:cli

  • npx vitest run --project cli test/onboard-gateway-docker-unreachable.test.ts src/lib/onboard/gateway-start-failure-integration.test.ts test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts test/regression-e2e-workflow.test.ts --silent=false --reporter=default

  • HOME=/home/cvillela/.cache/nemoclaw-makecheck-nosign XDG_CONFIG_HOME=/home/cvillela/.cache/nemoclaw-makecheck-nosign/.config npx prek run --files .github/workflows/regression-e2e.yaml ci/test-file-size-budget.json test/onboard-gateway-docker-unreachable.test.ts test/onboard.test.ts src/lib/onboard/gateway-start-failure-integration.test.ts test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts test/e2e-scenario/migration/legacy-inventory.json test/regression-e2e-workflow.test.ts

  • HOME=/home/cvillela/.cache/nemoclaw-makecheck-nosign XDG_CONFIG_HOME=/home/cvillela/.cache/nemoclaw-makecheck-nosign/.config npx prek run --files test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts test/e2e-scenario/migration/legacy-inventory.json

  • HOME=/home/cvillela/.cache/nemoclaw-makecheck-nosign XDG_CONFIG_HOME=/home/cvillela/.cache/nemoclaw-makecheck-nosign/.config npm run test-size:check

  • git diff --check

  • npx prek run --all-files passes

  • npm test passes

  • Tests added or updated for new or changed behavior

  • No secrets, API keys, or credentials committed

  • Docs updated for user-facing behavior changes

  • npm run docs builds without warnings (doc changes only)

  • Doc pages follow the style guide (doc changes only)

  • New doc pages include SPDX header and frontmatter (new pages only)


Signed-off-by: Carlos Villela cvillela@nvidia.com

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv added the area: e2e End-to-end tests, nightly failures, or validation infrastructure label Jun 10, 2026
@cv cv self-assigned this Jun 10, 2026
@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR retires the shell-based docker-unreachable-gateway-start-e2e regression test by removing its workflow job definition, replacing it with a higher-fidelity Vitest integration test, marking the legacy script as retired in the migration inventory, and adding validation to ensure clean retirement.

Changes

Docker-unreachable test retirement and Vitest migration

Layer / File(s) Summary
Remove docker-unreachable job from regression E2E workflow
.github/workflows/regression-e2e.yaml, ci/test-file-size-budget.json
The workflow dispatch jobs input no longer lists docker-unreachable-gateway-start-e2e. Job selection outputs and conditional branch are removed. The entire job definition is deleted. File size budget for test/onboard.test.ts is adjusted from 4874 to 4783 lines.
Add Vitest integration test for docker-unreachable failure path
test/onboard-gateway-docker-unreachable.test.ts, test/onboard.test.ts, src/lib/onboard/gateway-start-failure-integration.test.ts
New Vitest suite spawns startGateway in a child process with mocked openshell and p-retry, verifying fast-fail exit code 1, recovery guidance in stderr, absence of health polling and status probes, and trace-based absence of forbidden post-start commands. Old equivalent test removed from test/onboard.test.ts. Integration test documentation updated to clarify helper-level contract coverage strategy and two-layer approach.
Mark legacy script as retired in inventory system
test/e2e-scenario/migration/legacy-inventory.json, test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts
Script entry transitioned from not-migrated to retired with deletionReady: true and deletionApprovalIssue: "#5098". Deletion criteria now accept either #4357 or #5098 as approval. Tests enforce non-retired scripts exist and retired scripts do not; new test validates deletion evidence fields (retiredReason, deletionApprovalIssue, notes required).
Add test validating workflow retirement
test/regression-e2e-workflow.test.ts
New regression test loads regression workflow YAML and asserts the retired identifier docker-unreachable-gateway-start-e2e does not appear in dispatch input description, job keys, or job-selection script.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5109: Retired the shell-based test originally protecting issue #2347; this PR completes the migration to Vitest integration coverage.
  • NVIDIA/NemoClaw#5052: Both PRs modify the E2E migration-inventory validation logic and legacy script retirement tracking in e2e-migration-inventory.test.ts and legacy-inventory.json.

Suggested labels

area: ci, chore

Poem

🐰 A test that once ran through bash and shell,
Now thrives as Vitest—TS does it well!
The docker-unreachable path finds light,
In child processes and traces, burning bright.
Shell scripts retired, their job now complete,
The inventory knows: deletion's sweet! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning The PR partially addresses linked issue #5113 by removing the legacy shell test and adding a caller-level Vitest test, but does not extract the proposed attemptGatewayStart helper or convert the three it.todo contracts into concrete tests. Extract attemptGatewayStart as a dependency-injected helper from startGatewayWithOptions, convert the three it.todo entries into concrete it() blocks that assert the specified conditions, and verify startGatewayWithOptions behavior remains unchanged.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: retiring the docker-unreachable gateway shell script test in favor of a focused Vitest process regression.
Out of Scope Changes check ✅ Passed All changes are scoped to retiring the legacy shell script and setting up retirement infrastructure; no unrelated modifications detected.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/e2e-5113-docker-unreachable-retire

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: gateway-health-honest-e2e, gateway-drift-preflight-e2e

Dispatch hint: gateway-health-honest-e2e,gateway-drift-preflight-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None.

Optional E2E

  • gateway-health-honest-e2e (medium): Optional adjacent sandbox-lifecycle regression check through the modified regression-e2e workflow selector. It exercises gateway-start failure honesty, but the PR only changes tests/workflow coverage, not runtime gateway code.
  • gateway-drift-preflight-e2e (medium): Optional smoke of another sandbox-lifecycle regression lane to confirm the edited regression workflow still dispatches remaining selected jobs correctly.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: regression-e2e.yaml
  • jobs input: gateway-health-honest-e2e,gateway-drift-preflight-e2e

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. No Vitest scenario E2E run is needed. The only scenario-suite file changed is the legacy migration/deletion inventory, which does not affect live scenario runtime support, manifests, typed scenario catalog, expected-state definitions, shared fixtures, or the e2e-vitest-scenarios workflow. Other changed files are regression/legacy E2E workflow or non-scenario unit/integration tests and are outside this advisor's scope.

Optional scenario E2E

  • None.

Relevant changed files

  • test/e2e-scenario/migration/legacy-inventory.json

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 0 still apply, 0 new items found

Consider writing more tests for
  • **Runtime validation** — Regression workflow dispatch selector keeps all remaining lanes selectable when inputs.jobs is empty and the retired docker-unreachable-gateway-start-e2e name no longer maps to an output.. The PR changes workflow/infrastructure surfaces and retires a shell E2E entrypoint, but it adds focused caller-level Vitest coverage and a static workflow contract test. No additional blocking test gap was identified.
  • **Runtime validation** — Caller-level Docker-unreachable test rejects post-gateway-start gateway select commands if future cleanup paths start selecting or mutating the failed gateway after the abort.. The PR changes workflow/infrastructure surfaces and retires a shell E2E entrypoint, but it adds focused caller-level Vitest coverage and a static workflow contract test. No additional blocking test gap was identified.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

cv added 2 commits June 10, 2026 00:37
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv

cv commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

Direction on migration evidence and legacy-inventory.json

Cross-linking the post-#5106 plan from #5098: #5098 (comment)

For this PR specifically, I would not treat legacy-inventory.json as the durable audit ledger we have to grow forever. The useful invariant is simpler:

A PR that deletes a legacy E2E script must show replacement Vitest coverage or explain why the contract is intentionally retired.

This PR can satisfy that in the PR body, linked issue, and review trail. If we keep a repo-level guard, I’d prefer a lightweight deletion guard over a detailed static inventory. A constantly mutating JSON file that tracks every script’s status is likely to drift and contradict the live issue/PR state.

So my recommendation for this PR is:

  • keep the replacement coverage/rationale clear in the PR;
  • do not expand legacy-inventory.json into a richer source of truth;
  • if the current inventory tests force extra bookkeeping, consider replacing that machinery in a follow-up with a smaller guard that only catches script deletions lacking explicit evidence.

This keeps the migration aligned with the one-E2E-system target: Vitest is the harness; shell remains a boundary inside tests only where needed; GitHub issues/PRs remain the source of truth for migration decisions.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv

cv commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed this direction in 33e9601: the deleted shell script is no longer kept as a retired direct-entry row in legacy-inventory.json, and the inventory test is back to requiring direct legacy entries to exist. The deletion rationale now lives in the PR body / linked issue trail, while the branch keeps the focused Vitest caller-level regression and the workflow-contract test for the removed lane.

@cv cv merged commit 44376ec into main Jun 10, 2026
36 checks passed
@cv cv deleted the codex/e2e-5113-docker-unreachable-retire branch June 10, 2026 08:07
@cv cv added the v0.0.63 Release target label Jun 10, 2026
jyaunches added a commit that referenced this pull request Jun 10, 2026
Renames test/fixtures/strict-tool-call-probe-driver.cjs to
.ts and spawns it via tsx to satisfy the codebase-growth
guardrail that forbids newly added .js/.cjs/.mjs files.

The driver body stays JS-shaped (CommonJS-style) because the
embedded `node -e` strings must remain plain JS for the spawned
child processes, and dist/lib/* targets are CJS modules. Uses
`createRequire(import.meta.url)` plus `import.meta.url`-derived
__dirname/__filename so tsx's ESM-default execution can still
require() the built artifacts. `@ts-nocheck` keeps the surface
identical to the retired bash heredoc.

Verified: `tsx test/fixtures/strict-tool-call-probe-driver.ts`
prints all four `[PASS]` markers; `npx vitest run --project cli
test/strict-tool-call-probe.test.ts test/regression-e2e-workflow.test.ts`
passes 3/3; `npx biome check` clean; `tsc --noEmit -p
tsconfig.cli.json` clean for touched files; file-size budget passes.

Refs #5098, #4537, #4349, #5119.
jyaunches added a commit that referenced this pull request Jun 10, 2026
## Summary
Retires the legacy `test/e2e/test-strict-tool-call-probe.sh` regression
and replaces it with a focused Vitest process test that drives the same
caller-level Local Ollama strict Chat Completions tool-call validation
behavior against a hermetic mock endpoint.

The legacy script was process-level mock-driven (`node -e ...` against
in-process module mocks plus a localhost stub) — the same shape
`nemoclaw-e2e-legacy-migrate` Phase 0 explicitly refuses, with a
recommendation to follow #5119's retirement pattern: author the
replacement directly as a Vitest test in `test/`, delete the bash
script, remove the regression-e2e job.

## Related Issue
Refs #5098
Refs #4537
Refs #4349
Refs #5119

## Changes
- Adds `test/strict-tool-call-probe.test.ts` — a Vitest test that spawns
`test/fixtures/strict-tool-call-probe-driver.ts` via `tsx` and asserts
the four legacy `[PASS]` markers (success, onboarding caller,
transient-502 retry, plain-text fail-closed).
- Adds `test/fixtures/strict-tool-call-probe-driver.ts` containing the
former inline `node -e` block from the bash script. The driver is
authored as TypeScript (rather than `.cjs`) per the codebase-growth
guardrail forbidding newly added `.js`/`.cjs`/`.mjs` files; body is
JS-shaped because the embedded `node -e` strings must remain plain
CommonJS for the spawned children, and the dist/lib/* targets are CJS
modules. Uses `createRequire(import.meta.url)` plus
`fileURLToPath`-derived `__dirname` so tsx's ESM-default execution can
still load the CJS dist artifacts. `// @ts-nocheck` keeps the surface
identical to the retired bash heredoc.
- Driver runs in a fresh `tsx` child process so its `curl`-based
validation subprocess is identical to production runtime conditions and
is not affected by Vitest's worker pool, fetch shim, or signal handling.
- Deletes `test/e2e/test-strict-tool-call-probe.sh`.
- Removes the `strict-tool-call-probe-e2e` job, its selector branch,
output declaration, and `Valid:` description entry from
`.github/workflows/regression-e2e.yaml`.
- Adds a workflow-contract test in
`test/regression-e2e-workflow.test.ts` proving the retired job is not
advertised or selected (mirrors the docker-unreachable contract test
added in #5119).
- Removes the deleted shell script's row from
`test/e2e-scenario/migration/legacy-inventory.json` (mirrors #5119's
inventory hygiene); the retirement rationale is captured in this PR and
the linked issue trail.

### Why not live E2E scenario fixtures?
This probe asserts payload shape and retry behavior on production caller
code against a localhost OpenAI-compatible stub. It does not need a live
sandbox lifecycle, scenario registry entry, shared fixture, or matrix
runner. Keeping it as a focused process Vitest test preserves the same
caller boundary without adding another live E2E surface.


### Simplicity check
- Test shape: focused process Vitest test.
- New shared helpers: none.
- New live E2E fixtures/registry/ledger: none.
- Workflow changes: remove the retired regression lane.
- Pre-#5126 compatibility: the `legacy-inventory.json` row is removed
only because the current deletion gate requires the inventory to match
existing `test/e2e/test-*.sh` entry points.

### Build dependency
This test consumes built artifacts from `dist/lib/...` (matches the
existing CLI-shard convention). CI runs `npm run build:cli` before
vitest. For local standalone runs, run `npm run build:cli` first;
without it, the wrapper fails fast with a clear `npm run build:cli`
diagnostic before launching the driver.

## Type of Change
- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- `npm run build:cli`
- `node_modules/.bin/tsx test/fixtures/strict-tool-call-probe-driver.ts`
(legacy parity check — all four `[PASS]` markers print; matches `bash
test/e2e/test-strict-tool-call-probe.sh` output line-for-line)
- `npx vitest run --project cli test/strict-tool-call-probe.test.ts
test/regression-e2e-workflow.test.ts
test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts
test/e2e-scenario/framework-tests/e2e-migration-inventory-lock.test.ts`
→ 10/10 passing
- `npx tsc --noEmit -p tsconfig.cli.json` clean for the touched files
- `npx biome check test/strict-tool-call-probe.test.ts
test/fixtures/strict-tool-call-probe-driver.ts` clean
- `npx tsx scripts/check-test-file-size-budget.ts` passes
- YAML syntax check on `.github/workflows/regression-e2e.yaml` passes

- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Removed strict tool-call probe regression test lane from the
continuous integration workflow
* Updated test infrastructure with modernized validation for Chat
Completions tool-call enforcement on local integration paths
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-authored-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jyaunches added a commit that referenced this pull request Jun 10, 2026
## Summary
Retires the legacy caller-level onboard inference smoke bash regression
now that the same contract is covered by a focused Vitest process test.

## Related Issues
Refs #3253
Refs #4349
Refs #5098
Refs #5119

## Changes
- Adds `test/onboard-inference-smoke.test.ts`, which drives
`setupInference()` in a subprocess with PATH-shimmed `openshell` and
`curl`.
- Verifies onboard probes `/chat/completions`, fails closed on upstream
HTTP 503, surfaces provider/model/API base/credential diagnostics, and
does not print route success after smoke failure.
- Deletes `test/e2e/test-onboard-inference-smoke.sh`.
- Removes the retired `onboard-inference-smoke-e2e` lane from
`regression-e2e.yaml`.
- Updates the post-#5126 frozen legacy E2E shell allowlist and
regression workflow contract test to keep the retired lane out of
dispatch selection.

## Simplicity check
- Test shape: focused process Vitest test.
- New shared helpers: none.
- New framework/registry/ledger: none.
- Workflow changes: remove retired regression lane and update frozen
shell/workflow contract allowlists.

## Verification
- `npm run build:cli`
- `npx vitest run test/onboard-inference-smoke.test.ts`
- `npx tsc --noEmit -p jsconfig.json`
- `npx vitest run --project cli test/onboard-inference-smoke.test.ts
test/regression-e2e-workflow.test.ts test/e2e-script-workflow.test.ts
--silent=false --reporter=default`
- `git diff --check`

Notes:
- A local full hook/all-files run was attempted and failed on existing
environment-dependent failures unrelated to this change (missing nested
`nemoclaw/dist` / `nemoclaw/node_modules/json5`, Docker daemon
unavailable, and several local timeout-sensitive tests). CI should run
the repository's normal validation in the correct environment.

## Type of Change
- [x] Test-only change
- [x] CI/workflow cleanup for retired legacy coverage
- [x] No secrets, API keys, or credentials committed


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
* Migrated onboard inference smoke test from shell script to
Vitest-based test for improved reliability and isolation.

* **Chores**
* Removed `onboard-inference-smoke-e2e` from regression workflow
selectable jobs.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure v0.0.63 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extract attemptGatewayStart helper from startGatewayWithOptions for caller-level testability

2 participants