Skip to content

test(e2e): retire test-strict-tool-call-probe.sh#5153

Merged
jyaunches merged 5 commits into
mainfrom
e2e-retire/test-strict-tool-call-probe
Jun 10, 2026
Merged

test(e2e): retire test-strict-tool-call-probe.sh#5153
jyaunches merged 5 commits into
mainfrom
e2e-retire/test-strict-tool-call-probe

Conversation

@jyaunches

@jyaunches jyaunches commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Summary

Retires the legacy test/e2e/test-strict-tool-call-probe.sh regression and replaces it with a focused Vitest process test that drives the same caller-level Local Ollama strict Chat Completions tool-call validation behavior against a hermetic mock endpoint.

The legacy script was process-level mock-driven (node -e ... against in-process module mocks plus a localhost stub) — the same shape nemoclaw-e2e-legacy-migrate Phase 0 explicitly refuses, with a recommendation to follow #5119's retirement pattern: author the replacement directly as a Vitest test in test/, delete the bash script, remove the regression-e2e job.

Related Issue

Refs #5098
Refs #4537
Refs #4349
Refs #5119

Changes

  • Adds test/strict-tool-call-probe.test.ts — a Vitest test that spawns test/fixtures/strict-tool-call-probe-driver.ts via tsx and asserts the four legacy [PASS] markers (success, onboarding caller, transient-502 retry, plain-text fail-closed).
  • Adds test/fixtures/strict-tool-call-probe-driver.ts containing the former inline node -e block from the bash script. The driver is authored as TypeScript (rather than .cjs) per the codebase-growth guardrail forbidding newly added .js/.cjs/.mjs files; body is JS-shaped because the embedded node -e strings must remain plain CommonJS for the spawned children, and the dist/lib/* targets are CJS modules. Uses createRequire(import.meta.url) plus fileURLToPath-derived __dirname so tsx's ESM-default execution can still load the CJS dist artifacts. // @ts-nocheck keeps the surface identical to the retired bash heredoc.
  • Driver runs in a fresh tsx child process so its curl-based validation subprocess is identical to production runtime conditions and is not affected by Vitest's worker pool, fetch shim, or signal handling.
  • Deletes test/e2e/test-strict-tool-call-probe.sh.
  • Removes the strict-tool-call-probe-e2e job, its selector branch, output declaration, and Valid: description entry from .github/workflows/regression-e2e.yaml.
  • Adds a workflow-contract test in test/regression-e2e-workflow.test.ts proving the retired job is not advertised or selected (mirrors the docker-unreachable contract test added in test(e2e): retire docker-unreachable gateway script #5119).
  • Removes the deleted shell script's row from test/e2e-scenario/migration/legacy-inventory.json (mirrors test(e2e): retire docker-unreachable gateway script #5119's inventory hygiene); the retirement rationale is captured in this PR and the linked issue trail.

Why not live E2E scenario fixtures?

This probe asserts payload shape and retry behavior on production caller code against a localhost OpenAI-compatible stub. It does not need a live sandbox lifecycle, scenario registry entry, shared fixture, or matrix runner. Keeping it as a focused process Vitest test preserves the same caller boundary without adding another live E2E surface.

Simplicity check

  • Test shape: focused process Vitest test.
  • New shared helpers: none.
  • New live E2E fixtures/registry/ledger: none.
  • Workflow changes: remove the retired regression lane.
  • Pre-test(e2e): simplify legacy migration tracking #5126 compatibility: the legacy-inventory.json row is removed only because the current deletion gate requires the inventory to match existing test/e2e/test-*.sh entry points.

Build dependency

This test consumes built artifacts from dist/lib/... (matches the existing CLI-shard convention). CI runs npm run build:cli before vitest. For local standalone runs, run npm run build:cli first; without it, the wrapper fails fast with a clear npm run build:cli diagnostic before launching the driver.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npm run build:cli

  • node_modules/.bin/tsx test/fixtures/strict-tool-call-probe-driver.ts (legacy parity check — all four [PASS] markers print; matches bash test/e2e/test-strict-tool-call-probe.sh output line-for-line)

  • npx vitest run --project cli test/strict-tool-call-probe.test.ts test/regression-e2e-workflow.test.ts test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts test/e2e-scenario/framework-tests/e2e-migration-inventory-lock.test.ts → 10/10 passing

  • npx tsc --noEmit -p tsconfig.cli.json clean for the touched files

  • npx biome check test/strict-tool-call-probe.test.ts test/fixtures/strict-tool-call-probe-driver.ts clean

  • npx tsx scripts/check-test-file-size-budget.ts passes

  • YAML syntax check on .github/workflows/regression-e2e.yaml passes

  • npx prek run --all-files passes

  • npm test passes

  • Tests added or updated for new or changed behavior

  • No secrets, API keys, or credentials committed

  • Docs updated for user-facing behavior changes

  • npm run docs builds without warnings (doc changes only)

Summary by CodeRabbit

  • Chores
    • Removed strict tool-call probe regression test lane from the continuous integration workflow
    • Updated test infrastructure with modernized validation for Chat Completions tool-call enforcement on local integration paths

Replaces the legacy bash regression for the Local Ollama strict Chat
Completions tool-call validation path with a focused Vitest process test
that drives the same caller-level behavior against a hermetic mock.

Per #5119's retirement pattern: caller-level mock-driven probes belong in
test/, not in the scenario-framework or regression-e2e bash workflow. The
typed scenario framework's expected-state model targets steady-state and
disruption-recovery shapes; this probe asserts payload shape and retry
behavior on an in-process module against a localhost stub, which is a
unit-shaped contract — wrong fit for the scenario matrix.

Refs #5098, #4537, #4349, #5119.

Changes:
- Adds test/strict-tool-call-probe.test.ts, a Vitest test that spawns
  test/fixtures/strict-tool-call-probe-driver.cjs and asserts the four
  legacy [PASS] markers (success, onboarding caller, transient-502
  retry, plain-text fail-closed). Driver runs in a fresh node process
  to keep curl-probe behavior identical to production runtime
  conditions and avoid Vitest worker-pool / fetch-shim interference.
- Adds test/fixtures/strict-tool-call-probe-driver.cjs containing the
  former inline `node -e` block from the bash script verbatim, with
  require paths anchored to REPO_ROOT.
- Deletes test/e2e/test-strict-tool-call-probe.sh.
- Removes the strict-tool-call-probe-e2e job, its selector branch,
  output declaration, and `Valid:` description entry from
  .github/workflows/regression-e2e.yaml.
- Adds a workflow-contract test in test/regression-e2e-workflow.test.ts
  proving the retired job is not advertised or selected.
- Removes the deleted shell script's row from
  test/e2e-scenario/migration/legacy-inventory.json (mirrors #5119's
  inventory hygiene); the retirement rationale is captured in this
  commit and the linked issue trail.
@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. No existing E2E job is required: this PR changes only CI workflow/test wiring and migrates a hermetic regression probe from the regression E2E workflow into Vitest coverage. There are no production changes to installer/onboarding, sandbox lifecycle, credentials, network policy, inference routing, deployment, or real assistant user flows.

Optional E2E

  • None.

New E2E recommendations

  • None.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: None
Optional Vitest E2E scenarios: None

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

  • None. Changes retire/migrate a legacy regression E2E lane into non-scenario Vitest tests outside test/e2e-scenario/ and do not touch the Vitest scenario workflow, scenario registry/runtime support, live scenario entry point, or shared scenario fixtures.

Optional Vitest E2E scenarios

  • None.

Relevant changed files

  • None.

Renames test/fixtures/strict-tool-call-probe-driver.cjs to
.ts and spawns it via tsx to satisfy the codebase-growth
guardrail that forbids newly added .js/.cjs/.mjs files.

The driver body stays JS-shaped (CommonJS-style) because the
embedded `node -e` strings must remain plain JS for the spawned
child processes, and dist/lib/* targets are CJS modules. Uses
`createRequire(import.meta.url)` plus `import.meta.url`-derived
__dirname/__filename so tsx's ESM-default execution can still
require() the built artifacts. `@ts-nocheck` keeps the surface
identical to the retired bash heredoc.

Verified: `tsx test/fixtures/strict-tool-call-probe-driver.ts`
prints all four `[PASS]` markers; `npx vitest run --project cli
test/strict-tool-call-probe.test.ts test/regression-e2e-workflow.test.ts`
passes 3/3; `npx biome check` clean; `tsc --noEmit -p
tsconfig.cli.json` clean for touched files; file-size budget passes.

Refs #5098, #4537, #4349, #5119.
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d4d8c809-8112-457d-94a0-f1ee900d6975

📥 Commits

Reviewing files that changed from the base of the PR and between 1e2c800 and d8bdcf2.

📒 Files selected for processing (1)
  • test/e2e-script-workflow.test.ts

📝 Walkthrough

Walkthrough

The PR removes the shell-based strict-tool-call-probe-e2e regression job from the GitHub Actions workflow and replaces it with a hermetic TypeScript driver and Vitest test. The workflow allowlist, job outputs, conditionals, and job definition are updated. The legacy bash script is removed from the test inventory.

Changes

Strict Tool-Call Probe Testing Migration

Layer / File(s) Summary
Retire strict-tool-call-probe-e2e from regression workflow
.github/workflows/regression-e2e.yaml
Remove strict-tool-call-probe-e2e from the workflow dispatch allowlist, job selection outputs, conditional job-routing logic, and job definition block.
Refactor strict-tool-call-probe driver to TypeScript
test/fixtures/strict-tool-call-probe-driver.ts
Replace bash harness with TypeScript/Node driver that sets environment variables, loads compiled dist/lib helpers via createRequire, spawns the onboarding caller child process with stable REPO_ROOT working directory, reformats success logging, and removes legacy bash completion logic.
Add Vitest test for strict-tool-call-probe validation
test/strict-tool-call-probe.test.ts
New Vitest test file that spawns the TypeScript driver via tsx subprocess, precondition-checks required dist/lib artifacts, and asserts successful exit with [PASS] markers covering strict validation, retry behavior on transient 502, and failure-closed scenarios.
Update migration inventory and add workflow contract assertion
test/e2e-script-workflow.test.ts, test/regression-e2e-workflow.test.ts
Remove legacy script entry from migration inventory and add new regression E2E workflow test asserting the strict-tool-call-probe lane is absent from dispatch options, job keys, and selection script.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

area: ci, chore

Suggested reviewers

  • cv
  • prekshivyas

Poem

🐰 A probe that once lived in the bash,
Now TypeScript runs fast, with no crash—
Strict tools enforced, with test markers bright,
Vitest harnesses validate each night.
One job retired, the workflow runs light! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and specifically describes the main change: retiring the legacy bash test script test-strict-tool-call-probe.sh and replacing it with a focused Vitest test.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch e2e-retire/test-strict-tool-call-probe

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 0 still apply, 0 new items found

Consider writing more tests for
  • **Runtime validation** — Evaluate the regression workflow selector with jobs=strict-tool-call-probe-e2e and assert no retired strict-tool-call-probe job is selected and the selector does not accidentally choose all remaining jobs.. The replacement unit/subprocess coverage is strong and the workflow contract test statically verifies the retired lane is no longer advertised or selected. Because the PR changes a manually dispatched workflow selector, targeted runtime-style validation would further reduce workflow-dispatch confidence risk without relying on external E2E job status.
  • **Runtime validation** — Evaluate the regression workflow selector with an empty jobs input and assert all remaining non-retired regression lanes are still selected after removing strict-tool-call-probe-e2e.. The replacement unit/subprocess coverage is strong and the workflow contract test statically verifies the retired lane is no longer advertised or selected. Because the PR changes a manually dispatched workflow selector, targeted runtime-style validation would further reduce workflow-dispatch confidence risk without relying on external E2E job status.
  • **Runtime validation** — Run test/strict-tool-call-probe.test.ts without the required top-level built CLI artifacts and assert the failure message includes the explicit npm run build:cli guidance.. The replacement unit/subprocess coverage is strong and the workflow contract test statically verifies the retired lane is no longer advertised or selected. Because the PR changes a manually dispatched workflow selector, targeted runtime-style validation would further reduce workflow-dispatch confidence risk without relying on external E2E job status.
  • **Runtime validation** — If partially stale dist/ directories are a supported local state, remove a downstream dist module used by the driver child process and assert the failure remains actionable rather than a raw module-resolution trace.. The replacement unit/subprocess coverage is strong and the workflow contract test statically verifies the retired lane is no longer advertised or selected. Because the PR changes a manually dispatched workflow selector, targeted runtime-style validation would further reduce workflow-dispatch confidence risk without relying on external E2E job status.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@jyaunches jyaunches added the area: e2e End-to-end tests, nightly failures, or validation infrastructure label Jun 10, 2026
@jyaunches jyaunches marked this pull request as ready for review June 10, 2026 16:02
@cv

cv commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Coordination note now that #5126 is green: this PR is still based on the pre-#5126 migration shape.

On the intended base, test/e2e-scenario/migration/legacy-inventory.json no longer exists and should not be reintroduced. We also should not use a special Legacy E2E deletion evidence PR-body block. The replacement shape for deleting a legacy script is deterministic instead:

  • port the behavioral coverage into Vitest
  • delete the legacy test/e2e/test-*.sh entry point only when replacement coverage/wiring lands in the same PR
  • update the frozen top-level script allowlist in test/e2e-script-workflow.test.ts
  • if the script is wired in regression-e2e.yaml, remove/replace that lane and update test/regression-e2e-workflow.test.ts

This appears to overlap #5145, which already carries the strict-tool-call slice on the #5126-based draft stack. My recommendation is to close this as superseded by #5145 unless it contains a distinct assertion that should be ported there.

@jyaunches

jyaunches commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Migration-principle review for #5098: the core direction here is aligned — this is a focused Vitest retirement rather than forcing a caller-level mock probe into live E2E.

Reference from test/e2e-scenario/docs/MIGRATION.md: Vitest is the harness, fixtures are support code, and GitHub PRs/issues are migration source of truth. For caller-level mock/local-stub probes, the simple replacement is focused test/ coverage plus bash lane deletion.

Recommended path:

  • Keep this as the test(e2e): retire docker-unreachable gateway script #5119 pattern: focused test/strict-tool-call-probe.test.ts coverage + delete test/e2e/test-strict-tool-call-probe.sh + remove the regression lane + update the existing workflow contract test.
  • Do not add live E2E fixture, registry scenario, or migration framework work.

Cleanup requested before merge/rebase:

  • Drop legacy-inventory.json changes after test(e2e): simplify legacy migration tracking #5126; GitHub PRs/issues are the source of truth now.
  • Replace “scenario framework” wording with “live Vitest / fixtures” or simply “not a live E2E migration target.”
  • Add a compact Simplicity check section:
    • Test shape: focused process Vitest test.
    • New shared helpers: none.
    • New framework/registry/ledger: none.
    • Workflow changes: remove retired regression lane.

@prekshivyas prekshivyas self-assigned this Jun 10, 2026
Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/fixtures/strict-tool-call-probe-driver.ts (1)

1-1: ⚡ Quick win

Consider removing @ts-nocheck and adding proper types to the TypeScript portions.

The @ts-nocheck directive disables type checking for the entire file. While the embedded CommonJS string templates (lines 91-162, 232-297) are JS-shaped, the wrapper code—imports, helper functions, async operations, and spawn calls—should be type-checked. String literals aren't type-checked anyway, so the directive is broader than necessary and removes type safety from genuine TypeScript code that would benefit from it.

♻️ Suggested approach

Remove the @ts-nocheck directive and add explicit type annotations for the key functions:

-// `@ts-nocheck`
 // SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Then add return types and parameter types to key functions, for example:

function assertStrictPayload(payload: any): void {
  // ...
}

async function validate(
  endpoint: string,
  recoveryCalls: any[] = []
): Promise<{ ok: boolean; api?: string; retry?: string }> {
  // ...
}

If type errors surface, address them rather than suppressing all checking.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/fixtures/strict-tool-call-probe-driver.ts` at line 1, Remove the
top-level "`@ts-nocheck`" and add explicit TypeScript types to the actual
TypeScript code in this file: delete the directive, keep the embedded CommonJS
template string literals as-is, and annotate functions such as
assertStrictPayload (give payload a typed parameter and void return), validate
(typed endpoint: string, recoveryCalls: any[] default, and a Promise<{ok:
boolean; api?: string; retry?: string}> return), and any helper/async spawn
wrappers with appropriate parameter and return types; fix any resulting type
errors instead of suppressing them so the wrapper/imports/async logic is
type-checked while leaving the string templates untouched.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/fixtures/strict-tool-call-probe-driver.ts`:
- Line 1: Remove the top-level "`@ts-nocheck`" and add explicit TypeScript types
to the actual TypeScript code in this file: delete the directive, keep the
embedded CommonJS template string literals as-is, and annotate functions such as
assertStrictPayload (give payload a typed parameter and void return), validate
(typed endpoint: string, recoveryCalls: any[] default, and a Promise<{ok:
boolean; api?: string; retry?: string}> return), and any helper/async spawn
wrappers with appropriate parameter and return types; fix any resulting type
errors instead of suppressing them so the wrapper/imports/async logic is
type-checked while leaving the string templates untouched.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b903a069-e0b8-4118-81cd-b0a3b43c4edf

📥 Commits

Reviewing files that changed from the base of the PR and between 5f60ed6 and 1e2c800.

📒 Files selected for processing (5)
  • .github/workflows/regression-e2e.yaml
  • test/e2e-scenario/migration/legacy-inventory.json
  • test/fixtures/strict-tool-call-probe-driver.ts
  • test/regression-e2e-workflow.test.ts
  • test/strict-tool-call-probe.test.ts
💤 Files with no reviewable changes (1)
  • test/e2e-scenario/migration/legacy-inventory.json

@prekshivyas prekshivyas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed (code + 9-cat security). Retires test/e2e/test-strict-tool-call-probe.sh + its CI lane, porting the node heredoc verbatim into a test/fixtures/ driver run by a new Vitest test.

Approve — faithful 1:1 migration with no coverage loss: the driver body is byte-identical to the legacy heredoc (all 4 scenarios + the strict-payload assertions: model / tool_choice=required / max_tokens / stream=false / temperature=0 / sessions_send), all referenced symbols resolve, and the workflow/inventory removals are consistent with no dangling refs (verified against head). Security: all 9 pass — loopback-bound mock server, no new egress, repo-internal createRequire paths (no traversal), @ts-nocheck scoped to the test fixture only.

Nit: the "fixtures excluded from the test glob" comment is slightly off — it's the include pattern (*.test.*) that prevents discovery, not an exclude. Effective behavior is correct.

This PR deletes test/e2e/test-strict-tool-call-probe.sh, so the frozen LEGACY_E2E_SHELL_ALLOWLIST in e2e-script-workflow.test.ts must drop it too (it was failing 71 != 72). Not in the nightly allowlist, so no other change needed.

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jyaunches jyaunches merged commit f4003fc into main Jun 10, 2026
39 of 42 checks passed
@jyaunches jyaunches deleted the e2e-retire/test-strict-tool-call-probe branch June 10, 2026 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants