test(e2e): retire test-strict-tool-call-probe.sh#5153
Conversation
Replaces the legacy bash regression for the Local Ollama strict Chat Completions tool-call validation path with a focused Vitest process test that drives the same caller-level behavior against a hermetic mock. Per #5119's retirement pattern: caller-level mock-driven probes belong in test/, not in the scenario-framework or regression-e2e bash workflow. The typed scenario framework's expected-state model targets steady-state and disruption-recovery shapes; this probe asserts payload shape and retry behavior on an in-process module against a localhost stub, which is a unit-shaped contract — wrong fit for the scenario matrix. Refs #5098, #4537, #4349, #5119. Changes: - Adds test/strict-tool-call-probe.test.ts, a Vitest test that spawns test/fixtures/strict-tool-call-probe-driver.cjs and asserts the four legacy [PASS] markers (success, onboarding caller, transient-502 retry, plain-text fail-closed). Driver runs in a fresh node process to keep curl-probe behavior identical to production runtime conditions and avoid Vitest worker-pool / fetch-shim interference. - Adds test/fixtures/strict-tool-call-probe-driver.cjs containing the former inline `node -e` block from the bash script verbatim, with require paths anchored to REPO_ROOT. - Deletes test/e2e/test-strict-tool-call-probe.sh. - Removes the strict-tool-call-probe-e2e job, its selector branch, output declaration, and `Valid:` description entry from .github/workflows/regression-e2e.yaml. - Adds a workflow-contract test in test/regression-e2e-workflow.test.ts proving the retired job is not advertised or selected. - Removes the deleted shell script's row from test/e2e-scenario/migration/legacy-inventory.json (mirrors #5119's inventory hygiene); the retirement rationale is captured in this commit and the linked issue trail.
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
E2E Advisor RecommendationRequired E2E: None Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
|
Vitest E2E Scenario RecommendationRequired Vitest E2E scenarios: None Full Vitest E2E advisor summaryVitest E2E Scenario AdvisorBase: Required Vitest E2E scenarios
Optional Vitest E2E scenarios
Relevant changed files
|
Renames test/fixtures/strict-tool-call-probe-driver.cjs to .ts and spawns it via tsx to satisfy the codebase-growth guardrail that forbids newly added .js/.cjs/.mjs files. The driver body stays JS-shaped (CommonJS-style) because the embedded `node -e` strings must remain plain JS for the spawned child processes, and dist/lib/* targets are CJS modules. Uses `createRequire(import.meta.url)` plus `import.meta.url`-derived __dirname/__filename so tsx's ESM-default execution can still require() the built artifacts. `@ts-nocheck` keeps the surface identical to the retired bash heredoc. Verified: `tsx test/fixtures/strict-tool-call-probe-driver.ts` prints all four `[PASS]` markers; `npx vitest run --project cli test/strict-tool-call-probe.test.ts test/regression-e2e-workflow.test.ts` passes 3/3; `npx biome check` clean; `tsc --noEmit -p tsconfig.cli.json` clean for touched files; file-size budget passes. Refs #5098, #4537, #4349, #5119.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe PR removes the shell-based ChangesStrict Tool-Call Probe Testing Migration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
PR Review AdvisorFindings: 0 needs attention, 0 worth checking, 0 nice ideas Consider writing more tests for
This is an automated advisory review. A human maintainer must make the final merge decision. |
|
Coordination note now that #5126 is green: this PR is still based on the pre-#5126 migration shape. On the intended base,
This appears to overlap #5145, which already carries the strict-tool-call slice on the #5126-based draft stack. My recommendation is to close this as superseded by #5145 unless it contains a distinct assertion that should be ported there. |
|
Migration-principle review for #5098: the core direction here is aligned — this is a focused Vitest retirement rather than forcing a caller-level mock probe into live E2E. Reference from Recommended path:
Cleanup requested before merge/rebase:
|
Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/fixtures/strict-tool-call-probe-driver.ts (1)
1-1: ⚡ Quick winConsider removing
@ts-nocheckand adding proper types to the TypeScript portions.The
@ts-nocheckdirective disables type checking for the entire file. While the embedded CommonJS string templates (lines 91-162, 232-297) are JS-shaped, the wrapper code—imports, helper functions, async operations, and spawn calls—should be type-checked. String literals aren't type-checked anyway, so the directive is broader than necessary and removes type safety from genuine TypeScript code that would benefit from it.♻️ Suggested approach
Remove the
@ts-nocheckdirective and add explicit type annotations for the key functions:-// `@ts-nocheck` // SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.Then add return types and parameter types to key functions, for example:
function assertStrictPayload(payload: any): void { // ... } async function validate( endpoint: string, recoveryCalls: any[] = [] ): Promise<{ ok: boolean; api?: string; retry?: string }> { // ... }If type errors surface, address them rather than suppressing all checking.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/fixtures/strict-tool-call-probe-driver.ts` at line 1, Remove the top-level "`@ts-nocheck`" and add explicit TypeScript types to the actual TypeScript code in this file: delete the directive, keep the embedded CommonJS template string literals as-is, and annotate functions such as assertStrictPayload (give payload a typed parameter and void return), validate (typed endpoint: string, recoveryCalls: any[] default, and a Promise<{ok: boolean; api?: string; retry?: string}> return), and any helper/async spawn wrappers with appropriate parameter and return types; fix any resulting type errors instead of suppressing them so the wrapper/imports/async logic is type-checked while leaving the string templates untouched.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/fixtures/strict-tool-call-probe-driver.ts`:
- Line 1: Remove the top-level "`@ts-nocheck`" and add explicit TypeScript types
to the actual TypeScript code in this file: delete the directive, keep the
embedded CommonJS template string literals as-is, and annotate functions such as
assertStrictPayload (give payload a typed parameter and void return), validate
(typed endpoint: string, recoveryCalls: any[] default, and a Promise<{ok:
boolean; api?: string; retry?: string}> return), and any helper/async spawn
wrappers with appropriate parameter and return types; fix any resulting type
errors instead of suppressing them so the wrapper/imports/async logic is
type-checked while leaving the string templates untouched.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b903a069-e0b8-4118-81cd-b0a3b43c4edf
📒 Files selected for processing (5)
.github/workflows/regression-e2e.yamltest/e2e-scenario/migration/legacy-inventory.jsontest/fixtures/strict-tool-call-probe-driver.tstest/regression-e2e-workflow.test.tstest/strict-tool-call-probe.test.ts
💤 Files with no reviewable changes (1)
- test/e2e-scenario/migration/legacy-inventory.json
prekshivyas
left a comment
There was a problem hiding this comment.
Reviewed (code + 9-cat security). Retires test/e2e/test-strict-tool-call-probe.sh + its CI lane, porting the node heredoc verbatim into a test/fixtures/ driver run by a new Vitest test.
✅ Approve — faithful 1:1 migration with no coverage loss: the driver body is byte-identical to the legacy heredoc (all 4 scenarios + the strict-payload assertions: model / tool_choice=required / max_tokens / stream=false / temperature=0 / sessions_send), all referenced symbols resolve, and the workflow/inventory removals are consistent with no dangling refs (verified against head). Security: all 9 pass — loopback-bound mock server, no new egress, repo-internal createRequire paths (no traversal), @ts-nocheck scoped to the test fixture only.
Nit: the "fixtures excluded from the test glob" comment is slightly off — it's the include pattern (*.test.*) that prevents discovery, not an exclude. Effective behavior is correct.
This PR deletes test/e2e/test-strict-tool-call-probe.sh, so the frozen LEGACY_E2E_SHELL_ALLOWLIST in e2e-script-workflow.test.ts must drop it too (it was failing 71 != 72). Not in the nightly allowlist, so no other change needed. Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Retires the legacy
test/e2e/test-strict-tool-call-probe.shregression and replaces it with a focused Vitest process test that drives the same caller-level Local Ollama strict Chat Completions tool-call validation behavior against a hermetic mock endpoint.The legacy script was process-level mock-driven (
node -e ...against in-process module mocks plus a localhost stub) — the same shapenemoclaw-e2e-legacy-migratePhase 0 explicitly refuses, with a recommendation to follow #5119's retirement pattern: author the replacement directly as a Vitest test intest/, delete the bash script, remove the regression-e2e job.Related Issue
Refs #5098
Refs #4537
Refs #4349
Refs #5119
Changes
test/strict-tool-call-probe.test.ts— a Vitest test that spawnstest/fixtures/strict-tool-call-probe-driver.tsviatsxand asserts the four legacy[PASS]markers (success, onboarding caller, transient-502 retry, plain-text fail-closed).test/fixtures/strict-tool-call-probe-driver.tscontaining the former inlinenode -eblock from the bash script. The driver is authored as TypeScript (rather than.cjs) per the codebase-growth guardrail forbidding newly added.js/.cjs/.mjsfiles; body is JS-shaped because the embeddednode -estrings must remain plain CommonJS for the spawned children, and the dist/lib/* targets are CJS modules. UsescreateRequire(import.meta.url)plusfileURLToPath-derived__dirnameso tsx's ESM-default execution can still load the CJS dist artifacts.// @ts-nocheckkeeps the surface identical to the retired bash heredoc.tsxchild process so itscurl-based validation subprocess is identical to production runtime conditions and is not affected by Vitest's worker pool, fetch shim, or signal handling.test/e2e/test-strict-tool-call-probe.sh.strict-tool-call-probe-e2ejob, its selector branch, output declaration, andValid:description entry from.github/workflows/regression-e2e.yaml.test/regression-e2e-workflow.test.tsproving the retired job is not advertised or selected (mirrors the docker-unreachable contract test added in test(e2e): retire docker-unreachable gateway script #5119).test/e2e-scenario/migration/legacy-inventory.json(mirrors test(e2e): retire docker-unreachable gateway script #5119's inventory hygiene); the retirement rationale is captured in this PR and the linked issue trail.Why not live E2E scenario fixtures?
This probe asserts payload shape and retry behavior on production caller code against a localhost OpenAI-compatible stub. It does not need a live sandbox lifecycle, scenario registry entry, shared fixture, or matrix runner. Keeping it as a focused process Vitest test preserves the same caller boundary without adding another live E2E surface.
Simplicity check
legacy-inventory.jsonrow is removed only because the current deletion gate requires the inventory to match existingtest/e2e/test-*.shentry points.Build dependency
This test consumes built artifacts from
dist/lib/...(matches the existing CLI-shard convention). CI runsnpm run build:clibefore vitest. For local standalone runs, runnpm run build:clifirst; without it, the wrapper fails fast with a clearnpm run build:clidiagnostic before launching the driver.Type of Change
Verification
npm run build:clinode_modules/.bin/tsx test/fixtures/strict-tool-call-probe-driver.ts(legacy parity check — all four[PASS]markers print; matchesbash test/e2e/test-strict-tool-call-probe.shoutput line-for-line)npx vitest run --project cli test/strict-tool-call-probe.test.ts test/regression-e2e-workflow.test.ts test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts test/e2e-scenario/framework-tests/e2e-migration-inventory-lock.test.ts→ 10/10 passingnpx tsc --noEmit -p tsconfig.cli.jsonclean for the touched filesnpx biome check test/strict-tool-call-probe.test.ts test/fixtures/strict-tool-call-probe-driver.tscleannpx tsx scripts/check-test-file-size-budget.tspassesYAML syntax check on
.github/workflows/regression-e2e.yamlpassesnpx prek run --all-filespassesnpm testpassesTests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docsbuilds without warnings (doc changes only)Summary by CodeRabbit