test(e2e): add onboard inference smoke guard for #3253 by jyaunches · Pull Request #3595 · NVIDIA/NemoClaw

jyaunches · 2026-05-15T14:24:14Z

Summary

Adds a failing-test-first regression guard for #3253.

onboard-inference-smoke-e2e asserts that onboard must not accept a provider/model route that is merely configured but cannot serve a real chat completion. The test simulates a configured OpenShell inference route whose runtime /chat/completions path would return HTTP 503.

Expected status

This guard is expected to fail on unfixed main-equivalent code. That is intentional: it is the executable acceptance criterion for the #3253 fix.

Expected failure fragment on unfixed code:

setupInference() accepted a configured route without proving the chat/completions path; onboard would later print Installation complete while the first real request returns HTTP 503 (#3253)

Workflow

Job: onboard-inference-smoke-e2e
Workflow: .github/workflows/regression-e2e.yaml
Dispatch:

gh workflow run regression-e2e.yaml --repo NVIDIA/NemoClaw -f jobs=onboard-inference-smoke-e2e --ref issue-3253-onboard-inference-smoke-test

This job is not scheduled nightly. Promotion to nightly should be an explicit later decision after the fix lands and the guard is stable.

Related: #3253

Summary by CodeRabbit

Tests
- Added an end-to-end onboarding inference smoke test to detect misconfigured or runtime-broken inference routes and verify actionable diagnostics on failure.
Documentation
- Added an E2E regression guard to acceptance criteria and documented how to run the onboarding-inference smoke check.
Chores
- CI workflow extended with a new dispatch option and regression job to run the smoke test and upload logs on failures.

Adds a failing E2E test that demonstrates the bug tracked by #3253. Until the fix lands, the regression-e2e onboard-inference-smoke-e2e job will fail. This is intentional: the failing test is the executable acceptance criterion. Related: #3253

coderabbitai · 2026-05-15T14:24:27Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0b491a25-6b80-4391-b80d-72cadcf8cfef

📥 Commits

Reviewing files that changed from the base of the PR and between 1dc8ba0 and ed4ca99.

📒 Files selected for processing (2)

ACCEPTANCE.md
test/e2e/docs/parity-inventory.generated.json

✅ Files skipped from review due to trivial changes (1)

test/e2e/docs/parity-inventory.generated.json

📝 Walkthrough

Walkthrough

Adds a gated regression job onboard-inference-smoke-e2e, documents the acceptance guard, and provides a Bash E2E smoke test that monkey-patches Node module loading to simulate HTTP 503 from an OpenAI-compatible provider, verifying setupInference() rejects broken endpoints and surfaces diagnostics.

Changes

Onboard inference setup error handling E2E test

Layer / File(s)	Summary
Workflow job registration and selection logic `.github/workflows/regression-e2e.yaml`	`workflow_dispatch` validation updated to include `onboard-inference-smoke-e2e`; `select_regression_jobs` exports `onboard_inference_smoke` and computes it from dispatch input; new gated job `onboard-inference-smoke-e2e` runs the smoke script and uploads logs on failure.
Acceptance criteria documentation `ACCEPTANCE.md`	Adds a “Definition of Done — E2E Regression Guard” describing the `onboard-inference-smoke-e2e` job, how to run it, and the expected RED/GREEN behavior tied to the `#3253` HTTP 503 failure mode.
E2E test structure and environment setup `test/e2e/test-onboard-inference-smoke.sh`	Test header, strict Bash initialization and logging, repo path setup, `npm ci` when needed, and `npm run build:cli` before running the smoke Node heredoc with NEMOCLAW_* env vars.
E2E test module loading monkey-patch `test/e2e/test-onboard-inference-smoke.sh`	In the Node heredoc, patches `Module._load` to stub `./adapters/openshell/resolve`, intercept `./runner`, and stub provider/registry modules to simulate successful setup but force chat/completions requests to return HTTP 503.
E2E test execution, outcome markers and assertions `test/e2e/test-onboard-inference-smoke.sh`	Calls `onboard.setupInference()` with test inputs, prints resolved/rejected markers and recorded calls, captures exit code, asserts the route was rejected (not accepted) and that logs include actionable diagnostics with 503/upstream context.
Test registry and parity metadata `test/e2e/docs/parity-inventory.generated.json`, `test/e2e/docs/parity-map.yaml`	Adds the new script entry with four assertions and updates totals; inserts the script into parity map with deferred status and bucket metadata.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

NVIDIA/NemoClaw#3410: Both PRs extend the same regression-e2e.yaml workflow’s manual dispatch/regression job selection mechanism by adding a new gated E2E guard job.
NVIDIA/NemoClaw#3411: Extends workflow_dispatch selection outputs and adds a gated regression E2E job under the same regression dispatcher; shares workflow pattern and structure.

Suggested labels

enhancement: testing, enhancement: inference

Suggested reviewers

cv
ericksoa

Poem

🐰 A rabbit hops in with a patch and a test,
Poking at modules to see what’s suppressed.
When 503 thunders and routes go astray,
The smoke test will shout and point out the way.
Logs bloom with hints — the fix can’t delay.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: adding an end-to-end regression guard for onboard inference smoke testing related to issue `#3253`.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch issue-3253-onboard-inference-smoke-test

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-15T14:24:55Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: onboard-inference-smoke-e2e

Dispatch hint: onboard-inference-smoke-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. No merge-blocking E2E is required because the PR changes only E2E/CI coverage, parity metadata, and acceptance documentation; it does not alter installer, onboarding, credentials, sandbox lifecycle, security policy, deployment, or runtime inference code. The new regression job can be run manually as an optional validation of the guard itself.

Optional E2E

onboard-inference-smoke-e2e (low): Run optionally to validate the newly added regression workflow wiring and script behavior. This job is a failing-test-first guard for [Brev][Onboard] Onboard reports SUCCESS without exercising inference path — config bugs reach end users instead of being caught #3253 and may be expected to fail until the product fix lands, so it should not be treated as merge-blocking for this coverage-only PR.

New E2E recommendations

None.

Dispatch hint

Workflow: regression-e2e.yaml
jobs input: onboard-inference-smoke-e2e

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ACCEPTANCE.md`:
- Line 1: The top-level heading uses H2 ("## Definition of Done — E2E Regression
Guard") which violates MD041; change it to an H1 by replacing the leading "##"
with "#" so the document starts with "# Definition of Done — E2E Regression
Guard" to satisfy markdownlint.
- Around line 7-8: Remove the emoji from the technical acceptance bullet lines
by replacing "🔴 RED" and "🟢 GREEN" with plain text "RED" and "GREEN"
respectively (update the two lines that currently read "🔴 RED on
main-equivalent unfixed code: ..." and "🟢 GREEN on the fix branch: pending");
also scan other bullets in ACCEPTANCE.md for any remaining emoji in technical
prose and convert them to plain text markers to comply with the "No emoji in
technical prose" guideline.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f5d6fe31-5652-4ef9-a6a2-ba18e2da2fb7

📥 Commits

Reviewing files that changed from the base of the PR and between 88e3936 and da3f9fc.

📒 Files selected for processing (3)

.github/workflows/regression-e2e.yaml
ACCEPTANCE.md
test/e2e/test-onboard-inference-smoke.sh

…ference-smoke-test # Conflicts: # test/e2e/docs/parity-inventory.generated.json

wscurran · 2026-05-15T15:11:17Z

✨ Related open issues:

#3253 [Brev][Onboard] Onboard reports SUCCESS without exercising inference path — config bugs reach end users instead of being caught

…ference-smoke-test # Conflicts: # test/e2e/docs/parity-inventory.generated.json

Removes leftover /vd_spec workflow artifact committed by PR #3595.\n\nChecked recent open and merged PRs for root spec artifacts (ACCEPTANCE.md and related vd_spec-style files); only ACCEPTANCE.md was present on main.  ## Summary by CodeRabbit * **Documentation** * Removed outdated acceptance criteria documentation to streamline and maintain documentation clarity for users.

## Summary Fixes #3253 by adding a post-route onboard inference smoke check before setup reports success. After `openshell inference set` and route verification, OpenAI-compatible onboard providers now run a one-shot chat completions probe against the configured endpoint/model. If the probe fails, onboard exits before the success summary and prints actionable diagnostics: provider, model, API base, credential env, and upstream probe error. ## Validation - `npm run build:cli` - `bash test/e2e/test-onboard-inference-smoke.sh` - `npm test -- --run test/onboard.test.ts` ## Regression guard Guard PR #3595 is merged. The fix is complete when this branch flips `onboard-inference-smoke-e2e` green: ```bash gh workflow run regression-e2e.yaml --repo NVIDIA/NemoClaw -f jobs=onboard-inference-smoke-e2e --ref fix-3253-onboard-inference-smoke ``` Related: #3253, #3595  ## Summary by CodeRabbit * **New Features** * Added an extra onboard inference smoke check during startup to validate configured inference endpoints and credentials. * Introduced lightweight probing that resolves credentials, skips probes during test runs, logs redacted error context on failure, and stops startup if verification fails.  [![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3603)

docs: record e2e acceptance for issue 3253

da3f9fc

coderabbitai Bot reviewed May 15, 2026

View reviewed changes

Comment thread ACCEPTANCE.md Outdated

Comment thread ACCEPTANCE.md Outdated

jyaunches added 4 commits May 15, 2026 10:59

fix(e2e): register inference smoke guard

fe4f804

style(e2e): format inference smoke guard

55798f6

Merge remote-tracking branch 'origin/main' into issue-3253-onboard-in…

b6026b2

…ference-smoke-test # Conflicts: # test/e2e/docs/parity-inventory.generated.json

fix(e2e): mark inference smoke guard executable

6af542d

wscurran added CI/CD labels May 15, 2026

jyaunches added 2 commits May 15, 2026 11:22

Merge remote-tracking branch 'origin/main' into issue-3253-onboard-in…

1dc8ba0

…ference-smoke-test # Conflicts: # test/e2e/docs/parity-inventory.generated.json

fix(e2e): refresh regression guard metadata

ed4ca99

cv approved these changes May 15, 2026

View reviewed changes

jyaunches merged commit da1ad74 into main May 15, 2026
28 checks passed

This was referenced May 15, 2026

chore: remove vd spec artifact #3602

Merged

fix(onboard): smoke test inference before success #3603

Merged

coderabbitai Bot mentioned this pull request May 15, 2026

test(e2e): add model router routed guard for #3255 #3601

Merged

12 tasks

This was referenced May 18, 2026

test(e2e): add openclaw plugin EXDEV guard for #3513 #3761

Merged

feat(e2e): migrate baseline onboarding coverage #3897

Merged

wscurran added area: ci CI workflows, checks, release automation, or GitHub Actions area: e2e End-to-end tests, nightly failures, or validation infrastructure chore Build, CI, dependency, or tooling maintenance and removed CI/CD labels Jun 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(e2e): add onboard inference smoke guard for #3253#3595

test(e2e): add onboard inference smoke guard for #3253#3595
jyaunches merged 8 commits into
mainfrom
issue-3253-onboard-inference-smoke-test

jyaunches commented May 15, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 15, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 15, 2026 •

edited

Loading

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

wscurran commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jyaunches commented May 15, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Expected status

Workflow

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

wscurran commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jyaunches commented May 15, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 15, 2026 •

edited

Loading

github-actions Bot commented May 15, 2026 •

edited

Loading