Skip to content

test(e2e): add onboard inference smoke guard for #3253#3595

Merged
jyaunches merged 8 commits into
mainfrom
issue-3253-onboard-inference-smoke-test
May 15, 2026
Merged

test(e2e): add onboard inference smoke guard for #3253#3595
jyaunches merged 8 commits into
mainfrom
issue-3253-onboard-inference-smoke-test

Conversation

@jyaunches

@jyaunches jyaunches commented May 15, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a failing-test-first regression guard for #3253.

onboard-inference-smoke-e2e asserts that onboard must not accept a provider/model route that is merely configured but cannot serve a real chat completion. The test simulates a configured OpenShell inference route whose runtime /chat/completions path would return HTTP 503.

Expected status

This guard is expected to fail on unfixed main-equivalent code. That is intentional: it is the executable acceptance criterion for the #3253 fix.

Expected failure fragment on unfixed code:

setupInference() accepted a configured route without proving the chat/completions path; onboard would later print Installation complete while the first real request returns HTTP 503 (#3253)

Workflow

  • Job: onboard-inference-smoke-e2e
  • Workflow: .github/workflows/regression-e2e.yaml
  • Dispatch:
gh workflow run regression-e2e.yaml --repo NVIDIA/NemoClaw -f jobs=onboard-inference-smoke-e2e --ref issue-3253-onboard-inference-smoke-test

This job is not scheduled nightly. Promotion to nightly should be an explicit later decision after the fix lands and the guard is stable.

Related: #3253

Summary by CodeRabbit

  • Tests

    • Added an end-to-end onboarding inference smoke test to detect misconfigured or runtime-broken inference routes and verify actionable diagnostics on failure.
  • Documentation

    • Added an E2E regression guard to acceptance criteria and documented how to run the onboarding-inference smoke check.
  • Chores

    • CI workflow extended with a new dispatch option and regression job to run the smoke test and upload logs on failures.

Review Change Stack

Adds a failing E2E test that demonstrates the bug tracked by #3253.

Until the fix lands, the regression-e2e onboard-inference-smoke-e2e job will fail. This is intentional: the failing test is the executable acceptance criterion.

Related: #3253
@coderabbitai

coderabbitai Bot commented May 15, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0b491a25-6b80-4391-b80d-72cadcf8cfef

📥 Commits

Reviewing files that changed from the base of the PR and between 1dc8ba0 and ed4ca99.

📒 Files selected for processing (2)
  • ACCEPTANCE.md
  • test/e2e/docs/parity-inventory.generated.json
✅ Files skipped from review due to trivial changes (1)
  • test/e2e/docs/parity-inventory.generated.json

📝 Walkthrough

Walkthrough

Adds a gated regression job onboard-inference-smoke-e2e, documents the acceptance guard, and provides a Bash E2E smoke test that monkey-patches Node module loading to simulate HTTP 503 from an OpenAI-compatible provider, verifying setupInference() rejects broken endpoints and surfaces diagnostics.

Changes

Onboard inference setup error handling E2E test

Layer / File(s) Summary
Workflow job registration and selection logic
.github/workflows/regression-e2e.yaml
workflow_dispatch validation updated to include onboard-inference-smoke-e2e; select_regression_jobs exports onboard_inference_smoke and computes it from dispatch input; new gated job onboard-inference-smoke-e2e runs the smoke script and uploads logs on failure.
Acceptance criteria documentation
ACCEPTANCE.md
Adds a “Definition of Done — E2E Regression Guard” describing the onboard-inference-smoke-e2e job, how to run it, and the expected RED/GREEN behavior tied to the #3253 HTTP 503 failure mode.
E2E test structure and environment setup
test/e2e/test-onboard-inference-smoke.sh
Test header, strict Bash initialization and logging, repo path setup, npm ci when needed, and npm run build:cli before running the smoke Node heredoc with NEMOCLAW_* env vars.
E2E test module loading monkey-patch
test/e2e/test-onboard-inference-smoke.sh
In the Node heredoc, patches Module._load to stub ./adapters/openshell/resolve, intercept ./runner, and stub provider/registry modules to simulate successful setup but force chat/completions requests to return HTTP 503.
E2E test execution, outcome markers and assertions
test/e2e/test-onboard-inference-smoke.sh
Calls onboard.setupInference() with test inputs, prints resolved/rejected markers and recorded calls, captures exit code, asserts the route was rejected (not accepted) and that logs include actionable diagnostics with 503/upstream context.
Test registry and parity metadata
test/e2e/docs/parity-inventory.generated.json, test/e2e/docs/parity-map.yaml
Adds the new script entry with four assertions and updates totals; inserts the script into parity map with deferred status and bucket metadata.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#3410: Both PRs extend the same regression-e2e.yaml workflow’s manual dispatch/regression job selection mechanism by adding a new gated E2E guard job.
  • NVIDIA/NemoClaw#3411: Extends workflow_dispatch selection outputs and adds a gated regression E2E job under the same regression dispatcher; shares workflow pattern and structure.

Suggested labels

enhancement: testing, enhancement: inference

Suggested reviewers

  • cv
  • ericksoa

Poem

🐰 A rabbit hops in with a patch and a test,
Poking at modules to see what’s suppressed.
When 503 thunders and routes go astray,
The smoke test will shout and point out the way.
Logs bloom with hints — the fix can’t delay.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding an end-to-end regression guard for onboard inference smoke testing related to issue #3253.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch issue-3253-onboard-inference-smoke-test

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 15, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: onboard-inference-smoke-e2e

Dispatch hint: onboard-inference-smoke-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. No merge-blocking E2E is required because the PR changes only E2E/CI coverage, parity metadata, and acceptance documentation; it does not alter installer, onboarding, credentials, sandbox lifecycle, security policy, deployment, or runtime inference code. The new regression job can be run manually as an optional validation of the guard itself.

Optional E2E

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: regression-e2e.yaml
  • jobs input: onboard-inference-smoke-e2e

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ACCEPTANCE.md`:
- Line 1: The top-level heading uses H2 ("## Definition of Done — E2E Regression
Guard") which violates MD041; change it to an H1 by replacing the leading "##"
with "#" so the document starts with "# Definition of Done — E2E Regression
Guard" to satisfy markdownlint.
- Around line 7-8: Remove the emoji from the technical acceptance bullet lines
by replacing "🔴 RED" and "🟢 GREEN" with plain text "RED" and "GREEN"
respectively (update the two lines that currently read "🔴 RED on
main-equivalent unfixed code: ..." and "🟢 GREEN on the fix branch: pending");
also scan other bullets in ACCEPTANCE.md for any remaining emoji in technical
prose and convert them to plain text markers to comply with the "No emoji in
technical prose" guideline.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f5d6fe31-5652-4ef9-a6a2-ba18e2da2fb7

📥 Commits

Reviewing files that changed from the base of the PR and between 88e3936 and da3f9fc.

📒 Files selected for processing (3)
  • .github/workflows/regression-e2e.yaml
  • ACCEPTANCE.md
  • test/e2e/test-onboard-inference-smoke.sh

Comment thread ACCEPTANCE.md Outdated
Comment thread ACCEPTANCE.md Outdated
@wscurran

Copy link
Copy Markdown
Contributor

jyaunches added 2 commits May 15, 2026 11:22
…ference-smoke-test

# Conflicts:
#	test/e2e/docs/parity-inventory.generated.json
@jyaunches jyaunches merged commit da1ad74 into main May 15, 2026
28 checks passed
cv pushed a commit that referenced this pull request May 15, 2026
Removes leftover /vd_spec workflow artifact committed by PR
#3595.\n\nChecked recent open and merged PRs for root spec artifacts
(ACCEPTANCE.md and related vd_spec-style files); only ACCEPTANCE.md was
present on main.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Documentation**
* Removed outdated acceptance criteria documentation to streamline and
maintain documentation clarity for users.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
jyaunches added a commit that referenced this pull request May 15, 2026
## Summary

Fixes #3253 by adding a post-route onboard inference smoke check before
setup reports success.

After `openshell inference set` and route verification,
OpenAI-compatible onboard providers now run a one-shot chat completions
probe against the configured endpoint/model. If the probe fails, onboard
exits before the success summary and prints actionable diagnostics:
provider, model, API base, credential env, and upstream probe error.

## Validation

- `npm run build:cli`
- `bash test/e2e/test-onboard-inference-smoke.sh`
- `npm test -- --run test/onboard.test.ts`

## Regression guard

Guard PR #3595 is merged. The fix is complete when this branch flips
`onboard-inference-smoke-e2e` green:

```bash
gh workflow run regression-e2e.yaml --repo NVIDIA/NemoClaw -f jobs=onboard-inference-smoke-e2e --ref fix-3253-onboard-inference-smoke
```

Related: #3253, #3595


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added an extra onboard inference smoke check during startup to
validate configured inference endpoints and credentials.
* Introduced lightweight probing that resolves credentials, skips probes
during test runs, logs redacted error context on failure, and stops
startup if verification fails.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3603)

<!-- review_stack_entry_end -->
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@wscurran wscurran added area: ci CI workflows, checks, release automation, or GitHub Actions area: e2e End-to-end tests, nightly failures, or validation infrastructure chore Build, CI, dependency, or tooling maintenance and removed CI/CD labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: ci CI workflows, checks, release automation, or GitHub Actions area: e2e End-to-end tests, nightly failures, or validation infrastructure chore Build, CI, dependency, or tooling maintenance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants