fix(onboard): suppress provider cleanup probe output by sandl99 · Pull Request #5030 · NVIDIA/NemoClaw

sandl99 · 2026-06-09T06:48:31Z

Summary

Suppress expected OpenShell provider-cleanup probe output during onboarding so fresh sandbox creation no longer prints repeated sandbox not found errors before the real create step. The cleanup still captures and classifies OpenShell failures internally, preserving the stale-provider recovery behavior added for rebuild and resume flows.

Related Issue

Closes #5025

Changes

Add suppressOutput: true to captured provider detach/delete cleanup calls in src/lib/onboard/sandbox-provider-cleanup.ts.
Add regression coverage proving tolerated missing-sandbox detach probes are quiet while still attempted for every provider suffix.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Additional verification:

npm run build:cli passes
npx vitest run test/sandbox-provider-cleanup.test.ts passes
npx prek run --all-files was attempted but failed in unrelated existing CLI hook failures/timeouts outside this change.

Signed-off-by: San Dang sdang@nvidia.com

Summary by CodeRabbit

Tests
- Added test coverage for sandbox provider cleanup operations to validate proper handling when sandboxes are missing.
Refactor
- Enhanced command output handling in sandbox provider cleanup processes for improved reliability.

Signed-off-by: San Dang <sdang@nvidia.com>

coderabbitai · 2026-06-09T06:48:44Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0b1d8337-652d-4815-958d-71efec7f5323

📥 Commits

Reviewing files that changed from the base of the PR and between 3048632 and e1baa08.

📒 Files selected for processing (2)

src/lib/onboard/sandbox-provider-cleanup.ts
test/sandbox-provider-cleanup.test.ts

📝 Walkthrough

Walkthrough

Provider cleanup functions now pass suppressOutput: true when detaching or deleting sandbox providers. The change affects detachSandboxProviders, recoverAttachedProvider, and deleteProviderWithRecovery, and is validated by a new test case for missing sandbox tolerance.

Changes

Output suppression for provider cleanup

Layer / File(s)	Summary
Suppress detach and delete command output `src/lib/onboard/sandbox-provider-cleanup.ts`	Add `suppressOutput: true` option to four `runOpenshell` calls in `detachSandboxProviders`, `recoverAttachedProvider` (detach), and `deleteProviderWithRecovery` (two delete attempts). These calls already ignore errors and pipe stdout/stderr.
Test output suppression with missing sandbox tolerance `test/sandbox-provider-cleanup.test.ts`	Add test case verifying that when `tolerateMissingSandbox: true` is set, detach operations with missing sandboxes do not fail and include both `ignoreError: true` and `suppressOutput: true` in the options.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

NVIDIA/NemoClaw#4943: The main PR's change to pass suppressOutput: true through runOpenshell in detachSandboxProviders, recoverAttachedProvider, and deleteProviderWithRecovery is directly aligned with the retrieved PR's sandbox-provider cleanup implementation of those same functions and their detach/delete invocation paths.

Suggested labels

area: sandbox, bug-fix

Poem

🐰 When sandboxes are lost without a trace,
We quietly clean without leaving a face,
Suppress the errors, let silence prevail,
A gentle goodbye with a whisper, not wail! 🌙

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: suppressing provider cleanup probe output during onboarding, which directly addresses the issue of repeated 'sandbox not found' errors.
Linked Issues check	✅ Passed	The pull request fully implements the requirements from issue `#5025` by suppressing the noisy 'sandbox not found' error output during provider cleanup probes while maintaining internal failure tracking and recovery behavior.
Out of Scope Changes check	✅ Passed	All changes are directly related to the objective of suppressing provider cleanup probe output; modifications to sandbox-provider-cleanup.ts and test coverage are appropriately scoped to the issue.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/quiet-onboard-provider-cleanup

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-09T06:52:07Z

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 1 still applies, 0 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: Provider cleanup quieting around existing OpenShell sandbox/provider lifecycle workaround: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `src/lib/onboard/sandbox-provider-cleanup.ts` documents OpenShell as the source boundary and adds suppression at the cleanup subprocess calls; `test/sandbox-provider-cleanup.test.ts` covers only the tolerated missing-sandbox detach suppression case.
Add coverage for all newly quiet provider cleanup paths (test/sandbox-provider-cleanup.test.ts:172): The patch adds `suppressOutput: true` to `detachSandboxProviders()`, `recoverAttachedProvider()`, and both `deleteProviderWithRecovery()` provider-delete attempts, but the added regression test only asserts option forwarding for `detachSandboxProviders()`. This leaves the credential-adjacent provider-delete recovery path unguarded against future regressions, and the linked issue's upgrade/onboard terminal-noise acceptance is only proven at the unit option-forwarding level rather than by runtime terminal-output validation.
- Recommendation: Add behavior-specific tests asserting `recoverAttachedProvider()` forwards `suppressOutput: true` while still collecting non-tolerated failures, and `deleteProviderWithRecovery()` suppresses the initial delete, recovery detach, and retry delete calls. Consider a targeted runtime/integration validation that a fresh onboard tolerated missing-sandbox cleanup captures OpenShell `NotFound` stderr without writing it to user terminal output.
- Evidence: Production changes add `suppressOutput: true` at `src/lib/onboard/sandbox-provider-cleanup.ts:118`, `:177`, `:271`, and `:283`. The new test beginning around `test/sandbox-provider-cleanup.test.ts:172` checks only tolerated missing-sandbox detach probes over `SANDBOX_PROVIDER_SUFFIXES`; existing recovery/delete tests assert argv and outcomes but not options.

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation** — recoverAttachedProvider passes suppressOutput true on each recovery detach while still returning non-tolerated failures. This is sandbox/OpenShell infrastructure code where the user-visible bug is terminal output from subprocess stderr. Unit option-forwarding coverage is useful but currently incomplete across changed paths, and runtime-style validation would better prove the linked onboarding noise is gone.
**Runtime validation** — deleteProviderWithRecovery passes suppressOutput true on the initial provider delete when no recovery is needed. This is sandbox/OpenShell infrastructure code where the user-visible bug is terminal output from subprocess stderr. Unit option-forwarding coverage is useful but currently incomplete across changed paths, and runtime-style validation would better prove the linked onboarding noise is gone.
**Runtime validation** — deleteProviderWithRecovery passes suppressOutput true on initial delete, recovery detach, and retry delete after an attached-sandbox diagnostic. This is sandbox/OpenShell infrastructure code where the user-visible bug is terminal output from subprocess stderr. Unit option-forwarding coverage is useful but currently incomplete across changed paths, and runtime-style validation would better prove the linked onboarding noise is gone.
**Runtime validation** — fresh onboard tolerated missing-sandbox provider cleanup captures NotFound stderr without writing it to user terminal output. This is sandbox/OpenShell infrastructure code where the user-visible bug is terminal output from subprocess stderr. Unit option-forwarding coverage is useful but currently incomplete across changed paths, and runtime-style validation would better prove the linked onboarding noise is gone.
**Add coverage for all newly quiet provider cleanup paths** — Add behavior-specific tests asserting `recoverAttachedProvider()` forwards `suppressOutput: true` while still collecting non-tolerated failures, and `deleteProviderWithRecovery()` suppresses the initial delete, recovery detach, and retry delete calls. Consider a targeted runtime/integration validation that a fresh onboard tolerated missing-sandbox cleanup captures OpenShell `NotFound` stderr without writing it to user terminal output.
**Acceptance clause:** Issue "NotFound" error while onboarding #5025 Description: "After upgrading to v0.0.61 in onboard meessages there are a lot of errors like:" followed by repeated `status: NotFound, message: "sandbox not found"` diagnostics and metadata. — add test evidence or identify existing coverage. The fresh-onboard cleanup path now calls OpenShell with `stdio: ["ignore", "pipe", "pipe"]` and `suppressOutput: true`, and the added unit test verifies tolerated missing-sandbox detach probes are attempted for every provider suffix with `suppressOutput: true`. No runtime terminal-output validation is included.
**Acceptance clause:** Issue "NotFound" error while onboarding #5025: "In v0.0.60 hadn't such errors." — add test evidence or identify existing coverage. The patch targets the v0.0.61-era provider cleanup output path by suppressing captured OpenShell command output. The diff does not include cross-version installer/upgrade validation.
**Acceptance clause:** Issue "NotFound" error while onboarding #5025 Reproduction Steps 1. `Run curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_INSTALL_TAG=v0.0.60 bash` — add test evidence or identify existing coverage. No installer or cross-version setup validation is added in this diff; changed files are limited to provider cleanup code and its unit test.

Since last review details

Current findings:

Source-of-truth review needed: Provider cleanup quieting around existing OpenShell sandbox/provider lifecycle workaround: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `src/lib/onboard/sandbox-provider-cleanup.ts` documents OpenShell as the source boundary and adds suppression at the cleanup subprocess calls; `test/sandbox-provider-cleanup.test.ts` covers only the tolerated missing-sandbox detach suppression case.
Add coverage for all newly quiet provider cleanup paths (test/sandbox-provider-cleanup.test.ts:172): The patch adds `suppressOutput: true` to `detachSandboxProviders()`, `recoverAttachedProvider()`, and both `deleteProviderWithRecovery()` provider-delete attempts, but the added regression test only asserts option forwarding for `detachSandboxProviders()`. This leaves the credential-adjacent provider-delete recovery path unguarded against future regressions, and the linked issue's upgrade/onboard terminal-noise acceptance is only proven at the unit option-forwarding level rather than by runtime terminal-output validation.
- Recommendation: Add behavior-specific tests asserting `recoverAttachedProvider()` forwards `suppressOutput: true` while still collecting non-tolerated failures, and `deleteProviderWithRecovery()` suppresses the initial delete, recovery detach, and retry delete calls. Consider a targeted runtime/integration validation that a fresh onboard tolerated missing-sandbox cleanup captures OpenShell `NotFound` stderr without writing it to user terminal output.
- Evidence: Production changes add `suppressOutput: true` at `src/lib/onboard/sandbox-provider-cleanup.ts:118`, `:177`, `:271`, and `:283`. The new test beginning around `test/sandbox-provider-cleanup.test.ts:172` checks only tolerated missing-sandbox detach probes over `SANDBOX_PROVIDER_SUFFIXES`; existing recovery/delete tests assert argv and outcomes but not options.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

github-actions · 2026-06-09T06:52:19Z

E2E Advisor Recommendation

Required E2E: messaging-providers-e2e, double-onboard-e2e, onboard-repair-e2e
Optional E2E: brave-search-e2e, onboard-resume-e2e, sandbox-operations-e2e

Dispatch hint: messaging-providers-e2e,double-onboard-e2e,onboard-repair-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

messaging-providers-e2e (high (~75 min timeout, live Docker/OpenShell sandbox)): Exercises real onboarding with per-sandbox messaging providers, placeholder credentials, provider attachment, and cleanup paths affected by sandbox-provider-cleanup.ts.
double-onboard-e2e (high (~90 min timeout, multiple sandbox lifecycle phases)): Covers repeated onboard/recreate for the same sandbox and explicit destroy cleanup, directly exercising pre-delete provider detach probes added to the recreate/destroy lifecycle.
onboard-repair-e2e (medium-high (~60 min timeout, live onboarding repair)): Covers resume/repair after the recorded sandbox is pruned from OpenShell, the path where missing-sandbox detach probes are tolerated and now suppress OpenShell output.

Optional E2E

brave-search-e2e (medium-high (live Brave/search onboarding path)): Optional adjacent coverage for the brave-search provider suffix included in SANDBOX_PROVIDER_SUFFIXES; useful if the change owner wants search-provider-specific cleanup confidence.
onboard-resume-e2e (medium-high (~60 min timeout, live resume)): Optional companion to onboard-repair-e2e for interrupted onboard resume behavior; less directly targeted than repair because it mostly skips existing sandbox/gateway work.
sandbox-operations-e2e (medium-high (~60 min timeout, multi-sandbox operations)): Optional broader destroy/lifecycle confidence for sandbox deletion and registry cleanup, though it does not focus on attached provider recovery.

New E2E recommendations

provider cleanup output suppression (medium): Existing live E2Es cover provider lifecycle broadly, but none appears to assert the specific regression this PR targets: tolerated missing-sandbox/provider-detach probes and provider delete recovery should not print raw OpenShell diagnostics while still preserving captured/redacted failure output.
- Suggested test: Add a targeted live provider-cleanup E2E that creates attached messaging/search providers, prunes or deletes the sandbox to leave stale attachment conditions, reruns onboard/destroy, and asserts cleanup remains quiet for tolerated detach/delete failures while non-tolerated diagnostics are redacted and surfaced.

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: messaging-providers-e2e,double-onboard-e2e,onboard-repair-e2e

github-actions · 2026-06-09T06:52:20Z

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw
Optional scenario E2E: ubuntu-repo-cloud-openclaw-double-same-provider, ubuntu-repo-cloud-openclaw-telegram

Dispatch required scenario E2E:

gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

ubuntu-repo-cloud-openclaw: The changed onboard sandbox-provider cleanup helper is called during normal cloud OpenClaw onboarding before sandbox creation/deletion. This scenario is the smallest standard Ubuntu repo-checkout path that exercises the onboard flow and smoke-validates the resulting sandbox.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional scenario E2E

ubuntu-repo-cloud-openclaw-double-same-provider: Optional adjacent coverage for provider replacement/reuse flows, which are closer to the deleteProviderWithRecovery path touched by this helper change.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw-double-same-provider
ubuntu-repo-cloud-openclaw-telegram: Optional adjacent coverage for a messaging-provider onboarding path; the cleanup helper detaches per-sandbox messaging providers and this scenario validates a concrete messaging provider attachment flow.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw-telegram

Relevant changed files

src/lib/onboard/sandbox-provider-cleanup.ts

## Summary - Add v0.0.62 release notes from Discussion #5100 and link release highlights to the relevant docs pages. - Document the release's GPU sandbox recreation, sandbox-side local inference verification, and Hermes dashboard port guard in the command and inference references. - Refresh generated NemoClaw user skills for the release-prep docs set. ## Source Summary - #4956 -> `docs/reference/commands.mdx`: Document CDI-first Docker GPU recreation behavior for Linux Docker-driver sandboxes. - #5024 -> `docs/inference/use-local-inference.mdx`: Document sandbox-runtime verification of the `inference.local` local inference route. - #5018 -> `docs/reference/commands.mdx`: Document Jetson/Tegra device-node group propagation for sandbox CUDA initialization. - #5012, #4763, #4706, #5030, #5015 -> `docs/about/release-notes.mdx`: Summarize onboarding and recovery reliability fixes, including the reserved Hermes API port guard. - #5017 and #5043 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Summarize mutable OpenClaw config recovery and host-side `agents list` coverage. - #5010 and #5016 -> `docs/about/release-notes.mdx`: Summarize Hermes upstream metadata visibility and WhatsApp QR rendering reliability. - #5045 and prior source docs in the v0.0.62 range -> `.agents/skills/`: Refresh generated user-skill references from the current docs source. ## Skipped - #5019 -> skipped for new prose because it touched `openclaw-sandbox-permissive.yaml`, which matches `docs/.docs-skip`. Existing source docs remain the source for generated skill synchronization. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `npm run docs` (passes; Fern reports 0 errors and 1 hidden warning) - Pre-commit hooks passed during commit, including docs-to-skills verification, markdown lint, gitleaks, and skills YAML tests.  ## Summary by CodeRabbit * **New Features** * Added `nemoclaw <name> agents list` command. * v0.0.62 release notes added summarizing onboarding and recovery improvements. * **Bug Fixes** * Improved GPU sandbox onboarding reliability (NVIDIA CDI path, Jetson/Tegra device handling). * Better local inference verification and recovery for Linux Docker-driver GPU sandboxes. * Quieter/earlier handling of onboarding drift and port collisions. * **Documentation** * Expanded GPU passthrough, inference verification, writable paths (`/dev/pts`), port 8642 restriction, and command examples.  --------- Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>

fix(onboard): suppress provider cleanup probe output

e1baa08

Signed-off-by: San Dang <sdang@nvidia.com>

sandl99 self-assigned this Jun 9, 2026

sandl99 added area: onboarding Onboarding FSM, provider setup, sandbox launch, or first-run flow bug-fix PR fixes a bug or regression VRDC Issues and PRs submitted by NVIDIA VRDC test team. labels Jun 9, 2026

cv approved these changes Jun 9, 2026

View reviewed changes

cv added the v0.0.62 Release target label Jun 9, 2026

sandl99 requested a review from laitingsheng June 9, 2026 06:50

Merge branch 'main' into fix/quiet-onboard-provider-cleanup

597b743

cv enabled auto-merge (squash) June 9, 2026 06:59

cv merged commit 38c03b1 into main Jun 9, 2026
35 checks passed

cv deleted the fix/quiet-onboard-provider-cleanup branch June 9, 2026 07:00

miyoungc mentioned this pull request Jun 10, 2026

docs: refresh v0.0.62 release docs #5157

Merged

Conversation

sandl99 commented Jun 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sandl99 commented Jun 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading