fix(cli): probe live sandbox agent versions by ChunkyMonkey11 · Pull Request #4550 · NVIDIA/NemoClaw

ChunkyMonkey11 · 2026-05-29T21:44:12Z

Summary

NemoClaw now verifies the OpenClaw version actually running inside live sandboxes before reporting status or deciding whether an upgrade is needed. This prevents reused sandboxes from being marked up to date only because cached host metadata matches the current expected version.

Related Issue

Fixes #4429

Changes

Force live agent version probing for running sandboxes in status and upgrade-sandboxes.
Preserve the previously recorded sandbox agent version when reusing an existing sandbox instead of overwriting it with the current expected version.
Treat unavailable SSH config output as an unavailable runtime probe instead of spawning ssh with an empty config.
Add regression coverage for status output, upgrade classification, runtime probing fallback, and reused sandbox metadata preservation.
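
The probing rules in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `resolveSandboxVersion`, `ProbeDeps`, the `Detection` labels other than `unavailable`, and the exact-match staleness test are hypothetical. Only the `forceProbe` / `skipProbe` option names and the "unavailable, `null`, not stale" outcome appear elsewhere in this thread.

```typescript
// Hypothetical sketch; names other than forceProbe/skipProbe are illustrative.
interface VersionCheckOptions {
  forceProbe?: boolean;
  skipProbe?: boolean;
}

type Detection = "probed" | "cached" | "unavailable";

interface VersionCheckResult {
  detection: Detection;
  sandboxVersion: string | null;
  isStale: boolean;
}

interface ProbeDeps {
  // Returns the sandbox's SSH config text, or "" when it cannot be obtained.
  getSshConfig: (sandboxName: string) => string;
  // Runs the agent's version command over SSH and returns its raw output.
  runVersionOverSsh: (sshConfig: string) => string;
}

function resolveSandboxVersion(
  sandboxName: string,
  cachedVersion: string | null,
  expectedVersion: string,
  deps: ProbeDeps,
  options: VersionCheckOptions = {},
): VersionCheckResult {
  let probed: string | null = null;
  if (!options.skipProbe) {
    const sshConfig = deps.getSshConfig(sandboxName).trim();
    // An empty config means the probe is unavailable; never spawn ssh with it.
    if (sshConfig !== "") {
      const output = deps.runVersionOverSsh(sshConfig).trim();
      probed = output === "" ? null : output;
    }
  }
  if (probed !== null) {
    // Assumption: staleness is a plain string mismatch against the expected version.
    return { detection: "probed", sandboxVersion: probed, isStale: probed !== expectedVersion };
  }
  // With forceProbe, a failed probe must not fall back to cached metadata.
  if (options.forceProbe || cachedVersion === null) {
    return { detection: "unavailable", sandboxVersion: null, isStale: false };
  }
  return {
    detection: "cached",
    sandboxVersion: cachedVersion,
    isStale: cachedVersion !== expectedVersion,
  };
}
```

Injecting the two probe steps through `ProbeDeps` keeps the sketch testable without a real sandbox; the actual implementation may structure this differently.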

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Additional checks run:

npm run build:cli
npm run typecheck:cli
npx vitest run src/lib/onboard/sandbox-registry-metadata.test.ts src/lib/sandbox/version.test.ts src/lib/domain/maintenance/upgrade.test.ts src/lib/actions/gateway-drift-preflight.test.ts src/lib/actions/sandbox/status.test.ts
git diff --check upstream/main...HEAD

Note: local full hook execution was not completed because the repo's CLI coverage hook recursively invoked itself through a temporary git commit in a test fixture. The branch was pushed with local hooks skipped after the focused checks above passed.

Signed-off-by: Revant Patel <revant.h.patel@gmail.com>

Summary by CodeRabbit

New Features
- Status shows probed live agent versions and gives clearer guidance when verification is forced or unavailable.
- Upgrade checks probe running sandboxes first before classifying staleness.
- Registry updates preserve cached agentVersion while updating other agent fields.
Bug Fixes
- Do not trust empty or unavailable probe responses; omit stale cached versions from status.
- Runtime readiness now recognizes both "Ready" and "Running" phases.
Tests
- Expanded tests for probing behavior, probe-first upgrade checks, registry metadata reuse, and CLI/status output.

Signed-off-by: Revant Patel <revant.h.patel@gmail.com>

copy-pr-bot · 2026-05-29T21:44:17Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-29T21:44:25Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c91ba797-ef46-497a-8940-bbc823ead7ab

📥 Commits

Reviewing files that changed from the base of the PR and between 37eaa03 and b59404d.

📒 Files selected for processing (1)

test/cli.test.ts

📝 Walkthrough

Walkthrough

Adds a VersionCheckOptions contract, treats empty SSH probe output as failure, conditionally forces live SSH probing for running sandboxes in status and upgrade flows, preserves cached registry agentVersion during metadata updates, broadens readiness parsing, and adds tests and CLI cases validating probe-first behavior.

Changes

Live sandbox version probing and drift detection

| Layer / File(s) | Summary |
| --- | --- |
| Version check options contract and empty output handling `src/lib/sandbox/version.ts` | Adds exported `VersionCheckOptions { forceProbe?: boolean; skipProbe?: boolean }`, updates `checkAgentVersion` signature, and treats empty SSH-config output as probe failure. |
| Version check probing test coverage `src/lib/sandbox/version.test.ts` | New test ensures `forceProbe` does not fall back to cached agentVersion when SSH probing yields no data; detection becomes `unavailable`, `sandboxVersion` is `null`, and `isStale` is `false`. |
| Status command conditional probing `src/lib/actions/sandbox/status.ts` | Adds `shouldProbeSandboxRuntimeVersion`, passes `forceProbe`/`skipProbe` into `checkAgentVersion`, prints live `sandboxVersion` only when verified, and emits "version not verified" / "unable to verify" lines when forced probing is inconclusive. |
| Upgrade command conditional probing `src/lib/actions/upgrade-sandboxes.ts` | Wraps `checkAgentVersion` to pass `{ forceProbe: true }` for sandboxes present in the live sandbox name set, enabling probe-first staleness classification for running sandboxes. |
| Registry metadata agent field handling `src/lib/onboard/sandbox-registry-metadata.ts` | Refactors `updateReusedSandboxMetadata` to compute `agent` fields separately and explicitly set `agentVersion` to `existingEntry?.agentVersion ?? null`. |
| Registry metadata test infrastructure and reuse validation `src/lib/onboard/sandbox-registry-metadata.test.ts` | Switches to async `makeHelpers` dynamic import, adds `openclawAgent` helper, adds test verifying model/provider updates while preserving existing `agentVersion`, and updates existing runtime-field tests to await helpers. |
| Runtime readiness parsing `src/lib/runtime-recovery.ts`, `src/lib/runtime-recovery.test.ts` | `parseReadySandboxNames` now treats parsed phases `Ready` or `Running` as live and skips `NotReady`; tests updated to include Running and to ignore phase-like tokens outside the PHASE column. |
| CLI tests for live probing behavior `test/cli.test.ts` | Adds optional `agentVersion` to `SandboxEntry`, adds status test asserting live SSH-reported agent version is displayed (not cached), and updates/adds `upgrade-sandboxes --check` tests to stub sandbox list/get, `ssh-config`, and `ssh` for probe-first checks. |
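
The registry-metadata row above hinges on one expression, `existingEntry?.agentVersion ?? null`. The sketch below shows why that matters when a sandbox is reused. It is illustrative only: `RegistryEntry`, `AgentSelection`, and `mergeReusedSandboxMetadata` are hypothetical names, and the real `updateReusedSandboxMetadata` very likely handles more fields.

```typescript
// Hypothetical shapes; the real registry entry has more fields.
interface RegistryEntry {
  name: string;
  model: string | null;
  provider: string | null;
  agent: string | null;
  agentVersion: string | null;
}

interface AgentSelection {
  agent: string;
  // Version the current CLI expects. Deliberately NOT written to the entry.
  expectedVersion: string;
}

function mergeReusedSandboxMetadata(
  name: string,
  existingEntry: RegistryEntry | undefined,
  updates: { model: string; provider: string },
  selection: AgentSelection,
): RegistryEntry {
  return {
    name,
    // Model and provider follow the latest onboarding choices.
    model: updates.model,
    provider: updates.provider,
    agent: selection.agent,
    // Keep the version recorded when the sandbox was built, or null if none
    // was recorded. Writing selection.expectedVersion here would make a
    // reused sandbox look up to date without any evidence.
    agentVersion: existingEntry?.agentVersion ?? null,
  };
}
```

With this shape, a later status or upgrade check still sees the old recorded version (or `null`) and can decide to probe, instead of trusting a value that was overwritten during reuse.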

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

NVIDIA/NemoClaw#3859: Related changes to SSH-config probing guards used by live version detection.
NVIDIA/NemoClaw#4256: Related change to readiness parsing in runtime recovery.

Suggested labels

fix, NemoClaw CLI, Sandbox, Integration: OpenClaw, v0.0.55

Suggested reviewers

ericksoa
jyaunches
cv

"🐰 I hop through sandboxes, nose to the shell,
I probe for versions and listen quite well.
When probes go silent and cached claims mislead,
I flag the drift and nudge a rebuild deed.
Tests steady my paws; CI helps me excel."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 45.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'fix(cli): probe live sandbox agent versions' directly reflects the main objective: adding live probing of agent versions instead of relying solely on cached metadata. |
| Linked Issues check | ✅ Passed | The PR addresses all coding requirements from issue `#4429`: forces live agent version probing in status and upgrade-sandboxes commands, preserves cached agent versions for reused sandboxes, treats unavailable SSH config as failed probe, and includes regression tests for new behavior. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to live agent version probing: runtime-recovery.ts updates support proper sandbox state detection, version.ts/test.ts implement probing logic, CLI tests validate probe-first behavior, and sandbox-registry-metadata changes preserve existing agent versions when reusing sandboxes. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

Signed-off-by: Revant Patel <revant.h.patel@gmail.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/lib/actions/sandbox/status.ts (1)
314-342: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Fix status agent version probing so it reports live versions when cached agentVersion is missing.

In src/lib/actions/sandbox/status.ts, shouldProbeSandboxRuntimeVersion(...) returns lookup.state === "present" && Boolean(sandbox.agentVersion), so when sandbox.agentVersion is null status sets skipProbe: true and checkAgentVersion returns sandboxVersion: null. The logging only prints an Agent: line when versionCheck.sandboxVersion exists, so running sandboxes with missing cached metadata can produce no agent version output.

This is inconsistent with src/lib/actions/upgrade-sandboxes.ts, which forces { forceProbe: true } for all live sandboxes (liveNames.has(sandboxName)), independent of cached agentVersion, so rebuild/upgrade can determine the actual live agent version.

Align status with upgrade by probing for running sandboxes even when sandbox.agentVersion is null (e.g., remove the Boolean(sandbox.agentVersion) gating from shouldProbeSandboxRuntimeVersion).
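
To make the difference concrete, here is a small sketch of the two gating rules discussed in this comment. It is a simplified illustration: `LookupState` values other than `"present"`, the `SandboxRecord` shape, and the function names `shouldProbeGated`, `shouldProbeAnyLive`, and `probeOptions` are assumptions, not code from `status.ts`.

```typescript
// Hypothetical, simplified shapes for illustration only.
type LookupState = "present" | "missing" | "unknown";

interface SandboxRecord {
  name: string;
  agentVersion: string | null;
}

// Behavior described in the comment: a missing cached version disables the probe.
function shouldProbeGated(lookupState: LookupState, sandbox: SandboxRecord): boolean {
  return lookupState === "present" && Boolean(sandbox.agentVersion);
}

// Suggested behavior: any live sandbox is probed, regardless of cached metadata.
function shouldProbeAnyLive(lookupState: LookupState): boolean {
  return lookupState === "present";
}

// The status flow passes complementary flags into the version check.
function probeOptions(shouldProbe: boolean): { forceProbe: boolean; skipProbe: boolean } {
  return { forceProbe: shouldProbe, skipProbe: !shouldProbe };
}

// A running sandbox whose registry entry has no recorded agent version.
const uncached: SandboxRecord = { name: "demo", agentVersion: null };
const gatedOptions = probeOptions(shouldProbeGated("present", uncached));
const alignedOptions = probeOptions(shouldProbeAnyLive("present"));
```

Under the gated rule the uncached sandbox ends up with `skipProbe: true`, which is the case where status would print no agent version; under the aligned rule it gets `forceProbe: true`, matching what the upgrade flow does for live sandboxes.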
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/actions/sandbox/status.ts` around lines 314 - 342, The status code
currently sets shouldProbeRuntimeVersion = lookup.state === "present" &&
Boolean(sb.agentVersion), which prevents probing live sandboxes when cached
sb.agentVersion is null; change the logic so shouldProbeRuntimeVersion is true
for any live sandbox (i.e., remove the Boolean(sb.agentVersion) gating), and
pass that into sandboxVersion.checkAgentVersion (forceProbe:
shouldProbeRuntimeVersion, skipProbe: !shouldProbeRuntimeVersion) so live
sandboxes are probed for their actual runtime agent version even when cached
metadata is missing; update references to shouldProbeRuntimeVersion,
lookup.state, sb.agentVersion, and the sandboxVersion.checkAgentVersion call
accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/lib/actions/sandbox/status.ts`:
- Around line 314-342: The status code currently sets shouldProbeRuntimeVersion
= lookup.state === "present" && Boolean(sb.agentVersion), which prevents probing
live sandboxes when cached sb.agentVersion is null; change the logic so
shouldProbeRuntimeVersion is true for any live sandbox (i.e., remove the
Boolean(sb.agentVersion) gating), and pass that into
sandboxVersion.checkAgentVersion (forceProbe: shouldProbeRuntimeVersion,
skipProbe: !shouldProbeRuntimeVersion) so live sandboxes are probed for their
actual runtime agent version even when cached metadata is missing; update
references to shouldProbeRuntimeVersion, lookup.state, sb.agentVersion, and the
sandboxVersion.checkAgentVersion call accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 05f54ea1-b125-4b1c-90d7-06bdf5cdfa82

📥 Commits

Reviewing files that changed from the base of the PR and between 182d3fb and a8b0dc9.

📒 Files selected for processing (2)

src/lib/actions/sandbox/status.ts
test/cli.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

test/cli.test.ts

Signed-off-by: Test User <test@example.com>

ChunkyMonkey11 · 2026-05-30T01:38:04Z

@cv I pushed a fix for the failing CI / Pull Request Vitest assertions. The issue was Running sandboxes not being treated as live for upgrade probing, plus a stale hard-coded OpenClaw expected version in the test. Targeted and full affected CLI tests pass locally.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/runtime-recovery.test.ts`:
- Around line 62-77: The test for parseReadySandboxNames needs to assert that
sandboxes with PHASE "NotReady" are excluded: update the test case inside the it
block that calls parseReadySandboxNames to include an extra line with a sandbox
whose PHASE is "NotReady" (e.g., "zeta ... NotReady") and ensure the expected
result array passed to toEqual does not contain that name; reference
parseReadySandboxNames and the current test that checks for
["alpha","epsilon","delta"] when adding the NotReady row and asserting it is
absent.

In `@src/lib/runtime-recovery.ts`:
- Around line 54-55: The current check uses cols.includes(...) which can match
tokens outside the PHASE column; instead extract the single PHASE token from
cols (e.g., let phase = cols[PHASE_INDEX] or parse the token known to represent
phase) and then evaluate it exactly: const isReadyOrRunning = phase === "Ready"
|| phase === "Running"; if (!isReadyOrRunning || phase === "NotReady") continue;
Update the logic that computes isReadyOrRunning and the subsequent check to use
the single phase variable (phase) rather than cols.includes.
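
The two inline comments above amount to "read the phase from its own column and compare it exactly". A minimal sketch of that idea follows. It assumes a whitespace-separated listing whose header row names the columns (including `NAME` and `PHASE`) and whose cells contain no spaces; the real sandbox list output and the real `parseReadySandboxNames` may differ, so the function is named `parseReadySandboxNamesSketch` to mark it as hypothetical.

```typescript
// Hypothetical sketch: assumes a header row with NAME and PHASE columns and
// cells without embedded spaces. The real CLI output may differ.
function parseReadySandboxNamesSketch(listOutput: string): string[] {
  const lines = listOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "");
  if (lines.length === 0) return [];

  const header = lines[0].split(/\s+/);
  const nameIndex = header.indexOf("NAME");
  const phaseIndex = header.indexOf("PHASE");
  if (nameIndex === -1 || phaseIndex === -1) return [];

  const names: string[] = [];
  for (const line of lines.slice(1)) {
    const cols = line.split(/\s+/);
    // Read the single PHASE token instead of searching every column.
    const phase = cols[phaseIndex];
    if (phase !== "Ready" && phase !== "Running") continue;
    const name = cols[nameIndex];
    if (name) names.push(name);
  }
  return names;
}

// "beta" and "gamma" carry phase-like tokens outside the PHASE column.
const sampleListing = [
  "NAME   NAMESPACE  PHASE",
  "alpha  default    Ready",
  "beta   Ready      NotReady",
  "gamma  Running    Pending",
  "delta  default    Running",
].join("\n");

const liveNames = parseReadySandboxNamesSketch(sampleListing);
// liveNames is ["alpha", "delta"]
```

A `cols.includes("Ready")` style check would have accepted `beta` and `gamma` here because of their NAMESPACE values, which is the false positive the review comment warns about.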

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 05d1011a-4a2a-476e-8e38-8f0ec1aac6d9

📥 Commits

Reviewing files that changed from the base of the PR and between 5e4d09f and 08fba22.

📒 Files selected for processing (3)

src/lib/runtime-recovery.test.ts
src/lib/runtime-recovery.ts
test/cli.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

test/cli.test.ts

Signed-off-by: Test User <test@example.com>

wscurran · 2026-06-01T14:51:13Z

✨ Thanks for submitting this detailed PR about probing live sandbox agent versions, which fixes the issue of reused sandboxes being marked as up to date due to cached host metadata. This proposes a way to verify the OpenClaw version running inside live sandboxes before reporting status or deciding whether an upgrade is needed.

Related open issues:

#4429 [WSL2 x86_64][CLI&UX] After nemoclaw upgrade, sandbox openclaw stays old; status silent on drift

## Summary

- Adds the v0.0.56 release notes section with links to the deeper docs pages for installer, status, inference, messaging, policy, and lifecycle changes.
- Updates source docs for the remaining release-prep gaps around `uv` in the PyPI preset, compact WhatsApp pairing guidance, and `nemoclaw inference set` command boundaries.
- Refreshes generated `nemoclaw-user-*` skills and removes skipped experimental command terms from generated skill surfaces.

## Source summary

- #4613 -> `docs/manage-sandboxes/lifecycle.mdx`, `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Documents that public installs and `nemoclaw update` follow the maintained `lkg` tag by default.
- #4419 -> `docs/about/release-notes.mdx`: Notes that non-interactive Linux installs can reactivate Docker group membership and continue in one installer run when `sg docker` is available.
- #4550 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Captures live sandbox agent-version probing for status, connect, and upgrade checks.
- #4609 -> `docs/inference/use-local-inference.mdx`, `docs/about/release-notes.mdx`: Captures the GPU Docker-driver host-network local-inference reachability gate.
- #4607 -> `docs/manage-sandboxes/messaging-channels.mdx`, `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Documents compact WhatsApp QR pairing guidance and gateway/session diagnostics.
- #4582 -> `docs/manage-sandboxes/messaging-channels.mdx`, `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Reflects Slack credential validation before enabling the channel.
- #4554 -> `docs/manage-sandboxes/messaging-channels.mdx`, `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Keeps Telegram allowlist alias guidance in the generated user skills and release notes.
- #4563 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Includes the new `nemoclaw <name> skill remove <skill>` command in command docs and release notes.
- #4566 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Documents the `nemoclaw inference set` redirect boundary when `--provider` or `--model` is missing.
- #4323 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Captures per-sandbox status JSON support.
- #4506 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Captures debug command sandbox-name validation and safer tarball writing.
- #4569 -> `docs/network-policy/integration-policy-examples.mdx`, `docs/about/release-notes.mdx`: Documents that the `pypi` preset allows `/usr/local/bin/uv`.
- #4579 -> `docs/network-policy/integration-policy-examples.mdx`, `docs/about/release-notes.mdx`: Captures observable Jira preset validation guidance.
- #4229 -> `docs/manage-sandboxes/lifecycle.mdx`, `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Documents user-data preservation defaults for uninstall.
- #4399 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Captures CPU-only sandbox intent preservation across rebuilds.
- #4058 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Captures safer snapshot restore behavior around existing destinations.
- #4155 and #4460 -> skipped by `docs/.docs-skip`: Removed skipped experimental command terms from source docs and generated skill evals instead of documenting those features.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `npm run docs` (passes; Fern reports the pre-existing light-mode accent contrast warning)
- `rg "permissive mode|shields down|shields up|shields status|config rotate-token|rotate-token" .agents/skills` (no matches)
- `npm run build:cli` (run to refresh local CLI artifacts for the pre-push TypeScript hook)
- Commit hooks passed, including `NEMOCLAW_* env-var documentation gate`, `Verify docs-to-skills output`, `markdownlint-cli2`, `gitleaks`, and `Test (skills YAML)`.

## Summary by CodeRabbit

* **Documentation**
  * Expanded Model Router setup with YAML examples, flow diagrams, and credential handling; strengthened agent-config immutability and integrity guidance; messaging channels updated (Telegram aliases, WhatsApp pairing/diagnostics); CLI docs revised (GPU detection, inference set behavior, uninstall/rebuild preservation); overview rebranded to NemoClaw and added v0.0.56 release notes.
* **New Features**
  * Added `nemoclaw <name> channels status` (messaging diagnostics, JSON); added `nemoclaw <name> skill remove`; Hermes no longer marked experimental; DGX Spark quickstart sandbox-name note.

fix(cli): probe live sandbox agent versions

182d3fb

Signed-off-by: Revant Patel <revant.h.patel@gmail.com>

ChunkyMonkey11 and others added 2 commits May 29, 2026 16:16

Merge branch 'main' into fix/probe-live-sandbox-agent-version

a8b0dc9

docs(cli): document sandbox version probe helpers

5e4d09f

Signed-off-by: Revant Patel <revant.h.patel@gmail.com>

coderabbitai Bot reviewed May 29, 2026

cv added the v0.0.56 Release target label May 29, 2026

cv and others added 4 commits May 29, 2026 16:27

Merge branch 'main' into fix/probe-live-sandbox-agent-version

2bf10db

Merge branch 'main' into fix/probe-live-sandbox-agent-version

e2add47

Merge branch 'main' into fix/probe-live-sandbox-agent-version

b1cf125

fix(cli): probe running sandboxes for upgrades

08fba22

Signed-off-by: Test User <test@example.com>

coderabbitai Bot reviewed May 30, 2026

Comment thread src/lib/runtime-recovery.test.ts

Comment thread src/lib/runtime-recovery.ts Outdated

Test User added 2 commits May 29, 2026 18:41

fix(cli): parse sandbox phase exactly

5f03059

Signed-off-by: Test User <test@example.com>

test(cli): cover notready sandbox phase parsing

a0bcd53

Signed-off-by: Test User <test@example.com>

github-actions Bot mentioned this pull request May 30, 2026

fix(cli): surface docker/sandbox container/dashboard port failure layers in sandbox status #4388

Merged

12 tasks

ChunkyMonkey11 added 3 commits May 30, 2026 01:19

Merge branch 'main' into fix/probe-live-sandbox-agent-version

37eaa03

Merge branch 'main' into fix/probe-live-sandbox-agent-version

b59404d

Merge branch 'main' into fix/probe-live-sandbox-agent-version

7f61b9d

wscurran added fix labels Jun 1, 2026

cv approved these changes Jun 1, 2026

cv merged commit 9248f5e into NVIDIA:main Jun 1, 2026
18 checks passed

miyoungc mentioned this pull request Jun 1, 2026

docs: refresh 0.0.56 release documentation #4618

Merged

coderabbitai Bot mentioned this pull request Jun 1, 2026

fix(sandbox): stop false Docker unhealthy and add paused-container hint (#4503, #4495) #4600

Merged

12 tasks

wscurran added the area: cli (Command line interface, flags, terminal UX, or output) and area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery) labels Jun 3, 2026

wscurran added the bug-fix (PR fixes a bug or regression) label and removed the NemoClaw CLI label Jun 3, 2026

This was referenced Jun 6, 2026

test(cli): start splitting cli test suite #4895

Merged

test(cli): split oversized CLI suites #4898

Merged

fix(sandbox): recover Docker-driver sandbox from labels post-reboot (#4423) #5091

Merged

fix(cli): probe live sandbox agent versions #4550
cv merged 12 commits into NVIDIA:main from ChunkyMonkey11:fix/probe-live-sandbox-agent-version
