fix(upgrade): detect NemoClaw image drift in upgrade-sandboxes (#5026) by yimoj · Pull Request #5102 · NVIDIA/NemoClaw

yimoj · 2026-06-10T02:30:50Z

Summary

nemoclaw upgrade-sandboxes only compared the agent version inside a sandbox. After a NemoClaw upgrade that left the bundled OpenClaw expected_version unchanged, it printed "All sandboxes are up to date." even though the NemoClaw image payload (scripts, patches, policies, Dockerfile, generated config) had changed. This PR adds a persisted NemoClaw build fingerprint so that image/build drift is detected even when the agent version is unchanged.

Related Issue

Fixes #5026

Changes

Persist a NemoClaw build fingerprint (nemoclawVersion, from getVersion()) on managed sandbox images at create/rebuild time, stamped via getSandboxAgentRegistryFields so onboard.ts stays net-neutral.
classifyUpgradeableSandboxes now flags a sandbox when its agent version is stale or its recorded NemoClaw build differs from the running build, and reports the reason, e.g. OpenClaw v2026.5.27 unchanged; NemoClaw image v0.0.60 → v0.0.61.
Drift is asserted only on positive evidence — a recorded fingerprint that differs. See the safety note below.
The sandbox-reuse path no longer re-stamps the fingerprint, so reusing a sandbox after a NemoClaw upgrade cannot mask drift.
Tests: unit coverage for isNemoclawImageStale / classification, and a CLI regression test that a current-agent-version sandbox with a stale recorded fingerprint is detected (upgrade-sandboxes --check).
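The classification change in the list above can be sketched in TypeScript. This is a minimal, hypothetical stand-in for `classifyUpgradeableSandboxes`: the `SandboxRecord`/`UpgradeCandidate` shapes, the `classifySandboxes` name, and passing the expected agent version as a plain string are assumptions for illustration, not the merged code.

```typescript
// Hypothetical shapes; field names loosely follow the PR walkthrough.
type UpgradeStaleReason = "agent-version" | "image-drift";

interface SandboxRecord {
  name: string;
  agentVersion?: string; // agent version recorded for the sandbox
  nemoclawVersion?: string; // build fingerprint; absent on legacy or custom images
}

interface UpgradeCandidate {
  name: string;
  reasons: UpgradeStaleReason[];
  imageCurrent?: string; // fingerprint recorded on the sandbox
  imageExpected?: string; // fingerprint of the running NemoClaw build
}

function classifySandboxes(
  sandboxes: SandboxRecord[],
  expectedAgentVersion: string,
  currentNemoclawVersion: string | null,
): UpgradeCandidate[] {
  const stale: UpgradeCandidate[] = [];
  for (const sb of sandboxes) {
    const reasons: UpgradeStaleReason[] = [];
    // Simplification: an unknown agent version is not flagged in this sketch.
    if (sb.agentVersion !== undefined && sb.agentVersion !== expectedAgentVersion) {
      reasons.push("agent-version");
    }
    // Positive evidence only: both fingerprints known and different.
    const drifted =
      !!sb.nemoclawVersion &&
      !!currentNemoclawVersion &&
      sb.nemoclawVersion !== currentNemoclawVersion;
    if (drifted) reasons.push("image-drift");
    if (reasons.length === 0) continue; // up to date
    const candidate: UpgradeCandidate = { name: sb.name, reasons };
    if (drifted) {
      candidate.imageCurrent = sb.nemoclawVersion;
      candidate.imageExpected = currentNemoclawVersion ?? undefined;
    }
    stale.push(candidate);
  }
  return stale;
}

// Same agent version, different recorded build: flagged as image drift only.
const out = classifySandboxes(
  [
    { name: "alpha", agentVersion: "2026.5.27", nemoclawVersion: "0.0.60" },
    { name: "beta", agentVersion: "2026.5.27" }, // no fingerprint: never drift
  ],
  "2026.5.27",
  "0.0.61",
);
console.log(JSON.stringify(out));
// [{"name":"alpha","reasons":["image-drift"],"imageCurrent":"0.0.60","imageExpected":"0.0.61"}]
```

Accumulating a `reasons` array instead of returning a single boolean is what lets the CLI report "agent unchanged, image drifted" rather than only an agent-version transition.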

Design note — why missing fingerprints are not flagged (safe, forward-looking)

A missing fingerprint is intentionally treated as not drifted. It is ambiguous between a legacy NemoClaw-managed image (safe to rebuild) and a custom --from image, and auto-rebuilding a custom image whose Dockerfile path is unavailable would silently recreate it with the default image (data loss). Custom images are therefore left without a fingerprint, and pre-existing sandboxes opt into NemoClaw image-drift detection on their next rebuild. Every sandbox created or rebuilt on this release onward gets full drift detection.
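The positive-evidence rule in this design note reduces to a small predicate. The sketch reuses the PR's name `isNemoclawImageStale`, but its signature (nullable version strings, with missing or empty values meaning "unknown") is an assumption, not the merged implementation.

```typescript
/**
 * Hypothetical sketch of the drift predicate. Returns true only when both
 * fingerprints are known and the recorded one differs from the running one
 * (an older or a newer recorded build both count). A missing or empty
 * fingerprint, as on legacy or custom `--from` images, is never drift.
 */
function isNemoclawImageStale(
  recorded: string | null | undefined,
  current: string | null | undefined,
): boolean {
  if (!recorded || !current) return false; // no positive evidence
  return recorded !== current;
}

console.log(isNemoclawImageStale("0.0.60", "0.0.61")); // true
console.log(isNemoclawImageStale(undefined, "0.0.61")); // false
console.log(isNemoclawImageStale("0.0.61", "0.0.61")); // false
```

Returning `false` on missing data is the conservative choice: the worst case is a missed rebuild suggestion, never an automatic rebuild of a custom image onto the default one.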

Type of Change

Code change (feature, bug fix, or refactor)

Verification

Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Reproduced #5026 ("nemoclaw upgrade-sandboxes" are not upgrading existed sandboxes after upgrading nemoclaw) and verified the fix end-to-end through the real nemoclaw upgrade-sandboxes --check CLI: a stale recorded fingerprint and a drifting build are flagged, matching or missing fingerprints are not, and a custom --from image is never auto-rebuilt.
npm run typecheck:cli, Biome, and the affected Vitest suites pass (upgrade, gateway-drift-preflight, list-share-live-inference, onboard sandbox state handler, registry). The full cli suite could not be run to completion in the dev sandbox due to a host resource cap (OOM/SIGKILL); remaining failures observed there were confirmed environmental (missing plugin deps / live host gateway / single-fork cross-file state).

Signed-off-by: Yimo Jiang yimoj@nvidia.com

Summary by CodeRabbit

New Features
- Detect NemoClaw image/build drift during sandbox upgrades and report richer reasons when sandboxes are stale, even if agent version is unchanged.
Bug Fixes
- Preserve recorded NemoClaw build fingerprint during sandbox creation/registry updates to avoid accidental overwrites.
Tests
- Added regression tests covering NemoClaw image drift detection (issue 5026) and CLI output for drift cases.

coderabbitai · 2026-06-10T02:31:03Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a8687454-d756-4dea-b29b-228b2d900b80

📥 Commits

Reviewing files that changed from the base of the PR and between ed8ebe7 and 215c4bf.

📒 Files selected for processing (4)

src/lib/actions/gateway-drift-preflight.test.ts
src/lib/actions/upgrade-sandboxes.ts
src/lib/domain/maintenance/upgrade.test.ts
src/lib/domain/maintenance/upgrade.ts

✅ Files skipped from review due to trivial changes (1)

src/lib/actions/gateway-drift-preflight.test.ts

🚧 Files skipped from review as they are similar to previous changes (3)

src/lib/domain/maintenance/upgrade.test.ts
src/lib/actions/upgrade-sandboxes.ts
src/lib/domain/maintenance/upgrade.ts

📝 Walkthrough


This PR adds NemoClaw build fingerprint tracking and image drift detection to the sandbox upgrade system. Sandboxes created or rebuilt with this change record their NemoClaw build version and are flagged for upgrade when it differs from the CLI's NemoClaw version, independent of agent version changes.

Changes

NemoClaw Image Drift Detection in Sandbox Upgrades

| Layer | File(s) | Summary |
|---|---|---|
| Upgrade classification types and sandbox schema | `src/lib/domain/maintenance/upgrade.ts`, `src/lib/state/registry.ts` | Introduces `UpgradeStaleReason` (`"agent-version" \| "image-drift"`), extends `UpgradeSandboxCandidate` with optional `reasons` and image fingerprint fields (`imageCurrent`/`imageExpected`), adds `ClassifyUpgradeOptions` with `currentNemoclawVersion`, and extends `SandboxEntry` schema to persist `nemoclawVersion`. |
| Image drift detection logic | `src/lib/domain/maintenance/upgrade.ts` | Implements `isNemoclawImageStale` that flags drift only when both recorded and current fingerprints are known and different. |
| Upgrade classification with image drift | `src/lib/domain/maintenance/upgrade.ts` | Updates `classifyUpgradeableSandboxes` to accept `nemoclawVersion` on sandboxes and an `options` argument, accumulates `reasons` for `agent-version` and/or `image-drift`, and conditionally includes image fingerprint fields in output. |
| NemoClaw build fingerprint capture and persistence | `src/lib/onboard/sandbox-agent.ts`, `src/lib/state/registry.ts` | Adds `getNemoclawBuildFingerprint()` to safely resolve the running build fingerprint, updates `getSandboxAgentRegistryFields()` to return `nemoclawVersion` when agent version is known, and persists `nemoclawVersion` via `registerSandbox`. |
| Sandbox registry handler exclusion | `src/lib/onboard/machine/handlers/sandbox.ts` | Avoids overwriting persisted `nemoclawVersion` by excluding it from `updateSandboxRegistry` when updating registry fields returned from the agent. |
| CLI action integration and output | `src/lib/actions/upgrade-sandboxes.ts` | Resolves current NemoClaw version at runtime, passes `currentNemoclawVersion` into `classifyUpgradeableSandboxes`, adds `describeStaleUpgrade()` to format consolidated stale reasons, and prints that instead of only an agent-version transition. |
| Unit and integration test coverage | `src/lib/domain/maintenance/upgrade.test.ts`, `src/lib/actions/gateway-drift-preflight.test.ts`, `test/cli/list-share-live-inference.test.ts` | Adds unit tests for `isNemoclawImageStale` and richer `classifyUpgradeableSandboxes` outputs, updates gateway preflight test to expect `currentNemoclawVersion` in options, and adds a CLI regression test verifying image-drift is detected when the recorded `nemoclawVersion` is stale even if `agentVersion` matches the expected current value. |
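The fingerprint-capture row of the table can be illustrated with a hypothetical sketch. The version lookup is injected as a function so the example runs standalone (the PR describes the real value as coming from NemoClaw's `getVersion()`); the `AgentRegistryFields` shape and both function signatures are assumptions. The property shown is that a failed or empty lookup yields no fingerprint rather than an error, and nothing is stamped when the agent version is unknown.

```typescript
// Hypothetical sketch: `VersionGetter` stands in for NemoClaw's own version
// lookup so the example runs without the real CLI.
type VersionGetter = () => string;

/** Resolves the running build fingerprint, or null if it cannot be determined. */
function getNemoclawBuildFingerprint(getVersion: VersionGetter): string | null {
  try {
    const version = getVersion().trim();
    return version.length > 0 ? version : null; // empty output counts as unknown
  } catch {
    return null; // a failed lookup must not break sandbox creation
  }
}

interface AgentRegistryFields {
  agentVersion?: string;
  nemoclawVersion?: string;
}

/** Registry fields stamped at create/rebuild time (hypothetical shape). */
function getSandboxAgentRegistryFields(
  agentVersion: string | null,
  getVersion: VersionGetter,
): AgentRegistryFields {
  if (!agentVersion) return {}; // unknown agent version: stamp nothing
  const fields: AgentRegistryFields = { agentVersion };
  const fingerprint = getNemoclawBuildFingerprint(getVersion);
  if (fingerprint) fields.nemoclawVersion = fingerprint;
  return fields;
}

console.log(JSON.stringify(getSandboxAgentRegistryFields("2026.5.27", () => "0.0.61\n")));
// {"agentVersion":"2026.5.27","nemoclawVersion":"0.0.61"}
```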

Sequence Diagram

```mermaid
sequenceDiagram
  participant CLI as upgrade-sandboxes
  participant GetVersion as getVersion()
  participant Classify as classifyUpgradeableSandboxes
  participant Drift as isNemoclawImageStale
  participant Output as describeStaleUpgrade

  CLI->>GetVersion: resolveCurrentNemoclawVersion()
  GetVersion-->>CLI: currentNemoclawVersion (string | null)
  CLI->>Classify: classifyUpgradeableSandboxes(sandboxes, { currentNemoclawVersion })
  Classify->>Drift: isNemoclawImageStale(recorded, current)
  Drift-->>Classify: driftDetected (boolean)
  Classify-->>CLI: candidates[] with reasons and optional imageCurrent/imageExpected
  CLI->>Output: describeStaleUpgrade(candidate)
  Output-->>CLI: human-readable reason
  CLI->>CLI: print stale sandbox listing
```
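The final `describeStaleUpgrade` step of the sequence can be sketched as follows. The `StaleCandidate` shape and the exact message wording are assumptions, modeled on the example reason given in the PR description (`OpenClaw v2026.5.27 unchanged; NemoClaw image v0.0.60 → v0.0.61`).

```typescript
// Hypothetical candidate shape; field names are assumptions for illustration.
type UpgradeStaleReason = "agent-version" | "image-drift";

interface StaleCandidate {
  agentName: string; // display name of the agent, e.g. "OpenClaw"
  agentCurrent: string;
  agentExpected: string;
  reasons: UpgradeStaleReason[];
  imageCurrent?: string;
  imageExpected?: string;
}

/** Formats every stale reason for one sandbox into a single line. */
function describeStaleUpgrade(c: StaleCandidate): string {
  const parts: string[] = [];
  if (c.reasons.includes("agent-version")) {
    parts.push(`${c.agentName} v${c.agentCurrent} → v${c.agentExpected}`);
  } else {
    parts.push(`${c.agentName} v${c.agentCurrent} unchanged`);
  }
  if (c.reasons.includes("image-drift") && c.imageCurrent && c.imageExpected) {
    parts.push(`NemoClaw image v${c.imageCurrent} → v${c.imageExpected}`);
  }
  return parts.join("; ");
}

const example: StaleCandidate = {
  agentName: "OpenClaw",
  agentCurrent: "2026.5.27",
  agentExpected: "2026.5.27",
  reasons: ["image-drift"],
  imageCurrent: "0.0.60",
  imageExpected: "0.0.61",
};
console.log(describeStaleUpgrade(example));
// OpenClaw v2026.5.27 unchanged; NemoClaw image v0.0.60 → v0.0.61
```

Because the agent part is emitted unconditionally, this sketch always returns a non-empty string, even for an empty `reasons` list.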

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

area: sandbox, bug-fix, v0.0.63

Suggested reviewers

cv

Poem

🐰 A tiny rabbit hops on cue,
It sniffs the builds both old and new,
When NemoClaw's image drifts away,
The rabbit flags it without delay,
"Time to upgrade!" — it says, with a chew.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 38.46%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and concisely describes the main change: detecting NemoClaw image drift in the upgrade-sandboxes command, with a reference to the issue number. |
| Linked Issues check | ✅ Passed | The code changes fully address issue `#5026` by implementing NemoClaw build fingerprint persistence and detection logic to flag image drift during upgrade checks. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to the objectives of detecting NemoClaw image drift; no unrelated modifications were introduced. |


`upgrade-sandboxes` only compared the agent version inside a sandbox, so after a NemoClaw upgrade that left the bundled OpenClaw `expected_version` unchanged it printed "All sandboxes are up to date" even though the NemoClaw image payload (scripts, patches, policies, Dockerfile, generated config) had changed and the sandbox needed a rebuild.

Persist a NemoClaw build fingerprint (`nemoclawVersion`) on managed sandbox images at create/rebuild time and compare it against the running NemoClaw build during classification. A sandbox is now flagged for rebuild when its agent version is stale OR its recorded NemoClaw build differs from the running one, with a clear reason, e.g. `OpenClaw v2026.5.27 unchanged; NemoClaw image v0.0.60 -> v0.0.61`.

Drift is asserted only on positive evidence: a recorded fingerprint that differs. A missing fingerprint is never flagged, because it is ambiguous between a legacy managed image and a custom `--from` image, and auto-rebuilding a custom image whose Dockerfile path is unavailable would silently recreate it with the default image. Custom images are therefore left without a fingerprint, and legacy sandboxes opt into drift detection on their next rebuild.

The sandbox-reuse path no longer re-stamps the fingerprint, so reusing a sandbox after a NemoClaw upgrade cannot mask drift. The fingerprint is stamped via `getSandboxAgentRegistryFields`, keeping `onboard.ts` net-neutral.

Fixes NVIDIA#5026

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

wscurran · 2026-06-10T14:57:47Z

✨ Related open issues:

#5026 "nemoclaw upgrade-sandboxes" are not upgrading existed sandboxes after upgrading nemoclaw

prekshivyas

Reviewed (code + security checklist). Persists a NemoClaw build fingerprint (nemoclawVersion) on managed sandbox entries at create/rebuild, and extends upgrade-sandboxes to flag sandboxes whose recorded build differs from the running build even when the agent version is unchanged (#5026).

✅ Approve. Defensively designed: drift is positive-evidence-only (missing fingerprint ⇒ not drifted), which correctly avoids the dangerous failure mode of auto-rebuilding a custom --from image onto the default. Verified the two safety claims against source — updateReusedSandboxMetadata never touches nemoclawVersion, and the supplementary update strips it so a reused image can't be re-stamped. getVersion() runs git describe via execFileSync (no shell). Security: all categories pass.
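The reuse-path safety property checked in this review can be shown with a small merge sketch. `stripBuildFingerprint`, `applyReuseUpdate`, and the flat `RegistryFields` shape are hypothetical; they only assume that registry updates are merged into the existing entry by object spread.

```typescript
// Hypothetical registry field shape; only the two version fields matter here.
interface RegistryFields {
  agentVersion?: string;
  nemoclawVersion?: string;
  [key: string]: string | undefined;
}

/** Returns a copy of `fields` without the NemoClaw build fingerprint. */
function stripBuildFingerprint(fields: RegistryFields): RegistryFields {
  const rest: RegistryFields = { ...fields };
  delete rest.nemoclawVersion; // never re-stamp on the reuse path
  return rest;
}

/** Merges a reuse-path update into an existing entry, keeping the recorded fingerprint. */
function applyReuseUpdate(entry: RegistryFields, update: RegistryFields): RegistryFields {
  return { ...entry, ...stripBuildFingerprint(update) };
}

// A sandbox built on 0.0.60 is reused after upgrading NemoClaw to 0.0.61.
const existing: RegistryFields = { agentVersion: "2026.5.27", nemoclawVersion: "0.0.60" };
const update: RegistryFields = { agentVersion: "2026.5.27", nemoclawVersion: "0.0.61" };
console.log(applyReuseUpdate(existing, update).nemoclawVersion); // prints 0.0.60
```

Because the fingerprint is dropped from the reuse-path update before the merge, the entry still records the build the image was actually created from, so a later `upgrade-sandboxes --check` can still detect the drift.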

Non-blocking nits:

isNemoclawImageStale uses recorded !== current, so a downgrade also flags drift. Behavior is defensible — just tweak the JSDoc (says "older than running") to "differs from" to match.
describeStaleUpgrade returns "" on empty reasons (currently unreachable; robustness only).

Tests adequate: unit coverage for the stale predicate + classifier, plus a real CLI regression (--check with current agent + stale fingerprint).

## Summary

- Add the v0.0.63 release-note section using the published development note as source context.
- Update source docs for sandbox recovery, OpenClaw config restore safety, managed vLLM selection, Slack Socket Mode conflict handling, and host diagnostics.
- Refresh generated `nemoclaw-user-*` skills from the updated Fern MDX docs.
- Update the release-doc refresh skill so post-release docs for version `n` look up the matching announcement discussion and use the `n+1` patch release label.
- Fix CLI/docs parity by avoiding a `--from <Dockerfile>` flag mention inside the `upgrade-sandboxes` command section.

## Source summary

- #5034 -> `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Document safer stale-sandbox recovery through `rebuild --yes` before recreating from scratch.
- #5091 -> `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Document Docker-driver post-reboot recovery from OpenShell container labels.
- #5101, #5174, #5177 -> `docs/manage-sandboxes/backup-restore.mdx`, `docs/about/release-notes.mdx`: Document OpenClaw `openclaw.json` preservation, merge behavior, and fail-safe restore handling.
- #5102 -> `docs/reference/commands.mdx`, `docs/reference/commands-nemohermes.mdx`, `docs/manage-sandboxes/lifecycle.mdx`, `docs/about/release-notes.mdx`: Document `upgrade-sandboxes` image-fingerprint drift detection.
- #4201 -> `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Document the installer diagnostic for unexpected Docker daemon access outside the `docker` group.
- #5038 -> `docs/inference/inference-options.mdx`, `docs/inference/use-local-inference.mdx`, `docs/about/release-notes.mdx`: Document the interactive managed-vLLM model picker and non-interactive override behavior.
- #5040, #5041 -> `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Document Ollama auth-proxy recovery and host DNS preflight diagnostics.
- #4986, #5039 -> `docs/manage-sandboxes/messaging-channels.mdx`, `docs/about/release-notes.mdx`: Document Slack validation and duplicate Slack Socket Mode sandbox handling.
- #4981, #5168 -> `docs/about/release-notes.mdx`: Capture Hermes gateway secret-guard and wrapped-argv startup hardening in the release surface.
- Follow-up -> `.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`: Record the post-release docs workflow, discussion-announcement lookup, and next-patch release label rule.
- Follow-up -> `docs/reference/commands.mdx`, `docs/reference/commands-nemohermes.mdx`: Reword custom Dockerfile sandbox text so CLI parity does not treat `--from` as an `upgrade-sandboxes` flag.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `npm run docs`
- `npm run build:cli`
- `bash test/e2e/e2e-cloud-experimental/check-docs.sh --only-cli`
- Skip-term scan for `docs/.docs-skip` blocked terms across generated user skills

## Summary by CodeRabbit

* **Documentation**
  * Enhanced local inference setup with interactive model selection prompts and environment variable overrides
  * Improved sandbox upgrade detection using build fingerprints and version checks
  * Clarified configuration restore behavior preserving user settings during rebuild/restore
  * Added gateway authentication as fifth security layer
  * Expanded Slack messaging validation with live credential checking
  * Enhanced troubleshooting guidance for Docker access, DNS issues, and sandbox recovery
  * Updated release notes for v0.0.63 featuring sandbox recovery and inference improvements

yimoj force-pushed the fix/5026-upgrade-sandbox-image-fingerprint branch from ed8ebe7 to 215c4bf June 10, 2026 02:35

yimoj added the v0.0.63 (Release target) label Jun 10, 2026

wscurran added the area: cli (Command line interface, flags, terminal UX, or output) and bug-fix (PR fixes a bug or regression) labels Jun 10, 2026

prekshivyas self-assigned this Jun 10, 2026

prekshivyas approved these changes Jun 10, 2026


cv merged commit 79279b9 into NVIDIA:main Jun 10, 2026
34 checks passed

miyoungc mentioned this pull request Jun 11, 2026

docs: refresh v0.0.63 release docs #5244

Merged


cv merged 1 commit into NVIDIA:main from yimoj:fix/5026-upgrade-sandbox-image-fingerprint

yimoj commented Jun 10, 2026 • edited by coderabbitai Bot

coderabbitai Bot commented Jun 10, 2026 • edited


wscurran commented Jun 10, 2026


prekshivyas left a comment


4 participants
