fix(upgrade): pin base image via pull-then-inspect and rebuild stale sandboxes#1943
Conversation
📝 WalkthroughWalkthroughAfter onboarding, the installer may run Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant Installer as "Install Script"
participant CLI as "nemoclaw CLI"
participant Manager as "upgradeSandboxes"
participant Registry as "Sandboxes Registry"
participant Openshell as "openshell"
participant Rebuilder as "sandboxRebuild"
participant Docker as "Docker Engine"
User->>Installer: run installer (post-onboard)
Installer->>CLI: "nemoclaw upgrade-sandboxes --auto"
CLI->>Manager: start upgrade-sandboxes
Manager->>Registry: list registered sandboxes
Manager->>Openshell: "openshell sandbox list"
Openshell-->>Manager: running sandbox names
Manager->>Manager: compute stale vs current agentVersion
loop each stale running sandbox
Manager->>Rebuilder: sandboxRebuild(name, ["--yes"], {throwOnError:true})
Rebuilder->>Docker: build/pull base image / create image
Docker-->>Rebuilder: build result / image digest
Rebuilder-->>Manager: success/failure
end
Manager-->>CLI: summary (counts)
CLI-->>User: output + exit
sequenceDiagram
participant createSandbox as "createSandbox()"
participant DockerPull as "docker pull"
participant DockerInspect as "docker inspect"
participant Patch as "patchStagedDockerfile()"
participant Filesystem as "Staged Dockerfile"
createSandbox->>DockerPull: pull ghcr.io/nvidia/nemoclaw/sandbox-base:latest
alt pull & inspect succeed
DockerPull-->>DockerInspect: inspect for RepoDigests
DockerInspect-->>createSandbox: repo digest (ghcr.io/...@sha256:...)
createSandbox->>Patch: patchStagedDockerfile(..., baseImageRef=ghcr.io/...@sha256:...)
Patch->>Filesystem: rewrite ARG BASE_IMAGE to `@sha256:...`
Filesystem-->>Patch: patched Dockerfile
else pull/inspect fails
createSandbox->>createSandbox: warn and fall back to :latest
createSandbox->>Patch: patchStagedDockerfile(..., baseImageRef=null)
Patch->>Filesystem: leave ARG BASE_IMAGE unchanged
end
Patch-->>createSandbox: staged Dockerfile ready
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
test/cli.test.ts (1)
2199-2313: Tighten stale/current assertions to prevent false positives.Both new tests assert expected substrings, but neither asserts the opposite state is absent. Adding negative assertions will make regressions much harder to slip through.
Suggested assertion hardening
expect(r.code).toBe(0); // Should report the stale sandbox with version info expect(r.out).toContain("my-agent"); expect(r.out).toContain("2026.3.11"); expect(r.out).toMatch(/stale|need upgrading/i); + expect(r.out).not.toMatch(/up to date/i); @@ const r = runWithEnv("upgrade-sandboxes --check 2>&1", { HOME: home, PATH: `${localBin}:${process.env.PATH || ""}`, }); expect(r.code).toBe(0); expect(r.out).toContain("up to date"); + expect(r.out).not.toMatch(/stale|need upgrading/i);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cli.test.ts` around lines 2199 - 2313, The tests for "upgrade-sandboxes --check" should add negative assertions to avoid false positives: in the first test (the one expecting a stale sandbox) assert that r.out does NOT contain the "up to date" message (or equivalent string), and in the second test (the one expecting all-current) assert that r.out does NOT contain the stale sandbox indicators (e.g., the sandbox name "my-agent", the old version "2026.3.11", or the regex /stale|need upgrading/i). Locate the two specs by their it(...) descriptions and add these complementary expect(r.out).not.toContain / not.toMatch assertions using the existing r variable to harden the checks.test/onboard.test.ts (1)
4703-4728: Add direct behavior tests forpullAndResolveBaseImageDigest().Current checks are mostly structural/constant-based. Add runtime-path tests for digest resolution and GHCR-unreachable fallback to fully protect the regression fix.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/onboard.test.ts` around lines 4703 - 4728, The test suite only checks structure/constants and misses exercising pullAndResolveBaseImageDigest behavior; add new runtime tests that call pullAndResolveBaseImageDigest (and indirectly createSandbox if needed) to assert it returns a resolved image digest when GHCR is reachable and that it falls back to the sandbox-base registry or a local default when GHCR is unreachable/unresolvable; ensure tests stub/mock network/Docker registry calls (or inject a mock registry client) so you can simulate a successful digest resolution and a GHCR failure, and add assertions for the returned image string containing a digest and for the fallback branch, referencing pullAndResolveBaseImageDigest, createSandbox (if used), patchStagedDockerfile, and SANDBOX_BASE_IMAGE to locate relevant code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 482-496: The current logic reads a single RepoDigests entry (via
runCapture into inspectOutput) and picks the first sha256, which can come from
any registry; modify the parsing after inspectOutput is set so you
iterate/filter the RepoDigests entries returned by docker inspect and select the
digest whose repository prefix exactly matches the expected SANDBOX_BASE_IMAGE
(i.e., "ghcr.io/nvidia/nemoclaw/sandbox-base"); then extract the sha256 from
that matched entry, build ref = `${SANDBOX_BASE_IMAGE}@${digest}`, and return as
before (keep function names/variables inspectOutput, runCapture,
SANDBOX_BASE_IMAGE, digest, ref).
In `@src/nemoclaw.ts`:
- Around line 1844-1848: The current handler around captureOpenshell(["sandbox",
"list"], { ignoreError: true }) (liveResult) and the subsequent rebuild loop
returns normally on failure, so the process exits with code 0 even when
OpenShell query or any sandbox rebuild fails; change these code paths to exit
with a non-zero status: when liveResult.status !== 0 call process.exit(1) (or
propagate an error) instead of returning, and in the sandbox rebuild logic
detect any failed rebuilds (e.g., track non-zero statuses from the rebuild
command or function used for rebuilding sandboxes) and call process.exit(1)
after finishing the loop if any rebuild failed so --auto scripts receive a
non-zero exit code; update both the captureOpenshell failure block and the
rebuild-result aggregation logic (referencing captureOpenshell, liveResult and
the rebuild loop/function) accordingly.
- Around line 1854-1864: The loop over sandboxes currently only pushes entries
when versionCheck.isStale is true, but checkAgentVersion() can return
detectionMethod: "unavailable" (no cached agentVersion) and isStale: false;
update the condition to treat detectionMethod === "unavailable" as needing
attention. Change the if in the for-loop that references
sandboxVersion.checkAgentVersion(sb.name) to push the sandbox when
versionCheck.isStale || versionCheck.detectionMethod === "unavailable", and
populate the stale entry using versionCheck.sandboxVersion (or a fallback like
"unknown") and versionCheck.expectedVersion so legacy/stopped sandboxes appear
in the report.
---
Nitpick comments:
In `@test/cli.test.ts`:
- Around line 2199-2313: The tests for "upgrade-sandboxes --check" should add
negative assertions to avoid false positives: in the first test (the one
expecting a stale sandbox) assert that r.out does NOT contain the "up to date"
message (or equivalent string), and in the second test (the one expecting
all-current) assert that r.out does NOT contain the stale sandbox indicators
(e.g., the sandbox name "my-agent", the old version "2026.3.11", or the regex
/stale|need upgrading/i). Locate the two specs by their it(...) descriptions and
add these complementary expect(r.out).not.toContain / not.toMatch assertions
using the existing r variable to harden the checks.
In `@test/onboard.test.ts`:
- Around line 4703-4728: The test suite only checks structure/constants and
misses exercising pullAndResolveBaseImageDigest behavior; add new runtime tests
that call pullAndResolveBaseImageDigest (and indirectly createSandbox if needed)
to assert it returns a resolved image digest when GHCR is reachable and that it
falls back to the sandbox-base registry or a local default when GHCR is
unreachable/unresolvable; ensure tests stub/mock network/Docker registry calls
(or inject a mock registry client) so you can simulate a successful digest
resolution and a GHCR failure, and add assertions for the returned image string
containing a digest and for the fallback branch, referencing
pullAndResolveBaseImageDigest, createSandbox (if used), patchStagedDockerfile,
and SANDBOX_BASE_IMAGE to locate relevant code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: bf8a27d9-ac86-4345-aca4-478be0bbdbcc
📒 Files selected for processing (5)
scripts/install.shsrc/lib/onboard.tssrc/nemoclaw.tstest/cli.test.tstest/onboard.test.ts
There was a problem hiding this comment.
🧹 Nitpick comments (2)
test/e2e/test-upgrade-stale-sandbox.sh (2)
169-175: Wait loop lacks explicit timeout failure.Both wait loops (lines 105-110 and 170-175) iterate 30 times with 5-second sleeps but don't explicitly fail if the sandbox never becomes ready. The subsequent
grep -qcheck (line 111-112) will catch the failure, but having an explicit timeout message would improve debuggability.💡 Optional: Add explicit timeout handling
# Wait for sandbox to be ready after rebuild +ready=0 for _i in $(seq 1 30); do if openshell sandbox list 2>/dev/null | grep -q "${SANDBOX_NAME}.*Ready\|${SANDBOX_NAME}.*Running"; then + ready=1 break fi sleep 5 done +[ "$ready" = "1" ] || fail "Sandbox ${SANDBOX_NAME} did not become ready within 150s after rebuild"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-upgrade-stale-sandbox.sh` around lines 169 - 175, The wait loops that run `for _i in $(seq 1 30); do ... openshell sandbox list ... | grep -q "${SANDBOX_NAME}.*Ready\|${SANDBOX_NAME}.*Running"; done` lack explicit timeout failure handling; after the loop completes without a break add an explicit check for the sandbox state and emit a clear error message to stderr (including ${SANDBOX_NAME}) and exit non‑zero (e.g., `echo "Timed out waiting for ${SANDBOX_NAME} to be Ready/Running" >&2; exit 1`). Apply this to both occurrences of the loop so the script fails fast with a descriptive message when the sandbox never becomes ready.
119-127: Shell variable not expanded inside Python heredoc.The
${REGISTRY_FILE}and${SANDBOX_NAME}variables are inside single quotes of the Python string, but they're being expanded by the shell because the outer quotes are double quotes. This actually works correctly here because the shell expands them before Python sees them. However, this pattern can be confusing and error-prone.Additionally, consider adding a timeout failure check after the wait loop (lines 105-110) to fail explicitly if the sandbox never becomes ready, rather than continuing silently.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-upgrade-stale-sandbox.sh` around lines 119 - 127, Replace the inline Python invocation that embeds shell vars into the quoted Python string (producing OLD_AGENT_VERSION) with a safer pattern that passes ${REGISTRY_FILE} and ${SANDBOX_NAME} as explicit arguments or reads them from the environment (e.g., call python3 with "$REGISTRY_FILE" "$SANDBOX_NAME" and use sys.argv or os.environ inside Python) so shell expansion isn't implicitly relied upon; also add an explicit timeout/failure check immediately after the sandbox readiness wait loop (the loop that waits for the sandbox to become ready) so the script exits non‑zero and logs a clear error if the sandbox never becomes ready instead of continuing silently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/e2e/test-upgrade-stale-sandbox.sh`:
- Around line 169-175: The wait loops that run `for _i in $(seq 1 30); do ...
openshell sandbox list ... | grep -q
"${SANDBOX_NAME}.*Ready\|${SANDBOX_NAME}.*Running"; done` lack explicit timeout
failure handling; after the loop completes without a break add an explicit check
for the sandbox state and emit a clear error message to stderr (including
${SANDBOX_NAME}) and exit non‑zero (e.g., `echo "Timed out waiting for
${SANDBOX_NAME} to be Ready/Running" >&2; exit 1`). Apply this to both
occurrences of the loop so the script fails fast with a descriptive message when
the sandbox never becomes ready.
- Around line 119-127: Replace the inline Python invocation that embeds shell
vars into the quoted Python string (producing OLD_AGENT_VERSION) with a safer
pattern that passes ${REGISTRY_FILE} and ${SANDBOX_NAME} as explicit arguments
or reads them from the environment (e.g., call python3 with "$REGISTRY_FILE"
"$SANDBOX_NAME" and use sys.argv or os.environ inside Python) so shell expansion
isn't implicitly relied upon; also add an explicit timeout/failure check
immediately after the sandbox readiness wait loop (the loop that waits for the
sandbox to become ready) so the script exits non‑zero and logs a clear error if
the sandbox never becomes ready instead of continuing silently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b752cc4a-56f3-4eeb-9753-42b196f08e2f
📒 Files selected for processing (2)
.github/workflows/nightly-e2e.yamltest/e2e/test-upgrade-stale-sandbox.sh
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
test/e2e/test-upgrade-stale-sandbox.sh (1)
119-127: Registry JSON lookup is brittle forsandboxesshape.The Python snippet assumes
sandboxesis a key-addressable object (.get(name)), but runtime code paths consume it as an iterable list of sandbox objects. This can degrade diagnostics toerror: ...strings.Proposed fix
OLD_AGENT_VERSION=$(python3 -c " import json, sys +name = '${SANDBOX_NAME}' try: d = json.load(open('${REGISTRY_FILE}')) - sb = d.get('sandboxes', {}).get('${SANDBOX_NAME}', {}) + sandboxes = d.get('sandboxes', {}) + if isinstance(sandboxes, list): + sb = next((s for s in sandboxes if s.get('name') == name), {}) + elif isinstance(sandboxes, dict): + sb = sandboxes.get(name, {}) + else: + sb = {} print(sb.get('agentVersion', 'unknown')) except Exception as e: print(f'error: {e}') " 2>/dev/null || echo "unknown")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-upgrade-stale-sandbox.sh` around lines 119 - 127, The Python snippet that sets OLD_AGENT_VERSION treats the registry's "sandboxes" as a dict and can emit "error: ..." into OLD_AGENT_VERSION; update the lookup to handle "sandboxes" as a list by iterating d.get('sandboxes', []) and finding the object whose "name" equals the SANDBOX_NAME, then print its "agentVersion" (or "unknown" if missing); also ensure any exception prints nothing useful to STDOUT (or print "unknown") so OLD_AGENT_VERSION never captures raw exception strings from REGISTRY_FILE.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-upgrade-stale-sandbox.sh`:
- Around line 115-116: The version probe commands for OLD_OPENCLAW_VERSION and
NEW_OPENCLAW_VERSION currently use "|| true" which hides failures; remove the
"|| true" from the openshell sandbox exec calls (the ones assigning
OLD_OPENCLAW_VERSION and NEW_OPENCLAW_VERSION), capture both output and the
command exit status, and if the exit status is non‑zero log an error (including
the captured output) and exit non‑zero so the subsequent comparison (the version
equality check) is reliable; ensure you update any surrounding logic that
assumed a silent failure so the script fails fast on openshell sandbox exec
errors.
- Around line 190-197: The test currently treats a non-clean result from
`nemoclaw upgrade-sandboxes --check` as non-fatal (using `info`) so the E2E can
pass while stale sandboxes remain; change the behavior so when `RECHECK_OUTPUT`
does not contain "up to date" (the grep check around `RECHECK_OUTPUT`), the
script fails the test (use the test harness `fail` function or `exit 1`) and
include `RECHECK_OUTPUT` in the failure message to aid debugging; update the
block that calls `upgrade-sandboxes --check` and the conditional that currently
calls `pass`/`info` to instead call `pass` on success and `fail` (with output)
on any other result.
---
Nitpick comments:
In `@test/e2e/test-upgrade-stale-sandbox.sh`:
- Around line 119-127: The Python snippet that sets OLD_AGENT_VERSION treats the
registry's "sandboxes" as a dict and can emit "error: ..." into
OLD_AGENT_VERSION; update the lookup to handle "sandboxes" as a list by
iterating d.get('sandboxes', []) and finding the object whose "name" equals the
SANDBOX_NAME, then print its "agentVersion" (or "unknown" if missing); also
ensure any exception prints nothing useful to STDOUT (or print "unknown") so
OLD_AGENT_VERSION never captures raw exception strings from REGISTRY_FILE.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 1068226c-08ea-41e4-bef7-7bead4747463
📒 Files selected for processing (1)
test/e2e/test-upgrade-stale-sandbox.sh
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
test/e2e/test-upgrade-stale-sandbox.sh (2)
120-121:⚠️ Potential issue | 🟠 MajorFail fast when OpenClaw version probes fail.
Line 120 and Line 182 still use
|| true, which masksopenshell sandbox execfailures and can make the version comparison unreliable.Proposed fix
-OLD_OPENCLAW_VERSION=$(openshell sandbox exec --name "${SANDBOX_NAME}" -- openclaw --version 2>&1 || true) +OLD_OPENCLAW_VERSION=$(openshell sandbox exec --name "${SANDBOX_NAME}" -- openclaw --version 2>&1) \ + || fail "Failed to read old sandbox OpenClaw version: ${OLD_OPENCLAW_VERSION}" @@ -NEW_OPENCLAW_VERSION=$(openshell sandbox exec --name "${SANDBOX_NAME}" -- openclaw --version 2>&1 || true) +NEW_OPENCLAW_VERSION=$(openshell sandbox exec --name "${SANDBOX_NAME}" -- openclaw --version 2>&1) \ + || fail "Failed to read new sandbox OpenClaw version after rebuild: ${NEW_OPENCLAW_VERSION}"Also applies to: 182-183
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-upgrade-stale-sandbox.sh` around lines 120 - 121, The version probe currently masks failures by appending "|| true" to the "openshell sandbox exec --name \"${SANDBOX_NAME}\" -- openclaw --version" calls (used to populate OLD_OPENCLAW_VERSION and the later NEW_OPENCLAW_VERSION), which can produce unreliable comparisons; remove "|| true" and instead immediately fail fast when the command exits non-zero: run the openshell command without "|| true", check its exit status, and if it fails log an error including the SANDBOX_NAME and the raw command output and exit the script with a non-zero status so the test never compares invalid version strings.
198-202:⚠️ Potential issue | 🟠 MajorMake Phase 6 a hard failure when stale state remains.
Line 201 treats a non-clean
upgrade-sandboxes --checkresult as non-fatal, so this E2E can still report success incorrectly.Proposed fix
if echo "$RECHECK_OUTPUT" | grep -qi "up to date"; then pass "Phase 6: upgrade-sandboxes --check reports all up to date after rebuild" else - info "Phase 6: Sandbox may still appear stale (non-fatal)" + fail "Phase 6: upgrade-sandboxes --check did not report 'up to date' after rebuild. Output: ${RECHECK_OUTPUT}" fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-upgrade-stale-sandbox.sh` around lines 198 - 202, Currently the Phase 6 check inspects RECHECK_OUTPUT with grep -qi "up to date" and calls pass(...) on success but only logs info(...) on failure; change the failure branch to make the test a hard failure by invoking the test-failure mechanism (replace info "Phase 6: Sandbox may still appear stale (non-fatal)" with fail "Phase 6: upgrade-sandboxes --check reports stale state after rebuild" or otherwise exit with non-zero status), keeping the same condition that inspects RECHECK_OUTPUT and the existing pass(...) call on success.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-upgrade-stale-sandbox.sh`:
- Line 48: The diagnostics pipeline invoked in the fail() path (the echo command
that runs docker images | grep ...) can fail the whole pipeline under set -o
pipefail when grep finds no matches; change that line so the pipeline cannot
return non-zero by appending "|| true" to the pipeline (or capture output into a
variable and echo that), e.g. modify the echo invocation that contains "docker
images --format ... | grep -i 'sandbox\|nemoclaw\|openclaw' | head -10" to
ensure it always succeeds (pipeline || true) so fail() diagnostics never get
short-circuited by pipefail.
---
Duplicate comments:
In `@test/e2e/test-upgrade-stale-sandbox.sh`:
- Around line 120-121: The version probe currently masks failures by appending
"|| true" to the "openshell sandbox exec --name \"${SANDBOX_NAME}\" -- openclaw
--version" calls (used to populate OLD_OPENCLAW_VERSION and the later
NEW_OPENCLAW_VERSION), which can produce unreliable comparisons; remove "||
true" and instead immediately fail fast when the command exits non-zero: run the
openshell command without "|| true", check its exit status, and if it fails log
an error including the SANDBOX_NAME and the raw command output and exit the
script with a non-zero status so the test never compares invalid version
strings.
- Around line 198-202: Currently the Phase 6 check inspects RECHECK_OUTPUT with
grep -qi "up to date" and calls pass(...) on success but only logs info(...) on
failure; change the failure branch to make the test a hard failure by invoking
the test-failure mechanism (replace info "Phase 6: Sandbox may still appear
stale (non-fatal)" with fail "Phase 6: upgrade-sandboxes --check reports stale
state after rebuild" or otherwise exit with non-zero status), keeping the same
condition that inspects RECHECK_OUTPUT and the existing pass(...) call on
success.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 6bedb28e-302d-43e0-a79e-a8675e5305e7
📒 Files selected for processing (1)
test/e2e/test-upgrade-stale-sandbox.sh
There was a problem hiding this comment.
♻️ Duplicate comments (3)
test/e2e/test-upgrade-stale-sandbox.sh (3)
201-208:⚠️ Potential issue | 🟠 MajorPhase 6 should fail when
--checkis not clean.At Lines 201-208, the test can pass even if stale sandboxes remain because the non-clean result is logged as info instead of failing.
Suggested fix
if echo "$RECHECK_OUTPUT" | grep -qi "up to date"; then pass "Phase 6: upgrade-sandboxes --check reports all up to date after rebuild" else - info "Phase 6: Sandbox may still appear stale (non-fatal)" + fail "Phase 6: upgrade-sandboxes --check did not report 'up to date' after rebuild. Output: ${RECHECK_OUTPUT}" fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-upgrade-stale-sandbox.sh` around lines 201 - 208, The test currently treats a non-clean "nemoclaw upgrade-sandboxes --check" result as non-fatal by calling info, so Phase 6 can pass incorrectly; change the else branch that currently calls info to instead fail the test (use the existing fail function or exit with non-zero) so that when RECHECK_OUTPUT does not contain "up to date" the script fails (update the block referencing RECHECK_OUTPUT, the grep conditional, and replace info "Phase 6: Sandbox may still appear stale (non-fatal)" with a failing call such as fail "Phase 6: upgrade-sandboxes --check reports stale sandboxes" or exit 1).
48-48:⚠️ Potential issue | 🟡 MinorGuard diagnostics pipeline from
pipefailshort-circuit.At Line 48,
grepcan return non-zero when no matches exist, which can interrupt diagnostics infail()under strict mode.Suggested fix
- echo -e "${YELLOW}[DIAG]${NC} Docker images: $(docker images --format '{{.Repository}}:{{.Tag}} {{.ID}}' | grep -i 'sandbox\|nemoclaw\|openclaw' | head -10)" >&2 + echo -e "${YELLOW}[DIAG]${NC} Docker images: $(docker images --format '{{.Repository}}:{{.Tag}} {{.ID}}' | grep -Ei 'sandbox|nemoclaw|openclaw' | head -10 || true)" >&2🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-upgrade-stale-sandbox.sh` at line 48, The diagnostics pipeline using "docker images ... | grep -i 'sandbox\|nemoclaw\|openclaw' | head -10" can return non-zero and short-circuit under set -o pipefail (affecting the fail() diagnostics); update that echo/substitution to guard the pipeline by making the grep failure non-fatal (for example append "|| true" to the grep or to the whole substitution) so the echo command always succeeds; locate the line containing the docker images + grep pipeline in test/e2e/test-upgrade-stale-sandbox.sh and add the "|| true" guard to the pipeline expression.
120-121:⚠️ Potential issue | 🟠 MajorDo not mask OpenClaw version probe failures.
Lines 120 and 188 use
|| true, which can hideopenshell sandbox execfailures and make the before/after comparison unreliable.Suggested fix
-OLD_OPENCLAW_VERSION=$(openshell sandbox exec --name "${SANDBOX_NAME}" -- openclaw --version 2>&1 || true) +OLD_OPENCLAW_VERSION=$(openshell sandbox exec --name "${SANDBOX_NAME}" -- openclaw --version 2>&1) \ + || fail "Failed to read OpenClaw version from old sandbox" @@ -NEW_OPENCLAW_VERSION=$(openshell sandbox exec --name "${SANDBOX_NAME}" -- openclaw --version 2>&1 || true) +NEW_OPENCLAW_VERSION=$(openshell sandbox exec --name "${SANDBOX_NAME}" -- openclaw --version 2>&1) \ + || fail "Failed to read OpenClaw version after rebuild"Also applies to: 188-189
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-upgrade-stale-sandbox.sh` around lines 120 - 121, The test currently masks failures by appending "|| true" to the OpenClaw probe calls (the assignment to OLD_OPENCLAW_VERSION and the later probe around lines referenced), which can hide command failures; remove the "|| true" from the openshell sandbox exec invocations (the commands that set OLD_OPENCLAW_VERSION and the later NEW_OPENCLAW_VERSION) and instead explicitly check the command exit status or the captured variable (e.g. if [ -z "$OLD_OPENCLAW_VERSION" ] || [ $? -ne 0 ]; then error "openshell sandbox exec failed for SANDBOX_NAME"; exit 1; fi) so any probe failure fails the test and the before/after comparison remains reliable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/e2e/test-upgrade-stale-sandbox.sh`:
- Around line 201-208: The test currently treats a non-clean "nemoclaw
upgrade-sandboxes --check" result as non-fatal by calling info, so Phase 6 can
pass incorrectly; change the else branch that currently calls info to instead
fail the test (use the existing fail function or exit with non-zero) so that
when RECHECK_OUTPUT does not contain "up to date" the script fails (update the
block referencing RECHECK_OUTPUT, the grep conditional, and replace info "Phase
6: Sandbox may still appear stale (non-fatal)" with a failing call such as fail
"Phase 6: upgrade-sandboxes --check reports stale sandboxes" or exit 1).
- Line 48: The diagnostics pipeline using "docker images ... | grep -i
'sandbox\|nemoclaw\|openclaw' | head -10" can return non-zero and short-circuit
under set -o pipefail (affecting the fail() diagnostics); update that
echo/substitution to guard the pipeline by making the grep failure non-fatal
(for example append "|| true" to the grep or to the whole substitution) so the
echo command always succeeds; locate the line containing the docker images +
grep pipeline in test/e2e/test-upgrade-stale-sandbox.sh and add the "|| true"
guard to the pipeline expression.
- Around line 120-121: The test currently masks failures by appending "|| true"
to the OpenClaw probe calls (the assignment to OLD_OPENCLAW_VERSION and the
later probe around lines referenced), which can hide command failures; remove
the "|| true" from the openshell sandbox exec invocations (the commands that set
OLD_OPENCLAW_VERSION and the later NEW_OPENCLAW_VERSION) and instead explicitly
check the command exit status or the captured variable (e.g. if [ -z
"$OLD_OPENCLAW_VERSION" ] || [ $? -ne 0 ]; then error "openshell sandbox exec
failed for SANDBOX_NAME"; exit 1; fi) so any probe failure fails the test and
the before/after comparison remains reliable.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: c549fd91-c296-4557-8096-6a880c7c3c3b
📒 Files selected for processing (1)
test/e2e/test-upgrade-stale-sandbox.sh
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/nemoclaw.ts (2)
1612-1618: Preserve the exit code on the throw path.
codeis accepted here but discarded whenthrowOnErroris enabled, so batch callers only get a genericError. Attaching it to the thrown error also removes the unused-parameter guideline violation.Suggested change
const bail = opts.throwOnError ? (msg, code = 1) => { - throw new Error(msg); + const error = new Error(msg); + error.exitCode = code; + throw error; } : (_msg, code = 1) => process.exit(code);As per coding guidelines, "Prefix unused variables with underscore (
_) in JavaScript/TypeScript".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nemoclaw.ts` around lines 1612 - 1618, The bail function currently discards the exit code when opts.throwOnError is true; modify the throw branch (the function assigned to bail when opts.throwOnError is truthy) to include the provided code on the thrown error so callers (e.g., upgradeSandboxes) can inspect the exit code; specifically, in the throw path of bail, construct an Error with msg and attach the code (e.g., set error.code = code or use Object.assign) before throwing so the code parameter is used and preserved.
1832-1938: SplitupgradeSandboxes()into smaller helpers.This function now handles parsing, classification, reporting, prompting, and execution in one block. It is already at/over the repo complexity cap, so extracting helpers here would make the failure paths much easier to maintain.
As per coding guidelines, "Limit cyclomatic complexity to 20 in JavaScript/TypeScript files, with target of 15".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nemoclaw.ts` around lines 1832 - 1938, The upgradeSandboxes function is too large—extract smaller helpers to lower cyclomatic complexity and improve maintainability: create (1) a gatherSandboxes helper that wraps registry.listSandboxes, calls captureOpenshell(["sandbox","list"]) and parseLiveSandboxNames, and returns { sandboxes, liveNames } and any error state; (2) a classifySandboxes(sandboxes, liveNames) that uses sandboxVersion.checkAgentVersion to build and return { stale, unknown }; (3) a reportSandboxStatus(stale, unknown, checkOnly) that prints the status/summary lines currently in the function; and (4) a rebuildStaleSandboxes(rebuildable, stopped, skipConfirm) that encapsulates prompting via askPrompt and calling sandboxRebuild, counting rebuilt/failed and returning the result (and throwing or returning an exit code). Then make upgradeSandboxes orchestrate these helpers, preserving existing calls/behavior to sandboxRebuild, askPrompt, sandboxVersion.checkAgentVersion, captureOpenshell, parseLiveSandboxNames, and registry.listSandboxes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/nemoclaw.ts`:
- Around line 1894-1912: The current logic ignores `unknown` sandboxes when not
in `--check` mode, causing a no-op when all sandboxes are `unknown`; update the
rebuild selection so `unknown` running sandboxes are considered for rebuild:
when computing `rebuildable` (currently `const rebuildable = stale.filter((s) =>
s.running)`), include `unknown.filter((s) => s.running)` as well (or merge
`unknown` into `stale` beforehand), and adjust the `stopped` computation to
include stopped `unknown` entries so the `Skipping ... stopped sandbox(es)` and
the "No running stale sandboxes to rebuild." message reflect both `stale` and
`unknown` states (refer to variables `checkOnly`, `stale`, `unknown`,
`rebuildable`, and `stopped`).
In `@test/e2e/test-upgrade-stale-sandbox.sh`:
- Around line 101-114: The script currently can exit early on a failing docker
build (due to set -e) and never restore the original BLUEPRINT; modify the build
block so cleanup always runs: either add a trap that restores BLUEPRINT from
BLUEPRINT_BAK (use trap 'mv "${BLUEPRINT_BAK}" "${BLUEPRINT}"' EXIT) before
running the docker build, or wrap the docker build in an if ! docker build ...;
then BUILD_RC=$?; mv "${BLUEPRINT_BAK}" "${BLUEPRINT}"; fail "Failed to build
old base image"; fi flow; ensure BUILD_RC is still set and the final mv cleanup
happens in all paths referencing BLUEPRINT, BLUEPRINT_BAK, BUILD_RC and the
docker build invocation.
---
Nitpick comments:
In `@src/nemoclaw.ts`:
- Around line 1612-1618: The bail function currently discards the exit code when
opts.throwOnError is true; modify the throw branch (the function assigned to
bail when opts.throwOnError is truthy) to include the provided code on the
thrown error so callers (e.g., upgradeSandboxes) can inspect the exit code;
specifically, in the throw path of bail, construct an Error with msg and attach
the code (e.g., set error.code = code or use Object.assign) before throwing so
the code parameter is used and preserved.
- Around line 1832-1938: The upgradeSandboxes function is too large—extract
smaller helpers to lower cyclomatic complexity and improve maintainability:
create (1) a gatherSandboxes helper that wraps registry.listSandboxes, calls
captureOpenshell(["sandbox","list"]) and parseLiveSandboxNames, and returns {
sandboxes, liveNames } and any error state; (2) a classifySandboxes(sandboxes,
liveNames) that uses sandboxVersion.checkAgentVersion to build and return {
stale, unknown }; (3) a reportSandboxStatus(stale, unknown, checkOnly) that
prints the status/summary lines currently in the function; and (4) a
rebuildStaleSandboxes(rebuildable, stopped, skipConfirm) that encapsulates
prompting via askPrompt and calling sandboxRebuild, counting rebuilt/failed and
returning the result (and throwing or returning an exit code). Then make
upgradeSandboxes orchestrate these helpers, preserving existing calls/behavior
to sandboxRebuild, askPrompt, sandboxVersion.checkAgentVersion,
captureOpenshell, parseLiveSandboxNames, and registry.listSandboxes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 820a0dcd-5684-496d-ae2f-200784b365aa
📒 Files selected for processing (5)
.github/workflows/nightly-e2e.yaml.gitleaksignoresrc/lib/onboard.tssrc/nemoclaw.tstest/e2e/test-upgrade-stale-sandbox.sh
✅ Files skipped from review due to trivial changes (1)
- .gitleaksignore
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/nightly-e2e.yaml
…sandboxes (#1904) Re-implements the digest pinning and upgrade-sandboxes features reverted in #1938. The original PR (#1937) broke all e2e tests because it read a digest from blueprint.yaml (belongs to ghcr.io/nvidia/openshell-community) and applied it to ghcr.io/nvidia/nemoclaw/sandbox-base — a different registry. Docker digests are repo-specific, so every pull returned "manifest unknown". This fix never reads blueprint.yaml for the base image digest. Instead: - pullAndResolveBaseImageDigest() pulls sandbox-base:latest from GHCR - docker inspect extracts the actual repo digest - patchStagedDockerfile() pins ARG BASE_IMAGE to the inspected digest The digest is self-consistent by construction — it always comes from the same registry we pin to. Falls back to unpinned :latest when GHCR is unreachable (offline/firewall users). Also re-adds the upgrade-sandboxes command with fixes from code review: - --auto flag for non-interactive installer contexts - sandbox list failure handling before classifying sandboxes - throwOnError option for sandboxRebuild to prevent process.exit in loops - RepoDigests filtered by SANDBOX_BASE_IMAGE prefix (ordering not guaranteed) - Exit non-zero from upgrade-sandboxes when sandbox list or rebuilds fail - Report sandboxes with unavailable version detection Closes #1904 Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Adds test-upgrade-stale-sandbox.sh that reproduces the reporter's exact scenario: install current NemoClaw, create a sandbox with an older OpenClaw, run upgrade-sandboxes --check to detect staleness, rebuild, and verify the sandbox runs the new version. Adds the upgrade-stale-sandbox-e2e job and shields-config-e2e job to the nightly workflow, and wires them into the failure notification. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
73e3644 to
1aa450a
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
test/e2e/test-upgrade-stale-sandbox.sh (1)
101-114:⚠️ Potential issue | 🟡 MinorBlueprint restore may be skipped if
docker buildfails underset -e.With
set -euo pipefail, a failingdocker buildat lines 106-110 will exit the script beforeBUILD_RC=$?(line 111) and before themvrestore (line 113). This leaves the repository with a modifiedblueprint.yaml.Consider using a trap to ensure cleanup:
Suggested fix
cp "${BLUEPRINT}" "${BLUEPRINT_BAK}" +restore_blueprint() { + [ -f "${BLUEPRINT_BAK}" ] && mv "${BLUEPRINT_BAK}" "${BLUEPRINT}" +} +trap restore_blueprint EXIT + sed "s/min_openclaw_version:.*/min_openclaw_version: \"${OLD_OPENCLAW_VERSION}\"/" "${BLUEPRINT}" >"${BLUEPRINT}.tmp" mv "${BLUEPRINT}.tmp" "${BLUEPRINT}" -docker build \ +if ! docker build \ --build-arg "OPENCLAW_VERSION=${OLD_OPENCLAW_VERSION}" \ -f "${REPO_ROOT}/Dockerfile.base" \ -t "${OLD_BASE_TAG}" \ - "${REPO_ROOT}" -BUILD_RC=$? - -mv "${BLUEPRINT_BAK}" "${BLUEPRINT}" -[ "$BUILD_RC" -eq 0 ] || fail "Failed to build old base image" + "${REPO_ROOT}"; then + fail "Failed to build old base image" +fi + +restore_blueprint +trap - EXIT🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-upgrade-stale-sandbox.sh` around lines 101 - 114, The script can exit before restoring BLUEPRINT if docker build fails under set -e; add a cleanup trap that always restores the original file (move BLUEPRINT_BAK back to BLUEPRINT) and captures the exit status so BUILD_RC is set correctly after the docker build attempt; specifically wrap the docker build block so that either a trap (using trap 'mv "${BLUEPRINT_BAK}" "${BLUEPRINT}"' EXIT) or a local cleanup function is installed before running docker build, ensure BUILD_RC is assigned from the docker build exit code (or from $? immediately after the build) and the trap is cleared/reset after successful restore, referencing variables and steps BLUEPRINT, BLUEPRINT_BAK, docker build, and BUILD_RC to locate where to add the trap and restore logic.
🧹 Nitpick comments (2)
test/onboard.test.ts (1)
4703-4714: Add a behavioral test forpullAndResolveBaseImageDigest, not just constant shape.This check validates
SANDBOX_BASE_IMAGE, but it doesn’t exercise the helper’s pull+inspect flow or itsnullfallback path. A focused unit test forpullAndResolveBaseImageDigest()would better protect the core fix path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/onboard.test.ts` around lines 4703 - 4714, Add a focused behavioral unit test that calls pullAndResolveBaseImageDigest (not just checking SANDBOX_BASE_IMAGE string) and verifies both the successful pull+inspect path and the null fallback; mock/stub the Docker interactions (the pull and image inspect/resolve used inside pullAndResolveBaseImageDigest) to simulate a resolved digest from the nemoclaw/sandbox-base registry and assert the returned digest and that the registry host in the pulled image matches SANDBOX_BASE_IMAGE, then add a second sub-case where the pull/inspect mocks fail/throw and assert pullAndResolveBaseImageDigest returns null to cover the fallback path.src/nemoclaw.ts (1)
1894-1903: Minor UX improvement: Clarify rebuild guidance when unknown sandboxes are present.Line 1901 says "Run
nemoclaw upgrade-sandboxesto rebuild them" even when the only sandboxes reported areunknown(version could not be checked). Sinceunknownsandboxes are not included in the rebuild path (onlystaleare), this message could be misleading.Consider conditionally showing this message only when
stale.length > 0:if (checkOnly) { if (stale.length > 0) console.log(` ${stale.length} sandbox(es) need upgrading.`); if (unknown.length > 0) { console.log( ` ${unknown.length} sandbox(es) could not be version-checked; start them and rerun, or rebuild manually.`, ); } - console.log(" Run `nemoclaw upgrade-sandboxes` to rebuild them."); + if (stale.length > 0) { + console.log(" Run `nemoclaw upgrade-sandboxes` to rebuild them."); + } return; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nemoclaw.ts` around lines 1894 - 1903, The message telling users to run `nemoclaw upgrade-sandboxes` is shown even when only unknown sandboxes exist; update the checkOnly branch in the function containing variables checkOnly, stale, and unknown so that the "Run `nemoclaw upgrade-sandboxes` to rebuild them." line is only printed when stale.length > 0 (i.e., when there are actually sandboxes eligible for upgrade); keep the existing unknown diagnostic and overall early return behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 553-554: The notify-on-failure aggregation is missing the
shields-config-e2e job—add "shields-config-e2e" into the needs array alongside
cloud-e2e, cloud-experimental-e2e, etc., and include the corresponding OR clause
in the if condition (needs.shields-config-e2e.result == 'failure') so a failure
in shields-config-e2e will trigger the failure path; update the needs list and
the if expression in the nightly-e2e workflow accordingly.
In `@scripts/install.sh`:
- Around line 1270-1273: The current block around the sandbox upgrade call
treats failures as non-fatal by using "|| warn", which lets critical upgrade
errors go unnoticed; change the logic in the block that checks _has_sandboxes
and command_exists nemoclaw so that when nemoclaw upgrade-sandboxes --auto fails
you log the failure (including stderr) via the existing error logging function
and exit with a non-zero status (e.g., call error or exit 1) instead of calling
warn; look for the symbols _has_sandboxes, command_exists, nemoclaw and the
invocation upgrade-sandboxes --auto to update the error handling accordingly.
In `@src/lib/onboard.ts`:
- Around line 2817-2825: The current code treats any falsy result from
pullAndResolveBaseImageDigest() as "use cached :latest"; change that so when
resolved is falsy you explicitly check for a local image named
`${SANDBOX_BASE_IMAGE}:${SANDBOX_BASE_TAG}` (e.g. via an
imageExistsLocally/inspectImage call or Docker daemon query) and only fall back
if that check returns true—otherwise fail early with a clear error/exit. Update
the block around pullAndResolveBaseImageDigest() and the resolved handling to
perform this local-image existence check and emit a fatal error (including the
image:tag) when neither pulling nor a local image is available. Ensure
references to pullAndResolveBaseImageDigest, resolved, and
SANDBOX_BASE_IMAGE/SANDBOX_BASE_TAG are used so the change is applied at the
correct location.
- Around line 1051-1056: The current replacement unconditionally overrides any
ARG BASE_IMAGE in the Dockerfile; change the logic in onboard.ts so before
replacing you parse the existing ARG BASE_IMAGE from the dockerfile (use the
same regex that finds /^ARG BASE_IMAGE=.*$/m), and only perform the replace with
baseImageRef when the extracted value already references the built-in image
(contains "ghcr.io/nvidia/nemoclaw/sandbox-base"); otherwise leave the ARG
untouched (alternatively, gate passing baseImageRef only for the built-in
Dockerfile path where pullAndResolveBaseImageDigest() is used). Ensure you still
use the same dockerfile and baseImageRef variables and preserve existing
behavior when the condition matches.
---
Duplicate comments:
In `@test/e2e/test-upgrade-stale-sandbox.sh`:
- Around line 101-114: The script can exit before restoring BLUEPRINT if docker
build fails under set -e; add a cleanup trap that always restores the original
file (move BLUEPRINT_BAK back to BLUEPRINT) and captures the exit status so
BUILD_RC is set correctly after the docker build attempt; specifically wrap the
docker build block so that either a trap (using trap 'mv "${BLUEPRINT_BAK}"
"${BLUEPRINT}"' EXIT) or a local cleanup function is installed before running
docker build, ensure BUILD_RC is assigned from the docker build exit code (or
from $? immediately after the build) and the trap is cleared/reset after
successful restore, referencing variables and steps BLUEPRINT, BLUEPRINT_BAK,
docker build, and BUILD_RC to locate where to add the trap and restore logic.
---
Nitpick comments:
In `@src/nemoclaw.ts`:
- Around line 1894-1903: The message telling users to run `nemoclaw
upgrade-sandboxes` is shown even when only unknown sandboxes exist; update the
checkOnly branch in the function containing variables checkOnly, stale, and
unknown so that the "Run `nemoclaw upgrade-sandboxes` to rebuild them." line is
only printed when stale.length > 0 (i.e., when there are actually sandboxes
eligible for upgrade); keep the existing unknown diagnostic and overall early
return behavior unchanged.
In `@test/onboard.test.ts`:
- Around line 4703-4714: Add a focused behavioral unit test that calls
pullAndResolveBaseImageDigest (not just checking SANDBOX_BASE_IMAGE string) and
verifies both the successful pull+inspect path and the null fallback; mock/stub
the Docker interactions (the pull and image inspect/resolve used inside
pullAndResolveBaseImageDigest) to simulate a resolved digest from the
nemoclaw/sandbox-base registry and assert the returned digest and that the
registry host in the pulled image matches SANDBOX_BASE_IMAGE, then add a second
sub-case where the pull/inspect mocks fail/throw and assert
pullAndResolveBaseImageDigest returns null to cover the fallback path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: a2da5716-9774-432e-8ebf-798f6a55cf79
📒 Files selected for processing (8)
.github/workflows/nightly-e2e.yaml.gitleaksignorescripts/install.shsrc/lib/onboard.tssrc/nemoclaw.tstest/cli.test.tstest/e2e/test-upgrade-stale-sandbox.shtest/onboard.test.ts
✅ Files skipped from review due to trivial changes (1)
- .gitleaksignore
Address two CodeRabbit findings: 1. patchStagedDockerfile now only rewrites ARG BASE_IMAGE when the current value already references ghcr.io/nvidia/nemoclaw/sandbox-base. Custom --from Dockerfiles using a different base image are preserved. 2. When pullAndResolveBaseImageDigest fails, verify the base image exists locally before falling back to unpinned :latest. On first-time installs behind a firewall with no cached image, fail early with a clear error instead of deferring to a confusing Docker build failure. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
The process.exit(1) in the offline fallback broke CI unit tests — the basic-checks runner has no Docker daemon, so docker image inspect always fails, causing 10+ createSandbox tests to abort. The Docker build itself will fail clearly if the image is truly unavailable, so a warning with actionable guidance is sufficient here. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/lib/onboard.ts (1)
2825-2848: Consider extracting this fallback block to keepcreateSandbox()from growing further.This is good logic, but adding more branch-heavy behavior inside
createSandbox()makes future maintenance/testing harder.As per coding guidelines: "Limit cyclomatic complexity to 20 in JavaScript/TypeScript files, with target of 15".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 2825 - 2848, The fallback logic checking pullAndResolveBaseImageDigest(), invoking runCapture(...) to inspect SANDBOX_BASE_IMAGE and SANDBOX_BASE_TAG, and logging/errors should be extracted out of createSandbox() into a new helper (e.g., ensureBaseImageAvailableOrExit or resolveBaseImageWithFallback) that returns the resolved digest or null and performs the local-check logging and process.exit(1) on hard failure; update createSandbox() to call this helper and use its return value, keeping all references to pullAndResolveBaseImageDigest, runCapture, SANDBOX_BASE_IMAGE and SANDBOX_BASE_TAG inside the new function so the main createSandbox() body is shorter and cyclomatic complexity is reduced.test/onboard.test.ts (2)
4753-4764: Test name overstates what is being verified.This case validates
SANDBOX_BASE_IMAGEonly; it does not exercisepullAndResolveBaseImageDigestdirectly. Consider renaming for precision.✏️ Suggested rename
-it("regression `#1904`: pullAndResolveBaseImageDigest uses sandbox-base registry", () => { +it("regression `#1904`: SANDBOX_BASE_IMAGE uses sandbox-base registry", () => {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/onboard.test.ts` around lines 4753 - 4764, The test title is misleading: it claims to verify pullAndResolveBaseImageDigest but only asserts the SANDBOX_BASE_IMAGE constant; update the test name/description to reflect that it validates the SANDBOX_BASE_IMAGE value (e.g., "SANDBOX_BASE_IMAGE references nemoclaw/sandbox-base and not openshell-community") so the behavior matches the assertions, and leave the assertions on SANDBOX_BASE_IMAGE unchanged; reference the test block containing the assertions and the SANDBOX_BASE_IMAGE symbol when making the rename.
4766-4778: Order assertion is too broad and can false-pass.The current
indexOfscan is file-wide; it can match unrelated occurrences outsidecreateSandbox. Scope the check tocreateSandboxsource text first.🎯 More robust scoped assertion
const source = fs.readFileSync( path.join(import.meta.dirname, "..", "src", "lib", "onboard.ts"), "utf-8", ); - const pullPos = source.indexOf("pullAndResolveBaseImageDigest()"); - assert.ok(pullPos !== -1, "pullAndResolveBaseImageDigest() call not found in onboard.ts"); - const patchPos = source.indexOf("patchStagedDockerfile(", pullPos); + const fnMatch = source.match(/async function createSandbox\([\s\S]*?\n\}/); + assert.ok(fnMatch, "createSandbox() definition not found in onboard.ts"); + const fnSource = fnMatch[0]; + const pullPos = fnSource.indexOf("pullAndResolveBaseImageDigest()"); + assert.ok(pullPos !== -1, "pullAndResolveBaseImageDigest() call not found in createSandbox()"); + const patchPos = fnSource.indexOf("patchStagedDockerfile(", pullPos); assert.ok( patchPos > pullPos, "pullAndResolveBaseImageDigest must be called BEFORE patchStagedDockerfile — regression `#1904`", );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/onboard.test.ts` around lines 4766 - 4778, The current test searches the whole file for pullAndResolveBaseImageDigest() and patchStagedDockerfile( which can match unrelated code; instead extract the source of createSandbox and run the ordering assertions inside that scope: locate the createSandbox start by finding "createSandbox(" or its declaration token (e.g., "function createSandbox" / "const createSandbox"), compute the function body bounds (e.g., find the matching closing brace or use a small regex to capture the function block), then call indexOf("pullAndResolveBaseImageDigest()") and indexOf("patchStagedDockerfile(", startAt=0) on that extracted createSandbox source and assert that pullPos !== -1 and patchPos > pullPos to ensure pullAndResolveBaseImageDigest is called before patchStagedDockerfile only within createSandbox.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 2842-2847: The early process.exit(1) in the offline-image failure
path (where SANDBOX_BASE_IMAGE and SANDBOX_BASE_TAG are logged) causes the
temporary staged build context to be left behind; change this path to run the
same cleanup used elsewhere (the run("rm -rf ...") cleanup) before exiting or
restructure the surrounding function to use a try/finally that always calls the
cleanup routine, then exit with the same non-zero code if the image is
unavailable; ensure you update the branch that currently calls process.exit(1)
so it invokes the existing cleanup logic (or the cleanup function) prior to
terminating.
---
Nitpick comments:
In `@src/lib/onboard.ts`:
- Around line 2825-2848: The fallback logic checking
pullAndResolveBaseImageDigest(), invoking runCapture(...) to inspect
SANDBOX_BASE_IMAGE and SANDBOX_BASE_TAG, and logging/errors should be extracted
out of createSandbox() into a new helper (e.g., ensureBaseImageAvailableOrExit
or resolveBaseImageWithFallback) that returns the resolved digest or null and
performs the local-check logging and process.exit(1) on hard failure; update
createSandbox() to call this helper and use its return value, keeping all
references to pullAndResolveBaseImageDigest, runCapture, SANDBOX_BASE_IMAGE and
SANDBOX_BASE_TAG inside the new function so the main createSandbox() body is
shorter and cyclomatic complexity is reduced.
In `@test/onboard.test.ts`:
- Around line 4753-4764: The test title is misleading: it claims to verify
pullAndResolveBaseImageDigest but only asserts the SANDBOX_BASE_IMAGE constant;
update the test name/description to reflect that it validates the
SANDBOX_BASE_IMAGE value (e.g., "SANDBOX_BASE_IMAGE references
nemoclaw/sandbox-base and not openshell-community") so the behavior matches the
assertions, and leave the assertions on SANDBOX_BASE_IMAGE unchanged; reference
the test block containing the assertions and the SANDBOX_BASE_IMAGE symbol when
making the rename.
- Around line 4766-4778: The current test searches the whole file for
pullAndResolveBaseImageDigest() and patchStagedDockerfile( which can match
unrelated code; instead extract the source of createSandbox and run the ordering
assertions inside that scope: locate the createSandbox start by finding
"createSandbox(" or its declaration token (e.g., "function createSandbox" /
"const createSandbox"), compute the function body bounds (e.g., find the
matching closing brace or use a small regex to capture the function block), then
call indexOf("pullAndResolveBaseImageDigest()") and
indexOf("patchStagedDockerfile(", startAt=0) on that extracted createSandbox
source and assert that pullPos !== -1 and patchPos > pullPos to ensure
pullAndResolveBaseImageDigest is called before patchStagedDockerfile only within
createSandbox.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 88b1deca-7792-4c5c-bc18-b036c8dab096
📒 Files selected for processing (2)
src/lib/onboard.tstest/onboard.test.ts
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)
2825-2859:⚠️ Potential issue | 🟡 MinorOnly resolve/pull
sandbox-basewhen the staged Dockerfile actually uses it.This now runs for built-in,
--from, and agent Dockerfiles. For a custom Dockerfile with a different base image, it still does an unnecessary GHCR pull and can emit a false warning that the build may fail becauseghcr.io/nvidia/nemoclaw/sandbox-base:latestis unavailable, even though that image is never referenced.Suggested fix
- const resolved = pullAndResolveBaseImageDigest(); - if (resolved) { + const dockerfileText = fs.readFileSync(stagedDockerfile, "utf8"); + const currentBaseImage = dockerfileText.match(/^ARG BASE_IMAGE=(.*)$/m)?.[1]?.trim() || ""; + const usesSandboxBase = + currentBaseImage.startsWith(`${SANDBOX_BASE_IMAGE}:`) || + currentBaseImage.startsWith(`${SANDBOX_BASE_IMAGE}@`) || + dockerfileText.includes(`FROM ${SANDBOX_BASE_IMAGE}:`) || + dockerfileText.includes(`FROM ${SANDBOX_BASE_IMAGE}@`); + + const resolved = usesSandboxBase ? pullAndResolveBaseImageDigest() : null; + if (resolved) { console.log(` Pinning base image to ${resolved.digest.slice(0, 19)}...`); - } else { + } else if (usesSandboxBase) { // Check if the image exists locally before falling back to unpinned :latest. // On a first-time install behind a firewall with no cached image, warn early // so the user knows the build will likely fail. const localCheck = runCapture( ["docker", "image", "inspect", `${SANDBOX_BASE_IMAGE}:${SANDBOX_BASE_TAG}`], { ignoreError: true }, );Based on learnings:
patchStagedDockerfileis called on built-in,--fromcustom, and agent-generated Dockerfiles increateSandbox.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: db6fda70-e00d-4ee7-b98f-f772fe09dd6e
📒 Files selected for processing (1)
src/lib/onboard.ts
Refresh user-facing docs against the 34 commits merged between v0.0.17 and v0.0.18. Highlights: - Replace the Ollama 0.0.0.0 binding guidance with the new authenticated reverse proxy on 127.0.0.1:11435 (#1922). - Document the compatible-endpoint provider defaulting to /v1/chat/completions and the NEMOCLAW_PREFERRED_API=openai-responses opt-in (#1984). - Add the new nemoclaw upgrade-sandboxes command with --check, --auto, and --yes flags (#1943). - Note the cross-sandbox messaging overlap warning and 409 detection in nemoclaw <name> status (#1953). - Document the messaging-token rotation auto-rebuild flow (#1967). - Cover new troubleshooting entries for the Ollama auth proxy, IPv6 localhost resolution, orphan SSH port-forward cleanup on re-onboard, and rotated messaging credentials (#1978, #1950). - Note tar failure exit code for nemoclaw debug --output (#1770) and the orphaned openshell process cleanup in nemoclaw uninstall (#1940). Also: - Extend docs/.docs-skip to exclude the experimental sandbox-mgmt shields and config commands (#1976). - Fix a sphinx-autobuild infinite rebuild loop in docs/conf.py by writing docs/project.json only when its contents change. - Bump the docs version switcher preferred entry to 0.0.18. - Regenerate nemoclaw-user-* agent skills from docs/. Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> Made-with: Cursor
## Summary Refresh user-facing documentation against the 34 commits merged between v0.0.17 and v0.0.18, bump the docs version switcher to v0.0.18, and fix a `sphinx-autobuild` infinite-rebuild loop triggered by `docs/conf.py`. ## Changes - **Ollama authenticated reverse proxy** (#1922): Replace the `0.0.0.0:11434` guidance in `docs/inference/use-local-inference.md` with the new token-gated proxy on `127.0.0.1:11435`, including persisted token, health-check exemption, and sandbox provider wiring. Replace the matching troubleshooting entry in `docs/reference/troubleshooting.md`. - **Compatible-endpoint default API path** (#1984): Document that the compatible-endpoint provider now defaults to `/v1/chat/completions` and update `NEMOCLAW_PREFERRED_API` to describe `openai-responses` as the opt-in instead of `openai-completions`. Updates in `use-local-inference.md`, `switch-inference-providers.md`, and `troubleshooting.md`. - **`nemoclaw upgrade-sandboxes` command** (#1943): Add a new reference entry in `docs/reference/commands.md` covering `--check`, `--auto`, and `--yes` flags. - **Messaging token rotation auto-rebuild** (#1967, #1953): Note the automatic rebuild behavior and cross-sandbox overlap warning in `docs/deployment/set-up-telegram-bridge.md`, `commands.md`, and `troubleshooting.md`. - **Other troubleshooting additions**: - `localhost` → `127.0.0.1` IPv6 note (#1978) - Orphan SSH port-forward cleanup on re-onboard (#1950) - Orphan `openshell` process cleanup in `nemoclaw uninstall` (#1940) - Non-zero exit on tar failure in `nemoclaw debug --output` (#1770) - **Skip list**: Extend `docs/.docs-skip` to exclude the experimental sandbox-mgmt shields and config commands feature (#1976), which was explicitly merged as not-yet-documented. - **Build stability**: `docs/conf.py` now writes `docs/project.json` only when contents change, so `make docs-live` / `sphinx-autobuild` no longer detects its own generated file as a source change and enters an infinite rebuild loop. - **Version switcher**: Bump `docs/versions1.json` and `docs/project.json` preferred entry to v0.0.18 so this refresh renders under the new version. - **Agent skills**: Regenerate `nemoclaw-user-*` skills from `docs/` with `scripts/docs-to-skills.py`. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [x] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes (ran via pre-commit hook on staged files) - [ ] `npm test` passes - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [x] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) ## AI Disclosure - [x] AI-assisted — tool: Cursor --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> Made with [Cursor](https://cursor.com) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * Added `nemoclaw upgrade-sandboxes` command to rebuild sandboxes when base-image digests change. * Introduced authenticated reverse proxy for local Ollama inference with token-based access control. * Automatic sandbox backup, recreation, and restore when messaging credentials are updated. * Cross-sandbox messaging token overlap detection with status warnings. * **Improvements** * Compatible-endpoint provider now defaults to `/v1/chat/completions` API path. * Enhanced troubleshooting documentation with new diagnostics sections. * **Documentation** * Updated onboarding and configuration guides. * Expanded version documentation to 0.0.18. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>
Summary
upgrade-sandboxesreverted in fix: revert base image digest pinning that broke all e2e tests #1938blueprint.yaml(belongs toghcr.io/nvidia/openshell-community) and applied it toghcr.io/nvidia/nemoclaw/sandbox-base— a different registry. Docker digests are repo-specific, so every sandbox creation failed with "manifest unknown"pullAndResolveBaseImageDigest()pullssandbox-base:latestfrom GHCR, thendocker inspectextracts the actual repo digest. The digest is self-consistent by construction — it always comes from the same registry we pin to:latestwhen GHCR is unreachable (offline/firewall users)nemoclaw upgrade-sandboxescommand with--check/--auto/--yesflags and fixes from CodeRabbit review (sandbox list error handling,throwOnErrorfor batch rebuilds,--autoin install.sh)Closes #1904
Test plan
ARG BASE_IMAGEreferencesnemoclaw/sandbox-base, NOTopenshell-communitypullAndResolveBaseImageDigest()called beforepatchStagedDockerfile()upgrade-sandboxes --checkdetects stale sandbox (original reporter scenario from [All Platform][Upgrade][Github Issue #1904] The sandbox OpenClaw version is not upgraded after NemoClaw upgrade #1904)upgrade-sandboxes --checkreports all-current when no sandboxes are stalenemoclaw upgrade-sandboxes --checkon a real environment with a stale sandboxSummary by CodeRabbit
New Features
Improvements
Tests
Chores