Skip to content

fix(onboard): use Docker-driver gateway on macOS OpenShell 0.0.37#3383

Merged
ericksoa merged 24 commits into
mainfrom
fix/darwin-openshell-037-gateway
May 12, 2026
Merged

fix(onboard): use Docker-driver gateway on macOS OpenShell 0.0.37#3383
ericksoa merged 24 commits into
mainfrom
fix/darwin-openshell-037-gateway

Conversation

@ericksoa

@ericksoa ericksoa commented May 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes the macOS install/onboard regression when tip-of-main installs OpenShell 0.0.37 over an older or incomplete OpenShell. OpenShell 0.0.37 removed the legacy openshell gateway start/destroy lifecycle verbs; NemoClaw had switched to the standalone Docker-driver gateway path on Linux only, so macOS still called the removed verbs during onboarding.

Changes

  • Treat Darwin as a Docker-driver gateway platform in onboarding so macOS starts the standalone openshell-gateway process instead of calling removed openshell gateway start/destroy verbs.
  • Install and require the macOS arm64 openshell-gateway-aarch64-apple-darwin.tar.gz asset when OpenShell 0.0.37 is present.
  • Move OpenShell repair into an explicit onboard preflight helper, ensureOpenshellForOnboard(), so nemoclaw onboard itself upgrades stale OpenShell and repairs incomplete Docker-driver installs before gateway start.
  • Fix onboard’s Docker-driver binary requirement by platform: Linux requires openshell-gateway and openshell-sandbox; macOS requires openshell-gateway.
  • Extend openshell-gateway-upgrade-e2e to model the real upgrade path: install NemoClaw v0.0.36 with the curl-style installer, create a working registered claw on OpenShell 0.0.36, start a running in-sandbox agent process, run the current curl-style installer/onboard path from the PR SHA, and assert the same sandbox and same agent PID remain healthy with OpenShell 0.0.37.
  • Upload the OpenShell install/preflight log from that E2E on failure.

Verification

  • bash -n test/e2e/test-openshell-gateway-upgrade.sh scripts/install-openshell.sh
  • shellcheck scripts/install-openshell.sh test/e2e/test-openshell-gateway-upgrade.sh
  • git diff --check
  • npm run build:cli
  • npx vitest run test/install-openshell-version-check.test.ts test/onboard.test.ts --testTimeout 60000
  • bash test/e2e/test-openshell-gateway-upgrade.sh on macOS: passes the synthetic macOS installer regression and skips the live Linux gateway/agent upgrade section as intended.

Notes

The live old-install upgrade section of test/e2e/test-openshell-gateway-upgrade.sh remains Linux-only and is wired into openshell-gateway-upgrade-e2e in nightly. It now starts from the real NemoClaw v0.0.36 installer, not a synthetic OpenShell-only setup.

Summary by CodeRabbit

  • New Features

    • Onboarding/installer now supports macOS Docker-driver gateway onboarding and selects platform-specific install assets.
  • Bug Fixes

    • Improved OS-aware detection to identify missing Docker-driver components and choose the correct install/upgrade payloads across macOS and Linux.
  • Tests

    • Added macOS arm64 and expanded upgrade/regression e2e tests validating gateway upgrade and sandbox-survivor continuity; improved test diagnostics.
  • Chores

    • CI job text clarified and gateway install logs included in failure artifacts.

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds platform-aware OpenShell installer checks and macOS aarch64 gateway asset selection, refactors onboard install/upgrade into helpers that consider platform-specific Docker-driver binary requirements (gateway vs gateway+sandbox), enables the gateway path on Darwin, and expands unit and E2E tests to validate installer behavior and survivor sandbox persistence across gateway restarts.

Changes

macOS Docker-driver gateway support and test coverage

Layer / File(s) Summary
Installer platform detection and macOS asset support
scripts/install-openshell.sh
New required_driver_bins_present() checks OS-specific requirements (Linux: gateway+sandbox; macOS: gateway). Asset selection uses an OS case statement, supporting macOS aarch64 gateway and conditional Linux gateway+sandbox assets; installed-version check now calls the new function.
Installer unit tests for platform-specific behavior
test/install-openshell-version-check.test.ts
Curl stubs now emit curl stub: $* to stderr. New macOS arm64 tests verify missing gateway detection and inclusion of openshell-gateway-aarch64-apple-darwin.tar.gz in the install payload. Adjusted a Linux upgrade test to pass os: "Linux" explicitly.
Onboard helpers and runtime gateway enablement
src/lib/onboard.ts
Introduced areRequiredDockerDriverBinariesPresent() and ensureOpenshellForOnboard() and moved install/upgrade logic into these helpers; isLinuxDockerDriverGatewayEnabled() now returns true for darwin as well; new helpers are exported.
Onboard test expectations and internals contract
test/onboard.test.ts
Exposes areRequiredDockerDriverBinariesPresent in test internals, updates gateway environment tests to expect darwin=true and win32=false, broadens port-listener tests, adjusts mocked platform in a regression harness, and adds an onboard-repair test for binary requirements across platforms.
CI job comments and artifacts
.github/workflows/nightly-e2e.yaml
Job description comments updated to describe the old-install upgrade edge case; failure artifact upload expanded to include /tmp/nemoclaw-e2e-openshell-gateway-install.log.
E2E gateway upgrade test with macOS installer regression and sandbox persistence
test/e2e/test-openshell-gateway-upgrade.sh
Adds macOS arm64 installer regression checks, improves failure diagnostics, introduces old/current OpenShell selection and survivor sandbox parameters, adds helpers to create an old OpenShell wrapper and a Docker-backed survivor sandbox with a persistent agent/marker, and asserts sandbox continuity (marker, pid, heartbeat, registry/list) across gateway restart.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nudged a gate on Darwin's lea,

an orphaned binary found by me,
installers fetch by arch and bin,
the sandbox hummed while restarts begin,
tests hop home — the agent stays within.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: enabling Docker-driver gateway support on macOS for OpenShell 0.0.37, which is the primary regression fix across multiple modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/darwin-openshell-037-gateway

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

Pi Semantic E2E Advisor

Failed: pi exited with status 1; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-pi-raw-output.txt

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/e2e/test-openshell-gateway-upgrade.sh (1)

84-93: ⚡ Quick win

Restore sandboxes.json after test to avoid persistent local side effects.

At Line 213-237 the script mutates $HOME/.nemoclaw/sandboxes.json, but cleanup (Line 84-93) doesn’t revert it. This can leave stale local state after manual runs.

♻️ Proposed patch
@@
 REGISTRY_FILE="$HOME/.nemoclaw/sandboxes.json"
+REGISTRY_BACKUP=""
@@
 cleanup() {
   set +e
   cleanup_pid "$OLD_PID"
   cleanup_pid "$NEW_PID"
@@
   rm -f "$PID_FILE"
+  if [ "${REGISTRY_BACKUP:-}" = "__ABSENT__" ]; then
+    rm -f "$REGISTRY_FILE"
+  elif [ -n "${REGISTRY_BACKUP:-}" ] && [ -f "$REGISTRY_BACKUP" ]; then
+    mkdir -p "$(dirname "$REGISTRY_FILE")"
+    cp "$REGISTRY_BACKUP" "$REGISTRY_FILE"
+    rm -f "$REGISTRY_BACKUP"
+  fi
 }
@@
 create_survivor_sandbox() {
@@
   mkdir -p "$(dirname "$REGISTRY_FILE")"
+  if [ -z "${REGISTRY_BACKUP:-}" ]; then
+    if [ -f "$REGISTRY_FILE" ]; then
+      REGISTRY_BACKUP="$(mktemp)"
+      cp "$REGISTRY_FILE" "$REGISTRY_BACKUP"
+    else
+      REGISTRY_BACKUP="__ABSENT__"
+    fi
+  fi
   python3 - <<PY

Also applies to: 212-237

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-openshell-gateway-upgrade.sh` around lines 84 - 93, The test
mutates the local sandboxes JSON but never restores it; before any mutation (in
the setup section around where sandboxes.json is modified) save the original
contents to a temp backup (e.g., BACKUP_SANDBOXES or a tmp file) and after the
test ensure restoration in the cleanup() function; update cleanup() (which
already calls cleanup_pid, handles OLD_PID/NEW_PID, openshell sandbox
delete/SURVIVOR_SANDBOX and removes PID_FILE) to detect the backup and restore
the original sandboxes JSON (or remove the temporary copy) so the local state is
reverted even on early exits.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Line 3790: The change widening Docker-driver mode to include darwin causes
downstream procfs-based checks to misclassify macOS gateway processes; update
the logic to skip /proc inspections on macOS by guarding calls to readProcessEnv
and readProcessExe (and any callers that use them, e.g., the gateway process
reuse/drift classification) when platform === "darwin" instead of assuming Linux
semantics; add a short-circuit that treats darwin as “cannot inspect via /proc”
and falls back to a safe non-stale/unowned assumption or an alternative check so
healthy macOS gateway processes are not marked stale.

---

Nitpick comments:
In `@test/e2e/test-openshell-gateway-upgrade.sh`:
- Around line 84-93: The test mutates the local sandboxes JSON but never
restores it; before any mutation (in the setup section around where
sandboxes.json is modified) save the original contents to a temp backup (e.g.,
BACKUP_SANDBOXES or a tmp file) and after the test ensure restoration in the
cleanup() function; update cleanup() (which already calls cleanup_pid, handles
OLD_PID/NEW_PID, openshell sandbox delete/SURVIVOR_SANDBOX and removes PID_FILE)
to detect the backup and restore the original sandboxes JSON (or remove the
temporary copy) so the local state is reverted even on early exits.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f0f0af19-d406-4f47-80f9-e6bf0244d7c3

📥 Commits

Reviewing files that changed from the base of the PR and between 4b47ab4 and f80407e.

📒 Files selected for processing (5)
  • scripts/install-openshell.sh
  • src/lib/onboard.ts
  • test/e2e/test-openshell-gateway-upgrade.sh
  • test/install-openshell-version-check.test.ts
  • test/onboard.test.ts

Comment thread src/lib/onboard.ts Outdated
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25711792996
Branch: fix/darwin-openshell-037-gateway
Requested jobs: openshell-gateway-upgrade-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job Result
openshell-gateway-upgrade-e2e ❌ failure

Failed jobs: openshell-gateway-upgrade-e2e. Check run artifacts for logs.

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25712253610
Branch: fix/darwin-openshell-037-gateway
Requested jobs: openshell-gateway-upgrade-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job Result
openshell-gateway-upgrade-e2e ❌ failure

Failed jobs: openshell-gateway-upgrade-e2e. Check run artifacts for logs.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e/test-openshell-gateway-upgrade.sh (1)

406-410: 💤 Low value

Duplicate local declaration for marker.

Line 406 declares local marker pid counter, then line 410 declares local marker again. The second declaration is redundant.

♻️ Proposed fix
 assert_survivor_sandbox_after_upgrade() {
-  local marker pid counter
+  local pid counter marker
   info "Verifying survivor sandbox after OpenShell gateway upgrade"
   wait_for_survivor_ready || fail "survivor sandbox is not Ready after gateway upgrade"

-  local marker
   marker="$(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-openshell-gateway-upgrade.sh` around lines 406 - 410, Remove
the redundant local declaration for marker: keep the original single declaration
"local marker pid counter" and delete the subsequent "local marker" line so the
variables are declared only once (look for the duplicate declaration of marker
in the test-openshell-gateway-upgrade.sh script near the survivor readiness
check).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e/test-openshell-gateway-upgrade.sh`:
- Around line 406-410: Remove the redundant local declaration for marker: keep
the original single declaration "local marker pid counter" and delete the
subsequent "local marker" line so the variables are declared only once (look for
the duplicate declaration of marker in the test-openshell-gateway-upgrade.sh
script near the survivor readiness check).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cf0ab207-6a05-4851-a121-d5097970d217

📥 Commits

Reviewing files that changed from the base of the PR and between f80407e and 19d7a59.

📒 Files selected for processing (2)
  • .github/workflows/nightly-e2e.yaml
  • test/e2e/test-openshell-gateway-upgrade.sh
✅ Files skipped from review due to trivial changes (1)
  • .github/workflows/nightly-e2e.yaml

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/lib/onboard.ts (1)

3927-3931: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Darwin enablement still routes through Linux-only /proc checks.

With this gate returning true on macOS, the reuse/drift path now reaches readProcessEnv / readProcessExe and the requireDockerDriverEnv: true checks below. Those are Linux-only, so healthy openshell-gateway processes on Darwin will be classified as stale/unowned and get restarted or fail adoption on later onboard/resume runs. This still needs the non-Linux guard/fallback that was already called out earlier.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard.ts` around lines 3927 - 3931, The function
isLinuxDockerDriverGatewayEnabled currently returns true for darwin which lets
macOS code flow into Linux-only checks (readProcessEnv, readProcessExe and
places using requireDockerDriverEnv: true) causing healthy openshell-gateway
processes to be misclassified; change the gate to only enable on Linux (return
platform === "linux") or, alternatively, keep the function but add an explicit
platform guard where callers invoke readProcessEnv/readProcessExe and check
requireDockerDriverEnv so those Linux-specific checks are skipped on darwin,
ensuring macOS follows the non-Linux fallback path.
🧹 Nitpick comments (1)
test/e2e/test-openshell-gateway-upgrade.sh (1)

358-401: 💤 Low value

Redundant local marker declaration.

Line 359 already declares marker as local, so the local marker on line 363 is redundant.

♻️ Suggested cleanup
 assert_survivor_sandbox_after_upgrade() {
   local marker pid counter
   info "Verifying survivor sandbox after OpenShell gateway upgrade"
   wait_for_survivor_ready || fail "survivor sandbox is not Ready after gateway upgrade"

-  local marker
   marker="$(
     openshell sandbox exec --name "$SURVIVOR_SANDBOX" -- \
       cat /tmp/nemoclaw-gateway-upgrade-marker 2>/dev/null || true
   )"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-openshell-gateway-upgrade.sh` around lines 358 - 401, In
function assert_survivor_sandbox_after_upgrade remove the duplicate local
declaration for marker (the second "local marker" before assigning
marker="$(openshell sandbox exec ... )"); keep the initial local
marker,pid,counter declaration at the top and delete the redundant declaration
so marker is only declared once.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3516-3647: ensureOpenshellForOnboard currently assumes
installOpenshell() success implies required docker-driver binaries are present;
after any installOpenshell() call (the ones inside ensureOpenshellForOnboard
where openshellInstall is assigned), re-run the docker-driver check
(areRequiredDockerDriverBinariesPresent or equivalent) and, if still missing
when isLinuxDockerDriverGatewayEnabled() is true and docker-driver binaries were
the original reason for reinstall (needsDockerDriverBinaries was true), treat
the install as failed: log a clear error and exit (mirror existing failure
handling for installs/upgrades) or attempt one more install+check loop before
giving up so ensureOpenshellForOnboard never returns success while required
openshell-gateway/openshell-sandbox binaries are absent.

---

Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 3927-3931: The function isLinuxDockerDriverGatewayEnabled
currently returns true for darwin which lets macOS code flow into Linux-only
checks (readProcessEnv, readProcessExe and places using requireDockerDriverEnv:
true) causing healthy openshell-gateway processes to be misclassified; change
the gate to only enable on Linux (return platform === "linux") or,
alternatively, keep the function but add an explicit platform guard where
callers invoke readProcessEnv/readProcessExe and check requireDockerDriverEnv so
those Linux-specific checks are skipped on darwin, ensuring macOS follows the
non-Linux fallback path.

---

Nitpick comments:
In `@test/e2e/test-openshell-gateway-upgrade.sh`:
- Around line 358-401: In function assert_survivor_sandbox_after_upgrade remove
the duplicate local declaration for marker (the second "local marker" before
assigning marker="$(openshell sandbox exec ... )"); keep the initial local
marker,pid,counter declaration at the top and delete the redundant declaration
so marker is only declared once.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2247a77e-5bff-420e-8279-ae1cbc71f98b

📥 Commits

Reviewing files that changed from the base of the PR and between 19d7a59 and 0785611.

📒 Files selected for processing (3)
  • src/lib/onboard.ts
  • test/e2e/test-openshell-gateway-upgrade.sh
  • test/onboard.test.ts

Comment thread src/lib/onboard.ts Outdated
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25712441875
Branch: fix/darwin-openshell-037-gateway
Requested jobs: openshell-gateway-upgrade-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job Result
openshell-gateway-upgrade-e2e ❌ failure

Failed jobs: openshell-gateway-upgrade-e2e. Check run artifacts for logs.

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25712785974
Branch: fix/darwin-openshell-037-gateway
Requested jobs: openshell-gateway-upgrade-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job Result
openshell-gateway-upgrade-e2e ❌ failure

Failed jobs: openshell-gateway-upgrade-e2e. Check run artifacts for logs.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/e2e/test-openshell-gateway-upgrade.sh (1)

419-427: 💤 Low value

Remove duplicate local marker declaration.

marker is declared twice: at line 419 and again at line 423. This is redundant and likely a copy-paste artifact.

Proposed fix
 assert_survivor_sandbox_after_upgrade() {
   local marker pid counter
   info "Verifying survivor sandbox after OpenShell gateway upgrade"
   wait_for_survivor_ready || fail "survivor sandbox is not Ready after gateway upgrade"

-  local marker
   marker="$(
     openshell sandbox exec --name "$SURVIVOR_SANDBOX" -- \
       cat /tmp/nemoclaw-gateway-upgrade-marker 2>/dev/null || true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-openshell-gateway-upgrade.sh` around lines 419 - 427, The file
declares the variable "marker" twice; remove the redundant declaration so
"marker" is declared only once before it's assigned. Specifically, delete the
earlier standalone "local marker" and keep the single "local marker" that
directly precedes the assignment using openshell sandbox exec (referencing
SURVIVOR_SANDBOX and the cat /tmp/nemoclaw-gateway-upgrade-marker command); no
other logic changes needed and wait_for_survivor_ready() and subsequent uses of
marker should remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/test-openshell-gateway-upgrade.sh`:
- Around line 368-372: The sandbox creation fails because the test uses a
standalone gateway binary (exec "$STALE_GATEWAY_BIN") and registers it locally
(openshell gateway add --local --name nemoclaw ...) but still passes --gateway
nemoclaw to openshell sandbox create, which expects a Docker cluster container;
fix by either removing the --gateway nemoclaw flag from the openshell sandbox
create call (so it uses the default/selected gateway) or ensure Docker-based
gateway infrastructure is created before sandbox creation by invoking the
install script used by other tests (run the same install.sh/setup sequence that
creates the openshell-cluster-nemoclaw container) so the openshell sandbox
create --gateway nemoclaw command can find the expected Docker cluster.

---

Nitpick comments:
In `@test/e2e/test-openshell-gateway-upgrade.sh`:
- Around line 419-427: The file declares the variable "marker" twice; remove the
redundant declaration so "marker" is declared only once before it's assigned.
Specifically, delete the earlier standalone "local marker" and keep the single
"local marker" that directly precedes the assignment using openshell sandbox
exec (referencing SURVIVOR_SANDBOX and the cat
/tmp/nemoclaw-gateway-upgrade-marker command); no other logic changes needed and
wait_for_survivor_ready() and subsequent uses of marker should remain unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 88287808-a6d4-47e2-afd1-a01d1e92aa81

📥 Commits

Reviewing files that changed from the base of the PR and between 0785611 and a24a0c8.

📒 Files selected for processing (1)
  • test/e2e/test-openshell-gateway-upgrade.sh

Comment thread test/e2e/test-openshell-gateway-upgrade.sh Outdated
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25712886046
Branch: fix/darwin-openshell-037-gateway
Requested jobs: openshell-gateway-upgrade-e2e
Summary: 0 passed, 0 failed, 0 skipped

Job Result
openshell-gateway-upgrade-e2e ⚠️ cancelled

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25713136197
Branch: fix/darwin-openshell-037-gateway
Requested jobs: openshell-gateway-upgrade-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job Result
openshell-gateway-upgrade-e2e ❌ failure

Failed jobs: openshell-gateway-upgrade-e2e. Check run artifacts for logs.

local old_head expected_head
old_head="$(git -C "$HOME/.nemoclaw/source" rev-parse HEAD 2>/dev/null || true)"
expected_head="$(git ls-remote https://github.com/NVIDIA/NemoClaw.git "refs/tags/${OLD_NEMOCLAW_REF}" | awk '{print $1}')"
[ -n "$old_head" ] && [ "$old_head" = "$expected_head" ] \
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25713253315
Branch: fix/darwin-openshell-037-gateway
Requested jobs: openshell-gateway-upgrade-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job Result
openshell-gateway-upgrade-e2e ❌ failure

Failed jobs: openshell-gateway-upgrade-e2e. Check run artifacts for logs.

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25716192263
Branch: fix/darwin-openshell-037-gateway
Requested jobs: openshell-gateway-upgrade-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job Result
openshell-gateway-upgrade-e2e ❌ failure

Failed jobs: openshell-gateway-upgrade-e2e. Check run artifacts for logs.

Comment thread src/lib/onboard.ts
destroyGateway();
registry.clearAll();
retireLegacyGatewayForDockerDriverUpgrade();
gatewayReuseState = "missing";
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25716841021
Branch: fix/darwin-openshell-037-gateway
Requested jobs: openshell-gateway-upgrade-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job Result
openshell-gateway-upgrade-e2e ❌ failure

Failed jobs: openshell-gateway-upgrade-e2e. Check run artifacts for logs.

Comment thread test/e2e/test-openshell-gateway-upgrade.sh Fixed
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25766991915
Branch: fix/darwin-openshell-037-gateway
Requested jobs: openshell-gateway-upgrade-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
openshell-gateway-upgrade-e2e ✅ success

…037-gateway

# Conflicts:
#	src/lib/onboard.ts
@ericksoa ericksoa merged commit 2fe170b into main May 12, 2026
28 checks passed
@ericksoa ericksoa deleted the fix/darwin-openshell-037-gateway branch May 12, 2026 23:25
@miyoungc miyoungc mentioned this pull request May 12, 2026
4 tasks
ericksoa pushed a commit that referenced this pull request May 13, 2026
## Summary
- Add v0.0.40 release notes and update docs version metadata.
- Document release-prep behavior changes around onboarding, local
inference, policy preset filtering, and config recovery.
- Refresh generated `nemoclaw-user-*` skills from the source docs.

## Source summary
- #3383 -> `docs/about/release-notes.md`, `docs/reference/commands.md`,
`docs/manage-sandboxes/lifecycle.md`: Reflect macOS Docker-driver
OpenShell gateway onboarding and upgrade behavior.
- #3378 -> `docs/about/release-notes.md`: Capture the Docker-driver
gateway TCP readiness fix and clearer startup failures.
- #3338 -> `docs/about/release-notes.md`,
`docs/inference/use-local-inference.md`: Reflect the Ollama auth proxy
token requirement on native API routes.
- #3420 -> `docs/about/release-notes.md`,
`docs/get-started/prerequisites.md`,
`docs/inference/use-local-inference.md`: Document the Linux Ollama
`zstd` preflight and sudo messaging.
- #3417 -> `docs/about/release-notes.md`,
`docs/inference/inference-options.md`,
`docs/inference/use-local-inference.md`: Reflect detected running vLLM
provider selection.
- #3223 -> `docs/about/release-notes.md`, `docs/reference/commands.md`,
`docs/reference/network-policies.md`, `docs/get-started/quickstart.md`:
Document agent-aware policy preset filtering.
- #3385 -> `docs/about/release-notes.md`: Capture the dashboard forward
TCP reachability check.
- #3160 -> `docs/about/release-notes.md`,
`docs/reference/troubleshooting.md`: Document empty `openclaw.json`
baseline recovery.
- #3367 -> `docs/about/release-notes.md`: Capture OpenClaw plugin
compatibility metadata.

## Test plan
- [x] `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user`
- [x] `make docs`
- [x] `git diff --check`
- [x] Skip-term scan for `docs/.docs-skip` terms

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

# Release Notes v0.0.40

* **New Features**
* Sandbox configuration recovery when inference changes cause data loss
  * Policy presets now intelligently filter based on agent capabilities
  * Enhanced gateway health checks and upgrade reliability

* **Documentation**
* Improved local inference setup instructions with clearer dependency
requirements
  * Clarified vLLM experimental feature availability and prerequisites
  * Reorganized architecture documentation for enhanced clarity
  * Refined security and hardening guidance

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3427)

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
latenighthackathon added a commit to latenighthackathon/NemoClaw that referenced this pull request May 14, 2026
… k3s

Clarifies that the OpenShell sandbox runs as a sibling Docker container on
Linux and macOS Apple Silicon (Docker-driver path, default since 0.0.39 via
NVIDIA#3001 and NVIDIA#3383) or as a pod in the embedded k3s cluster on macOS Intel,
Windows, and WSL2 (legacy path).

architecture.md gets a new platform/compute-path table at the top of
Deployment Topology, the existing Mermaid diagram is identified as the
legacy k3s path, and the layering table now lists both Sandbox container
and Sandbox pod variants. prerequisites.md drops k3s from the RAM-pressure
sentence (k3s is an internal detail of the gateway container, not a
separate process the user budgets for) and rewords the fuse-overlayfs note
to refer to the OpenShell gateway image rather than k3s directly.

Refs NVIDIA#3432.

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
latenighthackathon added a commit to latenighthackathon/NemoClaw that referenced this pull request May 15, 2026
… k3s

Clarifies that the OpenShell sandbox runs as a sibling Docker container on
Linux and macOS Apple Silicon (Docker-driver path, default since 0.0.39 via
NVIDIA#3001 and NVIDIA#3383) or as a pod in the embedded k3s cluster on macOS Intel,
Windows, and WSL2 (legacy path).

architecture.md gets a new platform/compute-path table at the top of
Deployment Topology, the existing Mermaid diagram is identified as the
legacy k3s path, and the layering table now lists both Sandbox container
and Sandbox pod variants. prerequisites.md drops k3s from the RAM-pressure
sentence (k3s is an internal detail of the gateway container, not a
separate process the user budgets for) and rewords the fuse-overlayfs note
to refer to the OpenShell gateway image rather than k3s directly.

Refs NVIDIA#3432.

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
@wscurran wscurran added area: e2e End-to-end tests, nightly failures, or validation infrastructure area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression and removed OpenShell labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants