Change small local model to qwen3.5:9b #3

Closed
ericksoa wants to merge 1 commit into main from change-small-mode-qwen

@ericksoa

Contributor

Migrated from NVIDIA/openshell-openclaw-plugin#13 by @jacobtomlinson

@jacobtomlinson

Member

We should hold on this until decisions are made about which models are best to use here.

jessesanford pushed a commit to jessesanford/NemoClaw that referenced this pull request Mar 24, 2026
Add `openclaw nemoclaw onboard` command
ericksoa pushed a commit that referenced this pull request Apr 7, 2026
… (#1305)

## Summary

Fixes the four issues reported in #1114 — EACCES permission errors and
missing gateway token when running inside the NemoClaw sandbox.

### Issue mapping

| # | Reported error | Fix |
|---|----------------|-----|
| 1 | `EACCES: open '/sandbox/.openclaw/openclaw.json.*.tmp'` | `install_configure_guard`: intercepts `openclaw configure` with a clear error and directs users to `nemoclaw onboard --resume` on the host |
| 2 | Same as #1 (different PID) | Same fix |
| 3 | `EACCES: mkdir '/sandbox/.openclaw/credentials'` | Already resolved on main via #1519 (credentials symlink to `.openclaw-data/`) |
| 4 | No WhatsApp QR code | Consequence of #3, also resolved by #1519 |

### Root cause (issues 1 & 2)

OpenClaw's `configure` command performs atomic writes: it creates a temp
file (`openclaw.json.PID.UUID.tmp`) in the same directory as the config.
Since `/sandbox/.openclaw/` is Landlock read-only at the kernel level,
file creation is rejected with EACCES. This is by design: the sandbox
config is intentionally immutable at runtime.
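A minimal TypeScript sketch of the write-temp-then-rename pattern, to show why it needs write access to the config directory itself and not just to the file. This is an illustration, not OpenClaw's code: the `atomicWriteJson` name is hypothetical, and only the `<name>.<pid>.<uuid>.tmp` naming follows the error message in the table above.

```typescript
import { randomUUID } from "node:crypto";
import { renameSync, rmSync, writeFileSync } from "node:fs";
import { basename, dirname, join } from "node:path";

// Hypothetical illustration of an atomic config write (not OpenClaw's code).
// The temp file is created next to the target so that rename() stays on one
// filesystem and atomically replaces the old file. Creating that temp file is
// exactly the directory write that a read-only Landlock rule denies (EACCES).
function atomicWriteJson(target: string, value: unknown): void {
  const tmp = join(dirname(target), `${basename(target)}.${process.pid}.${randomUUID()}.tmp`);
  writeFileSync(tmp, JSON.stringify(value, null, 2) + "\n");
  try {
    // Readers see either the old file or the new one, never a partial write.
    renameSync(tmp, target);
  } catch (err) {
    rmSync(tmp, { force: true }); // don't leave the temp file behind
    throw err;
  }
}
```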

Rather than weakening Landlock (security regression), we intercept the
command in the sandbox shell and guide users to the correct host-side
workflow.

### Changes

**1. `install_configure_guard()`** — Writes a shell function wrapper to
`.bashrc`/`.profile` that intercepts `openclaw configure` and prints:
```
Error: 'openclaw configure' cannot modify config inside the sandbox.
The sandbox config is read-only (Landlock enforced) for security.

To change your configuration, exit the sandbox and run:
  nemoclaw onboard --resume

This rebuilds the sandbox with your updated settings.
```
All other `openclaw` subcommands pass through to the real binary.

**2. `export_gateway_token()`** — Reads `gateway.auth.token` from
`openclaw.json` and exports it as `OPENCLAW_GATEWAY_TOKEN`, so
interactive sessions (`openshell sandbox connect`) can authenticate
with the gateway. Persists to `.bashrc`/`.profile` using idempotent
marker blocks and cleans stale tokens on revocation.
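The idempotent marker-block technique can be sketched in TypeScript as a pure function over the rc-file text. It is only an illustration: the real `export_gateway_token()` runs in the sandbox startup script, and the `upsertTokenBlock` name and the marker strings below are hypothetical.

```typescript
// Hypothetical marker strings; the real startup script may use different ones.
const BEGIN_MARKER = "# >>> nemoclaw gateway token >>>";
const END_MARKER = "# <<< nemoclaw gateway token <<<";

// Returns rc-file text containing exactly one marker block with the export,
// or no block at all when `token` is null (e.g. after the token is revoked).
// Applying it twice with the same token gives the same text (idempotent).
function upsertTokenBlock(rc: string, token: string | null): string {
  // 1. Drop any existing block, markers included.
  const kept: string[] = [];
  let inside = false;
  for (const line of rc.split("\n")) {
    if (line === BEGIN_MARKER) {
      inside = true;
    } else if (line === END_MARKER) {
      inside = false;
    } else if (!inside) {
      kept.push(line);
    }
  }
  let out = kept.join("\n");
  if (token === null) return out;

  // 2. Append a fresh block. Single quotes stop the shell from expanding the
  //    value; embedded single quotes are written as '\''.
  const quoted = `'${token.replace(/'/g, `'\\''`)}'`;
  if (out.length > 0 && !out.endsWith("\n")) out += "\n";
  return `${out}${BEGIN_MARKER}\nexport OPENCLAW_GATEWAY_TOKEN=${quoted}\n${END_MARKER}\n`;
}
```

Removing the old block before appending is what makes rotation and revocation safe: a stale token can never survive alongside a new one.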

**3. `_read_gateway_token()` helper** — Shared Python snippet used by
both `export_gateway_token` and `print_dashboard_urls`, so the token-reading
logic exists only once; it reads the file with a `with open()` context manager.
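For comparison, a TypeScript analogue of `_read_gateway_token()` might look like the following. It assumes `openclaw.json` is plain JSON with the token at `gateway.auth.token`, as described above; `parseGatewayToken` and `readGatewayToken` are hypothetical names. Returning `null` for every failure lets a caller treat "no token" uniformly, for example by removing a stale export.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical analogue of the shared `_read_gateway_token()` helper.
// Any problem (bad JSON, missing key, non-string value) yields null.
function parseGatewayToken(json: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return null;
  }
  const token = (parsed as { gateway?: { auth?: { token?: unknown } } } | null)?.gateway?.auth?.token;
  return typeof token === "string" && token.length > 0 ? token : null;
}

function readGatewayToken(configPath: string): string | null {
  try {
    // readFileSync opens and closes the file itself, like Python's `with open()`.
    return parseGatewayToken(readFileSync(configPath, "utf8"));
  } catch {
    return null; // missing or unreadable config
  }
}
```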

All three are called in both root and non-root startup paths.

## Security properties preserved

- `/sandbox/.openclaw` remains root-owned, Landlock read-only
- `openclaw.json` remains chmod 444 (immutable)
- No new attack surface: the token is only read from the existing config
- `command openclaw` bypass preserves all non-configure functionality

Fixes #1114

Signed-off-by: Dongni Yang <dongniy@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Dongni Yang <dongniy@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
…IA#1114) (NVIDIA#1305)

jyaunches pushed a commit to jyaunches/NemoClaw that referenced this pull request Apr 14, 2026
- Guard runArgv/runArgvCapture against shell:true to prevent security
  bypass (finding #1) — throws if a caller attempts to re-enable shell
  interpretation. Added 2 tests.
- Document the intentional bash -c exception in getOllamaWarmupCommand
  explaining why it's safe (finding #2).
- Remove dead getOpenshellCommand() from policies.ts (finding #3).
- Remove unused shellQuote import from nim.ts (finding #4).
- Fix brittle indexOf assertion in onboard-readiness test (finding #5).
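A minimal sketch of the shell guard, assuming a `runArgv` helper built on Node's `child_process.spawnSync`; the signature here is hypothetical and may not match NemoClaw's. Arguments stay a list, and any attempt to turn on `shell` fails loudly instead of silently re-enabling shell parsing.

```typescript
import { spawnSync, type SpawnSyncOptions } from "node:child_process";

// Hypothetical argv-only runner: arguments are passed to the program as-is,
// so characters like `$`, `;` or `*` are never interpreted by a shell.
function runArgv(file: string, args: readonly string[], opts: SpawnSyncOptions = {}) {
  if (opts.shell) {
    // Any truthy value (true or a shell path) would route args through a shell.
    throw new Error("runArgv: shell interpretation is not allowed; pass argv instead");
  }
  return spawnSync(file, args, { ...opts, shell: false });
}
```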
jyaunches pushed a commit that referenced this pull request Apr 20, 2026
- Remove unused getForwardList() call from getActiveSandboxSessions —
  only pgrep/ps is needed for SSH session detection (warning #1)
- Consolidate double-prompt in sandboxDestroy into single enriched
  confirmation prompt (warning #2)
- Remove noisy cleanupGatewayAfterLastSandbox forward check that would
  always fire due to dashboard forward (warning #3)
- Use word-boundary regex in parseSshProcesses to prevent false positives
  when sandbox names share prefixes (warning #4)
- Export SessionClassification as named interface (suggestion #1)
- Use cross-platform ps -axo instead of Linux-only pgrep -a for macOS
  compatibility (suggestion #2)
- Add forwardCount to SessionClassification for future consumers
- Add tests for word-boundary matching edge cases
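A sketch of prefix-safe session matching, assuming each process line has the shape `<pid> <command line>` (what a `ps -axo` call with pid and command columns yields) and that SSH sessions reach a sandbox through the `openshell-<name>` host alias used by the E2E commit further down. The function names are hypothetical. One subtlety: a plain `\b` treats `-` as a boundary, so `/\btest-sbx\b/` still matches inside `test-sbx-a`; this sketch uses explicit lookarounds instead.

```typescript
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Word characters, "-" and "." count as part of a host name on both sides,
// so "openshell-test-sbx" does not match inside "openshell-test-sbx-a".
function sandboxSessionPattern(name: string): RegExp {
  return new RegExp(`(?<![\\w.-])openshell-${escapeRegExp(name)}(?![\\w.-])`);
}

// Hypothetical helper: returns PIDs of ssh processes that target `name`.
function findSandboxSshPids(psOutput: string, name: string): number[] {
  const pattern = sandboxSessionPattern(name);
  const pids: number[] = [];
  for (const line of psOutput.split("\n")) {
    // Expected shape: "<pid> <command line>" (leading spaces allowed).
    const m = /^\s*(\d+)\s+(.*)$/.exec(line);
    if (m === null) continue;
    const command = m[2];
    // \bssh\b matches "ssh" and "/usr/bin/ssh" but not "sshd".
    if (/\bssh\b/.test(command) && pattern.test(command)) pids.push(Number(m[1]));
  }
  return pids;
}
```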
ericksoa added a commit that referenced this pull request Apr 20, 2026
- Deduplicate: timer now imports lockAgentConfig from shields.ts instead
  of reimplementing ~60 lines of kubectlExec + stat/lsattr verification
  inline. Removes the duplicated kubectlExec and K3S_CONTAINER constant
  from shields-timer.ts.

- Fix timer state gap (Blocker #3): the !lockVerified path now explicitly
  writes updateState({ shieldsDown: true }) before exiting, rather than
  relying on the absence of an update from shieldsDown().

- Fix rollback state lie (CodeRabbit): shieldsDown rollback no longer
  marks shields as UP when policy restore or lock verification fails.
  If either fails, state stays shieldsDown: true with guidance for manual
  intervention.

- Add lsattr format comment (Warning #3) for the flag-parsing line.
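Two small pieces of the logic above, sketched in TypeScript under stated assumptions: reading the immutable flag from `lsattr` output, and deciding the state after a rollback. The function names and the `ShieldsState` shape are hypothetical; only the `shieldsDown` field name comes from the commit.

```typescript
// `lsattr <path>` prints a flags field followed by the path, for example
// "----i---------e------- /sandbox/.openclaw/openclaw.json" (hyphens = unset).
function hasImmutableFlag(lsattrLine: string): boolean {
  const flags = lsattrLine.trim().split(/\s+/)[0] ?? "";
  // Lowercase "i" is the immutable attribute. The check is case-sensitive on
  // purpose: uppercase "I" is a different attribute (indexed directory).
  return flags.includes("i");
}

interface ShieldsState {
  shieldsDown: boolean;
}

// Shields are only reported as UP when both rollback steps succeeded;
// any failure leaves shieldsDown: true so the user is sent to manual recovery.
function stateAfterRollback(policyRestored: boolean, lockVerified: boolean): ShieldsState {
  return { shieldsDown: !(policyRestored && lockVerified) };
}
```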
ericksoa added a commit that referenced this pull request Apr 26, 2026
Eight findings from a fresh-context adversarial review. Resolutions:

1. Memoize applyOverlayfsAutoFix per upstream image. recoverGatewayRuntime's
   second call to getGatewayStartEnv now returns the cached patched-tag
   without re-running assessHost or re-attempting the build. (The
   reviewer's "45-minute retry storm" was overstated — pRetry captures
   gatewayEnv from outer scope, so the build attempt is once per
   startGatewayWithOptions, not once per retry — but the recovery-path
   redundancy is real and worth deduping.)

2. Bind the patched-image cache key to the upstream image's content
   digest. computePatchedTag now SHAs over (upstreamImage, upstreamDigest,
   snapshotter, dockerfile). ensurePatchedClusterImage resolves the
   digest via `docker image inspect <upstream>` (zero network cost when
   warm; air-gap-safe with pre-staged images). If the local upstream
   isn't there, a `docker manifest inspect` reachability probe (bounded
   by inspectTimeoutMs, default 30s) runs BEFORE the long pull, so
   air-gapped/restricted hosts fail in seconds with a documented error
   instead of hanging through a 10-minute pull timeout. New unit test:
   "differs when only the upstream digest changes". New unit test:
   "fails fast with a documented error when upstream is unreachable on a
   cache miss".

3. Same `docker manifest inspect` probe doubles as the air-gap UX fix
   from finding #3 — fast failure mode + actionable error message that
   points at the troubleshooting doc.

4. Exclude WSL2 hosts from hasNestedOverlayConflict. We don't have a
   confirmed reproducer there, and the WSL kernel's overlay story is
   different from bare Linux. Conservative: leave WSL on the upstream
   image. New unit test: "does not flag a WSL2 Linux host as a conflict".

5. applyOverlayfsAutoFix now logs a console.warn breadcrumb when
   assessHost throws, instead of silently returning null. Future
   regressions in host assessment won't make the auto-fix mysteriously
   stop firing without any user-visible signal.

6. Tighten the e2e negative phase. Was: "any non-zero install.sh exit"
   passes (SKIP on the canonical-error-string check). Now: requires at
   least one of three nested-overlay-failure signatures in the cluster
   log or the install log:
     - "overlayfs snapshotter cannot be enabled" (k3s init)
     - "CreateDiff: Canceled" (sandbox image build)
     - "failed to mount overlay" (catch-all)
   Otherwise FAIL. Distinguishes a real reproduction from unrelated
   flakes (NVIDIA_API_KEY rejection, GHCR rate-limit, daemon blip).

7. Docs note that switching the host's storage driver via daemon.json
   doesn't just kill running containers — it also rebuilds the entire
   local image graph, so previously-pulled images become unusable until
   re-pulled. Documented under the manual workaround.

8. parseDockerStorageDriver now falls back to the plain-text
   `Storage Driver: <name>` form. assessHost still passes
   `--format '{{json .}}'` (the canonical path), but a future caller
   injecting raw `docker info` output won't silently miss the conflict.
   New unit test for the plain-text fixture.
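Findings 2 and 8 can be illustrated with a short TypeScript sketch. The input fields mirror the tuple listed in finding 2, but the interface, the tag format, and the exact hashing scheme are assumptions rather than NemoClaw's implementation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical inputs; the field list follows finding 2.
interface PatchedTagInputs {
  upstreamImage: string;
  upstreamDigest: string; // e.g. resolved locally with `docker image inspect`
  snapshotter: string;
  dockerfile: string; // full text of the patch Dockerfile
}

// Content-addressed tag: any change to an input (including only the upstream
// digest) produces a different tag, so a stale patched image is never reused.
function computePatchedTag(i: PatchedTagInputs): string {
  // JSON-encoding the tuple keeps field boundaries unambiguous
  // (["ab", "c"] and ["a", "bc"] hash differently).
  const payload = JSON.stringify([i.upstreamImage, i.upstreamDigest, i.snapshotter, i.dockerfile]);
  const hex = createHash("sha256").update(payload).digest("hex");
  return `nemoclaw-patched-cluster:${hex.slice(0, 16)}`; // hypothetical repo:tag format
}

// Accepts `docker info --format '{{json .}}'` output (top-level "Driver")
// or the plain-text "Storage Driver: <name>" form; returns null if neither parses.
function parseDockerStorageDriver(output: string): string | null {
  try {
    const info = JSON.parse(output) as { Driver?: unknown } | null;
    if (info && typeof info.Driver === "string") return info.Driver;
  } catch {
    // Not JSON: fall through to the plain-text form.
  }
  const match = /^\s*Storage Driver:\s*(\S+)/m.exec(output);
  return match ? match[1] : null;
}
```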

Local sanity-build of the patched Dockerfile (with `ubuntu:24.04` as the
UPSTREAM stand-in) still produces a working `fuse-overlayfs --version`
binary in the final image.

Refs #2481.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
latenighthackathon referenced this pull request in latenighthackathon/NemoClaw Apr 27, 2026
shouldShowLine recognized classic Docker pull lines (Pulling from /
hex-prefix layer status / Status:) but dropped BuildKit progress
(#3 resolve … / #3 sha256:… 12.34MB / 45.67MB), so users on
BuildKit-enabled engines saw the "Pulling base image from registry..."
phase banner but no actual progress until the pull completed.

Extract the BuildKit pull-line regex into a single BUILDKIT_PULL_LINE
const used by both shouldShowLine and isPullLine — fixes the
forwarding gap and removes the previous duplicate inline regex
between the two predicates.

Tighten the BuildKit test to assert sawProgress: true and that both
emitted progress lines actually reach logLine, locking in the fix
against silent regressions.
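A sketch of the shared-constant approach, with regexes written against the sample lines quoted above. The patterns are assumptions (NemoClaw's actual regexes may differ), and the `ERROR` check in `shouldShowLine` is a hypothetical stand-in for whatever other lines that predicate forwards.

```typescript
// BuildKit: "#3 resolve docker.io/..." and "#3 sha256:<hex> 12.34MB / 45.67MB".
const BUILDKIT_PULL_LINE = /^#\d+ (?:resolve |sha256:[0-9a-f]+ )/;
// Classic: "<tag>: Pulling from <repo>", "<12-hex-id>: Downloading ...", "Status: ...".
const CLASSIC_PULL_LINE = /Pulling from |^[0-9a-f]{12}: |^Status: /;

function isPullLine(line: string): boolean {
  return CLASSIC_PULL_LINE.test(line) || BUILDKIT_PULL_LINE.test(line);
}

// Reuses isPullLine (and therefore the same BUILDKIT_PULL_LINE constant), so
// the "is this a pull?" and "should the user see it?" checks cannot drift apart.
function shouldShowLine(line: string): boolean {
  return isPullLine(line) || /^ERROR\b/.test(line); // ERROR check is a stand-in
}
```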

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
cv added a commit that referenced this pull request Apr 28, 2026
)

## Summary

NemoClaw's sandbox create stream only recognized the legacy Docker
builder format, so BuildKit output would not be treated as active build
progress once OpenShell emits it.

This adds BuildKit progress markers to the same parser path as the
existing legacy builder output. It keeps the current legacy behavior and
makes `#1 [internal] ...`, `#2 CACHED`, and `#3 DONE ...` visible as
build progress.
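A minimal classifier for the marker formats listed above, assuming the legacy builder's `Step N/M : ...` lines and BuildKit's `#N ...` lines. The regexes and the `classifyBuildLine` and `isBuildProgress` names are illustrative, not the code in `src/lib/sandbox-create-stream.ts`.

```typescript
// Illustrative regexes for the two builder output formats.
const LEGACY_STEP = /^Step \d+\/\d+ : /;         // "Step 2/9 : RUN npm ci"
const BUILDKIT_STEP = /^#\d+ \[/;                // "#1 [internal] load build definition"
const BUILDKIT_DONE = /^#\d+ (?:CACHED|DONE\b)/; // "#2 CACHED", "#3 DONE 0.4s"

type BuildLineKind = "step" | "done" | "other";

function classifyBuildLine(line: string): BuildLineKind {
  const l = line.trimEnd();
  if (LEGACY_STEP.test(l) || BUILDKIT_STEP.test(l)) return "step";
  if (BUILDKIT_DONE.test(l)) return "done";
  return "other";
}

// Any step or completion line counts as evidence that the build is progressing.
function isBuildProgress(line: string): boolean {
  return classifyBuildLine(line) !== "other";
}
```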

## Changes

- `src/lib/sandbox-create-stream.ts`: recognize BuildKit step and
completion lines while tracking the build phase.
- `src/lib/sandbox-create-stream.test.ts`: cover BuildKit progress
output and verify it is streamed to the user.

## Testing

- `npm run build:cli` passed
- `npm run typecheck:cli` passed
- `npm test -- src/lib/sandbox-create-stream.test.ts` passed
- `npm test` was also attempted. The full suite is not green on current
main in this environment; failures are in existing
installer/onboard/legacy-guard tests outside this change.

## Evidence it works

The new focused test feeds BuildKit-style output into
`streamSandboxCreate` and verifies that the lines are logged, collected
in output, and mark sandbox creation as having seen progress.

Fixes #2311

Signed-off-by: Deepak Jain <deepujain@gmail.com>


## Summary by CodeRabbit

* **Bug Fixes**
  * Improved detection and display of BuildKit and upload progress so
    progress markers and completion states are recognized reliably.

* **Refactor**
  * Centralized progress-detection logic for more consistent handling of
    build and upload output.

* **Tests**
  * Added a test ensuring BuildKit-formatted progress lines are captured,
    included in output, and reported to the log callback.

Signed-off-by: Deepak Jain <deepujain@gmail.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
ericksoa added a commit that referenced this pull request Apr 29, 2026
## Summary

Upgrades OpenClaw from **2026.4.9** to **2026.4.24** (latest stable,
CalVer).

### Fixes in this PR

1. **Version bumps** — `Dockerfile.base`,
`nemoclaw-blueprint/blueprint.yaml`, `agents/openclaw/manifest.yaml`,
`src/lib/sandbox-version.test.ts`.
2. **Patch 4 updated** — OpenClaw 2026.4.24 restructured
`replaceConfigFile` to first attempt
`tryWriteSingleTopLevelIncludeMutation` (writes to a `$include` file
like `plugins.json5`) before falling back to `writeConfigFile`. The old
patch matched an exact tab-indented `writeConfigFile(params.nextConfig,
{...})` string that no longer exists. Updated to match the new `if
(!await tryWriteSingleTopLevelIncludeMutation(...)) await
writeConfigFile(...)` block and wrap the entire write path in the
OPENSHELL_SANDBOX-gated EACCES try/catch.
3. **`plugin-runtime-deps` symlink** — OpenClaw 2026.4.24 introduced
lazy plugin runtime-dep installation (Jiti loader). The CLI writes to
`~/.openclaw/plugin-runtime-deps/openclaw-<version>-<hash>/` on first
invocation. NemoClaw locks `/sandbox/.openclaw` to `444 root:root`, so
every bundled provider failed to load with `EACCES`. Fix: created the
dir in the writable `.openclaw-data` tree and symlinked it from the
immutable config tree, mirroring the existing pattern used for `logs`,
`credentials`, `extensions`, etc. Added in both `Dockerfile.base`
(canonical) and `Dockerfile` (idempotent fixup for stale GHCR base).
4. **Selective sandbox safety-net**: `_SANDBOX_SAFETY_NET` (a Node
`--require` preload from `nemoclaw-start.sh`) used to be a catch-all that
swallowed errors and intercepted `process.exit`. Rewritten to:
(a) apply only to gateway processes (`OPENSHELL_SANDBOX=1` +
`argv[2]==='gateway'`), so CLI commands keep Node's default crash
behavior; (b) match documented known-benign patterns (currently
`ciao`/mDNS errors, produced when bonjour's probe state machine cancels
itself because the sandbox netns has no multicast); (c) for unknown
errors, log the full stack but keep the gateway alive (the gateway is
shared infrastructure, so user-initiated actions must not take it down);
and (d) drop `process.exit` interception entirely. The CIAO guard's
`uncaughtException` listener was likewise gated to gateway processes:
registering one in a CLI process turns Node's default crash-on-uncaught
into a silent absorb, which would leave `openclaw agent` hanging with no
error.
5. **Disable bonjour and qqbot bundled plugins** — both ship
enabled-by-default in 2026.4.24 and break in the sandbox netns:
- **bonjour**: introduced in 2026.4.15, uses `@homebridge/ciao` for mDNS
announcement. Sandbox netns has no multicast — ciao's probe state
machine fails at startup.
- **qqbot**: has `stageRuntimeDependencies=true`, so its npm deps
(`@tencent-connect/qqbot-connector`, `silk-wasm`, etc.) install on first
load. The sandbox L7 proxy denies the registry URL with `403
policy_denied`, the install retries for ~6 minutes, and while channel
loading is stuck the gateway can't service `openclaw agent` requests.
Both disabled via `plugins.entries.<id>.enabled = false` in
`scripts/generate-openclaw-config.py`.
6. **Build-context fix for `generate-openclaw-config.py`** — main's PR
#2449 (commit `f5ee8a4d`) extracted the inline Python config-generator
from Dockerfile into `scripts/generate-openclaw-config.py` and added
`COPY scripts/generate-openclaw-config.py …` to Dockerfile, but did not
update `src/lib/sandbox-build-context.ts` which curates the optimized
build context for sandbox image builds. Without this, every nightly E2E
job (and any sandbox onboard) fails with `COPY failed: file not found in
build context`. Added the file to `stageOptimizedSandboxBuildContext()`
next to `nemoclaw-start.sh` and added a test assertion so the staging
stays in sync.
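The gating and classification in fix 4 can be sketched as below. The function names, the log prefix, and the single `/ciao/i` pattern are hypothetical; only the `OPENSHELL_SANDBOX=1` and `argv[2]==='gateway'` conditions come from the description above.

```typescript
// Hypothetical list of known-benign patterns: mDNS probe cancellation from
// `ciao`, which fails because the sandbox netns has no multicast.
const KNOWN_BENIGN: readonly RegExp[] = [/ciao/i];

// argv[0] is node, argv[1] the entry script, argv[2] the first CLI argument.
function isGatewayProcess(env: NodeJS.ProcessEnv, argv: readonly string[]): boolean {
  return env.OPENSHELL_SANDBOX === "1" && argv[2] === "gateway";
}

type Verdict = "ignore-benign" | "log-and-keep-alive";

function classifyUncaught(err: unknown): Verdict {
  const text = err instanceof Error ? `${err.message}\n${err.stack ?? ""}` : String(err);
  return KNOWN_BENIGN.some((re) => re.test(text)) ? "ignore-benign" : "log-and-keep-alive";
}

// Registers the listener only in gateway processes and reports whether it did.
// There is no process.exit interception anywhere.
function installSafetyNet(env: NodeJS.ProcessEnv = process.env, argv: readonly string[] = process.argv): boolean {
  if (!isGatewayProcess(env, argv)) return false; // CLI keeps default crash behavior
  process.on("uncaughtException", (err) => {
    if (classifyUncaught(err) === "log-and-keep-alive") {
      // Unknown error: log the full stack, but keep the shared gateway running.
      console.error("[safety-net] unexpected error, gateway kept alive:\n", err.stack ?? err);
    }
  });
  return true;
}
```

Because non-gateway processes never register an `uncaughtException` listener, a crash in `openclaw agent` still terminates the process with Node's normal error output instead of hanging.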

### Status

Most recent un-rate-limited run (25015126555 with build-context fix):
**13 of 18 jobs pass**. `sandbox-operations-e2e` still fails — only
TC-SBX-02 (Connect & Chat) within it. All other TC-SBX cases (03, 04,
05, 06, 07, 08, 10, 11, 12) pass on `test-sbx-a`, confirming the gateway
is functional. After the `sandbox-build-context.ts` fix and the qqbot
disable, the failure mode of TC-SBX-02 changed from `SSH command timed
out after 60s` to `Expected '42' in agent reply; reply=''` — same 60-90
second hang but now hitting the test's outer `run_with_timeout` rather
than producing a stack trace. The test drops stderr (`2>/dev/null`), and
the gateway-log streamer/snapshot infrastructure has been unable to
capture `test-sbx-a`'s `/tmp/openclaw-998/openclaw-*.log` reliably (the
post-test openshell state has no active gateway after TC-SBX-06's docker
kill, and the streamer's connection to test-sbx-a races and gets
`Connection refused`). Still root-causing.

### Notable upstream changes (2026.4.9 → 2026.4.24)

- Google Meet bundled plugin, DeepSeek V4 Flash/Pro, realtime voice
loops (Talk/Voice Call/Google Meet), Gemini Live, browser automation
improvements.
- Lighter startup: static model catalogs, manifest-backed model rows,
**lazy provider dependencies** (the new plugin-runtime-deps mechanism —
root cause of fix #3).
- **Breaking:** Plugin SDK tool-result transforms migrated from
`registerEmbeddedExtensionFactory()` to
`registerAgentToolResultMiddleware()` — verified NemoClaw uses neither.
- **Breaking:** Plugin registry migrated from `plugins.installs` config
key to managed `plugins/installs.json` ledger — `openclaw doctor --fix`
migrates automatically.
- Config writes restructured to use single-file `$include` mutations
before falling back to full config write (root cause of fix #2).
- CVE-2026-41349, CVE-2026-22181 fixes; exec-approvals chat enablement
(2026.4.22); cron `jobs-state.json` separation (2026.4.20).
- bonjour mDNS plugin added in 2026.4.15 (root cause of fix #5a).

### User sandbox state migration on rebuild

Existing user sandboxes upgrade via `nemoclaw <name> rebuild`: state
(memory/, workspace/, agents/, extensions/, etc.) is backed up via tar,
the sandbox is destroyed and recreated with the new image, the state is
restored, and `openclaw doctor --fix` runs after the restore.

**Handled automatically:** memory, cron job definitions, plugin
auto-discovery, plugin registry migration. **Existing reset behavior
(not new):** exec-approvals, credentials, device pairing. **New minor
behavior change:** cron runtime state (`jobs-state.json`) absent in
pre-2026.4.20 backups — job execution history resets, jobs may re-fire
once after upgrade.

## Test plan

- [x] CI lint, typecheck, unit tests pass
- [x] Docker base image and sandbox image build with all dist patches
applied
- [x] 13/18 nightly E2E jobs pass cleanly with all six fixes
- [ ] **TC-SBX-02** — root cause for the residual `reply=''` hang under
investigation; the gateway-log capture infrastructure needs to work
reliably post-test before we can read what's happening server-side
- [ ] Manual smoke test via `nemoclaw <sandbox> connect` interactive
flow
- [ ] Rebuild test: existing 2026.4.9 sandbox → rebuild → verify state
preserved (rebuild-openclaw-e2e covers this)
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
… (NVIDIA#2404)

DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
## Summary

Upgrades OpenClaw from **2026.4.9** to **2026.4.24** (latest stable,
CalVer).

### Fixes in this PR

1. **Version bumps** — `Dockerfile.base`,
`nemoclaw-blueprint/blueprint.yaml`, `agents/openclaw/manifest.yaml`,
`src/lib/sandbox-version.test.ts`.
2. **Patch 4 updated** — OpenClaw 2026.4.24 restructured
`replaceConfigFile` to first attempt
`tryWriteSingleTopLevelIncludeMutation` (writes to a `$include` file
like `plugins.json5`) before falling back to `writeConfigFile`. The old
patch matched an exact tab-indented `writeConfigFile(params.nextConfig,
{...})` string that no longer exists. Updated to match the new `if
(!await tryWriteSingleTopLevelIncludeMutation(...)) await
writeConfigFile(...)` block and wrap the entire write path in the
OPENSHELL_SANDBOX-gated EACCES try/catch.
3. **`plugin-runtime-deps` symlink** — OpenClaw 2026.4.24 introduced
lazy plugin runtime-dep installation (Jiti loader). The CLI writes to
`~/.openclaw/plugin-runtime-deps/openclaw-<version>-<hash>/` on first
invocation. NemoClaw locks `/sandbox/.openclaw` to `444 root:root`, so
every bundled provider failed to load with `EACCES`. Fix: created the
dir in the writable `.openclaw-data` tree and symlinked it from the
immutable config tree, mirroring the existing pattern used for `logs`,
`credentials`, `extensions`, etc. Added in both `Dockerfile.base`
(canonical) and `Dockerfile` (idempotent fixup for stale GHCR base).
4. **Selective sandbox safety-net** — `_SANDBOX_SAFETY_NET` (a Node
`--require` preload from `nemoclaw-start.sh`) used to be a catch-all
swallow + `process.exit` interceptor. Rewritten to: (a) gate to gateway
processes only (`OPENSHELL_SANDBOX=1` + `argv[2]==='gateway'`) so CLI
commands keep default Node crash behaviour; (b) match documented
known-benign patterns (currently `ciao`/mDNS — produced when bonjour's
probe state machine cancels itself, since the sandbox netns has no
multicast); (c) for unknown errors, log full stack but keep gateway
alive (gateway is shared infrastructure, user-initiated actions must not
take it down); (d) drop `process.exit` interception entirely. The CIAO
guard's `uncaughtException` listener was similarly gated to gateway
processes — registering one in CLI processes turns Node's default
crash-on-uncaught into silent absorb, which would silently hang
`openclaw agent`.
5. **Disable bonjour and qqbot bundled plugins** — both ship
enabled-by-default in 2026.4.24 and break in the sandbox netns:
- **bonjour**: introduced in 2026.4.15, uses `@homebridge/ciao` for mDNS
announcement. Sandbox netns has no multicast — ciao's probe state
machine fails at startup.
- **qqbot**: has `stageRuntimeDependencies=true`, so its npm deps
(`@tencent-connect/qqbot-connector`, `silk-wasm`, etc.) install on first
load. The sandbox L7 proxy denies the registry URL with `403
policy_denied`, the install retries for ~6 minutes, and while channel
loading is stuck the gateway can't service `openclaw agent` requests.
Both disabled via `plugins.entries.<id>.enabled = false` in
`scripts/generate-openclaw-config.py`.
6. **Build-context fix for `generate-openclaw-config.py`** — main's PR
NVIDIA#2449 (commit `f5ee8a4d`) extracted the inline Python config-generator
from Dockerfile into `scripts/generate-openclaw-config.py` and added
`COPY scripts/generate-openclaw-config.py …` to Dockerfile, but did not
update `src/lib/sandbox-build-context.ts` which curates the optimized
build context for sandbox image builds. Without this, every nightly E2E
job (and any sandbox onboard) fails with `COPY failed: file not found in
build context`. Added the file to `stageOptimizedSandboxBuildContext()`
next to `nemoclaw-start.sh` and added a test assertion so the staging
stays in sync.

### Status

Most recent un-rate-limited run (25015126555 with build-context fix):
**13 of 18 jobs pass**. `sandbox-operations-e2e` still fails — only
TC-SBX-02 (Connect & Chat) within it. All other TC-SBX cases (03, 04,
05, 06, 07, 08, 10, 11, 12) pass on `test-sbx-a`, confirming the gateway
is functional. After the `sandbox-build-context.ts` fix and the qqbot
disable, the failure mode of TC-SBX-02 changed from `SSH command timed
out after 60s` to `Expected '42' in agent reply; reply=''` — same 60-90
second hang but now hitting the test's outer `run_with_timeout` rather
than producing a stack trace. The test drops stderr (`2>/dev/null`), and
the gateway-log streamer/snapshot infrastructure has been unable to
capture `test-sbx-a`'s `/tmp/openclaw-998/openclaw-*.log` reliably (the
post-test openshell state has no active gateway after TC-SBX-06's docker
kill, and the streamer's connection to test-sbx-a races and gets
`Connection refused`). Still root-causing.

### Notable upstream changes (2026.4.9 → 2026.4.24)

- Google Meet bundled plugin, DeepSeek V4 Flash/Pro, realtime voice
loops (Talk/Voice Call/Google Meet), Gemini Live, browser automation
improvements.
- Lighter startup: static model catalogs, manifest-backed model rows,
**lazy provider dependencies** (the new plugin-runtime-deps mechanism —
root cause of fix NVIDIA#3).
- **Breaking:** Plugin SDK tool-result transforms migrated from
`registerEmbeddedExtensionFactory()` to
`registerAgentToolResultMiddleware()` — verified NemoClaw uses neither.
- **Breaking:** Plugin registry migrated from `plugins.installs` config
key to managed `plugins/installs.json` ledger — `openclaw doctor --fix`
migrates automatically.
- Config writes restructured to use single-file `$include` mutations
before falling back to full config write (root cause of fix NVIDIA#2).
- CVE-2026-41349, CVE-2026-22181 fixes; exec-approvals chat enablement
(2026.4.22); cron `jobs-state.json` separation (2026.4.20).
- bonjour mDNS plugin added in 2026.4.15 (root cause of fix #5a).

### User sandbox state migration on rebuild

Existing user sandboxes upgrade via `nemoclaw <name> rebuild`. The sandbox
state (memory/, workspace/, agents/, extensions/, etc.) is backed up with
tar, the sandbox is destroyed and recreated from the new image, the state
is restored, and `openclaw doctor --fix` runs after the restore.

**Handled automatically:** memory, cron job definitions, plugin
auto-discovery, plugin registry migration. **Existing reset behavior
(not new):** exec-approvals, credentials, device pairing. **New minor
behavior change:** cron runtime state (`jobs-state.json`) is absent from
pre-2026.4.20 backups, so job execution history resets and jobs may
re-fire once after the upgrade.

## Test plan

- [x] CI lint, typecheck, unit tests pass
- [x] Docker base image and sandbox image build with all dist patches
applied
- [x] 13/18 nightly E2E jobs pass cleanly with all six fixes
- [ ] **TC-SBX-02** — root cause for the residual `reply=''` hang under
investigation; the gateway-log capture infrastructure needs to work
reliably post-test before we can read what's happening server-side
- [ ] Manual smoke test via `nemoclaw <sandbox> connect` interactive
flow
- [ ] Rebuild test: existing 2026.4.9 sandbox → rebuild → verify state
preserved (rebuild-openclaw-e2e covers this)
jyaunches added a commit that referenced this pull request May 6, 2026
- Fix recovery scripts in agent-runtime.ts that still used curl -sf
  on / instead of the new HTTP status code pattern on /health (#3)
- Add device-auth-health-e2e to scorecard.needs (#8)
- Use openshell-${SANDBOX_NAME} SSH host alias in E2E test (#7)
cjagwani pushed a commit that referenced this pull request May 14, 2026
Resolve the two output threads in #3456 left after the core dead-loop fix
landed via #3459 + #3434:

Sub-bug #3 — `src/lib/onboard.ts` printed
  `nemoclaw <name> destroy --yes && nemoclaw onboard --gpu`
with a literal `<name>` placeholder, and assumed at least one sandbox
was registered. When the GPU-passthrough mismatch hit on the State B
re-run path with an empty registry (the dead-loop case), the hint was
not actionable. Replace with a registry-aware helper at
`src/lib/onboard/gpu-recovery.ts` that renders the right shape:
  - empty registry → suggest `nemoclaw uninstall && nemoclaw onboard --gpu`
  - one sandbox → suggest destroy --yes --cleanup-gateway for that name
  - multiple sandboxes → list each, only the last gets --cleanup-gateway

Sub-bug #4 — `src/lib/actions/uninstall/run-plan.ts` printed
  `Destroyed gateway 'nemoclaw' skipped`
when the openshell destroy no-op'd (gateway already gone) — the
"Destroyed … skipped" wording was self-contradictory. Extend
`runOptional` with an `onSkip` option; route the gateway destroy to
emit `Gateway 'nemoclaw' already removed or unreachable` on no-op.
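A minimal sketch of the `onSkip` extension, assuming a simplified shape: the step callback returns `true` when it did work and `false` when it no-op'd, and the helper returns the message instead of printing it. The `RunOptionalOptions` interface and this `runOptional` signature are illustrative assumptions, not the real code in `run-plan.ts`.

```typescript
// Illustrative only: the real runOptional in src/lib/actions/uninstall/run-plan.ts
// has its own signature. Here a step returns true if it did work, false on a no-op.
interface RunOptionalOptions {
  /** Message to use instead of the default "<label> skipped" when the step no-ops. */
  onSkip?: string;
}

function runOptional(
  label: string,
  step: () => boolean,
  options: RunOptionalOptions = {},
): string {
  if (step()) {
    return label;
  }
  // Callers that omit onSkip keep the original "<label> skipped" wording.
  return options.onSkip ?? `${label} skipped`;
}

// The gateway destroy opts into clearer wording for the already-gone case.
const gatewayMessage = runOptional("Destroyed gateway 'nemoclaw'", () => false, {
  onSkip: "Gateway 'nemoclaw' already removed or unreachable",
});
console.log(gatewayMessage);
```

Making `onSkip` optional with a fallback to the old string keeps every existing caller's output unchanged; only the gateway destroy opts in.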

Tests:
- `src/lib/onboard/gpu-recovery.test.ts` (6 tests): forbid literal
  `<name>` placeholder anywhere in the output; cover empty / single /
  multi-sandbox cases; defensive filter on whitespace names so a
  `nemoclaw  destroy` rendering can never happen.
- `src/lib/actions/uninstall/run-plan.test.ts`: assert the new
  "already removed or unreachable" wording and the absence of the
  "Destroyed gateway 'nemoclaw' skipped" string.

The core dead loop itself (sub-bugs #1, #2 and State B GPU mismatch)
is already addressed by #3459 + #3434 + #3483; #3456 will close once
this lands. See the #3456 status comment for the full mapping.

Refs #3456. Mirrors (and tightens) the approach in the closed PR #3464,
which never addressed CodeRabbit's feedback about the literal `<name>`
placeholder left in its tests.

Signed-off-by: Charan Jagwani <charjags100@gmail.com>
cv added a commit that referenced this pull request May 14, 2026
…3520)

> **Draft for visibility.** Issue-autopilot Stages 4-5 of #3456. Will
mark ready once batch self-review + CI complete.

## Summary

Closes the two remaining output threads in #3456 after the core
dead-loop fix already landed on `main` (via #3459, #3434, #3483). Full
sub-bug mapping in the #3456 status comment.

- **Sub-bug #3** — `nemoclaw <name> destroy --yes` recovery hint
replaced with a registry-aware helper.
- **Sub-bug #4** — `Destroyed gateway 'nemoclaw' skipped`
self-contradictory wording replaced with `Gateway 'nemoclaw' already
removed or unreachable`.

## Acceptance criteria mapping

| Sub-bug | Resolution | Evidence |
|---|---|---|
| #1 dead loop | Already fixed on main (#3459) | out of scope |
| #2 firewall diagnostic | Already fixed on main (#3459) | out of scope |
| **#3** literal `<name>` placeholder | **This PR** | `src/lib/onboard/gpu-recovery.ts` + `onboard.ts:10387-10405` |
| **#4** misleading "skipped" wording | **This PR** | `src/lib/actions/uninstall/run-plan.ts:210-228, 407-414` |
| #5 uninstall residuals | Already fixed on main (#3483) | out of scope |

## Behavior matrix

`gpuPassthroughRecoveryLines(names)`:

| Input | Suggestion |
|---|---|
| `null` / `[]` | `nemoclaw uninstall && nemoclaw onboard --gpu` |
| one sandbox | `nemoclaw <name> destroy --yes --cleanup-gateway && nemoclaw onboard --gpu` |
| many sandboxes | each `destroy --yes`, only the last gets `--cleanup-gateway` |
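The behavior matrix can be read as a small pure function. The sketch below is one way to implement it, assuming the helper returns one shell line per sandbox and chains `&& nemoclaw onboard --gpu` onto the last line; how the real `gpuPassthroughRecoveryLines` in `src/lib/onboard/gpu-recovery.ts` formats multi-sandbox output is not specified here. It also drops blank or whitespace-only names, mirroring the defensive filter the tests describe.

```typescript
// Sketch of the behavior matrix; the real helper may format its output differently.
const ONBOARD_GPU = "nemoclaw onboard --gpu";

function gpuPassthroughRecoveryLines(names: readonly string[] | null | undefined): string[] {
  // Ignore blank or whitespace-only names so we never render `nemoclaw  destroy`.
  const sandboxes = (names ?? []).map((name) => name.trim()).filter((name) => name.length > 0);

  // Empty registry: there is no sandbox to destroy, so uninstall and re-onboard.
  if (sandboxes.length === 0) {
    return [`nemoclaw uninstall && ${ONBOARD_GPU}`];
  }

  // Destroy every sandbox; only the last destroy also passes --cleanup-gateway,
  // and the re-onboard is chained onto that final line.
  return sandboxes.map((name, index) =>
    index === sandboxes.length - 1
      ? `nemoclaw ${name} destroy --yes --cleanup-gateway && ${ONBOARD_GPU}`
      : `nemoclaw ${name} destroy --yes`,
  );
}

for (const line of gpuPassthroughRecoveryLines(["alpha", "  ", "beta"])) {
  console.log(line);
}
```

Because every line is built from a real registry name, no literal `<name>` placeholder can reach the output.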

## Test plan

```
npm run typecheck:cli
npx vitest run src/lib/onboard/gpu-recovery.test.ts src/lib/actions/uninstall/run-plan.test.ts
```

22 tests pass (6 new + 16 existing).

## Notes for reviewers

- This is the work #3464 attempted; that PR was
closed without merging after CodeRabbit asked for the `<name>`
placeholder to be forbidden in tests via negative assertion. This PR
adopts that refinement.
- `runOptional` extension is backwards-compatible — existing callers
without `onSkip` get the original wording.

Closes #3456 once merged.

---------

Signed-off-by: Charan Jagwani <charjags100@gmail.com>
Co-authored-by: Charan Jagwani <charjags100@gmail.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
jyaunches added a commit that referenced this pull request May 28, 2026
)

Advisor finding: scenarios with expectedFailure metadata declared
phase/errorClass/forbiddenSideEffects, but nothing in the typed
runner inspected observed phase results to verify the right phase
failed for the right reason. A scenario named
ubuntu-no-docker-preflight-negative could fail because DNS broke and
the run would still show 'failed' without catching the mismatch.

Add framework-owned negative-scenario contract verification, in the
spirit of redaction.ts and context.ts (typed orchestrator infra,
not shell):

- types.ts: ExpectedFailureContract typed shape replaces the prior
  Record<string, unknown> on ScenarioDefinition.expectedFailure and
  RunPlan.expectedFailure. Adds ExpectedFailurePhase
  (PhaseName | 'preflight') so manifests speak the user vocabulary
  while internal PhaseName stays narrow. Adds NegativeContractPhase
  / PhaseResultName so the synthetic phase result the runner emits
  cannot accidentally be declared by a scenario builder.

- orchestrators/negative-matcher.ts (new): pure function
  evaluateNegativeContract(plan, results) returning NegativeContractResult
  with outcome in {matched, no-failure-observed, wrong-phase,
  wrong-error-class}. Resolves expected.phase='preflight' to the
  onboarding orchestrator (where preflight assertions live).
  Substring-with-case-fold, separator-tolerant errorClass match.
  Excludes the runtime side-effect probe step from observed-failure
  detection so the matcher is not confused by its own enforcement
  scaffolding.

- orchestrators/runner.ts: after phases run, if plan.expectedFailure
  is set, call evaluateNegativeContract and append a synthetic
  PhaseResult with phase='negative-contract'. Emits
  .e2e/negative-contract.json artifact alongside per-phase results.
  Positive scenarios are untouched.

- run.ts: planFailed() consults the synthetic contract phase for
  negative scenarios. A negative scenario is green iff the contract
  matched AND the runtime control group's required no-side-effects
  step passed. Until the forbidden-side-effect probe lands, the
  required pending step keeps that piece red, so a matched failure mode
  alone still cannot flip a negative scenario green.

- builder.ts / scenarios/baseline.ts: thread the typed contract
  through the builder API and the canonical input shape.

- 15 new tests in e2e-negative-matcher.test.ts cover: matched,
  preflight->onboarding mapping, no-failure-observed, wrong-phase,
  wrong-errorClass, side-effect probe step ignored, case-insensitive
  matching, runner integration (matched + mismatched + positive
  unaffected), registry contract (every negative scenario opts into
  the side-effect probe step), and compiler validation rejects bad
  shapes.
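A condensed sketch of the matcher and the green/red gate described above, assuming simplified result shapes. The members of `PhaseName`, the `StepResult`/`PhaseResult` fields, the probe step id, and the choice to judge only the first observed failure are illustrative assumptions; the real `evaluateNegativeContract` in `orchestrators/negative-matcher.ts` and `planFailed()` in `run.ts` work on the framework's own types.

```typescript
// Illustrative types; the framework's real PhaseName and result shapes differ.
type PhaseName = "install" | "onboarding" | "runtime";
type ExpectedFailurePhase = PhaseName | "preflight";
type StepStatus = "passed" | "failed" | "pending";

interface ExpectedFailureContract {
  phase: ExpectedFailurePhase;
  errorClass: string;
}
interface StepResult {
  id: string;
  status: StepStatus;
  errorClass?: string;
}
interface PhaseResult {
  phase: PhaseName;
  steps: StepResult[];
}
type NegativeContractOutcome = "matched" | "no-failure-observed" | "wrong-phase" | "wrong-error-class";

// Hypothetical id for the runtime no-side-effects probe step.
const SIDE_EFFECT_PROBE_STEP_ID = "expected-failure-no-side-effects";

// Case-fold and drop separators so "Docker_Missing" and "docker-missing" compare equal.
const normalizeErrorClass = (value: string): string => value.toLowerCase().replace(/[\s_.:-]+/g, "");

function firstObservedFailure(
  results: readonly PhaseResult[],
): { phase: PhaseName; step: StepResult } | undefined {
  for (const result of results) {
    for (const step of result.steps) {
      // Do not mistake the matcher's own enforcement scaffolding for the failure.
      if (step.id === SIDE_EFFECT_PROBE_STEP_ID) continue;
      if (step.status === "failed") return { phase: result.phase, step };
    }
  }
  return undefined;
}

function evaluateNegativeContract(
  expected: ExpectedFailureContract,
  results: readonly PhaseResult[],
): NegativeContractOutcome {
  // Preflight assertions live in the onboarding orchestrator.
  const targetPhase: PhaseName = expected.phase === "preflight" ? "onboarding" : expected.phase;
  const failure = firstObservedFailure(results);
  if (failure === undefined) return "no-failure-observed";
  if (failure.phase !== targetPhase) return "wrong-phase";
  // Substring match after normalization.
  const observed = normalizeErrorClass(failure.step.errorClass ?? "");
  return observed.includes(normalizeErrorClass(expected.errorClass)) ? "matched" : "wrong-error-class";
}

// Green only if the contract matched AND the no-side-effects probe step passed;
// a still-pending probe keeps the scenario red.
function negativeScenarioGreen(outcome: NegativeContractOutcome, probeStatus: StepStatus): boolean {
  return outcome === "matched" && probeStatus === "passed";
}
```

Keying on the first non-probe failure means a run that dies early on, say, a DNS error is reported as `wrong-phase` or `wrong-error-class` instead of passing as a plain "failed".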

Spec ownership boundaries kept honest:
- Failure injection (uninstalling docker, planting a bad key,
  occupying a port) stays runner-environment prep, not framework
  code. Matcher only inspects observed results.
- Forbidden-side-effect verification stays the
  expectedFailureNoSideEffectsProbe's job. The matcher reports
  phase + errorClass independently; the required pending step from
  cc6b7a2 keeps the side-effect axis visibly red until the probe
  lands.

354 framework tests pass (15 new). tsc clean.

Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
cjagwani added a commit to yimoj/NemoClaw that referenced this pull request Jun 4, 2026
NVIDIA#4538)

Two acceptance gaps from NVIDIA#4538 not closed by the original PR:

- Troubleshooting: contrast the three relevant perm states (mutable
  default 2770/660, shields-up locked 444/755 root, the 700/600 drift)
  so issue ask #3, "docs cover both mutable and locked-down", is
  actually answered. Prior copy only documented mutable.

- mutable-config-perms: explain why NemoClaw uses 2770/660 vs the
  issue's expected 2775/664 (gateway shares the sandbox group, so the
  "other" bit is intentionally dropped). The predicate test already
  rejects 664; the rationale belonged in code.
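A minimal sketch of how the three permission states could be told apart, assuming directory and file modes are checked separately and reading "444/755" as a 444 file in a 755 directory (the other two pairs are written directory/file). `classifyConfigPerms` is a hypothetical helper, and the "root" ownership of the locked state is not checked here.

```typescript
// Hypothetical helper; NemoClaw's real predicate lives in its mutable-config-perms code.
type ConfigPermState = "mutable" | "locked" | "drift" | "unknown";

// Keep only permission bits (incl. setuid/setgid/sticky) so raw fs.Stats.mode values compare cleanly.
const permBits = (mode: number): number => mode & 0o7777;

function classifyConfigPerms(dirMode: number, fileMode: number): ConfigPermState {
  const dir = permBits(dirMode);
  const file = permBits(fileMode);
  // Setgid directory, owner and group read/write, nothing for "other".
  if (dir === 0o2770 && file === 0o660) return "mutable";
  // Shields-up: read-only file in a 755 directory (root ownership not checked here).
  if (dir === 0o755 && file === 0o444) return "locked";
  // Owner-only: the group, and with it the gateway, has no access.
  if (dir === 0o700 && file === 0o600) return "drift";
  return "unknown";
}

// The issue's 2775/664 differs from 2770/660 only in the "other" bits.
console.log((0o2775 & 0o007).toString(8), (0o664 & 0o007).toString(8)); // prints: 5 4
console.log(classifyConfigPerms(0o2775, 0o664)); // prints: unknown
```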

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
cv mentioned this pull request Jun 5, 2026
jyaunches added a commit that referenced this pull request Jun 10, 2026
Pass 1 of phase-5 convergence on dispatch 27283289318. First blocking
assertion: openshell-installed (#3 in chain), failing with
`spawn openshell ENOENT`.

Root cause: in shell-probe.ts:127-129, when inheritEnv is unset, the child
gets only options.env. My prereq probes passed no env, so spawn had no
PATH to resolve `openshell` against. The onboard runs worked only because
they used inheritEnv:true.

Fix: use buildAvailabilityProbeEnv() everywhere (framework allowlist
incl. PATH/HOME/CI; explicitly excludes NVIDIA_API_KEY). Layer the secret
explicitly only on the first onboard; the resume run's env deliberately
omits it to test credential hydration from the session file. This is
the typed expression of the bash test's `env -u NVIDIA_API_KEY` invariant.

Also drops inheritEnv:true on the onboard runs in favor of the same
allowlist composition pattern, matching OnboardingPhaseFixture.commandEnv.
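A minimal sketch of the allowlist composition, assuming the allowlist is just `PATH`, `HOME`, and `CI` (the real `buildAvailabilityProbeEnv()` may forward more). `buildProbeEnv`, `firstOnboardEnv`, and `resumeOnboardEnv` are hypothetical names. The resulting object would be passed as the `env` option of Node's `child_process.spawn`, which resolves the command against that object's `PATH` when `env` is given; that is why a probe with no `PATH` fails with `ENOENT`.

```typescript
// Hypothetical allowlist; the framework's real buildAvailabilityProbeEnv() may forward more keys.
const PROBE_ENV_ALLOWLIST = ["PATH", "HOME", "CI"] as const;
// Defense in depth: never forward these even if the allowlist is widened later.
const NEVER_FORWARD: ReadonlySet<string> = new Set(["NVIDIA_API_KEY"]);

type Env = Record<string, string | undefined>;

function buildProbeEnv(source: Env): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of PROBE_ENV_ALLOWLIST) {
    const value = source[key];
    if (value !== undefined && !NEVER_FORWARD.has(key)) {
      env[key] = value;
    }
  }
  return env;
}

// First onboard: allowlisted env plus the secret, layered on explicitly.
function firstOnboardEnv(source: Env, apiKey: string): Record<string, string> {
  return { ...buildProbeEnv(source), NVIDIA_API_KEY: apiKey };
}

// Resume: same allowlist, deliberately without the key, so credentials must
// be hydrated from the session file (the typed form of `env -u NVIDIA_API_KEY`).
function resumeOnboardEnv(source: Env): Record<string, string> {
  return buildProbeEnv(source);
}

const hostEnv: Env = { PATH: "/usr/local/bin:/usr/bin", HOME: "/home/runner", NVIDIA_API_KEY: "nvapi-example" };
console.log(resumeOnboardEnv(hostEnv));
```

Starting from an allowlist rather than inheriting the whole parent env keeps `PATH` available to every probe while making the presence of the secret an explicit, per-run decision.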

Refs: #4348, #5098