fix(sandbox): fix non-root gateway startup and add crash safety net by ericksoa · Pull Request #2472 · NVIDIA/NemoClaw

ericksoa · 2026-04-25T04:59:59Z

Summary

Fixes a 5-day outage where the gateway never started in non-root sandbox mode (Brev Launchable, no-new-privileges containers). Also adds a global safety net preventing any npm library crash from killing the gateway.

Changes

Entrypoint Landlock tolerance (scripts/nemoclaw-start.sh, scripts/lib/sandbox-init.sh)

install_configure_guard: wrap all .bashrc/.profile writes in || true — the [ -w file ] test passes (DAC) but Landlock blocks the actual write, crashing the entrypoint under set -e
lock_rc_files: || true on chmod calls
validate_tmp_permissions: expect 644 for gateway.log
Root cause: commit 20407589 (Apr 20) added install_configure_guard which writes to Landlock-protected files. Every non-root sandbox since then had a dead gateway.
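
The failure mode above (a write that fails after a writability check the script cannot rely on, under `set -e`) can be sketched as follows. This is a minimal illustration, not the PR's code: `append_best_effort` and the snippet text are hypothetical, and a path inside a missing directory merely stands in for a write that the kernel rejects.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: append a snippet to an rc file without letting a failed
# write abort a script that runs under `set -e`.
append_best_effort() {
  local rc_file="$1" snippet="$2"
  # The braces make 2>/dev/null cover the redirection error itself.
  # A command used as an `if` condition is exempt from `set -e`.
  if ! { printf '%s\n' "$snippet" >>"$rc_file"; } 2>/dev/null; then
    echo "[warn] could not update ${rc_file}; continuing" >&2
  fi
}

demo_dir=$(mktemp -d)
append_best_effort "$demo_dir/.bashrc" "# guard snippet (placeholder)"
# This write fails, yet the script keeps running.
append_best_effort "$demo_dir/missing-dir/.bashrc" "# guard snippet (placeholder)"
echo "entrypoint continues"
```

Attempting the write and tolerating its failure avoids depending on `[ -w file ]`, which only reflects DAC permissions.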

Global sandbox safety net (scripts/nemoclaw-start.sh)

New sandbox-safety-net.js preload — catches ALL uncaught exceptions and unhandled rejections in sandbox mode (OPENSHELL_SANDBOX=1), logs them, and continues
Intercepts process.exit() during swallowed rejection delivery so OpenClaw's own handler (which calls process.exit(1) for non-transient errors) doesn't kill the gateway
First --require preload so handlers register before any library code

ciao network guard (scripts/nemoclaw-start.sh)

Targeted guard for @homebridge/ciao mDNS library crash (os.networkInterfaces() → uv_interface_addresses SystemError in restricted namespaces)
Monkey-patches os.networkInterfaces to return {} on failure

Slack guard improvements (scripts/nemoclaw-start.sh)

Include Slack guard in proxy-env.sh for connect sessions
gateway.log permissions changed from 600 to 644 for diagnostic readability

Per-channel health monitor disable (Dockerfile)

Set healthMonitor.enabled: false on each messaging channel account
Prevents OpenClaw's health monitor from killing the gateway after 120s channel-connect-grace when a channel has placeholder tokens

Remove cloud-experimental-e2e (.github/workflows/nightly-e2e.yaml)

Has been failing since March 31, wastes API tokens

Test plan

shellcheck clean
91/91 nemoclaw-start.test.ts pass
Nightly E2E: messaging-providers Phase 7 S1+S2 pass (gateway survives Slack auth failure)
Nightly E2E: sandbox-survival, skip-permissions, cloud-e2e pass

Summary by CodeRabbit

Bug Fixes
- Improved sandbox crash resilience with new runtime guards, deterministic preload ordering, and selective guard activation to reduce unexpected failures.
- Shell rc-file snippet installation and locking are now best-effort so setup proceeds when modifying user rc files fails.
Chores
- Relaxed /tmp log permissions and extended tmp-permission validation to cover new guard artifacts.
- Disabled health monitoring for messaging channel accounts.
Tests
- Reduced noisy diagnostics in an e2e test, broadened matching in unit tests to avoid false negatives, and removed an experimental nightly CI job.

…og readable

Two fixes for the messaging-providers-e2e Phase 7 Slack guard test that has never passed since #2355:

1. Add the Slack channel guard to the proxy-env.sh sourced file so interactive sessions (openshell sandbox connect/exec) see the guard in NODE_OPTIONS. The guard file is installed after proxy-env.sh is written, so use a runtime conditional ([ -f ... ]) in the sourced script. This fixes the misleading diagnostic that showed NODE_OPTIONS without the guard.
2. Change gateway.log permissions from 600 to 644 so E2E diagnostics (openshell sandbox exec -- cat /tmp/gateway.log) can read the log without being the gateway user. The log doesn't contain secrets.
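
The runtime conditional can be sketched like this. It is an illustration under assumptions, not the contents of the real proxy-env.sh: `write_proxy_env` and the file names are hypothetical. The guard path is fixed when the file is written, while the `[ -f ... ]` test is evaluated each time the file is sourced.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical writer for a sourced env file. The existence check is emitted
# as text, so it runs at source time rather than at write time.
write_proxy_env() {
  local env_file="$1" guard="$2"
  cat >"$env_file" <<EOF
if [ -f "${guard}" ]; then
  case " \${NODE_OPTIONS:-} " in
    *" --require ${guard} "*) ;; # already present, do not add twice
    *) export NODE_OPTIONS="\${NODE_OPTIONS:-} --require ${guard}" ;;
  esac
fi
EOF
}

work=$(mktemp -d)
write_proxy_env "$work/proxy-env.sh" "$work/slack-guard.js"

unset NODE_OPTIONS
. "$work/proxy-env.sh"        # guard not installed yet: NODE_OPTIONS untouched
before="${NODE_OPTIONS:-}"

: >"$work/slack-guard.js"     # guard installed later
. "$work/proxy-env.sh"
after="${NODE_OPTIONS:-}"
printf 'before=[%s]\nafter=[%s]\n' "$before" "$after"
```

The `case` check keeps repeated sourcing from appending the same `--require` flag more than once.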

coderabbitai · 2026-04-25T05:00:10Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1f0e4f18-ad46-4b20-84a2-8906519f496c

📥 Commits

Reviewing files that changed from the base of the PR and between d27f0dd and 36f3f89.

📒 Files selected for processing (1)

scripts/lib/sandbox-init.sh

🚧 Files skipped from review as they are similar to previous changes (1)

scripts/lib/sandbox-init.sh

📝 Walkthrough

Walkthrough

Adds a global Node.js sandbox safety-net preload and a ciao/mDNS guard preload; reorders NODE_OPTIONS (safety-net first, ciao always, Slack guard conditional); makes rc-file locking best-effort; permits /tmp/gateway.log mode 644; extends tmp-permission checks to new preloads; removes nightly cloud-experimental-e2e job; updates tests.
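
A minimal sketch of the preload ordering described in the walkthrough, assuming a hypothetical helper `build_node_options` and placeholder file names. The real script also loads other preloads (the proxy and nemotron fixes), which are omitted here. Node.js loads `--require` modules in the order given, so the first entry registers its handlers before the others run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: assemble --require flags in the described order.
# Arguments are paths to the safety-net, ciao guard, and Slack guard preloads.
build_node_options() {
  local safety_net="$1" ciao_guard="$2" slack_guard="$3"
  # Safety net first, ciao guard always.
  local opts="--require ${safety_net} --require ${ciao_guard}"
  # Slack guard only when its artifact exists on disk.
  if [ -f "$slack_guard" ]; then
    opts="${opts} --require ${slack_guard}"
  fi
  printf '%s\n' "$opts"
}

dir=$(mktemp -d)
: >"$dir/safety-net.js"
: >"$dir/ciao-guard.js"
build_node_options "$dir/safety-net.js" "$dir/ciao-guard.js" "$dir/slack-guard.js"
```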

Changes

| Cohort / File(s) | Summary |
|---|---|
| Sandbox startup & preloads `scripts/nemoclaw-start.sh` | Adds global safety-net preload (`--require $_SANDBOX_SAFETY_NET`) installing `uncaughtException`/`unhandledRejection` handlers and temporarily intercepting `process.exit()` for swallowed rejections. Adds ciao/mDNS guard preload (`--require $_CIAO_GUARD_SCRIPT`) that monkey-patches `os.networkInterfaces()` to return `{}` on failure. Reorders NODE_OPTIONS: safety-net first, ciao guard always, Slack guard only if artifact exists. |
| Tmp permissions & rc locking `scripts/lib/sandbox-init.sh` | Validates `/tmp/gateway.log` accepts `600` or `644`; keeps `/tmp/auto-pair.log` at `600`. Extends tmp-permission validation to include safety-net, ciao guard, and slack guard preload artifacts. Makes rc-file chmod/locking best-effort (suppressing non-fatal failures and emitting warnings). |
| E2E & unit tests `test/e2e/test-messaging-providers.sh`, `test/nemotron-inference-fix.test.ts`, `test/http-proxy-fix-sync.test.ts` | E2E: simplifies header text, retains Slack guard verification and NODE_OPTIONS print, removes HTTP-proxy artifact diagnostics and sandbox process list print. Unit tests: relax regex matchers for `validate_tmp_permissions` invocation patterns to be more permissive. |
| CI workflow `.github/workflows/nightly-e2e.yaml` | Removes `cloud-experimental-e2e` job and updates `notify-on-failure` dependencies to exclude it. |
| Docker config `Dockerfile` | Generated `openclaw.json` now sets `accounts.default.healthMonitor.enabled = False` for messaging channels. |
| Misc tmp artifacts `/tmp/...` artifacts referenced | Validation extended to cover safety-net, ciao guard, and slack guard preload files; `/tmp/gateway.log` mode acceptance updated to include `644`. |

Sequence Diagram(s)

sequenceDiagram
    participant Shell as User Shell
    participant Start as scripts/nemoclaw-start.sh
    participant Node as Node.js Process
    participant Safety as Safety-net Preload
    participant Ciao as Ciao/mDNS Guard
    participant App as Application
    Note over Start,Node: Build NODE_OPTIONS (safety-net first,\nciao guard always, slack guard if present)
    Shell->>Start: launch sandbox (OPENSHELL_SANDBOX=1)
    Start->>Node: exec node with NODE_OPTIONS (preloads)
    Node->>Safety: require safety-net preload
    Safety->>Node: install uncaughtException / unhandledRejection handlers
    Node->>Ciao: require ciao guard preload
    Ciao->>Node: monkey-patch os.networkInterfaces() to safe-return {}
    Node->>App: load and run application
    App-->>Safety: runtime error / unhandled rejection
    Safety-->>App: swallow/log error and optionally prevent exit
    App->>Node: normal or recovered termination

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

"I hop where preloads weave and play,
I shield the node through night and day.
Logs tamed, rc locks softened light,
Ciao made quiet, Slack guards in sight.
🐰✨"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: fixing non-root gateway startup issues and adding a crash safety net mechanism for the sandbox. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 1399-1404: Change the permission of /tmp/gateway.log to match the
validator expectations: replace the chmod 644 call with chmod 600 so the file
created in nemoclaw-start.sh remains owned by gateway:gateway and is only
readable by owner; ensure this aligns with validate_tmp_permissions (the sandbox
tmp-permissions validator) and keep the existing touch and chown/gateway
ownership calls intact.


📥 Commits

Reviewing files that changed from the base of the PR and between 6966f4b and 815e2d7.

📒 Files selected for processing (1)

scripts/nemoclaw-start.sh

The guard file doesn't exist in the sandbox even though openclaw.json should contain "slack". Add logging to install_slack_channel_guard when the grep fails (reports file existence/readability) and add E2E diagnostics to check the grep result and container logs for guard skip/install messages.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

scripts/nemoclaw-start.sh (1)
1403-1408: ⚠️ Potential issue | 🔴 Critical

chmod 644 still conflicts with the tmp-permissions validator.

The root path still runs validate_tmp_permissions before launch, so leaving Line 1408 at 644 will fail startup if that validator still requires /tmp/gateway.log to stay owner-only. Either keep this file at 600, or update the validator and every dependent expectation together.

You can verify the mismatch with:
#!/usr/bin/env bash
set -euo pipefail

echo "== gateway.log permissions in startup scripts =="
rg -n -C2 '/tmp/gateway\.log|chmod 644|chmod 600' \
  scripts/nemoclaw-start.sh \
  scripts/lib/sandbox-init.sh \
  agents/hermes/start.sh

echo
echo "== validate_tmp_permissions implementation =="
rg -n -A60 -B5 'validate_tmp_permissions\s*' scripts/lib/sandbox-init.sh
Expected result: scripts/nemoclaw-start.sh shows chmod 644, while scripts/lib/sandbox-init.sh still documents or enforces 600 for /tmp/gateway.log.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` around lines 1403 - 1408, The startup script
creates /tmp/gateway.log with chmod 644 which conflicts with the existing
validate_tmp_permissions logic (validate_tmp_permissions in
scripts/lib/sandbox-init.sh) that expects owner-only perms; change the chmod in
the block where /tmp/gateway.log is touched/chowned in scripts/nemoclaw-start.sh
from 644 to 600 to match the validator (or alternatively, if you intend to relax
the validator, update validate_tmp_permissions and all dependent expectations
(including agents/hermes/start.sh and any tests) together), ensuring the file
remains owned by gateway:gateway and permission checks in
validate_tmp_permissions still pass.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-messaging-providers.sh`:
- Around line 675-676: Replace the grep+head pipeline with grep's match-limit
option: in the command that sets the container_log variable (the line using
nemoclaw "$SANDBOX_NAME" logs ... | grep -i "channel
guard\|slack.*guard\|guard.*skip\|guard.*install" | head -5 || echo "no guard
messages"), remove the pipe to head and add grep -m 5 to limit matches; preserve
the case-insensitive -i and the trailing || echo fallback so container_log still
falls back to "no guard messages" when there are no matches.


📥 Commits

Reviewing files that changed from the base of the PR and between 815e2d7 and 2a26f26.

📒 Files selected for processing (2)

scripts/nemoclaw-start.sh
test/e2e/test-messaging-providers.sh

nemoclaw logs reads /tmp/gateway.log, not container stderr. The entrypoint guard messages go to stderr (Docker logs). Try openshell sandbox logs and docker logs directly to find guard installation messages.

coderabbitai

♻️ Duplicate comments (1)

test/e2e/test-messaging-providers.sh (1)

675-680: ⚠️ Potential issue | 🟡 Minor

Avoid false “no guard messages” fallbacks under pipefail.

With set -o pipefail (Line 58), grep ... | head -10 can return non-zero (SIGPIPE on grep) after matching lines, which incorrectly triggers the || echo ... fallback. This can hide real guard diagnostics.

Suggested fix

-    container_log=$(openshell sandbox logs --name "$SANDBOX_NAME" 2>&1 | grep -i "channel guard\|slack.*guard\|guard.*skip\|guard.*install\|\[channels\].*slack\|\[channels\].*guard" | head -10 || echo "no guard messages in openshell logs")
+    container_log=$(openshell sandbox logs --name "$SANDBOX_NAME" 2>&1 | grep -im 10 "channel guard\|slack.*guard\|guard.*skip\|guard.*install\|\[channels\].*slack\|\[channels\].*guard" || echo "no guard messages in openshell logs")
@@
-      docker_log=$(docker logs "$container_id" 2>&1 | grep -i "channel guard\|slack.*guard\|\[channels\]" | head -10 || echo "no guard messages in docker logs")
+      docker_log=$(docker logs "$container_id" 2>&1 | grep -im 10 "channel guard\|slack.*guard\|\[channels\]" || echo "no guard messages in docker logs")

#!/usr/bin/env bash
set -uo pipefail

echo "Repro: grep|head can trigger fallback even when matches exist"
out=$(
  printf 'match\n%.0s' {1..50} \
    | grep -i 'match' \
    | head -10 \
    || echo 'FALLBACK_TRIGGERED'
)
printf '%s\n' "$out"

echo
echo "Control: grep -m avoids SIGPIPE fallback"
out2=$(
  printf 'match\n%.0s' {1..50} \
    | grep -im 10 'match' \
    || echo 'FALLBACK_TRIGGERED'
)
printf '%s\n' "$out2"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-messaging-providers.sh` around lines 675 - 680, The fallback is
being incorrectly triggered due to SIGPIPE when using "grep ... | head -10"
under pipefail; update the two places that set container_log and docker_log (the
assignments referencing openshell logs and docker logs) to avoid a piped head:
replace the grep|head pipeline with grep -m 10 (use the -m/--max-count option)
so grep stops after 10 matches and won't emit SIGPIPE, ensuring the "|| echo 'no
guard messages...'" fallback only runs when there truly are no matches.
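
One caveat with `grep -m`: grep still stops reading early, so a producer that is still writing can itself be killed by SIGPIPE, and `pipefail` reports that status too. The sketch below shows an alternative under stated assumptions; it is not what the PR adopted, and `first_matches` is a hypothetical helper. It reads all input with awk and decides on the fallback from the captured text rather than from the pipeline status.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical helper: print at most $1 stdin lines matching the lowercase
# extended regex $2, ignoring case. awk keeps reading until end of input, so
# no upstream process receives SIGPIPE.
first_matches() {
  awk -v max="$1" -v pat="$2" 'n < max && tolower($0) ~ pat { print; n++ }'
}

# Large synthetic log: every line matches, but only three lines are kept.
capture=$(seq 1 100000 | sed 's/$/ Slack GUARD installed/' | first_matches 3 'slack.*guard')

# Fall back only when nothing was captured.
printf '%s\n' "${capture:-no guard messages}"
```

The trade-off is that awk scans the whole log instead of stopping after the first matches, which is usually acceptable for diagnostics of this size.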


📥 Commits

Reviewing files that changed from the base of the PR and between 2a26f26 and 7cbf0a9.

📒 Files selected for processing (1)

test/e2e/test-messaging-providers.sh

List all nemoclaw-* and gateway.log files in /tmp to see exactly what the entrypoint created vs what's missing.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

test/e2e/test-messaging-providers.sh (1)

678-684: ⚠️ Potential issue | 🟡 Minor

Avoid grep | head with pipefail in these log captures.

Line 678 and Line 683 can emit fallback text even when matches exist (SIGPIPE from head makes the pipeline fail under pipefail), which pollutes diagnostics. This was already raised previously in the PR discussion.

Suggested fix

-    container_log=$(openshell sandbox logs --name "$SANDBOX_NAME" 2>&1 | grep -i "channel guard\|slack.*guard\|guard.*skip\|guard.*install\|\[channels\].*slack\|\[channels\].*guard" | head -10 || echo "no guard messages in openshell logs")
+    container_log=$(openshell sandbox logs --name "$SANDBOX_NAME" 2>&1 | grep -im 10 "channel guard\|slack.*guard\|guard.*skip\|guard.*install\|\[channels\].*slack\|\[channels\].*guard" || echo "no guard messages in openshell logs")
@@
-    container_id=$(openshell sandbox exec --name "$SANDBOX_NAME" -- cat /proc/1/cgroup 2>/dev/null | grep -oP '[a-f0-9]{64}' | head -1 || echo "")
+    container_id=$(openshell sandbox exec --name "$SANDBOX_NAME" -- cat /proc/1/cgroup 2>/dev/null | grep -oPm 1 '[a-f0-9]{64}' || echo "")
@@
-      docker_log=$(docker logs "$container_id" 2>&1 | grep -i "channel guard\|slack.*guard\|\[channels\]" | head -10 || echo "no guard messages in docker logs")
+      docker_log=$(docker logs "$container_id" 2>&1 | grep -im 10 "channel guard\|slack.*guard\|\[channels\]" || echo "no guard messages in docker logs")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-messaging-providers.sh` around lines 678 - 684, The pipelines
that capture logs (the commands assigning container_log and docker_log and the
container_id retrieval) use "grep ... | head -10" which can produce SIGPIPE
under pipefail and return the fallback text; change these to avoid piping into
head (e.g., use grep's -m 10 to limit matches or use a single-tool solution like
awk/sed to select the first 10 matches) so the pipeline won't fail with SIGPIPE.
Update the two places that create container_log and docker_log (and any similar
openshell/docker log captures) to use grep -m 10 or an equivalent single-process
selector to reliably return up to 10 matches without causing a pipe failure.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-messaging-providers.sh`:
- Around line 665-667: The current tmp_files assignment calls openshell sandbox
exec with an unquoted glob (/tmp/nemoclaw-*) so the caller shell may expand it;
change the invocation of openshell sandbox exec (the command that sets
tmp_files) to run a shell inside the sandbox (e.g., use sh -c with a
single-quoted command) so ls -la /tmp/nemoclaw-* /tmp/gateway.log is executed
and expanded inside the sandbox; update the tmp_files variable assignment and
keep the surrounding error fallback and the info "  /tmp/nemoclaw-* files:
$tmp_files" unchanged.


📥 Commits

Reviewing files that changed from the base of the PR and between 7cbf0a9 and 79455ed.

📒 Files selected for processing (1)

test/e2e/test-messaging-providers.sh

The /tmp/nemoclaw-* glob was expanding on the host shell before being passed to openshell sandbox exec, showing host files instead of sandbox files. Wrap in bash -c to expand inside the container.
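
The expansion problem can be reproduced locally without a sandbox. In this sketch, `run_elsewhere` is a hypothetical stand-in for `openshell sandbox exec`: it runs a command in a different directory, which plays the role of the container's filesystem. A relative glob is used so the demo stays inside temporary directories.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Stand-in for `openshell sandbox exec -- ...`: run a command with another
# directory as its working directory. Hypothetical, for illustration only.
run_elsewhere() {
  local dir="$1"
  shift
  (cd "$dir" && "$@")
}

host=$(mktemp -d)
sandbox=$(mktemp -d)
: >"$host/nemoclaw-host.js"
: >"$sandbox/nemoclaw-sandbox.js"
cd "$host" || exit 1

# Unquoted: the calling shell expands the glob against the host directory,
# so the inner command is asked for a file that exists only on the host.
host_expanded=$(run_elsewhere "$sandbox" ls nemoclaw-* 2>/dev/null || echo "lookup failed")

# Quoted and handed to an inner shell: expanded where the command runs.
inner_expanded=$(run_elsewhere "$sandbox" bash -c 'ls nemoclaw-*')

printf 'host-expanded: %s\ninner-expanded: %s\n' "$host_expanded" "$inner_expanded"
```

With an absolute pattern such as `/tmp/nemoclaw-*`, the host-side expansion passes host file names to the sandbox, which is why the diagnostic reported the wrong files.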

Write breadcrumb timestamps to /tmp/nemoclaw-entrypoint-trace.log at key points: after proxy-env, before root/non-root branch, before each guard install call. Read the trace in the E2E diagnostic. This will show exactly where the entrypoint stops executing.
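
A breadcrumb helper of this kind might look like the sketch below. The `trace` function name and the timestamp format are assumptions; only the log path and the checkpoint names come from the commit message. The helper never returns non-zero, so tracing cannot itself abort an entrypoint running under `set -e`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Path taken from the commit message; overridable here for local experiments.
TRACE_FILE="${TRACE_FILE:-/tmp/nemoclaw-entrypoint-trace.log}"

# Hypothetical helper: append "<UTC timestamp> <message>" on a best-effort
# basis. Write failures are ignored.
trace() {
  { printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >>"$TRACE_FILE"; } 2>/dev/null || true
}

trace "after proxy-env"
trace "before root/non-root branch"
trace "before install_configure_guard"
```

The last line written to the trace file then identifies the final checkpoint reached before the entrypoint stopped.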

Root cause: install_configure_guard() tries to write to /sandbox/.bashrc which is Landlock read-only at runtime (#804). With set -e active, the write failure kills the entrypoint before install_slack_channel_guard and the gateway startup ever run. The proxy fix and nemotron fix work because they're installed at top level (before the root/non-root branch). The Slack guard and gateway startup are inside the branch and never execute. Fix: check file writability before attempting the .bashrc update. If the file is read-only (Landlock), skip it gracefully. Also add 2>/dev/null || true to the cat redirect as defense-in-depth.

lock_rc_files() calls chmod 444 on .bashrc/.profile which fails under Landlock. With set -e this kills the entrypoint — same root cause as the install_configure_guard fix. Add || true so it degrades gracefully.

gateway.log was changed from 600 to 644 for diagnostic readability. Update the validate_tmp_permissions check to expect 644 for gateway.log so it doesn't fail and kill the entrypoint under set -e.

The trace still dies before install_configure_guard. Add per-line traces to identify which of verify_config_integrity, apply_model_override, apply_cors_override, apply_slack_token_override, or token generation is the actual failure point.

The [ -w file ] test checks DAC permissions but cannot detect Landlock enforcement. The sandbox user owns .bashrc (DAC says writable) but Landlock blocks the write at kernel level. Under set -e, the failed write kills the entrypoint before the gateway ever starts. Remove the -w guard entirely and wrap every write operation in || true / continue so Landlock failures are silently skipped.

- Change non-root gateway.log to 644 (matching root path)
- Add post-launch diagnostic: check if gateway PID is alive after 3s, dump gateway.log contents to trace file if non-empty

The @homebridge/ciao mDNS library calls os.networkInterfaces(), which throws SystemError (uv_interface_addresses) inside sandboxes with restricted network namespaces. This crashes the gateway even though mDNS is not needed for NemoClaw operation. Add a NODE_OPTIONS preload that:

1. Monkey-patches os.networkInterfaces to return {} on failure
2. Catches the uncaughtException as a fallback for any call sites that bypass the monkey-patch

Installed unconditionally at top level (same pattern as proxy fix and nemotron fix) since any sandbox can hit this.

The OpenClaw gateway health monitor kills the entire gateway process when a messaging channel fails to connect within 120s (the channel-connect-grace). With fake/placeholder Slack tokens, the Slack channel auth always fails, and the health monitor kills the gateway after the grace period — even though the Slack guard successfully caught the initial auth error. Set gateway.channelHealthCheckMinutes to 0 in the baked openclaw.json config, which disables the health monitor entirely. In a NemoClaw sandbox, channel health is not critical — inference, chat, and TUI should continue even if a messaging channel is misconfigured.

Replace the global channelHealthCheckMinutes=0 with per-account healthMonitor.enabled=false on each messaging channel. This prevents the health monitor from killing the gateway when a channel has placeholder tokens, while keeping the global health monitor active for inference and other subsystems. OpenClaw supports per-account overrides via accounts.default.healthMonitor.enabled in the channel config.

Any uncaught exception or unhandled rejection from any npm dependency crashes the gateway, killing inference, chat, and TUI. We've been adding per-library guards (proxy fix, Slack guard, ciao guard) but this is whack-a-mole — the next library that does something unexpected in a restricted sandbox will crash the gateway again. Add a global safety net preload (sandbox-safety-net.js) that catches ALL uncaught exceptions and unhandled rejections, logs them, and continues. Only active when OPENSHELL_SANDBOX=1 (set by OpenShell at runtime) — outside a sandbox, normal Node.js crash behavior is preserved. Loaded as the FIRST --require preload so its handlers register before any library code runs. Per-library guards (Slack, ciao) still provide targeted handling with better log messages; the safety net is the last resort for everything else.

OpenClaw installs its own unhandledRejection handler that calls process.exit(1) for non-transient errors. Our safety net catches the rejection first and swallows it, but Node.js delivers the event to ALL listeners — OpenClaw's handler also fires and exits. Monkey-patch process.exit to block exits during the rejection delivery window. A flag (_swallowing) is set during our handler and cleared on the next microtask, so OpenClaw's handler (same tick) hits the intercepted process.exit and the gateway survives.

This test has been failing since March 31 and wastes API tokens on every nightly run without providing actionable signal.

Remove all entrypoint execution traces, /tmp dumps, gateway crash diagnostics, and verbose guard-skip logging added during debugging. Simplify E2E diagnostics to just guard file and NODE_OPTIONS checks.

shfmt reformatted case statement indentation. The nemotron test regex for validate_tmp_permissions was too strict — matched only when _NEMOTRON_FIX_SCRIPT was the second argument, but it's now further in the argument list due to _SANDBOX_SAFETY_NET and _CIAO_GUARD_SCRIPT.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

scripts/nemoclaw-start.sh (2)

1634-1634: Same suggestion: add _SLACK_GUARD_SCRIPT to root-path validation.

For consistency with the non-root path recommendation.

♻️ Suggested fix

-validate_tmp_permissions "$_SANDBOX_SAFETY_NET" "$_PROXY_FIX_SCRIPT" "$_NEMOTRON_FIX_SCRIPT" "$_CIAO_GUARD_SCRIPT"
+validate_tmp_permissions "$_SANDBOX_SAFETY_NET" "$_PROXY_FIX_SCRIPT" "$_NEMOTRON_FIX_SCRIPT" "$_CIAO_GUARD_SCRIPT" "$_SLACK_GUARD_SCRIPT"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` at line 1634, The call to validate_tmp_permissions
is missing the _SLACK_GUARD_SCRIPT argument, so update the invocation of
validate_tmp_permissions to include _SLACK_GUARD_SCRIPT alongside the existing
arguments (_SANDBOX_SAFETY_NET, _PROXY_FIX_SCRIPT, _NEMOTRON_FIX_SCRIPT,
_CIAO_GUARD_SCRIPT) so the root-path validation covers the Slack guard script as
well.

1493-1493: Consider adding _SLACK_GUARD_SCRIPT to validation.

The Slack guard (/tmp/nemoclaw-slack-channel-guard.js) is a trust-boundary file loaded via NODE_OPTIONS, but it's not passed to validate_tmp_permissions. Since the validator skips non-existent files, adding it is safe even when Slack isn't configured.

♻️ Suggested fix

-  validate_tmp_permissions "$_SANDBOX_SAFETY_NET" "$_PROXY_FIX_SCRIPT" "$_NEMOTRON_FIX_SCRIPT" "$_CIAO_GUARD_SCRIPT"
+  validate_tmp_permissions "$_SANDBOX_SAFETY_NET" "$_PROXY_FIX_SCRIPT" "$_NEMOTRON_FIX_SCRIPT" "$_CIAO_GUARD_SCRIPT" "$_SLACK_GUARD_SCRIPT"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` at line 1493, Call validate_tmp_permissions with
the Slack guard variable as well: add _SLACK_GUARD_SCRIPT to the existing
invocation that currently passes _SANDBOX_SAFETY_NET, _PROXY_FIX_SCRIPT,
_NEMOTRON_FIX_SCRIPT, and _CIAO_GUARD_SCRIPT; this ensures the trust-bound Slack
guard file (referenced via NODE_OPTIONS, _SLACK_GUARD_SCRIPT) is validated too —
it's safe to include because validate_tmp_permissions already skips non-existent
files.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/lib/sandbox-init.sh`:
- Around line 127-132: The permission validator complains because
validate_tmp_permissions expects /tmp/gateway.log mode 644 but several startup
scripts set it to 600; update the startup chmod calls (references:
agent-runtime.ts around the chmod call, src/nemoclaw.ts at the chmod line, and
agents/hermes/start.sh occurrences) to set 644 instead of 600 so the file modes
match the validator, or alternatively modify the validate_tmp_permissions logic
to accept both "600" and "644" for gateway.log; pick one approach and change all
referenced locations consistently (agent-runtime.ts, src/nemoclaw.ts,
agents/hermes/start.sh, or validate_tmp_permissions) so the validator no longer
flags a mismatch.
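
The "accept both modes" option can be sketched as follows. This is a hypothetical check, not the real `validate_tmp_permissions`: the function name, the default of `600` for other files, and the message text are assumptions, and it relies on GNU `stat -c '%a'`. Missing files are skipped, matching the behavior the review attributes to the validator.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical check, assuming GNU stat (`stat -c '%a'` prints the octal mode).
# Returns 0 when the file is missing or its mode is in the allowed list.
check_tmp_mode() {
  local file="$1" allowed actual mode
  case "$file" in
    */gateway.log) allowed="600 644" ;; # either mode is accepted
    *) allowed="600" ;;                 # assumed default for this sketch
  esac
  if [ ! -e "$file" ]; then
    return 0
  fi
  actual=$(stat -c '%a' "$file") || return 1
  for mode in $allowed; do
    if [ "$actual" = "$mode" ]; then
      return 0
    fi
  done
  echo "[SECURITY] ${file} has mode ${actual}, expected one of: ${allowed}" >&2
  return 1
}

dir=$(mktemp -d)
: >"$dir/gateway.log"
chmod 644 "$dir/gateway.log"
check_tmp_mode "$dir/gateway.log" && echo "gateway.log accepted"
```

Accepting both modes lets startup scripts that still use `600` and those already on `644` pass the same check while they are brought into line.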


📥 Commits

Reviewing files that changed from the base of the PR and between 79455ed and f659eb3.

📒 Files selected for processing (5)

.github/workflows/nightly-e2e.yaml
Dockerfile
scripts/lib/sandbox-init.sh
scripts/nemoclaw-start.sh
test/e2e/test-messaging-providers.sh

🚧 Files skipped from review as they are similar to previous changes (1)

test/e2e/test-messaging-providers.sh

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

scripts/lib/sandbox-init.sh (1)

214-214: Keep best-effort behavior, but log lock failures.

Line 214 swallows all chmod errors silently. That keeps startup resilient, but it also hides hardening regressions. Emit a warning when the lock fails.

🛠️ Suggested change

-      chmod 444 "$rc_file" 2>/dev/null || true
+      if ! chmod 444 "$rc_file" 2>/dev/null; then
+        echo "[SECURITY] Failed to lock ${rc_file} to 444; continuing (best-effort)." >&2
+      fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/sandbox-init.sh` at line 214, The chmod call currently swallows
all errors for "$rc_file" (chmod 444 "$rc_file" 2>/dev/null || true); change it
to keep the best-effort behavior but log a warning when the lock fails: run
chmod 444 on "$rc_file", capture its exit status, and if it fails emit a warning
to stderr (including the rc_file name and the failure status or errno) while
still allowing startup to continue; update the block referencing "$rc_file" so
failures are visible but non-fatal.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/lib/sandbox-init.sh`:
- Around line 121-132: The trust-boundary mapping in the script is out of sync:
the runtime check in the for-loop sets /tmp/gateway.log expected_perms="644"
(via the case handling for */gateway.log) but the earlier documentation/table
still lists gateway.log as 600; update that trust-boundary map or comment to
state /tmp/gateway.log is mode 644 so the docs match the implemented check
(search for the trust-boundary table/header and the entries referencing
gateway.log and change its mode to 644).

---

Nitpick comments:
In `@scripts/lib/sandbox-init.sh`:
- Line 214: The chmod call currently swallows all errors for "$rc_file" (chmod
444 "$rc_file" 2>/dev/null || true); change it to keep the best-effort behavior
but log a warning when the lock fails: run chmod 444 on "$rc_file", capture its
exit status, and if it fails emit a warning to stderr (including the rc_file
name and the failure status or errno) while still allowing startup to continue;
update the block referencing "$rc_file" so failures are visible but non-fatal.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 17e6f851-bc96-4b7e-b7df-1570d88a71f5

📥 Commits

Reviewing files that changed from the base of the PR and between f659eb3 and ae4d5d9.

📒 Files selected for processing (3)

scripts/lib/sandbox-init.sh
scripts/nemoclaw-start.sh
test/nemotron-inference-fix.test.ts

✅ Files skipped from review due to trivial changes (1)

test/nemotron-inference-fix.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

scripts/nemoclaw-start.sh

…Slack guard to validation Same issue as the nemotron test — regex was too strict for the new argument ordering. Also add _SLACK_GUARD_SCRIPT to validate_tmp_permissions calls per CodeRabbit review.

## Summary

Refreshes user-facing docs for the last 24 hours of merged NemoClaw history and bumps the docs metadata to 0.0.29, the next version after v0.0.28. The updates are limited to behavior supported by merged PR descriptions and diffs.

## Changes

- `docs/reference/commands.md`: documented `nemoclaw <name> policy-add --from-file` and `--from-dir`, including custom preset review guidance, from #2077 / commit `7720b175`.
- `docs/deployment/deploy-to-remote-gpu.md`: clarified that non-loopback `CHAT_UI_URL` disables OpenClaw device pairing for remote browser-only deployments, from #2449 / commit `f5ee8a4d`.
- `docs/inference/inference-options.md`: documented provider-aware credential retry validation and the NVIDIA-only `nvapi-` prefix check, from #2389 / commit `6f7f0c6d`.
- `docs/inference/switch-inference-providers.md`: documented `NEMOCLAW_INFERENCE_INPUTS` for text/image-capable model metadata baked into `openclaw.json`, from #2441 / commit `f4391892`.
- `docs/reference/troubleshooting.md`: added the Git certificate verification entry for proxy CA propagation through `GIT_SSL_CAINFO`, `GIT_SSL_CAPATH`, `CURL_CA_BUNDLE`, and `REQUESTS_CA_BUNDLE`, from #2345 / commit `fa0dc1ab`.
- `docs/versions1.json` and `docs/project.json`: promoted docs version `0.0.29`; `docs/versions1.json` omits unpublished `0.0.26`, `0.0.27`, and `0.0.28` entries.
- `.agents/skills/nemoclaw-user-*`: regenerated derived user skill references from the updated docs.
- Reviewed with no extra doc changes: #2575 / `d392ec07`, #2565 / `a3231049`, #1965 / `db1ef3ca`, #1990 / `db665834`, #2495 / `7da86fa3`, #2496 / `3192f4f4`, #2490 / `8c209058`, #2487 / `1f615e2f`, #2483 / `5653d33a`, #2482 / `31c782c0`, #2464 / `23bb5703`, #2472 / `a54f9a34`, and #2437 / `6bc860d7`.
- Skipped per docs policy: #2420 / `7b76df6b` touched the experimental sandbox config path listed in `docs/.docs-skip`; #2466 / `cc15689c` touched a skipped term and CI-only sandbox image files.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes
- [ ] `npm test` passes: failed locally in installer-integration tests and one onboard helper timeout; the doc-scoped hook test projects passed under `prek`.
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only): build succeeded, but local Sphinx emitted the existing version-switcher file read message.
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

## AI Disclosure

- [x] AI-assisted, tool: Codex

---

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

## Summary by CodeRabbit

* **New Features**
  * Support for custom YAML presets in policy configuration via --from-file and --from-dir.
  * New build-time inference input option to declare accepted modalities (text or text,image).
* **Improvements**
  * Credential validation now offers interactive recovery: re-enter key, retry, choose another provider, or exit.
  * Clarified provider-specific API key prefix handling (nvapi- only applies to NVIDIA keys).
* **Documentation**
  * TLS certificate troubleshooting for inspected networks.
  * Clarified remote dashboard security/device-pairing behavior; command docs updated; docs version bumped.

---------

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

PR NVIDIA#2472 accidentally deleted the entire cloud-experimental-e2e job from nightly-e2e.yaml. Restores Landlock enforcement, API key leak detection, openclaw tui smoke, live chat, and skill injection tests. Fixes NVIDIA#2570 Signed-off-by: Truong Nguyen <tgnguyen@nvidia.com> Made-with: Cursor

PR NVIDIA#2472 accidentally deleted the entire cloud-experimental-e2e job from nightly-e2e.yaml. Restores Landlock enforcement, API key leak detection, openclaw tui smoke, live chat, and skill injection tests. Verified on fork: PASS in 14m 5s. Fixes NVIDIA#2570 Signed-off-by: Truong Nguyen <tgnguyen@nvidia.com> Made-with: Cursor

…#2615)

## Summary

Add automated E2E test recommendations to PR reviews and selective job dispatch to the nightly E2E workflow. Closes #2564 (Phases 1–3).

## What changed

### 1. CodeRabbit `path_instructions` for E2E recommendations (`.coderabbit.yaml`)

15 new `path_instructions` entries map sensitive file paths to the nightly E2E jobs that exercise them. When a PR touches a mapped path, CodeRabbit posts a review comment recommending specific jobs and a copy-pasteable `gh workflow run` command.

| Path Pattern | Recommended Jobs |
|-------------|-----------------|
| `scripts/nemoclaw-start.sh`, `scripts/lib/sandbox-init.sh` | `sandbox-survival-e2e`, `sandbox-operations-e2e`, `cloud-e2e` |
| `Dockerfile`, `Dockerfile.base` | `cloud-e2e`, `sandbox-survival-e2e`, `hermes-e2e`, `rebuild-openclaw-e2e` |
| `nemoclaw-blueprint/scripts/http-proxy-fix.js` | `cloud-e2e`, `inference-routing-e2e` |
| `src/lib/onboard.ts` | `cloud-e2e`, `sandbox-operations-e2e`, `rebuild-openclaw-e2e` |
| `src/nemoclaw.ts` | `sandbox-survival-e2e`, `sandbox-operations-e2e`, `skip-permissions-e2e` |
| `src/lib/cluster-image-patch.ts`, `src/lib/preflight.ts` | `overlayfs-autofix-e2e` |
| `src/lib/deploy.ts` | `deployment-services-e2e` |
| `src/lib/sandbox-state.ts` | `snapshot-commands-e2e`, `rebuild-openclaw-e2e` |
| `src/lib/shields*.ts` | `shields-config-e2e` |
| `agents/hermes/**` | `hermes-e2e`, `rebuild-hermes-e2e` |
| `nemoclaw-blueprint/policies/**` | `network-policy-e2e`, `skip-permissions-e2e` |
| `.github/workflows/nightly-e2e.yaml` | Reminds to add CodeRabbit coverage for new jobs |

### 2. Selective job dispatch (`nightly-e2e.yaml`)

Added a `jobs` input to `workflow_dispatch` so maintainers can run a subset of nightly jobs on any branch:

```
gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=sandbox-survival-e2e,sandbox-operations-e2e
```

- All 18 E2E jobs get a conditional guard: unselected jobs are skipped
- Empty `jobs` input (or scheduled runs) still runs everything
- `notify-on-failure` is unaffected: skipped jobs produce `result: 'skipped'`, not `'failure'`

### 3. Cross-validation test (`test/validate-e2e-coverage.test.ts`)

Keeps the mapping up to date as files and jobs evolve:

| Assertion | What it catches |
|-----------|----------------|
| Job names in CodeRabbit match `nightly-e2e.yaml` | Renamed/removed jobs |
| Path globs match at least one file on disk | Renamed/deleted source files |
| Every nightly job has selective dispatch guard | New jobs added without the `if:` pattern |
| Advisory: nightly jobs with no CodeRabbit coverage | New jobs added without `path_instructions` |

## Validation

- [x] All 4 cross-validation tests pass locally
- [x] Existing `validate-config-schemas` tests still pass
- [x] Selective dispatch validated: [run 25052625486](https://github.com/NVIDIA/NemoClaw/actions/runs/25052625486), triggered with `-f jobs=diagnostics-e2e`, 17/18 jobs correctly skipped
- [x] `notify-on-failure` does not false-alarm on selective run: [run 25052625486](https://github.com/NVIDIA/NemoClaw/actions/runs/25052625486) confirmed that `notify-on-failure` was skipped (not triggered)
- [ ] CodeRabbit posts recommendations on a PR touching a mapped file (post-merge validation)

## Context

- Issue: #2564
- Weekend incident: #2471, #2472, #2482, #2490
- E2E strategy: `cloud-experimental-e2e` removal in #2472 left a coverage gap that would have been flagged by these recommendations

## Summary by CodeRabbit

* **Chores**
  * Expanded review automation to map sensitive paths to targeted nightly E2E jobs and inject instructions for running relevant subsets.
  * Added manual workflow dispatch allowing selective E2E job execution via a jobs input.
* **New Features**
  * Added a reporting step that, on manual runs, posts a PR comment summarizing passed/failed/skipped E2E jobs.
* **Tests**
  * Added a validation suite that cross-checks review-to-workflow mappings and dispatch guards, warning on uncovered jobs.

### 4. Substring match fix (`nightly-e2e.yaml`)

CodeRabbit review correctly identified that `contains(inputs.jobs, 'cloud-e2e')` performs substring matching; for example, passing `jobs=e2e` would match every job. All 18 job guards now use delimiter-wrapping:

```yaml
contains(format(',{0},', inputs.jobs), ',<job-name>,')
```

This ensures exact token matching within the comma-separated input. The cross-validation test was updated to enforce the new pattern.
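
The delimiter-wrapping idea is not specific to GitHub Actions expressions. As a rough shell analogue (a sketch only; `job_selected` is a hypothetical name and not part of the workflow), wrapping both the list and the candidate in commas turns a substring test into an exact-token test:

```shell
#!/usr/bin/env bash
# Hypothetical illustration of delimiter-wrapped matching.
# $1: comma-separated job list (may be empty), $2: job name to look for.
job_selected() {
  local jobs="$1" name="$2"
  # Mirror the documented behaviour: an empty list selects every job.
  if [ -z "$jobs" ]; then
    return 0
  fi
  # ",a,b," contains ",b," only when b is a whole token in the list.
  case ",${jobs}," in
    *",${name},"*) return 0 ;;
    *) return 1 ;;
  esac
}

# job_selected "e2e" "cloud-e2e"                  -> not selected
# job_selected "cloud-e2e,hermes-e2e" "cloud-e2e" -> selected
```

Note that this sketch, like the workflow expression, does not trim whitespace around commas, so `jobs="a, b"` would not match `b`.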


## Summary

Restore the `cloud-experimental-e2e` job that was accidentally deleted from `nightly-e2e.yaml` in PR #2472.

## Related Issue

Fixes #2570

## Changes

Restores the `cloud-experimental-e2e` job that tests:

- Landlock read-only enforcement (8 assertions on .bashrc, .profile, .openclaw, .openclaw-data, /tmp)
- API key leak detection in process list
- `openclaw tui` smoke test inside sandbox
- Live chat via `openclaw agent`
- Skill injection + agent verification
- `inference.local` HTTPS probe

The job runs unconditionally (no feature-flag gate). Added to `notify-on-failure` needs list. Removed the old `skip/05-network-policy.sh` step (now covered by the dedicated `network-policy-e2e` job).

## Type of Change

- Code change (feature, bug fix, or refactor)

## Verification

- YAML validated on fork: all jobs parse correctly
- Verified on fork CI: cloud-experimental-e2e PASS in 14m 5s

## AI Disclosure

- AI-assisted, tool: Cursor

---

Signed-off-by: Truong Nguyen <tgnguyen@nvidia.com>

Made with [Cursor](https://cursor.com)

## Summary by CodeRabbit

* **Tests**
  * Added nightly cloud experimental end-to-end tests to broaden coverage.
  * Made the experimental job selectable from the manual job list for targeted runs.
  * Always-check documentation during these runs for improved QA.
  * Ensure experimental sandbox is torn down and verified after tests.
  * Upload an install-log artifact when the experimental job fails to aid troubleshooting.
  * Include the experimental job in failure notifications and PR reporting so results are tracked.

Signed-off-by: Truong Nguyen <tgnguyen@nvidia.com>

## Summary

`scripts/brev-launchable-ci-cpu.sh` is the community install path for Brev users: it bootstraps a VM with Docker, Node.js, OpenShell, and NemoClaw. **That script already exists in the repo but has zero CI coverage.** This PR adds a nightly E2E smoke test that validates the script works end-to-end.

This is the long-living safety net for the community install flow. If any regression breaks the launchable script (e.g., the Apr 20–25 Brev outage from #2472/#2482, or the container reachability fallback from #2425), this test catches it before community users are affected.

## Related Issue

Closes #2599

Related: #2425 (the `isProxyHealthy()` fallback in PR #2453; if that regresses, onboard will abort on Brev and this smoke test catches it)

## Changes

### New: `test/e2e/test-launchable-smoke.sh`

| Phase | What it validates |
|-------|-------------------|
| 0 | Pre-cleanup + pre-seed clone directory from checkout |
| 1 | Prerequisites (Docker, NVIDIA_API_KEY, network, env vars) |
| 2 | Run `brev-launchable-ci-cpu.sh`, the existing community bootstrap script |
| 3 | Verify artifacts (nemoclaw, openshell, Node.js, Docker, sentinel file, built outputs) |
| 4 | `nemoclaw onboard --non-interactive` with cloud provider |
| 5 | Sandbox health (list, status, inference config, gateway) |
| 6 | Live inference (direct API, routing via inference.local, openclaw agent 6×7=42) |
| 7 | Destroy + cleanup |

Key design decisions:

- **No BREV_API_TOKEN needed**: the launchable script is a generic Ubuntu bootstrap with zero Brev dependencies, so it runs on standard GitHub-hosted `ubuntu-latest` runners
- **Tests current code, not main**: pre-seeds the clone directory from the CI checkout so regressions are caught before reaching community users
- **Follows existing E2E conventions**: pass/fail/section helpers, e2e-timeout.sh self-wrap, sandbox-teardown.sh EXIT trap, parse_chat_content() for reasoning models

### Modified: `.github/workflows/nightly-e2e.yaml`

- Added `launchable-smoke-e2e` job: `ubuntu-latest`, 30min timeout, `NVIDIA_API_KEY` secret
- Uploads install/onboard/test logs as artifacts on failure
- Added to `notify-on-failure` needs list

## Validation

Triggered via fork dispatch (`jyaunches/NemoClaw` → `sparky-dispatch` → `launchable-smoke`):

- **Run:** https://github.com/jyaunches/NemoClaw/actions/runs/25075715342
- **Result:** ✅ 24 passed, 0 failed, 1 skipped (Node.js version: GH runner pre-installs Node 20)
- **Runtime:** ~12 minutes

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Checklist

- [x] Follows project coding conventions
- [x] Tests pass locally or in CI
- [x] No secrets/credentials committed

## Summary by CodeRabbit

* **New Features**
  * Added an end-to-end smoke test and CI job that validates the community launchable CPU install path (install, onboarding, runtime readiness, and a simple inference check). CI now uploads install/onboard/test logs on failures.
* **Chores**
  * Renamed the branch-validation workflow and corresponding test-suite identifiers for clarity.
  * Updated E2E test documentation and project configuration names to match the new labeling.

…VIDIA#2472)

## Summary

Fixes a 5-day outage where the gateway never started in non-root sandbox mode (Brev Launchable, no-new-privileges containers). Also adds a global safety net preventing any npm library crash from killing the gateway.

### Changes

**Entrypoint Landlock tolerance** (`scripts/nemoclaw-start.sh`, `scripts/lib/sandbox-init.sh`)

- `install_configure_guard`: wrap all `.bashrc`/`.profile` writes in `|| true`. The `[ -w file ]` test passes (DAC) but Landlock blocks the actual write, crashing the entrypoint under `set -e`
- `lock_rc_files`: `|| true` on chmod calls
- `validate_tmp_permissions`: expect 644 for gateway.log
- Root cause: commit `20407589` (Apr 20) added `install_configure_guard` which writes to Landlock-protected files. Every non-root sandbox since then had a dead gateway.

**Global sandbox safety net** (`scripts/nemoclaw-start.sh`)

- New `sandbox-safety-net.js` preload: catches ALL uncaught exceptions and unhandled rejections in sandbox mode (OPENSHELL_SANDBOX=1), logs them, and continues
- Intercepts `process.exit()` during swallowed rejection delivery so OpenClaw's own handler (which calls `process.exit(1)` for non-transient errors) doesn't kill the gateway
- First `--require` preload so handlers register before any library code

**ciao network guard** (`scripts/nemoclaw-start.sh`)

- Targeted guard for `@homebridge/ciao` mDNS library crash (`os.networkInterfaces()` → `uv_interface_addresses` SystemError in restricted namespaces)
- Monkey-patches `os.networkInterfaces` to return `{}` on failure

**Slack guard improvements** (`scripts/nemoclaw-start.sh`)

- Include Slack guard in `proxy-env.sh` for connect sessions
- Gateway.log changed from 600 to 644 for diagnostic readability

**Per-channel health monitor disable** (`Dockerfile`)

- Set `healthMonitor.enabled: false` on each messaging channel account
- Prevents OpenClaw's health monitor from killing the gateway after 120s channel-connect-grace when a channel has placeholder tokens

**Remove cloud-experimental-e2e** (`.github/workflows/nightly-e2e.yaml`)

- Has been failing since March 31, wastes API tokens

## Test plan

- [x] shellcheck clean
- [x] 91/91 nemoclaw-start.test.ts pass
- [x] Nightly E2E: messaging-providers Phase 7 S1+S2 pass (gateway survives Slack auth failure)
- [x] Nightly E2E: sandbox-survival, skip-permissions, cloud-e2e pass

## Summary by CodeRabbit

* **Bug Fixes**
  * Improved sandbox crash resilience with new runtime guards, deterministic preload ordering, and selective guard activation to reduce unexpected failures.
  * Shell rc-file snippet installation and locking are now best-effort so setup proceeds when modifying user rc files fails.
* **Chores**
  * Relaxed /tmp log permissions and extended tmp-permission validation to cover new guard artifacts.
  * Disabled health monitoring for messaging channel accounts.
* **Tests**
  * Reduced noisy diagnostics in an e2e test, broadened matching in unit tests to avoid false negatives, and removed an experimental nightly CI job.
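
The entrypoint failure mode described above (a write that passes the `[ -w file ]` check yet still fails, aborting a `set -e` script) can be sketched in isolation. The helper below is hypothetical and is not the code from `install_configure_guard`; it only shows the general best-effort pattern, with a warning added so that failures stay visible.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: append a line to a file without letting a failed
# write abort a `set -e` script. [ -w ] reflects only DAC permission bits,
# so a Linux security module such as Landlock can still reject the write.
append_best_effort() {
  local file="$1" line="$2"
  # Testing the command inside `if !` exempts it from errexit, and the
  # group redirect hides the shell's own redirection error message.
  if ! { printf '%s\n' "$line" >>"$file"; } 2>/dev/null; then
    echo "[warn] could not append to ${file}; continuing (best-effort)" >&2
  fi
}

# Example (path is illustrative):
# append_best_effort "$HOME/.bashrc" "# guard snippet"
```

Compared with a bare `|| true`, this keeps startup resilient but still leaves a trace in the logs when the write is blocked.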





## Summary

Adds the `test-non-root-sandbox-smoke` test from #2571: a PR-gate job that runs the production image under `--security-opt no-new-privileges` to catch #2472 and #2482 regressions, without OpenShell, NVIDIA_API_KEY, or live inference.

## Related Issue

Part of #2571

## Changes

- New `test/e2e-non-root-smoke.sh` (host-side bash, no `openshell`/`nemoclaw` CLI required):
  - **Test 1**: entrypoint setup chain completes cleanly under `--security-opt no-new-privileges` (regression guard for #2472; passes a `true` command via the entrypoint's `NEMOCLAW_CMD` exec path so the gateway-launch branch is bypassed and we don't need the OpenShell-managed runtime).
  - **Test 2**: kernel confirms `NoNewPrivs=1` inside the container (defends the test itself against silent typos in the docker flag).
- New job `test-non-root-sandbox-smoke` in `.github/workflows/pr-self-hosted.yaml`: `linux-amd64-cpu4`, `timeout-minutes: 5`, `needs: build-sandbox-images`, reuses the existing `isolation-image` artifact.
- Expected results:

```
my-machine@ab1-cdf40-30:~/NemoClaw$ # Run script
bash test/e2e-non-root-smoke.sh
TEST: 1. Entrypoint setup chain completes under --security-opt no-new-privileges
PASS: entrypoint exited 0 under no-new-privileges (#2472 setup chain healthy)
TEST: 2. Kernel confirms NoNewPrivs=1 inside container (defends against silent flag typos)
PASS: kernel confirms NoNewPrivs=1
========================================
Results: 2 passed, 0 failed
========================================
```

- Upcoming plans:
  - **Test 3**: `openclaw tui` does not error with "Missing gateway auth token" inside a login shell under the same constraint (regression guard for #2482), to be added after PR #2485 is merged

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [ ] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---

Signed-off-by: Hung Le <hple@nvidia.com>
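
The kernel check described in Test 2 can be sketched as reading the `NoNewPrivs` field that Linux exposes in `/proc/<pid>/status`. This is a minimal illustration, not the contents of `test/e2e-non-root-smoke.sh`; the helper name `no_new_privs_value` and the sample status text are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch only: no_new_privs_value is a hypothetical helper. It reads text in
# the format of /proc/<pid>/status on stdin and prints the NoNewPrivs value.
no_new_privs_value() {
  awk '$1 == "NoNewPrivs:" { print $2 }'
}

# Sample input standing in for /proc/self/status inside the container.
sample_status=$'Name:\tbash\nNoNewPrivs:\t1\nSeccomp:\t0'
value=$(printf '%s\n' "$sample_status" | no_new_privs_value)
echo "NoNewPrivs=$value"
```

Inside a container started with `docker run --security-opt no-new-privileges`, the same helper would be fed `/proc/self/status` instead of the sample text. Checking the kernel's own view guards against a misspelled flag that Docker might otherwise accept silently.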

coderabbitai Bot reviewed Apr 25, 2026

Comment thread scripts/nemoclaw-start.sh

coderabbitai Bot reviewed Apr 25, 2026

Comment thread test/e2e/test-messaging-providers.sh Outdated

fix(channels): read Docker container logs for guard diagnostics

7cbf0a9

nemoclaw logs reads /tmp/gateway.log, not container stderr. The entrypoint guard messages go to stderr (Docker logs). Try openshell sandbox logs and docker logs directly to find guard installation messages.

coderabbitai Bot reviewed Apr 25, 2026

fix(channels): dump /tmp contents in guard diagnostic

79455ed

List all nemoclaw-* and gateway.log files in /tmp to see exactly what the entrypoint created vs what's missing.

coderabbitai Bot reviewed Apr 25, 2026

Comment thread test/e2e/test-messaging-providers.sh Outdated

ericksoa added 15 commits April 25, 2026 05:30

fix(channels): quote glob in /tmp diagnostic to expand inside sandbox

c97fda7

The /tmp/nemoclaw-* glob was expanding on the host shell before being passed to openshell sandbox exec, showing host files instead of sandbox files. Wrap in bash -c to expand inside the container.
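
The host-versus-sandbox expansion problem can be reproduced without any sandbox tooling. In this sketch, `run_in_sandbox` is a hypothetical stand-in for a remote exec command: it only runs its arguments from a different directory, which is enough to show which shell expands the glob.

```shell
#!/usr/bin/env bash
# Sketch only: two temp directories stand in for the host and the sandbox.
host_dir=$(mktemp -d)
sandbox_dir=$(mktemp -d)
touch "$host_dir/nemoclaw-host-only"
touch "$sandbox_dir/nemoclaw-sandbox-only"

# Hypothetical stand-in for an exec-in-sandbox command.
run_in_sandbox() {
  (cd "$sandbox_dir" && "$@")
}

cd "$host_dir"

# Unquoted: the calling shell expands the glob against host files first.
unquoted=$(run_in_sandbox echo nemoclaw-*)

# Quoted inside bash -c: the shell started in the sandbox expands the glob.
quoted=$(run_in_sandbox bash -c 'echo nemoclaw-*')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

cd / && rm -rf "$host_dir" "$sandbox_dir"
```

The unquoted call reports the host file and the `bash -c` call reports the sandbox file, which matches the symptom described in the commit message.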

fix(channels): make lock_rc_files tolerant of Landlock read-only home

a9bea1a

lock_rc_files() calls chmod 444 on .bashrc/.profile which fails under Landlock. With set -e this kills the entrypoint — same root cause as the install_configure_guard fix. Add || true so it degrades gracefully.
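
The effect of `|| true` under `set -e` can be shown in isolation. This is a sketch, not the project's `lock_rc_files`: the path `/nonexistent-home/.bashrc` is a hypothetical stand-in chosen so that `chmod` fails here, much as it would when Landlock denies the change.

```shell
#!/usr/bin/env bash
# Sketch only: a path that does not exist forces chmod to fail.
rc_file=/nonexistent-home/.bashrc

# Without the guard, set -e stops the script at the failing chmod,
# so "reached" is never printed.
strict_out=$(bash -c 'set -e; chmod 444 "$1" 2>/dev/null; echo reached' _ "$rc_file" || true)

# With || true, the failure is tolerated and the script continues.
tolerant_out=$(bash -c 'set -e; chmod 444 "$1" 2>/dev/null || true; echo reached' _ "$rc_file")

echo "strict:   '${strict_out}'"
echo "tolerant: '${tolerant_out}'"
```

The trade-off is that a tolerated failure is silent, so a guard like this fits best where the operation is a hardening step rather than a prerequisite for startup.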

fix(channels): update validate_tmp_permissions for 644 gateway.log

5b1b711

gateway.log was changed from 600 to 644 for diagnostic readability. Update the validate_tmp_permissions check to expect 644 for gateway.log so it doesn't fail and kill the entrypoint under set -e.
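
A permission check of this kind might look like the sketch below. It is not the project's `validate_tmp_permissions`; `check_mode` is a hypothetical helper, and the example assumes GNU `stat` as found on Linux.

```shell
#!/usr/bin/env bash
# Sketch only: check_mode is a hypothetical helper. Assumes GNU stat (Linux).
check_mode() {
  local path=$1 expected=$2 actual
  actual=$(stat -c '%a' "$path") || return 1
  if [ "$actual" != "$expected" ]; then
    echo "unexpected mode on $path: $actual (expected $expected)" >&2
    return 1
  fi
}

# A temp file stands in for /tmp/gateway.log.
log_file=$(mktemp)
chmod 644 "$log_file"

if check_mode "$log_file" 644; then result_644=ok; else result_644=fail; fi
if check_mode "$log_file" 600 2>/dev/null; then result_600=ok; else result_600=fail; fi

echo "expect 644: $result_644, expect 600: $result_600"
rm -f "$log_file"
```

Calling the helper inside an `if` keeps a mismatch from aborting a script that runs under `set -e`, whereas a bare call that returns non-zero would end the entrypoint.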

fix(channels): fix non-root gateway.log perms and add crash diagnostic

3591b31

- Change non-root gateway.log to 644 (matching root path)
- Add post-launch diagnostic: check if gateway PID is alive after 3s, dump gateway.log contents to trace file if non-empty

ci: remove cloud-experimental-e2e from nightly workflow

4866add

This test has been failing since March 31 and wastes API tokens on every nightly run without providing actionable signal.

chore: remove diagnostic traces and clean up PR

f659eb3

Remove all entrypoint execution traces, /tmp dumps, gateway crash diagnostics, and verbose guard-skip logging added during debugging. Simplify E2E diagnostics to just guard file and NODE_OPTIONS checks.

ericksoa changed the title ~~fix(channels): include Slack guard in proxy-env and make gateway log readable~~ fix(sandbox): fix non-root gateway startup and add crash safety net Apr 25, 2026

coderabbitai Bot reviewed Apr 25, 2026

Comment thread scripts/lib/sandbox-init.sh Outdated

Merge branch 'main' into fix/slack-guard-loading

ca7f824

coderabbitai Bot reviewed Apr 25, 2026

Comment thread scripts/lib/sandbox-init.sh Outdated

fix(test): fix http-proxy-fix validate_tmp_permissions regex and add Slack guard to validation

5699ac7

Same issue as the nemotron test: the regex was too strict for the new argument ordering. Also add _SLACK_GUARD_SCRIPT to validate_tmp_permissions calls per CodeRabbit review.

ericksoa mentioned this pull request Apr 25, 2026

fix(cli): add timeout to downloadSandboxConfig in dashboard recovery #2470

Closed

3 tasks

miyoungc mentioned this pull request Apr 28, 2026

docs: refresh daily docs for 0.0.29 #2576

Merged

13 tasks

nvshaxie mentioned this pull request Apr 28, 2026

Error during attempt to add the second agent in the sandbox #2145

Closed

2 tasks

jyaunches mentioned this pull request Apr 28, 2026

ci(e2e): add Brev Launchable install-flow smoke test #2599

Closed

TruongNguyenG mentioned this pull request Apr 28, 2026

fix(ci): restore cloud-experimental-e2e to nightly pipeline #2609

Merged

jyaunches mentioned this pull request Apr 28, 2026

feat(ci): coderabbit E2E recommendations + selective nightly dispatch #2615

Merged

5 tasks

jyaunches mentioned this pull request Apr 29, 2026

test(e2e): add Brev launchable install-flow smoke test #2677

Merged

4 tasks

jyaunches mentioned this pull request Apr 29, 2026

feat(ci): add non-root sandbox smoke test as PR gate #2711

Closed

hunglp6d mentioned this pull request May 7, 2026

test(e2e): add non-root sandbox smoke test #3166

Merged

12 tasks

wscurran added the bug-fix label (PR fixes a bug or regression) Jun 8, 2026

fix(sandbox): fix non-root gateway startup and add crash safety net #2472

ericksoa merged 25 commits into main from fix/slack-guard-loading
