fix(sandbox): tie gateway healthcheck marker to launch site, not env hint (#4710) by nvshaxie · Pull Request #5116 · NVIDIA/NemoClaw

nvshaxie · 2026-06-10T06:52:34Z

Summary

The early conditional gating mark_in_container_gateway on OPENSHELL_DRIVERS=docker (added in fix: skip gateway marker for docker-driver sandboxes #4748) never fires inside the sandbox container, because OpenShell 0.0.44 does not export OPENSHELL_DRIVERS into the sandbox env. @hulynn's 2026-06-08 re-verification on v0.0.60 confirms the symptom is unchanged: marker is created on every empty-NEMOCLAW_CMD container, the Dockerfile HEALTHCHECK short-circuit ([ -f /tmp/nemoclaw-gateway-local ] || exit 0) is unreachable for docker-driver sandboxes, and the container is marked (unhealthy) on every fresh onboard.
This PR drops the early env-gated write and calls mark_in_container_gateway() directly before each openclaw gateway run --port ... launch (both non-root and root-mode paths). The marker is now true-by-construction: it exists if-and-only-if this container is about to start the gateway, regardless of whether the deployment-mode signal is plumbed through the sandbox env.

This is hulynn's fix-direction (c) from #4710:

Move mark_in_container_gateway into the gateway-launching codepath itself, so it only runs when this container will actually start the gateway.

Related Issue

Fixes #4710. Supersedes the no-op env-var guard introduced by #4748.

Changes

scripts/nemoclaw-start.sh: remove the early case ",${OPENSHELL_DRIVERS:-}," block and the _NEMOCLAW_DOCKER_DRIVER gate that wrapped the early mark_in_container_gateway call; add mark_in_container_gateway immediately before both first-launch invocations of openclaw gateway run --port "${_DASHBOARD_PORT}" (non-root at line ~3324, root-mode at line ~3550). Restart-loop fallback launches inherit the marker from the first launch — the function is idempotent (: >file), so calling it again on retry is a no-op.
scripts/nemoclaw-start.sh: rewrite the function-definition comment block to explain the launch-site invariant and the [Ubuntu 24.04][Sandbox] Docker-driver HEALTHCHECK always (unhealthy) — marker file always created #4710 root cause, so future regressions cannot reintroduce an env-gated form without removing the warning.
test/nemoclaw-start.test.ts: replace the old startup-snippet test with two new structural tests:
- ties the in-container gateway healthcheck marker to the gateway launch site (#4503, #4710) — asserts (a) the region immediately after the function definition contains no early mark_in_container_gateway call and no OPENSHELL_DRIVERS conditional; (b) every openclaw gateway run --port invocation in the script is preceded (within 6 lines) by mark_in_container_gateway, OR sits inside a restart-loop body that inherits from the first launch. The first invariant blocks a regression back to the env-gated form; the second locks the new contract.
- mark_in_container_gateway writes the marker file idempotently (#4710) — behavioral check on the helper itself.
test/nemoclaw-start.test.ts: the two existing launch-block tests (registers child PIDs ... and launches the root gateway through gosu ...) now stub mark_in_container_gateway() { :; } in their preamble so the extracted shell snippet remains self-contained after the new call is introduced.

Type of Change

Code change (feature, bug fix, or refactor)

Verification

npx vitest run --project cli test/nemoclaw-start.test.ts — 120 passing (vs. 119 baseline before this PR; 19 pre-existing failures on main are unchanged and unrelated to this change).
Manual diff review of scripts/nemoclaw-start.sh: confirmed both first-launch sites (lines 3328 + 3553 after edit) have mark_in_container_gateway immediately above the nohup ... "$OPENCLAW" gateway run --port ... invocation; the two restart-loop fallbacks at lines ~3382 / ~3639 sit inside while/until blocks downstream of those first launches and reuse the marker that's already there.

Scope note

A separate OpenClaw-side concern surfaced in @Dongni-Yang's 2026-06-04 thread (in-sandbox gateway loses its HTTP listener while the process stays alive) is out of scope for this PR — that fix has to live in the OpenClaw gateway codebase. This PR only addresses the marker-file regression hulynn re-confirmed in #4710 for docker-driver sandboxes where the gateway legitimately runs on the host. Standalone deployments where the gateway runs in-container are unaffected: the marker is still created (just at the launch site instead of at startup), so the existing pgrep healthcheck behavior is preserved bit-for-bit.

Signed-off-by: Shawn Xie shaxie@nvidia.com

🤖 Generated with Claude Code

Summary by CodeRabbit

Chores
- Improved container health-status handling so the in-container gateway liveness marker is only created when the gateway actually runs inside the container; host-delivered gateways skip marker creation and in-container path.
- Adjusted test file-size budget for the related tests.
Tests
- Added/updated tests verifying marker creation, idempotence, and presence at gateway start across launch modes; added host-delivery skip scenarios and harness stubs to isolate marker behavior.

@Dongni-Yang

…hint (#4710) The early conditional gating mark_in_container_gateway on OPENSHELL_DRIVERS=docker (PR #4748) never fires inside the sandbox container because OpenShell 0.0.44 does not export OPENSHELL_DRIVERS into the sandbox env. hulynn's 2026-06-08 re-verification on v0.0.60 confirmed the symptom is unchanged from v0.0.57: marker file is created on every NEMOCLAW_CMD-empty container, so the Dockerfile HEALTHCHECK short-circuit (`[ -f /tmp/nemoclaw-gateway-local ] || exit 0`) is unreachable for docker-driver sandboxes and the container is marked (unhealthy) on every fresh onboard. Fix: drop the early env-gated marker write and call mark_in_container_gateway() directly before each `openclaw gateway run --port ...` launch (both the non-root and root-mode paths). The marker is now true-by-construction: it exists if-and-only-if this container is about to start the gateway, regardless of how the deployment-mode signal is (or isn't) plumbed through the sandbox env. Restart-loop fallback launches inherit the marker from the first launch — the function is idempotent (`: >file`) so calling it again on retry is a no-op. Verified by two new regression tests in test/nemoclaw-start.test.ts: - 'ties the in-container gateway healthcheck marker to the gateway launch site (#4503, #4710)': asserts (a) the region immediately after the function definition contains no early call and no OPENSHELL_DRIVERS conditional, and (b) every `openclaw gateway run --port` invocation is preceded by mark_in_container_gateway within 6 lines (or sits inside a restart-loop body). - 'mark_in_container_gateway writes the marker file idempotently (#4710)': behavioral check on the helper itself. The two existing launch-block tests now stub `mark_in_container_gateway() { :; }` in their preamble to keep the extracted shell snippet self-contained. Test count: 120 passing (vs. 119 baseline; 19 pre-existing failures unchanged). Note: a separate OpenClaw-side concern surfaced in @Dongni-Yang's 2026-06-04 thread (in-sandbox gateway loses its HTTP listener while the process stays alive) is out of scope here — this fix only addresses the marker-file regression hulynn re-confirmed in #4710 for docker-driver sandboxes where the gateway legitimately runs on the host. Standalone deployments where the gateway runs in-container are unaffected: the marker is still created (just at the launch site instead of at startup), so the existing pgrep healthcheck behavior is preserved bit-for-bit. Fixes #4710. Signed-off-by: Shawn Xie <shaxie@nvidia.com>

copy-pr-bot · 2026-06-10T06:52:38Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-10T06:52:49Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Tie the HEALTHCHECK marker to the actual gateway launch: add host-delivery detection, move the marker helper earlier, call it immediately before starting the gateway (both entrypoint modes), and add tests that extract/execute the helper, verify timing/skip behavior, and check idempotency. Adjust test imports and reduce the test file-size budget.

Changes

Gateway marker placement and validation

Layer / File(s)	Summary
Marker function and host-delivery helpers `scripts/nemoclaw-start.sh`	`mark_in_container_gateway()` moved earlier; adds `gateway_delivered_by_openshell_host()` and `skip_in_container_gateway_for_host_delivery()` to gate marker creation.
Invoke marker at gateway start (non-root & root) `scripts/nemoclaw-start.sh`	Calls `skip_in_container_gateway_for_host_delivery` and then `mark_in_container_gateway` immediately before `openclaw gateway run` in the non-root background path and the root step-down path.
Tests: extract-run behavioral probe and idempotency `test/nemoclaw-start-gateway-marker.test.ts`	Extracts the shell `mark_in_container_gateway` function, runs behavioral probes with a fake `openclaw` to assert marker exists at invocation (two launch forms), tests host-delivered skip (no marker, no openclaw run), and adds a direct idempotency test (marker created, empty, persists across calls).
Test harness stub, imports, and budget `test/nemoclaw-start.test.ts`, `ci/test-file-size-budget.json`	Stubs `mark_in_container_gateway()` as a no-op in the gateway-launch harness, reorders Node/Vitest imports, adds `skip_in_container_gateway_for_host_delivery()` stub to diagnostics scaffolding, and reduces `legacyMaxLines` for the affected test file.

Sequence Diagram(s)

sequenceDiagram
  participant ComponentA
  participant ComponentB
  ComponentA->>ComponentB: observable interaction

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰
I nudge a tiny file in place,
A hollow mark to show the space.
Called just before the gateway wakes,
It stays quite small for healthchecks' sakes.
A hop, a test, a quiet trace.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: shifting the gateway healthcheck marker from environment-based hinting to direct launch-site detection, addressing the core regression in issue `#4710`.
Linked Issues check	✅ Passed	The PR fully implements all coding requirements from `#4710`: marker created only at gateway launch sites, removed early env-gated write, added idempotent verification tests, and prevented regressions via structural guards.
Out of Scope Changes check	✅ Passed	All changes are scoped to the marker-file regression: script logic, test coverage updates, and file-size budget adjustment. OpenClaw listener issues remain explicitly out of scope.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/healthcheck-marker-gateway-launch-4710

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/nemoclaw-start.test.ts`:
- Around line 331-346: The file test/nemoclaw-start.test.ts exceeds the legacy
size budget; split out the new marker-focused test and trim verbose comments:
move the test with the title starting "ties the in-container gateway healthcheck
marker to the gateway launch site (`#4503`, `#4710`)" and any closely-related helper
functions/fixtures it depends on into a new test file (e.g.,
test/nemoclaw-gateway-marker.test.ts), shorten or relocate the long inline
commentary (the large block around the NEMOCLAW_CMD checks) into a small summary
or separate docs, and update/keep any imports or shared fixtures referenced by
that test so it runs standalone; after moving, run the size-budget check to
confirm test/nemoclaw-start.test.ts is below the limit and that the new file
contains the marker-focused tests formerly at lines ~403-435.
- Around line 383-399: The current fallback sets inRestartLoop true if any
"while|until|for" appears within 60 lines, which can falsely exempt non-loop
launches; update the detection used around launchLineNumbers and
mark_in_container_gateway so it only treats a launch as inside a restart loop
when a loop header actually encloses the launch: locate the nearest loop header
upstream (e.g., the matching "while|until|for" token found when computing
upstream), ensure you stop at function boundaries, then verify the loop body
scope/indentation spans the launch index (or that the launch line occurs after
the loop header but before the loop’s end) before setting inRestartLoop true;
change the logic that sets inRestartLoop to perform this scoped containment
check rather than the current simple regex existence test.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 794633e9-7c20-4f79-87e1-c3ea4c81376a

📥 Commits

Reviewing files that changed from the base of the PR and between 56cbecf and ecf648f.

📒 Files selected for processing (2)

scripts/nemoclaw-start.sh
test/nemoclaw-start.test.ts

github-actions · 2026-06-10T06:58:01Z

E2E Advisor Recommendation

Required E2E: cloud-onboard-e2e, issue-2478-crash-loop-recovery-e2e
Optional E2E: sandbox-survival-e2e, gateway-health-honest-e2e

Dispatch hint: cloud-onboard-e2e,issue-2478-crash-loop-recovery-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

cloud-onboard-e2e (high): Runs a real cloud OpenClaw onboard against the current Docker-driver path. This is the most direct existing E2E to catch a sandbox container being marked unhealthy or failing readiness because the in-container gateway marker is wrong when the gateway is host-owned.
issue-2478-crash-loop-recovery-e2e (high): Exercises real OpenClaw gateway startup, liveness, and crash/recovery behavior inside an onboarded sandbox. The PR modifies the exact entrypoint launch sites that start openclaw gateway run, so this should block merge for gateway lifecycle regressions.

Optional E2E

sandbox-survival-e2e (high): Additional confidence for sandbox lifecycle after gateway stop/start and recovery. Adjacent to the marker/healthcheck behavior, but less targeted than onboard and crash-loop recovery.
gateway-health-honest-e2e (medium): Adjacent regression guard for gateway health honesty when the Docker-driver gateway fails. Useful because this PR changes health/readiness signaling assumptions, though it focuses host gateway startup rather than the in-container marker contract.

New E2E recommendations

Docker-driver sandbox healthcheck marker contract (high): Existing unit tests validate marker placement in extracted shell snippets, and broad onboard E2E may catch failures indirectly, but there is no dedicated live E2E that onboards a Docker-driver sandbox, inspects the real container for absence of /tmp/nemoclaw-gateway-local, and verifies Docker health remains healthy while the host-owned gateway is serving.
- Suggested test: Add a live Docker-driver gateway-marker-healthcheck E2E that onboards OpenClaw, docker execs the sandbox container to assert /tmp/nemoclaw-gateway-local is absent, verifies docker inspect health does not fail on curl exit 7, and confirms host gateway health/status is ready.

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: cloud-onboard-e2e,issue-2478-crash-loop-recovery-e2e

github-actions · 2026-06-10T06:58:02Z

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: ubuntu-repo-cloud-openclaw
Optional Vitest E2E scenarios: ubuntu-repo-docker-post-reboot-recovery

Dispatch required Vitest E2E scenarios:

gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

ubuntu-repo-cloud-openclaw: scripts/nemoclaw-start.sh changes the sandbox entrypoint gateway-marker behavior used by the Ubuntu Docker cloud OpenClaw path; this live-supported scenario exercises the standard Docker-driver OpenClaw sandbox startup and health/state validation surface.
- Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional Vitest E2E scenarios

ubuntu-repo-docker-post-reboot-recovery: Optional adjacent lifecycle coverage for Docker-driver recovery/status behavior, which is related to gateway/sandbox health handling but not the primary changed entrypoint launch contract.
- Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-docker-post-reboot-recovery

Relevant changed files

scripts/nemoclaw-start.sh

github-actions · 2026-06-10T06:58:14Z

PR Review Advisor

Findings: 0 needs attention, 4 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 3 still apply, 0 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: scripts/nemoclaw-start.sh marker timing and Dockerfile absent-marker healthcheck shortcut: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `scripts/nemoclaw-start.sh` defines `mark_in_container_gateway` near line 217 and calls it at first gateway launch sites near lines 3329 and 3554; `Dockerfile` exits healthy when curl exit is 7 and the marker is absent; docs now say env hints are not authoritative.
Absent marker still conflates host-owned gateway with local pre-launch startup (scripts/nemoclaw-start.sh:217): Moving `/tmp/nemoclaw-gateway-local` creation to the two `openclaw gateway run` launch sites fixes the unreliable `OPENSHELL_DRIVERS` heuristic, but it also means standalone/local-serving startup has no marker until all setup before that launch site completes. The Dockerfile healthcheck still exits healthy on curl exit 7 when the marker is absent, so a local startup hang or failure before `mark_in_container_gateway` can be masked as healthy rather than treated as a local gateway boot failure.
- Recommendation: Introduce a distinct source of truth for gateway ownership or local-serving intent before long setup work starts, or add a separate host-owned marker/state so the Dockerfile can distinguish host-delivered gateways from local pre-launch stalls. Align the script, Dockerfile, and docs comments with that state model and add a regression for the chosen pre-launch health boundary.
- Evidence: `mark_in_container_gateway` writes `/tmp/nemoclaw-gateway-local` at `scripts/nemoclaw-start.sh:217` and is called only immediately before the non-root and root first-launch `openclaw gateway run` invocations at about `scripts/nemoclaw-start.sh:3329` and `scripts/nemoclaw-start.sh:3554`. The unchanged `Dockerfile` healthcheck has `[ -f /tmp/nemoclaw-gateway-local ] || exit 0` after curl exit 7. The changed script comment defines absence as “this entrypoint has not launched a gateway,” while the Dockerfile comment treats absence as outside-container gateway delivery.
Runtime evidence is still missing for the [Ubuntu 24.04][Sandbox] Docker-driver HEALTHCHECK always (unhealthy) — marker file always created #4710 Docker-driver health contract (test/nemoclaw-start-gateway-marker.test.ts:47): Issue [Ubuntu 24.04][Sandbox] Docker-driver HEALTHCHECK always (unhealthy) — marker file always created #4710's observable expected result is that Docker-driver sandboxes leave `/tmp/nemoclaw-gateway-local` absent and Docker reports `Status="healthy"` with `FailingStreak=0`. The added tests verify the marker helper, env hints, and extracted launch blocks, and existing Dockerfile unit coverage proves absent marker plus curl exit 7 exits healthy. They do not exercise a Docker-driver-shaped entrypoint/container lifecycle to prove the marker stays absent and Docker health reaches the requested state end-to-end.
- Recommendation: Add or identify targeted runtime/integration validation that starts a Docker-driver-shaped sandbox/entrypoint, confirms `/tmp/nemoclaw-gateway-local` is absent, and confirms the Docker HEALTHCHECK result for curl exit 7 is healthy without invoking `pgrep` and with `FailingStreak=0`. Do not rely only on PR prose or external workflow status.
- Evidence: `test/nemoclaw-start-gateway-marker.test.ts` constructs shell snippets and fake `openclaw` binaries; `test/nemoclaw-start.test.ts` extracts launch blocks and now verifies marker presence at real launch sites; `test/sandbox-provisioning.test.ts` covers the Dockerfile health command in isolation, not an actual Docker-driver entrypoint/container path. Issue [Ubuntu 24.04][Sandbox] Docker-driver HEALTHCHECK always (unhealthy) — marker file always created #4710 expected: “Specifically at step 4, the marker file should NOT exist; step 3 should show `Status="healthy"` with `FailingStreak=0`.”
Healthcheck shortcut can still mask local gateway startup failures (scripts/nemoclaw-start.sh:3329): This is security-relevant sandbox lifecycle observability rather than a confirmed sandbox escape. The PR improves the previous policy by not trusting sandbox env hints such as `OPENSHELL_DRIVERS`, but the container health configuration still treats marker absence as healthy on connection-refused. Because the marker is now only written at the launch site, local-serving pre-launch failures can be indistinguishable from host-owned gateway delivery.
- Recommendation: Make the healthcheck consume an authoritative ownership/intent signal instead of overloading marker absence. Add negative coverage showing standalone/local-serving pre-launch failure does not pass health via the absent-marker shortcut.
- Evidence: Security categories 7 and 9 are warnings: `scripts/nemoclaw-start.sh` calls `mark_in_container_gateway` immediately before gateway launch, while the unchanged `Dockerfile` exits 0 whenever curl exit is 7 and `/tmp/nemoclaw-gateway-local` is absent.

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation** — Docker-driver-shaped entrypoint/container leaves `/tmp/nemoclaw-gateway-local` absent.. This PR changes sandbox/container lifecycle and Docker health semantics. The unit-style shell snippets and extracted launch-block tests are useful, but the key acceptance criteria are observable runtime behaviors involving the actual entrypoint, marker file, Docker-driver topology, and Docker HEALTHCHECK state.
**Runtime validation** — Docker-driver-shaped container with curl exit 7 reports Docker health `Status="healthy"` and `FailingStreak=0` without invoking `pgrep`.. This PR changes sandbox/container lifecycle and Docker health semantics. The unit-style shell snippets and extracted launch-block tests are useful, but the key acceptance criteria are observable runtime behaviors involving the actual entrypoint, marker file, Docker-driver topology, and Docker HEALTHCHECK state.
**Runtime validation** — Standalone/local-serving startup that fails or stalls before `openclaw gateway run` does not pass health solely because `/tmp/nemoclaw-gateway-local` is absent, or explicitly declares local-serving intent before long setup.. This PR changes sandbox/container lifecycle and Docker health semantics. The unit-style shell snippets and extracted launch-block tests are useful, but the key acceptance criteria are observable runtime behaviors involving the actual entrypoint, marker file, Docker-driver topology, and Docker HEALTHCHECK state.
**Runtime validation** — Runtime topology validation records whether the dashboard gateway listener/process is host-owned or in-container for Docker-driver so conflicting issue comments cannot regress into masking a real listener failure.. This PR changes sandbox/container lifecycle and Docker health semantics. The unit-style shell snippets and extracted launch-block tests are useful, but the key acceptance criteria are observable runtime behaviors involving the actual entrypoint, marker file, Docker-driver topology, and Docker HEALTHCHECK state.
**Runtime validation** — Full production launch path reached with `OPENSHELL_DRIVERS=docker`, missing `OPENSHELL_DRIVERS`, and OpenShell sleep-command env still writes the marker when it actually starts `openclaw gateway run`.. This PR changes sandbox/container lifecycle and Docker health semantics. The unit-style shell snippets and extracted launch-block tests are useful, but the key acceptance criteria are observable runtime behaviors involving the actual entrypoint, marker file, Docker-driver topology, and Docker HEALTHCHECK state.
**Runtime evidence is still missing for the [Ubuntu 24.04][Sandbox] Docker-driver HEALTHCHECK always (unhealthy) — marker file always created #4710 Docker-driver health contract** — Add or identify targeted runtime/integration validation that starts a Docker-driver-shaped sandbox/entrypoint, confirms `/tmp/nemoclaw-gateway-local` is absent, and confirms the Docker HEALTHCHECK result for curl exit 7 is healthy without invoking `pgrep` and with `FailingStreak=0`. Do not rely only on PR prose or external workflow status.
**Acceptance clause:** Issue [Ubuntu 24.04][Sandbox] Docker-driver HEALTHCHECK always (unhealthy) — marker file always created #4710 Description: "marker file `/tmp/nemoclaw-gateway-local` present = gateway runs in THIS container (standalone deployments); marker absent = gateway runs on host (OpenShell Docker-driver, [Ubuntu 24.04][Sandbox][GitHub Issue #4503] sandbox Docker HEALTHCHECK always (unhealthy) — pgrep openclaw-gateway runs inside container but gateway runs on host #4503) — in-container probe cannot observe it, so HEALTHCHECK should exit 0 and defer to OpenShell host-side delivery-chain monitoring." — add test evidence or identify existing coverage. The marker is now written at the two in-container `openclaw gateway run` launch sites, and Dockerfile healthcheck coverage shows absent marker plus curl exit 7 exits healthy. Remaining gap: marker absence also covers local pre-launch state before the launch site, not only host-owned gateway delivery.
**Acceptance clause:** Issue [Ubuntu 24.04][Sandbox] Docker-driver HEALTHCHECK always (unhealthy) — marker file always created #4710 Expected Result: "Docker HEALTHCHECK reports `(healthy)` for Docker-driver sandboxes because the gateway lives on the host and the in-container probe cannot observe it." — add test evidence or identify existing coverage. `test/sandbox-provisioning.test.ts` verifies the Dockerfile command exits healthy when curl returns 7 and the marker is absent, without consulting `pgrep`. The PR does not include runtime/entrypoint evidence that a Docker-driver sandbox leaves the marker absent and reaches Docker healthy state.

Since last review details

Current findings:

Source-of-truth review needed: scripts/nemoclaw-start.sh marker timing and Dockerfile absent-marker healthcheck shortcut: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `scripts/nemoclaw-start.sh` defines `mark_in_container_gateway` near line 217 and calls it at first gateway launch sites near lines 3329 and 3554; `Dockerfile` exits healthy when curl exit is 7 and the marker is absent; docs now say env hints are not authoritative.
Absent marker still conflates host-owned gateway with local pre-launch startup (scripts/nemoclaw-start.sh:217): Moving `/tmp/nemoclaw-gateway-local` creation to the two `openclaw gateway run` launch sites fixes the unreliable `OPENSHELL_DRIVERS` heuristic, but it also means standalone/local-serving startup has no marker until all setup before that launch site completes. The Dockerfile healthcheck still exits healthy on curl exit 7 when the marker is absent, so a local startup hang or failure before `mark_in_container_gateway` can be masked as healthy rather than treated as a local gateway boot failure.
- Recommendation: Introduce a distinct source of truth for gateway ownership or local-serving intent before long setup work starts, or add a separate host-owned marker/state so the Dockerfile can distinguish host-delivered gateways from local pre-launch stalls. Align the script, Dockerfile, and docs comments with that state model and add a regression for the chosen pre-launch health boundary.
- Evidence: `mark_in_container_gateway` writes `/tmp/nemoclaw-gateway-local` at `scripts/nemoclaw-start.sh:217` and is called only immediately before the non-root and root first-launch `openclaw gateway run` invocations at about `scripts/nemoclaw-start.sh:3329` and `scripts/nemoclaw-start.sh:3554`. The unchanged `Dockerfile` healthcheck has `[ -f /tmp/nemoclaw-gateway-local ] || exit 0` after curl exit 7. The changed script comment defines absence as “this entrypoint has not launched a gateway,” while the Dockerfile comment treats absence as outside-container gateway delivery.
Runtime evidence is still missing for the [Ubuntu 24.04][Sandbox] Docker-driver HEALTHCHECK always (unhealthy) — marker file always created #4710 Docker-driver health contract (test/nemoclaw-start-gateway-marker.test.ts:47): Issue [Ubuntu 24.04][Sandbox] Docker-driver HEALTHCHECK always (unhealthy) — marker file always created #4710's observable expected result is that Docker-driver sandboxes leave `/tmp/nemoclaw-gateway-local` absent and Docker reports `Status="healthy"` with `FailingStreak=0`. The added tests verify the marker helper, env hints, and extracted launch blocks, and existing Dockerfile unit coverage proves absent marker plus curl exit 7 exits healthy. They do not exercise a Docker-driver-shaped entrypoint/container lifecycle to prove the marker stays absent and Docker health reaches the requested state end-to-end.
- Recommendation: Add or identify targeted runtime/integration validation that starts a Docker-driver-shaped sandbox/entrypoint, confirms `/tmp/nemoclaw-gateway-local` is absent, and confirms the Docker HEALTHCHECK result for curl exit 7 is healthy without invoking `pgrep` and with `FailingStreak=0`. Do not rely only on PR prose or external workflow status.
- Evidence: `test/nemoclaw-start-gateway-marker.test.ts` constructs shell snippets and fake `openclaw` binaries; `test/nemoclaw-start.test.ts` extracts launch blocks and now verifies marker presence at real launch sites; `test/sandbox-provisioning.test.ts` covers the Dockerfile health command in isolation, not an actual Docker-driver entrypoint/container path. Issue [Ubuntu 24.04][Sandbox] Docker-driver HEALTHCHECK always (unhealthy) — marker file always created #4710 expected: “Specifically at step 4, the marker file should NOT exist; step 3 should show `Status="healthy"` with `FailingStreak=0`.”
Healthcheck shortcut can still mask local gateway startup failures (scripts/nemoclaw-start.sh:3329): This is security-relevant sandbox lifecycle observability rather than a confirmed sandbox escape. The PR improves the previous policy by not trusting sandbox env hints such as `OPENSHELL_DRIVERS`, but the container health configuration still treats marker absence as healthy on connection-refused. Because the marker is now only written at the launch site, local-serving pre-launch failures can be indistinguishable from host-owned gateway delivery.
- Recommendation: Make the healthcheck consume an authoritative ownership/intent signal instead of overloading marker absence. Add negative coverage showing standalone/local-serving pre-launch failure does not pass health via the absent-marker shortcut.
- Evidence: Security categories 7 and 9 are warnings: `scripts/nemoclaw-start.sh` calls `mark_in_container_gateway` immediately before gateway launch, while the unchanged `Dockerfile` exits 0 whenever curl exit is 7 and `/tmp/nemoclaw-gateway-local` is absent.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

wscurran · 2026-06-10T14:59:17Z

✨
Related open PRs:

#4748 fix: skip gateway marker for docker-driver sandboxes

Related open issues:

#4710 [Ubuntu 24.04][Sandbox] Docker-driver HEALTHCHECK always (unhealthy) — marker file always created

… file The #4710 marker test asserted on the start-script source text (regex .not.toMatch against a source window), tripping the source-shape budget gate (1 > 0). Replace it with a behavior test: a fake `openclaw` records whether the /tmp/nemoclaw-gateway-local marker exists at the instant `gateway run` fires, for both the non-root and root step-down launch forms — proving the marker is tied to the launch site, not an env hint. The PR also pushed test/nemoclaw-start.test.ts past its 5289-line legacy size budget. Move both #4710 marker tests into a dedicated test/nemoclaw-start-gateway-marker.test.ts (164 lines, under the 1500 default) and ratchet the legacy budget for the original file down to 5234, per the test-file-size-budget legacy-ratchet rule. No production change; scripts/nemoclaw-start.sh fix is unchanged. Signed-off-by: Shawn Xie <shaxie@nvidia.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

test/nemoclaw-start-gateway-marker.test.ts (1)

136-145: ⚡ Quick win

Unify function extraction to avoid brittle brace parsing.

Line 137 uses the first } after the function start, which is fragile if mark_in_container_gateway ever gains ${...} or additional blocks/comments. Reuse the already robust extractShellFunctionFromSource helper here too.

Suggested diff

-    const fnStart = src.indexOf("mark_in_container_gateway() {");
-    const fnEnd = src.indexOf("}", fnStart);
-    if (fnStart === -1 || fnEnd === -1) {
-      throw new Error("mark_in_container_gateway function not found");
-    }
     const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-gw-marker-"));
     const markerPath = path.join(tmpDir, "nemoclaw-gateway-local");
-    const fnSrc = src
-      .slice(fnStart, fnEnd + 1)
-      .replaceAll("/tmp/nemoclaw-gateway-local", markerPath);
+    const fnSrc = extractShellFunctionFromSource(src, "mark_in_container_gateway")
+      .replaceAll("/tmp/nemoclaw-gateway-local", markerPath);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/nemoclaw-start-gateway-marker.test.ts` around lines 136 - 145, The test
currently finds the function by searching for "mark_in_container_gateway() {"
and the next "}" using fnStart/fnEnd which is brittle; instead call the existing
extractShellFunctionFromSource helper to robustly extract the full
mark_in_container_gateway function body, then replace the
"/tmp/nemoclaw-gateway-local" path in that returned string and assign to fnSrc;
update references to fnStart/fnEnd so extractShellFunctionFromSource is used to
produce the function source (keeping tmpDir and markerPath logic the same).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/nemoclaw-start-gateway-marker.test.ts`:
- Around line 136-145: The test currently finds the function by searching for
"mark_in_container_gateway() {" and the next "}" using fnStart/fnEnd which is
brittle; instead call the existing extractShellFunctionFromSource helper to
robustly extract the full mark_in_container_gateway function body, then replace
the "/tmp/nemoclaw-gateway-local" path in that returned string and assign to
fnSrc; update references to fnStart/fnEnd so extractShellFunctionFromSource is
used to produce the function source (keeping tmpDir and markerPath logic the
same).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d7a9432a-416d-4821-b929-c5be9474434f

📥 Commits

Reviewing files that changed from the base of the PR and between ecf648f and 8935f94.

📒 Files selected for processing (3)

ci/test-file-size-budget.json
test/nemoclaw-start-gateway-marker.test.ts
test/nemoclaw-start.test.ts

✅ Files skipped from review due to trivial changes (1)

ci/test-file-size-budget.json

🚧 Files skipped from review as they are similar to previous changes (1)

test/nemoclaw-start.test.ts

github-actions · 2026-06-11T05:21:24Z

🌿 Preview your docs: https://nvidia-preview-pr-5116.docs.buildwithfern.com/nemoclaw

github-actions · 2026-06-11T05:30:29Z

Selective E2E Results — ❌ Some jobs failed

Run: 27325717417
Target ref: f34bb51c859f7e2247e04fc9898d9c7ab756cfd0
Workflow ref: main
Requested jobs: cloud-onboard-e2e,sandbox-survival-e2e
Summary: 0 passed, 2 failed, 0 skipped

Job	Result
cloud-onboard-e2e	❌ failure
sandbox-survival-e2e	❌ failure

Failed jobs: cloud-onboard-e2e, sandbox-survival-e2e. Check run artifacts for logs.

github-actions · 2026-06-11T05:36:57Z

Selective E2E Results — ❌ Some jobs failed

Run: 27326029419
Target ref: c698c883d8bb0c9c22cad27b71678ba06cd9b8d8
Workflow ref: main
Requested jobs: cloud-e2e,sandbox-survival-e2e
Summary: 0 passed, 2 failed, 0 skipped

Job	Result
cloud-e2e	❌ failure
sandbox-survival-e2e	❌ failure

Failed jobs: cloud-e2e, sandbox-survival-e2e. Check run artifacts for logs.

github-actions · 2026-06-11T05:45:52Z

Selective E2E Results — ❌ Some jobs failed

Run: 27326403367
Target ref: 29effb0bdb140766082601f75ada11f723dd7d12
Workflow ref: main
Requested jobs: cloud-e2e,sandbox-survival-e2e
Summary: 0 passed, 2 failed, 0 skipped

Job	Result
cloud-e2e	❌ failure
sandbox-survival-e2e	❌ failure

Failed jobs: cloud-e2e, sandbox-survival-e2e. Check run artifacts for logs.

github-actions · 2026-06-11T06:05:39Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27326894410
Target ref: cfe3b49efa3f55a5a141923db0e6310f19746f4a
Workflow ref: main
Requested jobs: cloud-onboard-e2e,issue-2478-crash-loop-recovery-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
cloud-onboard-e2e	✅ success
issue-2478-crash-loop-recovery-e2e	✅ success

## Summary - Add v0.0.64 release notes from the release announcement and link them to the relevant deeper docs. - Document that custom policy presets recorded through `policy-add --from-file` and `--from-dir` survive snapshot restore and sandbox recreation. - Refresh generated NemoClaw user skills from the current source docs. ## Source summary - #5104 -> `docs/manage-sandboxes/backup-restore.mdx`, `docs/network-policy/customize-network-policy.mdx`: Documents custom policy presets preserved through snapshot restore. - #4955 -> `docs/about/release-notes.mdx`: Adds release-note coverage for Brave web-search pinning and `BRAVE_API_KEY` placeholder preservation. - #5116, #5269 -> `docs/about/release-notes.mdx`: Adds release-note coverage for Docker-driver gateway health and rootfs guard stability. - #5241, #5085 -> `docs/about/release-notes.mdx`: Adds release-note coverage for chat-completions provider selection and Nemotron Ultra 550B tool-less request compatibility. - #5268, #5210, #5257 -> `docs/about/release-notes.mdx`: Adds release-note coverage for messaging render plan refresh, OpenClaw scope-upgrade approval recovery, and Hermes WhatsApp bridge dependency setup. - Current source docs -> `.agents/skills/`: Regenerates user-skill references so agent-facing guidance matches the source documentation. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `npm run docs` - `npm run build:cli` - `npm run typecheck:cli` - Commit/pre-push hooks: markdownlint, gitleaks, docs-to-skills verification, TypeScript CLI, and skills YAML checks passed.  ## Summary by CodeRabbit * **Documentation** * Clarified sandbox snapshot restore preserves custom policy presets and restores them without original files. * Switched sandbox setup and remote deployment guidance to Docker-based workflows and emphasized remote onboarding flow. * Expanded troubleshooting for gateway recovery, Docker GPU/WSL issues, and onboarding resume. * Added/updated CLI docs: advanced maintenance, session export, upload/download wrappers, and status recovery guidance. * Added v0.0.64 release notes and links to NemoClaw Community; fixed command reference formatting.

nvshaxie requested review from cv and ericksoa June 10, 2026 06:52

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread test/nemoclaw-start.test.ts Outdated

Comment thread test/nemoclaw-start.test.ts Outdated

wscurran added area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression labels Jun 10, 2026

ericksoa mentioned this pull request Jun 10, 2026

fix(sandbox): pin gateway.reload=hot and add gateway serving watchdog (#4710) #5181

Open

6 tasks

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

Merge branch 'main' into fix/healthcheck-marker-gateway-launch-4710

4c31de1

cv approved these changes Jun 11, 2026

View reviewed changes

cv added 3 commits June 10, 2026 19:40

merge: sync main into healthcheck marker fix

34214f1

fix(sandbox): skip local gateway for host-delivered docker driver

0be8a12

fix(sandbox): avoid env-based gateway skip

f34bb51

docs(skills): sync gateway marker reference

c698c88

test(sandbox): assert launch block gateway marker

29effb0

Merge branch 'main' into fix/healthcheck-marker-gateway-launch-4710

cfe3b49

ericksoa mentioned this pull request Jun 11, 2026

fix(recover): require sustained gateway serving after recovery (#4710) #5182

Merged

6 tasks

cv approved these changes Jun 11, 2026

View reviewed changes

cv merged commit 71278a6 into main Jun 11, 2026
37 checks passed

cv deleted the fix/healthcheck-marker-gateway-launch-4710 branch June 11, 2026 06:17

coderabbitai Bot mentioned this pull request Jun 11, 2026

fix(cli): stabilize platform vitest on macOS and WSL #5209

Merged

14 tasks

cv added the v0.0.64 Release target label Jun 12, 2026

miyoungc mentioned this pull request Jun 12, 2026

docs: refresh v0.0.64 release docs #5358

Merged

Conversation

nvshaxie commented Jun 10, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Scope note

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented Jun 10, 2026

Uh oh!

coderabbitai Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Vitest E2E Scenario Recommendation

Vitest E2E Scenario Advisor

Required Vitest E2E scenarios

Optional Vitest E2E scenarios

Relevant changed files

Uh oh!

github-actions Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

wscurran commented Jun 10, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 11, 2026

Uh oh!

github-actions Bot commented Jun 11, 2026

Selective E2E Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 11, 2026

Selective E2E Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 11, 2026

Selective E2E Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 11, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

nvshaxie commented Jun 10, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading

github-actions Bot commented Jun 10, 2026 •

edited

Loading

github-actions Bot commented Jun 10, 2026 •

edited

Loading

github-actions Bot commented Jun 10, 2026 •

edited

Loading