
[DGX Spark][Upgrade] post-rebuild gateway HTTP 0 + container Docker-unhealthy on aarch64 (regression vs main) #3975

@wangericnv

Description

PR #3925 (reopen/pr-3832-upgrade-deps, head fa28360e): after a sandbox rebuild, the in-sandbox OpenClaw gateway fails to start on aarch64. The gateway-recovery warning (#2478) is logged, nothing else follows, and the container stays Docker-unhealthy. Reproduces on 3/3 sandboxes across DGX Spark and DGX Station. On the same Spark box, main HEAD (cfa817b) passes rebuild's own deployment check.

Environment

Device:        DGX Spark (and reproduced on DGX Station GB300)
OS:            Ubuntu 24.04.4 LTS (Noble Numbat)
Architecture:  aarch64
Node.js:       v22.22.3
npm:           10.9.8
Docker:        29.2.1, build a5c7197
OpenShell CLI: 0.0.39 (host); sandboxes.json still records openshellVersion "0.0.39" even though the PR's package.json pins 0.0.44
NemoClaw:      v0.1.0 (PR branch reopen/pr-3832-upgrade-deps, fa28360e); pre-upgrade v0.0.46
OpenClaw:      2026.5.18 (50a2481), bundled by the PR; pre-upgrade and on main: 2026.4.24 (cbcfdf6)

Steps to Reproduce

  1. On DGX Spark (aarch64): start from NemoClaw v0.0.46 with sandbox my-assistant onboarded and healthy.
  2. Clone the PR branch:
    git clone -b reopen/pr-3832-upgrade-deps --depth 1 \
        https://github.com/NVIDIA/NemoClaw.git ~/NemoClaw-pr3925
  3. Run the PR-branch installer:
    cd ~/NemoClaw-pr3925 && \
      NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 NEMOCLAW_NONINTERACTIVE=1 \
      yes y | bash install.sh
  4. Trigger a sandbox rebuild:
    nemoclaw my-assistant rebuild --yes
  5. Observe the output of rebuild's own deployment-verify step, then run docker ps.

Reproduced identically on DGX Station GB300 with two openclaw sandboxes (discord-sb, gemini-sb).
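Step 5 can be automated. The following is a minimal sketch that relies only on the standard Docker CLI: it polls the container's Docker health status until the status is healthy or a timeout expires. The script name wait-health.sh, the 5-second poll interval, and the 300-second default timeout are arbitrary choices for illustration, not NemoClaw settings.

```bash
#!/usr/bin/env bash
# Sketch: poll a sandbox container's Docker health status after
# `nemoclaw <sandbox> rebuild` until it is healthy or a timeout expires.
# Only standard `docker inspect` is used; nothing here is a NemoClaw command.
set -u

# wait_for_health CONTAINER TIMEOUT_SECONDS
# Prints the last observed health status; returns 0 only on "healthy".
wait_for_health() {
  local ctr="$1" timeout="$2" elapsed=0 status=""
  while [ "$elapsed" -lt "$timeout" ]; do
    # .State.Health is only set for containers with a HEALTHCHECK; its
    # Status is "starting", "healthy" or "unhealthy".
    status=$(docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{end}}' "$ctr" 2>/dev/null) || status="missing"
    if [ "$status" = "healthy" ]; then
      echo "$status"
      return 0
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
  # Timed out: report whatever was seen last ("none" if never polled).
  echo "${status:-none}"
  return 1
}

# Run only when a container name is passed on the command line.
if [ "$#" -ge 1 ]; then
  wait_for_health "$1" "${2:-300}"
fi
```

Usage, taking the first container whose name contains openshell-my-assistant: ./wait-health.sh "$(docker ps --filter name=openshell-my-assistant --format '{{.Names}}' | head -n 1)" 300. On the PR branch, the script would be expected to time out and print unhealthy. The loop keeps polling through unhealthy instead of failing fast, because Docker moves a container back to healthy once its health check passes again.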

Expected Result

Rebuild's deployment-verify step reports "Deployment verified - gateway and dashboard are healthy." The container shows (health: starting) briefly, then becomes healthy. The gateway log contains [gateway] ready (N plugins: ...) within a few seconds, and the dashboard port (18789/18790) is reachable.

This is exactly what happens on main HEAD (cfa817b), which still bundles OpenClaw 2026.4.24.
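The gateway and dashboard checks can be approximated from the host with curl. The following is a minimal sketch that assumes the port forwards listen on 127.0.0.1. The report lists 18789/18790 without saying which port serves which role. curl reports an HTTP code of 000 when no HTTP response arrives, which is analogous to the "HTTP 0" in the verify output. The report does not say how NemoClaw's verifier performs its own probe.

```bash
#!/usr/bin/env bash
# Sketch: probe forwarded gateway/dashboard ports from the host with curl.
# Assumption: the forwards listen on 127.0.0.1 (ports taken from this report).

# probe_port HOST PORT -> prints the HTTP status code curl saw ("000" if none)
probe_port() {
  # On connection refused or timeout curl exits non-zero but still writes
  # "000" for %{http_code}, so its exit status is deliberately ignored.
  curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}' "http://$1:$2/" || true
}

# describe_probe CODE -> short verdict for a status code from probe_port
describe_probe() {
  case "$1" in
    ""|000) echo "not responding" ;;
    2??|3??) echo "responding" ;;
    *) echo "responding with HTTP $1" ;;
  esac
}

# Usage: ./probe-ports.sh 18789 18790
if [ "$#" -ge 1 ]; then
  for port in "$@"; do
    echo "$port: $(describe_probe "$(probe_port 127.0.0.1 "$port")")"
  done
fi
```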

Actual Result

On the PR branch, rebuild's own deployment-verify step reports:

✗ gateway: HTTP 0 (gateway not responding)
✗ dashboard: port forward not working (connection refused)
⚠ Deployment verification found issues
  The sandbox was created successfully but may not be fully functional.

docker ps shows the new container as Up X minutes (unhealthy), and it never leaves the unhealthy state. docker exec <ctr> cat /tmp/gateway.log contains only:

[gateway-recovery] WARNING: /tmp/nemoclaw-proxy-env.sh missing - gateway launching without library guards (#2478)

No further gateway initialization lines ever appear (no [gateway] ready, no plugin registration, no HTTP server start). Reproduced on 3/3 sandboxes across 2/2 ARM64 boxes (DGX Spark + DGX Station GB300).
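The two outcomes in this report can be distinguished using only /tmp/gateway.log. The following is a minimal sketch that assumes the log markers quoted in this issue ([gateway] ready, [gateway] starting, [gateway-recovery] WARNING) stay stable across OpenClaw versions. The category names it prints are hypothetical triage labels and are not output from OpenClaw or NemoClaw.

```bash
#!/usr/bin/env bash
# Sketch: classify gateway.log contents (read from stdin) into the outcomes
# seen in this report. Category names are made-up triage labels.

classify_gateway_log() {
  local log
  log=$(cat)
  # -F: match the bracketed markers as fixed strings, not regexes.
  if grep -qF '[gateway] ready' <<<"$log"; then
    echo "ready"
  elif grep -qF '[gateway] starting' <<<"$log"; then
    echo "started-not-ready"
  elif grep -qF '[gateway-recovery] WARNING' <<<"$log"; then
    echo "stalled-after-recovery-warning"
  else
    echo "no-gateway-output"
  fi
}

# Usage: ./classify-gateway-log.sh <ctr>
if [ "$#" -ge 1 ]; then
  docker exec "$1" cat /tmp/gateway.log | classify_gateway_log
fi
```

The PR-branch log quoted here is classified as stalled-after-recovery-warning. The main-HEAD negative-control log is classified as ready.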

Logs

=== Spark PR-branch rebuild ===
  ✓ Sandbox 'my-assistant' rebuilt successfully
    Now running: OpenClaw v2026.5.18
  ✗ gateway: HTTP 0 (gateway not responding)
  ✗ dashboard: port forward not working (connection refused)

Container: openshell-my-assistant-18b489dd-...    Up 15 minutes (unhealthy)

$ docker exec <ctr> cat /tmp/gateway.log
[gateway-recovery] WARNING: /tmp/nemoclaw-proxy-env.sh missing - gateway launching without library guards (#2478)
(EOF -- no further lines)

=== Negative control on same Spark box: main HEAD (cfa817b) rebuild ===
  ✓ Deployment verified - gateway and dashboard are healthy.
    OpenClaw version: 2026.4.24
  Now running: OpenClaw v2026.4.24

Container: openshell-my-assistant-37b15ded-...    Up 1 minute (health: starting)

$ docker exec <ctr> cat /tmp/gateway.log
[sandbox-safety-net] loaded (launcher)
[guard] ciao-network-guard loaded (launcher)
2026-05-21T06:04:06 [gateway] loading configuration...
2026-05-21T06:04:06 [gateway] starting...
[guard] os.networkInterfaces() failed: A system error occurred: uv_interface_addresses returned Unknown system error 1 - returning empty (mDNS disabled)
2026-05-21T06:04:07 [plugins] plugins.allow is empty; discovered non-bundled plugins may auto-load: nemoclaw
2026-05-21T06:04:07 [plugins] nemoclaw failed during register from /sandbox/.openclaw/extensions/nemoclaw/dist/index.js: SyntaxError: Unexpected end of JSON input
2026-05-21T06:04:07 [gateway] starting HTTP server...
2026-05-21T06:04:07 [gateway] ready (4 plugins: browser, device-pair, phone-control, talk-voice; 1.6s)

DevTest evidence:

  • 597724 (Spark my-assistant)
  • 597725 (Station discord-sb)
  • 597726 (Station gemini-sb)

All under \DevTest\NemoClaw\NemoClaw Test\v0.0.47\manual\{spark,station,windows-arm-reference-host}.

Related

GH #2478 (closed): same ARM64 networkInterfaces EPERM context. Main has the workaround active, and its gateway still reaches [gateway] ready. PR #3925 appears either to regress that workaround or to remove the setup path that creates /tmp/nemoclaw-proxy-env.sh, whose absence triggers the gateway-recovery WARNING.
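One way to narrow this down is to check whether the file named in the warning exists in each container. The following is a minimal sketch that uses only docker exec and the POSIX test utility, and it assumes that test is available in the sandbox image. The script also prints missing when docker exec itself fails (wrong name, stopped container), so confirm the container name with docker ps first.

```bash
#!/usr/bin/env bash
# Sketch: report whether /tmp/nemoclaw-proxy-env.sh exists in a container.
# Caveat: "missing" is also printed when `docker exec` itself fails.

PROXY_ENV=/tmp/nemoclaw-proxy-env.sh

check_proxy_env() {
  if docker exec "$1" test -f "$PROXY_ENV"; then
    echo "present"
  else
    echo "missing"
  fi
}

# Usage: ./check-proxy-env.sh <pr-branch-ctr> <main-head-ctr>
if [ "$#" -ge 1 ]; then
  for ctr in "$@"; do
    echo "$ctr: $(check_proxy_env "$ctr")"
  done
fi
```

If the file is missing only in the PR-branch container, that would support the second hypothesis: the setup path was removed, rather than the workaround itself being regressed.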


NVB#6198894

Metadata

Labels

  • NV QA (Bugs found by the NVIDIA QA Team)
  • UAT (Issues flagged for User Acceptance Testing)
  • platform: dgx-spark (Affects DGX Spark hardware or workflows)
  • v0.0.51 (Release target)
