[DGX Spark] Gateway crash loop on startup: @homebridge/ciao networkInterfaces() returns EPERM in OpenShell sandbox #2478

@SeannyQuest

Hit a clean reproducible crash on a fresh DGX Spark install and figured I'd write it up since the workaround is also clean. Setup details first, then the crash, then what I tried.

Setup

Fresh install on:

  • ASUS GX10 (NVIDIA DGX Spark, GB10 Grace Blackwell, 128GB unified memory)
  • DGX OS 24.04
  • NemoClaw 0.1.0
  • OpenClaw v2026.4.2 inside the sandbox
  • Node 22.22.1 inside the sandbox
  • Balanced policy preset (default from the onboard wizard)
  • Local Ollama, model nemotron-3-super:120b
  • One Telegram channel

What's happening

Onboard finishes successfully. nemoclaw <name> status initially shows everything green. But the gateway never actually serves anything. Messages to the Telegram bot get no reply.

Tail the gateway log and you can see why. Every time it boots, it crashes on the same line, then health-monitor restarts it, then it crashes again. Loops forever.

[gateway] listening on ws://127.0.0.1:18789, ws://[::1]:18789 (PID <n>)
[gateway] log file: /tmp/openclaw-998/openclaw-2026-04-25.log
[gateway] security warning: dangerous config flags enabled: ...
[openclaw] Unhandled promise rejection: SystemError: A system error occurred: uv_interface_addresses returned Unknown system error 1 (Unknown system error 1)
    at Object.networkInterfaces (node:os:218:16)
    at Function.assumeNetworkInterfaceNames (/usr/local/lib/node_modules/openclaw/node_modules/@homebridge/ciao/src/NetworkManager.ts:527:23)
    at NetworkManager.getCurrentNetworkInterfaces (/usr/local/lib/node_modules/openclaw/node_modules/@homebridge/ciao/src/NetworkManager.ts:370:32)

The status output hints at the symptom but the suggested fix doesn't actually work:

OpenClaw: not running

The sandbox is alive but the OpenClaw gateway process is not running.
This typically happens after a gateway restart (e.g., laptop close/open).

To recover, run:
  nemoclaw <name> connect  (auto-recovers on connect)

connect doesn't auto-recover, because the respawned gateway hits the same crash on the same line every time.

What's actually going wrong

@homebridge/ciao (the mDNS/Bonjour library OpenClaw bundles for local network discovery) calls os.networkInterfaces() during init. On Linux that call goes through libuv's uv_interface_addresses, which relies on getifaddrs() and therefore needs to open a netlink socket. Inside the OpenShell sandbox that fails with errno 1 (EPERM, the "Unknown system error 1" in the trace), because seccomp is blocking the netlink socket family. Node turns that into a SystemError, ciao doesn't catch it, and the unhandled rejection takes the gateway down.
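
If you want to see the failure without booting the whole gateway, a small probe like the one below should be enough. This is only a sketch: probeInterfaces is a name I made up for illustration, and it only relies on os.networkInterfaces() from Node's standard library.

```javascript
// Probe os.networkInterfaces() and report the failure instead of crashing.
const os = require("os");

// `probeInterfaces` is a hypothetical helper name, not part of OpenClaw or ciao.
function probeInterfaces(osModule = os) {
  try {
    const interfaces = osModule.networkInterfaces();
    return { ok: true, names: Object.keys(interfaces) };
  } catch (err) {
    // Node surfaces libuv failures as a SystemError; code and message identify it.
    return { ok: false, code: err.code, message: err.message };
  }
}

console.log(probeInterfaces());
```

Run it with the same node binary the gateway uses inside the sandbox. If the diagnosis above is right, it should report ok: false with the same uv_interface_addresses message as the stack trace.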

You can confirm the netlink restriction independently from inside the sandbox:

$ ss -tlnp 2>&1 | head -1
Cannot open netlink socket: Operation not permitted
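
A related check, in case ss isn't available: the kernel exposes the seccomp mode of a process in the Seccomp field of /proc/self/status (0 = disabled, 1 = strict, 2 = filter). The sketch below reads that field from Node. The helper names are mine, and a result of "filter" only shows that some seccomp filter is active, not that it is the one rejecting netlink.

```javascript
// Report the seccomp mode of the current process from /proc/self/status.
const fs = require("fs");

const SECCOMP_MODES = { 0: "disabled", 1: "strict", 2: "filter" };

// Hypothetical helper: pull the "Seccomp:" field out of the status text.
function parseSeccompMode(statusText) {
  const match = /^Seccomp:\s*(\d+)/m.exec(statusText);
  if (!match) return "unknown";
  return SECCOMP_MODES[match[1]] || "unknown";
}

function currentSeccompMode(path = "/proc/self/status") {
  try {
    return parseSeccompMode(fs.readFileSync(path, "utf8"));
  } catch (err) {
    return "unknown"; // e.g. non-Linux host or /proc not readable
  }
}

console.log(`seccomp mode: ${currentSeccompMode()}`);
```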

One thing worth flagging: ciao's mDNS isn't actually used by any of the supported channels. Telegram, Slack, and Discord all reach out over plain HTTPS. The crash happens just because the library gets loaded and tries to list interfaces eagerly on startup.

Workaround that worked

Override os.networkInterfaces before ciao loads, via a NODE_OPTIONS preload:

echo 'require("os").networkInterfaces=()=>({});' > /sandbox/.openclaw-data/patch/preload.js

NODE_OPTIONS="--require /sandbox/.openclaw-data/patch/preload.js" \
  openclaw gateway run

That's it. ciao gets {}, gives up on mDNS, gateway stays up. Telegram channel connects, chat with Nemotron 120B works fine.
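
A slightly more defensive variant of the preload, in case the same file ever runs somewhere the call does work: keep the real function and only return {} when it throws. This is a sketch I have not run against the gateway itself, and installSafeNetworkInterfaces is just an illustrative name.

```javascript
// preload.js variant: keep the real os.networkInterfaces(), fall back only on failure.
const os = require("os");

// `installSafeNetworkInterfaces` is a hypothetical helper name used for this sketch.
function installSafeNetworkInterfaces(osModule = os) {
  const original = osModule.networkInterfaces;
  osModule.networkInterfaces = function safeNetworkInterfaces() {
    try {
      return original.call(osModule);
    } catch (err) {
      // Same result as the one-line workaround: an empty interface map.
      return {};
    }
  };
  return osModule;
}

installSafeNetworkInterfaces();
```

Outside the sandbox this behaves like stock Node. Inside it, callers get the same empty object as with the one-liner.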

The real problem

This workaround can't be made persistent today. NemoClaw's config schema doesn't expose env vars or a preload path. I tried every variation I could think of:

$ nemoclaw <name> config set --key gateway.env.NODE_OPTIONS --value "..."
Cannot modify the gateway section directly.

$ nemoclaw <name> config set --key env.NODE_OPTIONS --value "..."
Key validation failed: "env.NODE_OPTIONS" is not a recognized openclaw config path.

(same error for sandbox.env.*, sandbox.envVars.*, etc.)

So in practice I'm running a relaunch script by hand every time the gateway dies (laptop close, container restart, anything that triggers a respawn). Not viable for an always-on assistant. The whole point is being able to ping it from Telegram any time.
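
For reference, the relaunch step boils down to something like the sketch below. The helper names are hypothetical; the preload path and the openclaw gateway run command are the ones from the workaround above. It appends the --require flag to any existing NODE_OPTIONS instead of overwriting it, and assumes the preload path contains no spaces.

```javascript
// Relaunch helper sketch: start the gateway with the preload applied.
const { spawn } = require("child_process");

const PRELOAD = "/sandbox/.openclaw-data/patch/preload.js";

// Append the --require flag to whatever NODE_OPTIONS already contains.
function buildGatewayEnv(baseEnv = process.env, preloadPath = PRELOAD) {
  const flag = `--require ${preloadPath}`;
  const existing = (baseEnv.NODE_OPTIONS || "").trim();
  const nodeOptions = existing.includes(flag) ? existing : `${existing} ${flag}`.trim();
  return { ...baseEnv, NODE_OPTIONS: nodeOptions };
}

// Hypothetical relaunch helper: runs the same command as the manual workaround.
function relaunchGateway() {
  return spawn("openclaw", ["gateway", "run"], {
    env: buildGatewayEnv(),
    stdio: "inherit",
  });
}

module.exports = { buildGatewayEnv, relaunchGateway };
```

Calling relaunchGateway() by hand is still required after every respawn, which is exactly the part that doesn't scale.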

Possible fixes

A few options, roughly from most to least surgical:

  1. Wrap the ciao NetworkManager calls in try/catch inside OpenClaw and fall back to no mDNS if os.networkInterfaces() throws. Probably the smallest diff.

  2. Add an OPENCLAW_DISABLE_MDNS=1 env var (or a config flag) that skips loading ciao entirely. Most explicit user-facing fix.

  3. Loosen the OpenShell sandbox seccomp profile to allow the netlink syscall family. Probably not what you want for an isolation-focused product, but listing it for completeness.

  4. As a stopgap until any of the above lands, expose gateway.preload or gateway.env.* in the NemoClaw config schema. That way users can persist the workaround through nemoclaw config set instead of running a script by hand.
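
To make options 1 and 2 concrete, here is roughly the shape I have in mind. Everything in it is hypothetical: startMdns and loadResponder are placeholder names, OPENCLAW_DISABLE_MDNS is the env var proposed above and does not exist today, and I don't know how OpenClaw actually wires ciao in. loadResponder stands in for whatever code currently loads ciao and creates its responder.

```javascript
// Sketch of options 1 and 2 combined: opt-out flag plus a guarded mDNS init.
// All names here (startMdns, loadResponder, OPENCLAW_DISABLE_MDNS) are hypothetical.
async function startMdns({ loadResponder, env = process.env, log = console }) {
  // Option 2: skip loading the mDNS library entirely when the flag is set.
  if (env.OPENCLAW_DISABLE_MDNS === "1") {
    log.info("mDNS disabled by OPENCLAW_DISABLE_MDNS");
    return null;
  }
  // Option 1: treat any init failure (sync throw or rejection) as "no mDNS".
  try {
    return await loadResponder();
  } catch (err) {
    log.warn(`mDNS unavailable, continuing without it: ${err.message}`);
    return null;
  }
}
```

One caveat: the await only catches failures that come back through the call itself. The trace shows an unhandled promise rejection, so if ciao rejects on a promise it never returns to the caller, a try/catch like this would miss it, and the explicit opt-out in option 2 would be the more reliable path.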

Repro

  1. ASUS GX10 / DGX Spark with DGX OS 24.04, Docker preinstalled
  2. curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
  3. nemoclaw onboard. Defaults, Local Ollama with nemotron-3-super:120b, Balanced policy preset, Telegram channel.
  4. Onboard reports success.
  5. Send a message to the bot. No reply.
  6. nemoclaw <name> connect, then tail /tmp/openclaw-998/openclaw-*.log. Stack trace above repeats every ~4 minutes (health-monitor restart cadence).

Happy to send full logs or test fixes against my setup. Easy reproduction on a fresh install.

Metadata

    Labels

    area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery)
    platform: dgx-spark (Affects DGX Spark hardware or workflows)
