fix(onboard): prove GPU sandbox local inference from the agent runtime (#4509) by yimoj · Pull Request #5024 · NVIDIA/NemoClaw

yimoj · 2026-06-09T05:25:00Z

Summary

Reopened #4509: on an Ubuntu 24.04 GPU host-network setup, onboard printed "local inference reachable" yet the agent then failed with ECONNREFUSED / "LLM request failed: network connection error". PR #4609 proved reachability with docker exec against the recreated --network host container — whose main network namespace is the host's — but OpenClaw runs in OpenShell's isolated sandbox network namespace, which cannot reach the host loopback even under --network host. So the direct 127.0.0.1 provider URL was unreachable for the agent while the probe falsely passed. This fixes the URL/network mapping and verifies it from the real runtime context.

Related Issue

Fixes #4509

Changes

No direct container-loopback inference URL. For local providers, an opted-in host-network GPU patch (NEMOCLAW_DOCKER_GPU_PATCH_NETWORK=host) is downgraded to the OpenShell bridge so inference routes through the reachable inference.local path. Host networking is unnecessary for GPU device access (that comes from the GPU mode flags). Non-local (cloud/routed/custom) GPU sandboxes are untouched.
Bridge reachability re-checked after the downgrade (with UFW auto-fix), since gateway startup skipped that probe while host networking was still requested.
Runtime-context reachability gate. The post-ready gate now probes https://inference.local/v1/models via openshell sandbox exec — the exact network namespace and route OpenClaw uses — instead of docker exec. Success requires a 2xx; 000 (ECONNREFUSED), 4xx (route/auth misconfig), and 5xx (backend down) fail with actionable recovery. A genuinely missing curl soft-skips (OpenClaw's HTTP client does not need it); a broken sandbox exec path fails rather than masquerading as missing-curl.
GPU E2E (test/e2e/test-gpu-e2e.sh) now proves inference through openshell sandbox exec (the real runtime) and asserts the new gate, removing the docker exec shortcut that masked the bug.
src/lib/onboard.ts stays net-neutral (orchestration lives in src/lib/onboard/).
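
The status handling described for the runtime-context gate can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not the PR's code: `classifyProbeHttpCode` and `ProbeVerdict` are hypothetical names, and the reason strings are placeholders rather than the actual recovery messages.

```typescript
// Hypothetical sketch: names and messages are illustrative, not from the PR.
type ProbeVerdict =
  | { ok: true; httpCode: number }
  | { ok: false; httpCode: number; reason: string };

// Classify the three-digit code printed by curl's `-w '%{http_code}'`.
// curl prints 000 when no HTTP response was received at all.
function classifyProbeHttpCode(raw: string): ProbeVerdict {
  const trimmed = raw.trim();
  const httpCode = /^\d{3}$/.test(trimmed) ? Number(trimmed) : 0;

  if (httpCode >= 200 && httpCode < 300) {
    return { ok: true, httpCode };
  }
  if (httpCode === 0) {
    return { ok: false, httpCode, reason: "no HTTP response (e.g. ECONNREFUSED)" };
  }
  if (httpCode >= 400 && httpCode < 500) {
    return { ok: false, httpCode, reason: "route or auth misconfiguration" };
  }
  if (httpCode >= 500) {
    return { ok: false, httpCode, reason: "inference backend is down" };
  }
  // 1xx and 3xx are not success either: the gate requires a 2xx.
  return { ok: false, httpCode, reason: "unexpected non-2xx response" };
}
```

Treating everything outside 2xx as a failure keeps the gate strict: a reachable route that answers 401 or 502 would still leave the agent unable to run inference.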

Type of Change

Code change (feature, bug fix, or refactor)

Verification

npx prek run --files on the changed files (TS/biome/spdx/shellcheck clean; the only failures were unrelated env-flakes — missing plugin node_modules and 5s CLI-spawn timeouts under a loaded host — which pass with deps installed and a normal timeout: 152/152)
npm run build:cli, npm run typecheck:cli
npx vitest run for the gate (21), test/onboard.test.ts (66), docker-gpu-patch (50), inference/local (65), provider-inference (13), docker-gpu-sandbox-create (5)
Tests added/updated for new and changed behavior (runtime-context probe, 2xx-only, local-only downgrade + bridge re-check, exec-failure vs missing-curl)
No secrets, API keys, or credentials committed

Reporter-workflow E2E evidence

Full reporter reproduction requires Ubuntu 24.04 + NVIDIA GPU + native Docker (host-network GPU patch), which is not available on this CI-less dev host. The exact workflow is covered by the GPU pipeline E2E (test/e2e/test-gpu-e2e.sh, Brev GPU runner), which this PR extends to verify local inference through openshell sandbox exec (the agent runtime netns) and to assert the runtime-context gate — so a future regression cannot pass via the container-main-namespace shortcut.

The root-cause mechanism was reproduced locally and hermetically (no GPU needed), modeling the OpenShell Docker-driver topology — a --network host container plus an inner unshare -n namespace (how OpenShell runs the sandbox agent):

[A] container MAIN netns (== host loopback under --network host; what docker exec / PR #4609 hit):
    http_code=200   RESULT: OK-MAIN  (reaches host Ollama)
[B] INNER netns via unshare -n (== OpenShell sandbox agent runtime / openshell sandbox exec):
    http_code=000   RESULT: FAIL-INNER (ECONNREFUSED — matches the reporter)

This confirms why the docker exec probe passed while the agent got ECONNREFUSED, and why routing through the OpenShell-managed inference.local path (on the bridge) is the reachable fix.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

Summary by CodeRabbit

Bug Fixes
- Verify GPU local inference from inside the sandbox runtime (not via host-network probes), reducing false positives and handling curl/unreachability scenarios more robustly.
Refactor
- Default Docker GPU patching for local providers now uses the OpenShell-managed bridge instead of host networking to improve inference accessibility and consistency.
Tests
- End-to-end and unit tests updated to exercise the sandbox-side inference path and cover success, skip, retry, and failure cases.

coderabbitai · 2026-06-09T05:25:13Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 93d98021-3e81-47e6-8e13-9dadf97804c2

📥 Commits

Reviewing files that changed from the base of the PR and between 1118a90 and b005d86.

📒 Files selected for processing (4)

src/lib/onboard.ts
src/lib/onboard/docker-gpu-local-inference.test.ts
src/lib/onboard/docker-gpu-local-inference.ts
test/e2e/test-gpu-e2e.sh

🚧 Files skipped from review as they are similar to previous changes (3)

src/lib/onboard/docker-gpu-local-inference.test.ts
src/lib/onboard.ts
src/lib/onboard/docker-gpu-local-inference.ts

📝 Walkthrough

The PR replaces Docker GPU host-network local inference verification with a preserve-network bridge mode and sandbox-runtime reachability probes. For local inference providers, host-network patching is downgraded to the OpenShell-managed bridge and reachability is verified by executing a bounded curl probe inside the sandbox via openshell sandbox exec against inference.local. Unit tests and E2E tests are updated accordingly.
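
The downgrade rule in this walkthrough can be sketched as a pure decision function. This is a hedged sketch only: `resolveGpuPatchNetwork`, `GpuPatchRequest`, and `ResolvedGpuPatch` are hypothetical names, and the real `enforceDockerGpuPatchPreserveNetwork()` may be structured differently.

```typescript
// Hypothetical sketch of the host -> bridge downgrade decision.
type NetworkMode = "host" | "bridge";

interface GpuPatchRequest {
  requestedNetwork: NetworkMode; // e.g. from NEMOCLAW_DOCKER_GPU_PATCH_NETWORK
  providerIsLocal: boolean; // true for local inference providers
}

interface ResolvedGpuPatch {
  network: NetworkMode;
  downgraded: boolean;
  recheckBridge: boolean; // bridge probe was skipped while host mode was requested
}

function resolveGpuPatchNetwork(req: GpuPatchRequest): ResolvedGpuPatch {
  // Only local providers are downgraded; cloud/routed/custom stay untouched.
  if (req.requestedNetwork === "host" && req.providerIsLocal) {
    return { network: "bridge", downgraded: true, recheckBridge: true };
  }
  return { network: req.requestedNetwork, downgraded: false, recheckBridge: false };
}
```

Returning `recheckBridge` alongside the resolved mode makes the follow-up reachability probe an explicit consequence of the downgrade rather than a separate code path that could be forgotten.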

Changes

GPU Sandbox Preserve-Network Inference Verification

| Layer / File(s) | Summary |
| --- | --- |
| Local provider detection and preserve-network enforcement `src/lib/onboard/docker-gpu-local-inference.ts` | Add `isLocalInferenceProvider()` helper, define `SANDBOX_RUNTIME_INFERENCE_ENDPOINT`, and implement `enforceDockerGpuPatchPreserveNetwork()` to downgrade host-network mode to OpenShell bridge for local providers and optionally reverify bridge reachability. |
| Sandbox-runtime inference verification types and probe `src/lib/onboard/docker-gpu-local-inference.ts` | Define `SandboxExecResult`, `DockerGpuSandboxInferenceVerifyDeps`, and `DockerGpuSandboxInferenceVerification` types. Implement `probeSandboxRuntimeInference()` to run a bounded curl script via sandbox exec, emit `NO_CURL` sentinel when curl is missing, parse HTTP codes, and distinguish exec failures from unreachable endpoints. |
| Verification orchestration and diagnostic output `src/lib/onboard/docker-gpu-local-inference.ts` | Implement `verifyDockerGpuSandboxLocalInference()` orchestrator (skip unless patch active + local provider, soft-skip on missing curl, return ok/failed with provider/endpoint/detail/recovery). Add `getSandboxRuntimeInferenceEndpoint()` endpoint selector. Implement `printDockerGpuSandboxInferenceVerificationFailure()` printer for operator diagnostics. |
| Update main orchestration and options types `src/lib/onboard/docker-gpu-local-inference.ts` | Update `GpuSandboxAfterReadyOptions` to carry `deps?: DockerGpuSandboxInferenceVerifyDeps`. Refactor `verifyGpuSandboxAfterReady()` to skip inference gate when patch inactive, call `verifyDockerGpuSandboxLocalInference()` with sandbox-runtime deps, log HTTP code on success, print failures via new printer, and exit with status 1 on failure. |
| Integrate preserve-network enforcement into sandbox creation `src/lib/onboard.ts` | Call `enforceDockerGpuPatchPreserveNetwork()` during sandbox setup to downgrade host-network mode for local providers. Set `sandboxInferenceBaseUrlOverride` to `null`. Remove prior `configureLocalInferenceForDockerGpuHostNetwork()` call from onboarding resume. |
| Unit test coverage for new verification flow `src/lib/onboard/docker-gpu-local-inference.test.ts` | Update imports and mock setup to use typed sandbox exec mock. Add tests for `enforceDockerGpuPatchPreserveNetwork()`, `getSandboxRuntimeInferenceEndpoint()`, and `verifyDockerGpuSandboxLocalInference()` (skip/soft-skip/ok/fail modes, retries for HTTP_000, curl soft-skip, exec failures). Update `verifyGpuSandboxAfterReady()` tests to use sandbox exec stubs and new verification outcomes; update failure-printer assertions. |
| E2E test updates for preserve-network sandbox inference `test/e2e/test-gpu-e2e.sh` | Update Phase 4 GPU sandbox reachability gate to assert new sandbox-runtime log marker, skip when patch recreation wasn't exercised, and verify host-network downgrade for local inference. Simplify Phase 5 by removing conditional docker exec vs sandbox exec branching. Update `run_sandbox_inference_probe()` to always execute curl via `openshell sandbox exec` inside the sandbox network namespace. |
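
The distinction the probe draws between a missing curl, a broken exec path, and an HTTP result can be sketched as follows. This is an illustrative sketch under assumptions: `ExecResult`, `ProbeOutcome`, and `interpretProbeOutput` are hypothetical names (the PR's own types are `SandboxExecResult` and `RuntimeProbeOutcome`, whose exact shapes are not shown here), and it assumes the probe script exits 0 both when it prints `NO_CURL` and when curl fails and the script echoes `000`.

```typescript
// Hypothetical sketch: shapes below are assumptions, not the PR's types.
interface ExecResult {
  status: number | null; // exit status of the sandbox exec call
  stdout: string;
  stderr: string;
}

type ProbeOutcome =
  | { kind: "no-curl" }
  | { kind: "exec-failed"; detail: string }
  | { kind: "http"; code: string };

function interpretProbeOutput(result: ExecResult): ProbeOutcome {
  // The script itself always exits 0, so a non-zero status means the
  // exec path broke; it must not be mistaken for a missing curl.
  if (result.status !== 0) {
    const detail = result.stderr.trim() || `exit status ${String(result.status)}`;
    return { kind: "exec-failed", detail };
  }
  const out = result.stdout.trim();
  if (out === "NO_CURL") {
    return { kind: "no-curl" }; // caller soft-skips the gate
  }
  if (/^\d{3}$/.test(out)) {
    return { kind: "http", code: out }; // includes 000 for unreachable
  }
  return { kind: "exec-failed", detail: `unexpected probe output: ${out}` };
}
```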

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

NVIDIA/NemoClaw#4609: The main PR refactors the #4509 host-network GPU local inference reachability gate by replacing the container-based verification with a preserve-network sandbox-runtime verification flow.
NVIDIA/NemoClaw#4599: Touches the Docker GPU local inference/onboarding verification flow and verifyGpuSandboxAfterReady() orchestration, related to sandbox verification changes.
NVIDIA/NemoClaw#4407: Modifies verifyGpuSandboxAfterReady orchestration and diagnostics that overlap with the verification/error routing changes in this PR.

Suggested labels

Docker, fix, area: inference, Sandbox, Provider: Ollama, v0.0.57

Suggested reviewers

prekshivyas
zyang-dev

Poem

🐰 I hopped through logs and sandbox shells,

probing bridges where the old story dwells.
From host to preserve the network did glide,
"inference.local" answered from the inside. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 73.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(onboard): prove GPU sandbox local inference from the agent runtime (`#4509`)' clearly summarizes the main change: shifting from direct container-loopback probing to runtime-context reachability verification. |
| Linked Issues check | ✅ Passed | The PR directly addresses issue `#4509` by implementing runtime-context local inference verification via sandbox exec, downgrading host-network patching to bridge mode for local providers, and adding E2E validation. All coding requirements from the issue are met. |
| Out of Scope Changes check | ✅ Passed | All changes are focused on fixing the GPU sandbox local inference reachability issue: host-network config removal, sandbox runtime verification implementation, and E2E updates align with the stated PR objectives. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/lib/onboard/docker-gpu-local-inference.ts (1)

182-187: ⚡ Quick win

Consider shell-escaping the endpoint parameter to prevent future injection risk.

The endpoint is directly interpolated into the shell script without escaping. Currently safe because getSandboxRuntimeInferenceEndpoint always returns the constant SANDBOX_RUNTIME_INFERENCE_ENDPOINT, but the function signature accepts any string. If the endpoint ever becomes dynamic, this would be a shell injection vulnerability.

🛡️ Recommended fix: shell-escape the endpoint

One approach is to use a shell-safe quoting helper or validate the endpoint format:

 function probeSandboxRuntimeInference(
   sandboxName: string,
   endpoint: string,
   deps: {
     execInSandbox: NonNullable<DockerGpuSandboxInferenceVerifyDeps["execInSandbox"]>;
     sleep: NonNullable<DockerGpuSandboxInferenceVerifyDeps["sleep"]>;
   },
 ): RuntimeProbeOutcome {
+  // Validate endpoint format to prevent shell injection
+  if (!/^https?:\/\/[a-z0-9._\/-]+$/i.test(endpoint)) {
+    return {
+      kind: "exec-failed",
+      detail: `invalid endpoint format: ${endpoint}`,
+    };
+  }
   const script =
     `if ! command -v curl >/dev/null 2>&1; then echo NO_CURL; exit 0; fi; ` +
     `code=$(curl -so /dev/null -w '%{http_code}' ` +

Alternatively, use shell single-quoting (note: even single quotes need escaping if the endpoint contains single quotes):

-    `--max-time ${DOCKER_GPU_INFERENCE_PROBE_MAX_TIME_SECS} ${endpoint} 2>/dev/null || echo 000); ` +
+    `--max-time ${DOCKER_GPU_INFERENCE_PROBE_MAX_TIME_SECS} '${endpoint.replace(/'/g, "'\\''")}' 2>/dev/null || echo 000); ` +
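
The single-quoting approach suggested here can be factored into a small helper. The sketch below is illustrative: `shellSingleQuote` and `buildCurlProbe` are hypothetical names, not functions confirmed to exist in the repository, and the curl flags simply mirror the ones quoted in the review.

```typescript
// Hypothetical helper: POSIX single-quote a value for safe shell interpolation.
// Inside single quotes nothing is special except the quote itself, so each
// embedded ' is rewritten as '\'' (close quote, escaped quote, reopen quote).
function shellSingleQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Illustrative use: build the curl fragment with a quoted endpoint.
function buildCurlProbe(endpoint: string, maxTimeSecs: number): string {
  return (
    `curl -so /dev/null -w '%{http_code}' ` +
    `--max-time ${maxTimeSecs} ${shellSingleQuote(endpoint)}`
  );
}
```

With this shape, an endpoint such as `https://inference.local/v1/models` is passed through unchanged apart from the surrounding quotes, while a value containing `;`, `$`, or a quote character stays a single inert shell word.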

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/docker-gpu-local-inference.ts` around lines 182 - 187, The
interpolated endpoint in the constructed shell `script` string is not
shell-escaped and could permit injection if `getSandboxRuntimeInferenceEndpoint`
ever returns a dynamic value; update the code that builds `script` (the template
string assigned to `script` in docker-gpu-local-inference.ts) to safely
quote/escape the `endpoint` variable before inserting it (e.g., single-quote the
value and replace any embedded single quotes with the POSIX-safe '\'' sequence,
or use a shell-quoting helper such as printf %q), so the curl invocation uses
the escaped endpoint and all tests/usage of DOCKER_GPU_INFERENCE_PROBE_*
constants remain unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/test-gpu-e2e.sh`:
- Around line 569-576: run_sandbox_inference_probe currently clears
sandbox_probe_failure and discards the exit status of the probe by using "||
true", which hides TIMEOUT_CMD / openshell sandbox exec failures; change the
function so it captures the command's exit code and sets sandbox_probe_failure
on non-zero exit (while still capturing stdout/stderr into sandbox_response),
e.g. run the probe via TIMEOUT_CMD openshell sandbox exec ... -- sh -lc
"$sandbox_curl_cmd", save its output into sandbox_response and its exit code
into a local variable, and if the exit code is non-zero set
sandbox_probe_failure to a descriptive value (or return non-zero) instead of
swallowing the failure. Ensure references: run_sandbox_inference_probe,
sandbox_probe_failure, sandbox_response, TIMEOUT_CMD, openshell sandbox exec,
and $sandbox_curl_cmd.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0fd7bd0b-e945-4416-827e-5eef9c60f2ec

📥 Commits

Reviewing files that changed from the base of the PR and between d9db199 and 1118a90.

📒 Files selected for processing (4)

src/lib/onboard.ts
src/lib/onboard/docker-gpu-local-inference.test.ts
src/lib/onboard/docker-gpu-local-inference.ts
test/e2e/test-gpu-e2e.sh

NVIDIA#4509)

PR NVIDIA#4609 verified host-network GPU local inference with `docker exec` against the recreated `--network host` container, whose main network namespace IS the host's, so the probe passed while the OpenClaw agent, which runs in OpenShell's isolated sandbox network namespace, still got ECONNREFUSED on the direct 127.0.0.1 provider URL. The sandbox namespace cannot reach the host loopback even under `--network host` (see detectSandboxFallbackDns), so the direct-loopback wiring was unreachable.

- Never pin OpenClaw to a direct container-loopback inference URL; for local providers, downgrade an opted-in host-network GPU patch to the OpenShell bridge so inference routes through the reachable inference.local path (host networking is not needed for GPU access).
- Re-run the sandbox bridge reachability probe (with UFW auto-fix) after the downgrade, since gateway startup skipped it under host mode.
- Replace the docker-exec gate with a runtime-context probe via `openshell sandbox exec` that hits inference.local exactly as the agent does, requiring 2xx; 000/4xx/5xx fail with actionable recovery. Soft-skip only when the sandbox image genuinely lacks curl.
- Update the GPU E2E to prove inference through `openshell sandbox exec` (the real runtime), removing the docker-exec shortcut that masked the bug.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

yimoj · 2026-06-09T05:35:06Z

Addressed both CodeRabbit findings in b005d8646:

docker-gpu-local-inference.ts: the probe endpoint is now POSIX single-quoted (with embedded-quote escaping) before interpolation into the openshell sandbox exec script, so it stays injection-safe even if the endpoint ever becomes dynamic.
test-gpu-e2e.sh: run_sandbox_inference_probe now captures the exec exit status and sets sandbox_probe_failure (timeout → 124, otherwise the exec error) instead of swallowing it with || true, so timeouts/exec errors fail fast with an accurate reason rather than a misleading "expected PONG".
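
The exit-status handling described for `run_sandbox_inference_probe` lives in a shell script; the same mapping is sketched here in TypeScript purely for illustration. `describeProbeFailure` is a hypothetical name, and the message wording is an assumption rather than the E2E script's actual output.

```typescript
// Hypothetical sketch of mapping a probe's exit status to a failure reason,
// instead of discarding the status (the old `|| true` behaviour).
function describeProbeFailure(exitStatus: number, output: string): string | null {
  if (exitStatus === 0) {
    return null; // no exec-level failure; the response body is checked separately
  }
  if (exitStatus === 124) {
    // 124 is the status the timeout wrapper reports when the time limit is hit.
    return "sandbox probe timed out (exit 124)";
  }
  const detail = output.trim() || "no output";
  return `sandbox exec failed (exit ${exitStatus}): ${detail}`;
}
```

Surfacing the exec-level reason first means a timeout is reported as a timeout, rather than as a response that merely failed to contain the expected text.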

## Summary

- Add v0.0.62 release notes from Discussion #5100 and link release highlights to the relevant docs pages.
- Document the release's GPU sandbox recreation, sandbox-side local inference verification, and Hermes dashboard port guard in the command and inference references.
- Refresh generated NemoClaw user skills for the release-prep docs set.

## Source Summary

- #4956 -> `docs/reference/commands.mdx`: Document CDI-first Docker GPU recreation behavior for Linux Docker-driver sandboxes.
- #5024 -> `docs/inference/use-local-inference.mdx`: Document sandbox-runtime verification of the `inference.local` local inference route.
- #5018 -> `docs/reference/commands.mdx`: Document Jetson/Tegra device-node group propagation for sandbox CUDA initialization.
- #5012, #4763, #4706, #5030, #5015 -> `docs/about/release-notes.mdx`: Summarize onboarding and recovery reliability fixes, including the reserved Hermes API port guard.
- #5017 and #5043 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Summarize mutable OpenClaw config recovery and host-side `agents list` coverage.
- #5010 and #5016 -> `docs/about/release-notes.mdx`: Summarize Hermes upstream metadata visibility and WhatsApp QR rendering reliability.
- #5045 and prior source docs in the v0.0.62 range -> `.agents/skills/`: Refresh generated user-skill references from the current docs source.

## Skipped

- #5019 -> skipped for new prose because it touched `openclaw-sandbox-permissive.yaml`, which matches `docs/.docs-skip`. Existing source docs remain the source for generated skill synchronization.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `npm run docs` (passes; Fern reports 0 errors and 1 hidden warning)
- Pre-commit hooks passed during commit, including docs-to-skills verification, markdown lint, gitleaks, and skills YAML tests.

## Summary by CodeRabbit

* **New Features**
  * Added `nemoclaw <name> agents list` command.
  * v0.0.62 release notes added summarizing onboarding and recovery improvements.
* **Bug Fixes**
  * Improved GPU sandbox onboarding reliability (NVIDIA CDI path, Jetson/Tegra device handling).
  * Better local inference verification and recovery for Linux Docker-driver GPU sandboxes.
  * Quieter/earlier handling of onboarding drift and port collisions.
* **Documentation**
  * Expanded GPU passthrough, inference verification, writable paths (`/dev/pts`), port 8642 restriction, and command examples.

---------

Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>

yimoj force-pushed the fix/4509-ollama-host-network-runtime branch from 1118a90 to 564a0ae Compare June 9, 2026 05:28

coderabbitai Bot reviewed Jun 9, 2026

Comment thread test/e2e/test-gpu-e2e.sh

yimoj force-pushed the fix/4509-ollama-host-network-runtime branch from 564a0ae to b005d86 Compare June 9, 2026 05:34

yimoj added the v0.0.62 Release target label Jun 9, 2026

cv approved these changes Jun 9, 2026

cv merged commit 4c2de7e into NVIDIA:main Jun 9, 2026
33 checks passed

miyoungc mentioned this pull request Jun 10, 2026

docs: refresh v0.0.62 release docs #5157

Merged
