
fix(onboard): prove GPU sandbox local inference from the agent runtime (#4509) #5024

Merged
cv merged 1 commit into NVIDIA:main from yimoj:fix/4509-ollama-host-network-runtime
Jun 9, 2026
@yimoj yimoj commented Jun 9, 2026


Summary

Reopened #4509: on an Ubuntu 24.04 GPU host-network setup, onboard printed "local inference reachable" yet the agent then failed with ECONNREFUSED / "LLM request failed: network connection error". PR #4609 proved reachability with docker exec against the recreated --network host container — whose main network namespace is the host's — but OpenClaw runs in OpenShell's isolated sandbox network namespace, which cannot reach the host loopback even under --network host. So the direct 127.0.0.1 provider URL was unreachable for the agent while the probe falsely passed. This fixes the URL/network mapping and verifies it from the real runtime context.

Related Issue

Fixes #4509

Changes

  • No direct container-loopback inference URL. For local providers, an opted-in host-network GPU patch (NEMOCLAW_DOCKER_GPU_PATCH_NETWORK=host) is downgraded to the OpenShell bridge so inference routes through the reachable inference.local path. Host networking is unnecessary for GPU device access (that comes from the GPU mode flags). Non-local (cloud/routed/custom) GPU sandboxes are untouched.
  • Bridge reachability re-checked after the downgrade (with UFW auto-fix), since gateway startup skipped that probe while host networking was still requested.
  • Runtime-context reachability gate. The post-ready gate now probes https://inference.local/v1/models via openshell sandbox exec — the exact network namespace and route OpenClaw uses — instead of docker exec. Success requires a 2xx; 000 (ECONNREFUSED), 4xx (route/auth misconfig), and 5xx (backend down) fail with actionable recovery. A genuinely missing curl soft-skips (OpenClaw's HTTP client does not need it); a broken sandbox exec path fails rather than masquerading as missing-curl.
  • GPU E2E (test/e2e/test-gpu-e2e.sh) now proves inference through openshell sandbox exec (the real runtime) and asserts the new gate, removing the docker exec shortcut that masked the bug.
  • src/lib/onboard.ts stays net-neutral (orchestration lives in src/lib/onboard/).
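
The pass/fail rules of the runtime-context gate described above can be sketched as a small classifier. This is a minimal illustration, not the code in docker-gpu-local-inference.ts: `ProbeVerdict` and `classifyProbe` are hypothetical names, and the only details taken from the PR are the `NO_CURL` sentinel, the 2xx-only success rule, and the requirement that an exec failure is not treated as a missing curl.

```typescript
// Hypothetical types and names, for illustration only.
type ProbeVerdict =
  | { kind: "ok"; httpCode: number }
  | { kind: "soft-skip"; reason: string }
  | { kind: "failed"; reason: string };

// `exitStatus` is the exit status of the sandbox exec call itself;
// `stdout` is what the probe script printed inside the sandbox.
function classifyProbe(exitStatus: number, stdout: string): ProbeVerdict {
  const out = stdout.trim();
  if (exitStatus !== 0) {
    // A broken exec path must fail; it is not the same as a missing curl.
    return { kind: "failed", reason: `sandbox exec failed (exit ${exitStatus})` };
  }
  if (out === "NO_CURL") {
    return { kind: "soft-skip", reason: "curl is not present in the sandbox image" };
  }
  if (!/^\d{3}$/.test(out)) {
    return { kind: "failed", reason: `unparseable probe output: ${JSON.stringify(out)}` };
  }
  const code = Number(out);
  if (code >= 200 && code < 300) return { kind: "ok", httpCode: code };
  if (code === 0) return { kind: "failed", reason: "connection failed (HTTP 000)" };
  if (code >= 400 && code < 500) {
    return { kind: "failed", reason: `route or auth misconfiguration (HTTP ${code})` };
  }
  if (code >= 500) return { kind: "failed", reason: `backend down (HTTP ${code})` };
  return { kind: "failed", reason: `unexpected HTTP ${code}` };
}
```

Checking the exec exit status before looking for the sentinel is what keeps a failing `openshell sandbox exec` from being reported as a harmless skip.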

Type of Change

  • Code change (feature, bug fix, or refactor)

Verification

  • npx prek run --files on the changed files (TS/biome/spdx/shellcheck clean; the only failures were unrelated env-flakes — missing plugin node_modules and 5s CLI-spawn timeouts under a loaded host — which pass with deps installed and a normal timeout: 152/152)
  • npm run build:cli, npm run typecheck:cli
  • npx vitest run for the gate (21), test/onboard.test.ts (66), docker-gpu-patch (50), inference/local (65), provider-inference (13), docker-gpu-sandbox-create (5)
  • Tests added/updated for new and changed behavior (runtime-context probe, 2xx-only, local-only downgrade + bridge re-check, exec-failure vs missing-curl)
  • No secrets, API keys, or credentials committed

Reporter-workflow E2E evidence

Full reporter reproduction requires Ubuntu 24.04 + NVIDIA GPU + native Docker (host-network GPU patch), which is not available on this CI-less dev host. The exact workflow is covered by the GPU pipeline E2E (test/e2e/test-gpu-e2e.sh, Brev GPU runner), which this PR extends to verify local inference through openshell sandbox exec (the agent runtime netns) and to assert the runtime-context gate — so a future regression cannot pass via the container-main-namespace shortcut.

The root-cause mechanism was reproduced locally and hermetically (no GPU needed), modeling the OpenShell Docker-driver topology — a --network host container plus an inner unshare -n namespace (how OpenShell runs the sandbox agent):

[A] container MAIN netns (== host loopback under --network host; what docker exec / PR #4609 hit):
    http_code=200   RESULT: OK-MAIN  (reaches host Ollama)
[B] INNER netns via unshare -n (== OpenShell sandbox agent runtime / openshell sandbox exec):
    http_code=000   RESULT: FAIL-INNER (ECONNREFUSED — matches the reporter)

This confirms why the docker exec probe passed while the agent got ECONNREFUSED, and why routing through the OpenShell-managed inference.local path (on the bridge) is the reachable fix.
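
The [A]/[B] result can also be pictured with a toy model in which every network namespace owns its own loopback. This is only a conceptual sketch: `NetNs` is a hypothetical class, no real namespaces are created, and port 11434 is assumed as the port Ollama listens on.

```typescript
// Toy model: each network namespace has its own set of ports with a
// listener on 127.0.0.1. Nothing here touches the real network stack.
class NetNs {
  readonly name: string;
  private readonly loopbackListeners = new Set<number>();

  constructor(name: string) {
    this.name = name;
  }

  listen(port: number): void {
    this.loopbackListeners.add(port);
  }

  // Mimics curl's %{http_code}: "200" when a listener answers,
  // "000" when the connection is refused.
  probeLoopback(port: number): string {
    return this.loopbackListeners.has(port) ? "200" : "000";
  }
}

const OLLAMA_PORT = 11434; // assumed port for the host Ollama

const host = new NetNs("host");
host.listen(OLLAMA_PORT);

// --network host: the container's main namespace IS the host namespace.
const containerMain = host;
// unshare -n: a fresh namespace whose loopback has no listeners.
const sandboxInner = new NetNs("sandbox-inner");

const mainCode = containerMain.probeLoopback(OLLAMA_PORT);
const innerCode = sandboxInner.probeLoopback(OLLAMA_PORT);
console.log(`[A] main netns:  http_code=${mainCode}`);
console.log(`[B] inner netns: http_code=${innerCode}`);
```

In this model the probe in [A] sees the host listener because the two namespaces are the same object, while [B] starts from an empty loopback.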


Signed-off-by: Yimo Jiang yimoj@nvidia.com

Summary by CodeRabbit

  • Bug Fixes

    • Verify GPU local inference from inside the sandbox runtime (not via host-network probes), reducing false positives and handling curl/unreachability scenarios more robustly.
  • Refactor

    • Default Docker GPU patching for local providers now uses the OpenShell-managed bridge instead of host networking to improve inference accessibility and consistency.
  • Tests

    • End-to-end and unit tests updated to exercise the sandbox-side inference path and cover success, skip, retry, and failure cases.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026


Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 93d98021-3e81-47e6-8e13-9dadf97804c2

📥 Commits

Reviewing files that changed from the base of the PR and between 1118a90 and b005d86.

📒 Files selected for processing (4)
  • src/lib/onboard.ts
  • src/lib/onboard/docker-gpu-local-inference.test.ts
  • src/lib/onboard/docker-gpu-local-inference.ts
  • test/e2e/test-gpu-e2e.sh
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/lib/onboard/docker-gpu-local-inference.test.ts
  • src/lib/onboard.ts
  • src/lib/onboard/docker-gpu-local-inference.ts

📝 Walkthrough

Walkthrough

The PR replaces Docker GPU host-network local inference verification with a preserve-network bridge mode and sandbox-runtime reachability probes. For local inference providers, host-network patching is downgraded to the OpenShell-managed bridge and reachability is verified by executing a bounded curl probe inside the sandbox via openshell sandbox exec against inference.local. Unit tests and E2E tests are updated accordingly.
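
The downgrade rule in this walkthrough can be sketched as a pure function. The names below (`resolveGpuPatchNetwork`, `LOCAL_PROVIDERS`) and the provider identifiers are assumptions for illustration; the PR's own helpers are isLocalInferenceProvider() and enforceDockerGpuPatchPreserveNetwork(), whose signatures are not shown here.

```typescript
type GpuPatchNetwork = "host" | "bridge";

// Assumed provider identifiers; the real set of local providers may differ.
const LOCAL_PROVIDERS = new Set<string>(["ollama"]);

interface NetworkDecision {
  network: GpuPatchNetwork;
  downgraded: boolean;
}

// Host networking is kept only when the provider is not local; a local
// provider is always routed over the bridge so inference.local is reachable.
function resolveGpuPatchNetwork(
  requested: GpuPatchNetwork,
  provider: string,
): NetworkDecision {
  if (requested === "host" && LOCAL_PROVIDERS.has(provider)) {
    return { network: "bridge", downgraded: true };
  }
  return { network: requested, downgraded: false };
}
```

A caller would re-run the bridge reachability check only when `downgraded` is true, which matches the re-check described in the PR summary.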

Changes

GPU Sandbox Preserve-Network Inference Verification

| Layer | File(s) | Summary |
|---|---|---|
| Local provider detection and preserve-network enforcement | src/lib/onboard/docker-gpu-local-inference.ts | Add isLocalInferenceProvider() helper, define SANDBOX_RUNTIME_INFERENCE_ENDPOINT, and implement enforceDockerGpuPatchPreserveNetwork() to downgrade host-network mode to OpenShell bridge for local providers and optionally reverify bridge reachability. |
| Sandbox-runtime inference verification types and probe | src/lib/onboard/docker-gpu-local-inference.ts | Define SandboxExecResult, DockerGpuSandboxInferenceVerifyDeps, and DockerGpuSandboxInferenceVerification types. Implement probeSandboxRuntimeInference() to run a bounded curl script via sandbox exec, emit NO_CURL sentinel when curl is missing, parse HTTP codes, and distinguish exec failures from unreachable endpoints. |
| Verification orchestration and diagnostic output | src/lib/onboard/docker-gpu-local-inference.ts | Implement verifyDockerGpuSandboxLocalInference() orchestrator (skip unless patch active + local provider, soft-skip on missing curl, return ok/failed with provider/endpoint/detail/recovery). Add getSandboxRuntimeInferenceEndpoint() endpoint selector. Implement printDockerGpuSandboxInferenceVerificationFailure() printer for operator diagnostics. |
| Update main orchestration and options types | src/lib/onboard/docker-gpu-local-inference.ts | Update GpuSandboxAfterReadyOptions to carry deps?: DockerGpuSandboxInferenceVerifyDeps. Refactor verifyGpuSandboxAfterReady() to skip inference gate when patch inactive, call verifyDockerGpuSandboxLocalInference() with sandbox-runtime deps, log HTTP code on success, print failures via new printer, and exit with status 1 on failure. |
| Integrate preserve-network enforcement into sandbox creation | src/lib/onboard.ts | Call enforceDockerGpuPatchPreserveNetwork() during sandbox setup to downgrade host-network mode for local providers. Set sandboxInferenceBaseUrlOverride to null. Remove prior configureLocalInferenceForDockerGpuHostNetwork() call from onboarding resume. |
| Unit test coverage for new verification flow | src/lib/onboard/docker-gpu-local-inference.test.ts | Update imports and mock setup to use typed sandbox exec mock. Add tests for enforceDockerGpuPatchPreserveNetwork(), getSandboxRuntimeInferenceEndpoint(), and verifyDockerGpuSandboxLocalInference() (skip/soft-skip/ok/fail modes, retries for HTTP_000, curl soft-skip, exec failures). Update verifyGpuSandboxAfterReady() tests to use sandbox exec stubs and new verification outcomes; update failure-printer assertions. |
| E2E test updates for preserve-network sandbox inference | test/e2e/test-gpu-e2e.sh | Update Phase 4 GPU sandbox reachability gate to assert new sandbox-runtime log marker, skip when patch recreation wasn't exercised, and verify host-network downgrade for local inference. Simplify Phase 5 by removing conditional docker exec vs sandbox exec branching. Update run_sandbox_inference_probe() to always execute curl via openshell sandbox exec inside the sandbox network namespace. |
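
The "retries for HTTP_000" behavior mentioned in the test row can be sketched with injected `probe` and `sleep` dependencies, similar in spirit to the deps object the summary describes. The names, the attempt count, the delay, and the choice to retry only on 000 are all assumptions made for this illustration.

```typescript
// Hypothetical dependency shape; the real DockerGpuSandboxInferenceVerifyDeps
// type is not reproduced here.
interface RetryDeps {
  probe: () => string; // curl-style http_code, e.g. "200" or "000"
  sleep: (seconds: number) => void;
}

interface RetryResult {
  code: string;
  tries: number;
}

// Retries only while the probe reports "000" (no connection), on the
// assumption that a route may still be coming up right after sandbox start.
function probeWithRetry(deps: RetryDeps, attempts: number, delaySecs: number): RetryResult {
  let code = "000";
  let tries = 0;
  while (tries < attempts) {
    tries += 1;
    code = deps.probe();
    if (code !== "000") break; // any HTTP answer is final, good or bad
    if (tries < attempts) deps.sleep(delaySecs);
  }
  return { code, tries };
}
```

Injecting `sleep` keeps unit tests fast, since a stub can count calls instead of waiting.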

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4609: This PR refactors the #4509 host-network GPU local inference reachability gate by replacing the container-based verification with a preserve-network sandbox-runtime verification flow.
  • NVIDIA/NemoClaw#4599: Touches the Docker GPU local inference/onboarding verification flow and verifyGpuSandboxAfterReady() orchestration, related to sandbox verification changes.
  • NVIDIA/NemoClaw#4407: Modifies verifyGpuSandboxAfterReady orchestration and diagnostics that overlap with the verification/error routing changes in this PR.

Suggested labels

Docker, fix, area: inference, Sandbox, Provider: Ollama, v0.0.57

Suggested reviewers

  • prekshivyas
  • zyang-dev

Poem

🐰 I hopped through logs and sandbox shells,

probing bridges where the old story dwells.
From host to preserve the network did glide,
"inference.local" answered from the inside. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 73.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(onboard): prove GPU sandbox local inference from the agent runtime (#4509)' clearly summarizes the main change: shifting from direct container-loopback probing to runtime-context reachability verification. |
| Linked Issues check | ✅ Passed | The PR directly addresses issue #4509 by implementing runtime-context local inference verification via sandbox exec, downgrading host-network patching to bridge mode for local providers, and adding E2E validation; all coding requirements from the issue are met. |
| Out of Scope Changes check | ✅ Passed | All changes are focused on fixing the GPU sandbox local inference reachability issue: host-network config removal, sandbox runtime verification implementation, and E2E updates align with the stated PR objectives. |


@yimoj yimoj force-pushed the fix/4509-ollama-host-network-runtime branch from 1118a90 to 564a0ae Compare June 9, 2026 05:28

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/lib/onboard/docker-gpu-local-inference.ts (1)

182-187: ⚡ Quick win

Consider shell-escaping the endpoint parameter to prevent future injection risk.

The endpoint is directly interpolated into the shell script without escaping. Currently safe because getSandboxRuntimeInferenceEndpoint always returns the constant SANDBOX_RUNTIME_INFERENCE_ENDPOINT, but the function signature accepts any string. If the endpoint ever becomes dynamic, this would be a shell injection vulnerability.

🛡️ Recommended fix: shell-escape the endpoint

One approach is to use a shell-safe quoting helper or validate the endpoint format:

 function probeSandboxRuntimeInference(
   sandboxName: string,
   endpoint: string,
   deps: {
     execInSandbox: NonNullable<DockerGpuSandboxInferenceVerifyDeps["execInSandbox"]>;
     sleep: NonNullable<DockerGpuSandboxInferenceVerifyDeps["sleep"]>;
   },
 ): RuntimeProbeOutcome {
+  // Validate endpoint format to prevent shell injection
+  if (!/^https?:\/\/[a-z0-9._\/-]+$/i.test(endpoint)) {
+    return {
+      kind: "exec-failed",
+      detail: `invalid endpoint format: ${endpoint}`,
+    };
+  }
   const script =
     `if ! command -v curl >/dev/null 2>&1; then echo NO_CURL; exit 0; fi; ` +
     `code=$(curl -so /dev/null -w '%{http_code}' ` +

Alternatively, use shell single-quoting (note: even single quotes need escaping if the endpoint contains single quotes):

-    `--max-time ${DOCKER_GPU_INFERENCE_PROBE_MAX_TIME_SECS} ${endpoint} 2>/dev/null || echo 000); ` +
+    `--max-time ${DOCKER_GPU_INFERENCE_PROBE_MAX_TIME_SECS} '${endpoint.replace(/'/g, "'\\''")}' 2>/dev/null || echo 000); ` +
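
The single-quoting option above can be pulled out into a standalone helper, sketched here under the assumption that the script is run by a POSIX shell. `shellSingleQuote` and `buildProbeCommand` are hypothetical names, not functions from the PR.

```typescript
// Wraps a value in single quotes for a POSIX shell. An embedded single quote
// is written as '\'' : close the quoted string, emit an escaped quote, and
// reopen the quoted string.
function shellSingleQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Builds the curl fragment of a probe script with the endpoint quoted, so the
// endpoint is always passed to curl as one literal argument.
function buildProbeCommand(endpoint: string, maxTimeSecs: number): string {
  return (
    `curl -so /dev/null -w '%{http_code}' ` +
    `--max-time ${maxTimeSecs} ${shellSingleQuote(endpoint)}`
  );
}

console.log(buildProbeCommand("https://inference.local/v1/models", 10));
```

Quoting is more permissive than the regex validation shown first: it accepts any endpoint string and makes it inert, instead of rejecting endpoints that do not match a pattern.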
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/docker-gpu-local-inference.ts` around lines 182 - 187, The
interpolated endpoint in the constructed shell `script` string is not
shell-escaped and could permit injection if `getSandboxRuntimeInferenceEndpoint`
ever returns a dynamic value; update the code that builds `script` (the template
string assigned to `script` in docker-gpu-local-inference.ts) to safely
quote/escape the `endpoint` variable before inserting it (e.g., single-quote the
value and replace any embedded single quotes with the POSIX-safe '\'' sequence,
or use a shell-quoting helper such as printf %q), so the curl invocation uses
the escaped endpoint and all tests/usage of DOCKER_GPU_INFERENCE_PROBE_*
constants remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/test-gpu-e2e.sh`:
- Around line 569-576: run_sandbox_inference_probe currently clears
sandbox_probe_failure and discards the exit status of the probe by using "||
true", which hides TIMEOUT_CMD / openshell sandbox exec failures; change the
function so it captures the command's exit code and sets sandbox_probe_failure
on non-zero exit (while still capturing stdout/stderr into sandbox_response),
e.g. run the probe via TIMEOUT_CMD openshell sandbox exec ... -- sh -lc
"$sandbox_curl_cmd", save its output into sandbox_response and its exit code
into a local variable, and if the exit code is non-zero set
sandbox_probe_failure to a descriptive value (or return non-zero) instead of
swallowing the failure. Ensure references: run_sandbox_inference_probe,
sandbox_probe_failure, sandbox_response, TIMEOUT_CMD, openshell sandbox exec,
and $sandbox_curl_cmd.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0fd7bd0b-e945-4416-827e-5eef9c60f2ec

📥 Commits

Reviewing files that changed from the base of the PR and between d9db199 and 1118a90.

📒 Files selected for processing (4)
  • src/lib/onboard.ts
  • src/lib/onboard/docker-gpu-local-inference.test.ts
  • src/lib/onboard/docker-gpu-local-inference.ts
  • test/e2e/test-gpu-e2e.sh

Comment thread test/e2e/test-gpu-e2e.sh
NVIDIA#4509)

PR NVIDIA#4609 verified host-network GPU local inference with `docker exec`
against the recreated `--network host` container, whose main network
namespace IS the host's — so the probe passed while the OpenClaw agent,
which runs in OpenShell's isolated sandbox network namespace, still got
ECONNREFUSED on the direct 127.0.0.1 provider URL. The sandbox namespace
cannot reach the host loopback even under `--network host` (see
detectSandboxFallbackDns), so the direct-loopback wiring was unreachable.

- Never pin OpenClaw to a direct container-loopback inference URL; for
  local providers, downgrade an opted-in host-network GPU patch to the
  OpenShell bridge so inference routes through the reachable
  inference.local path (host networking is not needed for GPU access).
- Re-run the sandbox bridge reachability probe (with UFW auto-fix) after
  the downgrade, since gateway startup skipped it under host mode.
- Replace the docker-exec gate with a runtime-context probe via
  `openshell sandbox exec` that hits inference.local exactly as the agent
  does, requiring 2xx; 000/4xx/5xx fail with actionable recovery. Soft-skip
  only when the sandbox image genuinely lacks curl.
- Update the GPU E2E to prove inference through `openshell sandbox exec`
  (the real runtime), removing the docker-exec shortcut that masked the bug.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@yimoj yimoj force-pushed the fix/4509-ollama-host-network-runtime branch from 564a0ae to b005d86 Compare June 9, 2026 05:34
@yimoj

yimoj commented Jun 9, 2026


Addressed both CodeRabbit findings in b005d8646:

  • docker-gpu-local-inference.ts: the probe endpoint is now POSIX single-quoted (with embedded-quote escaping) before interpolation into the openshell sandbox exec script, so it stays injection-safe even if the endpoint ever becomes dynamic.
  • test-gpu-e2e.sh: run_sandbox_inference_probe now captures the exec exit status and sets sandbox_probe_failure (timeout → 124, otherwise the exec error) instead of swallowing it with || true, so timeouts/exec errors fail fast with an accurate reason rather than a misleading "expected PONG".
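
The second fix lives in a shell script, but the idea of turning an exec exit status into an explicit failure reason can be sketched in TypeScript as well. `describeProbeExit` is a hypothetical helper; the only facts assumed are that a zero status means the exec itself succeeded and that 124 is the status the `timeout` command reports when the time limit is hit.

```typescript
// Returns null when the exec succeeded, otherwise a human-readable reason
// that can be stored in something like sandbox_probe_failure.
function describeProbeExit(exitStatus: number, stderr: string): string | null {
  if (exitStatus === 0) return null;
  if (exitStatus === 124) return "sandbox exec timed out (exit 124)";
  const detail = stderr.trim();
  return detail.length > 0
    ? `sandbox exec failed (exit ${exitStatus}): ${detail}`
    : `sandbox exec failed (exit ${exitStatus})`;
}
```

Reporting the exit status first means a timeout is never presented as a content mismatch such as the "expected PONG" message mentioned above.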

@yimoj yimoj added the v0.0.62 Release target label Jun 9, 2026
@cv cv merged commit 4c2de7e into NVIDIA:main Jun 9, 2026
33 checks passed
jyaunches pushed a commit that referenced this pull request Jun 10, 2026
## Summary
- Add v0.0.62 release notes from Discussion #5100 and link release
highlights to the relevant docs pages.
- Document the release's GPU sandbox recreation, sandbox-side local
inference verification, and Hermes dashboard port guard in the command
and inference references.
- Refresh generated NemoClaw user skills for the release-prep docs set.

## Source Summary
- #4956 -> `docs/reference/commands.mdx`: Document CDI-first Docker GPU
recreation behavior for Linux Docker-driver sandboxes.
- #5024 -> `docs/inference/use-local-inference.mdx`: Document
sandbox-runtime verification of the `inference.local` local inference
route.
- #5018 -> `docs/reference/commands.mdx`: Document Jetson/Tegra
device-node group propagation for sandbox CUDA initialization.
- #5012, #4763, #4706, #5030, #5015 -> `docs/about/release-notes.mdx`:
Summarize onboarding and recovery reliability fixes, including the
reserved Hermes API port guard.
- #5017 and #5043 -> `docs/about/release-notes.mdx`,
`docs/reference/commands.mdx`: Summarize mutable OpenClaw config
recovery and host-side `agents list` coverage.
- #5010 and #5016 -> `docs/about/release-notes.mdx`: Summarize Hermes
upstream metadata visibility and WhatsApp QR rendering reliability.
- #5045 and prior source docs in the v0.0.62 range -> `.agents/skills/`:
Refresh generated user-skill references from the current docs source.

## Skipped
- #5019 -> skipped for new prose because it touched
`openclaw-sandbox-permissive.yaml`, which matches `docs/.docs-skip`.
Existing source docs remain the source for generated skill
synchronization.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `npm run docs` (passes; Fern reports 0 errors and 1 hidden warning)
- Pre-commit hooks passed during commit, including docs-to-skills
verification, markdown lint, gitleaks, and skills YAML tests.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Added `nemoclaw <name> agents list` command.
  * v0.0.62 release notes added summarizing onboarding and recovery improvements.

* **Bug Fixes**
  * Improved GPU sandbox onboarding reliability (NVIDIA CDI path, Jetson/Tegra device handling).
  * Better local inference verification and recovery for Linux Docker-driver GPU sandboxes.
  * Quieter/earlier handling of onboarding drift and port collisions.

* **Documentation**
  * Expanded GPU passthrough, inference verification, writable paths (`/dev/pts`), port 8642 restriction, and command examples.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>

Labels

v0.0.62 Release target

Development

Successfully merging this pull request may close these issues.

[Ubuntu 24.04][Inference] GPU host-network sandbox cannot reach local Ollama provider

3 participants