fix(onboard): gate host-network GPU local inference reachability (#4509) #4609

Merged
cv merged 2 commits into NVIDIA:main from yimoj:fix/4509-host-network-ollama-proof
Jun 1, 2026

Conversation

@yimoj

@yimoj yimoj commented Jun 1, 2026

Contributor

Summary

The Docker-driver GPU host-network path recreates the sandbox with --network host and wires OpenClaw to the direct 127.0.0.1 Ollama/vLLM URL, but onboarding declared success without proving the recreated container could actually reach that endpoint. A failed host-network recreate, an unexpected non-host network mode, or a host provider binding/state problem only surfaced later as an opaque ECONNREFUSED during the first agent prompt. This adds a post-recreate reachability gate so onboarding fails early with actionable output.

Related Issue

Fixes #4509

Changes

  • Add verifyDockerGpuHostNetworkLocalInference in src/lib/onboard/docker-gpu-local-inference.ts: on the Docker-driver GPU host-network local-inference path it resolves the recreated OpenShell-managed container, asserts HostConfig.NetworkMode is host, and runs a bounded docker exec curl probe against the direct loopback health endpoint (/api/tags for Ollama, /v1/models for vLLM).
  • On failure, surface the selected endpoint, network mode, container id, and short recovery hints, then fail onboarding (no silent continue). The gate self-skips when the patch is opted out (NEMOCLAW_DOCKER_GPU_PATCH=0), when the network mode is not host, or — to avoid false negatives — when a minimal/custom image lacks curl (soft-skip with a warning).
  • Orchestrate via verifyGpuSandboxAfterReady so src/lib/onboard.ts stays net-neutral per the codebase-growth guardrail (logic lives under src/lib/onboard/).
  • Extend test/e2e/test-gpu-e2e.sh to assert the reachability proof when the direct sandbox URL is active, instead of only discovering failure during the agent prompt.
  • Add focused unit tests for the container-inspection / probe decision logic and the orchestrator.
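
The three checks described above (network mode, curl availability, bounded probe) can be sketched roughly as follows. This is an illustrative outline, not the code in docker-gpu-local-inference.ts: the function name `verifyHostNetworkReachability`, the `DockerRun` runner shape, the failure reason strings, and the 5-second curl timeout are all assumptions. Only the health paths, the HostConfig.NetworkMode check, and the `probe-tool-unavailable` skip reason come from this PR.

```typescript
// Hypothetical sketch of the gate's decision logic, with Docker injected
// so the logic can be exercised without a real daemon.
type Provider = "ollama" | "vllm";

// Assumed runner shape: takes docker CLI args, returns exit code and stdout.
type DockerRun = (args: string[]) => { code: number; stdout: string };

type GateResult =
  | { status: "ok" }
  | { status: "skipped"; reason: string }
  | { status: "failed"; reason: string; networkMode?: string };

// Health endpoints named in the PR description.
const HEALTH_PATH: Record<Provider, string> = {
  ollama: "/api/tags",
  vllm: "/v1/models",
};

function verifyHostNetworkReachability(
  containerId: string,
  provider: Provider,
  port: number,
  docker: DockerRun,
): GateResult {
  // 1. The recreated container must actually be on the host network.
  const inspect = docker(["inspect", "--format", "{{.HostConfig.NetworkMode}}", containerId]);
  const networkMode = inspect.stdout.trim();
  if (inspect.code !== 0 || networkMode !== "host") {
    return { status: "failed", reason: "unexpected-network-mode", networkMode };
  }

  // 2. Minimal images may lack curl: skip instead of reporting a false negative.
  const hasCurl = docker(["exec", containerId, "sh", "-c", "command -v curl"]);
  if (hasCurl.code !== 0) {
    return { status: "skipped", reason: "probe-tool-unavailable" };
  }

  // 3. Bounded probe of the direct loopback health endpoint.
  const url = `http://127.0.0.1:${port}${HEALTH_PATH[provider]}`;
  const probe = docker(["exec", containerId, "curl", "-sf", "--max-time", "5", url]);
  return probe.code === 0
    ? { status: "ok" }
    : { status: "failed", reason: "endpoint-unreachable", networkMode };
}
```

Injecting the runner mirrors the "mocked Docker/OpenShell adapters" approach mentioned under Verification: a unit test can pass a fake that returns a chosen network mode or exit code.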

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npm test passes (unrelated e2e-scenario framework tests flaked on the shared host's default 5s timeout under concurrent load; they pass green at a 30s timeout in isolation)
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • npm run build:cli and npm run typecheck:cli clean; biome check clean

Hardware-gated E2E gap

The full host-network proof requires an Ubuntu 24.04 + NVIDIA GPU + native Docker environment, which the triage host does not have. The container-inspection and probe decision logic is covered by unit tests with mocked Docker/OpenShell adapters; the live GPU host-network proof is exercised by test/e2e/test-gpu-e2e.sh on GPU hardware.


Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

Summary by CodeRabbit

  • New Features

    • GPU sandbox verification now runs an additional host-network reachability gate when applicable, with retries and clear, actionable failure diagnostics; onboarding may stop if this gate fails.
  • Tests

    • Added comprehensive unit tests for host-network GPU local inference scenarios.
    • Enhanced GPU end-to-end tests to validate host-network reachability when the direct sandbox URL path is used.

…DIA#4509)

The Docker-driver GPU host-network path recreates the sandbox with
--network host and wires OpenClaw to the direct 127.0.0.1 Ollama/vLLM
URL, but onboarding declared success without proving the real container
could reach that endpoint. A failed host-network recreate, an
unexpected non-host network mode, or a host provider binding problem
only surfaced later as an opaque ECONNREFUSED during an agent prompt.

Add a post-recreate verification gate
(verifyDockerGpuHostNetworkLocalInference) that, only on the
Docker-driver GPU host-network local inference path, resolves the
recreated OpenShell-managed container,
asserts HostConfig.NetworkMode is host, and runs a bounded
docker exec curl probe against the direct loopback health endpoint
(/api/tags for Ollama, /v1/models for vLLM). On failure it surfaces the
endpoint, network mode, container id, and recovery hints, then fails
onboarding early. Minimal/custom images lacking curl soft-skip with a
warning instead of a false negative.

The orchestration lives in src/lib/onboard/docker-gpu-local-inference.ts
(verifyGpuSandboxAfterReady) so onboard.ts stays net-neutral per the
codebase-growth guardrail. Extends test/e2e/test-gpu-e2e.sh to assert
the reachability proof when the direct sandbox URL is active.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: dee57132-475b-4427-863f-2c97636decca

📥 Commits

Reviewing files that changed from the base of the PR and between 7f0388d and 28ce188.

📒 Files selected for processing (2)
  • src/lib/onboard/docker-gpu-local-inference.test.ts
  • src/lib/onboard/docker-gpu-local-inference.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/lib/onboard/docker-gpu-local-inference.test.ts
  • src/lib/onboard/docker-gpu-local-inference.ts

📝 Walkthrough

Walkthrough

The PR adds a post-ready reachability verification gate for GPU sandboxes using Docker host-network patching. Onboarding now runs the GPU proof and, when applicable, verifies the recreated container can reach the provider's local inference endpoint via curl probing with retries, emitting diagnostics or exiting on failure.

Changes

Docker GPU host-network reachability verification

| Layer | File(s) | Summary |
|---|---|---|
| Host-network verification infrastructure | src/lib/onboard/docker-gpu-local-inference.ts | Reworked imports and added timeout/retry constants, extended DockerGpuLocalInferenceOptions with optional platform parameter, and refactored shouldUseDockerGpuPatchHostNetwork to pass environment and platform context to the patch decision helper. Exported types for verification dependencies and results. |
| Host-network inference verification implementation | src/lib/onboard/docker-gpu-local-inference.ts | Implemented verifyDockerGpuHostNetworkLocalInference as a gated verification step: skips when patch is inactive or provider is non-local, resolves the recreated container, validates host-network mode, probes reachability with curl and structured retries, soft-skips if curl is missing, otherwise returns success or failure with recovery guidance. Added printDockerGpuHostNetworkInferenceVerificationFailure to emit formatted diagnostics including container, network mode, endpoint, and recovery steps. |
| GPU sandbox post-ready orchestrator | src/lib/onboard/docker-gpu-local-inference.ts | Implemented verifyGpuSandboxAfterReady to orchestrate GPU proof and optional host-network reachability gate: runs direct GPU proof first, skips host-network verification if patch is inactive, otherwise runs reachability verifier, logs success or prints diagnostics and exits with code 1 on failure. |
| Onboard flow GPU verification integration | src/lib/onboard.ts | Replaced direct verifyDirectSandboxGpu(sandboxName) try/catch with dockerGpuLocalInference.verifyGpuSandboxAfterReady(...), passing sandbox name, gateway, patch state, verifier, selected mode, and runCaptureOpenshell context to coordinate both GPU proof and host-network readiness gating. |
| Host-network verification test suite | src/lib/onboard/docker-gpu-local-inference.test.ts | Added comprehensive Vitest coverage: shouldUseDockerGpuPatchHostNetwork behavior for Linux Docker-driver host-network path only; verifyDockerGpuHostNetworkLocalInference skip conditions (inactive patch, non-local provider, missing patch), successful host-network reconciliation with endpoint probe, failure cases for missing/incorrect-mode containers, probe retry behavior with sleep timing, soft-skip when curl is unavailable; verifyGpuSandboxAfterReady orchestration with GPU proof + inference gate, failure routing through error sink + exit, and gate bypass when patch is inactive; printDockerGpuHostNetworkInferenceVerificationFailure formatted output with container, network mode, endpoint, and recovery hints. |
| E2E GPU host-network verification gate | test/e2e/test-gpu-e2e.sh | Added conditional log-based validation that checks for the direct sandbox URL onboarding message and requires a corresponding host-network local inference reachability confirmation, failing the test if the direct URL path is active but reachability is not proven. |
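
The walkthrough mentions "structured retries" and tests that check "probe retry behavior with sleep timing". A minimal sketch of how such a bounded retry loop might look, assuming a synchronous probe and an injected sleep function; `probeWithRetries`, `RetryOptions`, and the attempt/delay values are hypothetical and do not reflect the PR's actual constants:

```typescript
// Hypothetical retry helper: the real timeout/retry constants are not shown
// in this PR conversation, so callers supply them here.
interface RetryOptions {
  attempts: number; // total tries, including the first
  delayMs: number; // pause between tries
  sleep: (ms: number) => void; // injected so tests can record timing
}

function probeWithRetries(
  probe: () => boolean,
  opts: RetryOptions,
): { ok: boolean; tries: number } {
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    if (probe()) return { ok: true, tries: attempt };
    // Sleep only between attempts, never after the final failure.
    if (attempt < opts.attempts) opts.sleep(opts.delayMs);
  }
  return { ok: false, tries: opts.attempts };
}
```

Passing `sleep` in as a dependency lets a test assert on the recorded delays without waiting in real time.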

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Poem

🐰 I hopped through logs and curl's small cheer,

Containers greet hosts, no longer fear,
Proof and probe now dance in line,
Sandboxes ready, signals fine.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a reachability gate for host-network GPU local inference, which is the core objective of this PR. |
| Linked Issues check | ✅ Passed | The PR successfully addresses issue #4509 by implementing a reachability verification gate that ensures GPU host-network sandboxes can reach local Ollama/vLLM endpoints before onboarding completes. |
| Out of Scope Changes check | ✅ Passed | All changes are focused and scoped to implementing the host-network reachability verification gate, with no unrelated modifications introduced. |


@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard/docker-gpu-local-inference.ts`:
- Around line 293-297: The skip-path currently only calls options.log?.(...) so
the curl-missing warning is dropped when no logger is provided; update the
branch that checks containerHasCurl(containerId, dockerRunFn) to always emit the
warning (e.g., call console.warn or console.info) in addition to calling
options.log?.(...) so the operator always sees why the reachability probe was
skipped; keep the return { status: "skipped", reason: "probe-tool-unavailable" }
unchanged and make sure you reference the existing containerHasCurl,
dockerRunFn, and options.log? symbols.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5d0b971f-a5b6-499d-a4e9-c3792235bd2f

📥 Commits

Reviewing files that changed from the base of the PR and between df7d054 and 7f0388d.

📒 Files selected for processing (4)
  • src/lib/onboard.ts
  • src/lib/onboard/docker-gpu-local-inference.test.ts
  • src/lib/onboard/docker-gpu-local-inference.ts
  • test/e2e/test-gpu-e2e.sh

Comment thread src/lib/onboard/docker-gpu-local-inference.ts
…DIA#4509)

Address CodeRabbit review on PR NVIDIA#4609: the curl-missing soft-skip used
options.log?.() which silently dropped the warning when no logger was
wired, leaving the operator with no explanation for why the reachability
proof was skipped. Fall back to console.warn so the skip is always
visible. Also add concise docstrings to the new helper functions to
clear the docstring-coverage warning.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
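
The logger fallback described in this commit can be sketched as below. This is an assumption-laden illustration: `warnProbeSkipped`, `GateOptions`, and the warning text are hypothetical, and only the optional `options.log`, the `console.warn` fallback, and the returned skip result come from the review thread.

```typescript
interface GateOptions {
  log?: (message: string) => void; // optional logger, as in the review comment
}

// Hypothetical helper: emits the curl-missing warning and returns the
// unchanged soft-skip result. `warn` defaults to console.warn and is a
// parameter only so tests can capture the output.
function warnProbeSkipped(
  options: GateOptions,
  warn: (message: string) => void = console.warn,
): { status: "skipped"; reason: "probe-tool-unavailable" } {
  // Assumed wording; the real message is not quoted in this conversation.
  const message = "curl not found in the sandbox image; skipping the reachability probe";
  // Prefer the caller's logger, but never drop the warning when none is wired.
  (options.log ?? warn)(message);
  return { status: "skipped", reason: "probe-tool-unavailable" };
}
```
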
@yimoj yimoj added the v0.0.56 Release target label Jun 1, 2026
@wscurran wscurran added Docker provider: ollama Ollama local model provider behavior labels Jun 1, 2026
@wscurran

wscurran commented Jun 1, 2026

Contributor

@cv cv added v0.0.57 Release target and removed v0.0.56 Release target labels Jun 1, 2026
@cv cv merged commit ca410b5 into NVIDIA:main Jun 1, 2026
30 checks passed
miyoungc added a commit that referenced this pull request Jun 1, 2026
## Summary

- Adds the v0.0.56 release notes section with links to the deeper docs
pages for installer, status, inference, messaging, policy, and lifecycle
changes.
- Updates source docs for the remaining release-prep gaps around `uv` in
the PyPI preset, compact WhatsApp pairing guidance, and `nemoclaw
inference set` command boundaries.
- Refreshes generated `nemoclaw-user-*` skills and removes skipped
experimental command terms from generated skill surfaces.

## Source summary

- #4613 -> `docs/manage-sandboxes/lifecycle.mdx`,
`docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Documents
that public installs and `nemoclaw update` follow the maintained `lkg`
tag by default.
- #4419 -> `docs/about/release-notes.mdx`: Notes that non-interactive
Linux installs can reactivate Docker group membership and continue in
one installer run when `sg docker` is available.
- #4550 -> `docs/reference/commands.mdx`,
`docs/about/release-notes.mdx`: Captures live sandbox agent-version
probing for status, connect, and upgrade checks.
- #4609 -> `docs/inference/use-local-inference.mdx`,
`docs/about/release-notes.mdx`: Captures the GPU Docker-driver
host-network local-inference reachability gate.
- #4607 -> `docs/manage-sandboxes/messaging-channels.mdx`,
`docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Documents
compact WhatsApp QR pairing guidance and gateway/session diagnostics.
- #4582 -> `docs/manage-sandboxes/messaging-channels.mdx`,
`docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Reflects
Slack credential validation before enabling the channel.
- #4554 -> `docs/manage-sandboxes/messaging-channels.mdx`,
`docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`:
Keeps Telegram allowlist alias guidance in the generated user skills and
release notes.
- #4563 -> `docs/reference/commands.mdx`,
`docs/about/release-notes.mdx`: Includes the new `nemoclaw <name> skill
remove <skill>` command in command docs and release notes.
- #4566 -> `docs/reference/commands.mdx`,
`docs/about/release-notes.mdx`: Documents the `nemoclaw inference set`
redirect boundary when `--provider` or `--model` is missing.
- #4323 -> `docs/reference/commands.mdx`,
`docs/about/release-notes.mdx`: Captures per-sandbox status JSON
support.
- #4506 -> `docs/reference/commands.mdx`,
`docs/about/release-notes.mdx`: Captures debug command sandbox-name
validation and safer tarball writing.
- #4569 -> `docs/network-policy/integration-policy-examples.mdx`,
`docs/about/release-notes.mdx`: Documents that the `pypi` preset allows
`/usr/local/bin/uv`.
- #4579 -> `docs/network-policy/integration-policy-examples.mdx`,
`docs/about/release-notes.mdx`: Captures observable Jira preset
validation guidance.
- #4229 -> `docs/manage-sandboxes/lifecycle.mdx`,
`docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Documents
user-data preservation defaults for uninstall.
- #4399 -> `docs/reference/commands.mdx`,
`docs/about/release-notes.mdx`: Captures CPU-only sandbox intent
preservation across rebuilds.
- #4058 -> `docs/reference/commands.mdx`,
`docs/about/release-notes.mdx`: Captures safer snapshot restore behavior
around existing destinations.
- #4155 and #4460 -> skipped by `docs/.docs-skip`: Removed skipped
experimental command terms from source docs and generated skill evals
instead of documenting those features.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `npm run docs` (passes; Fern reports the pre-existing light-mode
accent contrast warning)
- `rg "permissive mode|shields down|shields up|shields status|config
rotate-token|rotate-token" .agents/skills` (no matches)
- `npm run build:cli` (run to refresh local CLI artifacts for the
pre-push TypeScript hook)
- Commit hooks passed, including `NEMOCLAW_* env-var documentation
gate`, `Verify docs-to-skills output`, `markdownlint-cli2`, `gitleaks`,
and `Test (skills YAML)`.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
* Expanded Model Router setup with YAML examples, flow diagrams, and
credential handling; strengthened agent-config immutability and
integrity guidance; messaging channels updated (Telegram aliases,
WhatsApp pairing/diagnostics); CLI docs revised (GPU detection,
inference set behavior, uninstall/rebuild preservation); overview
rebranded to NemoClaw and added v0.0.56 release notes.

* **New Features**
* Added `nemoclaw <name> channels status` (messaging diagnostics, JSON);
added `nemoclaw <name> skill remove`; Hermes no longer marked
experimental; DGX Spark quickstart sandbox-name note.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@wscurran wscurran added area: packaging Packages, images, registries, installers, or distribution bug-fix PR fixes a bug or regression platform: container Affects Docker, containerd, Podman, or images and removed area: packaging Packages, images, registries, installers, or distribution labels Jun 3, 2026
cv pushed a commit that referenced this pull request Jun 3, 2026
## Summary
- Add the missing `v0.0.57` release-notes section with links to the
detailed docs pages for command, inference, onboarding, messaging,
status, installer, and policy changes.
- Remove public references to docs-skip terms from source docs and
regenerate the NemoClaw user skills from the current Fern MDX docs.
- Carry forward generated references for the per-agent documentation
split, including Hermes-specific reference files.

## Source summary
- #4615 and #4653 -> `docs/about/release-notes.mdx`,
`docs/reference/commands.mdx`: Release notes now cover host-side
`sessions` and `agents` commands plus `NEMOCLAW_EXTRA_AGENTS_JSON`
secondary-agent baking.
- #4163, #4204, #4611, #4619, and #4676 ->
`docs/about/release-notes.mdx`,
`docs/inference/use-local-inference.mdx`: Release notes now cover
managed vLLM progress/readiness, DGX Spark model default changes, local
Ollama streaming usage, and inference route divergence warnings.
- #4267, #4601, #4609, #4642, #4645, and #4661 ->
`docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Release
notes now cover UFW auto-remediation, local-inference reachability
gates, gateway reuse/binding, cancel rollback, and policy selection
persistence.
- #4577, #4582, #4607, and #4660 -> `docs/about/release-notes.mdx`,
`docs/manage-sandboxes/messaging-channels.mdx`: Release notes now cover
Slack validation, atomic `channels add`, WhatsApp QR diagnostics, and
Slack placeholder normalization.
- #4388, #4600, #4646, and #4647 -> `docs/about/release-notes.mdx`,
`docs/reference/commands.mdx`: Release notes now cover status failure
layers, paused-container hints, Docker-driver doctor behavior, and
non-destructive stale-registry recovery.
- #4569, #4579, and #4678 -> `docs/about/release-notes.mdx`,
`docs/manage-sandboxes/lifecycle.mdx`,
`docs/network-policy/integration-policy-examples.mdx`: Release notes now
cover installer tag pinning, PyPI `uv` policy access, and observable
Jira validation.
- #4632 -> `.agents/skills/`: Regenerated user skills from the current
per-agent docs source, including newly generated Hermes reference files.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config
rotate-token|rotate-token" docs --glob "*.mdx"`
- `rg "permissive mode|shields down|shields up|shields status|config
rotate-token|rotate-token" .agents/skills --glob "*.md"`
- `npm run docs`
- `npm run build:cli`
- Commit hooks: markdownlint, docs-to-skills verification, gitleaks,
skills YAML, commitlint

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Documentation**
* Restructured documentation to clearly distinguish OpenClaw and Hermes
agent variants throughout user guides.
* Enhanced security, credential storage, and deployment guidance with
clearer setup flows.
  * Added Hermes plugin installation and ecosystem documentation.
* Improved workspace, messaging, and policy management references with
variant-specific command examples.
  * Refined troubleshooting and CLI reference sections for clarity.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
yimoj added a commit to yimoj/NemoClaw that referenced this pull request Jun 9, 2026
NVIDIA#4509)

PR NVIDIA#4609 verified host-network GPU local inference with `docker exec`
against the recreated `--network host` container, whose main network
namespace IS the host's — so the probe passed while the OpenClaw agent,
which runs in OpenShell's isolated sandbox network namespace, still got
ECONNREFUSED on the direct 127.0.0.1 provider URL. The sandbox namespace
cannot reach the host loopback even under `--network host` (see
detectSandboxFallbackDns), so the direct-loopback wiring was unreachable.

- Never pin OpenClaw to a direct container-loopback inference URL; for
  local providers, downgrade an opted-in host-network GPU patch to the
  OpenShell bridge so inference routes through the reachable
  inference.local path (host networking is not needed for GPU access).
- Re-run the sandbox bridge reachability probe (with UFW auto-fix) after
  the downgrade, since gateway startup skipped it under host mode.
- Replace the docker-exec gate with a runtime-context probe via
  `openshell sandbox exec` that hits inference.local exactly as the agent
  does, requiring 2xx; 000/4xx/5xx fail with actionable recovery. Soft-skip
  only when the sandbox image genuinely lacks curl.
- Update the GPU E2E to prove inference through `openshell sandbox exec`
  (the real runtime), removing the docker-exec shortcut that masked the bug.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cv pushed a commit that referenced this pull request Jun 9, 2026
#4509) (#5024)

## Summary

Reopened #4509: on an Ubuntu 24.04 GPU host-network setup, onboard
printed "local inference reachable" yet the agent then failed with
`ECONNREFUSED` / "LLM request failed: network connection error". PR
#4609 proved reachability with `docker exec` against the recreated
`--network host` container — whose *main* network namespace is the
host's — but OpenClaw runs in OpenShell's **isolated sandbox network
namespace**, which cannot reach the host loopback even under `--network
host`. So the direct `127.0.0.1` provider URL was unreachable for the
agent while the probe falsely passed. This fixes the URL/network mapping
and verifies it from the real runtime context.

## Related Issue

Fixes #4509

## Changes

- **No direct container-loopback inference URL.** For local providers,
an opted-in host-network GPU patch
(`NEMOCLAW_DOCKER_GPU_PATCH_NETWORK=host`) is downgraded to the
OpenShell bridge so inference routes through the reachable
`inference.local` path. Host networking is unnecessary for GPU device
access (that comes from the GPU mode flags). Non-local
(cloud/routed/custom) GPU sandboxes are untouched.
- **Bridge reachability re-checked after the downgrade** (with UFW
auto-fix), since gateway startup skipped that probe while host
networking was still requested.
- **Runtime-context reachability gate.** The post-ready gate now probes
`https://inference.local/v1/models` via `openshell sandbox exec` — the
exact network namespace and route OpenClaw uses — instead of `docker
exec`. Success requires a `2xx`; `000` (ECONNREFUSED), `4xx` (route/auth
misconfig), and `5xx` (backend down) fail with actionable recovery. A
genuinely missing `curl` soft-skips (OpenClaw's HTTP client does not
need it); a broken sandbox exec path fails rather than masquerading as
missing-curl.
- **GPU E2E** (`test/e2e/test-gpu-e2e.sh`) now proves inference through
`openshell sandbox exec` (the real runtime) and asserts the new gate,
removing the `docker exec` shortcut that masked the bug.
- `src/lib/onboard.ts` stays net-neutral (orchestration lives in
`src/lib/onboard/`).
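
The 2xx-only rule from the runtime-context gate can be illustrated with a small classifier. This is a sketch under assumptions, not the PR's code: it assumes the probe captures the status string that curl prints for `-w "%{http_code}"` (which is `000` when no HTTP response was received), and the names `classifyHttpCode`, `ProbeVerdict`, and the cause labels are hypothetical.

```typescript
type ProbeVerdict =
  | { ok: true }
  | {
      ok: false;
      cause: "connection-failed" | "route-or-auth" | "backend-error" | "unexpected-status";
    };

// Input is the raw status string from curl, e.g. "200" or "000".
function classifyHttpCode(raw: string): ProbeVerdict {
  const code = Number.parseInt(raw.trim(), 10);
  // "000" (or unparsable output) means no HTTP response, e.g. ECONNREFUSED.
  if (Number.isNaN(code) || code === 0) return { ok: false, cause: "connection-failed" };
  if (code >= 200 && code < 300) return { ok: true };
  // 4xx points at route or auth misconfiguration, 5xx at a backend that is down.
  if (code >= 400 && code < 500) return { ok: false, cause: "route-or-auth" };
  if (code >= 500 && code < 600) return { ok: false, cause: "backend-error" };
  // 1xx/3xx are not a proven 2xx, so they do not count as success.
  return { ok: false, cause: "unexpected-status" };
}
```

Keeping the classification separate from the exec call makes it easy to map each failure class to its own recovery hint.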

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Verification

- [x] `npx prek run --files` on the changed files
(TS/biome/spdx/shellcheck clean; the only failures were unrelated
env-flakes — missing plugin `node_modules` and 5s CLI-spawn timeouts
under a loaded host — which pass with deps installed and a normal
timeout: 152/152)
- [x] `npm run build:cli`, `npm run typecheck:cli`
- [x] `npx vitest run` for the gate (21), `test/onboard.test.ts` (66),
`docker-gpu-patch` (50), `inference/local` (65), `provider-inference`
(13), `docker-gpu-sandbox-create` (5)
- [x] Tests added/updated for new and changed behavior (runtime-context
probe, 2xx-only, local-only downgrade + bridge re-check, exec-failure vs
missing-curl)
- [x] No secrets, API keys, or credentials committed

### Reporter-workflow E2E evidence

Full reporter reproduction requires Ubuntu 24.04 + NVIDIA GPU + native
Docker (host-network GPU patch), which is not available on this CI-less
dev host. The exact workflow is covered by the **GPU pipeline E2E**
(`test/e2e/test-gpu-e2e.sh`, Brev GPU runner), which this PR extends to
verify local inference **through `openshell sandbox exec`** (the agent
runtime netns) and to assert the runtime-context gate — so a future
regression cannot pass via the container-main-namespace shortcut.

The root-cause *mechanism* was reproduced locally and hermetically (no
GPU needed), modeling the OpenShell Docker-driver topology — a
`--network host` container plus an inner `unshare -n` namespace (how
OpenShell runs the sandbox agent):

```
[A] container MAIN netns (== host loopback under --network host; what docker exec / PR #4609 hit):
    http_code=200   RESULT: OK-MAIN  (reaches host Ollama)
[B] INNER netns via unshare -n (== OpenShell sandbox agent runtime / openshell sandbox exec):
    http_code=000   RESULT: FAIL-INNER (ECONNREFUSED — matches the reporter)
```

This confirms why the `docker exec` probe passed while the agent got
`ECONNREFUSED`, and why routing through the OpenShell-managed
`inference.local` path (on the bridge) is the reachable fix.

---
Signed-off-by: Yimo Jiang <yimoj@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Verify GPU local inference from inside the sandbox runtime (not via
host-network probes), reducing false positives and handling
curl/unreachability scenarios more robustly.

* **Refactor**
* Default Docker GPU patching for local providers now uses the
OpenShell-managed bridge instead of host networking to improve inference
accessibility and consistency.

* **Tests**
* End-to-end and unit tests updated to exercise the sandbox-side
inference path and cover success, skip, retry, and failure cases.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Labels

bug-fix PR fixes a bug or regression platform: container Affects Docker, containerd, Podman, or images provider: ollama Ollama local model provider behavior v0.0.57 Release target


Development

Successfully merging this pull request may close these issues.

[Ubuntu 24.04][Inference] GPU host-network sandbox cannot reach local Ollama provider

3 participants