Skip to content

[Ubuntu 24.04][Inference] inference.local unreachable in sandbox after Ollama host restart #3390

@zNeill

Description

@zNeill

Description

Description

On NemoClaw v0.0.38 with a sandbox that uses ollama-local provider, stopping and restarting the host-side Ollama systemd unit breaks the sandbox's `inference.local` route. Reconnecting to the sandbox triggers the wizard's DNS-proxy auto-repair flow, which reports "DNS verification: 4 passed, 0 failed" — yet immediately prints "Warning: inference.local is still unavailable after DNS proxy repair." The subsequent `openclaw agent --agent main -m "hello"` retry hangs indefinitely on "Waiting for agent reply..." with no response.

Per the T560821 expected result:
  "After provider transient failure recovers, inference resumes WITHOUT
   requiring re-onboard or sandbox destroy; nemoclaw status transitions
   from degraded -> healthy within the documented window."

The current behavior fails this expectation: even after Ollama is fully back up on the host (curl http://localhost:11434/api/tags returns models, systemctl is-active=active), the sandbox cannot reach inference.local without a manual intervention.
Environment
Device:        2u1g-x570-1795 (10.63.136.90)
OS:            Ubuntu 24.04.4 LTS
Architecture:  x86_64
GPU:           NVIDIA RTX 6000 Ada Generation, 46068 MiB
Node.js:       v22.22.2
npm:           10.9.7
Docker:        29.4.3
OpenShell CLI: openshell 0.0.36
NemoClaw:      v0.0.38
OpenClaw:      2026.4.24
Sandbox:       ollama-base (Ollama / qwen2.5:7b)
Ollama:        0.23.1 (host systemd service)
Steps to Reproduce
1. Fresh NemoClaw v0.0.38 install
2. nemoclaw onboard --fresh --name ollama-base; pick Local Ollama, default model
3. nemoclaw ollama-base connect; verify openclaw agent works (e.g.
   openclaw agent --agent main -m "hello" --session-id baseline returns OK)
4. exit sandbox
5. sudo systemctl stop ollama
6. nemoclaw ollama-base connect
   Observe: wizard prints "inference.local is unavailable inside 'ollama-base'.
   Repairing sandbox DNS proxy..." then "DNS verification: 4 passed, 0 failed"
   then "Warning: inference.local is still unavailable after DNS proxy repair."
7. Inside sandbox: openclaw agent --agent main -m "hello" --session-id fail-test
   Observe: clear error mentioning the inference backend is unavailable. PASS.
8. exit sandbox
9. sudo systemctl start ollama; verify systemctl is-active=active and
   curl -s http://localhost:11434/api/tags returns qwen2.5:7b
10. nemoclaw ollama-base connect (AGAIN)
    Observe: SAME warning sequence — DNS verification 4 passed,
    but "Warning: inference.local is still unavailable after DNS proxy repair."
11. Inside sandbox: openclaw agent --agent main -m "hello" --session-id retry-test
    Observe: hangs on "Waiting for agent reply..." indefinitely (>30s, did not
    respond within typical 7B Ollama timeout).
Expected Result
After step 9 (Ollama restored on host), step 10's reconnect should either:
  (a) succeed cleanly without a "DNS proxy repair" hop (because Ollama on host
      is the upstream, and the sandbox's inference.local -> Ollama proxy
      should re-establish without filesystem mutation), OR
  (b) if DNS-proxy auto-repair is needed, the repair must ACTUALLY restore
      inference.local — not declare 4/4 verification PASS while simultaneously
      warning the route is still unavailable.

After step 10/11, openclaw agent retry must return a normal reply within
~30-60s (typical qwen2.5:7b warm response latency). The transition from
degraded -> healthy must happen automatically without `nemoclaw  destroy`
or `nemoclaw onboard --fresh`.
Actual Result
Step 6 wizard output:
  Dashboard port forward to 'ollama-base' is missing or dead.
  Re-establishing...
  → Found forward on sandbox 'ollama-base'
  ✓ Stopped forward of port 18789 for sandbox ollama-base
  ✓ Forwarding port 18789 to sandbox ollama-base in the background
    Access at: http://127.0.0.1:18789/
    Stop with: openshell forward stop 18789 ollama-base
    Failed to re-establish the dashboard port forward.
    Run `openshell forward start --background  ollama-base` manually.
    inference.local is unavailable inside 'ollama-base'. Repairing sandbox DNS proxy...
  Setting up DNS proxy in pod 'ollama-base' (10.200.0.1:53 -> 10.43.0.10)...
    [PASS] DNS forwarder running (pid=637): dns-proxy: 10.200.0.1:53 -> 10.43.0.10:53 pid=637
    [PASS] resolv.conf -> nameserver 10.200.0.1
    [PASS] iptables: UDP 10.200.0.1:53 ACCEPT rule present
    [PASS] getent hosts github.com -> 140.82.116.4    github.com
    DNS verification: 4 passed, 0 failed
    Warning: inference.local is still unavailable after DNS proxy repair.
    ✓ Connecting to sandbox 'ollama-base'

Step 11 inside sandbox (after Ollama restarted on host):
  $ openclaw agent --agent main -m "hello" --session-id retry-test
   OpenClaw 2026.4.24 (cbcfdf6) — Less clicking, more shipping...
  ◓  Waiting for agent reply…
  ◒  Waiting for agent reply….
  (no further output observed within 30s polling window)

Note: same "Warning: inference.local is still unavailable" appears in BOTH
the Ollama-down state (expected) AND the Ollama-up state (NOT expected).
This indicates the inference.local route inside the sandbox is broken
independently of whether host-side Ollama is running, once it has been
disrupted by a host-side restart.
Logs
Full T560821 per-step capture:
  /home/lab/day0-automation/20260511/report-T560821.txt

Related bugs (different scenario / fixed):
  6128212 — curl to inference.local/v1/models returns null exit code, ERR_IPC_CHANNEL_CLOSED (Open, different code path: HTTP 503 rather than DNS auto-repair)
  6115797 — TUI fails with "Connection error" — sandbox security guard blocks inference.local route (Fixed/Verified — onboard-time bug, this is a runtime recovery bug)
  6116337 — Ollama-local returns HTTP 401 (Fixed/Verified — credentials path)

Bug Details

Field Value
Priority Unprioritized
Action Dev - Open - To fix
Disposition Open issue
Module Machine Learning - NemoClaw
Keyword NemoClaw, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Inference, NemoClaw_Policy&Network

[NVB#6166935]

Metadata

Metadata

Assignees

Labels

NV QABugs found by the NVIDIA QA Teamarea: local-modelsLocal model providers, downloads, launch, or connectivityarea: providersInference provider integrations and provider behaviorarea: sandboxOpenShell sandbox lifecycle, runtime, config, or recoveryplatform: ubuntuAffects Ubuntu Linux environments

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions