Tool-call reliability: detect / document Ollama tool-call-leak failure mode and recommend vLLM with parser flag for tool-calling agents

### Description


When NemoClaw is paired with Ollama as the inference backend (the recommended local-inference path during `nemoclaw onboard`), tool-calling agent workloads can fail silently due to a class of bugs in Ollama's per-model chat-template-to-`tool_calls` parser.

### What happens

Under realistic agent prompt shape (multi-tool surface + long system prompt + sender-metadata preamble), the model emits the correct tool-call tokens, but Ollama's parser fails to extract them into the structured `tool_calls` field. The whole tool-call ends up as a stringified JSON blob inside `content` as `type: text`. OpenClaw's gateway reads `content` as plain text and never dispatches the tool. The user sees raw JSON in the TUI with no error indicator.

### What we expected

- Tool-warranted prompts → structured `tool_calls`, tool dispatch, follow-up reply with the tool result.
- Non-tool prompts → plain-text reply.

### What we actually got (with `hermes3:8b` on Ollama, OpenClaw 2026.4.9 agent shape)

- Every tool-warranted prompt → assistant message arrives as `type: text` with stringified JSON like `{"arguments":{"query":"X"},"name":"memory_search"}`. No `tool_calls`. `finish_reason: stop`. No tool dispatch. No follow-up turn. No error logged anywhere user-visible.

The same model on the same hardware works correctly when served via vLLM with `--enable-auto-tool-choice --tool-call-parser hermes` (i.e. the model is fine, the per-model template extraction in Ollama is fragile under load — see companion report `05-ollama-hermes3-tool-call-leak.md`).

### Why this is a NemoClaw issue (not just an Ollama issue)

Even if/when Ollama fixes the upstream parser, today NemoClaw:

1. Recommends Ollama via the `ollama-local` provider during `nemoclaw onboard` as the path of least resistance for local-inference users.
2. Suggests `hermes3:8b` (and other Ollama-bundled models) as tool-calling-friendly choices.
3. Has no documented diagnostic path from the visible symptom (raw JSON in the TUI) to the actual fix (switch inference engine to vLLM with a parser flag).
4. Has no startup-time probe that would catch the failure before the user runs into it in a real conversation.

### What this issue asks for

Two NemoClaw-side changes:

1. **Docs**: a "Tool-calling reliability" page that explains when Ollama-as-inference is sufficient (single-tool, low-complexity prompts, embeddings-only paths) vs. when vLLM with `--enable-auto-tool-choice --tool-call-parser <model>` is required (agent loops with 4+ tools, long system prompts, multi-turn dispatch). Include a known-good vLLM compose snippet plus the canonical `openclaw config set --batch-file` invocation for repointing the sandbox. (Filling these gaps would prevent most "my agent returns JSON in the TUI" support requests for users who do hit this.)

2. **Behavior** (optional follow-up): a startup probe in `nemoclaw <sandbox> onboard` (or `nemoclaw <sandbox> doctor` / `--check`) that issues a synthetic multi-tool request against the configured inference endpoint, inspects the response, and flags the leak symptom early. Convert silent failure into a startup-time error pointing the user at the fix.

Filing both flavors here under one issue; happy to split if maintainers prefer.



### Reproduction Steps


1. Install NemoClaw v0.0.29 on an ARM64 host (DGX Spark / GB10) with an Ollama backend. (Reproduces on x86_64 Ollama too — the bug is in the parser, not the runtime.)
2. `nemoclaw onboard` → choose `ollama-local`, select Hermes-3 (`hermes3:8b`):
   ```
   NEMOCLAW_PROVIDER=ollama NEMOCLAW_MODEL=hermes3:8b \
   NEMOCLAW_NON_INTERACTIVE=1 nemoclaw onboard
   ```
3. After successful onboarding, connect via TUI: `nemoclaw <sandbox> connect`.
4. Run a tool-warranted prompt against a sandbox with at least the `memory_search` and `sessions_send` tools active. Examples that all reproduce: "search my memory for robotics", "What is 2 + 2?", "Tell me one fact about robotics in one sentence", or simply "hello?".
5. Observe: response arrives as plain text containing stringified JSON like `{"arguments":{"query":"X"},"name":"memory_search"}`. Tool not dispatched. No follow-up turn. No error logged anywhere user-visible.

The user has no path from this symptom to "switch to vLLM with `--tool-call-parser hermes`" without external knowledge. Today the troubleshooting docs and onboarding flows don't cover this failure mode.


### Environment


- OS: Ubuntu 24.04 (Linux <host> 6.17.0-1014-nvidia aarch64)
- Hardware: NVIDIA GB10 (DGX Spark), 96 GB unified memory
- Docker: Engine 27.x
- Node.js: v22.22.2 (from nvm)
- NemoClaw: v0.0.29
- OpenShell (cluster): 0.0.36
- OpenClaw bundled: 2026.4.9 (downgraded from base-image 2026.4.24 via local Dockerfile workaround per `01-bundled-openclaw-version-mismatch.md`)
- Inference engine under test: Ollama (Docker, `ollama/ollama:latest`, ARM64), `OLLAMA_FLASH_ATTENTION=1`, `OLLAMA_KV_CACHE_TYPE=q8_0`, `OLLAMA_NUM_PARALLEL=2`, `OLLAMA_KEEP_ALIVE=0`, `--gpus all`
- Inference engine that resolves the issue: vLLM (Docker, `vllm/vllm-openai:latest`, ARM64), `--enable-auto-tool-choice --tool-call-parser hermes`
- Model: Hermes-3-Llama-3.1-8B (Q4_K_M GGUF on Ollama / FP16 safetensors on vLLM — both reproduce / resolve respectively)
- Tools active in repro session: `sessions_send`, `memory_search`, `web_fetch`, `exec` (default agent toolset for `rtfm` sandbox)


### Debug Output

```shell
Output of `nemoclaw debug --quick --sandbox <sandbox>` captured 2026-04-29 18:47 UTC, full capture at [`debug-output-2026-04-29-1847.txt`](./debug-output-2026-04-29-1847.txt). Focused excerpt below — note `provider: ollama-local` and `model: hermes3:8b` (the configuration that triggers the symptom):


$ nemoclaw --version
nemoclaw v0.0.29

═══ Onboard Session ═══

  "provider": "ollama-local",
  "model": "hermes3:8b",
  "endpointUrl": "http://host.openshell.internal:11435/v1",
  "preferredInferenceApi": "openai-completions",
  "policyPresets": ["npm","pypi","huggingface","brew","brave","local-inference"],
  "lastCompletedStep": "policies",
  "failure": null

═══ Docker ═══

ollama/ollama:latest                       (running, 127.0.0.1:11434->11434)
ghcr.io/nvidia/openshell/cluster:0.0.36    (running, 0.0.0.0:8080->30051) [openshell-cluster-nemoclaw]

═══ OpenShell ═══

Server:  https://127.0.0.1:8080  Status: Connected  Version: 0.0.36
Sandbox: <sandbox>  Namespace: openshell  Phase: Ready  Revision: 7


The bug is not a build/health failure — `nemoclaw debug --quick` reports a healthy stack. The symptom only surfaces inside the agent loop, captured under "Logs" below.
```

### Logs

```shell
Captured 2026-04-29 19:17–19:21 UTC from a live agent session (rtfm sandbox, four user prompts). All four assistant messages from the OpenClaw session log (`/sandbox/.openclaw-data/agents/main/sessions/<sessionId>.jsonl`) have `content` as stringified JSON tool-call, no `tool_calls` field, `stopReason: "stop"`:


{"type":"message","id":"d0ab96a5","timestamp":"2026-04-29T19:17:14.840Z","message":{"role":"assistant","content":[{"type":"text","text":"{\n \"arguments\": {\n \"message\": \"hello?\"\n },\n \"name\": \"sessions_send\"\n}"}],"api":"openai-completions","model":"hermes3:8b","stopReason":"stop","responseId":"chatcmpl-976"}}
{"type":"message","id":"f2e95dc7","timestamp":"2026-04-29T19:20:22.687Z","message":{"role":"assistant","content":[{"type":"text","text":"{\n \"arguments\": {\n \"query\": \"What is 2 + 2?\"\n },\n \"name\": \"memory_search\"\n}"}],"api":"openai-completions","model":"hermes3:8b","stopReason":"stop","responseId":"chatcmpl-326"}}
{"type":"message","id":"7dd727ce","timestamp":"2026-04-29T19:20:55.344Z","message":{"role":"assistant","content":[{"type":"text","text":"{\n \"arguments\": {\n \"query\": \"Tell me one fact about robotics in one sentence.\"\n },\n \"name\": \"memory_search\"\n}"}],"api":"openai-completions","model":"hermes3:8b","stopReason":"stop","responseId":"chatcmpl-678"}}
{"type":"message","id":"af9d6e2a","timestamp":"2026-04-29T19:21:32.981Z","message":{"role":"assistant","content":[{"type":"text","text":"{\n \"arguments\": {\n \"message\": \"ok\"\n },\n \"name\": \"sessions_send\"\n}"}],"api":"openai-completions","model":"hermes3:8b","stopReason":"stop","responseId":"chatcmpl-134"}}


Note `content[0].type: "text"` and `text` containing a literal stringified JSON object with `arguments` and `name` keys — the exact shape the Hermes-3 chat template emits when the `tool_calls` extraction in Ollama doesn't fire. There is no error message in any log surface (gateway log, sandbox log, ollama-auth-proxy log, or `nemoclaw <sandbox> logs`) — the gateway treats the assistant message as a normal text reply.
```

### Checklist

- [x] I confirmed this bug is reproducible
- [x] I searched existing issues and this is not a duplicate

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tool-call reliability: detect / document Ollama tool-call-leak failure mode and recommend vLLM with parser flag for tool-calling agents #2733

Description

What happens

What we expected

What we actually got (with `hermes3:8b` on Ollama, OpenClaw 2026.4.9 agent shape)

Why this is a NemoClaw issue (not just an Ollama issue)

What this issue asks for

Reproduction Steps

Environment

Debug Output

Logs

Checklist

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Tool-call reliability: detect / document Ollama tool-call-leak failure mode and recommend vLLM with parser flag for tool-calling agents #2733

Description

Description

What happens

What we expected

What we actually got (with hermes3:8b on Ollama, OpenClaw 2026.4.9 agent shape)

Why this is a NemoClaw issue (not just an Ollama issue)

What this issue asks for

Reproduction Steps

Environment

Debug Output

Logs

Checklist

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

What we actually got (with `hermes3:8b` on Ollama, OpenClaw 2026.4.9 agent shape)