Description
When NemoClaw is paired with Ollama as the inference backend (the recommended local-inference path during nemoclaw onboard), tool-calling agent workloads can fail silently due to a class of bugs in Ollama's per-model chat-template-to-tool_calls parser.
What happens
Under realistic agent prompt shape (multi-tool surface + long system prompt + sender-metadata preamble), the model emits the correct tool-call tokens, but Ollama's parser fails to extract them into the structured tool_calls field. The whole tool-call ends up as a stringified JSON blob inside content as type: text. OpenClaw's gateway reads content as plain text and never dispatches the tool. The user sees raw JSON in the TUI with no error indicator.
What we expected
- Tool-warranted prompts → structured
tool_calls, tool dispatch, follow-up reply with the tool result.
- Non-tool prompts → plain-text reply.
What we actually got (with hermes3:8b on Ollama, OpenClaw 2026.4.9 agent shape)
- Every tool-warranted prompt → assistant message arrives as
type: text with stringified JSON like {"arguments":{"query":"X"},"name":"memory_search"}. No tool_calls. finish_reason: stop. No tool dispatch. No follow-up turn. No error logged anywhere user-visible.
The same model on the same hardware works correctly when served via vLLM with --enable-auto-tool-choice --tool-call-parser hermes (i.e. the model is fine, the per-model template extraction in Ollama is fragile under load — see companion report 05-ollama-hermes3-tool-call-leak.md).
Why this is a NemoClaw issue (not just an Ollama issue)
Even if/when Ollama fixes the upstream parser, today NemoClaw:
- Recommends Ollama via the
ollama-local provider during nemoclaw onboard as the path of least resistance for local-inference users.
- Suggests
hermes3:8b (and other Ollama-bundled models) as tool-calling-friendly choices.
- Has no documented diagnostic path from the visible symptom (raw JSON in the TUI) to the actual fix (switch inference engine to vLLM with a parser flag).
- Has no startup-time probe that would catch the failure before the user runs into it in a real conversation.
What this issue asks for
Two NemoClaw-side changes:
-
Docs: a "Tool-calling reliability" page that explains when Ollama-as-inference is sufficient (single-tool, low-complexity prompts, embeddings-only paths) vs. when vLLM with --enable-auto-tool-choice --tool-call-parser <model> is required (agent loops with 4+ tools, long system prompts, multi-turn dispatch). Include a known-good vLLM compose snippet plus the canonical openclaw config set --batch-file invocation for repointing the sandbox. (Filling these gaps would prevent most "my agent returns JSON in the TUI" support requests for users who do hit this.)
-
Behavior (optional follow-up): a startup probe in nemoclaw <sandbox> onboard (or nemoclaw <sandbox> doctor / --check) that issues a synthetic multi-tool request against the configured inference endpoint, inspects the response, and flags the leak symptom early. Convert silent failure into a startup-time error pointing the user at the fix.
Filing both flavors here under one issue; happy to split if maintainers prefer.
Reproduction Steps
- Install NemoClaw v0.0.29 on an ARM64 host (DGX Spark / GB10) with an Ollama backend. (Reproduces on x86_64 Ollama too — the bug is in the parser, not the runtime.)
nemoclaw onboard → choose ollama-local, select Hermes-3 (hermes3:8b):
NEMOCLAW_PROVIDER=ollama NEMOCLAW_MODEL=hermes3:8b \
NEMOCLAW_NON_INTERACTIVE=1 nemoclaw onboard
- After successful onboarding, connect via TUI:
nemoclaw <sandbox> connect.
- Run a tool-warranted prompt against a sandbox with at least the
memory_search and sessions_send tools active. Examples that all reproduce: "search my memory for robotics", "What is 2 + 2?", "Tell me one fact about robotics in one sentence", or simply "hello?".
- Observe: response arrives as plain text containing stringified JSON like
{"arguments":{"query":"X"},"name":"memory_search"}. Tool not dispatched. No follow-up turn. No error logged anywhere user-visible.
The user has no path from this symptom to "switch to vLLM with --tool-call-parser hermes" without external knowledge. Today the troubleshooting docs and onboarding flows don't cover this failure mode.
Environment
- OS: Ubuntu 24.04 (Linux 6.17.0-1014-nvidia aarch64)
- Hardware: NVIDIA GB10 (DGX Spark), 96 GB unified memory
- Docker: Engine 27.x
- Node.js: v22.22.2 (from nvm)
- NemoClaw: v0.0.29
- OpenShell (cluster): 0.0.36
- OpenClaw bundled: 2026.4.9 (downgraded from base-image 2026.4.24 via local Dockerfile workaround per
01-bundled-openclaw-version-mismatch.md)
- Inference engine under test: Ollama (Docker,
ollama/ollama:latest, ARM64), OLLAMA_FLASH_ATTENTION=1, OLLAMA_KV_CACHE_TYPE=q8_0, OLLAMA_NUM_PARALLEL=2, OLLAMA_KEEP_ALIVE=0, --gpus all
- Inference engine that resolves the issue: vLLM (Docker,
vllm/vllm-openai:latest, ARM64), --enable-auto-tool-choice --tool-call-parser hermes
- Model: Hermes-3-Llama-3.1-8B (Q4_K_M GGUF on Ollama / FP16 safetensors on vLLM — both reproduce / resolve respectively)
- Tools active in repro session:
sessions_send, memory_search, web_fetch, exec (default agent toolset for rtfm sandbox)
Debug Output
Output of `nemoclaw debug --quick --sandbox <sandbox>` captured 2026-04-29 18:47 UTC, full capture at [`debug-output-2026-04-29-1847.txt`](./debug-output-2026-04-29-1847.txt). Focused excerpt below — note `provider: ollama-local` and `model: hermes3:8b` (the configuration that triggers the symptom):
$ nemoclaw --version
nemoclaw v0.0.29
═══ Onboard Session ═══
"provider": "ollama-local",
"model": "hermes3:8b",
"endpointUrl": "http://host.openshell.internal:11435/v1",
"preferredInferenceApi": "openai-completions",
"policyPresets": ["npm","pypi","huggingface","brew","brave","local-inference"],
"lastCompletedStep": "policies",
"failure": null
═══ Docker ═══
ollama/ollama:latest (running, 127.0.0.1:11434->11434)
ghcr.io/nvidia/openshell/cluster:0.0.36 (running, 0.0.0.0:8080->30051) [openshell-cluster-nemoclaw]
═══ OpenShell ═══
Server: https://127.0.0.1:8080 Status: Connected Version: 0.0.36
Sandbox: <sandbox> Namespace: openshell Phase: Ready Revision: 7
The bug is not a build/health failure — `nemoclaw debug --quick` reports a healthy stack. The symptom only surfaces inside the agent loop, captured under "Logs" below.
Logs
Captured 2026-04-29 19:17–19:21 UTC from a live agent session (rtfm sandbox, four user prompts). All four assistant messages from the OpenClaw session log (`/sandbox/.openclaw-data/agents/main/sessions/<sessionId>.jsonl`) have `content` as stringified JSON tool-call, no `tool_calls` field, `stopReason: "stop"`:
{"type":"message","id":"d0ab96a5","timestamp":"2026-04-29T19:17:14.840Z","message":{"role":"assistant","content":[{"type":"text","text":"{\n \"arguments\": {\n \"message\": \"hello?\"\n },\n \"name\": \"sessions_send\"\n}"}],"api":"openai-completions","model":"hermes3:8b","stopReason":"stop","responseId":"chatcmpl-976"}}
{"type":"message","id":"f2e95dc7","timestamp":"2026-04-29T19:20:22.687Z","message":{"role":"assistant","content":[{"type":"text","text":"{\n \"arguments\": {\n \"query\": \"What is 2 + 2?\"\n },\n \"name\": \"memory_search\"\n}"}],"api":"openai-completions","model":"hermes3:8b","stopReason":"stop","responseId":"chatcmpl-326"}}
{"type":"message","id":"7dd727ce","timestamp":"2026-04-29T19:20:55.344Z","message":{"role":"assistant","content":[{"type":"text","text":"{\n \"arguments\": {\n \"query\": \"Tell me one fact about robotics in one sentence.\"\n },\n \"name\": \"memory_search\"\n}"}],"api":"openai-completions","model":"hermes3:8b","stopReason":"stop","responseId":"chatcmpl-678"}}
{"type":"message","id":"af9d6e2a","timestamp":"2026-04-29T19:21:32.981Z","message":{"role":"assistant","content":[{"type":"text","text":"{\n \"arguments\": {\n \"message\": \"ok\"\n },\n \"name\": \"sessions_send\"\n}"}],"api":"openai-completions","model":"hermes3:8b","stopReason":"stop","responseId":"chatcmpl-134"}}
Note `content[0].type: "text"` and `text` containing a literal stringified JSON object with `arguments` and `name` keys — the exact shape the Hermes-3 chat template emits when the `tool_calls` extraction in Ollama doesn't fire. There is no error message in any log surface (gateway log, sandbox log, ollama-auth-proxy log, or `nemoclaw <sandbox> logs`) — the gateway treats the assistant message as a normal text reply.
Checklist
Description
When NemoClaw is paired with Ollama as the inference backend (the recommended local-inference path during
nemoclaw onboard), tool-calling agent workloads can fail silently due to a class of bugs in Ollama's per-model chat-template-to-tool_callsparser.What happens
Under realistic agent prompt shape (multi-tool surface + long system prompt + sender-metadata preamble), the model emits the correct tool-call tokens, but Ollama's parser fails to extract them into the structured
tool_callsfield. The whole tool-call ends up as a stringified JSON blob insidecontentastype: text. OpenClaw's gateway readscontentas plain text and never dispatches the tool. The user sees raw JSON in the TUI with no error indicator.What we expected
tool_calls, tool dispatch, follow-up reply with the tool result.What we actually got (with
hermes3:8bon Ollama, OpenClaw 2026.4.9 agent shape)type: textwith stringified JSON like{"arguments":{"query":"X"},"name":"memory_search"}. Notool_calls.finish_reason: stop. No tool dispatch. No follow-up turn. No error logged anywhere user-visible.The same model on the same hardware works correctly when served via vLLM with
--enable-auto-tool-choice --tool-call-parser hermes(i.e. the model is fine, the per-model template extraction in Ollama is fragile under load — see companion report05-ollama-hermes3-tool-call-leak.md).Why this is a NemoClaw issue (not just an Ollama issue)
Even if/when Ollama fixes the upstream parser, today NemoClaw:
ollama-localprovider duringnemoclaw onboardas the path of least resistance for local-inference users.hermes3:8b(and other Ollama-bundled models) as tool-calling-friendly choices.What this issue asks for
Two NemoClaw-side changes:
Docs: a "Tool-calling reliability" page that explains when Ollama-as-inference is sufficient (single-tool, low-complexity prompts, embeddings-only paths) vs. when vLLM with
--enable-auto-tool-choice --tool-call-parser <model>is required (agent loops with 4+ tools, long system prompts, multi-turn dispatch). Include a known-good vLLM compose snippet plus the canonicalopenclaw config set --batch-fileinvocation for repointing the sandbox. (Filling these gaps would prevent most "my agent returns JSON in the TUI" support requests for users who do hit this.)Behavior (optional follow-up): a startup probe in
nemoclaw <sandbox> onboard(ornemoclaw <sandbox> doctor/--check) that issues a synthetic multi-tool request against the configured inference endpoint, inspects the response, and flags the leak symptom early. Convert silent failure into a startup-time error pointing the user at the fix.Filing both flavors here under one issue; happy to split if maintainers prefer.
Reproduction Steps
nemoclaw onboard→ chooseollama-local, select Hermes-3 (hermes3:8b):nemoclaw <sandbox> connect.memory_searchandsessions_sendtools active. Examples that all reproduce: "search my memory for robotics", "What is 2 + 2?", "Tell me one fact about robotics in one sentence", or simply "hello?".{"arguments":{"query":"X"},"name":"memory_search"}. Tool not dispatched. No follow-up turn. No error logged anywhere user-visible.The user has no path from this symptom to "switch to vLLM with
--tool-call-parser hermes" without external knowledge. Today the troubleshooting docs and onboarding flows don't cover this failure mode.Environment
01-bundled-openclaw-version-mismatch.md)ollama/ollama:latest, ARM64),OLLAMA_FLASH_ATTENTION=1,OLLAMA_KV_CACHE_TYPE=q8_0,OLLAMA_NUM_PARALLEL=2,OLLAMA_KEEP_ALIVE=0,--gpus allvllm/vllm-openai:latest, ARM64),--enable-auto-tool-choice --tool-call-parser hermessessions_send,memory_search,web_fetch,exec(default agent toolset forrtfmsandbox)Debug Output
Logs
Checklist