Skip to content

Tool-call reliability: detect / document Ollama tool-call-leak failure mode and recommend vLLM with parser flag for tool-calling agents #2733

@camerono

Description

@camerono

Description

When NemoClaw is paired with Ollama as the inference backend (the recommended local-inference path during nemoclaw onboard), tool-calling agent workloads can fail silently due to a class of bugs in Ollama's per-model chat-template-to-tool_calls parser.

What happens

Under realistic agent prompt shape (multi-tool surface + long system prompt + sender-metadata preamble), the model emits the correct tool-call tokens, but Ollama's parser fails to extract them into the structured tool_calls field. The whole tool-call ends up as a stringified JSON blob inside content as type: text. OpenClaw's gateway reads content as plain text and never dispatches the tool. The user sees raw JSON in the TUI with no error indicator.

What we expected

  • Tool-warranted prompts → structured tool_calls, tool dispatch, follow-up reply with the tool result.
  • Non-tool prompts → plain-text reply.

What we actually got (with hermes3:8b on Ollama, OpenClaw 2026.4.9 agent shape)

  • Every tool-warranted prompt → assistant message arrives as type: text with stringified JSON like {"arguments":{"query":"X"},"name":"memory_search"}. No tool_calls. finish_reason: stop. No tool dispatch. No follow-up turn. No error logged anywhere user-visible.

The same model on the same hardware works correctly when served via vLLM with --enable-auto-tool-choice --tool-call-parser hermes (i.e. the model is fine, the per-model template extraction in Ollama is fragile under load — see companion report 05-ollama-hermes3-tool-call-leak.md).

Why this is a NemoClaw issue (not just an Ollama issue)

Even if/when Ollama fixes the upstream parser, today NemoClaw:

  1. Recommends Ollama via the ollama-local provider during nemoclaw onboard as the path of least resistance for local-inference users.
  2. Suggests hermes3:8b (and other Ollama-bundled models) as tool-calling-friendly choices.
  3. Has no documented diagnostic path from the visible symptom (raw JSON in the TUI) to the actual fix (switch inference engine to vLLM with a parser flag).
  4. Has no startup-time probe that would catch the failure before the user runs into it in a real conversation.

What this issue asks for

Two NemoClaw-side changes:

  1. Docs: a "Tool-calling reliability" page that explains when Ollama-as-inference is sufficient (single-tool, low-complexity prompts, embeddings-only paths) vs. when vLLM with --enable-auto-tool-choice --tool-call-parser <model> is required (agent loops with 4+ tools, long system prompts, multi-turn dispatch). Include a known-good vLLM compose snippet plus the canonical openclaw config set --batch-file invocation for repointing the sandbox. (Filling these gaps would prevent most "my agent returns JSON in the TUI" support requests for users who do hit this.)

  2. Behavior (optional follow-up): a startup probe in nemoclaw <sandbox> onboard (or nemoclaw <sandbox> doctor / --check) that issues a synthetic multi-tool request against the configured inference endpoint, inspects the response, and flags the leak symptom early. Convert silent failure into a startup-time error pointing the user at the fix.

Filing both flavors here under one issue; happy to split if maintainers prefer.

Reproduction Steps

  1. Install NemoClaw v0.0.29 on an ARM64 host (DGX Spark / GB10) with an Ollama backend. (Reproduces on x86_64 Ollama too — the bug is in the parser, not the runtime.)
  2. nemoclaw onboard → choose ollama-local, select Hermes-3 (hermes3:8b):
    NEMOCLAW_PROVIDER=ollama NEMOCLAW_MODEL=hermes3:8b \
    NEMOCLAW_NON_INTERACTIVE=1 nemoclaw onboard
    
  3. After successful onboarding, connect via TUI: nemoclaw <sandbox> connect.
  4. Run a tool-warranted prompt against a sandbox with at least the memory_search and sessions_send tools active. Examples that all reproduce: "search my memory for robotics", "What is 2 + 2?", "Tell me one fact about robotics in one sentence", or simply "hello?".
  5. Observe: response arrives as plain text containing stringified JSON like {"arguments":{"query":"X"},"name":"memory_search"}. Tool not dispatched. No follow-up turn. No error logged anywhere user-visible.

The user has no path from this symptom to "switch to vLLM with --tool-call-parser hermes" without external knowledge. Today the troubleshooting docs and onboarding flows don't cover this failure mode.

Environment

  • OS: Ubuntu 24.04 (Linux 6.17.0-1014-nvidia aarch64)
  • Hardware: NVIDIA GB10 (DGX Spark), 96 GB unified memory
  • Docker: Engine 27.x
  • Node.js: v22.22.2 (from nvm)
  • NemoClaw: v0.0.29
  • OpenShell (cluster): 0.0.36
  • OpenClaw bundled: 2026.4.9 (downgraded from base-image 2026.4.24 via local Dockerfile workaround per 01-bundled-openclaw-version-mismatch.md)
  • Inference engine under test: Ollama (Docker, ollama/ollama:latest, ARM64), OLLAMA_FLASH_ATTENTION=1, OLLAMA_KV_CACHE_TYPE=q8_0, OLLAMA_NUM_PARALLEL=2, OLLAMA_KEEP_ALIVE=0, --gpus all
  • Inference engine that resolves the issue: vLLM (Docker, vllm/vllm-openai:latest, ARM64), --enable-auto-tool-choice --tool-call-parser hermes
  • Model: Hermes-3-Llama-3.1-8B (Q4_K_M GGUF on Ollama / FP16 safetensors on vLLM — both reproduce / resolve respectively)
  • Tools active in repro session: sessions_send, memory_search, web_fetch, exec (default agent toolset for rtfm sandbox)

Debug Output

Output of `nemoclaw debug --quick --sandbox <sandbox>` captured 2026-04-29 18:47 UTC, full capture at [`debug-output-2026-04-29-1847.txt`](./debug-output-2026-04-29-1847.txt). Focused excerpt below — note `provider: ollama-local` and `model: hermes3:8b` (the configuration that triggers the symptom):


$ nemoclaw --version
nemoclaw v0.0.29

═══ Onboard Session ═══

  "provider": "ollama-local",
  "model": "hermes3:8b",
  "endpointUrl": "http://host.openshell.internal:11435/v1",
  "preferredInferenceApi": "openai-completions",
  "policyPresets": ["npm","pypi","huggingface","brew","brave","local-inference"],
  "lastCompletedStep": "policies",
  "failure": null

═══ Docker ═══

ollama/ollama:latest                       (running, 127.0.0.1:11434->11434)
ghcr.io/nvidia/openshell/cluster:0.0.36    (running, 0.0.0.0:8080->30051) [openshell-cluster-nemoclaw]

═══ OpenShell ═══

Server:  https://127.0.0.1:8080  Status: Connected  Version: 0.0.36
Sandbox: <sandbox>  Namespace: openshell  Phase: Ready  Revision: 7


The bug is not a build/health failure — `nemoclaw debug --quick` reports a healthy stack. The symptom only surfaces inside the agent loop, captured under "Logs" below.

Logs

Captured 2026-04-29 19:17–19:21 UTC from a live agent session (rtfm sandbox, four user prompts). All four assistant messages from the OpenClaw session log (`/sandbox/.openclaw-data/agents/main/sessions/<sessionId>.jsonl`) have `content` as stringified JSON tool-call, no `tool_calls` field, `stopReason: "stop"`:


{"type":"message","id":"d0ab96a5","timestamp":"2026-04-29T19:17:14.840Z","message":{"role":"assistant","content":[{"type":"text","text":"{\n \"arguments\": {\n \"message\": \"hello?\"\n },\n \"name\": \"sessions_send\"\n}"}],"api":"openai-completions","model":"hermes3:8b","stopReason":"stop","responseId":"chatcmpl-976"}}
{"type":"message","id":"f2e95dc7","timestamp":"2026-04-29T19:20:22.687Z","message":{"role":"assistant","content":[{"type":"text","text":"{\n \"arguments\": {\n \"query\": \"What is 2 + 2?\"\n },\n \"name\": \"memory_search\"\n}"}],"api":"openai-completions","model":"hermes3:8b","stopReason":"stop","responseId":"chatcmpl-326"}}
{"type":"message","id":"7dd727ce","timestamp":"2026-04-29T19:20:55.344Z","message":{"role":"assistant","content":[{"type":"text","text":"{\n \"arguments\": {\n \"query\": \"Tell me one fact about robotics in one sentence.\"\n },\n \"name\": \"memory_search\"\n}"}],"api":"openai-completions","model":"hermes3:8b","stopReason":"stop","responseId":"chatcmpl-678"}}
{"type":"message","id":"af9d6e2a","timestamp":"2026-04-29T19:21:32.981Z","message":{"role":"assistant","content":[{"type":"text","text":"{\n \"arguments\": {\n \"message\": \"ok\"\n },\n \"name\": \"sessions_send\"\n}"}],"api":"openai-completions","model":"hermes3:8b","stopReason":"stop","responseId":"chatcmpl-134"}}


Note `content[0].type: "text"` and `text` containing a literal stringified JSON object with `arguments` and `name` keys — the exact shape the Hermes-3 chat template emits when the `tool_calls` extraction in Ollama doesn't fire. There is no error message in any log surface (gateway log, sandbox log, ollama-auth-proxy log, or `nemoclaw <sandbox> logs`) — the gateway treats the assistant message as a normal text reply.

Checklist

  • I confirmed this bug is reproducible
  • I searched existing issues and this is not a duplicate

Metadata

Metadata

Assignees

No one assigned

    Labels

    area: inferenceInference routing, serving, model selection, or outputsarea: local-modelsLocal model providers, downloads, launch, or connectivityarea: providersInference provider integrations and provider behaviorarea: sandboxOpenShell sandbox lifecycle, runtime, config, or recoveryprovider: ollamaOllama local model provider behavior
    No fields configured for Enhancement.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions