[Bug]: In-turn reasoning dropped on multi-turn tool replay for non-400 openai models (gemma4/vLLM) — silent agentic-quality regression #91645

@bfox55

Bug type

Behavior bug (incorrect output/state without crash)

Summary

For openai-completions reasoning models whose API does not require reasoning_content on assistant messages (e.g. gemma4 / Gemma-4-12B on vLLM), OpenClaw drops the model's in-turn reasoning when replaying multi-turn tool conversations. Since the provider doesn't 400 on the missing field (unlike DeepSeek/Moonshot/Xiaomi), the failure is silent: no error, just degraded multi-step tool behavior — tool-call arguments intermittently collapse to {}, and the model re-issues identical tool calls because it can't see its own prior reasoning.

This is distinct from the known API-compat issues (#70392, #81419, #91558, #89660, #91106): those inject an empty reasoning_content: "" to satisfy strict provider schemas and avoid a 400. Here the provider is fine without the field — what's missing for quality is the real in-turn reasoning text, so the model keeps continuity across tool calls.

Evidence (captured off the wire)

TCP tee between OpenClaw and vLLM; one multi-turn tool exchange via openclaw agent --local --thinking high. Deepest request OpenClaw sent (system, user, 3× assistant tool-call, 3× tool):

  • ✅ structured tool_calls preserved
  • ✅ tool-call / tool-result IDs match exactly
  • ❌ reasoning_content present on 0 / 3 in-turn assistant tool-call messages

The Gemma 4 chat template re-injects in-turn reasoning: it renders a message's reasoning / reasoning_content field as <|channel>thought…<channel|>, gated to messages after the last user turn, so it correctly keeps in-turn reasoning and drops completed-turn reasoning. OpenClaw sends neither field, so re-injection never fires. Across stored trajectories on this install: 13 exec calls with empty arguments (exec requires a command), plus repeated identical tool calls within sessions.
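To make the "0 / 3" check repeatable on any captured request body, something like the following could be used. It is a minimal sketch, not OpenClaw code: the ChatMessage shape and the auditInTurnReasoning helper are hypothetical names, and the message fields are assumed to follow the usual OpenAI-style chat completions layout.

```typescript
// Hypothetical message shape for an OpenAI-style chat completions request.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content?: string | null;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
  reasoning_content?: string;
}

interface ReasoningAudit {
  inTurnToolCallMessages: number; // assistant tool-call messages after the last user message
  withReasoning: number;          // how many of those carry non-empty reasoning_content
}

// Hypothetical helper: counts in-turn assistant tool-call messages and how
// many of them still carry reasoning text in the replayed request.
function auditInTurnReasoning(messages: ChatMessage[]): ReasoningAudit {
  // Index of the last user message; -1 means every message counts as in-turn.
  let lastUser = -1;
  messages.forEach((m, i) => {
    if (m.role === "user") lastUser = i;
  });

  const inTurn = messages
    .slice(lastUser + 1)
    .filter((m) => m.role === "assistant" && (m.tool_calls?.length ?? 0) > 0);

  const withReasoning = inTurn.filter(
    (m) =>
      typeof m.reasoning_content === "string" &&
      m.reasoning_content.trim().length > 0,
  ).length;

  return { inTurnToolCallMessages: inTurn.length, withReasoning };
}
```

Run against the parsed messages array of a captured request, a result where withReasoning is lower than inTurnToolCallMessages would indicate the gap described here. An empty-string reasoning_content (the API-compat placeholder) is deliberately not counted as reasoning.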

Suspected root cause

reasoningContent appears empty at replay for these models, so the existing populate path never runs:

if (reasoningContent.length > 0) {
  assistantMsg.reasoning_content = reasoningContent.join("\n");
} else if (allowReasoningContentReplay) { /* inject empty placeholder */ }

requiresReasoningContentOnAssistantMessages only governs the else (empty-field API-compat shim) and so does not restore real reasoning. The gap looks upstream: in-turn reasoning isn't carried from the session store into reasoningContent for openai-format models that aren't on the DeepSeek/Xiaomi detect list. (Same populate code in 2026.6.1 and 2026.6.5-beta — not a regression.)

Proposed direction

Preserve real in-turn reasoning (reasoning generated since the last user message) on replayed assistant tool-call messages for openai reasoning models, independent of the empty-field API-compat path. This matches what the Gemma model card + chat template assume, and the community reports of the same symptom (Gemma-4-12B tool-calling PSA on r/LocalLLaMA; Qwen3.6 analogue: earendil-works/pi#3325 — "after 2-3 turns every tool call collapses to arguments: {}").
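A rough sketch of what this direction could look like, assuming the session store keeps reasoning segments per assistant turn. StoredTurn, ReplayMessage, and buildReplay are hypothetical names for illustration only and do not reflect OpenClaw's internal types or its actual replay code.

```typescript
// Hypothetical stored-turn shape; NOT OpenClaw's session-store schema.
interface StoredTurn {
  role: "system" | "user" | "assistant" | "tool";
  content: string | null;
  toolCalls?: { id: string; name: string; arguments: string }[];
  toolCallId?: string;
  reasoning?: string[]; // reasoning segments captured when the turn was generated
}

// Hypothetical outgoing message shape (OpenAI-style chat completions).
interface ReplayMessage {
  role: StoredTurn["role"];
  content: string | null;
  tool_calls?: {
    id: string;
    type: "function";
    function: { name: string; arguments: string };
  }[];
  tool_call_id?: string;
  reasoning_content?: string;
}

// Builds replay messages, attaching real reasoning only to assistant
// tool-call messages generated after the last user message. Completed-turn
// reasoning is dropped, and no empty placeholder is injected here.
function buildReplay(turns: StoredTurn[]): ReplayMessage[] {
  const lastUser = turns.map((t) => t.role).lastIndexOf("user");

  return turns.map((t, i) => {
    const msg: ReplayMessage = { role: t.role, content: t.content };

    if (t.toolCalls && t.toolCalls.length > 0) {
      msg.tool_calls = t.toolCalls.map((c) => ({
        id: c.id,
        type: "function" as const,
        function: { name: c.name, arguments: c.arguments },
      }));
    }
    if (t.toolCallId) msg.tool_call_id = t.toolCallId;

    const reasoning = (t.reasoning ?? []).filter((s) => s.length > 0);
    const isInTurnToolCall =
      t.role === "assistant" && i > lastUser && msg.tool_calls !== undefined;

    if (isInTurnToolCall && reasoning.length > 0) {
      msg.reasoning_content = reasoning.join("\n");
    }
    return msg;
  });
}
```

The point of the sketch is that the populate step depends only on position relative to the last user message and on stored reasoning being present, not on any provider detect list or on the empty-field compat flag.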

Steps to reproduce

  1. Configure a gemma4 model via api: openai-completions on vLLM, compat.thinkingFormat: "openai", reasoning: true.
  2. Run a multi-step agent task that needs 2+ sequential tool calls, --thinking high.
  3. Capture the request bodies OpenClaw sends to the model (LLM proxy / tee).
  4. Observe: in-turn assistant tool-call messages carry tool_calls but no reasoning_content; over several turns, arguments degrade / tool calls repeat.
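For step 4, the two symptoms (empty arguments and repeated identical calls) can be counted mechanically once the tool calls have been extracted from the captured bodies or stored trajectories. The following is a minimal sketch under that assumption; RecordedToolCall and scanToolCalls are hypothetical names, and arguments is assumed to be the raw JSON string as sent on the wire.

```typescript
// Hypothetical record of one tool call extracted from a capture.
interface RecordedToolCall {
  name: string;
  arguments: string; // raw JSON string as it appeared in the request or response
}

interface SymptomReport {
  emptyArguments: number; // calls whose arguments parse to {}
  repeatedCalls: number;  // calls identical (name + raw arguments) to an earlier one
}

// Hypothetical helper: counts the two degradation symptoms in one session.
function scanToolCalls(calls: RecordedToolCall[]): SymptomReport {
  let emptyArguments = 0;
  let repeatedCalls = 0;
  const seen = new Set<string>();

  for (const call of calls) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(call.arguments);
    } catch {
      parsed = undefined; // unparseable arguments are not counted as {}
    }

    const isEmptyObject =
      typeof parsed === "object" &&
      parsed !== null &&
      !Array.isArray(parsed) &&
      Object.keys(parsed).length === 0;
    if (isEmptyObject) emptyArguments++;

    // Byte-identical comparison: reordered keys or whitespace differences
    // are treated as different calls in this sketch.
    const key = `${call.name}\u0000${call.arguments}`;
    if (seen.has(key)) {
      repeatedCalls++;
    } else {
      seen.add(key);
    }
  }

  return { emptyArguments, repeatedCalls };
}
```

Comparing the counts for the same task with and without reasoning_content on the replayed messages would give a rough before/after signal, though repeated calls can also be legitimate (for example polling), so the numbers need manual review.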

Environment

  • OpenClaw 2026.6.1
  • Gemma-4-12B (NVFP4) via vLLM 0.22, api: openai-completions, compat.thinkingFormat: "openai", reasoning: true, --thinking high

Workaround

Make each tool action self-contained (a single call returns the final value — e.g. server-computed totals, or a gog | jq | date one-liner) so turns don't depend on cross-turn reasoning. Restores correctness for those flows but doesn't help genuinely multi-step reasoning.

Metadata

Labels

  • P2: Normal backlog priority with limited blast radius.
  • clawsweeper:needs-live-repro: ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
  • clawsweeper:needs-maintainer-review: ClawSweeper marked this issue as needing maintainer review before automation.
  • clawsweeper:needs-product-decision: ClawSweeper marked this issue as needing a product or behavior decision.
  • clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
  • impact:auth-provider: Auth, provider routing, model choice, or SecretRef resolution may break.
  • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
  • issue-rating: 🐚 platinum hermit: Good issue quality with a plausible reproduction path needing some confirmation.
