fix: strip response-only reasoning fields from OpenAI Completions req… by sliverp · Pull Request #82101 · openclaw/openclaw

sliverp · 2026-05-15T09:55:41Z

Summary

Problem: After the first assistant turn in a session, every subsequent QQ-bot reply failed with ⚠️ Something went wrong while processing your request. Gateway logs showed OpenRouter returning HTTP 500 for the second and later requests, regardless of which model was selected (z-ai/glm-5.1, glm-5v-turbo, deepseek-v4-flash all reproduced).
Why it matters: A single successful turn permanently poisons the session — every replay request carries the prior assistant message, so the bot is effectively unusable after one round. The 500 looks like an upstream outage and is hard to diagnose.
What changed: In buildOpenAICompletionsParams, after convertMessages and injectToolCallThoughtSignatures, walk the outgoing message list and delete reasoning_details, reasoning_content, and reasoning from any assistant message before it hits the wire.
What did NOT change (scope boundary): No change to the Anthropic Messages path, the OpenAI Responses path, the QQBot extension, transcript persistence, or transport-message-transform.ts. Only the Chat Completions request builder is affected. User messages are untouched.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #
Related #
This PR fixes a bug or regression

Real behavior proof (required for external PRs)

Behavior or issue addressed: QQ bot replies with the generic external-run-failure text starting from the second turn of any session; gateway logs show repeated HTTP 500 from OpenRouter on the Completions endpoint.
Real environment tested: Local OpenClaw gateway on Linux, QQBot extension connected to a live QQ group, OpenRouter as the provider (routes hit AtlasCloud / Novita / Parasail).
Exact steps or command run after this patch:
1. pnpm build and restart gateway.
2. Send a message in QQ → bot replies normally.
3. Send a second message in the same session → bot replies normally (this is the previously broken turn).
4. Continue multi-turn for ~5 messages to confirm the failure mode does not return.
Evidence after fix: QQ chat now returns successive replies (e.g. cold-joke prompt followed by a second prompt both return real model output instead of the warning text). Gateway log no longer shows status=500 from OpenRouter; prompt.submitted events succeed back-to-back.
Observed result after fix: Every turn in the session produces a real reply; no recurrence of the warning across multiple sessions and multiple models.
What was not tested: Anthropic Messages path, the OpenAI Responses path, and providers that natively echo reasoning_details and expect them on input (none known).
Before evidence (optional but encouraged): Captured request body that triggered the 500 saved as /tmp/openclaw-bad-request-latest.json (84 KB); replaying it via curl to OpenRouter reproduced the 500. Removing only the reasoning_details field flipped the same request to 200.

Root Cause (if applicable)

Root cause: OpenRouter (and several other OpenAI-Completions-compatible providers) echo reasoning_details (array) and reasoning_content (string) inside response choices[].message to expose the model's chain of thought. Downstream message builders preserved those fields when the assistant turn was persisted into the conversation history. On the next turn, convertMessages (pi-ai) serialized the assistant message back into the request body verbatim, so the request now carried e.g. "reasoning_details": "<stringified array echo>". The Chat Completions input schema does not accept this field, and OpenRouter responds with a generic HTTP 500 instead of a 4xx parse error — which is why this looked like an upstream outage rather than a client-side bug.
Missing detection / guardrail: There was no last-mile sanitizer between convertMessages and the wire to drop response-only fields from input messages. We had stripCompletionMessagesToRoleContent for strictMessageKeys providers, but the default path let anything through.
Contributing context (if known): The error was non-deterministic in appearance — the first turn always worked because the history contained only a user message; the failure surfaced from the second turn onward, which made it look like a transient or rate-limit issue. Switching models did not help because the offending field came from our own request body, not the provider.

How the reproducing curl was produced

To rule out a transport bug vs. a body bug, I added temporary instrumentation in src/agents/provider-transport-fetch.ts that wrote every outgoing request body to /tmp/openclaw-bad-request-latest.json along with status, headers, and the response body. After triggering one bad turn from QQ, that file held the exact JSON body the gateway had POSTed.

I then replayed it as-is:

export OPENROUTER_API_KEY=sk-or-v1-...   # replace with your own key

# BAD: assistant message carries `reasoning_details` (string) -> reproduces HTTP 500
curl -sS -o /dev/null -w 'BAD  -> %{http_code}\n' \
  https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "stream": false,
    "messages": [
      {"role":"user","content":"hello"},
      {"role":"assistant","content":null,
       "reasoning_details":"The user said hello. I should greet back.",
       "tool_calls":[{"id":"call_demo_1","type":"function",
                      "function":{"name":"noop","arguments":"{}"}}]},
      {"role":"tool","tool_call_id":"call_demo_1","content":"{\"ok\":true}"},
      {"role":"user","content":"say one more thing"}
    ]
  }'

# GOOD: same conversation with `reasoning_details` removed -> returns 200
curl -sS -o /dev/null -w 'GOOD -> %{http_code}\n' \
  https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "stream": false,
    "messages": [
      {"role":"user","content":"hello"},
      {"role":"assistant","content":null,
       "tool_calls":[{"id":"call_demo_1","type":"function",
                      "function":{"name":"noop","arguments":"{}"}}]},
      {"role":"tool","tool_call_id":"call_demo_1","content":"{\"ok\":true}"},
      {"role":"user","content":"say one more thing"}
    ]
  }'

Then I ran a binary search over the message list / fields:

Drop messages[2] (the assistant turn with the echoed reasoning) → 200.
Keep messages[2] but remove reasoning_details only → 200.
Keep messages[2] but coerce reasoning_details from a string back to its original array shape → 200.
Keep only reasoning_details (string) and a minimal user turn → 500.

That isolated reasoning_details (as a stringified echo) as the sole trigger, and was the basis for the strip list (reasoning_details, reasoning_content, reasoning). The instrumentation was reverted; only the sanitizer ships.

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/agents/openai-transport-stream.test.ts — new describe("buildOpenAICompletionsParams strips response-only reasoning fields") block (2 cases).
Scenario the test should lock in:
1. An assistant message that carries reasoning_details / reasoning_content / reasoning (any of the three) must produce a request whose corresponding message has none of those fields.
2. The same fields on a user message must be left alone (they are user input, not response echo).
Why this is the smallest reliable guardrail: The defect lives entirely in the request-builder path; a unit test on buildOpenAICompletionsParams exercises it deterministically without needing a network or a live provider. Higher-level tests would only flake or require provider mocks.
Existing test that already covers this: None.
If no new test is added, why not: N/A — tests are added.

User-visible / Behavior Changes

QQ-bot (and any other channel that hits the OpenAI Completions transport) no longer fails with ⚠️ Something went wrong … from the second turn onward. No config or default change.

Diagram (if applicable)

Before:
[turn 1] user -> request OK -> assistant reply (with reasoning_details echo persisted)
[turn 2] user + assistant(turn1, with reasoning_details) -> OpenRouter 500 -> warning text in QQ

After:
[turn 1] user -> request OK -> assistant reply (with reasoning_details echo persisted)
[turn 2] user + assistant(turn1) -> [strip reasoning_* on assistant] -> OpenRouter 200 -> real reply

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No (same endpoint, same auth, payload only loses three fields).
Command/tool execution surface changed? No
Data access scope changed? No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: Linux
Runtime/container: Local Node gateway, QQBot extension
Model/provider: OpenRouter (verified across z-ai/glm-5.1, glm-5v-turbo, deepseek-v4-flash; routed via AtlasCloud / Novita / Parasail)
Integration/channel (if any): QQBot
Relevant config (redacted): default OpenAI-Completions transport, no strictMessageKeys override

Steps

Start a QQ session with the bot, send any message — succeeds.
Send a second message in the same session — previously failed with the warning text, now succeeds.
Repeat several turns and across model switches.

Expected

Each turn returns a real model reply.

Actual

Each turn returns a real model reply; gateway no longer logs OpenRouter 500.

Evidence

Failing test/log before + passing after (captured request body + curl replay before; passing unit tests after).
Trace/log snippets (gateway 500 logs before; clean prompt.submitted after).
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

Verified scenarios:
- Two-plus consecutive QQ turns in the same session.
- curl replay of the captured request body before vs. after stripping reasoning_details (500 → 200).
- New unit tests pass.
Edge cases checked:
- User messages carrying the same field names are not modified.
- Assistant messages without any reasoning field are not modified.
- Both string and array shapes of reasoning_details are handled (we delete the field outright).
What I did not verify:
- Anthropic Messages path and OpenAI Responses path (out of scope; not affected by the change).
- Providers that hypothetically require reasoning_details as input (none known).

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: A future provider may legitimately accept reasoning_details on input as part of a thought-replay protocol; stripping would silently lose context.
- Mitigation: The strip is scoped to the OpenAI Chat Completions request builder only, not the Anthropic or Responses paths. If such a provider appears, the strip can be gated on the provider compat profile (the same place strictMessageKeys is wired). Documented in the inline comment above the strip function.
Risk: Different provider echoes a different response-only field name we didn't list.
- Mitigation: The list is centralized in COMPLETIONS_RESPONSE_ONLY_MESSAGE_FIELDS; adding a name is a one-line change. The proper long-term fix is to make pi-ai's convertMessages drop these at the serializer level.

…uests Prevents providers like OpenRouter from returning HTTP 500 errors when replayed assistant messages include fields such as `reasoning_details`.

clawsweeper · 2026-05-15T09:58:46Z

Codex review: needs changes before merge.

Summary
The PR adds Chat Completions assistant-message reasoning replay sanitization and focused transport tests for OpenAI/OpenRouter request payloads.

Reproducibility: yes. Without running tests in this read-only review, the source path is clear: the PR deletes reasoning_content for thinkingFormat values other than openrouter, while existing Z.AI and DeepSeek tests expect that field to survive the shared builder path.

Real behavior proof
Not applicable: The contributor proof gate does not apply to this member-authored PR, though the body provides useful live QQ/OpenRouter curl and log evidence for the target bug.

Next step before merge
A narrow automated repair can adjust the sanitizer and regression tests without a product or security decision.

Security
Cleared: No security or supply-chain regression was found; the diff only changes local Chat Completions payload shaping and tests, with no new dependencies, scripts, permissions, or secret handling.

Review findings

[P2] Preserve provider-owned reasoning_content replay — src/agents/openai-transport-stream.ts:2468-2470

Review details

Best possible solution:

Make the sanitizer compat-aware enough to strip unsupported stock OpenAI replay fields while leaving provider-owned reasoning_content paths for Z.AI and DeepSeek wrappers, and keep the OpenRouter valid-shape preservation tests.

Do we have a high-confidence way to reproduce the issue?

Yes. Without running tests in this read-only review, the source path is clear: the PR deletes reasoning_content for thinkingFormat values other than openrouter, while existing Z.AI and DeepSeek tests expect that field to survive the shared builder path.

Is this the best way to solve the issue?

No. The updated PR is closer than the first commit, but the sanitizer is still too broad; the narrow maintainable fix is to preserve or defer provider-owned reasoning replay fields for zai and deepseek while still normalizing the invalid OpenRouter string shape.

Full review comments:

[P2] Preserve provider-owned reasoning_content replay — src/agents/openai-transport-stream.ts:2468-2470
With preserveOpenRouterReasoning false, this call deletes reasoning_content from every non-OpenRouter assistant replay message. Z.AI preserved thinking and DeepSeek V4 enabled-thinking replay intentionally rely on that field, so the shared sanitizer would remove native reasoning before those provider wrappers can preserve it, causing follow-up tool/thinking sessions to lose the documented replay shape.
Confidence: 0.93

Overall correctness: patch is incorrect
Overall confidence: 0.92

Acceptance criteria:

node scripts/run-vitest.mjs src/agents/openai-transport-stream.test.ts -t "buildOpenAICompletionsParams sanitizes reasoning replay fields"
node scripts/run-vitest.mjs extensions/zai/index.test.ts -t "preserves replayed reasoning_content for Z.AI preserved thinking"
node scripts/run-vitest.mjs extensions/deepseek/index.test.ts -t "preserves replayed reasoning_content when DeepSeek V4 thinking is enabled"

What I checked:

PR sanitizer scope: The PR calls sanitizeCompletionsReasoningReplayFields for all Chat Completions payloads and only preserves provider reasoning when compat.thinkingFormat === "openrouter"; otherwise it deletes reasoning_content along with other reasoning fields. (src/agents/openai-transport-stream.ts:2468, 8d3d9753a6ba)
Current builder boundary: Current main converts messages, injects tool-call thought signatures, and only strips extra keys for strictMessageKeys; the PR is changing this shared request-builder boundary. (src/agents/openai-transport-stream.ts:2433, 0e5f4ea18c29)
Z.AI preserved-thinking contract: The Z.AI docs state that preserved thinking replays prior reasoning_content when preserveThinking is enabled and thinking is on. Public docs: docs/providers/zai.md. (docs/providers/zai.md:147, 0e5f4ea18c29)
Z.AI regression test contract: The current Z.AI test composes the plugin wrapper with buildOpenAICompletionsParams and expects the outgoing assistant message to keep reasoning_content: "prior reasoning". (extensions/zai/index.test.ts:309, 0e5f4ea18c29)
DeepSeek V4 replay contract: The DeepSeek docs state that V4 thinking/tool follow-up requests expect replayed assistant messages to include reasoning_content. Public docs: docs/providers/deepseek.md. (docs/providers/deepseek.md:88, 0e5f4ea18c29)
DeepSeek regression test contract: The current DeepSeek test expects enabled V4 thinking replay to preserve native reasoning_content, while the wrapper only backfills an empty value if the field has already been removed. (extensions/deepseek/index.test.ts:308, 0e5f4ea18c29)

Likely related people:

steipete: Recent current-main commits repeatedly touch the OpenAI-compatible transport and compat surfaces, and the PR follow-up commit 8d3d9753a6ba adjusted this exact sanitizer after the first review. (role: recent area contributor and PR follow-up author; confidence: high; commits: 06e79b2762ce, 325d9ca7cb94, 8d3d9753a6ba; files: src/agents/openai-transport-stream.ts, src/agents/openai-completions-compat.ts)
vincentkoc: Merged prior OpenRouter reasoning_details parser/order work in the same transport area, including the related closed PR linked from this discussion. (role: prior OpenRouter reasoning contributor and merger; confidence: high; commits: 68502c90d1fb; files: src/agents/openai-transport-stream.ts, src/agents/openai-transport-stream.test.ts, src/agents/openai-completions-compat.ts)
lsdsjy: Introduced DeepSeek V4 support and provider-owned reasoning_content replay behavior that this sanitizer can regress. (role: DeepSeek V4 thinking contributor; confidence: medium; commits: 7d1891e6e63d; files: extensions/deepseek/index.ts, extensions/deepseek/index.test.ts, docs/providers/deepseek.md)

Remaining risk / open question:

Merging as-is would likely regress provider-owned preserved-thinking replay for Z.AI and DeepSeek V4 even though the OpenRouter-specific target path is improved.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0e5f4ea18c29.

sliverp · 2026-05-15T10:00:34Z

steipete · 2026-05-15T10:59:00Z

Maintainer follow-up is pushed and verified.

Changed:

Kept this scoped to the openai-completions / Chat Completions request builder.
Stock OpenAI Chat Completions now strips replayed reasoning_details, reasoning_content, reasoning, and reasoning_text fields from assistant history before the request hits the wire.
OpenRouter keeps valid reasoning replay shapes: string reasoning, string reasoning_content, and array reasoning_details.
Invalid leaked OpenRouter string reasoning_details and reasoning_text are normalized to reasoning instead of being sent as bad shapes.
Added the maintainer changelog entry with contributor credit.

Tested:

node scripts/run-vitest.mjs src/agents/openai-transport-stream.test.ts -> 260 passed.
Codex review skill on the local PR diff -> no actionable findings.
Live stock OpenAI Chat Completions (gpt-5.4-mini) with patched replay params -> HTTP 200, assistant replay keys were content,role.
Live OpenRouter negative control (openai/gpt-5.4-nano) with raw string reasoning_details -> reproduced HTTP 500.
Live OpenRouter patched payload (openai/gpt-5.4-nano) -> HTTP 200, assistant replay keys were content,reasoning,role.
GitHub CI on head ef1204dfffd63a2105971d67dd5a886293db20e2 -> green (pending: 0, bad: 0).

Thanks @sliverp.

fix: strip response-only reasoning fields from OpenAI Completions req…

02275e8

…uests Prevents providers like OpenRouter from returning HTTP 500 errors when replayed assistant messages include fields such as `reasoning_details`.

openclaw-barnacle Bot added agents Agent runtime and tooling size: S maintainer Maintainer-authored PR labels May 15, 2026

openclaw deleted a comment from chatgpt-codex-connector Bot May 15, 2026

fix: preserve valid completions reasoning replay

8d3d975

openclaw-barnacle Bot added size: M and removed size: S labels May 15, 2026

docs: credit OpenRouter reasoning replay fix

ef1204d

steipete merged commit b2dfa98 into main May 15, 2026
114 checks passed

steipete deleted the fix/strip-reasoning-details-from-completions-replay branch May 15, 2026 10:59

This was referenced May 15, 2026

fix(xiaomi): MiMo reasoning models fail multi-turn tool calls (#81419) #81589

Merged

[Bug]: Custom Xiaomi/MiMo provider reasoning models completely broken after 2026.5.x update #82118

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: strip response-only reasoning fields from OpenAI Completions req…#82101

fix: strip response-only reasoning fields from OpenAI Completions req…#82101
steipete merged 3 commits into
mainfrom
fix/strip-reasoning-details-from-completions-replay

sliverp commented May 15, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 15, 2026 •

edited

Loading

Uh oh!

sliverp commented May 15, 2026

Uh oh!

steipete commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

sliverp commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Real behavior proof (required for external PRs)

Root Cause (if applicable)

How the reproducing curl was produced

Regression Test Plan (if applicable)

User-visible / Behavior Changes

Diagram (if applicable)

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Review Conversations

Compatibility / Migration

Risks and Mitigations

Uh oh!

clawsweeper Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sliverp commented May 15, 2026

Uh oh!

steipete commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sliverp commented May 15, 2026 •

edited

Loading

clawsweeper Bot commented May 15, 2026 •

edited

Loading