Skip to content

fix: strip response-only reasoning fields from OpenAI Completions req…#82101

Merged
steipete merged 3 commits into
mainfrom
fix/strip-reasoning-details-from-completions-replay
May 15, 2026
Merged

fix: strip response-only reasoning fields from OpenAI Completions req…#82101
steipete merged 3 commits into
mainfrom
fix/strip-reasoning-details-from-completions-replay

Conversation

@sliverp

@sliverp sliverp commented May 15, 2026

Copy link
Copy Markdown
Member

Summary

  • Problem: After the first assistant turn in a session, every subsequent QQ-bot reply failed with ⚠️ Something went wrong while processing your request. Gateway logs showed OpenRouter returning HTTP 500 for the second and later requests, regardless of which model was selected (z-ai/glm-5.1, glm-5v-turbo, deepseek-v4-flash all reproduced).
  • Why it matters: A single successful turn permanently poisons the session — every replay request carries the prior assistant message, so the bot is effectively unusable after one round. The 500 looks like an upstream outage and is hard to diagnose.
  • What changed: In buildOpenAICompletionsParams, after convertMessages and injectToolCallThoughtSignatures, walk the outgoing message list and delete reasoning_details, reasoning_content, and reasoning from any assistant message before it hits the wire.
  • What did NOT change (scope boundary): No change to the Anthropic Messages path, the OpenAI Responses path, the QQBot extension, transcript persistence, or transport-message-transform.ts. Only the Chat Completions request builder is affected. User messages are untouched.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #
  • This PR fixes a bug or regression

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: QQ bot replies with the generic external-run-failure text starting from the second turn of any session; gateway logs show repeated HTTP 500 from OpenRouter on the Completions endpoint.
  • Real environment tested: Local OpenClaw gateway on Linux, QQBot extension connected to a live QQ group, OpenRouter as the provider (routes hit AtlasCloud / Novita / Parasail).
  • Exact steps or command run after this patch:
    1. pnpm build and restart gateway.
    2. Send a message in QQ → bot replies normally.
    3. Send a second message in the same session → bot replies normally (this is the previously broken turn).
    4. Continue multi-turn for ~5 messages to confirm the failure mode does not return.
  • Evidence after fix: QQ chat now returns successive replies (e.g. cold-joke prompt followed by a second prompt both return real model output instead of the warning text). Gateway log no longer shows status=500 from OpenRouter; prompt.submitted events succeed back-to-back.
  • Observed result after fix: Every turn in the session produces a real reply; no recurrence of the warning across multiple sessions and multiple models.
  • What was not tested: Anthropic Messages path, the OpenAI Responses path, and providers that natively echo reasoning_details and expect them on input (none known).
  • Before evidence (optional but encouraged): Captured request body that triggered the 500 saved as /tmp/openclaw-bad-request-latest.json (84 KB); replaying it via curl to OpenRouter reproduced the 500. Removing only the reasoning_details field flipped the same request to 200.

Root Cause (if applicable)

  • Root cause: OpenRouter (and several other OpenAI-Completions-compatible providers) echo reasoning_details (array) and reasoning_content (string) inside response choices[].message to expose the model's chain of thought. Downstream message builders preserved those fields when the assistant turn was persisted into the conversation history. On the next turn, convertMessages (pi-ai) serialized the assistant message back into the request body verbatim, so the request now carried e.g. "reasoning_details": "<stringified array echo>". The Chat Completions input schema does not accept this field, and OpenRouter responds with a generic HTTP 500 instead of a 4xx parse error — which is why this looked like an upstream outage rather than a client-side bug.
  • Missing detection / guardrail: There was no last-mile sanitizer between convertMessages and the wire to drop response-only fields from input messages. We had stripCompletionMessagesToRoleContent for strictMessageKeys providers, but the default path let anything through.
  • Contributing context (if known): The error was non-deterministic in appearance — the first turn always worked because the history contained only a user message; the failure surfaced from the second turn onward, which made it look like a transient or rate-limit issue. Switching models did not help because the offending field came from our own request body, not the provider.

How the reproducing curl was produced

To rule out a transport bug vs. a body bug, I added temporary instrumentation in src/agents/provider-transport-fetch.ts that wrote every outgoing request body to /tmp/openclaw-bad-request-latest.json along with status, headers, and the response body. After triggering one bad turn from QQ, that file held the exact JSON body the gateway had POSTed.

I then replayed it as-is:

export OPENROUTER_API_KEY=sk-or-v1-...   # replace with your own key

# BAD: assistant message carries `reasoning_details` (string) -> reproduces HTTP 500
curl -sS -o /dev/null -w 'BAD  -> %{http_code}\n' \
  https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "stream": false,
    "messages": [
      {"role":"user","content":"hello"},
      {"role":"assistant","content":null,
       "reasoning_details":"The user said hello. I should greet back.",
       "tool_calls":[{"id":"call_demo_1","type":"function",
                      "function":{"name":"noop","arguments":"{}"}}]},
      {"role":"tool","tool_call_id":"call_demo_1","content":"{\"ok\":true}"},
      {"role":"user","content":"say one more thing"}
    ]
  }'

# GOOD: same conversation with `reasoning_details` removed -> returns 200
curl -sS -o /dev/null -w 'GOOD -> %{http_code}\n' \
  https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "stream": false,
    "messages": [
      {"role":"user","content":"hello"},
      {"role":"assistant","content":null,
       "tool_calls":[{"id":"call_demo_1","type":"function",
                      "function":{"name":"noop","arguments":"{}"}}]},
      {"role":"tool","tool_call_id":"call_demo_1","content":"{\"ok\":true}"},
      {"role":"user","content":"say one more thing"}
    ]
  }'

Then I ran a binary search over the message list / fields:

  • Drop messages[2] (the assistant turn with the echoed reasoning) → 200.
  • Keep messages[2] but remove reasoning_details only → 200.
  • Keep messages[2] but coerce reasoning_details from a string back to its original array shape → 200.
  • Keep only reasoning_details (string) and a minimal user turn → 500.

That isolated reasoning_details (as a stringified echo) as the sole trigger, and was the basis for the strip list (reasoning_details, reasoning_content, reasoning). The instrumentation was reverted; only the sanitizer ships.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/openai-transport-stream.test.ts — new describe("buildOpenAICompletionsParams strips response-only reasoning fields") block (2 cases).
  • Scenario the test should lock in:
    1. An assistant message that carries reasoning_details / reasoning_content / reasoning (any of the three) must produce a request whose corresponding message has none of those fields.
    2. The same fields on a user message must be left alone (they are user input, not response echo).
  • Why this is the smallest reliable guardrail: The defect lives entirely in the request-builder path; a unit test on buildOpenAICompletionsParams exercises it deterministically without needing a network or a live provider. Higher-level tests would only flake or require provider mocks.
  • Existing test that already covers this: None.
  • If no new test is added, why not: N/A — tests are added.

User-visible / Behavior Changes

QQ-bot (and any other channel that hits the OpenAI Completions transport) no longer fails with ⚠️ Something went wrong … from the second turn onward. No config or default change.

Diagram (if applicable)

Before:
[turn 1] user -> request OK -> assistant reply (with reasoning_details echo persisted)
[turn 2] user + assistant(turn1, with reasoning_details) -> OpenRouter 500 -> warning text in QQ

After:
[turn 1] user -> request OK -> assistant reply (with reasoning_details echo persisted)
[turn 2] user + assistant(turn1) -> [strip reasoning_* on assistant] -> OpenRouter 200 -> real reply

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No (same endpoint, same auth, payload only loses three fields).
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: Local Node gateway, QQBot extension
  • Model/provider: OpenRouter (verified across z-ai/glm-5.1, glm-5v-turbo, deepseek-v4-flash; routed via AtlasCloud / Novita / Parasail)
  • Integration/channel (if any): QQBot
  • Relevant config (redacted): default OpenAI-Completions transport, no strictMessageKeys override

Steps

  1. Start a QQ session with the bot, send any message — succeeds.
  2. Send a second message in the same session — previously failed with the warning text, now succeeds.
  3. Repeat several turns and across model switches.

Expected

  • Each turn returns a real model reply.

Actual

  • Each turn returns a real model reply; gateway no longer logs OpenRouter 500.

Evidence

  • Failing test/log before + passing after (captured request body + curl replay before; passing unit tests after).
  • Trace/log snippets (gateway 500 logs before; clean prompt.submitted after).
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios:
    • Two-plus consecutive QQ turns in the same session.
    • curl replay of the captured request body before vs. after stripping reasoning_details (500 → 200).
    • New unit tests pass.
  • Edge cases checked:
    • User messages carrying the same field names are not modified.
    • Assistant messages without any reasoning field are not modified.
    • Both string and array shapes of reasoning_details are handled (we delete the field outright).
  • What I did not verify:
    • Anthropic Messages path and OpenAI Responses path (out of scope; not affected by the change).
    • Providers that hypothetically require reasoning_details as input (none known).

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: A future provider may legitimately accept reasoning_details on input as part of a thought-replay protocol; stripping would silently lose context.
    • Mitigation: The strip is scoped to the OpenAI Chat Completions request builder only, not the Anthropic or Responses paths. If such a provider appears, the strip can be gated on the provider compat profile (the same place strictMessageKeys is wired). Documented in the inline comment above the strip function.
  • Risk: Different provider echoes a different response-only field name we didn't list.
    • Mitigation: The list is centralized in COMPLETIONS_RESPONSE_ONLY_MESSAGE_FIELDS; adding a name is a one-line change. The proper long-term fix is to make pi-ai's convertMessages drop these at the serializer level.

…uests

Prevents providers like OpenRouter from returning HTTP 500 errors when replayed assistant messages include fields such as `reasoning_details`.
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S maintainer Maintainer-authored PR labels May 15, 2026
@clawsweeper

clawsweeper Bot commented May 15, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs changes before merge.

Summary
The PR adds Chat Completions assistant-message reasoning replay sanitization and focused transport tests for OpenAI/OpenRouter request payloads.

Reproducibility: yes. Without running tests in this read-only review, the source path is clear: the PR deletes reasoning_content for thinkingFormat values other than openrouter, while existing Z.AI and DeepSeek tests expect that field to survive the shared builder path.

Real behavior proof
Not applicable: The contributor proof gate does not apply to this member-authored PR, though the body provides useful live QQ/OpenRouter curl and log evidence for the target bug.

Next step before merge
A narrow automated repair can adjust the sanitizer and regression tests without a product or security decision.

Security
Cleared: No security or supply-chain regression was found; the diff only changes local Chat Completions payload shaping and tests, with no new dependencies, scripts, permissions, or secret handling.

Review findings

  • [P2] Preserve provider-owned reasoning_content replay — src/agents/openai-transport-stream.ts:2468-2470
Review details

Best possible solution:

Make the sanitizer compat-aware enough to strip unsupported stock OpenAI replay fields while leaving provider-owned reasoning_content paths for Z.AI and DeepSeek wrappers, and keep the OpenRouter valid-shape preservation tests.

Do we have a high-confidence way to reproduce the issue?

Yes. Without running tests in this read-only review, the source path is clear: the PR deletes reasoning_content for thinkingFormat values other than openrouter, while existing Z.AI and DeepSeek tests expect that field to survive the shared builder path.

Is this the best way to solve the issue?

No. The updated PR is closer than the first commit, but the sanitizer is still too broad; the narrow maintainable fix is to preserve or defer provider-owned reasoning replay fields for zai and deepseek while still normalizing the invalid OpenRouter string shape.

Full review comments:

  • [P2] Preserve provider-owned reasoning_content replay — src/agents/openai-transport-stream.ts:2468-2470
    With preserveOpenRouterReasoning false, this call deletes reasoning_content from every non-OpenRouter assistant replay message. Z.AI preserved thinking and DeepSeek V4 enabled-thinking replay intentionally rely on that field, so the shared sanitizer would remove native reasoning before those provider wrappers can preserve it, causing follow-up tool/thinking sessions to lose the documented replay shape.
    Confidence: 0.93

Overall correctness: patch is incorrect
Overall confidence: 0.92

Acceptance criteria:

  • node scripts/run-vitest.mjs src/agents/openai-transport-stream.test.ts -t "buildOpenAICompletionsParams sanitizes reasoning replay fields"
  • node scripts/run-vitest.mjs extensions/zai/index.test.ts -t "preserves replayed reasoning_content for Z.AI preserved thinking"
  • node scripts/run-vitest.mjs extensions/deepseek/index.test.ts -t "preserves replayed reasoning_content when DeepSeek V4 thinking is enabled"

What I checked:

Likely related people:

  • steipete: Recent current-main commits repeatedly touch the OpenAI-compatible transport and compat surfaces, and the PR follow-up commit 8d3d9753a6ba adjusted this exact sanitizer after the first review. (role: recent area contributor and PR follow-up author; confidence: high; commits: 06e79b2762ce, 325d9ca7cb94, 8d3d9753a6ba; files: src/agents/openai-transport-stream.ts, src/agents/openai-completions-compat.ts)
  • vincentkoc: Merged prior OpenRouter reasoning_details parser/order work in the same transport area, including the related closed PR linked from this discussion. (role: prior OpenRouter reasoning contributor and merger; confidence: high; commits: 68502c90d1fb; files: src/agents/openai-transport-stream.ts, src/agents/openai-transport-stream.test.ts, src/agents/openai-completions-compat.ts)
  • lsdsjy: Introduced DeepSeek V4 support and provider-owned reasoning_content replay behavior that this sanitizer can regress. (role: DeepSeek V4 thinking contributor; confidence: medium; commits: 7d1891e6e63d; files: extensions/deepseek/index.ts, extensions/deepseek/index.test.ts, docs/providers/deepseek.md)

Remaining risk / open question:

  • Merging as-is would likely regress provider-owned preserved-thinking replay for Z.AI and DeepSeek V4 even though the OpenRouter-specific target path is improved.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0e5f4ea18c29.

@sliverp

sliverp commented May 15, 2026

Copy link
Copy Markdown
Member Author
Clipboard_Screenshot_1778839219

@openclaw openclaw deleted a comment from chatgpt-codex-connector Bot May 15, 2026
@steipete

Copy link
Copy Markdown
Contributor

Maintainer follow-up is pushed and verified.

Changed:

  • Kept this scoped to the openai-completions / Chat Completions request builder.
  • Stock OpenAI Chat Completions now strips replayed reasoning_details, reasoning_content, reasoning, and reasoning_text fields from assistant history before the request hits the wire.
  • OpenRouter keeps valid reasoning replay shapes: string reasoning, string reasoning_content, and array reasoning_details.
  • Invalid leaked OpenRouter string reasoning_details and reasoning_text are normalized to reasoning instead of being sent as bad shapes.
  • Added the maintainer changelog entry with contributor credit.

Tested:

  • node scripts/run-vitest.mjs src/agents/openai-transport-stream.test.ts -> 260 passed.
  • Codex review skill on the local PR diff -> no actionable findings.
  • Live stock OpenAI Chat Completions (gpt-5.4-mini) with patched replay params -> HTTP 200, assistant replay keys were content,role.
  • Live OpenRouter negative control (openai/gpt-5.4-nano) with raw string reasoning_details -> reproduced HTTP 500.
  • Live OpenRouter patched payload (openai/gpt-5.4-nano) -> HTTP 200, assistant replay keys were content,reasoning,role.
  • GitHub CI on head ef1204dfffd63a2105971d67dd5a886293db20e2 -> green (pending: 0, bad: 0).

Thanks @sliverp.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling maintainer Maintainer-authored PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants