fix(sglang): preserve reasoning replay history #81091

Merged
steipete merged 2 commits into openclaw:main from akrimm702:codex/sglang-preserve-reasoning-replay on May 13, 2026

Conversation

@akrimm702 akrimm702 commented May 12, 2026

Contributor

Summary

  • Adds an SGLang-owned OpenAI-compatible replay policy instead of inheriting the strict fallback that drops historical reasoning by default.
  • Keeps the existing Gemma 4 protection: Gemma 4 openai-completions model ids still drop historical reasoning.
  • Adds focused provider-plugin regression coverage for both the SGLang/Kimi reasoning case and the Gemma 4 guardrail.

Closes #81058.

Root Cause

SGLang models are registered as openai-completions, but the provider did not own its replay policy. That made it fall through to the core strict OpenAI-compatible fallback, which sets dropReasoningFromHistory: true for unowned openai-completions providers. For reasoning-capable self-hosted SGLang/Kimi models, that can strip replayed reasoning history and contribute to empty user-facing responses.
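
The effect of that flag on replayed history can be sketched as follows. This is a minimal illustration, not the project's sanitizer: `ChatMessage`, `ReplayPolicy`, and `applyReplayPolicy` are hypothetical names, and only the `dropReasoningFromHistory` flag and the `reasoning_content` field come from this PR.

```typescript
// Hypothetical message shape for illustration; only reasoning_content is named in the PR.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
  reasoning_content?: string;
}

// Hypothetical policy shape carrying the one flag discussed above.
interface ReplayPolicy {
  dropReasoningFromHistory: boolean;
}

// Illustrative stand-in for the replay-history step (an assumption, not the real code).
// When the flag is set, every replayed message loses its reasoning_content.
function applyReplayPolicy(history: ChatMessage[], policy: ReplayPolicy): ChatMessage[] {
  if (!policy.dropReasoningFromHistory) {
    // Preserve reasoning: return shallow copies so the caller's history is untouched.
    return history.map((message) => ({ ...message }));
  }
  // Drop reasoning: rebuild each message with only role and content.
  return history.map((message) => ({ role: message.role, content: message.content }));
}

// Sample history; the text is placeholder data, not output from a real model.
const sampleHistory: ChatMessage[] = [
  { role: "user", content: "first question" },
  { role: "assistant", content: "first answer", reasoning_content: "placeholder reasoning" },
  { role: "user", content: "follow-up question" },
];

// What an unowned openai-completions provider would replay under the strict fallback.
const strictReplay = applyReplayPolicy(sampleHistory, { dropReasoningFromHistory: true });
```

Under the strict fallback, the assistant turn in `strictReplay` carries no `reasoning_content`, which is the stripping described above.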

Fix

SGLang now uses buildProviderReplayFamilyHooks({ family: "openai-compatible", dropReasoningFromHistory: false }), matching the existing provider-owned opt-out pattern used by reasoning-capable OpenAI-compatible providers. The helper still forces dropReasoningFromHistory: true for Gemma 4 model ids, preserving the prior parser-safety behavior.
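
A rough model of the resulting precedence, as a sketch under stated assumptions: `resolveReplayPolicy` and `isGemma4ModelId` are hypothetical names, the Gemma 4 pattern is a guess at the matching rule (the PR does not show it), and the model ids are illustrative. The real logic lives in `buildProviderReplayFamilyHooks` and the core fallback.

```typescript
interface ReplayPolicy {
  dropReasoningFromHistory: boolean;
}

// Assumption: Gemma 4 ids are recognized by a simple pattern. The real matcher is not shown in this PR.
function isGemma4ModelId(modelId: string): boolean {
  return /gemma[-_]?4/i.test(modelId);
}

// Hypothetical resolver mirroring the precedence described above:
// 1. Gemma 4 model ids always drop historical reasoning.
// 2. A provider without its own policy gets the strict fallback (drop).
// 3. Otherwise the provider-owned setting wins.
function resolveReplayPolicy(
  modelId: string,
  providerPolicy: ReplayPolicy | undefined,
): ReplayPolicy {
  if (isGemma4ModelId(modelId)) {
    return { dropReasoningFromHistory: true };
  }
  if (providerPolicy === undefined) {
    return { dropReasoningFromHistory: true };
  }
  return { dropReasoningFromHistory: providerPolicy.dropReasoningFromHistory };
}

// The SGLang opt-out from this PR, expressed in the hypothetical shape.
const sglangPolicy: ReplayPolicy = { dropReasoningFromHistory: false };

// Illustrative model ids, not taken from the PR.
const kimiStyle = resolveReplayPolicy("kimi-k2-thinking", sglangPolicy);
const gemma4 = resolveReplayPolicy("gemma-4-example", sglangPolicy);
const unowned = resolveReplayPolicy("kimi-k2-thinking", undefined);
```

In this model `kimiStyle` preserves reasoning, while `gemma4` and `unowned` both drop it, matching the two regression cases and the pre-fix behavior respectively.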

Regression Proof

I checked the fix before republishing the branch:

  • main at validation base ea7f74ff had no SGLang replay-policy hook.
  • The PR adds the provider-owned replay policy only for SGLang.
  • Regression coverage proves Kimi-style SGLang reasoning history is preserved.
  • The Gemma 4 guardrail remains covered and still drops historical reasoning.
  • Published PR head: d162f0c7cf83187bb578d7f90405e1d6b0123f1b.
  • GitHub currently reports the PR as mergeable against main.

Validation

Latest local validation after rebasing and publishing the PR branch:

  • PNPM_CONFIG_OFFLINE=true pnpm test extensions/sglang/index.test.ts src/plugins/provider-replay-helpers.test.ts src/agents/transcript-policy.test.ts src/agents/pi-embedded-runner.sanitize-session-history.test.ts
  • PNPM_CONFIG_OFFLINE=true pnpm exec oxfmt --check extensions/sglang/index.ts extensions/sglang/index.test.ts
  • PNPM_CONFIG_OFFLINE=true pnpm exec tsc -p extensions/sglang/tsconfig.json --noEmit --pretty false
  • PNPM_CONFIG_OFFLINE=true pnpm check:changed

Risk

Low and provider-scoped. This does not change the global OpenAI-compatible fallback. The main compatibility risk is accidental reintroduction of replayed reasoning for Gemma 4-style chat-completions models, covered by the new test.

Maintainer Verification

Maintainer proof after rebasing and adding changelog:

  • PR branch head: d55083d7e126ce1fc4d576fb841b47efe1c9093e.
  • Local targeted regression: pnpm test extensions/sglang/index.test.ts -- --reporter=verbose -> 2 tests passed.
  • Local whitespace check: git diff --check -> passed.
  • Live SGLang proof: Blacksmith Testbox tbx_01krgt6yqe7bte96dwsmker05p, GitHub Actions run 25803983869.
  • Live server: source-built SGLang CPU, sglang 0.0.0.dev1+g9e00b7ca9.d20260513, torch 2.9.0+cpu, model Qwen/Qwen3-0.6B, /v1/models returned the model id.
  • Direct replay request sent prior assistant reasoning_content; SGLang returned content: "replay ok", non-empty reasoning_content, finish: "stop".
  • PR unit regression on the same Testbox: pnpm test extensions/sglang/index.test.ts -- --reporter=verbose -> 2 tests passed.

Real behavior proof

  • Behavior or issue addressed: SGLang OpenAI-compatible chat completions should preserve replayed assistant reasoning_content for thinking-capable local models instead of stripping reasoning history through the strict fallback.
  • Real environment tested: Blacksmith Testbox tbx_01krgt6yqe7bte96dwsmker05p, GitHub Actions run 25803983869, source-built SGLang CPU server, sglang 0.0.0.dev1+g9e00b7ca9.d20260513, torch 2.9.0+cpu, model Qwen/Qwen3-0.6B served on 127.0.0.1:30000.
  • Exact steps or command run after this patch: Launched python -m sglang.launch_server --model Qwen/Qwen3-0.6B --trust-remote-code --disable-overlap-schedule --device cpu --host 127.0.0.1 --port 30000 --tp 1 --reasoning-parser qwen3 --max-total-tokens 4096, then sent a live curl http://127.0.0.1:30000/v1/chat/completions request containing a prior assistant message with reasoning_content.
  • Evidence after fix: Terminal output from the live run: /v1/models returned Qwen/Qwen3-0.6B; the direct replay response JSON was { "model": "Qwen/Qwen3-0.6B", "content": "replay ok", "reasoning": "Okay, the user wants me to answer exactly: ...", "finish": "stop" }.
  • Observed result after fix: SGLang accepted the replayed reasoning_content, returned visible content replay ok, returned non-empty reasoning_content, and finished with stop; the PR regression test on the same Testbox also passed 2/2.
  • What was not tested: Kimi/GPU SGLang was not tested because AWS GPU spot/on-demand quota blocked g5.xlarge and g4dn.xlarge; CPU SGLang with Qwen3 tested the same OpenAI-compatible reasoning_content replay contract.
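
The replay request and the summarized response above could look roughly like this. The exact request body is not included in the PR, so the prompts, `buildReplayRequest`, and `summarizeReplayResponse` are assumptions for illustration. The response field names follow the common OpenAI-compatible layout (`choices[0].message`, `finish_reason`) plus the `reasoning_content` field named in this PR.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
  reasoning_content?: string;
}

interface ChatCompletionRequest {
  model: string;
  messages: ChatMessage[];
}

// Assumed subset of an OpenAI-compatible chat completion response.
interface ChatCompletionResponse {
  model: string;
  choices: Array<{
    message: { content: string | null; reasoning_content?: string | null };
    finish_reason: string | null;
  }>;
}

// Builds a multi-turn body whose prior assistant turn carries reasoning_content.
// The prompt text is illustrative, not the text used in the live run.
function buildReplayRequest(model: string): ChatCompletionRequest {
  return {
    model,
    messages: [
      { role: "user", content: "Say hello." },
      { role: "assistant", content: "Hello.", reasoning_content: "placeholder prior reasoning" },
      { role: "user", content: "Answer exactly: replay ok" },
    ],
  };
}

// Reduces a response to the { model, content, reasoning, finish } shape quoted above.
function summarizeReplayResponse(response: ChatCompletionResponse) {
  const first = response.choices[0];
  if (first === undefined) {
    throw new Error("response has no choices");
  }
  return {
    model: response.model,
    content: first.message.content ?? "",
    reasoning: first.message.reasoning_content ?? "",
    finish: first.finish_reason ?? "",
  };
}

const replayBody = JSON.stringify(buildReplayRequest("Qwen/Qwen3-0.6B"));
```

`replayBody` is what would be POSTed to `/v1/chat/completions` on the local server. A passing check is then a summary with non-empty `content` and `reasoning` and `finish` equal to `"stop"`.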

@openclaw-barnacle openclaw-barnacle Bot added the labels size: XS and triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) on May 12, 2026
@akrimm702 akrimm702 force-pushed the codex/sglang-preserve-reasoning-replay branch from 7e0064b to 83771e8 Compare May 12, 2026 16:48
clawsweeper Bot commented May 12, 2026

Contributor

Codex review: needs real behavior proof before merge.

Summary
The PR adds an SGLang-owned OpenAI-compatible replay-policy hook and regression tests that preserve non-Gemma reasoning history while keeping the Gemma 4 guardrail.

Reproducibility: no, not for the full user-visible empty-response symptom. I did not have a live SGLang/Kimi setup or logs proving the failure and fix end to end. The source path is clear, because SGLang uses openai-completions without a replay hook and the current core fallback drops reasoning history for that unowned transport.

Real behavior proof
Needs real behavior proof before merge: The PR body lists local tests and reasoning, but no after-fix terminal output, logs, screenshot/video, or artifact from a real SGLang/Kimi run. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, ask a maintainer to comment @clawsweeper re-review.

Next step before merge
The remaining blocker is contributor-supplied real behavior proof from their SGLang setup, which automation cannot provide on the PR branch.

Security
Cleared: The diff only adds a provider-local replay hook and tests; it does not touch dependencies, CI, secrets handling, package metadata, or downloaded code paths.

Review details

Best possible solution:

Land the provider-scoped SGLang replay hook after the contributor adds real after-fix proof showing a multi-turn SGLang/Kimi reasoning response produces visible final text while Gemma 4 replay remains protected.

Do we have a high-confidence way to reproduce the issue?

No, not for the full user-visible empty-response symptom. I did not have a live SGLang/Kimi setup or logs proving the failure and fix end to end. The source path is clear, because SGLang uses openai-completions without a replay hook and the current core fallback drops reasoning history for that unowned transport.

Is this the best way to solve the issue?

Yes, based on source inspection this is the narrow owner-boundary fix: SGLang should declare its OpenAI-compatible replay policy through the existing plugin SDK helper instead of changing the global fallback. The added Gemma 4 test preserves the known guardrail, but merge should wait for real behavior proof.

What I checked:

  • Current SGLang transport family: Current main builds SGLang discovered models with api: "openai-completions", which is the transport family affected by the replay fallback. (extensions/sglang/models.ts:20, 652a56fc74a9)
  • Current SGLang plugin has no replay hook: Current main registers SGLang auth, catalog, and wizard metadata but does not define buildReplayPolicy, so it falls through to core fallback behavior. (extensions/sglang/index.ts:61, 652a56fc74a9)
  • Strict fallback drops replayed reasoning: The unowned OpenAI-compatible fallback sets dropReasoningFromHistory: true for strict openai-completions providers, and replay-history applies that flag before replaying session history. (src/agents/transcript-policy.ts:146, 652a56fc74a9)
  • Shared provider hook supports this fix: buildProviderReplayFamilyHooks passes dropReasoningFromHistory through to the OpenAI-compatible replay helper, whose Gemma 4 path still forces dropReasoningFromHistory: true. (src/plugin-sdk/provider-model-shared.ts:187, 652a56fc74a9)
  • PR diff is provider-scoped: The PR adds buildProviderReplayFamilyHooks({ family: "openai-compatible", dropReasoningFromHistory: false }) to SGLang and adds tests for Kimi-style preservation plus Gemma 4 dropping. (extensions/sglang/index.ts:70, d162f0c7cf83)
  • Real behavior proof is missing: The PR body lists local test and check commands, but it does not include live output, terminal proof, logs, screenshot/video, or another artifact showing a real SGLang/Kimi run after the fix. (d162f0c7cf83)

Likely related people:

  • @steipete: The related issue review identifies @steipete as author/committer of f66b1d1, which changed strict OpenAI-compatible replay to drop reasoning by default; local blame/log also show Peter Steinberger as the recent SGLang plugin file author. (role: introduced reported default behavior and recent SGLang area contributor; confidence: high; commits: f66b1d173895, c964da8d58e4; files: src/agents/transcript-policy.ts, src/plugins/provider-replay-helpers.ts, extensions/deepseek/index.ts)
  • @medns: The linked issue discussion identifies @medns as author of e7277b4, the adjacent raw reasoning stream refactor involved in the reported interaction. (role: adjacent reasoning stream refactor author; confidence: high; commits: e7277b4e3a4b; files: src/agents/pi-embedded-runner/run/payloads.ts, src/agents/pi-embedded-subscribe.ts, extensions/telegram/src/reasoning-lane-coordinator.ts)
  • @joshp123: The linked issue review identifies prior work by @joshp123 on OpenAI reasoning replay ids, transcript policy, and session-history sanitizer tests in the same replay-policy area. (role: prior reasoning replay contributor; confidence: medium; commits: 68ea063958ee; files: src/agents/transcript-policy.ts, src/agents/pi-embedded-runner.sanitize-session-history.test.ts, src/agents/openai-responses.reasoning-replay.test.ts)

Remaining risk / open question:

  • No after-fix run against a real SGLang/Kimi setup has been posted, so the user-visible empty-response symptom is source-supported but not behavior-proven here.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 652a56fc74a9.

@akrimm702 akrimm702 force-pushed the codex/sglang-preserve-reasoning-replay branch 2 times, most recently from 259b3a7 to 886ad78 Compare May 12, 2026 19:28
@akrimm702 akrimm702 force-pushed the codex/sglang-preserve-reasoning-replay branch 2 times, most recently from ca097af to d162f0c Compare May 12, 2026 19:34
@steipete steipete force-pushed the codex/sglang-preserve-reasoning-replay branch from d162f0c to c530230 Compare May 13, 2026 14:14
steipete added a commit to akrimm702/openclaw that referenced this pull request May 13, 2026
@openclaw-barnacle openclaw-barnacle Bot added the label proof: supplied (External PR includes structured after-fix real behavior proof.) and removed the label triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) on May 13, 2026
@steipete

Contributor

Maintainer verification for landing:

  • PR head after maintainer changelog: c53023095a2e190bdfb975f5eb31859623633d84.
  • Local targeted regression: pnpm test extensions/sglang/index.test.ts -- --reporter=verbose -> 2 passed.
  • Local whitespace: git diff --check -> passed.
  • Live SGLang proof: Blacksmith Testbox tbx_01krgt6yqe7bte96dwsmker05p, Actions run https://github.com/openclaw/openclaw/actions/runs/25803983869.
  • Live result: source-built SGLang CPU served Qwen/Qwen3-0.6B; direct replay request with prior assistant reasoning_content returned content: "replay ok", non-empty reasoning_content, finish: "stop".
  • PR CI: prior code checks on d162f0c7cf83187bb578d7f90405e1d6b0123f1b were green except the body-only Real behavior proof gate; latest c53023095a2e190bdfb975f5eb31859623633d84 reran metadata/body gates and Real behavior proof is now green after adding the live proof section.
  • Known gap: Kimi/GPU SGLang not tested because AWS GPU quota blocked g5.xlarge/g4dn.xlarge; CPU SGLang with Qwen3 covers the same OpenAI-compatible reasoning_content replay contract.

Thanks @akrimm702.

@steipete steipete force-pushed the codex/sglang-preserve-reasoning-replay branch from c530230 to d55083d Compare May 13, 2026 14:18
@steipete steipete merged commit 210c7c1 into openclaw:main May 13, 2026
84 of 88 checks passed
@steipete

Contributor

Landed via rebase onto main.

Thanks @akrimm702.

l3ocifer pushed a commit to l3ocifer/frack-openclaw that referenced this pull request May 13, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
Labels

extensions: sglang; proof: supplied (External PR includes structured after-fix real behavior proof.); size: XS

Development

Successfully merging this pull request may close these issues.

[Bug]: Empty responses from OpenAI-compatible models with reasoning

2 participants