fix(thinking): keep explicit session thinkingLevel when runtime downgrades (#87740) by hoobnn · Pull Request #87923 · openclaw/openclaw

hoobnn · 2026-05-29T06:55:42Z

Summary

Fixes #87740. An explicit session thinkingLevel override (e.g. high) was permanently reset to a supported level (e.g. off) after every agent turn whenever the active model didn't support the stored level.

Root cause: when the stored level was unsupported, the runtime correctly fell back to a supported level for the turn, but also wrote that fallback back onto the persisted session override. The write-back condition fired precisely when the stored value was the user's explicit override, so the explicit choice was clobbered every turn — re-setting it just got overwritten again next turn. Leaving the level inherited/null was unaffected, which matches the report.

Fix: downgrade only the level used for the current turn (resolvedThinkLevel); never persist the support fallback onto the stored override. The same duplicated write-back lived in both the reply path (src/auto-reply/reply/get-reply-run.ts) and the agent-command path (src/agents/agent-command.ts); both are fixed. Removed the now-dead loadSessionStoreRuntime loader in get-reply-run.ts. Production code shrinks (-17 / -23 lines). An explicit off set via directive/sessions.patch still persists through the normal path; only the involuntary support downgrade no longer persists.

Out of scope (sibling sites)

Two sites also persist a support-driven downgrade onto the stored level, but only on a user-initiated model switch (not the per-turn reset this issue reports), and one already notifies the client:

src/gateway/sessions-patch.ts:604 — a PATCH that changes model (without an explicit thinkingLevel) downgrades an inherited unsupported level and persists it (no notification).
src/auto-reply/reply/directive-handling.persist.ts:326 — a /model switch remaps the stored level and persists it, returning a thinkingRemap notice.

These are deliberately coded (explicit-vs-inherited branching); whether a model switch should preserve the original preference is a product/compat decision, so they're intentionally left out. Follow-up: #87925.

Verification

pnpm test src/agents/agent-command.live-model-switch.test.ts — 27 passed, incl. new regression test preserves an explicit session thinkingLevel override when the level is unsupported (red before fix, green after).
Siblings thinking / directive-handling.levels / commands-status.thinking-default / get-reply.config-override — 58 passed.
Regression: directive-handling.model (50) + gateway/sessions-patch (72) green — confirms an explicit off still persists; only the involuntary downgrade stops.
pnpm check:changed — clean (oxlint, import cycles, typecheck, guards), exit 0.

Real behavior proof

Behavior addressed: An explicit session thinkingLevel: "high" override is no longer reset to the support fallback after a turn when the active model rejects that level at runtime; the stored override survives the turn.

Real environment tested: A live agent turn against a real OpenAI-compatible Xiaomi MiMo model (xiaomi/mimo-v2.5-pro, real /v1/chat/completions API, genuine replies) using a source build of this branch (pnpm build) in an isolated profile. The model is configured reasoning: false, so high is unsupported at runtime and the per-turn support downgrade fires — reproducing the #87740 condition with a real model. This exercises the embedded agent path (openclaw agent --local → src/agents/agent-command.ts). The get-reply-run.ts (WebChat) site was not driven through a live inbound channel; it is the byte-identical removal and is covered by the A/B below plus the existing get-reply suites.

A/B method: identical isolated profile, identical session pre-seeded with thinkingLevel: "high", identical model. Only the two fixed source files (src/agents/agent-command.ts, src/auto-reply/reply/get-reply-run.ts) were reverted to the pre-fix parent (2e042fbca8) and rebuilt for the "before" run, then restored and rebuilt for the "after" run — so this PR's change is the only variable.

Exact steps or command run after this patch:

Configure xiaomi/mimo-v2.5-pro with reasoning: false (makes high unsupported at runtime) in an isolated profile; pre-seed session agent:main:main with thinkingLevel: "high" in agents/main/sessions/sessions.json.
XIAOMI_API_KEY=… pnpm openclaw --profile <p> agent --local --to +15555550123 -m "Reply with a single word: …" --json
Re-read agents/main/sessions/sessions.json → ["agent:main:main"].thinkingLevel before and after the turn.

Evidence after fix (live turns, real model replies, same model xiaomi/mimo-v2.5-pro):

code under test	turn result	real reply	per-turn thinking	stored `thinkingLevel` after turn
before fix (parent `2e042fbca8`)	success	`pong`	`off` (downgraded)	`off` ← bug reproduced
after fix (this branch)	success	`ok`	`off` (downgraded)	`high` ← override preserved

Observed result after fix: with stored thinkingLevel: "high" on a model that rejects it at runtime, the turn runs normally (real reply, meta.requestShaping.thinking: "off") and the stored override stays high. Reverting only the two fixed files flips the stored value back to off, confirming this change is what preserves the override.

What was not tested: a live WebChat/Control UI or Voice UI round-trip against a running gateway — the get-reply-run.ts path was not executed via an inbound channel. It carries the identical removed block and is covered by the A/B above and the existing get-reply suites.

@TitanBob2026

…rades (openclaw#87740) When a session's stored thinkingLevel is unsupported by the active model, the runtime fell back to a supported level for the turn AND wrote that fallback back onto the persisted session override. Because the persistence condition fired exactly when the stored value was the explicit override, the user's explicit choice (e.g. "high") was permanently reset to the supported level (e.g. "off") after every turn — re-setting it just got clobbered again next turn. Downgrade only the level used for the current turn; never persist the support fallback onto the stored override. The explicit override is the user's intent and must survive turns (so it re-applies if a supporting model is used later). Both the reply path (get-reply-run) and the agent-command path carried the same duplicated write-back; both are fixed. Reported by @TitanBob2026.

clawsweeper · 2026-05-29T07:02:07Z

Codex review: needs maintainer review before merge. Reviewed May 29, 2026, 12:56 PM ET / 16:56 UTC.

Summary
This PR removes per-turn thinkingLevel fallback persistence from the agent-command and get-reply reply paths and adds an agent-command regression test for preserving an explicit stored override.

PR surface: Source -40, Tests +36. Total -4 across 3 files.

Reproducibility: yes. Current main has a clear source-level reproduction path in both per-turn fallback blocks, and the PR body includes A/B live agent output showing the stored level is clobbered before the fix and preserved after it.

Review metrics: 1 noteworthy metric.

Session fallback persistence writes: 2 removed, 0 added. Both per-turn runtime fallback sites stop writing the supported fallback back into stored thinkingLevel, which is the session-state behavior maintainers should notice before merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

[P2] Optional: add a short WebChat or Gateway proof if maintainers want direct coverage of the get-reply path before merge.

Mantis proof suggestion
A short WebChat visual proof would materially reduce the remaining get-reply/session-state confidence gap if maintainers want direct UI-path evidence before merge. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

visual task: verify in WebChat that a stored thinkingLevel high survives a reply when runtime falls back to off.

Risk before merge

[P2] The PR intentionally changes persisted session-state semantics: an explicit unsupported stored override remains after a turn instead of being normalized to the turn's supported fallback.
[P1] The live proof exercises the embedded agent path; the get-reply/WebChat path is changed by the same source pattern and covered by source/test evidence, but it was not driven through a live inbound WebChat or gateway run.

Maintainer options:

Accept the focused session-state fix (recommended)
Land this PR once required checks are satisfied, accepting the agent-path live proof plus symmetric source/test coverage for get-reply and leaving model-switch policy to the linked follow-up.
Ask for direct WebChat proof
If maintainers want direct coverage of the get-reply path before merge, request a short redacted WebChat or Gateway run showing the stored explicit thinkingLevel survives a reply.
Pause for broader policy
Pause this PR only if maintainers want the model-switch persistence contract resolved in the same change instead of the linked follow-up.

Next step before merge

No ClawSweeper repair lane is needed; maintainers should decide whether to accept the focused session-state semantics change and then rely on required checks.

Security
Cleared: The diff only changes TypeScript session-thinking fallback logic and one colocated test; it does not add dependencies, workflows, secret handling, permissions, or code-download paths.

Review details

Best possible solution:

Land the focused per-turn persistence fix after required checks, while keeping model-switch remap policy isolated in #87925.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main has a clear source-level reproduction path in both per-turn fallback blocks, and the PR body includes A/B live agent output showing the stored level is clobbered before the fix and preserved after it.

Is this the best way to solve the issue?

Yes for the per-turn reset. The PR removes only the unintended persistence write-back and keeps the turn-local fallback, while the broader model-switch contract remains a separate product decision in #87925.

AGENTS.md: found and applied where relevant.

Codex review notes: reasoning high; reviewed against 67faef01828d.

Label changes

Label justifications:

P2: This is a normal-priority fix for a shipped persisted session preference bug with limited but real WebChat/Gateway and agent impact.
merge-risk: 🚨 session-state: The diff changes whether runtime thinking fallback mutates persisted session thinkingLevel state across future turns.
rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🦞 diamond lobster.
status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes redacted live A/B output from a real model turn showing the stored thinkingLevel changes to off before the fix and stays high after the fix.
proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes redacted live A/B output from a real model turn showing the stored thinkingLevel changes to off before the fix and stays high after the fix.

Evidence reviewed

PR surface:

Source -40, Tests +36. Total -4 across 3 files.

View PR surface stats

Area	Files	Added	Removed	Net
Source	2	8	48	-40
Tests	1	38	2	+36
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	3	46	50	-4

Acceptance criteria:

[P1] Contributor reported: pnpm test src/agents/agent-command.live-model-switch.test.ts.
[P1] Contributor reported: sibling thinking/directive/get-reply/gateway session tests.
[P1] Contributor reported: pnpm check:changed.

What I checked:

Repository policy read and applied: Full root AGENTS.md and the scoped src/agents/AGENTS.md were read; session state, persisted preferences, and fallback behavior are compatibility-sensitive review surfaces for this PR. (AGENTS.md:17, 67faef01828d)
Current main agent path still persists the fallback: On current main, when the stored/resolved thinking level is unsupported, the agent-command path resolves a fallback and persists it back to sessionEntry.thinkingLevel when it matches the prior stored value. (src/agents/agent-command.ts:1153, 67faef01828d)
Current main reply path still persists the fallback: On current main, runPreparedReply performs the same unsupported-level fallback write into sessionEntry, sessionStore, and the backing store file. (src/auto-reply/reply/get-reply-run.ts:883, 67faef01828d)
Latest release still contains the bug: The v2026.5.27 release contains the same fallback persistence blocks, matching the linked report's affected latest-release behavior. (src/auto-reply/reply/get-reply-run.ts:883, 27ae826f6525)
PR removes both per-turn persistence write-backs: The PR head leaves only the turn-local fallback assignment in both paths, so unsupported runtime fallback no longer mutates the stored session override. (src/agents/agent-command.ts:1200, fbfbeee4bee2)
Regression coverage added: The added test seeds a stored thinkingLevel high, mocks the active model as unsupported with fallback off, runs agentCommand, and asserts no persistSessionEntry call writes off over the explicit override. (src/agents/agent-command.live-model-switch.test.ts:950, fbfbeee4bee2)

Likely related people:

steipete: git blame on current main attributes the fallback write-back blocks in both touched runtime files and nearby test harness lines to Peter Steinberger-authored commits, making him the strongest routing signal for this session-thinking behavior. (role: recent area contributor; confidence: medium; commits: 3d7df2bc0747, 27ae826f6525; files: src/agents/agent-command.ts, src/auto-reply/reply/get-reply-run.ts, src/agents/agent-command.live-model-switch.test.ts)
vincentkoc: recent history on the same agent-command/session-store update surface includes lazy-loading and runtime-boundary work, which is useful routing context even though it is not direct blame for this fallback write-back. (role: adjacent agent runtime contributor; confidence: low; commits: 99755fcb2f1f, dd27aa945e43, f126088761b7; files: src/agents/agent-command.ts, src/agents/agent-command.live-model-switch.test.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

hoobnn · 2026-05-29T16:42:03Z

@clawsweeper re-review

clawsweeper · 2026-05-29T16:42:07Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/26649913202
Updated: 2026-05-29T16:50:22.113Z

openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 29, 2026

hoobnn mentioned this pull request May 29, 2026

thinkingLevel: model switch silently downgrades and persists an inherited explicit override #87925

Open

openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 29, 2026

openclaw-barnacle Bot removed the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(thinking): keep explicit session thinkingLevel when runtime downgrades (#87740)#87923

fix(thinking): keep explicit session thinkingLevel when runtime downgrades (#87740)#87923
hoobnn wants to merge 1 commit into
openclaw:mainfrom
hoobnn:fix/issue-87740-thinking-level-persist

hoobnn commented May 29, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 29, 2026 •

edited

Loading

Uh oh!

hoobnn commented May 29, 2026

Uh oh!

clawsweeper Bot commented May 29, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

hoobnn commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Out of scope (sibling sites)

Verification

Real behavior proof

Uh oh!

clawsweeper Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hoobnn commented May 29, 2026

Uh oh!

clawsweeper Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hoobnn commented May 29, 2026 •

edited

Loading

clawsweeper Bot commented May 29, 2026 •

edited

Loading

clawsweeper Bot commented May 29, 2026 •

edited

Loading