Skip to content

fix(thinking): keep explicit session thinkingLevel when runtime downgrades (#87740)#87923

Open
hoobnn wants to merge 1 commit into
openclaw:mainfrom
hoobnn:fix/issue-87740-thinking-level-persist
Open

fix(thinking): keep explicit session thinkingLevel when runtime downgrades (#87740)#87923
hoobnn wants to merge 1 commit into
openclaw:mainfrom
hoobnn:fix/issue-87740-thinking-level-persist

Conversation

@hoobnn

@hoobnn hoobnn commented May 29, 2026

Copy link
Copy Markdown

Summary

Fixes #87740. An explicit session thinkingLevel override (e.g. high) was permanently reset to a supported level (e.g. off) after every agent turn whenever the active model didn't support the stored level.

Root cause: when the stored level was unsupported, the runtime correctly fell back to a supported level for the turn, but also wrote that fallback back onto the persisted session override. The write-back condition fired precisely when the stored value was the user's explicit override, so the explicit choice was clobbered every turn — re-setting it just got overwritten again next turn. Leaving the level inherited/null was unaffected, which matches the report.

Fix: downgrade only the level used for the current turn (resolvedThinkLevel); never persist the support fallback onto the stored override. The same duplicated write-back lived in both the reply path (src/auto-reply/reply/get-reply-run.ts) and the agent-command path (src/agents/agent-command.ts); both are fixed. Removed the now-dead loadSessionStoreRuntime loader in get-reply-run.ts. Production code shrinks (-17 / -23 lines). An explicit off set via directive/sessions.patch still persists through the normal path; only the involuntary support downgrade no longer persists.

Out of scope (sibling sites)

Two sites also persist a support-driven downgrade onto the stored level, but only on a user-initiated model switch (not the per-turn reset this issue reports), and one already notifies the client:

  • src/gateway/sessions-patch.ts:604 — a PATCH that changes model (without an explicit thinkingLevel) downgrades an inherited unsupported level and persists it (no notification).
  • src/auto-reply/reply/directive-handling.persist.ts:326 — a /model switch remaps the stored level and persists it, returning a thinkingRemap notice.

These are deliberately coded (explicit-vs-inherited branching); whether a model switch should preserve the original preference is a product/compat decision, so they're intentionally left out. Follow-up: #87925.

Verification

  • pnpm test src/agents/agent-command.live-model-switch.test.ts — 27 passed, incl. new regression test preserves an explicit session thinkingLevel override when the level is unsupported (red before fix, green after).
  • Siblings thinking / directive-handling.levels / commands-status.thinking-default / get-reply.config-override — 58 passed.
  • Regression: directive-handling.model (50) + gateway/sessions-patch (72) green — confirms an explicit off still persists; only the involuntary downgrade stops.
  • pnpm check:changed — clean (oxlint, import cycles, typecheck, guards), exit 0.

Real behavior proof

Behavior addressed: An explicit session thinkingLevel: "high" override is no longer reset to the support fallback after a turn when the active model rejects that level at runtime; the stored override survives the turn.

Real environment tested: A live agent turn against a real OpenAI-compatible Xiaomi MiMo model (xiaomi/mimo-v2.5-pro, real /v1/chat/completions API, genuine replies) using a source build of this branch (pnpm build) in an isolated profile. The model is configured reasoning: false, so high is unsupported at runtime and the per-turn support downgrade fires — reproducing the #87740 condition with a real model. This exercises the embedded agent path (openclaw agent --localsrc/agents/agent-command.ts). The get-reply-run.ts (WebChat) site was not driven through a live inbound channel; it is the byte-identical removal and is covered by the A/B below plus the existing get-reply suites.

A/B method: identical isolated profile, identical session pre-seeded with thinkingLevel: "high", identical model. Only the two fixed source files (src/agents/agent-command.ts, src/auto-reply/reply/get-reply-run.ts) were reverted to the pre-fix parent (2e042fbca8) and rebuilt for the "before" run, then restored and rebuilt for the "after" run — so this PR's change is the only variable.

Exact steps or command run after this patch:

  1. Configure xiaomi/mimo-v2.5-pro with reasoning: false (makes high unsupported at runtime) in an isolated profile; pre-seed session agent:main:main with thinkingLevel: "high" in agents/main/sessions/sessions.json.
  2. XIAOMI_API_KEY=… pnpm openclaw --profile <p> agent --local --to +15555550123 -m "Reply with a single word: …" --json
  3. Re-read agents/main/sessions/sessions.json["agent:main:main"].thinkingLevel before and after the turn.

Evidence after fix (live turns, real model replies, same model xiaomi/mimo-v2.5-pro):

code under test turn result real reply per-turn thinking stored thinkingLevel after turn
before fix (parent 2e042fbca8) success pong off (downgraded) off ← bug reproduced
after fix (this branch) success ok off (downgraded) high ← override preserved

Observed result after fix: with stored thinkingLevel: "high" on a model that rejects it at runtime, the turn runs normally (real reply, meta.requestShaping.thinking: "off") and the stored override stays high. Reverting only the two fixed files flips the stored value back to off, confirming this change is what preserves the override.

What was not tested: a live WebChat/Control UI or Voice UI round-trip against a running gateway — the get-reply-run.ts path was not executed via an inbound channel. It carries the identical removed block and is covered by the A/B above and the existing get-reply suites.

…rades (openclaw#87740)

When a session's stored thinkingLevel is unsupported by the active model, the
runtime fell back to a supported level for the turn AND wrote that fallback back
onto the persisted session override. Because the persistence condition fired
exactly when the stored value was the explicit override, the user's explicit
choice (e.g. "high") was permanently reset to the supported level (e.g. "off")
after every turn — re-setting it just got clobbered again next turn.

Downgrade only the level used for the current turn; never persist the support
fallback onto the stored override. The explicit override is the user's intent
and must survive turns (so it re-applies if a supporting model is used later).
Both the reply path (get-reply-run) and the agent-command path carried the same
duplicated write-back; both are fixed.

Reported by @TitanBob2026.
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 29, 2026
@clawsweeper

clawsweeper Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed May 29, 2026, 12:56 PM ET / 16:56 UTC.

Summary
This PR removes per-turn thinkingLevel fallback persistence from the agent-command and get-reply reply paths and adds an agent-command regression test for preserving an explicit stored override.

PR surface: Source -40, Tests +36. Total -4 across 3 files.

Reproducibility: yes. Current main has a clear source-level reproduction path in both per-turn fallback blocks, and the PR body includes A/B live agent output showing the stored level is clobbered before the fix and preserved after it.

Review metrics: 1 noteworthy metric.

  • Session fallback persistence writes: 2 removed, 0 added. Both per-turn runtime fallback sites stop writing the supported fallback back into stored thinkingLevel, which is the session-state behavior maintainers should notice before merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Optional: add a short WebChat or Gateway proof if maintainers want direct coverage of the get-reply path before merge.

Mantis proof suggestion
A short WebChat visual proof would materially reduce the remaining get-reply/session-state confidence gap if maintainers want direct UI-path evidence before merge. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

visual task: verify in WebChat that a stored thinkingLevel high survives a reply when runtime falls back to off.

Risk before merge

  • [P2] The PR intentionally changes persisted session-state semantics: an explicit unsupported stored override remains after a turn instead of being normalized to the turn's supported fallback.
  • [P1] The live proof exercises the embedded agent path; the get-reply/WebChat path is changed by the same source pattern and covered by source/test evidence, but it was not driven through a live inbound WebChat or gateway run.

Maintainer options:

  1. Accept the focused session-state fix (recommended)
    Land this PR once required checks are satisfied, accepting the agent-path live proof plus symmetric source/test coverage for get-reply and leaving model-switch policy to the linked follow-up.
  2. Ask for direct WebChat proof
    If maintainers want direct coverage of the get-reply path before merge, request a short redacted WebChat or Gateway run showing the stored explicit thinkingLevel survives a reply.
  3. Pause for broader policy
    Pause this PR only if maintainers want the model-switch persistence contract resolved in the same change instead of the linked follow-up.

Next step before merge

  • No ClawSweeper repair lane is needed; maintainers should decide whether to accept the focused session-state semantics change and then rely on required checks.

Security
Cleared: The diff only changes TypeScript session-thinking fallback logic and one colocated test; it does not add dependencies, workflows, secret handling, permissions, or code-download paths.

Review details

Best possible solution:

Land the focused per-turn persistence fix after required checks, while keeping model-switch remap policy isolated in #87925.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main has a clear source-level reproduction path in both per-turn fallback blocks, and the PR body includes A/B live agent output showing the stored level is clobbered before the fix and preserved after it.

Is this the best way to solve the issue?

Yes for the per-turn reset. The PR removes only the unintended persistence write-back and keeps the turn-local fallback, while the broader model-switch contract remains a separate product decision in #87925.

AGENTS.md: found and applied where relevant.

Codex review notes: reasoning high; reviewed against 67faef01828d.

Label changes

Label justifications:

  • P2: This is a normal-priority fix for a shipped persisted session preference bug with limited but real WebChat/Gateway and agent impact.
  • merge-risk: 🚨 session-state: The diff changes whether runtime thinking fallback mutates persisted session thinkingLevel state across future turns.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes redacted live A/B output from a real model turn showing the stored thinkingLevel changes to off before the fix and stays high after the fix.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes redacted live A/B output from a real model turn showing the stored thinkingLevel changes to off before the fix and stays high after the fix.
Evidence reviewed

PR surface:

Source -40, Tests +36. Total -4 across 3 files.

View PR surface stats
Area Files Added Removed Net
Source 2 8 48 -40
Tests 1 38 2 +36
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 3 46 50 -4

Acceptance criteria:

  • [P1] Contributor reported: pnpm test src/agents/agent-command.live-model-switch.test.ts.
  • [P1] Contributor reported: sibling thinking/directive/get-reply/gateway session tests.
  • [P1] Contributor reported: pnpm check:changed.

What I checked:

  • Repository policy read and applied: Full root AGENTS.md and the scoped src/agents/AGENTS.md were read; session state, persisted preferences, and fallback behavior are compatibility-sensitive review surfaces for this PR. (AGENTS.md:17, 67faef01828d)
  • Current main agent path still persists the fallback: On current main, when the stored/resolved thinking level is unsupported, the agent-command path resolves a fallback and persists it back to sessionEntry.thinkingLevel when it matches the prior stored value. (src/agents/agent-command.ts:1153, 67faef01828d)
  • Current main reply path still persists the fallback: On current main, runPreparedReply performs the same unsupported-level fallback write into sessionEntry, sessionStore, and the backing store file. (src/auto-reply/reply/get-reply-run.ts:883, 67faef01828d)
  • Latest release still contains the bug: The v2026.5.27 release contains the same fallback persistence blocks, matching the linked report's affected latest-release behavior. (src/auto-reply/reply/get-reply-run.ts:883, 27ae826f6525)
  • PR removes both per-turn persistence write-backs: The PR head leaves only the turn-local fallback assignment in both paths, so unsupported runtime fallback no longer mutates the stored session override. (src/agents/agent-command.ts:1200, fbfbeee4bee2)
  • Regression coverage added: The added test seeds a stored thinkingLevel high, mocks the active model as unsupported with fallback off, runs agentCommand, and asserts no persistSessionEntry call writes off over the explicit override. (src/agents/agent-command.live-model-switch.test.ts:950, fbfbeee4bee2)

Likely related people:

  • steipete: git blame on current main attributes the fallback write-back blocks in both touched runtime files and nearby test harness lines to Peter Steinberger-authored commits, making him the strongest routing signal for this session-thinking behavior. (role: recent area contributor; confidence: medium; commits: 3d7df2bc0747, 27ae826f6525; files: src/agents/agent-command.ts, src/auto-reply/reply/get-reply-run.ts, src/agents/agent-command.live-model-switch.test.ts)
  • vincentkoc: recent history on the same agent-command/session-store update surface includes lazy-loading and runtime-boundary work, which is useful routing context even though it is not direct blame for this fallback write-back. (role: adjacent agent runtime contributor; confidence: low; commits: 99755fcb2f1f, dd27aa945e43, f126088761b7; files: src/agents/agent-command.ts, src/agents/agent-command.live-model-switch.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels May 29, 2026
@hoobnn

hoobnn commented May 29, 2026

Copy link
Copy Markdown
Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 29, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 29, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: S status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Explicit thinkingLevel session override permanently reset to 'off' after each agent turn

1 participant