[Bug]: Extended thinking sessions permanently broken after gateway restart / cache miss — no recovery for research agents #90667
Closed
Labels
P1 - High-priority user-facing bug, regression, or broken workflow.
bug - Something isn't working
clawsweeper:needs-live-repro - ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
impact:data-loss - Can lose, corrupt, or silently drop user/session/config data.
impact:message-loss - Channel message delivery can be lost, duplicated, or misrouted.
impact:session-state - Session, memory, transcript, context, or agent state can drift or corrupt.
issue-rating: 🐚 platinum hermit - Good issue quality with a plausible reproduction path needing some confirmation.
regression - Behavior that previously worked and now fails
Bug type
Regression (worked before, now fails)
Beta release blocker
No
Summary
Extended thinking sessions on claude-sonnet-4-6 become permanently broken after a gateway restart or Anthropic prompt cache expiry, with every subsequent request rejected instantly due to an invalid thinkingSignature in replayed history.
Steps to reproduce
Expected behavior
Prior to the gateway restart or cache expiry, the same agent and session accepted new messages and responded normally — the session should continue functioning after either event.
Actual behavior
Every message after the restart/expiry fails instantly (~300–400ms, zero tokens consumed). User-visible error: LLM request failed: provider rejected the request schema or tool payload. Gateway logs show: messages.N.content.N: Invalid signature in thinking block. The session never recovers — each subsequent turn fails identically. Manual recovery requires deleting the session .jsonl transcript file, losing all conversation context.
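A less destructive manual workaround than deleting the transcript might be to scrub only the thinking blocks from a copy of it. The sketch below is untested against OpenClaw and rests on assumptions: the transcript schema is not documented in this issue, so the code assumes each line is a standalone JSON value and that thinking content appears as objects with "type": "thinking" or "type": "redacted_thinking" inside arrays, mirroring Anthropic content blocks. The name scrubTranscript is hypothetical.

```typescript
// Generic JSON value type so the scrubber does not depend on the
// (unknown) OpenClaw transcript schema.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Assumption: thinking content is tagged the way Anthropic content blocks are.
const THINKING_TYPES = new Set(["thinking", "redacted_thinking"]);

function isThinkingBlock(value: Json): boolean {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const kind = value["type"];
  return typeof kind === "string" && THINKING_TYPES.has(kind);
}

// Recursively copy a JSON value, dropping thinking blocks from every array.
function scrub(value: Json): Json {
  if (Array.isArray(value)) {
    return value
      .filter((item) => !isThinkingBlock(item))
      .map((item) => scrub(item));
  }
  if (typeof value === "object" && value !== null) {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = scrub(child);
    }
    return out;
  }
  return value;
}

// Scrub a whole .jsonl transcript given as text. Blank lines are kept
// as-is; a malformed line makes JSON.parse throw rather than being dropped.
function scrubTranscript(jsonl: string): string {
  return jsonl
    .split("\n")
    .map((line) =>
      line.trim() === "" ? line : JSON.stringify(scrub(JSON.parse(line) as Json))
    )
    .join("\n");
}
```

Run this on a copy of the session file, never the original. Whether a scrubbed transcript is enough for the session to recover has not been confirmed.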
OpenClaw version
Operating system
macOS 25.3.0 (arm64)
Install method
Homebrew — /opt/homebrew/lib/node_modules/openclaw
Model
anthropic/claude-sonnet-4-6
Provider / routing chain
Direct Anthropic API via OpenClaw embedded runner — no proxy, no router, no fallback chain active at time of failure.
Additional provider/model setup details
Agent configured with thinkingLevel: high (or adaptive). No per-agent model overrides. Standard Anthropic auth via access token. Issue confirmed on at least two separate agents in the same environment on the same day.
Logs, screenshots, and evidence
Impact and severity
Additional information
The recovery helper wrapAnthropicStreamWithRecovery exists in src/agents/pi-embedded-runner/thinking.ts:431 and is designed to handle this exact error class: it strips thinking blocks and retries. However, it has no production caller for the direct Anthropic / claude-sonnet-4-6 path in 2026.6.1. The session file repair pass fires on failure but does not cover this error class. The regression window is unknown; no last known good version is available from this environment.
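For reference, the strip-and-retry behavior described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the actual body of wrapAnthropicStreamWithRecovery: the Message and ContentBlock shapes, the SendFn callback, and the function names are all hypothetical, and the error is matched only on the log string quoted in this report.

```typescript
// Hypothetical shapes, loosely modeled on Anthropic content blocks.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string; signature: string }
  | { type: "redacted_thinking"; data: string };

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Hypothetical stand-in for whatever actually issues the provider request.
type SendFn = (messages: Message[]) => Promise<string>;

// Matches the gateway log line quoted in this issue.
function isInvalidThinkingSignature(err: unknown): boolean {
  const text = err instanceof Error ? err.message : String(err);
  return /Invalid signature in thinking block/i.test(text);
}

// Return a copy of the history with thinking blocks removed from assistant
// turns. Turns left with no content are dropped, on the assumption that an
// empty content array would itself be rejected.
function stripThinkingBlocks(messages: Message[]): Message[] {
  return messages
    .map((m): Message =>
      m.role === "assistant"
        ? {
            ...m,
            content: m.content.filter(
              (b) => b.type !== "thinking" && b.type !== "redacted_thinking"
            ),
          }
        : m
    )
    .filter((m) => m.content.length > 0);
}

// Send once; on an invalid-signature rejection, retry exactly once with
// thinking blocks stripped. Any other error propagates unchanged.
async function sendWithThinkingRecovery(
  send: SendFn,
  messages: Message[]
): Promise<string> {
  try {
    return await send(messages);
  } catch (err) {
    if (!isInvalidThinkingSignature(err)) throw err;
    return await send(stripThinkingBlocks(messages));
  }
}
```

Limiting the retry to a single attempt keeps a persistent rejection from looping, and leaving the original history untouched means the caller decides whether to persist the stripped version.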