[Bug]: Extended thinking sessions permanently broken after gateway restart / cache miss — no recovery for research agents #90667

@MIHHHMIH

Description

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

Extended thinking sessions on claude-sonnet-4-6 become permanently broken after a gateway restart or Anthropic prompt cache expiry, with every subsequent request rejected instantly due to an invalid thinkingSignature in replayed history.

Steps to reproduce

1. Configure an agent with thinkingLevel: high or thinking: adaptive on claude-sonnet-4-6.
2. Run a multi-turn session with at least one tool call (web search, filter, etc.) that produces a thinking block in the transcript.
3. Restart the gateway, or leave the session idle long enough for Anthropic's prompt cache to expire.
4. Send any new message to the agent.

Expected behavior

The session should continue to function after a gateway restart or prompt cache expiry. Before either event, the same agent and session accepted new messages and responded normally.

Actual behavior

Every message after the restart/expiry fails instantly (~300–400ms, zero tokens consumed). User-visible error: LLM request failed: provider rejected the request schema or tool payload. Gateway logs show: messages.N.content.N: Invalid `signature` in `thinking` block. The session never recovers: each subsequent turn fails identically. The only known manual recovery is to delete the session .jsonl transcript file, which loses all conversation context.
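For triage, the failure can be recognised from the gateway log line alone. The following is a minimal sketch, not OpenClaw code: `parseInvalidSignatureError` and `SignatureErrorLocation` are hypothetical names, and the pattern is derived only from the single log line quoted in this report, so other provider wordings may not match.

```typescript
// Hypothetical helper for illustration; not part of OpenClaw.
// The pattern is based only on the gateway log line quoted in this report.
// Backticks are optional because the message appears both with and without them.
const INVALID_THINKING_SIGNATURE =
  /messages\.(\d+)\.content\.(\d+): Invalid `?signature`? in `?thinking`? block/;

interface SignatureErrorLocation {
  messageIndex: number; // index into the replayed `messages` array
  contentIndex: number; // index of the offending block within that message
}

// Returns the location of the rejected thinking block, or null when the
// message does not look like an invalid-signature rejection.
function parseInvalidSignatureError(message: string): SignatureErrorLocation | null {
  const match = INVALID_THINKING_SIGNATURE.exec(message);
  if (!match) return null;
  return {
    messageIndex: Number(match[1]),
    contentIndex: Number(match[2]),
  };
}
```

A caller could use the returned indices to report which replayed turn was rejected, rather than surfacing only the generic "provider rejected the request schema or tool payload" message.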

OpenClaw version

  • OpenClaw: 2026.6.1

Operating system

macOS 25.3.0 (arm64)

Install method

Homebrew — /opt/homebrew/lib/node_modules/openclaw

Model

anthropic/claude-sonnet-4-6

Provider / routing chain

Direct Anthropic API via OpenClaw embedded runner — no proxy, no router, no fallback chain active at time of failure.

Additional provider/model setup details

Agent configured with thinkingLevel: high (or adaptive). No per-agent model overrides. Standard Anthropic auth via access token. Issue confirmed on at least two separate agents in the same environment on the same day.


Logs, screenshots, and evidence

runtimeMs: ~300–400ms, tokens used: 0
Error: LLM request failed: provider rejected the request schema or tool payload

Root cause (gateway log): messages.N.content.N: Invalid `signature` in `thinking` block
Session status: failed on every subsequent turn, no recovery without manual .jsonl deletion

Impact and severity

Affected: Any agent using thinkingLevel: high or thinking: adaptive on claude-sonnet-4-6 with multi-turn sessions involving tool calls. Confirmed on 2 separate agents in the same environment on the same day.
Severity: Blocks workflow completely — agent accepts messages but produces zero output on every turn with no recovery
Frequency: Intermittent trigger, but deterministic once triggered — occurs after gateway restart or Anthropic prompt cache expiry on any session containing thinking blocks
Consequence: Agent is permanently non-functional until session transcript is manually deleted. All accumulated session context is lost on recovery. No user-facing warning before or after failure.

Additional information

The recovery helper wrapAnthropicStreamWithRecovery exists in src/agents/pi-embedded-runner/thinking.ts:431 and is designed to handle this exact error class: it strips thinking blocks and retries. However, it has no production caller for the direct Anthropic / claude-sonnet-4-6 path in 2026.6.1. The session file repair pass fires on failure but does not cover this error class. The regression window is unknown: no last known good version is available from this environment.
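The strip-and-retry behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the actual body of wrapAnthropicStreamWithRecovery: the `Message` and `ContentBlock` shapes, `stripThinkingBlocks`, `isInvalidThinkingSignatureError`, `sendWithSignatureRecovery`, and the `send` callback are all hypothetical, and whether the provider accepts the stripped history in every tool-use situation is not confirmed by this report.

```typescript
// Hypothetical shapes for illustration; the real transcript types may differ.
type ContentBlock = { type: string; [key: string]: unknown };

interface Message {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}

// Remove thinking blocks from replayed history. Messages left with no
// content blocks are dropped so that no empty message is sent.
function stripThinkingBlocks(messages: Message[]): Message[] {
  return messages
    .map((m) => {
      if (typeof m.content === "string") return m;
      const kept = m.content.filter(
        (b) => b.type !== "thinking" && b.type !== "redacted_thinking",
      );
      return { ...m, content: kept };
    })
    .filter((m) => typeof m.content === "string" || m.content.length > 0);
}

// Matches the gateway log line quoted in this report (backticks optional).
function isInvalidThinkingSignatureError(err: unknown): boolean {
  const text = err instanceof Error ? err.message : String(err);
  return /Invalid `?signature`? in `?thinking`? block/.test(text);
}

// Try the request once; on an invalid-signature rejection, retry exactly
// once with thinking blocks stripped. Any other error is rethrown as-is.
async function sendWithSignatureRecovery<T>(
  messages: Message[],
  send: (messages: Message[]) => Promise<T>,
): Promise<T> {
  try {
    return await send(messages);
  } catch (err) {
    if (!isInvalidThinkingSignatureError(err)) throw err;
    return send(stripThinkingBlocks(messages));
  }
}
```

The retry is limited to one attempt so that a second rejection surfaces to the caller instead of looping. The original `messages` array is not mutated, so the stored transcript stays unchanged unless the caller decides to persist the stripped history.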

Metadata

Assignees

No one assigned

    Labels

    P1 - High-priority user-facing bug, regression, or broken workflow.
    bug - Something isn't working
    clawsweeper:needs-live-repro - ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
    impact:data-loss - Can lose, corrupt, or silently drop user/session/config data.
    impact:message-loss - Channel message delivery can be lost, duplicated, or misrouted.
    impact:session-state - Session, memory, transcript, context, or agent state can drift or corrupt.
    issue-rating: 🐚 platinum hermit - Good issue quality with a plausible reproduction path needing some confirmation.
    regression - Behavior that previously worked and now fails
