
Anthropic thinking blocks with expired signatures cause crash loop on replay #88932

@conini

Description


Problem

When running with thinking=adaptive (or any thinking mode) against Anthropic models, extended thinking blocks carrying cryptographic replay signatures accumulate in the session transcript. These signatures are time-limited and single-use.

The current transcript hygiene strips thinking blocks with missing, empty, or blank signatures:

Thinking blocks with missing, empty, or blank replay signatures are stripped before provider conversion.

However, it does not strip thinking blocks where the signature is present but expired/invalid. When these are replayed to the Anthropic Messages API, the API rejects the entire request:

messages.9.content.6: Invalid signature in thinking block

This causes a cascade:

  1. Every subsequent LLM call fails because the full history is replayed
  2. The corrupted transcript is persisted to disk
  3. Container restarts reload the same bad transcript → crash loop
  4. No automatic recovery without manual session reset
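One way an adapter could break this loop is sketched below in TypeScript. It is a hypothetical guard, not existing OpenClaw code: `callWithSignatureRecovery`, `CallResult`, and the injected `send` function are illustrative names, and the message types are simplified assumptions. When a call fails with the `Invalid signature in thinking block` error quoted above, it strips thinking blocks from assistant messages and retries once. It also returns the transcript that was actually sent, so the caller can persist that copy instead of the rejected one.

```typescript
// Hypothetical, simplified transcript shapes; not the adapter's real types.
type ContentBlock =
  | { type: 'thinking'; thinking: string; signature?: string }
  | { type: 'text'; text: string };

interface Message {
  role: 'user' | 'assistant';
  content: ContentBlock[];
}

// Hypothetical result of one model call; a real client reports errors differently.
type CallResult = { ok: true; reply: Message } | { ok: false; error: string };

// Matches the error text quoted in this issue, e.g.
// "messages.9.content.6: Invalid signature in thinking block".
const INVALID_SIGNATURE = /Invalid signature in thinking block/;

// Send the transcript once. If the API rejects a thinking-block signature,
// drop every thinking block from assistant messages and retry exactly once.
// Returns the transcript that was actually sent alongside the result, so the
// caller can persist the sanitized copy instead of the rejected one.
function callWithSignatureRecovery(
  messages: Message[],
  send: (msgs: Message[]) => CallResult,
): { result: CallResult; sent: Message[] } {
  const first = send(messages);
  if (first.ok) return { result: first, sent: messages };
  if (!INVALID_SIGNATURE.test(first.error)) return { result: first, sent: messages };

  const sanitized = messages.map((m): Message =>
    m.role === 'assistant'
      ? { ...m, content: m.content.filter((b) => b.type !== 'thinking') }
      : m,
  );
  return { result: send(sanitized), sent: sanitized };
}

// Usage with a fake sender that rejects any request still containing thinking blocks.
const fakeSend = (msgs: Message[]): CallResult =>
  msgs.some((m) => m.content.some((b) => b.type === 'thinking'))
    ? { ok: false, error: 'messages.1.content.0: Invalid signature in thinking block' }
    : { ok: true, reply: { role: 'assistant', content: [{ type: 'text', text: 'ok' }] } };

const history: Message[] = [
  { role: 'user', content: [{ type: 'text', text: 'Hi' }] },
  {
    role: 'assistant',
    content: [
      { type: 'thinking', thinking: '...', signature: 'stale-sig' },
      { type: 'text', text: 'Hello' },
    ],
  },
  { role: 'user', content: [{ type: 'text', text: 'Next question' }] },
];

const { result, sent } = callWithSignatureRecovery(history, fakeSend);
console.log(result.ok); // true
```

Stripping every thinking block, including the current turn's, is a deliberately blunt fallback. If the API requires the current turn's thinking block for a tool-call continuation, the retry may still be rejected.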

Expected Behavior

Historical thinking blocks should be stripped from replay regardless of signature state. The Anthropic API uses signatures for single-use replay within the same conversation turn; there is no valid reason to replay thinking blocks from earlier turns. All type: "thinking" blocks in historical assistant messages should be removed before provider conversion.
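A minimal TypeScript sketch of the expected behavior follows. It assumes simplified message and content-block types (not the adapter's real ones) and a caller-supplied `currentTurnStart` index marking where the current turn begins. `stripHistoricalThinking` is a hypothetical name. Every `thinking` block in an assistant message before that index is removed regardless of its signature. Later messages pass through untouched, and the input array is not mutated.

```typescript
// Hypothetical, simplified transcript shapes; not the adapter's real types.
type ContentBlock =
  | { type: 'thinking'; thinking: string; signature?: string }
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown };

interface Message {
  role: 'user' | 'assistant';
  content: ContentBlock[];
}

// Remove every thinking block from assistant messages before `currentTurnStart`,
// whatever their signature looks like. Messages at or after that index are
// returned unchanged. The input array and its messages are not mutated.
// Note: an assistant message left with empty content is kept here; a real
// adapter may need to drop or merge such messages.
function stripHistoricalThinking(messages: Message[], currentTurnStart: number): Message[] {
  return messages.map((msg, i): Message =>
    i >= currentTurnStart || msg.role !== 'assistant'
      ? msg
      : { ...msg, content: msg.content.filter((b) => b.type !== 'thinking') },
  );
}

// Usage: the current turn starts at index 2, so only message 1 is history.
const transcript: Message[] = [
  { role: 'user', content: [{ type: 'text', text: 'Summarize the report' }] },
  {
    role: 'assistant',
    content: [
      { type: 'thinking', thinking: '...', signature: 'old-sig' },
      { type: 'text', text: 'Here is the summary.' },
    ],
  },
  { role: 'user', content: [{ type: 'text', text: 'Now look up the source' }] },
  {
    role: 'assistant',
    content: [
      { type: 'thinking', thinking: '...', signature: 'current-sig' },
      { type: 'tool_use', id: 'toolu_1', name: 'search', input: { q: 'source' } },
    ],
  },
];

const replayed = stripHistoricalThinking(transcript, 2);
console.log(replayed[1].content.map((b) => b.type)); // [ 'text' ]
console.log(replayed[3].content.map((b) => b.type)); // [ 'thinking', 'tool_use' ]
```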

Actual Behavior

Thinking blocks with expired but non-empty signatures pass through transcript hygiene and are sent to the Anthropic API, which rejects them.
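The gap can be reproduced in isolation with the predicate from the current hygiene (the same condition shown under Suggested Fix). This is a hypothetical TypeScript sketch with a simplified `ContentBlock` type and placeholder signature strings. A blank signature is stripped. A non-empty signature survives however stale it is, because the check never looks at validity.

```typescript
// Hypothetical, simplified content-block shape; not the adapter's real type.
type ContentBlock =
  | { type: 'thinking'; thinking: string; signature?: string }
  | { type: 'text'; text: string };

// Same condition as the current hygiene: drop a thinking block only when its
// signature is missing, empty, or whitespace-only. Validity is never checked.
function currentHygiene(content: ContentBlock[]): ContentBlock[] {
  return content.filter(
    (b) => !(b.type === 'thinking' && (!b.signature || b.signature.trim() === '')),
  );
}

// Placeholder signatures; real ones are opaque values issued by the API.
const historicalTurn: ContentBlock[] = [
  { type: 'thinking', thinking: '...', signature: '   ' },       // blank: stripped
  { type: 'thinking', thinking: '...', signature: 'stale-sig' }, // non-empty but rejected by the API: kept
  { type: 'text', text: 'Done.' },
];

const afterHygiene = currentHygiene(historicalTurn);
console.log(afterHygiene.map((b) => b.type)); // [ 'thinking', 'text' ]
```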

Reproduction

  1. Run an agent with thinking=adaptive on Anthropic Claude
  2. Have a long-running session (10+ turns with thinking responses)
  3. Wait for signatures to expire (or restart the container after some time)
  4. Next LLM call fails with Invalid signature in thinking block
  5. All subsequent calls fail — session is bricked

Impact

  • Two production outages in our setup, with 1.5 h and 8 h of downtime
  • Only recoverable via manual session transcript deletion
  • Affects any long-running session using thinking mode with Anthropic

Suggested Fix

In the Anthropic provider adapter's transcript hygiene, change the stripping condition from "missing/empty/blank signature" to "all thinking blocks in historical turns":

```typescript
// Before: strips only thinking blocks whose signature is missing or blank
content.filter(b => !(b.type === 'thinking' && (!b.signature || b.signature.trim() === '')))

// After: strip ALL thinking blocks from historical assistant messages
// (apply only to messages before the current turn)
content.filter(b => b.type !== 'thinking')
```

The current-turn thinking block (if any) should still be preserved for tool-call continuations within the same turn, per existing logic.
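Preserving the current turn requires deciding where it starts. One plausible rule, sketched below as an assumption rather than OpenClaw's actual logic, is to treat the last user message containing anything other than `tool_result` blocks as the turn boundary. In the Anthropic Messages API, tool results are sent in user-role messages, so they continue the current turn instead of starting a new one. `findCurrentTurnStart` and the simplified types are hypothetical.

```typescript
// Hypothetical, simplified transcript shapes; not the adapter's real types.
type ContentBlock =
  | { type: 'thinking'; thinking: string; signature?: string }
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown }
  | { type: 'tool_result'; tool_use_id: string; content: string };

interface Message {
  role: 'user' | 'assistant';
  content: ContentBlock[];
}

// Index of the message that opens the current turn: the last user message
// containing anything other than tool_result blocks. User messages made only
// of tool results continue the turn, so they are skipped. Returns 0 when no
// such message exists, which treats nothing as history.
function findCurrentTurnStart(messages: Message[]): number {
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];
    if (msg.role === 'user' && msg.content.some((b) => b.type !== 'tool_result')) {
      return i;
    }
  }
  return 0;
}

// Usage: a tool-call continuation in progress.
const transcript: Message[] = [
  { role: 'user', content: [{ type: 'text', text: 'Find the report' }] },
  {
    role: 'assistant',
    content: [
      { type: 'thinking', thinking: '...', signature: 'sig-a' },
      { type: 'text', text: 'Found it.' },
    ],
  },
  { role: 'user', content: [{ type: 'text', text: 'Now fetch it' }] }, // current turn starts here
  {
    role: 'assistant',
    content: [
      { type: 'thinking', thinking: '...', signature: 'sig-b' },
      { type: 'tool_use', id: 'toolu_1', name: 'fetch', input: {} },
    ],
  },
  { role: 'user', content: [{ type: 'tool_result', tool_use_id: 'toolu_1', content: 'ok' }] },
];

console.log(findCurrentTurnStart(transcript)); // 2
```

Messages before the returned index are the historical ones whose thinking blocks the fix would remove. The assistant message carrying the in-flight `tool_use` keeps its thinking block.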

Environment

  • OpenClaw: 2026.5.27
  • Model: anthropic/claude-opus-4-6
  • Thinking mode: adaptive

Workaround

Manually reset the session: delete all transcript files, then restart the gateway.

Metadata

Labels

  • P1: High-priority user-facing bug, regression, or broken workflow.
  • impact:crash-loop: Crash, hang, restart loop, or process-level availability failure.
  • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
