[Bug]: REPLAY_INVALID_RE missing Anthropic 'Invalid signature in thinking block' — hard session failure instead of recovery retry #88020

@bryanbaer

Description

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

When a session with extended thinking runs long enough for early thinking-block signatures to expire, Anthropic rejects the next API call with Invalid signature in thinking block. isReplayInvalidErrorMessage() does not match this message, so OpenClaw hard-fails the session instead of stripping the stale thinking blocks and retrying.

Steps to reproduce

  1. Start a session with extended thinking enabled (claude-sonnet-4-6 or similar), using thinking: adaptive or an explicit thinking level.
  2. Run a long agentic session (~45-60 min, heavy tool use) so early message thinking blocks age out.
  3. Send any new user message.
  4. Observe: Anthropic returns invalid_request_error: messages.1.content.N: Invalid `signature` in `thinking` block
  5. OpenClaw hard-fails (stopReason: error, totalTokens: 0, runtimeMs: ~300ms) — session is dead.

Expected behavior

OpenClaw should detect Invalid signature in thinking block as a replay_invalid failure kind, strip the stale thinking blocks via stripInvalidThinkingSignatures, and retry the request — same as it does for other replay-invalid patterns like roles must alternate.
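
The expected flow can be sketched roughly as follows. This is a minimal illustration, not OpenClaw's actual code: `classifyFailure`, `dropThinkingBlocks`, and `sendWithReplayRecovery` are hypothetical names, the regex is a reduced stand-in for `REPLAY_INVALID_RE`, and dropping every thinking block before the retry is an assumption about what the recovery path does.

```javascript
// Reduced stand-in for REPLAY_INVALID_RE (assumption: only two alternatives shown).
const REPLAY_INVALID_SKETCH_RE =
  /\broles must alternate\b|\bInvalid\b.*\bsignature\b.*\bthinking\b/i;

// Hypothetical classifier: maps a provider error message to a failure kind.
function classifyFailure(errorMessage) {
  return REPLAY_INVALID_SKETCH_RE.test(errorMessage) ? "replay_invalid" : "fatal";
}

// Hypothetical recovery step: removes thinking blocks from array-style content.
function dropThinkingBlocks(messages) {
  return messages.map((msg) =>
    Array.isArray(msg.content)
      ? { ...msg, content: msg.content.filter((block) => block.type !== "thinking") }
      : msg
  );
}

// Sends once; on a replay_invalid failure, strips thinking blocks and retries once.
// Any other failure is rethrown unchanged.
async function sendWithReplayRecovery(send, messages) {
  try {
    return await send(messages);
  } catch (err) {
    const text = err instanceof Error ? err.message : String(err);
    if (classifyFailure(text) !== "replay_invalid") throw err;
    return await send(dropThinkingBlocks(messages));
  }
}
```

Here `send` is any async function that takes the message list and either resolves with a response or throws with the provider's error text. The sketch retries only once so that a second rejection still surfaces as a hard failure.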

Actual behavior

Session hard-fails. Transcript shows:

"stopReason": "error"
"errorMessage": "{\"type\":\"error\",\"error\":{\"type\":\"invalid_request_error\",\"message\":\"messages.1.content.440: Invalid `signature` in `thinking` block\"}}"
"usage": { "input": 0, "output": 0, "totalTokens": 0 }
"runtimeMs": 318

Session is unrecoverable. User must /new.
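
For reference, the transcript's `errorMessage` is a JSON-encoded string, so the provider text has to be unwrapped before it can be inspected. The sketch below shows one way to do that and to read the message and content-block indices from the error path. Both helper names are hypothetical and are not taken from the OpenClaw source.

```javascript
// Transcript fields copied from the report above.
const transcriptEntry = {
  stopReason: "error",
  errorMessage:
    '{"type":"error","error":{"type":"invalid_request_error","message":"messages.1.content.440: Invalid `signature` in `thinking` block"}}',
};

// Hypothetical helper: unwraps the JSON-encoded provider error,
// falling back to the raw string when it is not valid JSON.
function extractProviderMessage(errorMessage) {
  try {
    const parsed = JSON.parse(errorMessage);
    return parsed?.error?.message ?? errorMessage;
  } catch {
    return errorMessage;
  }
}

// Hypothetical helper: reads the message and content-block indices
// from a path such as "messages.1.content.440".
function parseBlockPath(message) {
  const match = /\bmessages\.(\d+)\.content\.(\d+)\b/.exec(message);
  return match
    ? { messageIndex: Number(match[1]), contentIndex: Number(match[2]) }
    : null;
}

const providerMessage = extractProviderMessage(transcriptEntry.errorMessage);
console.log(providerMessage);
console.log(parseBlockPath(providerMessage)); // { messageIndex: 1, contentIndex: 440 }
```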

OpenClaw version

2026.5.27 (27ae826)

Operating system

macOS 25.5.0 Darwin arm64

Install method

npm global (homebrew)

Model

anthropic/claude-sonnet-4-6 (thinking: adaptive)

Provider / routing chain

openclaw -> Anthropic direct (cloudflare tunnel)

Additional provider/model setup details

No response

Logs, screenshots, and evidence

**Root cause** (`dist/errors-DuGAmFy8.js` line 256):


// Current REPLAY_INVALID_RE (missing signature pattern):
const REPLAY_INVALID_RE = /\bprevious_response_id\b.*|\btool_(?:use|call)\.(?:input|arguments)\b.*|\bincorrect role information\b|\broles must alternate\b|\binput item id does not belong to this connection\b/i;


Anthropic returns: `messages.1.content.440: Invalid \`signature\` in \`thinking\` block`
This does **not** match any pattern above.

**Fix** — add two patterns:


const REPLAY_INVALID_RE = /\bprevious_response_id\b.*|\btool_(?:use|call)\.(?:input|arguments)\b.*|\bincorrect role information\b|\broles must alternate\b|\binput item id does not belong to this connection\b|\bInvalid\b.*\bsignature\b.*\bthinking\b|\bsignature\b.*\bthinking\s+block\b/i;


This routes the error to `replay_invalid`, which triggers `stripInvalidThinkingSignatures` + retry.
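
A quick check of both regexes against the reported message, runnable in Node. The two full patterns are copied from this report. Splitting the added alternatives into `ALT_INVALID_SIGNATURE` and `ALT_THINKING_BLOCK` is only for illustration, and those two names are hypothetical.

```javascript
// Regexes copied from the report: the shipped pattern and the proposed patch.
const CURRENT_RE = /\bprevious_response_id\b.*|\btool_(?:use|call)\.(?:input|arguments)\b.*|\bincorrect role information\b|\broles must alternate\b|\binput item id does not belong to this connection\b/i;
const PATCHED_RE = /\bprevious_response_id\b.*|\btool_(?:use|call)\.(?:input|arguments)\b.*|\bincorrect role information\b|\broles must alternate\b|\binput item id does not belong to this connection\b|\bInvalid\b.*\bsignature\b.*\bthinking\b|\bsignature\b.*\bthinking\s+block\b/i;

// The two added alternatives, split out to see which one does the work.
const ALT_INVALID_SIGNATURE = /\bInvalid\b.*\bsignature\b.*\bthinking\b/i;
const ALT_THINKING_BLOCK = /\bsignature\b.*\bthinking\s+block\b/i;

const anthropicError =
  "messages.1.content.440: Invalid `signature` in `thinking` block";

console.log(CURRENT_RE.test(anthropicError));            // false
console.log(PATCHED_RE.test(anthropicError));            // true
console.log(ALT_INVALID_SIGNATURE.test(anthropicError)); // true
// The backtick after "thinking" sits between the word and the whitespace,
// so \s+ cannot match and this alternative misses the backticked form.
console.log(ALT_THINKING_BLOCK.test(anthropicError));    // false
console.log(ALT_THINKING_BLOCK.test("signature in thinking block")); // true
```

In other words, the first added alternative is the one that catches the message as Anthropic currently formats it. The second only covers a variant without backticks.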

**Note on `thinking.d.ts` design intent**: `stripInvalidThinkingSignatures` intentionally only strips absent/blank signatures locally and relies on the provider rejecting expired ones. The design is correct — but the rejection error was never added to `REPLAY_INVALID_RE`.
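
To illustrate why the local pass alone cannot catch this case, here is a minimal sketch of a filter that drops only thinking blocks with an absent or blank signature. `stripBlankSignatureThinking` is a hypothetical stand-in written for this example, not the real `stripInvalidThinkingSignatures`, and the message shape is assumed.

```javascript
// Hypothetical stand-in for the local pass described above.
// Keeps a thinking block only if its signature is a non-blank string.
function stripBlankSignatureThinking(messages) {
  return messages.map((msg) => {
    if (!Array.isArray(msg.content)) return msg;
    const content = msg.content.filter((block) => {
      if (block.type !== "thinking") return true;
      return typeof block.signature === "string" && block.signature.trim() !== "";
    });
    return { ...msg, content };
  });
}

const history = [
  {
    role: "assistant",
    content: [
      { type: "thinking", thinking: "a", signature: "" }, // blank: removed locally
      { type: "thinking", thinking: "b" }, // absent: removed locally
      // Present but expired: indistinguishable from a valid one on the client,
      // so it is kept and only the provider can reject it.
      { type: "thinking", thinking: "c", signature: "sig-expired" },
      { type: "text", text: "done" },
    ],
  },
];

const cleaned = stripBlankSignatureThinking(history);
console.log(cleaned[0].content.map((block) => block.type)); // [ 'thinking', 'text' ]
```

The surviving thinking block is exactly the case this issue describes: it passes the local check, the provider rejects it, and recovery then depends on the rejection message being classified as `replay_invalid`.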

Reproduced 3 times in the same morning (~45-60 min sessions). Patch applied locally to `dist/errors-DuGAmFy8.js` and confirmed gateway loads it correctly.

Impact and severity

  • Affected: Any user with extended thinking enabled on Anthropic models running sessions > ~45 min with heavy tool use
  • Severity: High — blocks workflow; session is dead with no auto-recovery
  • Frequency: Reproducible after sufficient session age; hit 3 times in one morning on the same workload
  • Consequence: User must /new and re-establish context; any in-flight subagent work is orphaned; session cost (~$10) is wasted

Additional information

The 2026.5.27 changelog mentions "strip stale Anthropic thinking" as a fix — that likely covered the absent/blank signature case in stripInvalidThinkingSignatures. This issue is the complementary missing piece: the provider rejection error was not wired into REPLAY_INVALID_RE so the retry path is never reached for cryptographically-expired (but syntactically-present) signatures.

Metadata

Assignees

No one assigned

    Labels

    • `P1`: High-priority user-facing bug, regression, or broken workflow.
    • `bug`: Something isn't working
    • `clawsweeper:fix-shape-clear`: ClawSweeper found a clear likely implementation shape for this issue.
    • `clawsweeper:queueable-fix`: ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
    • `clawsweeper:source-repro`: ClawSweeper found a high-confidence source-level issue reproduction.
    • `impact:session-state`: Session, memory, transcript, context, or agent state can drift or corrupt.
    • `issue-rating: 🦞 diamond lobster`: Very strong issue quality with high-confidence source-level or clear reproduction.
    • `regression`: Behavior that previously worked and now fails
