fix(embedded-agent): treat stale thinking-block signature as replay-invalid by 0xghost42 · Pull Request #88025 · openclaw/openclaw

0xghost42 · 2026-05-29T13:13:47Z

Summary

When a session with extended thinking runs long enough for early thinking-block signatures to expire, Anthropic rejects the next API call with "Invalid signature in thinking block". REPLAY_INVALID_RE in src/agents/embedded-agent-helpers/errors.ts did not match that wording, so classifyProviderRuntimeFailureKind returned a non-replay kind and the session hard-failed instead of being routed to replay recovery (strip the stale thinking blocks and retry).

Add an \binvalid signature\b.*?\bthinking block\b alternation to REPLAY_INVALID_RE so the message classifies as replay_invalid and the existing recovery path runs.

This is a generic product rule (matches the provider error class, not a hardcoded one-off string); the regex is intentionally scoped to the signature/thinking-block pairing to avoid false positives.
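The sketch below shows how the added alternation classifies messages. It assumes the real REPLAY_INVALID_RE is tested case-insensitively, because the new test input starts with a capital "Invalid". PR_ALTERNATION and classifySketch are hypothetical stand-ins. The sketch reproduces only the alternation this PR adds, not the other alternatives in errors.ts, and it collapses every non-matching kind into "other".

```typescript
// Only the alternation added by this PR. The `i` flag is an assumption; the
// real REPLAY_INVALID_RE has further alternatives that are not shown here.
const PR_ALTERNATION = /\binvalid signature\b.*?\bthinking block\b/i;

// Hypothetical simplification of classifyProviderRuntimeFailureKind: the
// real function distinguishes more kinds than these two.
type FailureKindSketch = "replay_invalid" | "other";

function classifySketch(message: string): FailureKindSketch {
  return PR_ALTERNATION.test(message) ? "replay_invalid" : "other";
}

// Bare phrase and a messages.N.content.M: framed variant both match.
console.log(classifySketch("Invalid signature in thinking block")); // replay_invalid
console.log(classifySketch("messages.3.content.7: Invalid signature in thinking block")); // replay_invalid
// Scoping: "invalid signature" without "thinking block" does not match.
console.log(classifySketch("Invalid signature on webhook payload")); // other
```

Requiring both phrases, in order, is what keeps an unrelated "invalid signature" error out of the replay path.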

Verification

pnpm test src/agents/embedded-agent-helpers/errors.test.ts — 6 passed (incl. 2 new cases for the bare phrase and the messages.N.content.M: framed variant).

Real behavior proof

Behavior addressed: Anthropic Invalid signature in thinking block now classifies as replay_invalid so the session recovers (strip stale thinking + retry) instead of hard-failing.
Real environment tested: Unit-level via the project test runner (Vitest) against classifyProviderRuntimeFailureKind; no paid live Anthropic call.
Exact steps or command run after this patch: pnpm test src/agents/embedded-agent-helpers/errors.test.ts
Evidence after fix: Test Files 1 passed (1) / Tests 6 passed (6); the new assertion classifyProviderRuntimeFailureKind("Invalid signature in thinking block") === "replay_invalid" and its framed-variant counterpart both pass.
Observed result after fix: classifier returns replay_invalid for the stale-signature message; prior behavior returned a non-replay kind (no recovery).
What was not tested: a full live long-session reproduction against the Anthropic API (would require an expired-signature session); the change is confined to the error-classification regex and is covered by the unit assertions.

…nvalid Anthropic rejects the next call with 'Invalid signature in thinking block' once early extended-thinking signatures expire on a long session. REPLAY_INVALID_RE did not match it, so classifyProviderRuntimeFailureKind fell through and the session hard-failed instead of routing to replay recovery (strip stale thinking blocks and retry). Add the pattern so the message classifies as replay_invalid.

clawsweeper · 2026-05-29T13:15:44Z

Codex review: needs real behavior proof before merge. Reviewed May 29, 2026, 9:20 AM ET / 13:20 UTC.

Summary
The PR extends embedded-agent provider failure classification to mark stale Anthropic thinking-block signature errors as replay_invalid and adds focused classifier unit coverage.

PR surface: Source 0, Tests +19. Total +19 across 2 files.

Reproducibility: yes, for the PR defect: evaluating the proposed regex against the linked issue's quoted Anthropic error payload shows it does not match. I did not establish a live long-session Anthropic reproduction in this read-only review.

Review metrics: none identified.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🧂 unranked krab
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

Update the regex and tests to cover the exact backticked Anthropic payload from the linked issue.
[P1] Add redacted real behavior proof, such as runtime logs or terminal output showing the after-fix recovery path, or obtain a maintainer proof override.

Proof guidance:

[P1] Needs real behavior proof before merge: The PR body provides only unit-test output and explicitly no live Anthropic/runtime recovery proof; the contributor should add redacted terminal output, logs, or another real artifact, then update the PR body to trigger re-review or ask for @clawsweeper re-review.

Risk before merge

[P1] Merging as-is may leave the linked long-session Anthropic failure unresolved because the exact quoted provider payload still does not match the new classifier pattern.
[P1] The contributor proof is unit-only; there is no redacted runtime log, terminal output, or live/provider artifact showing after-fix session recovery.

Maintainer options:

Decide the mitigation before merge
Update the classifier and regression test to cover the exact quoted Anthropic payload from [Bug]: REPLAY_INVALID_RE missing Anthropic 'Invalid signature in thinking block' — hard session failure instead of recovery retry #88020, then provide real after-fix behavior proof or get an explicit maintainer proof override.
Pause or close
Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

[P1] The PR needs contributor or maintainer follow-up because the code fix is narrow but the missing real behavior proof cannot be supplied by ClawSweeper automation.

Security
Cleared: The diff only changes a provider-error classification regex and unit tests; I found no concrete security or supply-chain concern.

Review findings

[P1] Match the quoted Anthropic signature error — src/agents/embedded-agent-helpers/errors.ts:356

Review details

Best possible solution:

Update the classifier and regression test to cover the exact quoted Anthropic payload from #88020, then provide real after-fix behavior proof or get an explicit maintainer proof override.

Do we have a high-confidence way to reproduce the issue?

Yes, for the PR defect: evaluating the proposed regex against the linked issue's quoted Anthropic error payload shows it does not match. I did not establish a live long-session Anthropic reproduction in this read-only review.

Is this the best way to solve the issue?

No: the approach is narrow, but it needs to match both quoted and unquoted provider wording and prove the exact linked payload. The safer repair is to broaden the alternation around optional punctuation/backticks and add the exact reported payload as a regression case.
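One way to broaden the pattern is sketched below as a standalone regex. It allows an optional backtick around signature and thinking while keeping the PR's lazy .*? gap and the requirement that both phrases appear in order. BROADENED_ALTERNATION is a hypothetical name. It is a suggestion, not the pattern in errors.ts, and the case-insensitive flag is an assumption.

```typescript
// Hypothetical broadened alternation. The `? before "signature" and after
// "thinking" covers the backticks that sit next to fixed tokens; the other two
// backticks fall inside the lazy .*? gap. \b still holds beside a backtick,
// because a backtick is a non-word character.
const BROADENED_ALTERNATION =
  /\binvalid\s+`?signature\b.*?\bthinking`?\s+block\b/i;

// Exact payload shape quoted in #88020, the PR's unquoted phrase, and a
// negative case that must stay unmatched.
const cases: Array<[string, boolean]> = [
  ["messages.1.content.440: Invalid `signature` in `thinking` block", true],
  ["Invalid signature in thinking block", true],
  ["Invalid signature on webhook payload", false],
];

for (const [message, expected] of cases) {
  const actual = BROADENED_ALTERNATION.test(message);
  console.log(actual === expected ? "ok" : "MISMATCH", JSON.stringify(message));
}
```

Keeping the exact quoted payload in the test table, as the review asks, guards against a regression to the plain-phrase-only form.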

Full review comments:

[P1] Match the quoted Anthropic signature error — src/agents/embedded-agent-helpers/errors.ts:356
The linked report's actual payload is messages.1.content.440: Invalid `signature` in `thinking` block, but this alternation requires the literal phrases invalid signature and thinking block, each with a plain space between the words; the backticks break both phrases. The reported provider error therefore still falls through instead of classifying as replay_invalid. Please cover the quoted-word shape in the regex and test the exact payload.
Confidence: 0.93
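The mismatch this finding describes can be checked in isolation. The sketch evaluates the PR's alternation, with an assumed case-insensitive flag, against the payload quoted in the finding and against the unquoted phrase used by the new tests. PR_ALTERNATION is a hypothetical local name.

```typescript
// The alternation added by the PR; the `i` flag is an assumption.
const PR_ALTERNATION = /\binvalid signature\b.*?\bthinking block\b/i;

// Payload quoted in the linked issue: a backtick sits between "Invalid" and
// "signature", so the literal "invalid signature" never occurs.
const quotedPayload =
  "messages.1.content.440: Invalid `signature` in `thinking` block";
// Unquoted phrase, as in the PR's new unit tests.
const plainPhrase = "Invalid signature in thinking block";

console.log(PR_ALTERNATION.test(quotedPayload)); // false
console.log(PR_ALTERNATION.test(plainPhrase)); // true
```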

Overall correctness: patch is incorrect
Overall confidence: 0.93

AGENTS.md: found and applied where relevant.

Codex review notes: reasoning high; reviewed against 7fb91317ba2d.

Label changes

Label justifications:

P1: The PR is intended to fix a high-impact agent session recovery regression where Anthropic extended-thinking sessions can hard-fail instead of recovering.
rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🧂 unranked krab.
status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body provides only unit-test output and explicitly no live Anthropic/runtime recovery proof; the contributor should add redacted terminal output, logs, or another real artifact, then update the PR body to trigger re-review or ask for @clawsweeper re-review.

Evidence reviewed

PR surface:

Source 0, Tests +19. Total +19 across 2 files.


Area	Files	Added	Removed	Net
Source	1	1	1	0
Tests	1	20	1	+19
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	2	21	2	+19

What I checked:

Root and scoped policy read: Read the full root AGENTS.md and src/agents/AGENTS.md; the OpenClaw review policy and agent test guidance were applied to proof expectations and the agent runtime surface review. (AGENTS.md:1, 7fb91317ba2d)
Current classifier surface: Current main uses REPLAY_INVALID_RE as the replay-invalid classifier gate, and the current pattern does not include the Anthropic stale thinking-signature wording. (src/agents/embedded-agent-helpers/errors.ts:355, 7fb91317ba2d)
Proposed regex on PR branch: The PR branch adds \binvalid signature\b.*?\bthinking block\b, which only matches the unquoted phrase form added in the new tests. (src/agents/embedded-agent-helpers/errors.ts:356, b4f533eaeceb)
Exact reported payload still misses: A direct regex check showed the PR pattern returns false for messages.1.content.440: Invalid `signature` in `thinking` block, the exact shape quoted by the linked issue, while returning true for the unquoted test strings. (src/agents/embedded-agent-helpers/errors.ts:356, b4f533eaeceb)
Recovery path context: The Anthropic thinking recovery wrapper already recognizes broader thinking/signature errors and retries once without thinking blocks, so the classifier fix should align with the same quoted provider wording rather than only the simplified phrase. (src/agents/embedded-agent-runner/thinking.ts:11, 7fb91317ba2d)
PR proof is unit-only: The PR body reports pnpm test src/agents/embedded-agent-helpers/errors.test.ts and explicitly says no full live long-session Anthropic reproduction was tested. (b4f533eaeceb)
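For context on the recovery path named in these checks, the sketch below is a minimal retry-once wrapper that drops thinking blocks after a thinking/signature error. Every name and type in it (ContentBlock, Message, stripThinkingBlocks, isThinkingSignatureError, callWithThinkingRecovery) is hypothetical. The real wrapper lives in src/agents/embedded-agent-runner/thinking.ts and its detection rules are not reproduced here, so the loose "mentions both signature and thinking" check is an assumption.

```typescript
// Hypothetical content-block and message shapes (not the project's real types).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string; signature: string };

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Drop thinking blocks, and with them any expired signatures, from history.
function stripThinkingBlocks(messages: Message[]): Message[] {
  return messages.map((m) => ({
    ...m,
    content: m.content.filter((b) => b.type !== "thinking"),
  }));
}

// Assumed loose detector: the error text mentions both "signature" and
// "thinking", in any order and with or without backticks.
function isThinkingSignatureError(err: unknown): boolean {
  const text = err instanceof Error ? err.message : String(err);
  return /signature/i.test(text) && /thinking/i.test(text);
}

// Try once; on a thinking/signature error retry exactly once without thinking
// blocks. Other errors, and a failure on the retry, propagate to the caller.
async function callWithThinkingRecovery<T>(
  messages: Message[],
  call: (msgs: Message[]) => Promise<T>,
): Promise<T> {
  try {
    return await call(messages);
  } catch (err) {
    if (!isThinkingSignatureError(err)) throw err;
    return await call(stripThinkingBlocks(messages));
  }
}
```

Bounding the retry at one attempt means a session that still fails after stripping surfaces the error instead of looping.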

Likely related people:

steipete: Local blame and git log -S 'REPLAY_INVALID_RE' both point the current classifier and adjacent recovery files at commit c4a5bba by Peter Steinberger; the local history is collapsed to that grafted source commit, so this is a routing signal rather than exclusive ownership. (role: recent area contributor; confidence: medium; commits: c4a5bba800d7; files: src/agents/embedded-agent-helpers/errors.ts, src/agents/embedded-agent-runner/thinking.ts, src/agents/embedded-agent-runner/run/attempt.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.
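The rule that overall readiness "follows the weaker of proof and patch quality" amounts to taking the lower of two positions on this ladder. The sketch below assumes the ranks are strictly ordered from strongest to weakest as listed. overallRank is a hypothetical helper, not ClawSweeper code, and off-meta tidepool is left out because it marks the rating as not applicable.

```typescript
// Ladder from the list above, strongest first (ordering assumed strict).
const RANKS = [
  "challenger crab",
  "diamond lobster",
  "platinum hermit",
  "gold shrimp",
  "silver shellfish",
  "unranked krab",
] as const;

type Rank = (typeof RANKS)[number];

// Overall readiness is the weaker of the two inputs, i.e. the one with the
// larger index on the ladder.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) >= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

console.log(overallRank("diamond lobster", "unranked krab")); // unranked krab
console.log(overallRank("platinum hermit", "gold shrimp")); // gold shrimp
```

This is why a strong patch paired with missing proof still lands at unranked krab.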

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.
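The permission split described above can be summarized as a small routing function. This is an illustrative sketch only: routeCommand, Actor, and Action are hypothetical, maintainers are assumed to hold write access, and the open-PR/issue requirement and event-driven triggers are not modeled.

```typescript
// Hypothetical model of who may trigger what; not ClawSweeper's implementation.
interface Actor {
  isAuthor: boolean;
  hasWriteAccess: boolean;
  isMaintainer: boolean; // assumed to imply write access
}

type Action = "fresh-review" | "repair-or-merge" | "explain" | "stop" | "ignored";

const REPAIR_COMMANDS: readonly string[] = ["autofix", "automerge", "fix ci", "address review"];

function routeCommand(comment: string, actor: Actor): Action {
  const match = /^@clawsweeper\s+(.+)$/.exec(comment.trim());
  if (!match) return "ignored";
  const cmd = (match[1] ?? "").trim();

  // Fresh review only; never starts repair, rebase, CI repair, or automerge.
  if (cmd === "re-review" || cmd === "re-run") {
    return actor.isAuthor || actor.hasWriteAccess || actor.isMaintainer ? "fresh-review" : "ignored";
  }
  if (cmd === "review") return actor.isMaintainer ? "fresh-review" : "ignored";

  // Repair and merge flows are maintainer-only and need an explicit command.
  if (REPAIR_COMMANDS.includes(cmd)) return actor.isMaintainer ? "repair-or-merge" : "ignored";
  if (cmd === "explain") return actor.isMaintainer ? "explain" : "ignored";
  if (cmd === "stop") return actor.isMaintainer ? "stop" : "ignored";
  return "ignored";
}
```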

openclaw-barnacle Bot added the labels agents (Agent runtime and tooling), size: XS, and triage: mock-only-proof (Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.) on May 29, 2026

0xghost42 wants to merge 1 commit into openclaw:main from 0xghost42:fix/replay-invalid-thinking-signature
