fix(agents): classify expired thinking signatures by BryanTegomoh · Pull Request #88072 · openclaw/openclaw

BryanTegomoh · 2026-05-29T16:32:37Z

Summary

Classify Anthropic expired thinking-signature rejections as replay_invalid so the existing recovery path can strip stale thinking blocks and retry.

Add the missing Invalid signature in thinking block patterns to the replay-invalid classifier.
Keep generic Invalid signature errors out of replay recovery.
Add regression coverage for the exact wrapped Anthropic payload shape from the issue.
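
As a rough illustration of the boundary described above, the sketch below uses a hypothetical pattern and function name. It is not the actual REPLAY_INVALID_RE or classifyProviderRuntimeFailureKind from errors.ts, only a minimal stand-in that requires both signature and thinking language before choosing replay_invalid.

```typescript
// Hypothetical pattern, NOT the real REPLAY_INVALID_RE from errors.ts.
// It matches "invalid signature" only when followed by "in"/"on" and "thinking",
// tolerating the backticks Anthropic puts around the field names.
const THINKING_SIGNATURE_RE =
  /invalid\s+`?signature`?\s+(?:in|on)\s+`?thinking`?(?:\s+block)?/i;

type FailureKind = "replay_invalid" | "unclassified";

// Simplified stand-in for the classifier: only the thinking-signature
// branch is modelled; every other message falls through to "unclassified".
function classifyThinkingSignatureError(message: string): FailureKind {
  return THINKING_SIGNATURE_RE.test(message) ? "replay_invalid" : "unclassified";
}

// The wrapped payload is matched as a raw string, so no JSON parsing is needed.
const wrappedPayload =
  '{"type":"error","error":{"type":"invalid_request_error","message":"messages.1.content.440: Invalid `signature` in `thinking` block"}}';

console.log(classifyThinkingSignatureError(wrappedPayload)); // replay_invalid
console.log(classifyThinkingSignatureError("Invalid signature")); // unclassified
```

Requiring the trailing thinking term is what keeps a bare Invalid signature message out of replay recovery in this sketch.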

Linked context

Closes #88020

Real behavior proof (required for external PRs)

Behavior addressed: Anthropic Invalid signature in thinking block errors are now classified as replay_invalid instead of unclassified, allowing the existing stale-thinking replay recovery path to run.
Real environment tested: Local OpenClaw source checkout on macOS, current origin/main base, Node with the repo tsx loader, direct classifier import from src/agents/embedded-agent-helpers/errors.ts.
Exact steps or command run after this patch:

node --import tsx - <<'EOF'
import { classifyProviderRuntimeFailureKind } from './src/agents/embedded-agent-helpers/errors.ts';

const payload = '{"type":"error","error":{"type":"invalid_request_error","message":"messages.1.content.440: Invalid `signature` in `thinking` block"}}';
console.log(`expired-thinking-signature=${classifyProviderRuntimeFailureKind(payload)}`);
console.log(`generic-invalid-signature=${classifyProviderRuntimeFailureKind('Invalid signature')}`);
EOF

Evidence after fix:

expired-thinking-signature=replay_invalid
generic-invalid-signature=unclassified

Observed result after fix: The issue payload enters the replay_invalid path, while a generic invalid-signature message remains unclassified.
What was not tested: A live 45-60 minute Anthropic extended-thinking session that waits for provider-side signature expiry.
Proof limitations or environment constraints: The live expiry condition is time-dependent and provider-side. This PR verifies the exact post-rejection classifier boundary that gates the existing recovery retry.
Before evidence: Issue #88020 ("[Bug]: REPLAY_INVALID_RE missing Anthropic 'Invalid signature in thinking block' — hard session failure instead of recovery retry") shows the same Anthropic payload hard-failing because the classifier did not match it.

Tests and validation

node scripts/run-vitest.mjs src/agents/embedded-agent-helpers.isbillingerrormessage.test.ts
pnpm exec oxfmt --check --threads=1 src/agents/embedded-agent-helpers/errors.ts src/agents/embedded-agent-helpers.isbillingerrormessage.test.ts
node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/agents/embedded-agent-helpers/errors.ts src/agents/embedded-agent-helpers.isbillingerrormessage.test.ts
pnpm changed:lanes --json
pnpm check:changed
git diff --check

Regression coverage added:

Wrapped Anthropic invalid_request_error with Invalid signature in thinking block classifies as replay_invalid.
ValidationException: invalid signature on thinking block classifies as replay_invalid.
Generic Invalid signature does not classify as replay_invalid.

Risk checklist

Did user-visible behavior change? Yes
Did config, environment, or migration behavior change? No
Did security, auth, secrets, network, or tool execution behavior change? No

Highest-risk area: Provider runtime failure classification.
Mitigation: The new match requires both signature and thinking-block language, and the test proves generic invalid-signature errors do not enter replay recovery.

Current review state

Next action: Maintainer review.
Waiting on: CI and any maintainer request for live Anthropic expiry proof.
Bot or reviewer comments addressed: None yet.

chatgpt-codex-connector · 2026-05-29T16:32:43Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

clawsweeper · 2026-05-29T16:34:51Z

Codex review: needs maintainer review before merge. Reviewed May 29, 2026, 12:38 PM ET / 16:38 UTC.

Summary
The PR expands the provider replay-invalid classifier for Anthropic thinking-signature errors and adds regression coverage for the wrapped payload plus a generic-invalid-signature guard.

PR surface: Source 0, Tests +14. Total +14 across 2 files.

Reproducibility: yes. Current main lacks the signature/thinking-block replay-invalid match, and the linked issue provides the exact Anthropic payload that falls through the current classifier.

Review metrics: none identified.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
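
The "weaker of the two" rule can be read as taking a minimum over an ordered list of ranks. The sketch below assumes the ordering given in the rank legend of this review and leaves out the off-meta tidepool rating, which the legend says does not apply to rated items. The names RANKS and overallRank are hypothetical and are not taken from ClawSweeper.

```typescript
// Ranks from weakest to strongest, following the legend in this review.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Overall readiness is whichever of the two ratings sits lower in RANKS.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

// Matches this review: platinum hermit proof caps a diamond lobster patch.
console.log(overallRank("platinum hermit", "diamond lobster")); // platinum hermit
```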

Rank-up moves:

none.

Risk before merge

[P1] A live 45-60 minute Anthropic expiry session was not rerun; merge confidence comes from the exact reported payload, source path, existing replay sanitization coverage, and the PR's direct classifier proof.

Maintainer options:

Decide the mitigation before merge: Land the focused classifier and regression-test patch once CI is acceptable; request live Anthropic retry proof only if maintainers need end-to-end expiry assurance.
Pause or close: Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

No repair lane is needed because the branch already contains the focused code and test change; maintainer review and CI are the remaining path.

Security
Cleared: The diff only changes a local error-classification regex and colocated tests; I found no concrete security or supply-chain concern.

Review details

Best possible solution:

Land the focused classifier and regression-test patch once CI is acceptable; request live Anthropic retry proof only if maintainers need end-to-end expiry assurance.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main lacks the signature/thinking-block replay-invalid match, and the linked issue provides the exact Anthropic payload that falls through the current classifier.

Is this the best way to solve the issue?

Yes. Extending the existing replay-invalid classifier with a narrow signature plus thinking-block match, while preserving the generic Invalid signature negative case, is the smallest maintainable fix path.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against dc7bd4abf556.

Label changes

Label changes:

add P1: The linked bug hard-fails active Anthropic extended-thinking sessions and kills the workflow instead of using the existing replay recovery path.
add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal output from a real source checkout directly exercising the changed classifier boundary, which is the behavior this patch changes.
add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🦞 diamond lobster.
add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes after-fix terminal output from a real source checkout directly exercising the changed classifier boundary, which is the behavior this patch changes.

Label justifications:

P1: The linked bug hard-fails active Anthropic extended-thinking sessions and kills the workflow instead of using the existing replay recovery path.
rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🦞 diamond lobster.
status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes after-fix terminal output from a real source checkout directly exercising the changed classifier boundary, which is the behavior this patch changes.
proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal output from a real source checkout directly exercising the changed classifier boundary, which is the behavior this patch changes.

Evidence reviewed

PR surface:

Source 0, Tests +14. Total +14 across 2 files.

Area	Files	Added	Removed	Net
Source	1	1	1	0
Tests	1	14	0	+14
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	2	15	1	+14

What I checked:

Current classifier gap: Current main's REPLAY_INVALID_RE handles prior replay-state patterns but has no signature/thinking-block pattern, so the reported Anthropic payload is not covered before this PR. (src/agents/embedded-agent-helpers/errors.ts:355, dc7bd4abf556)
Replay-invalid classification path: classifyProviderRuntimeFailureKind returns replay_invalid only when isReplayInvalidErrorMessage matches the message, making the missing regex term the gating bug. (src/agents/embedded-agent-helpers/errors.ts:1053, dc7bd4abf556)
Existing recovery boundary: Replay-history sanitization already calls stripInvalidThinkingSignatures for signed-thinking providers, and the thinking helper intentionally relies on providers to reject opaque expired signatures. (src/agents/embedded-agent-runner/replay-history.ts:738, dc7bd4abf556)
Existing Anthropic thinking coverage: Current tests already prove invalid Anthropic/Bedrock thinking signatures are stripped during replay sanitization once the recovery path is reached. (src/agents/embedded-agent-runner.sanitize-session-history.test.ts:1589, dc7bd4abf556)
PR diff coverage: The PR adds the signature/thinking regex terms and tests the exact wrapped Anthropic invalid_request_error payload, a ValidationException wording, and the negative generic Invalid signature case. (src/agents/embedded-agent-helpers.isbillingerrormessage.test.ts:1561, 794dbaf4dcdc)
Real behavior proof supplied: The PR body includes after-fix terminal output from a direct classifier invocation showing expired-thinking-signature=replay_invalid and generic-invalid-signature=unclassified. (794dbaf4dcdc)

Likely related people:

joshavant: Local blame on current main attributes the replay-invalid regex, classifyProviderRuntimeFailureKind path, and stripInvalidThinkingSignatures comment block to the same existing agent-helper source commit; history is shallow/grafted, so this is a routing signal rather than a full authorship trail. (role: current classifier and replay-sanitization area contributor; confidence: medium; commits: ab84c8cc0949; files: src/agents/embedded-agent-helpers/errors.ts, src/agents/embedded-agent-runner/thinking.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Takhoffman · 2026-05-30T12:49:33Z

@clawsweeper visualize

clawsweeper · 2026-05-30T12:49:35Z

🦞👀
ClawSweeper visual brief is being prepared.

I queued a read-only visual pass. It will create or update one marker-backed visual brief comment and will not trigger close, merge, repair, label, or branch changes.

Lens: auto

clawsweeper · 2026-05-30T12:50:58Z

Source: #88072 (comment)
Visual model: gpt-5.5, reasoning low.

Visual brief

Lens: flow
Advisory only: maintainers remain the final judges.

PR: #88072
Linked bug: #88020

Current broken path from issue #88020

Long Anthropic thinking session
        |
        v
Provider rejects old thinking block signature
        |
        v
"Invalid `signature` in `thinking` block"
        |
        v
Classifier: unclassified ❌
        |
        v
No replay cleanup
        |
        v
Session hard-fails 🐛

Proposed path in PR #88072

Same provider rejection
        |
        v
"Invalid `signature` in `thinking` block"
        |
        v
Classifier: replay_invalid ✅
        |
        v
Existing stale-thinking cleanup path
stripInvalidThinkingSignatures(...)
        |
        v
Retry request ✅

Classifier boundary

Input pattern                                             Result
--------------------------------------------------------  ----------------
Invalid `signature` in `thinking` block                   replay_invalid ✅
ValidationException: invalid signature on thinking block  replay_invalid ✅
Invalid signature                                         unclassified ✅
Changed surface

src/agents/embedded-agent-helpers/errors.ts
  REPLAY_INVALID_RE gains thinking-signature patterns

src/agents/embedded-agent-helpers.isbillingerrormessage.test.ts
  Adds regression coverage for:
  - wrapped Anthropic invalid_request_error payload ✅
  - ValidationException wording ✅
  - generic Invalid signature guard ✅

Proof map

Reported bug proof:
  Real long-session failures, 3 times, provider-side expiry path 🐛

PR proof:
  Exact rejected payload now classifies as replay_invalid ✅
  Generic invalid-signature text does not enter replay recovery ✅

Not proven:
  Live 45-60 minute Anthropic expiry retry after patch ⚠️

Maintainer judgment point 🧑‍⚖️

Benefit:
  Small classifier change may restore existing recovery for a high-impact
  session-state failure.

Risk:
  Replay recovery expands to thinking-signature errors; guard test limits this
  to messages containing signature + thinking/thinking block language.

Decision hinge:
  Is exact payload classifier proof enough, or is live provider-expiry proof
  required before accepting the behavior change?

Legend: ✅ expected/proven; ❌ broken path; ⚠️ unresolved concern; 🐛 confirmed bug path; 🧑‍⚖️ maintainer judgment point.
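
The proposed path in the brief (classify the rejection, clean the history, retry) can be sketched as a small wrapper. Everything here is an assumption for illustration: the RecoveryDeps seams, the sendWithReplayRecovery name, and the single-retry policy are hypothetical and do not describe how OpenClaw's runner is actually wired.

```typescript
type FailureKind = "replay_invalid" | "unclassified";

// Hypothetical seams, injected so the sketch runs without a real provider.
interface RecoveryDeps<H, R> {
  send: (history: H) => Promise<R>;
  classify: (errorMessage: string) => FailureKind;
  sanitize: (history: H) => H;
}

// Sends once; on a replay_invalid rejection, sanitizes the history and
// retries exactly once. Any other failure is rethrown unchanged.
async function sendWithReplayRecovery<H, R>(
  history: H,
  deps: RecoveryDeps<H, R>,
): Promise<R> {
  try {
    return await deps.send(history);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    if (deps.classify(message) !== "replay_invalid") {
      throw error; // unclassified failures still surface to the caller
    }
    return deps.send(deps.sanitize(history));
  }
}
```

Before this PR, the reported payload would take the rethrow branch in a flow like this, which corresponds to the hard session failure in the broken path.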

Maintainer ruling

Benefit: Routes the reported Anthropic expired-thinking-signature rejection into the existing replay recovery path.
Risk: Classifier broadening could over-route provider errors if future wording overlaps, though the generic Invalid signature guard reduces that risk.
Proof needed: Optional live 45-60 minute Anthropic expiry retry proof if maintainers require end-to-end provider validation.
Recommended next action: Maintainer review of the classifier boundary and whether supplied proof is sufficient.
Question presented: Should exact rejected-payload proof be accepted for this XS recovery-path classifier fix?

Takhoffman · 2026-05-30T13:09:55Z

@clawsweeper automerge

clawsweeper · 2026-05-30T13:09:59Z

🦞🔧
ClawSweeper saw the passing review, but the PR needs another repair pass before merge.

Source: clawsweeper[bot]
Feedback: - No repair lane is needed because the branch already contains the focused code and test change; maintainer review and CI are the remaining path.; Cleared: The diff only changes a local error-classification regex and colocated tests; I found no concrete security or supply-chain concern. (sha=794dbaf4dcdc577b3d8e076b27e5fe270b1ea87d); later maintainer automerge opt-in approves landing the canonical PR; failed required checks before automerge: check-additional-boundaries-bcd:FAILURE
Action: repair worker queued. Run: https://github.com/openclaw/clawsweeper/actions/runs/26684662589
Model: gpt-5.5

I will update this PR branch, or open a safe credited replacement, if the repair worker finds a narrow CI fix.

Automerge progress:

2026-05-30 13:09:55 UTC review queued 794dbaf4dcdc (queued)

2026-05-30 13:10:44 UTC repair queued 794dbaf4dcdc (autonomous) Run: https://github.com/openclaw/clawsweeper/actions/runs/26684662589

2026-05-30 13:11:59 UTC repair started (running) in 0s Run: https://github.com/openclaw/clawsweeper/actions/runs/26684662589 automerge-openclaw-openclaw-88072

2026-05-30 13:12:13 UTC validation plan (passed) in 15s Run: https://github.com/openclaw/clawsweeper/actions/runs/26684662589 pnpm check:changed; pnpm lint; pnpm check:test-types

2026-05-30 13:12:26 UTC Codex write preflight (passed) in 28s Run: https://github.com/openclaw/clawsweeper/actions/runs/26684662589 danger-full-access

2026-05-30 13:20:17 UTC Codex edit 1 eba1e4634661 (complete) in 8m 18s Run: https://github.com/openclaw/clawsweeper/actions/runs/26684662589 exit 0

2026-05-30 13:26:14 UTC validation and review 1 57c80d951512 (base moved) in 14m 16s Run: https://github.com/openclaw/clawsweeper/actions/runs/26684662589 rebased

2026-05-29 16:38:44 UTC review passed 794dbaf4dcdc (- No repair lane is needed because the branch already contains the focused code...)

2026-05-30 13:26:48 UTC repair finished 57c80d951512 (opened) in 14m 49s Run: https://github.com/openclaw/clawsweeper/actions/runs/26684662589 open_fix_pr

clawsweeper · 2026-05-30T13:26:46Z

ClawSweeper 🐠 reef update

Thanks for the work here. ClawSweeper could not write to the source branch, so it opened a replacement PR rather than letting the fix drift. Attribution still points back here.

Why replacement: ClawSweeper could not update the source PR branch directly; GitHub did not grant sufficient push rights to the bot for that branch.
Replacement PR: #88340
Why close: this run explicitly closes the superseded source PR after the credited replacement PR is open, so review continues in one place.
Closing this one because the run was configured to close superseded source PRs after opening the replacement.
Credit follows the fix over to the replacement PR. No sneaky treasure grab.
Co-author credit kept:

@BryanTegomoh: Co-authored-by: Bryan Tegomoh, MD, MPH <67350434+BryanTegomoh@users.noreply.github.com>

fish notes: model gpt-5.5, reasoning high; reviewed against 57c80d9.

fix(agents): classify expired thinking signatures

794dbaf

openclaw-barnacle Bot added the labels agents (Agent runtime and tooling), size: XS, and proof: supplied (External PR includes structured after-fix real behavior proof) on May 29, 2026

clawsweeper Bot added the label clawsweeper:automerge (Maintainer opted this PR into bounded ClawSweeper-reviewed automerge) on May 30, 2026

clawsweeper Bot mentioned this pull request May 30, 2026

fix(agents): classify expired thinking signatures #88340

Merged

clawsweeper Bot closed this May 30, 2026

bryanbaer mentioned this pull request May 30, 2026

[Bug]: REPLAY_INVALID_RE missing Anthropic 'Invalid signature in thinking block' — hard session failure instead of recovery retry #88020

Closed

clawsweeper Bot mentioned this pull request Jun 4, 2026

fix(agents): strip stale compaction thinking signatures before Anthropic replay #90163

Merged

BryanTegomoh wants to merge 1 commit into openclaw:main from BryanTegomoh:bryan/fix-anthropic-thinking-replay-invalid
