fix(agents): classify expired thinking signatures by clawsweeper[bot] · Pull Request #88340 · openclaw/openclaw

clawsweeper · 2026-05-30T13:26:26Z

Makes #88072 merge-ready for the ClawSweeper automerge loop.
The edit pass should inspect the live PR diff, review comments, and failing checks; rebase if needed; keep the contributor branch credited; and stop only when validation is green or an external blocker is proven.
Known failing checks:

Failing check: check-additional-boundaries-bcd:FAILURE (https://github.com/openclaw/openclaw/actions/runs/26649416864/job/78543703620)

ClawSweeper 🐠 replacement reef notes:

Cluster: automerge-openclaw-openclaw-88072
Source PRs: fix(agents): classify expired thinking signatures #88072
Credit: Source PR: fix(agents): classify expired thinking signatures #88072
Validation: pnpm check:changed
Replacement reason: ClawSweeper could not update the source PR branch directly, so it opened a writable replacement PR instead.
Automerge requested by: @Takhoffman

Repair fallback: GitHub rejected the repair branch push because it updates workflow files and the ClawSweeper app token does not have the workflows permission.

Inherited issue-closing references from the source PR:
Closes #88020

Co-author credit kept:

@BryanTegomoh: Co-authored-by: Bryan Tegomoh, MD, MPH <67350434+BryanTegomoh@users.noreply.github.com>

fish notes: model gpt-5.5, reasoning high; reviewed against 57c80d9.

clawsweeper · 2026-05-30T13:27:47Z

Codex review: passed. Reviewed May 31, 2026, 8:10 AM ET / 12:10 UTC.

Summary
The branch adds thinking-signature replay-invalid classification, retries matching terminal stream-error events without thinking blocks before output, preserves static fallback model params, and updates related tests including a Copilot hook fixture.

PR surface: Source +57, Tests +177. Total +234 across 6 files.

Reproducibility: yes, for the classifier boundary: current main lacks a thinking-signature replay-invalid match, and the linked report supplies the exact provider error payload. The time-dependent live expiry path was not reproduced here.

Review metrics: none identified.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
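
As a rough illustration of the "weaker of" rule, the sketch below orders the rank names from this review's rank legend and returns the lower of two. The `RANKS` array, the `overallRank` helper, and the idea of comparing by array position are assumptions made for illustration; they are not ClawSweeper's actual implementation.

```typescript
// Hypothetical ordering of the rank names used in this review, weakest first.
// "off-meta tidepool" is left out because it means the rating does not apply.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Overall readiness is the weaker (lower-positioned) of proof and patch quality.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

// Matches this review: proof is platinum hermit, patch quality is diamond lobster.
console.log(overallRank("platinum hermit", "diamond lobster")); // "platinum hermit"
```

Under this reading, a strong patch with thin proof is capped at the proof rank, which is what the sentence above describes.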

Rank-up moves:

[P2] Optional maintainer confidence booster: add redacted live Anthropic or Bedrock expiry proof if the session-state risk needs evidence beyond the current tests.

Risk before merge

[P1] Merging intentionally changes session continuation for provider errors matching thinking-signature wording; tests cover terminal stream errors and no-retry-after-output, but not a live 45-60 minute provider-side signature expiry session.
[P2] Static catalog fallback model params will now flow through fallback resolution, so existing fallback users could see provider request-parameter changes after upgrade.
[P1] The replay-invalid classifier expands recovery eligibility; the tests constrain generic invalid-signature text, but maintainers still own the provider-error classification boundary.

Maintainer options:

Land With Bounded Recovery Risk (recommended)
Proceed once exact-head checks are green because the branch has focused tests for the classifier and retry boundaries while maintainers accept the missing live-expiry proof.
Pause For Live Provider Proof
Ask for a redacted long-running Anthropic or Bedrock expiry proof if maintainers want evidence beyond the terminal classifier proof and simulated stream-recovery tests.
Narrow The Fallback Params Change
If maintainers do not want provider fallback behavior included in this bug fix, split or remove the static-catalog params change before merge.

Next step before merge

[P2] No repair lane is needed because the reviewed head has no blocking findings; remaining action is exact-head checks, mergeability, and maintainer risk acceptance.

Security
Cleared: The diff does not change workflows, dependencies, credentials, permissions, install scripts, or other code-execution supply-chain surfaces.

Review details

Best possible solution:

Land this replacement branch if exact-head checks stay green and maintainers accept the bounded provider/session risk, then close the linked bug as fixed by the merged PR.

Do we have a high-confidence way to reproduce the issue?

Yes for the classifier boundary: current main lacks a thinking-signature replay-invalid match, and the linked report supplies the exact provider error payload. The time-dependent live expiry path was not reproduced here.

Is this the best way to solve the issue?

Yes; the branch fixes the narrow classifier and stream-recovery gates and adds regression coverage for positive and negative cases. The remaining question is maintainer acceptance of the bounded provider/session merge risk, not a clear code defect.
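
The stream-recovery gate described above can be pictured with a minimal sketch, assuming a simplified synchronous event stream. The `Block`, `StreamEvent`, and `RunTurn` types, the `runWithRecovery` function, and the stand-in classifier are all hypothetical; the actual code in thinking.ts is not shown in this review and may differ substantially.

```typescript
// Illustrative shapes only; the real runner's message and event types are not shown here.
type Block = { type: "thinking" | "text"; text: string };
type StreamEvent =
  | { kind: "output"; text: string }
  | { kind: "error"; message: string };
type RunTurn = (blocks: Block[]) => Iterable<StreamEvent>;

// Hypothetical stand-in for the replay-invalid classifier.
const isThinkingSignatureError = (message: string): boolean =>
  /signature/i.test(message) && /thinking/i.test(message);

function runWithRecovery(run: RunTurn, blocks: Block[]): string[] {
  const out: string[] = [];
  let retried = false;
  let input = blocks;
  for (;;) {
    let restart = false;
    for (const ev of run(input)) {
      if (ev.kind === "output") {
        out.push(ev.text);
        continue;
      }
      // Retry at most once, only for matching errors, and only before any output.
      if (!retried && out.length === 0 && isThinkingSignatureError(ev.message)) {
        retried = true;
        input = blocks.filter((b) => b.type !== "thinking");
        restart = true;
        break;
      }
      throw new Error(ev.message);
    }
    if (!restart) return out;
  }
}
```

The `out.length === 0` check is the no-retry-after-output guard: once any output has been produced, a later matching error is surfaced instead of silently replaying the turn.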

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 94b1427fdfa1.

Label changes

Label justifications:

P1: The linked bug breaks long-running Anthropic thinking sessions by hard-failing instead of recovering, which is an urgent agent workflow regression.
merge-risk: 🚨 compatibility: The branch changes replay recovery and fallback model parameter behavior that existing users may observe during upgrades.
merge-risk: 🚨 auth-provider: The fallback params change affects provider/model request resolution for static catalog fallback users.
merge-risk: 🚨 session-state: The recovery path strips thinking blocks and retries a session turn, which can change continuation behavior after provider-side signature errors.
rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🦞 diamond lobster.
status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Sufficient (terminal): The source PR body provides terminal output showing the issue payload classifies as replay-invalid after the patch while generic invalid-signature text stays unclassified; live provider expiry proof is explicitly not included.
proof: sufficient: Contributor real behavior proof is sufficient. The source PR body provides terminal output showing the issue payload classifies as replay-invalid after the patch while generic invalid-signature text stays unclassified; live provider expiry proof is explicitly not included.

Evidence reviewed

PR surface:

Source +57, Tests +177. Total +234 across 6 files.

View PR surface stats

Area	Files	Added	Removed	Net
Source	3	70	13	+57
Tests	3	177	0	+177
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	6	247	13	+234

What I checked:

Root policy read: Read the full repository AGENTS.md and applied its guidance that provider routing, fallback behavior, auth/session state, and compatibility paths are merge-risk-sensitive review surfaces. (AGENTS.md:1, 94b1427fdfa1)
Scoped policy read: Read the scoped agent and extension policies; the touched agent runtime path needs source/test proof, and the Copilot test change stays within extension test boundaries. (src/agents/AGENTS.md:1, 94b1427fdfa1)
Current main classifier gap: Current main's replay-invalid classifier matches prior replay and ordering failures but has no thinking-signature-specific branch, matching the linked bug's source-level failure mode. (src/agents/embedded-agent-helpers/errors.ts:357, 94b1427fdfa1)
Classifier fix in PR head: The PR head adds a thinking-signature error regex gated by the word thinking, so generic invalid-signature errors are not automatically replay-invalid. (src/agents/embedded-agent-helpers/errors.ts:359, b65f2b8bda22)
Stream recovery fix in PR head: The PR head detects terminal assistant error events before any output, retries once without thinking blocks, and preserves the no-retry-after-output guard. (src/agents/embedded-agent-runner/thinking.ts:488, b65f2b8bda22)
Regression coverage: The PR adds classifier coverage for the exact wrapped Anthropic payload plus Bedrock wording and stream-wrapper coverage for retry, non-thinking errors, and no retry after output. (src/agents/embedded-agent-runner/thinking.test.ts:618, b65f2b8bda22)
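
To make the "gated by the word thinking" point concrete, here is a minimal sketch of such a gate. The pattern and the `isThinkingSignatureError` name are assumptions for illustration only; this is not the regex added in errors.ts. The first sample message is the wording quoted in the linked bug title.

```typescript
// Hypothetical pattern, NOT the regex from errors.ts. It only illustrates
// requiring the word "thinking" alongside "signature" on the same line.
const THINKING_SIGNATURE_RE =
  /\bthinking\b[^\n]*\bsignature\b|\bsignature\b[^\n]*\bthinking\b/i;

function isThinkingSignatureError(message: string): boolean {
  return THINKING_SIGNATURE_RE.test(message);
}

// Wording from the linked bug report title: classified as a thinking-signature error.
console.log(isThinkingSignatureError("Invalid signature in thinking block")); // true

// Generic invalid-signature text has no "thinking" and stays unclassified.
console.log(isThinkingSignatureError("Invalid signature")); // false
```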

Likely related people:

David: Current-main blame in this shallow checkout attributes the central agent helper, thinking recovery, model fallback, and Copilot hook test files to commit 778c4f9. (role: recent area contributor; confidence: medium; commits: 778c4f90b9b3; files: src/agents/embedded-agent-helpers/errors.ts, src/agents/embedded-agent-runner/thinking.ts, src/agents/embedded-agent-runner/model.ts)
Peter Steinberger: Commit 778c4f9 carries Peter Steinberger as a co-author in the current-main history for the same affected files. (role: adjacent owner; confidence: medium; commits: 778c4f90b9b3; files: src/agents/embedded-agent-runner/thinking.ts, src/agents/embedded-agent-runner/model.ts)
BryanTegomoh: The replacement branch preserves Bryan Tegomoh's initial classifier commit and source PR context for the linked bug, so they are relevant to reviewer questions about the reported provider error shape. (role: source fix contributor; confidence: medium; commits: 6a08869c8d68; files: src/agents/embedded-agent-helpers/errors.ts, src/agents/embedded-agent-helpers.isbillingerrormessage.test.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

clawsweeper · 2026-05-30T13:37:42Z

🦞✅
ClawSweeper merged this PR after the passing review.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=b65f2b8bda2243e2a3806669c7370f62b3cb8667)
Merge status: merged by ClawSweeper automerge
Merged at: 2026-05-31T12:11:30Z
Merge commit: fdf8dddf0af3

What merged:

The branch adds thinking-signature replay-invalid classification, retries matching terminal stream-error eve ... output, preserves static fallback model params, and updates related tests including a Copilot hook fixture.
PR surface: Source +57, Tests +177. Total +234 across 6 files.
Reproducibility: yes, for the classifier boundary: current main lacks a thinking-signature replay-invalid ma ... ort supplies the exact provider error payload. The time-dependent live expiry path was not reproduced here.

Automerge notes:

PR branch already contained follow-up commit before automerge: fix(agents): classify expired thinking signatures
PR branch already contained follow-up commit before automerge: fix(agents): recover thinking signature stream errors
PR branch already contained follow-up commit before automerge: fix(agents): recover expired thinking signatures
PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8807…

The automerge loop is complete.

Automerge progress:

2026-05-31 11:15:33 UTC repair finished 075951fffe0f (pushed) in 22m 15s Run: https://github.com/openclaw/clawsweeper/actions/runs/26710562264 repair_contributor_branch

2026-05-31 11:25:28 UTC review requested repair 075951fffe0f (structured ClawSweeper marker: fix-required (finding=security-review sha=075951...)

2026-05-31 11:25:41 UTC repair queued 075951fffe0f (autonomous) Run: https://github.com/openclaw/clawsweeper/actions/runs/26711287763

2026-05-31 11:27:32 UTC review queued 075951fffe0f (queued)

2026-05-31 11:28:24 UTC repair started (running) in 1s Run: https://github.com/openclaw/clawsweeper/actions/runs/26711287763 automerge-openclaw-openclaw-88072

2026-05-31 11:28:42 UTC validation plan (passed) in 18s Run: https://github.com/openclaw/clawsweeper/actions/runs/26711287763 pnpm check:changed; pnpm lint; pnpm check:test-types

2026-05-31 11:28:55 UTC Codex write preflight (passed) in 31s Run: https://github.com/openclaw/clawsweeper/actions/runs/26711287763 danger-full-access

2026-05-31 11:37:49 UTC Codex edit 1 1dbc0ce945f4 (complete) in 9m 26s Run: https://github.com/openclaw/clawsweeper/actions/runs/26711287763 exit 0

2026-05-31 11:44:27 UTC validation and review 1 b65f2b8bda22 (base moved) in 16m 4s Run: https://github.com/openclaw/clawsweeper/actions/runs/26711287763 rebased

2026-05-31 11:44:32 UTC repair completed b65f2b8bda22 (branch updated) in 16m 9s Run: https://github.com/openclaw/clawsweeper/actions/runs/26711287763 initial automerge rebase is delegated to Codex repair

2026-05-31 11:44:32 UTC review queued b65f2b8bda22 (after repair)

2026-05-31 11:52:33 UTC automerge wait b65f2b8bda22 (ready) in 24m 9s Run: https://github.com/openclaw/clawsweeper/actions/runs/26711287763 checks and exact-head review are ready

2026-05-31 12:11:17 UTC review passed b65f2b8bda22 (structured ClawSweeper verdict: pass (sha=b65f2b8bda2243e2a3806669c7370f62b3cb8...)

2026-05-31 11:52:34 UTC repair finished b65f2b8bda22 (pushed) in 24m 11s Run: https://github.com/openclaw/clawsweeper/actions/runs/26711287763 repair_contributor_branch

2026-05-31 12:04:25 UTC review queued b65f2b8bda22 (queued)

2026-05-31 12:11:33 UTC merged b65f2b8bda22 (merged by ClawSweeper automerge)

BryanTegomoh · 2026-05-30T17:48:58Z

Thanks for carrying this forward. I’m good with #88340 replacing #88072 as the canonical PR. Happy to help if maintainers prefer a fresh contributor branch.


Summary:
- The branch adds thinking-signature replay-invalid classification, retries matching terminal stream-error eve ... output, preserves static fallback model params, and updates related tests including a Copilot hook fixture.
- PR surface: Source +57, Tests +177. Total +234 across 6 files.
- Reproducibility: yes, for the classifier boundary: current main lacks a thinking-signature replay-invalid ma ... ort supplies the exact provider error payload. The time-dependent live expiry path was not reproduced here.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): classify expired thinking signatures
- PR branch already contained follow-up commit before automerge: fix(agents): recover thinking signature stream errors
- PR branch already contained follow-up commit before automerge: fix(agents): recover expired thinking signatures
- PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8807…

Validation:
- ClawSweeper review passed for head b65f2b8.
- Required merge gates passed before the squash merge.

Prepared head SHA: b65f2b8
Review: openclaw#88340 (comment)

Co-authored-by: Bryan Tegomoh <bryan.tegomoh@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>

clawsweeper Bot mentioned this pull request May 30, 2026

fix(agents): classify expired thinking signatures #88072

Closed

openclaw-barnacle Bot removed the proof: supplied (External PR includes structured after-fix real behavior proof.) label May 30, 2026

clawsweeper Bot force-pushed the clawsweeper/automerge-openclaw-openclaw-88072 branch from 57c80d9 to 68a448a May 30, 2026 13:56

openclaw-barnacle Bot added size: M and removed size: XS labels May 30, 2026

clawsweeper Bot force-pushed the clawsweeper/automerge-openclaw-openclaw-88072 branch from 68a448a to f1b1783 May 30, 2026 14:04

clawsweeper Bot added size: XS and proof: supplied (External PR includes structured after-fix real behavior proof.) labels May 30, 2026

openclaw-barnacle Bot removed the size: XS label May 30, 2026

clawsweeper Bot added the status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.) label May 30, 2026

openclaw-barnacle Bot removed the proof: supplied (External PR includes structured after-fix real behavior proof.) label May 30, 2026

bryanbaer mentioned this pull request May 30, 2026

[Bug]: REPLAY_INVALID_RE missing Anthropic 'Invalid signature in thinking block' — hard session failure instead of recovery retry #88020

Closed

clawsweeper Bot force-pushed the clawsweeper/automerge-openclaw-openclaw-88072 branch from f1b1783 to 075951f May 31, 2026 11:14

openclaw-barnacle Bot added the extensions: copilot label May 31, 2026

BryanTegomoh and others added 6 commits May 31, 2026 11:44

fix(agents): classify expired thinking signatures

6a08869

fix(agents): classify expired thinking signatures

526173f

fix(agents): recover thinking signature stream errors

1d38a01

Co-authored-by: Bryan Tegomoh, MD, MPH <67350434+BryanTegomoh@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>

fix(agents): recover expired thinking signatures

f9dd08c

fix(clawsweeper): address review for automerge-openclaw-openclaw-8807…

4c3e241

…2 (validation-1)

fix(agents): classify expired thinking signatures

b65f2b8

clawsweeper Bot force-pushed the clawsweeper/automerge-openclaw-openclaw-88072 branch from 075951f to b65f2b8 May 31, 2026 11:44

clawsweeper Bot added rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.) and removed rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.) labels May 31, 2026

clawsweeper Bot merged commit fdf8ddd into main May 31, 2026
151 of 153 checks passed

clawsweeper Bot deleted the clawsweeper/automerge-openclaw-openclaw-88072 branch May 31, 2026 12:11

github-actions Bot mentioned this pull request May 31, 2026

📡 Upstream Digest — 2026-05-31 13:19 UTC curtismercier/openclaw-mods#988

Open

clawsweeper Bot mentioned this pull request Jun 4, 2026

fix(agents): strip stale compaction thinking signatures before Anthropic replay #90163

Merged

3 tasks

clawsweeper Bot mentioned this pull request Jun 10, 2026

Feature: Graceful error handling for LLM failures — never expose raw errors to users #39612

Open

clawsweeper[bot] merged 6 commits into main from clawsweeper/automerge-openclaw-openclaw-88072
