fix(agents): classify expired thinking signatures #88340

Merged
clawsweeper[bot] merged 6 commits into main from clawsweeper/automerge-openclaw-openclaw-88072
May 31, 2026

@clawsweeper clawsweeper Bot commented May 30, 2026

Contributor

Makes #88072 merge-ready for the ClawSweeper automerge loop.
The edit pass should inspect the live PR diff, review comments, and failing checks; rebase if needed; keep the contributor branch credited; and stop only when validation is green or an external blocker is proven.
Known failing checks:

ClawSweeper 🐠 replacement reef notes:

  • Repair fallback: GitHub rejected the repair branch push because it updates workflow files and the ClawSweeper app token does not have workflows permission

Inherited issue-closing references from the source PR:
Closes #88020

Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against 57c80d9.

@clawsweeper clawsweeper Bot added labels May 30, 2026:

  • agents (Agent runtime and tooling)
  • size: XS
  • clawsweeper:automerge (Maintainer opted this PR into bounded ClawSweeper-reviewed automerge)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • P1 (High-priority user-facing bug, regression, or broken workflow.)
  • rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.)
  • status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.)
  • clawsweeper (Tracked by ClawSweeper automation)

@clawsweeper

clawsweeper Bot commented May 30, 2026

Contributor Author

Codex review: passed. Reviewed May 31, 2026, 8:10 AM ET / 12:10 UTC.

Summary
The branch adds thinking-signature replay-invalid classification, retries matching terminal stream-error events without thinking blocks before output, preserves static fallback model params, and updates related tests including a Copilot hook fixture.

PR surface: Source +57, Tests +177. Total +234 across 6 files.

Reproducibility: yes, for the classifier boundary: current main lacks a thinking-signature replay-invalid match, and the linked report supplies the exact provider error payload. The time-dependent live expiry path was not reproduced here.

Review metrics: none identified.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
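
The "weaker of proof and patch quality" rule can be sketched as a minimum over an ordered rank list. This is only an illustration: the rank order is taken from the rank legend in this comment, while `RANKS` and `overallRank` are hypothetical names, not ClawSweeper's actual code.

```typescript
// Ranks ordered from weakest to strongest, following the rank legend.
// "off-meta tidepool" is left out because it means the rating does not apply.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Hypothetical helper: overall readiness is whichever input ranks lower.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

// Matches the ratings above: proof is platinum hermit, patch quality is diamond lobster.
console.log(overallRank("platinum hermit", "diamond lobster")); // platinum hermit
```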

Rank-up moves:

  • [P2] Optional maintainer confidence booster: add redacted live Anthropic or Bedrock expiry proof if the session-state risk needs evidence beyond the current tests.

Risk before merge

  • [P1] Merging intentionally changes session continuation for provider errors matching thinking-signature wording; tests cover terminal stream errors and no-retry-after-output, but not a live 45-60 minute provider-side signature expiry session.
  • [P2] Static catalog fallback model params will now flow through fallback resolution, so existing fallback users could see provider request-parameter changes after upgrade.
  • [P1] The replay-invalid classifier expands recovery eligibility; the tests constrain generic invalid-signature text, but maintainers still own the provider-error classification boundary.
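
The session-continuation boundary described in these risks (retry once when a matching error arrives before any output, never after output) might look roughly like the sketch below. Everything here is assumed for illustration: the `StreamEvent` shape, the `openStream` callback, and the simplified `looksLikeThinkingSignatureError` check are hypothetical and are not the code in thinking.ts.

```typescript
type StreamEvent =
  | { type: "text"; text: string }
  | { type: "error"; message: string };

// Hypothetical stand-in for the real classifier; the matched wording is an assumption.
const looksLikeThinkingSignatureError = (message: string): boolean =>
  /thinking/i.test(message) && /signature/i.test(message);

// `openStream(stripThinking)` is a hypothetical callback that starts one provider
// request, optionally with thinking blocks removed from the replayed history.
async function* withThinkingRecovery(
  openStream: (stripThinking: boolean) => AsyncIterable<StreamEvent>,
): AsyncGenerator<StreamEvent> {
  let emittedOutput = false;
  for await (const event of openStream(false)) {
    const recoverable =
      event.type === "error" &&
      !emittedOutput &&
      looksLikeThinkingSignatureError(event.message);
    if (recoverable) {
      // Nothing has reached the caller yet, so one retry cannot duplicate output.
      // Errors from the retried stream pass through without a second retry.
      for await (const retried of openStream(true)) yield retried;
      return;
    }
    if (event.type === "text") emittedOutput = true;
    yield event;
  }
}
```

Keeping the `emittedOutput` guard means an error that arrives after partial output is surfaced to the caller as-is, which is the no-retry-after-output behavior the tests are said to cover.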

Maintainer options:

  1. Land With Bounded Recovery Risk (recommended)
    Proceed once exact-head checks are green because the branch has focused tests for the classifier and retry boundaries while maintainers accept the missing live-expiry proof.
  2. Pause For Live Provider Proof
    Ask for a redacted long-running Anthropic or Bedrock expiry proof if maintainers want evidence beyond the terminal classifier proof and simulated stream-recovery tests.
  3. Narrow The Fallback Params Change
    If maintainers do not want provider fallback behavior included in this bug fix, split or remove the static-catalog params change before merge.

Next step before merge

  • [P2] No repair lane is needed because the reviewed head has no blocking findings; remaining action is exact-head checks, mergeability, and maintainer risk acceptance.

Security
Cleared: The diff does not change workflows, dependencies, credentials, permissions, install scripts, or other code-execution supply-chain surfaces.

Review details

Best possible solution:

Land this replacement branch if exact-head checks stay green and maintainers accept the bounded provider/session risk, then close the linked bug as fixed by the merged PR.

Do we have a high-confidence way to reproduce the issue?

Yes for the classifier boundary: current main lacks a thinking-signature replay-invalid match, and the linked report supplies the exact provider error payload. The time-dependent live expiry path was not reproduced here.

Is this the best way to solve the issue?

Yes; the branch fixes the narrow classifier and stream-recovery gates and adds regression coverage for positive and negative cases. The remaining question is maintainer acceptance of the bounded provider/session merge risk, not a clear code defect.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 94b1427fdfa1.

Label changes

Label justifications:

  • P1: The linked bug breaks long-running Anthropic thinking sessions by hard-failing instead of recovering, which is an urgent agent workflow regression.
  • merge-risk: 🚨 compatibility: The branch changes replay recovery and fallback model parameter behavior that existing users may observe during upgrades.
  • merge-risk: 🚨 auth-provider: The fallback params change affects provider/model request resolution for static catalog fallback users.
  • merge-risk: 🚨 session-state: The recovery path strips thinking blocks and retries a session turn, which can change continuation behavior after provider-side signature errors.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🦞 diamond lobster.
  • status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane.
  • proof: sufficient: Contributor real behavior proof is sufficient (terminal). The source PR body provides terminal output showing the issue payload classifies as replay-invalid after the patch while generic invalid-signature text stays unclassified; live provider expiry proof is explicitly not included.
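
For the session-state justification above, stripping thinking blocks before a retry could be sketched as follows. The `ContentBlock` and `Message` shapes and the `stripThinkingBlocks` name are assumptions made for this example, not the repository's types. Dropping messages that end up empty is also a choice made here, not something the PR states.

```typescript
// Assumed message shape for illustration only.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string; signature: string };

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Hypothetical helper: returns a copy of the history without thinking blocks.
// Messages left with no content are dropped (an assumption of this sketch).
function stripThinkingBlocks(history: Message[]): Message[] {
  return history
    .map((message) => ({
      ...message,
      content: message.content.filter((block) => block.type !== "thinking"),
    }))
    .filter((message) => message.content.length > 0);
}
```

Because the function returns new objects, the original history stays untouched if the retry is never sent.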
Evidence reviewed

PR surface:

Source +57, Tests +177. Total +234 across 6 files.

PR surface stats:

Area       Files  Added  Removed   Net
Source         3     70       13   +57
Tests          3    177        0  +177
Docs           0      0        0     0
Config         0      0        0     0
Generated      0      0        0     0
Other          0      0        0     0
Total          6    247       13  +234

What I checked:

  • Root policy read: Read the full repository AGENTS.md and applied its guidance that provider routing, fallback behavior, auth/session state, and compatibility paths are merge-risk-sensitive review surfaces. (AGENTS.md:1, 94b1427fdfa1)
  • Scoped policy read: Read the scoped agent and extension policies; the touched agent runtime path needs source/test proof, and the Copilot test change stays within extension test boundaries. (src/agents/AGENTS.md:1, 94b1427fdfa1)
  • Current main classifier gap: Current main's replay-invalid classifier matches prior replay and ordering failures but has no thinking-signature-specific branch, matching the linked bug's source-level failure mode. (src/agents/embedded-agent-helpers/errors.ts:357, 94b1427fdfa1)
  • Classifier fix in PR head: The PR head adds a thinking-signature error regex gated by the word thinking, so generic invalid-signature errors are not automatically replay-invalid. (src/agents/embedded-agent-helpers/errors.ts:359, b65f2b8bda22)
  • Stream recovery fix in PR head: The PR head detects terminal assistant error events before any output, retries once without thinking blocks, and preserves the no-retry-after-output guard. (src/agents/embedded-agent-runner/thinking.ts:488, b65f2b8bda22)
  • Regression coverage: The PR adds classifier coverage for the exact wrapped Anthropic payload plus Bedrock wording and stream-wrapper coverage for retry, non-thinking errors, and no retry after output. (src/agents/embedded-agent-runner/thinking.test.ts:618, b65f2b8bda22)
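
A classifier gated on the word "thinking", as described in the checks above, might look like this minimal sketch. The pattern is an assumption written for this example and is not the regex in errors.ts. The positive sample is the wording quoted in the linked issue title, and the negative sample is invented.

```typescript
// Assumed pattern: signature wording only counts when "thinking" also appears,
// in either order. No `g` flag, so repeated .test() calls stay stateless.
const THINKING_SIGNATURE_RE =
  /\bthinking\b[\s\S]*\bsignature\b|\bsignature\b[\s\S]*\bthinking\b/i;

// Hypothetical helper name.
function isThinkingSignatureReplayInvalid(message: string): boolean {
  return THINKING_SIGNATURE_RE.test(message);
}

console.log(isThinkingSignatureReplayInvalid("Invalid signature in thinking block")); // true
console.log(isThinkingSignatureReplayInvalid("Invalid signature on webhook request")); // false
```

The second call shows the boundary the review cares about: generic invalid-signature text is not treated as replay-invalid.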

Likely related people:

  • David: Current-main blame in this shallow checkout attributes the central agent helper, thinking recovery, model fallback, and Copilot hook test files to commit 778c4f9. (role: recent area contributor; confidence: medium; commits: 778c4f90b9b3; files: src/agents/embedded-agent-helpers/errors.ts, src/agents/embedded-agent-runner/thinking.ts, src/agents/embedded-agent-runner/model.ts)
  • Peter Steinberger: Commit 778c4f9 carries Peter Steinberger as a co-author in the current-main history for the same affected files. (role: adjacent owner; confidence: medium; commits: 778c4f90b9b3; files: src/agents/embedded-agent-runner/thinking.ts, src/agents/embedded-agent-runner/model.ts)
  • BryanTegomoh: The replacement branch preserves Bryan Tegomoh's initial classifier commit and source PR context for the linked bug, so they are relevant to reviewer questions about the reported provider error shape. (role: source fix contributor; confidence: medium; commits: 6a08869c8d68; files: src/agents/embedded-agent-helpers/errors.ts, src/agents/embedded-agent-helpers.isbillingerrormessage.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@openclaw-barnacle openclaw-barnacle Bot removed the proof: supplied External PR includes structured after-fix real behavior proof. label May 30, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. merge-risk: 🚨 other 🚨 Merging this PR has meaningful risk outside the owned taxonomy. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 30, 2026
@clawsweeper

clawsweeper Bot commented May 30, 2026

Contributor Author

🦞✅
ClawSweeper merged this PR after the passing review.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=b65f2b8bda2243e2a3806669c7370f62b3cb8667)
Merge status: merged by ClawSweeper automerge
Merged at: 2026-05-31T12:11:30Z
Merge commit: fdf8dddf0af3

What merged:

  • The branch adds thinking-signature replay-invalid classification, retries matching terminal stream-error eve ... output, preserves static fallback model params, and updates related tests including a Copilot hook fixture.
  • PR surface: Source +57, Tests +177. Total +234 across 6 files.
  • Reproducibility: yes. for the classifier boundary: current main lacks a thinking-signature replay-invalid ma ... ort supplies the exact provider error payload. The time-dependent live expiry path was not reproduced here.

Automerge notes:

  • PR branch already contained follow-up commit before automerge: fix(agents): classify expired thinking signatures
  • PR branch already contained follow-up commit before automerge: fix(agents): recover thinking signature stream errors
  • PR branch already contained follow-up commit before automerge: fix(agents): recover expired thinking signatures
  • PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8807…

The automerge loop is complete.

Automerge progress:

  • 2026-05-31 11:25:28 UTC review requested repair 075951fffe0f (structured ClawSweeper marker: fix-required (finding=security-review sha=075951...)
  • 2026-05-31 11:27:32 UTC review queued 075951fffe0f (queued)
  • 2026-05-31 11:44:32 UTC review queued b65f2b8bda22 (after repair)
  • 2026-05-31 12:04:25 UTC review queued b65f2b8bda22 (queued)
  • 2026-05-31 12:11:17 UTC review passed b65f2b8bda22 (structured ClawSweeper verdict: pass (sha=b65f2b8bda2243e2a3806669c7370f62b3cb8...)
  • 2026-05-31 12:11:33 UTC merged b65f2b8bda22 (merged by ClawSweeper automerge)

@clawsweeper clawsweeper Bot force-pushed the clawsweeper/automerge-openclaw-openclaw-88072 branch from 57c80d9 to 68a448a Compare May 30, 2026 13:56
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. merge-risk: 🚨 other 🚨 Merging this PR has meaningful risk outside the owned taxonomy. labels May 30, 2026
@clawsweeper clawsweeper Bot force-pushed the clawsweeper/automerge-openclaw-openclaw-88072 branch from 68a448a to f1b1783 Compare May 30, 2026 14:04
@clawsweeper clawsweeper Bot added size: XS proof: supplied External PR includes structured after-fix real behavior proof. labels May 30, 2026
@clawsweeper clawsweeper Bot added the status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. label May 30, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: supplied External PR includes structured after-fix real behavior proof. label May 30, 2026
@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 30, 2026
@BryanTegomoh

Contributor

Thanks for carrying this forward. I’m good with #88340 replacing #88072 as the canonical PR. Happy to help if maintainers prefer a fresh contributor branch.

@clawsweeper clawsweeper Bot force-pushed the clawsweeper/automerge-openclaw-openclaw-88072 branch from f1b1783 to 075951f Compare May 31, 2026 11:14
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. and removed rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. labels May 31, 2026
@clawsweeper clawsweeper Bot force-pushed the clawsweeper/automerge-openclaw-openclaw-88072 branch from 075951f to b65f2b8 Compare May 31, 2026 11:44
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels May 31, 2026
@clawsweeper clawsweeper Bot merged commit fdf8ddd into main May 31, 2026
151 of 153 checks passed
@clawsweeper clawsweeper Bot deleted the clawsweeper/automerge-openclaw-openclaw-88072 branch May 31, 2026 12:11
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request Jun 1, 2026
Summary:
- The branch adds thinking-signature replay-invalid classification, retries matching terminal stream-error eve ... output, preserves static fallback model params, and updates related tests including a Copilot hook fixture.
- PR surface: Source +57, Tests +177. Total +234 across 6 files.
- Reproducibility: yes. for the classifier boundary: current main lacks a thinking-signature replay-invalid ma ... ort supplies the exact provider error payload. The time-dependent live expiry path was not reproduced here.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): classify expired thinking signatures
- PR branch already contained follow-up commit before automerge: fix(agents): recover thinking signature stream errors
- PR branch already contained follow-up commit before automerge: fix(agents): recover expired thinking signatures
- PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8807…

Validation:
- ClawSweeper review passed for head b65f2b8.
- Required merge gates passed before the squash merge.

Prepared head SHA: b65f2b8
Review: openclaw#88340 (comment)

Co-authored-by: Bryan Tegomoh <bryan.tegomoh@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
Labels

  • agents (Agent runtime and tooling)
  • clawsweeper:automerge (Maintainer opted this PR into bounded ClawSweeper-reviewed automerge)
  • clawsweeper (Tracked by ClawSweeper automation)
  • extensions: copilot
  • merge-risk: 🚨 auth-provider (🚨 May break OAuth, tokens, provider routing, model choice, or credentials.)
  • merge-risk: 🚨 compatibility (🚨 May break existing users, config, migrations, defaults, or upgrade paths.)
  • merge-risk: 🚨 session-state (🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.)
  • P1 (High-priority user-facing bug, regression, or broken workflow.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.)
  • size: M
  • status: 🚀 automerge armed (This PR is in ClawSweeper's automerge lane.)

Development

Successfully merging this pull request may close these issues.

[Bug]: REPLAY_INVALID_RE missing Anthropic 'Invalid signature in thinking block' — hard session failure instead of recovery retry
