fix(embedded-agent): treat stale thinking-block signature as replay-invalid #88025
0xghost42 wants to merge 1 commit into
…nvalid

Anthropic rejects the next call with 'Invalid signature in thinking block' once early extended-thinking signatures expire on a long session. REPLAY_INVALID_RE did not match it, so classifyProviderRuntimeFailureKind fell through and the session hard-failed instead of routing to replay recovery (strip stale thinking blocks and retry).

Add the pattern so the message classifies as replay_invalid.
Codex review: needs real behavior proof before merge.
Reviewed May 29, 2026, 9:20 AM ET / 13:20 UTC.

Summary
PR surface: Source 0, Tests +19. Total +19 across 2 files.
Reproducibility: yes, for the PR defect. Evaluating the proposed regex against the linked issue's quoted Anthropic error payload shows it does not match. I did not establish a live long-session Anthropic reproduction in this read-only review.
Review metrics: none identified.

Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details
Best possible solution: Update the classifier and regression test to cover the exact quoted Anthropic payload from #88020, then provide real after-fix behavior proof or get an explicit maintainer proof override.
Do we have a high-confidence way to reproduce the issue? Yes, for the PR defect: evaluating the proposed regex against the linked issue's quoted Anthropic error payload shows it does not match. I did not establish a live long-session Anthropic reproduction in this read-only review.
Is this the best way to solve the issue? No: the approach is narrow, but it needs to match both quoted and unquoted provider wording and prove the exact linked payload. The safer repair is to broaden the alternation around optional punctuation/backticks and add the exact reported payload as a regression case.
Overall correctness: patch is incorrect.
AGENTS.md: found and applied where relevant.
Codex review notes: reasoning high; reviewed against 7fb91317ba2d.

Evidence reviewed
PR surface: Source 0, Tests +19. Total +19 across 2 files.
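The reviewer's suggested repair, broadening the alternation around optional punctuation/backticks, could look roughly like the sketch below. Everything here is illustrative: `NARROW_RE` copies the pattern quoted in the PR summary (the case-insensitive flag is an assumption), `BROADENED_RE` is a hypothetical variant rather than the project's actual regex, and the backtick-quoted sample string is an assumed stand-in for the payload in #88020, which is not reproduced on this page.

```typescript
// Pattern as quoted in the PR summary; the `i` flag is an assumption.
const NARROW_RE = /\binvalid signature\b.*?\bthinking block\b/i;

// Hypothetical broadened variant: tolerates an optional backtick or quote
// on either side of "signature" and "thinking".
const BROADENED_RE =
  /\binvalid\s+[`'"]?\bsignature\b[`'"]?.*?[`'"]?\bthinking\b[`'"]?\s+block\b/i;

// Assumed sample wordings, not the verbatim payload from the linked issue.
const unquoted = "Invalid signature in thinking block";
const quoted = "messages.1.content.0: Invalid `signature` in `thinking` block";
const unrelated = "Invalid signature on webhook payload";

for (const sample of [unquoted, quoted, unrelated]) {
  console.log(sample, {
    narrow: NARROW_RE.test(sample),
    broadened: BROADENED_RE.test(sample),
  });
}
```

In this sketch the narrow pattern misses the backtick-quoted wording because it requires the literal text `invalid signature` with nothing between the two words, while the broadened one accepts both forms and still rejects a signature error that never mentions a thinking block.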
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Summary
Closes #88020.
When a session with extended thinking runs long enough that early thinking-block signatures expire, Anthropic rejects the next API call with `Invalid signature in thinking block`. `REPLAY_INVALID_RE` in `src/agents/embedded-agent-helpers/errors.ts` did not match that wording, so `classifyProviderRuntimeFailureKind` returned a non-replay kind and the session hard-failed instead of routing to replay recovery (strip the stale thinking blocks and retry).

Add an `\binvalid signature\b.*?\bthinking block\b` alternation to `REPLAY_INVALID_RE` so the message classifies as `replay_invalid` and the existing recovery path runs.

This is a generic product rule (matches the provider error class, not a hardcoded one-off string); the regex is intentionally scoped to the signature/thinking-block pairing to avoid false positives.
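A minimal sketch of the classification this change is meant to produce. It isolates the new alternation only: the real `REPLAY_INVALID_RE` contains other alternations that are not shown here, `classifyFailureSketch` and the `"unknown"` kind are hypothetical stand-ins for `classifyProviderRuntimeFailureKind` and its non-replay result, and the case-insensitive flag is assumed from the fact that the lowercase pattern is expected to match the capitalized message.

```typescript
// Only the alternation described above; assumed case-insensitive.
const STALE_THINKING_SIGNATURE_RE = /\binvalid signature\b.*?\bthinking block\b/i;

// Hypothetical, reduced set of failure kinds for illustration.
type FailureKind = "replay_invalid" | "unknown";

// Simplified stand-in for the branch of the classifier this PR touches.
function classifyFailureSketch(message: string): FailureKind {
  return STALE_THINKING_SIGNATURE_RE.test(message) ? "replay_invalid" : "unknown";
}

// Bare phrase.
console.log(classifyFailureSketch("Invalid signature in thinking block"));
// Framed variant with a messages.N.content.M: prefix.
console.log(classifyFailureSketch("messages.3.content.0: Invalid signature in thinking block"));
// A signature error unrelated to thinking blocks stays out of the replay path.
console.log(classifyFailureSketch("Invalid signature on webhook payload"));
```

Requiring both phrases in one message is what keeps the match scoped: a message mentioning only an invalid signature, or only a thinking block, does not classify as replay-invalid in this sketch.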
Verification
`pnpm test src/agents/embedded-agent-helpers/errors.test.ts`: 6 passed (incl. 2 new cases for the bare phrase and the `messages.N.content.M:` framed variant).

Real behavior proof
Behavior addressed: Anthropic `Invalid signature in thinking block` now classifies as `replay_invalid` so the session recovers (strip stale thinking + retry) instead of hard-failing.
Real environment tested: Unit-level via the project test runner (Vitest) against `classifyProviderRuntimeFailureKind`; no paid live Anthropic call.
Exact steps or command run after this patch: `pnpm test src/agents/embedded-agent-helpers/errors.test.ts`
Evidence after fix: `Test Files 1 passed (1) / Tests 6 passed (6)`; new assertions `classifyProviderRuntimeFailureKind("Invalid signature in thinking block") === "replay_invalid"` and the framed variant both pass.
Observed result after fix: classifier returns `replay_invalid` for the stale-signature message; prior behavior returned a non-replay kind (no recovery).
What was not tested: a full live long-session reproduction against the Anthropic API (would require an expired-signature session); the change is confined to the error-classification regex and is covered by the unit assertions.
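For context on what the `replay_invalid` classification is meant to trigger, the strip-and-retry recovery could be sketched as below. This is not the project's recovery code, which is not shown in this PR: the `ContentBlock` and `Message` shapes, `stripThinkingBlocks`, and `sendWithReplayRecovery` are all hypothetical names, and the `send` and `classify` callbacks are injected so the sketch makes no real API call.

```typescript
// Hypothetical, simplified message shapes for illustration only.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string; signature: string };

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Drop every thinking block, then drop messages left with no content.
function stripThinkingBlocks(history: Message[]): Message[] {
  return history
    .map((m) => ({ ...m, content: m.content.filter((b) => b.type !== "thinking") }))
    .filter((m) => m.content.length > 0);
}

// Try once; on a replay-invalid failure, retry a single time without
// thinking blocks. Any other failure kind is rethrown unchanged.
async function sendWithReplayRecovery(
  history: Message[],
  send: (h: Message[]) => Promise<string>,
  classify: (message: string) => string,
): Promise<string> {
  try {
    return await send(history);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (classify(message) !== "replay_invalid") throw err;
    return send(stripThinkingBlocks(history));
  }
}
```

The single retry is a deliberate simplification: if the stripped history is rejected as well, the second error propagates to the caller instead of looping.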