Skip to content

fix(anthropic): demote dead thinking signature when orphan-strip mutates the latest turn#35846

Closed
fesalfayed wants to merge 1 commit into
NousResearch:mainfrom
fesalfayed:fix/anthropic-thinking-sig-orphan-strip
Closed

fix(anthropic): demote dead thinking signature when orphan-strip mutates the latest turn#35846
fesalfayed wants to merge 1 commit into
NousResearch:mainfrom
fesalfayed:fix/anthropic-thinking-sig-orphan-strip

Conversation

@fesalfayed

@fesalfayed fesalfayed commented May 31, 2026

Copy link
Copy Markdown

What does this PR do?

Fixes a non-retryable HTTP 400 crash-loop that occurs with extended-thinking Claude models (4.6+, e.g. Opus 4.8) when a parallel tool batch is interrupted before every tool result returns.

Extended-thinking models emit a signed thinking block on assistant turns that also fire tool_use blocks. Anthropic signs that block against the full, original turn content. When the next request replays that turn, the signed block must be passed back byte-for-byte — if the turn was modified, Anthropic rejects it:

messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest
assistant message cannot be modified. These blocks must remain as they were
in the original response.

_strip_orphaned_tool_blocks() legitimately removes a tool_use whose matching tool_result never arrived (parallel batch interrupted, context compression, session truncation). But that mutates the latest assistant turn, and _manage_thinking_signatures() then replays the now-stale signed thinking block verbatim → HTTP 400. The error is classified non-retryable, so the gateway falls back / retries and reloads the same poisoned transcript from the persisted store every turn — an infinite crash-loop with no self-recovery (a soft session reset does not clear it, because history is rebuilt from the store). The drifting content index in the error message is simply the changing count of stripped tool_use blocks across rebuilds.

This is a clean reproduction: turn = [thinking(signed), tool_use_A, tool_use_B], only tool_result_A comes back → tool_use_B is stripped → signature over the original 3-block turn is now dead.

Related Issue

Fixes #35847
Related (separate bug, also filed): #35848 — the fallback path raises 'NoneType' object is not iterable downstream of this 400.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • agent/anthropic_adapter.py
    • _strip_orphaned_tool_blocks(): when stripping an orphaned tool_use mutates a turn that also carries a thinking/redacted_thinking block, set an internal _thinking_signature_invalidated flag on that message (its signature is now computed against content that no longer exists).
    • _merge_consecutive_roles(): propagate the flag onto the surviving (prev) dict when consecutive assistant messages are merged, so it isn't lost before signature management runs.
    • _manage_thinking_signatures(): in the latest-assistant branch, when the flag is set, demote all thinking blocks on that turn to plain text blocks (preserving the reasoning text) instead of replaying a dead signature. An intact turn is unaffected — its signed thinking is still replayed verbatim. The internal flag is stripped before the payload is sent.
  • tests/agent/test_anthropic_adapter.py
    • test_orphan_stripped_tool_use_demotes_dead_signed_thinking — regression for the crash-loop: orphaned parallel tool_use stripped → signed thinking demoted to text, reasoning preserved, answered tool_use survives, internal flag never leaks.
    • test_signed_thinking_preserved_when_no_tool_use_stripped — control: an intact latest turn keeps its signed thinking verbatim (guards against the fix over-firing).

How to Test

  1. pytest tests/agent/test_anthropic_adapter.py -k "thinking or signature or orphan or merge or preserved or redacted" -q → 27 passed.
  2. Targeted: pytest tests/agent/test_anthropic_adapter.py -k "orphan_stripped or no_tool_use_stripped" -q → 2 passed.
  3. Repro (pre-fix): build an assistant turn with one signed thinking block + two tool_use blocks, supply only one tool_result, and call convert_messages_to_anthropic. Before the fix, the latest turn keeps the signed thinking block whose signature no longer matches the stripped content (Anthropic 400s on replay). After the fix, the block is demoted to text and the turn replays cleanly.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(anthropic): …)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix (no unrelated commits)
  • I've run the relevant suite and tests pass (3 unrelated TestRunOauthSetupToken failures are pre-existing on the base commit — a MagicMock/subprocess mock issue, not touched by this PR)
  • I've added tests for my changes
  • I've tested on my platform: macOS 26.4 (Apple Silicon), Python 3.11.15

Documentation & Housekeeping

  • N/A — no documentation/config/tool-schema changes; behavior fix only, fully covered by existing docstrings updated inline.

…tes the latest turn

Extended-thinking Claude models (4.6+, e.g. Opus 4.8) emit a signed `thinking`
block on assistant turns that also carry parallel `tool_use` blocks. Anthropic
signs that block against the full, original turn content.

When a parallel tool batch is interrupted before every `tool_result` returns,
`_strip_orphaned_tool_blocks` removes the unanswered `tool_use` on replay — which
mutates the turn. The latest-assistant branch of `_manage_thinking_signatures`
then replays the now-stale signed thinking block verbatim, and Anthropic rejects
the request with a non-retryable HTTP 400:

    messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest
    assistant message cannot be modified. These blocks must remain as they were
    in the original response.

Because the poisoned turn is rebuilt from the persisted store every turn, the
gateway crash-loops with no self-recovery (a soft session reset does not clear
it). The drifting content index in the error is the changing count of stripped
`tool_use` blocks across rebuilds.

Fix: when orphan-stripping removes a `tool_use` from a turn that also holds a
thinking/redacted_thinking block, flag the turn. `_manage_thinking_signatures`
then demotes every thinking block on that latest turn to a plain text block
(preserving the reasoning text) instead of replaying a signature that can no
longer validate. An intact turn is unaffected — its signed thinking is still
replayed verbatim. The internal flag is stripped before the payload is sent.

Adds two regression tests:
- demotion when an orphaned parallel tool_use is stripped
- control: signed thinking preserved verbatim when nothing is stripped
@fesalfayed

Copy link
Copy Markdown
Author

Closing in favor of #35855 — same fix, minimal in-style change (drops the now-stale thinking blocks in the orphan-strip path, mirroring the existing _merge_consecutive_roles behavior) instead of the heavier flag-threading approach here.

@teknium1

Copy link
Copy Markdown
Contributor

Merged via PR #35859 — your commit was cherry-picked onto current main with your authorship preserved in git log (commit 64628ea).

Verified the fix end-to-end on real convert_messages_to_anthropic (not mocks): the orphan-stripped latest turn demotes the dead signed thinking block to text with reasoning preserved, the answered tool_use survives, and an intact turn still replays its signed thinking verbatim (no over-firing). Full CI matrix green (19/19).

Excellent bug report and clean, well-scoped fix — the flag-propagation through _merge_consecutive_roles was the subtle part and you got it right. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround provider/anthropic Anthropic native Messages API type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: extended-thinking + interrupted parallel tool batch → non-retryable HTTP 400 crash-loop (stale thinking signature)

3 participants