fix(compressor): keep truncated tool_call arguments as valid JSON #11617
handsdiff wants to merge 1 commit into
When _prune_old_tool_results truncates a large tool_call arguments
string, it was byte-slicing to 200 chars and appending the literal
sentinel "...[truncated]". The resulting string is not valid JSON
(unterminated mid-value), and some providers — notably Anthropic via
LiteLLM — re-parse function.arguments when rebuilding tool_use
blocks. The next API call fails with HTTP 400 ("Unterminated string
starting at: line 1 column 35"), and every subsequent turn replays
the same corrupted history, so the session is permanently stuck.
Observed in the wild: a session that compressed with 11 prior patch
tool_calls (each >500 chars) then returned HTTP 400 on every inbound
message. The 3 retries are identical payloads, so they all fail the
same way. The user sees only a canned error fallback.
Replace the byte slice with a compact JSON sentinel that preserves
provenance (original length + 200-char preview) while staying
parseable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closing as duplicate of PR #12259 (merged). You independently diagnosed the same bug on Apr 17 within hours of #11788 and #11821. Thanks for the solid root-cause analysis and the smallest-diff approach. We went with #11788's structural-shrink approach because it preserves arg shape (paths, ints, lists stay intact) instead of replacing with a sentinel object, but your diagnosis was spot-on. Credited in the salvage PR body.
Confirming this is superseded by upstream commit Upstream's fix introduces a
…branch-off-upstream

- Adds NousResearch#11617 (compressor fix) and the two pending MCP PRs to the PR table.
- Documents the root cause that bit three PR branches: cutting feature branches from fork main (an octopus merge) instead of upstream/main.
- Extends the rebase workflow to the three new branches.
- Documents the integrations tool as a fork-only change (exe.dev-specific).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…flow

PR NousResearch#11617 (compressor tool-call args valid JSON) was closed Apr 18; upstream commit 3128d9f landed a strictly-better fix via _truncate_tool_call_args_json, which shrinks string values inside the JSON blob instead of replacing the whole payload with a sentinel. Rebasing fix/compressor-tool-args-valid-json on upstream/main leaves zero commits ahead. Move to the closed/superseded section, remove from the rebase loop and merge sequence so future rebuilds don't try to merge an empty branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
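To illustrate the difference between the two approaches, here is a minimal sketch of a structural shrink: parse the arguments, shorten only the long string values, and re-serialize. This is not upstream's `_truncate_tool_call_args_json`; the names `shrink_strings` and `shrink_tool_call_args`, the 200-char limit, and the marker text are assumptions made for illustration.

```python
import json
from typing import Any


def shrink_strings(value: Any, max_chars: int = 200) -> Any:
    """Recursively shorten long strings; leave other JSON types untouched."""
    if isinstance(value, str):
        if len(value) <= max_chars:
            return value
        # Marker text is an assumption, not taken from upstream.
        return value[:max_chars] + "...[truncated]"
    if isinstance(value, list):
        return [shrink_strings(item, max_chars) for item in value]
    if isinstance(value, dict):
        return {key: shrink_strings(item, max_chars) for key, item in value.items()}
    return value  # ints, floats, bools, None pass through unchanged


def shrink_tool_call_args(arguments: str, max_chars: int = 200) -> str:
    """Hypothetical helper: shrink string values inside a JSON arguments blob."""
    try:
        parsed = json.loads(arguments)
    except json.JSONDecodeError:
        # Assumption: input that is already unparseable is returned as-is.
        return arguments
    return json.dumps(shrink_strings(parsed, max_chars))


args = json.dumps(
    {"path": "src/app.py", "line": 42, "tags": ["a", "b"], "patch": "x" * 1000}
)
shrunk = json.loads(shrink_tool_call_args(args))
```

In this sketch the short path, the integer, and the list survive intact, and only the oversized `patch` string is shortened, which is the property the maintainers cite for preferring this approach.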
Summary
Pass 3 of `_prune_old_tool_results` in `agent/context_compressor.py` truncates large tool_call `arguments` strings by byte-slicing to 200 chars and appending the literal sentinel `...[truncated]`. The result is invalid JSON: the string is cut mid-value and left unterminated.

Some providers (notably Anthropic via LiteLLM) re-parse `function.arguments` when rebuilding `tool_use` blocks. Once a session is compressed with any tool_call whose original args exceeded 500 chars (routine for tools like `patch`), the next API call fails with HTTP 400 ("Unterminated string starting at: line 1 column 35").

All 3 retries send the same corrupted messages array, so they fail identically, and every subsequent user turn replays the poisoned history. The session is permanently stuck, and the user sees only a canned error fallback with no way to recover (the corruption is in server-side state).
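The failure mode can be reproduced in isolation with a short sketch. The payload below is a made-up stand-in for a large `patch` call, and `is_valid_json` is a hypothetical helper; only the slice-then-append behaviour is taken from the description above.

```python
import json

# Made-up stand-in for a large `patch` tool_call arguments string.
original_args = json.dumps({"path": "src/app.py", "patch": "x" * 600})

# Old behaviour as described: slice to 200 chars, then append a literal marker.
truncated_args = original_args[:200] + "...[truncated]"


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


original_ok = is_valid_json(original_args)
truncated_ok = is_valid_json(truncated_args)
print(original_ok, truncated_ok)
```

The slice lands inside the `patch` string value, so the closing quote and brace are lost and any consumer that calls `json.loads` on the arguments will raise.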
Reproduction
Any session that compresses while carrying a >500-char assistant tool_call will produce invalid-JSON `arguments`. On the next turn, the provider adapter (e.g. litellm's Anthropic translator) cannot re-parse the tool_call and rejects the request.
Fix
Replace the byte slice with a compact JSON-sentinel object that preserves provenance (original length + 200-char preview) while staying parseable:
{"_truncated": true, "_original_length": 1247, "_preview": "..."}
Test plan
- `test_pruned_tool_call_args_remain_valid_json` asserting pruned arguments parse via `json.loads` and carry the sentinel shape
- `TestTokenBudgetTailProtection` tests still pass
Observed in production
Caught after a user's agent became unresponsive. A session had 11 prior `patch` tool_calls poisoned by this truncation after compression. Every subsequent inbound message triggered a new 3-retry failure cycle, so the user received 3 cryptic LiteLLM 400 error strings back-to-back.
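For reference, the sentinel replacement described under Fix could look roughly like the sketch below. It is not the PR's actual diff: the helper name `truncate_tool_call_args` and the constants are assumptions, with the 500-char threshold and 200-char preview taken from the description.

```python
import json

MAX_ARGS_CHARS = 500  # assumed threshold, from ">500 chars" in the description
PREVIEW_CHARS = 200   # preview length named in the description


def truncate_tool_call_args(arguments: str) -> str:
    """Hypothetical helper: return short arguments unchanged, else a JSON sentinel."""
    if len(arguments) <= MAX_ARGS_CHARS:
        return arguments
    sentinel = {
        "_truncated": True,
        "_original_length": len(arguments),
        # json.dumps escapes any quotes or backslashes inside the preview,
        # so the result always parses even though the preview is a raw slice.
        "_preview": arguments[:PREVIEW_CHARS],
    }
    return json.dumps(sentinel)


big_args = json.dumps({"path": "src/app.py", "patch": "x" * 1200})
small_args = json.dumps({"path": "a.txt"})

pruned = json.loads(truncate_tool_call_args(big_args))
unchanged = truncate_tool_call_args(small_args)
```

A check in the spirit of `test_pruned_tool_call_args_remain_valid_json` would assert that the pruned string parses and carries the three sentinel keys.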