fix(session-repair): strip malformed tool_use blocks to prevent permanent session corruption #6667

Closed
NSEvent wants to merge 3 commits into openclaw:main from NSEvent:fix/session-corruption-malformed-tool-use

Conversation

@NSEvent NSEvent commented Feb 1, 2026

Summary

  • Strips malformed tool_use/toolCall/functionCall blocks from assistant messages BEFORE the existing pairing repair runs
  • Adds droppedMalformedToolUseCount to the repair report for observability
  • Prevents creating synthetic error results for blocks that were never valid tool calls

Problem

When tool calls are interrupted (by error, timeout, content filtering, or process termination), sessions become permanently corrupted. Every subsequent API request fails with:

  • unexpected tool_use_id found in tool_result blocks
  • tool result's tool id not found (2013)

Root cause: The existing extractToolCallsFromAssistant() skips malformed blocks (missing id) but leaves them in the message content. The blocks remain in the transcript, causing API rejections.

Solution

Add a pre-processing step that strips malformed tool_use blocks before the pairing repair runs:

Malformed conditions detected:

  • Missing or empty id field (tool call wasn't fully initialized)
  • Has partialJson field (Anthropic SDK streaming artifact)
  • Has partial field set to true (generic streaming indicator)
  • Has incomplete field set to true (OpenAI-style indicator)

The name field is intentionally NOT required - extractToolCallsFromAssistant already handles missing names gracefully by defaulting to undefined.
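The detection rules above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `ContentBlock` type and the `stripMalformedToolUseBlocks` helper are hypothetical names, and the real implementation in src/agents/session-transcript-repair.ts may be structured differently.

```typescript
// Hypothetical block shape for illustration; the real types may differ.
type ContentBlock = Record<string, unknown>;

const TOOL_CALL_TYPES = new Set(["toolCall", "toolUse", "functionCall"]);

// Returns true when a block should be kept in the assistant message.
function isValidToolUseBlock(block: unknown): boolean {
  if (typeof block !== "object" || block === null) {
    return true; // not a record: leave it for other sanitizers
  }
  const rec = block as ContentBlock;
  const type = rec.type;
  if (typeof type !== "string" || !TOOL_CALL_TYPES.has(type)) {
    return true; // text, thinking, etc. are never stripped here
  }
  // Missing or empty id: the tool call was never fully initialized.
  const id = rec.id;
  if (typeof id !== "string" || id.length === 0) {
    return false;
  }
  // Streaming artifacts that mark an unfinished tool call.
  if (rec.partialJson !== undefined) return false;
  if (rec.partial === true) return false;
  if (rec.incomplete === true) return false;
  // `name` is deliberately not checked.
  return true;
}

// Assumed helper: filters one message's content and counts what was dropped.
function stripMalformedToolUseBlocks(content: unknown[]): {
  content: unknown[];
  droppedMalformedToolUseCount: number;
} {
  const kept = content.filter((block) => isValidToolUseBlock(block));
  return {
    content: kept,
    droppedMalformedToolUseCount: content.length - kept.length,
  };
}
```

Running this filter before the pairing repair means the repair only ever sees tool calls with a usable id, so it has nothing malformed to synthesize results for.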

Test plan

  • Added comprehensive tests for malformed block detection
  • Existing tests pass (pnpm test src/agents/session-transcript-repair.test.ts)
  • Full test suite passes (pnpm test)

Fixes #5497, #5481, #5430, #5518

🤖 Generated with Claude Code

Greptile Overview

Greptile Summary

This PR hardens session transcript repair by stripping malformed tool_use/toolCall/functionCall blocks from assistant message content before running the existing tool call/result pairing repair. It also surfaces droppedMalformedToolUseCount in the repair report and logs a warning in the Google embedded runner when malformed blocks were removed, improving observability for the permanent-session-corruption failures reported in related issues.

The new logic lives in src/agents/session-transcript-repair.ts and is exercised by expanded unit tests in src/agents/session-transcript-repair.test.ts. src/agents/pi-embedded-runner/google.ts is updated to use the richer repairToolUseResultPairing report rather than only returning sanitized messages.

Confidence Score: 4/5

  • This PR is generally safe to merge and addresses a real transcript-corruption failure mode.
  • Changes are localized to transcript sanitization, preserve message order/metadata, and add comprehensive tests around the new malformed-block stripping behavior. Remaining risk is mainly around edge-case block shapes (e.g., optional properties like partialJson existing with an undefined value) that could cause valid tool calls to be stripped unexpectedly.
  • src/agents/session-transcript-repair.ts (malformed block detection heuristics)

Context used:

  • Context from dashboard - CLAUDE.md (source)
  • Context from dashboard - AGENTS.md (source)

NSEvent and others added 3 commits January 31, 2026 09:21
…uption

When tool calls are interrupted (by error, timeout, content filtering, or
process termination), sessions can become permanently corrupted. Every
subsequent API request fails with errors like:
- "unexpected tool_use_id found in tool_result blocks"
- "tool result's tool id not found (2013)"

Root cause: extractToolCallsFromAssistant() skips malformed tool_use blocks
but leaves them in the message content. The blocks remain in the transcript
causing API rejections.

Fix: Strip malformed tool_use blocks (missing id, missing name, or with
partialJson field) BEFORE the pairing repair runs. This prevents creating
synthetic results for invalid blocks and allows sessions to auto-recover.

Fixes openclaw#5497, openclaw#5481, openclaw#5430, openclaw#5518

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add additional streaming/partial indicators beyond partialJson:
- partial === true (generic streaming indicator)
- incomplete === true (OpenAI-style indicator)

This ensures we catch malformed tool_use blocks from all provider
SDK shapes, not just Anthropic's partialJson field.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@openclaw-barnacle openclaw-barnacle bot added the agents Agent runtime and tooling label Feb 1, 2026

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 3 comments

Comment on lines +34 to +37
if (rec.type !== "toolCall" && rec.type !== "toolUse" && rec.type !== "functionCall") {
  return true;
}
// Malformed: missing/invalid id (tool call wasn't fully initialized)

[P1] isValidToolUseBlock treats partialJson !== undefined as malformed, which will also strip tool calls where partialJson is present but explicitly undefined. If any upstream serializer includes partialJson: undefined (or the property exists with an undefined value), valid tool calls could get dropped.

Consider checking for property presence instead (e.g. "partialJson" in rec) or checking for a non-empty string, depending on the actual shape of streaming artifacts.
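To compare the options mentioned in this comment, the sketch below runs three candidate predicates against four hypothetical block shapes. All names here are illustrative and none come from the PR. Under ordinary JavaScript semantics, a `!== undefined` test does not flag a property that is explicitly set to `undefined`, whereas an `in` test does, so which check is appropriate depends on what the streaming artifacts actually look like.

```typescript
type Block = Record<string, unknown>;

// Three candidate "is this a streaming artifact?" checks.
const flagsWhenNotUndefined = (rec: Block): boolean => rec.partialJson !== undefined;
const flagsWhenPresent = (rec: Block): boolean => "partialJson" in rec;
const flagsWhenNonEmptyString = (rec: Block): boolean => {
  const value = rec.partialJson;
  return typeof value === "string" && value.length > 0;
};

// Hypothetical block shapes, assumed for illustration only.
const shapes: Record<string, Block> = {
  absent: { type: "toolCall", id: "call_1" },
  explicitUndefined: { type: "toolCall", id: "call_1", partialJson: undefined },
  emptyString: { type: "toolCall", id: "call_1", partialJson: "" },
  streamed: { type: "toolCall", id: "call_1", partialJson: '{"path": "/tm' },
};

// Lists the shape names that a given check would treat as malformed.
function shapesFlaggedBy(check: (rec: Block) => boolean): string[] {
  return Object.entries(shapes)
    .filter(([, rec]) => check(rec))
    .map(([name]) => name);
}

console.log(shapesFlaggedBy(flagsWhenNotUndefined));   // emptyString, streamed
console.log(shapesFlaggedBy(flagsWhenPresent));        // explicitUndefined, emptyString, streamed
console.log(shapesFlaggedBy(flagsWhenNonEmptyString)); // streamed
```

The non-empty-string check is the narrowest of the three: it only drops blocks that actually carry buffered partial JSON.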

Comment on lines +45 to +49
if (rec.partial === true) {
  return false;
}
if (rec.incomplete === true) {
  return false;

[P3] The partial/incomplete checks only drop when the value is the boolean true. If some providers encode these as strings (e.g. "true") or numbers, they won’t be stripped. If you expect loosely-typed JSON from session logs, you may want to treat any truthy value as malformed (or explicitly handle common encodings).
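One way to handle common encodings explicitly is sketched below. The `isSetFlag` helper is a hypothetical name, and the set of accepted encodings is an assumption rather than something any provider is known to emit. Explicit handling may be preferable to a plain truthiness test because the string "false" is itself truthy in JavaScript.

```typescript
// Hypothetical helper: interprets a loosely-typed flag read from session JSON.
function isSetFlag(value: unknown): boolean {
  if (typeof value === "boolean") {
    return value;
  }
  if (typeof value === "number") {
    return value !== 0 && !Number.isNaN(value);
  }
  if (typeof value === "string") {
    const normalized = value.trim().toLowerCase();
    return normalized === "true" || normalized === "1";
  }
  // undefined, null, objects, and anything else count as "not set".
  return false;
}

// Assumed usage inside the block check:
// a block is treated as malformed if either flag is set in any encoding.
function hasStreamingFlag(rec: Record<string, unknown>): boolean {
  return isSetFlag(rec.partial) || isSetFlag(rec.incomplete);
}
```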

Comment on lines +351 to +359
let repairedTools = sanitizedThinking;
if (policy.repairToolUseResultPairing) {
  const report = repairToolUseResultPairing(sanitizedThinking);
  repairedTools = report.messages;
  if (report.droppedMalformedToolUseCount > 0) {
    log.warn(
      `session repair: stripped ${report.droppedMalformedToolUseCount} malformed tool_use block(s)`,
    );
  }

[P3] The warning log runs only when report.droppedMalformedToolUseCount > 0, but the report also contains other useful counters (added, droppedDuplicateCount, droppedOrphanCount, moved). If debugging transcript issues, logging the whole report (or at least non-zero counters) could make diagnosing pairing repairs easier.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
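A minimal sketch of logging only the non-zero counters might look like this. The field names are the ones listed in the comment, but their types are an assumption: the sketch accepts numbers, arrays (counted by length), and booleans (counted as 1 when true), since the actual shape of the report is not shown here. `summarizeRepair` is a hypothetical helper.

```typescript
const REPORT_KEYS = [
  "added",
  "droppedDuplicateCount",
  "droppedOrphanCount",
  "droppedMalformedToolUseCount",
  "moved",
] as const;

// Converts a report field of unknown shape into a count (assumed encoding).
function toCount(value: unknown): number {
  if (Array.isArray(value)) return value.length;
  if (typeof value === "number") return value;
  if (value === true) return 1;
  return 0;
}

// Returns a one-line summary of non-zero counters, or undefined if nothing changed.
function summarizeRepair(report: Record<string, unknown>): string | undefined {
  const parts: string[] = [];
  for (const key of REPORT_KEYS) {
    const count = toCount(report[key]);
    if (count > 0) {
      parts.push(`${key}=${count}`);
    }
  }
  return parts.length > 0 ? `session repair: ${parts.join(", ")}` : undefined;
}
```

The caller could then emit a single warning when the summary is defined, which keeps clean transcripts silent while surfacing every kind of repair in one line.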

Labels

agents Agent runtime and tooling

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Orphaned tool_result after mid-stream assistant error causes permanent session breakage

1 participant