
Session context corruption: orphaned tool_use ID causes permanent 400 loop after abort #37834

@bryanbaer

Bug Report

OpenClaw Version: 2026.3.2 (inferred from recent upgrade)
Session ID: 2d882ee3-8503-4685-8eff-1cffc1c5835c
Provider/Model: anthropic/claude-sonnet-4-6
Date: 2026-03-06 ~12:28 UTC


Summary

A long-running EPIC-002 Hive build session entered a permanent, unrecoverable error loop after being aborted mid-generation. Every subsequent request to the session returned HTTP 400 from the Anthropic API with the same error about an orphaned tool_use ID. The session had to be force-reset (.reset file created at 2026-03-06T13:06:47Z). All other concurrent sessions were unaffected.


Error Message

400 {"type":"error","error":{"type":"invalid_request_error","message":"messages.259: `tool_use` ids were found without `tool_result` blocks immediately after: exec1772800125283396. Each `tool_use` block must have a corresponding `tool_result` block in the next message."}}

Root Cause (Reconstructed from Session JSONL)

The session JSONL (2d882ee3-8503-4685-8eff-1cffc1c5835c.jsonl) contained 291 entries. Timeline of failure:

| JSONL Entry | Timestamp | Role | Type | Notes |
|---|---|---|---|---|
| 266 | 12:28:45 UTC | assistant | toolCall:exec (id=exec_1772800125283_396) | Exec tool called successfully |
| 267 | 12:28:45 UTC | toolResult | result text | Tool result received |
| 268 | 12:28:51 UTC | assistant | ABORTED, content=[] | Session aborted mid-generation (empty content) |
| 269 | 12:28:51 UTC | user | new inbound message | Next user turn |
| 270–278 | 12:28:52→13:06:37 UTC | assistant | stop=error (x5) | All subsequent requests fail with same 400 |

The abort at entry 268 persisted an assistant message with an empty content array to the JSONL. When OpenClaw reconstructs the session context for the Anthropic API, the serialized history contains a position (messages.259 in the API context, 0-indexed) where the tool_use block exec1772800125283396 (recorded as exec_1772800125283_396 in the JSONL) is not immediately followed by a tool_result block.

Notably, entry 260 already contained the text "[openclaw] missing tool result in session history", indicating that OpenClaw's placeholder injection mechanism was active. Either it did not catch this second abort, or the placeholder was applied to an earlier orphan while the 12:28:51 abort created a new one.
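
The condition the API rejects can be checked locally before a request is sent. The TypeScript sketch below assumes the history has already been converted to Anthropic Messages API form (assistant `tool_use` blocks answered by `tool_result` blocks in the next user message). `ContentBlock`, `ChatMessage`, and `findOrphanedToolUses` are hypothetical names used for illustration, not OpenClaw internals.

```typescript
// Minimal content-block shapes; only the fields this check needs.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };

interface ChatMessage {
  role: "user" | "assistant";
  content: ContentBlock[];
}

interface Orphan {
  messageIndex: number; // index of the assistant message holding the tool_use
  toolUseId: string;
}

// Hypothetical helper: list every tool_use whose immediately following
// message does not carry a tool_result with the same id.
function findOrphanedToolUses(messages: ChatMessage[]): Orphan[] {
  const orphans: Orphan[] = [];
  messages.forEach((msg, i) => {
    if (msg.role !== "assistant") return;
    const next: ChatMessage | undefined = messages[i + 1];
    const answered = new Set<string>();
    if (next !== undefined && next.role === "user") {
      for (const block of next.content) {
        if (block.type === "tool_result") answered.add(block.tool_use_id);
      }
    }
    for (const block of msg.content) {
      if (block.type === "tool_use" && !answered.has(block.id)) {
        orphans.push({ messageIndex: i, toolUseId: block.id });
      }
    }
  });
  return orphans;
}

// Illustrative history: the tool call at index 1 is followed by plain user text.
const sampleHistory: ChatMessage[] = [
  { role: "user", content: [{ type: "text", text: "run the build" }] },
  { role: "assistant", content: [{ type: "tool_use", id: "exec_1", name: "exec", input: { cmd: "make" } }] },
  { role: "user", content: [{ type: "text", text: "are you still there?" }] },
];
console.log(findOrphanedToolUses(sampleHistory)); // reports the tool_use at index 1
```

Note that a `tool_use` in the final message of the history is also reported, since nothing follows it yet.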


Impact

  • Session rendered permanently unrecoverable (100% 400 error rate on every request)
  • User unable to resume active work context; required manual force-reset
  • All EPIC-002 in-progress session state lost
  • Non-trivial to diagnose/recover without digging into raw JSONL

Reproduction Steps

  1. Start a long-running session with heavy tool calls (exec/process in particular)
  2. Abort/interrupt the session mid-stream (connection drop, TUI disconnect) while an exec toolCall is the last item in the assistant's streaming buffer
  3. Session is persisted with an empty content array [] for the aborted assistant message
  4. Send any new message — Anthropic returns 400 on all future requests to the session
  5. Note: the [openclaw] missing tool result placeholder may already have fired for a prior orphan; this new abort creates a second orphan that is not repaired

Expected Behavior

When an abort produces an empty assistant content array ([]) after a tool_use entry:

  • OpenClaw's context reconstruction should detect the preceding tool_use has no matching tool_result
  • Inject a synthetic error result: {"type":"tool_result","tool_use_id":"...","content":"[session interrupted]","is_error":true}
  • Session should remain fully operational after an abort, not brick

Suggested Fix

In the session context serializer (the code that reconstructs message history for the Anthropic API):

  1. After reconstruction, walk the serialized message array
  2. For every assistant message containing a tool_use block, verify the immediately following message contains a tool_result with the matching tool_use_id
  3. If not, inject a synthetic error tool_result before the next user message
  4. Ensure this validation runs on every API context reconstruction, not only on explicit missing tool result events
  5. Consider also: do not persist empty-content assistant messages (content=[]) to JSONL on abort — or flag them so the reconstructor knows to handle them specially
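
The steps above could be sketched roughly as follows. This is a minimal illustration, not OpenClaw's serializer: it assumes the history is already in Anthropic Messages API form, and `ContentBlock`, `ChatMessage`, `syntheticResult`, and `repairHistory` are hypothetical names. The `[session interrupted]` text is taken from Expected Behavior.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };

interface ChatMessage {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Synthetic error result for a tool call that never got an answer.
function syntheticResult(toolUseId: string): ContentBlock {
  return { type: "tool_result", tool_use_id: toolUseId, content: "[session interrupted]", is_error: true };
}

// Hypothetical repair pass, meant to run on every context reconstruction.
function repairHistory(messages: ChatMessage[]): ChatMessage[] {
  // Step 5: ignore assistant messages that were persisted with content=[].
  const kept = messages.filter((m) => !(m.role === "assistant" && m.content.length === 0));
  const out: ChatMessage[] = [];
  for (let i = 0; i < kept.length; i++) {
    const msg = kept[i];
    out.push(msg);
    if (msg.role !== "assistant") continue;

    // Steps 1 and 2: collect tool_use ids and the results that answer them.
    const next: ChatMessage | undefined = kept[i + 1];
    const followUp = next !== undefined && next.role === "user" ? next : undefined;
    const answered = new Set<string>();
    for (const block of followUp?.content ?? []) {
      if (block.type === "tool_result") answered.add(block.tool_use_id);
    }
    const missing: ContentBlock[] = [];
    for (const block of msg.content) {
      if (block.type === "tool_use" && !answered.has(block.id)) {
        missing.push(syntheticResult(block.id));
      }
    }
    if (missing.length === 0) continue;

    // Step 3: place synthetic results ahead of whatever the user sent next.
    if (followUp !== undefined) {
      out.push({ role: "user", content: [...missing, ...followUp.content] });
      i++; // the original follow-up message has been replaced
    } else {
      out.push({ role: "user", content: missing });
    }
  }
  return out;
}
```

The pass returns a new array and leaves the persisted history untouched, so the JSONL still records what actually happened. Putting the synthetic blocks first keeps all `tool_result` blocks ahead of any user text in the same message.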

Workaround

Force-reset the session via the TUI or by renaming/deleting the session JSONL. This loses all session context.
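
Before resetting, it may help to confirm that the session matches the pattern described in this report. The sketch below scans JSONL text for assistant entries with an empty content array. The entry layout is an assumption: it looks for `role` and `content` either at the top level or under a `message` key, which may not match OpenClaw's actual schema. `findEmptyAssistantEntries` is a hypothetical name.

```typescript
// Hypothetical diagnostic: return the 1-based line numbers of assistant
// entries whose content array is empty. Unparseable lines are skipped.
function findEmptyAssistantEntries(jsonl: string): number[] {
  const hits: number[] = [];
  jsonl.split("\n").forEach((line, idx) => {
    if (line.trim() === "") return;
    let entry: any;
    try {
      entry = JSON.parse(line);
    } catch {
      return; // not valid JSON; ignore this line
    }
    // Assumed layout: fields live under `message`, or directly on the entry.
    const msg = entry?.message ?? entry;
    if (msg?.role === "assistant" && Array.isArray(msg.content) && msg.content.length === 0) {
      hits.push(idx + 1);
    }
  });
  return hits;
}
```

To run it against a session file, pass in the file contents, for example from `fs.readFileSync(path, "utf8")` in Node. This only locates candidate entries; it does not repair the session.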


Environment

  • OS: macOS Darwin 25.3.0 (arm64)
  • Node: v25.6.1
  • Provider: Anthropic (claude-sonnet-4-6)
  • Session length at failure: 279 message entries, 268 before error loop
  • Concurrent sessions: unaffected (confirms session-scoped, not global corruption)

Filed by Cypher (OpenClaw Brain) on behalf of Bryan (Principal Systems Architect)
