[Bug]: Compaction re-injection produces stale thinking signatures → Anthropic API rejection

### Bug type

Crash (process/app exits or hangs)

### Beta release blocker

No

### Summary

Session compaction re-injects pre-compaction assistant messages with their original `thinkingSignature` values into a new conversation prefix the signatures don't match, causing every subsequent Anthropic API call to fail with `Invalid signature in thinking block` and permanently stalling the session.

### Steps to reproduce

1. In agent config, enable extended thinking by default (e.g., `thinkingDefault: "extended"`) or enable it per-session.

2. Run the session through enough turns that (a) at least one assistant message accumulates a thinking block with a non-blank `thinkingSignature`, and (b) the context fills enough to trigger automatic compaction. In observed incidents this happened after ~120+ messages over ~60 minutes.

3. Send any prompt immediately after compaction completes.

4. The Anthropic API returns `Invalid signature in thinking block at messages.N.content.M`. The session is permanently stuck — the error persists across gateway restarts because the corrupt context is in the `.jsonl` session file.

Observed 5+ times on OpenClaw 2026.6.1 (2e08f0f) with `claude-sonnet-4`. First observed on 2026.5.27 (27ae826).

### Expected behavior

After compaction, the next Anthropic API call should succeed and the session should remain usable — i.e., compaction is a transparent operation from the agent's perspective. Grounded reference: OpenClaw already implements `stripInvalidThinkingSignatures` (in `dist/compaction-successor-transcript-CUmEvaGX.js`) as a defense to prevent invalid-signature thinking blocks from reaching the Anthropic API. The existence of this function establishes that the maintainers' intended behavior is for invalid thinking signatures to be removed before submission, not sent. The current implementation's predicate `hasReplayableThinkingSignature` only classifies signatures as invalid when they are absent or blank, missing the contextually-stale-but-non-blank case produced by compaction re-injection — but the architectural intent is clear from the existing defense.

### Actual behavior

After session compaction completed, the next Anthropic API call failed with:

    Invalid signature in thinking block at messages.N.content.M

Every subsequent API call in that session returned the same error — the session became permanently unable to communicate with the Anthropic API. Restarting the gateway did not recover the session; the error persisted across restarts because the corrupted context is stored in the `.jsonl` session file and reloaded identically on boot.

Cited evidence from the session JSONL:
- Lines 254–266 (post-compaction injected entries) carry `thinkingSignature` values that are byte-for-byte identical to their originals at lines 125–137 (pre-compaction). The signatures were preserved exactly, not corrupted in transit.
- The original and re-injected copies were separated by approximately 63 minutes in the session timeline.

Observed 5+ times on OpenClaw 2026.6.1 (2e08f0f) with `claude-sonnet-4`. First observed on 2026.5.27 (27ae826).

### OpenClaw version

2026.6.1 

### Operating system

MacOS Sonoma 14.7.1

### Install method

npm global (npm install -g openclaw / Homebrew — installed at /opt/homebrew/lib/node_modules/openclaw)

### Model

anthropic/claude-sonnet-4-6

### Provider / routing chain

openclaw → anthropic (direct, no proxy or gateway intermediary)

### Additional provider/model setup details

hosted locally on mac studio

### Logs, screenshots, and evidence

```shell

```

### Impact and severity

completely blocked all processes, forced a manual restart via terminal with multiple hours of troubleshooting to come up with a work around fix.

### Additional information

When OpenClaw compacts a session that contains extended-thinking
assistant turns, it
re-injects the kept pre-compaction messages as new JSONL entries into
the post-compaction
context. These messages carry original `thinkingSignature` values that
were cryptographically
bound to the pre-compaction conversation prefix. The new context has a
different prefix, so
Anthropic's server-side validation rejects the signatures with
`Invalid signature in thinking
block`. The session enters a permanent error loop that survives
gateway restarts.

This is a confirmed root cause (H1a), not a hypothesis.

---

## Environment

- **OpenClaw version:** 2026.6.1 (2e08f0f) — also present in 2026.5.27 (27ae826)
- **OS:** macOS (arm64)
- **Model:** claude-sonnet-4 / claude-opus-4 (any Anthropic model with
extended thinking)
- **Provider:** Anthropic (Messages API)

---

## Symptoms

**Error message (from Anthropic API):**

```
Invalid signature in thinking block at messages.N.content.M
```

(where N and M are wire-format indices into the replayed conversation)

**When it occurs:**
- On the first API call after a session compaction event, when the
session has accumulated
  assistant turns with extended-thinking blocks (`type: "thinking"`
with `thinkingSignature`
  present and non-blank)

**Effect:**
- The session is permanently stuck. Every subsequent API call replays
the same invalid
  conversation context and receives the same error.
- The error survives gateway restarts because the corrupt context is
stored in the `.jsonl`
  session file and reloaded identically.
- `stripInvalidThinkingSignatures` does not help — it only strips
blocks where the signature
  is absent or blank; stale-but-present signatures pass through unchanged.

**Frequency:** Occurs on every session that (a) uses extended thinking
and (b) grows large
enough to trigger compaction. Confirmed across 5 incidents on the same version.

---

## Reproduction

1. Configure a session with extended thinking enabled
(`thinkingDefault: "extended"` or
   `thinking: "on"`).
2. Run enough turns to accumulate multiple thinking-block assistant messages.
3. Continue until context overflow triggers compaction (or force it
manually if the CLI
   supports that).
4. Issue any prompt after compaction completes.
5. Observe `Invalid signature in thinking block` from the Anthropic API.

**Prerequisite:** The session must have at least one pre-compaction
assistant turn with a
`thinkingSignature` value that ends up in the compaction's "kept" message set.

---

## Root Cause Analysis

### The compaction re-injection mechanism

When compaction runs, it:

1. Selects a "keep boundary" — the earliest message to preserve
(`firstKeptEntryId`).
2. Writes post-compaction new work (all turns that happened after
compaction started) as
   new JSONL entries branching from an early anchor.
3. **Re-injects the kept pre-compaction messages as new JSONL
entries** with new UUIDs but
   identical content, timestamped at the compaction time, as children
of the last new-work
   message.

This produces an active-branch context where the kept pre-compaction
messages appear at the
**end**, after all the post-compaction work.

### Why signatures become invalid

Anthropic's extended-thinking `thinkingSignature` is a cryptographic
commitment to the
conversation state at the time the thinking block was generated.
Specifically, it encodes
the full conversation prefix (all messages before that assistant turn)
at generation time.

After compaction, the re-injected kept messages appear in a completely
different conversation
prefix: a handful of early messages plus potentially many
post-compaction turns. Anthropic's
server-side validation computes the expected signature for the new
prefix position and finds
a mismatch.

### Byte-identical signature evidence

In the confirmed incident specimen, JSONL entries at lines 254–266
(compaction-injected
copies of lines 125–137) have `thinkingSignature` values that are
byte-for-byte identical
to their originals at lines 125–137. The signatures are not stale by
any code transformation
— they are preserved exactly, and become invalid purely because the
conversation prefix they
encode no longer matches the position they're replayed in.

### The deduplication window cannot prevent this

`collectDuplicateUserMessageEntryIdsForCompaction` uses a 60-second window
(`DEFAULT_DUPLICATE_USER_MESSAGE_WINDOW_MS = 6e4`). In a typical
session long enough to
trigger compaction, the original and re-injected copies of a message
are separated by many
minutes (confirmed incident gap: ~63 minutes). The window misses it by
a factor of ~63×.

### Why `stripInvalidThinkingSignatures` does not catch it

```javascript
function hasReplayableThinkingSignature(block) {
    // Returns true if signature is PRESENT AND NON-BLANK
    return [...fieldVariants].some(v => typeof v === "string" && v.length > 0);
}
```

Re-injected thinking blocks have signatures that are present and
non-blank. The function
returns `true` and keeps them. It cannot distinguish "present and
valid" from "present and
contextually stale."

---

## Proposed Fix (J-1)

**Location:** The compaction re-injection path — wherever kept
messages are written as new
JSONL entries (in `buildSuccessorEntries` or the equivalent in the
compaction module).

**Change:** Before writing re-injected kept messages, strip
`thinkingSignature` from all
thinking blocks in assistant messages. The thinking text can be
preserved for continuity,
but the signature must not carry over because it was issued for a
different context prefix.

```javascript
// Strip thinking signatures from assistant messages before re-injection.
// Signatures are bound to the original conversation prefix; replaying them
// in a structurally different context causes Anthropic API rejection.
// Thinking text is preserved; stripInvalidThinkingSignatures will convert
// unsigned blocks to synthetic text on the next API call.
function stripThinkingSignaturesForReinject(message) {
  if (message.role !== "assistant" || !Array.isArray(message.content)) {
    return message;
  }
  let changed = false;
  const newContent = message.content.map(block => {
    if (block.type === "thinking" && block.thinkingSignature) {
      changed = true;
      const { thinkingSignature, ...rest } = block;
      return rest;
    }
    return block;
  });
  return changed ? { ...message, content: newContent } : message;
}
```

Apply this transform to all kept entries before
`orderSuccessorEntries` writes them.

**Why this is safe:** `stripInvalidThinkingSignatures` already handles
absent-signature
thinking blocks by converting them to
`buildOmittedAssistantReasoningContent()` (a synthetic
text block). The thinking text is preserved in the JSONL for context
continuity; only the
cryptographically invalid signature is removed. No reasoning integrity
regression beyond what
compaction already introduces.

**Scope note:** All kept entries (not just re-injected ones) should
have their signatures
stripped. Any kept message that was generated in a now-different
context prefix has an
invalid signature by the same logic. Stripping all of them is correct,
not over-broad.

---

## Current Workaround

Two mitigations can be applied via config without code changes:

**Primary workaround:** Disable extended thinking by default.

```json
{
  "agents": {
    "defaults": {
      "thinkingDefault": "off"
    }
  }
}
```

This prevents thinking blocks from accumulating, so there are no
signatures to become
invalid at compaction. Extended thinking can be re-enabled per-session
as needed.

**Secondary hardening (J-3):** Enable `truncateAfterCompaction`.

```json
{
  "agents": {
    "defaults": {
      "compaction": {
        "truncateAfterCompaction": true
      }
    }
  }
}
```

This causes the session to rotate to a clean successor JSONL after
compaction, preventing
dead-branch transcript accumulation across multiple compaction cycles.
It does **not** prevent
signature corruption on the first post-compaction API call — the
re-injected messages still
carry stale signatures in the successor JSONL. It limits the blast
radius to a single
compaction cycle.

**Note:** Neither workaround recovers an already-corrupted session.
Manual recovery
requires identifying the thinking blocks at the error position in the
session JSONL and
removing their `thinkingSignature` fields.

---

## Plugin Feasibility Analysis (Why This Cannot Be Fixed by an External Plugin)

The correct insertion point requires mutating session messages between
compaction re-injection
and API submission. Every relevant hook in the plugin system fails for
a specific reason:

| Hook | Status | Why it cannot fix this |
|---|---|---|
| `before_compaction` | Void | `runVoidHook()` — mutations to
`event.messages` not processed |
| `after_compaction` | Void | Same; receives `sessionFile` path, not
mutable messages |
| `before_agent_run` | Modifying | Return type is `{outcome:
"block"\|"pass"}` only; cannot rewrite messages |
| `before_prompt_build` | Additive | Additive context injection only;
cannot rewrite existing session messages |
| `llm_input` | Observation | Read-only |

The fix must be inside `buildSuccessorEntries` (or the equivalent
re-injection path), which
has no external hook seam. A custom provider plugin wrapping the
Anthropic provider via
`sanitizeReplayHistory` is theoretically possible but requires
proxying all Anthropic API
traffic and has no clean handoff point from the compaction module.

---

## Suggested Companion Fix (J-2)

As defense-in-depth, a post-error recovery path would allow
already-corrupted sessions to
recover without manual JSONL editing:

On receiving `invalid_request_error` matching
`Invalid.*signature.*thinking block`:
1. Parse the wire-format position from the error message.
2. Map the wire position back to the internal session JSONL entry.
3. Strip `thinkingSignature` from that block.
4. Retry the API call.
5. Cap at N retry attempts to prevent infinite loops.

This handles sessions corrupted before J-1 ships, and provides
resilience against any
future scenario where thinking signatures become invalid by other means.


Hook	Status	Why it cannot fix this
`before_compaction`	Void	`runVoidHook()` — mutations to
`event.messages` not processed
`after_compaction`	Void	Same; receives `sessionFile` path, not
mutable messages
`before_agent_run`	Modifying	Return type is `{outcome:
"block"\|"pass"}` only; cannot rewrite messages
`before_prompt_build`	Additive	Additive context injection only;
cannot rewrite existing session messages
`llm_input`	Observation	Read-only

Uh oh!

[Bug]: Compaction re-injection produces stale thinking signatures → Anthropic API rejection #90108

Description

Bug type

Beta release blocker

Summary

Steps to reproduce

Expected behavior

Actual behavior

OpenClaw version

Operating system

Install method

Model

Provider / routing chain

Additional provider/model setup details

Logs, screenshots, and evidence

Impact and severity

Additional information

Environment

Symptoms

Reproduction

Root Cause Analysis

The compaction re-injection mechanism

Why signatures become invalid

Byte-identical signature evidence

The deduplication window cannot prevent this

Why stripInvalidThinkingSignatures does not catch it

Proposed Fix (J-1)

Current Workaround

Plugin Feasibility Analysis (Why This Cannot Be Fixed by an External Plugin)

Suggested Companion Fix (J-2)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Why `stripInvalidThinkingSignatures` does not catch it