Compaction produces invalid tool_use/tool_result ordering → silent fallback to wrong model

## Summary

After session compaction, the replayed conversation history sent to the Anthropic API contains `tool_use` blocks without matching `tool_result` blocks. Anthropic rejects this with `400 invalid_request_error`. The gateway then silently falls back to the next model in the fallback chain (e.g. `chatgpt/gpt-5.4`) and **never automatically recovers** to the primary model.

This causes the session to silently run on the wrong model indefinitely until manually corrected via `session_status(model=...)` or `/model`.

## Environment

- **OpenClaw version:** 2026.4.9 (0512059)
- **OS:** macOS 26.3.1 (arm64)
- **Primary model:** `anthropic/claude-opus-4-6`
- **Fallbacks:** `chatgpt/gpt-5.4`, `anthropic/claude-opus-4-20250514`, `venice/claude-opus-4-6`, `venice/gpt-5.4`, `venice/claude-sonnet-4-6`
- **Affected session:** `agent:main:telegram:group:-1003753641666` (session ID `59f592d3-6901-43e3-ab07-3bf93f75bae3`)
- **Compactions in session:** 31+

## Exact Error

```
[agent] embedded run agent end: isError=true model=claude-opus-4-6 provider=anthropic
error=LLM request rejected: messages.17: `tool_use` ids were found without `tool_result`
blocks immediately after: call8ouuvbBmWSXgA8nv98RDlHeQ. Each `tool_use` block must have
a corresponding `tool_result` block in the next message.
rawError=400 {"type":"error","error":{"type":"invalid_request_error",...}}
```

## Evidence from gateway.err.log (2026-04-08)

Multiple occurrences throughout the evening, each triggering the same pattern:

| Time (PDT) | tool_use ID | Message index | Fallback target |
|---|---|---|---|
| 20:13:35 | `call8ouuvbBmWSXgA8nv98RDlHeQ` | messages.17 | chatgpt/gpt-5.4 |
| 20:30:15 | `call8ouuvbBmWSXgA8nv98RDlHeQ` | messages.11 | chatgpt/gpt-5.4 |
| 20:36:20 | `call8ouuvbBmWSXgA8nv98RDlHeQ` | messages.11 | chatgpt/gpt-5.4 |
| 20:41:45 | `callPAoAtqCzFifk2JSOMQPmDIcN` | messages.3 | chatgpt/gpt-5.4 |
| 23:11:08 | `callIXN6A0bDrDpHjAfWDECc0xY2` | messages.7 | chatgpt/gpt-5.4 |
| 23:31:11 | `callH6nRwHU5OWsAzGuhjddMuOoz` | messages.13 | chatgpt/gpt-5.4 |

Note the **same tool_use ID** (`call8ouuvbBmWSXgA8nv98RDlHeQ`) appears in the first three errors at different message indices, suggesting the orphaned `tool_use` persists across compactions and shows up at different positions depending on what else was compacted.

## Sequence of Events

1. Session runs normally on `anthropic/claude-opus-4-6`
2. Context grows → auto-compaction triggers
3. Compaction summary is generated (often by `chatgpt/gpt-5.4` per gateway.log: `auto-compaction succeeded for chatgpt/gpt-5.4`)
4. Compacted conversation replayed to Anthropic API
5. **Replay contains orphaned `tool_use` without matching `tool_result`**
6. Anthropic returns `400 invalid_request_error`
7. Gateway logs `[model-fallback] decision=candidate_failed reason=overloaded` (note: mislabeled as "overloaded" — it is actually an invalid request)
8. Falls back to `chatgpt/gpt-5.4` which succeeds
9. **Session stays on fallback model permanently** — no recovery mechanism

## Two bugs

### Bug 1: Compaction produces invalid conversation history
The compacted/summarized conversation is missing `tool_result` blocks for some `tool_use` calls. Anthropic strictly requires every `tool_use` to have a corresponding `tool_result` in the next message.

### Bug 2: No automatic recovery to primary model
After a fallback succeeds, the session remains on the fallback model. There is no mechanism to retry the primary model on subsequent messages. The user/agent must manually reset via `session_status(model=...)`.

### Bug 3 (minor): Incorrect fallback reason
The fallback log labels the reason as `reason=overloaded` when the actual error is a `400 invalid_request_error` (malformed conversation). This makes debugging harder.

## Expected Behavior

1. Compacted conversation should always produce valid `tool_use`/`tool_result` pairs, or strip orphaned `tool_use` blocks during compaction
2. After a temporary fallback, the gateway should attempt the primary model again on the next user message
3. Fallback reason should accurately reflect the error type (e.g. `reason=invalid_request` not `reason=overloaded`)

## Workaround

Manually run `session_status(model="anthropic/claude-opus-4-6")` after each detected drift, or start a new session with `/new`. Neither is automatic.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Compaction produces invalid tool_use/tool_result ordering → silent fallback to wrong model #63608

Summary

Environment

Exact Error

Evidence from gateway.err.log (2026-04-08)

Sequence of Events

Two bugs

Bug 1: Compaction produces invalid conversation history

Bug 2: No automatic recovery to primary model

Bug 3 (minor): Incorrect fallback reason

Expected Behavior

Workaround

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Time (PDT)	tool_use ID	Message index	Fallback target
20:13:35	`call8ouuvbBmWSXgA8nv98RDlHeQ`	messages.17	chatgpt/gpt-5.4
20:30:15	`call8ouuvbBmWSXgA8nv98RDlHeQ`	messages.11	chatgpt/gpt-5.4
20:36:20	`call8ouuvbBmWSXgA8nv98RDlHeQ`	messages.11	chatgpt/gpt-5.4
20:41:45	`callPAoAtqCzFifk2JSOMQPmDIcN`	messages.3	chatgpt/gpt-5.4
23:11:08	`callIXN6A0bDrDpHjAfWDECc0xY2`	messages.7	chatgpt/gpt-5.4
23:31:11	`callH6nRwHU5OWsAzGuhjddMuOoz`	messages.13	chatgpt/gpt-5.4

Uh oh!

Compaction produces invalid tool_use/tool_result ordering → silent fallback to wrong model #63608

Description

Summary

Environment

Exact Error

Evidence from gateway.err.log (2026-04-08)

Sequence of Events

Two bugs

Bug 1: Compaction produces invalid conversation history

Bug 2: No automatic recovery to primary model

Bug 3 (minor): Incorrect fallback reason

Expected Behavior

Workaround

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions