[Bug]: Embedded agent failover treats session-file mutation as model failure and exhausts all fallbacks

### Bug type

Regression (worked before, now fails)

### Beta release blocker

No

### Summary

OpenClaw 2026.5.16-beta.6 can fail an embedded agent turn before replying when the active session JSONL changes while the embedded prompt lock is released. The failure is classified as a model/candidate failure, so the same local session-file mutation is retried across unrelated fallback models. This exhausts the fallback chain and surfaces as “All models failed”, even though the first two failures are not provider/model failures.

### Steps to reproduce

1. Use a long-lived embedded agent session with automatic context/maintenance activity enabled.
2. Trigger an embedded agent turn on the active main session.
3. While the embedded prompt lock is released, allow a local process such as context compaction, maintenance, memory sync, or another gateway write path to append/update the same session JSONL file.
4. Observe model fallback behavior for the turn.

### Expected behavior

A local session takeover/session-file mutation should be treated as a runtime coordination failure, not as a model failure.

The system should either:
- abort the current turn with a clear non-model error and avoid consuming fallback models, or
- restart/rebase the turn safely after re-reading the updated session state, if that is supported.

Fallback should be reserved for provider/model failures such as timeout, rate limit, auth, or provider runtime errors.

### Actual behavior

The embedded run fails before reply with All models failed. The same session-file mutation is counted against multiple model candidates:

- openai/gpt-5.5 fails with: session file changed while embedded prompt lock was released
- claude-bridge/claude-opus-4-7 fails with the same local session-file mutation
- google/gemini-3.1-pro-preview then times out

The user sees a generic model-chain failure even though the primary cause is local session state changing underneath the embedded run.

### OpenClaw version

2026.5.16-beta6

### Operating system

macOS 26.5

### Install method

npm global

### Model

openai/gpt-5.5

### Provider / routing chain

openclaw->openai/gpt-5.5

### Additional provider/model setup details

_No response_

### Logs, screenshots, and evidence

```shell

```

### Impact and severity

Severity: High for interactive reliability.

Impact:
- User-facing turns fail before any assistant reply.
- Fallback models are wasted on a non-model local coordination error.
- The final error message misleads operators toward provider/model diagnosis instead of session lock/session mutation handling.
- Repeated occurrences make long-lived or maintenance-heavy sessions unreliable.
- This can mask the true cause when the last fallback happens to time out, because the final surfaced error is All models failed rather than a deterministic session takeover error.

Suggested direction:
- Classify EmbeddedAttemptSessionTakeoverError/session-file mutation as a non-provider runtime coordination error.
- Do not count this error against model fallback candidates.
- Either fail fast with a clear session-concurrency message or retry only after safely rebuilding prompt/context from the updated session file.

### Additional information

## Environment
- Product: OpenClaw
- Version observed in gateway/session UI: 2026.5.16-beta.6
- Host: macOS 26.5
- Node runtime in logs: node 24.15.0
- Main session key: agent:main:main
- Affected session file: /Users/sompisjunsui/.openclaw/agents/main/sessions/67ebbe47-a99f-4eec-9524-728658b5f6a2.jsonl
- Models/fallback chain observed: openai/gpt-5.5 -> claude-bridge/claude-opus-4-7 -> google/gemini-3.1-pro-preview

## Logs, screenshots, and evidence
Gateway log evidence from /tmp/openclaw/openclaw-2026-05-18.log showed repeated occurrences of this error pattern. A local grep found 61 occurrences of:

session file changed while embedded prompt lock was released

Concrete user-visible failure at 2026-05-18T14:09:40.850+07:00:

Embedded agent failed before reply: All models failed (3): openai/gpt-5.5: session file changed while embedded prompt lock was released: /Users/sompisjunsui/.openclaw/agents/main/sessions/67ebbe47-a99f-4eec-9524-728658b5f6a2.jsonl (unknown) | claude-bridge/claude-opus-4-7: session file changed while embedded prompt lock was released: /Users/sompisjunsui/.openclaw/agents/main/sessions/67ebbe47-a99f-4eec-9524-728658b5f6a2.jsonl (unknown) | google/gemini-3.1-pro-preview: LLM idle timeout (120s): no response from model (timeout) | LLM request timed out.

Nearby diagnostic/fallback evidence showed the same local session mutation being recorded as model fallback candidate failures:

lane task error: lane=main durationMs=36956 error="EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was released: /Users/sompisjunsui/.openclaw/agents/main/sessions/67ebbe47-a99f-4eec-9524-728658b5f6a2.jsonl"

model_fallback_decision: candidate_failed, candidate=openai/gpt-5.5, errorPreview="session file changed while embedded prompt lock was released: /Users/sompisjunsui/.openclaw/agents/main/sessions/67ebbe47-a99f-4eec-9524-728658b5f6a2.jsonl", fallbackStepFinalOutcome="next_fallback"

model_fallback_decision: candidate_failed, candidate=claude-bridge/claude-opus-4-7, errorPreview="session file changed while embedded prompt lock was released: /Users/sompisjunsui/.openclaw/agents/main/sessions/67ebbe47-a99f-4eec-9524-728658b5f6a2.jsonl", fallbackStepFinalOutcome="next_fallback"

model_fallback_decision: candidate_failed, candidate=google/gemini-3.1-pro-preview, reason="timeout", errorPreview="LLM idle timeout (120s): no response from model", fallbackStepFinalOutcome="chain_exhausted"

Additional context from nearby logs:
- lossless-claw/context maintenance was active around the same session.
- The session file was below the auto-rotate size threshold, so this was not simply a large-file rotation case.
- The same session path appears consistently across the failed candidates.

## Workaround
Rollback to 2026.5.16-beta5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug]: Embedded agent failover treats session-file mutation as model failure and exhausts all fallbacks #83510

Bug type

Beta release blocker

Summary

Steps to reproduce

Expected behavior

Actual behavior

OpenClaw version

Operating system

Install method

Model

Provider / routing chain

Additional provider/model setup details

Logs, screenshots, and evidence

Impact and severity

Additional information

Environment

Logs, screenshots, and evidence

Workaround

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[Bug]: Embedded agent failover treats session-file mutation as model failure and exhausts all fallbacks #83510

Description

Bug type

Beta release blocker

Summary

Steps to reproduce

Expected behavior

Actual behavior

OpenClaw version

Operating system

Install method

Model

Provider / routing chain

Additional provider/model setup details

Logs, screenshots, and evidence

Impact and severity

Additional information

Environment

Logs, screenshots, and evidence

Workaround

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions