Recovery chain inconsistency after aborted runs: transient abortedLastRun + incomplete-text prefill cause unstable next-turn state

## Summary

When an embedded run is aborted, the recovery chain can provide the next turn with an unstable state source:

- `abortedLastRun` is persisted on abort, but later overwritten by subsequent turn results (`result.meta.aborted ?? false`)
- `reply` injects an aborted hint (`The previous agent run was aborted by the user ...`)
- `sanitizeThinkingForRecovery()` treats `incomplete-text` as `prefill: true`

This combination appears to let the next turn see **partial assistant text + a transient aborted marker**, which can lead to inconsistent self-reporting and unstable follow-up behavior.

## Observed behavior

Using controlled sacrificial-session experiments:

1. A long assistant output is aborted
2. A follow-up is sent in the same session
3. The follow-up sometimes:
   - claims the previous response completed successfully when it did not
   - cannot reliably tell whether the previous aborted output was its own last turn
   - becomes aborted itself before giving a stable answer

This reproduces as a **soft recovery-state inconsistency**, even when hard stuck-lane behavior does not reproduce.

## Why this looks like a runtime issue rather than just model behavior

We tested multiple models with the same pattern:

- MiniMax-M2.7 tends to "fill narrative gaps"
- gemma-4-31b-it is less agentic but still unstable under aborted follow-ups
- a lite model is more direct, but can still make incorrect state judgments

So model style changes the symptom shape, but the root cause still appears to be in the recovery/runtime chain.

## Relevant implementation points (current main also appears affected)

- `src/agents/pi-embedded-runner/thinking.ts`
  - `RecoveryAssessment = "valid" | "incomplete-thinking" | "incomplete-text"`
  - `sanitizeThinkingForRecovery(...)`
  - `incomplete-text` path uses `prefill: true`
- `src/auto-reply/reply/body.ts`
  - `abortedLastRun` is converted into an aborted hint for the next reply
- `src/agents/command/session-store.ts`
  - `next.abortedLastRun = result.meta.aborted ?? false`
- `src/process/command-queue.ts`
  - `lane wait exceeded` appears diagnostic, not self-healing

## Suggested direction

At minimum, please consider whether one of these should change:

1. When `abortedLastRun=true`, do **not** allow `incomplete-text -> prefill: true`
2. Preserve abort state as a stable recovery fact instead of a one-turn transient flag
3. Prefer structured recovery facts over partial assistant-body continuation when the previous run was aborted

## Environment

- Local installed version: `2026.4.5`
- Latest release checked: `v2026.4.5`
- Main branch logic still appears consistent with the above paths

## Related but not identical

I saw #54964 (zombie session after embedded init failure), which seems adjacent, but this report is specifically about aborted-run recovery semantics and inconsistent next-turn state.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Recovery chain inconsistency after aborted runs: transient abortedLastRun + incomplete-text prefill cause unstable next-turn state #62322

Summary

Observed behavior

Why this looks like a runtime issue rather than just model behavior

Relevant implementation points (current main also appears affected)

Suggested direction

Environment

Related but not identical

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Recovery chain inconsistency after aborted runs: transient abortedLastRun + incomplete-text prefill cause unstable next-turn state #62322

Description

Summary

Observed behavior

Why this looks like a runtime issue rather than just model behavior

Relevant implementation points (current main also appears affected)

Suggested direction

Environment

Related but not identical

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions