Skip to content

Inbound user messages are not persisted to session JSONL when the agent attempt throws #86592

@aacodex401

Description

@aacodex401

Summary

role: "user" entries are only written to the session transcript JSONL after the agent attempt completes successfully. If the attempt throws (e.g., context-window exceeded, upstream API error, model rejection), the user prompt is never appended. Subsequent restart/recovery and history-replay see a transcript that contains assistant turns but none of the user turns that triggered them. When the session reaches totalTokens > contextTokens, every subsequent turn throws → no user lines ever persist → preflight compaction sees an empty/one-sided transcript and throws Preflight compaction required but failed: no real conversation messages, which then fires on every inbound forever (session is permanently wedged).

Evidence

A Telegram topic session ran for ~15 minutes with multiple real inbounds; the resulting jsonl
(11 lines) contained:

  • 0 role:"user" entries
  • 7 role:"assistant", 1 role:"thinking_level_change", 1 role:"custom", 1 session header

sessions.json showed totalTokens=1,131,370 over contextTokens=1,048,576 — every turn likely overflowed and threw upstream.

Comparable healthy sessions in the same dir contain paired user/assistant lines:

  • standard session jsonl → 1 user / 6 assistant
  • topic-suffixed session jsonl (different topic) → 1 user / 4 assistant

Root cause

In agent-command (bundled file dist/agent-command-QBBzz2Au.js, ~L1136-1250):

```js
try {
// … runAgentAttempt …
break;
} catch (err) {
// … emit lifecycle error …
throw err; // <-- exits before persistence block
}
try {
// updateSessionStoreAfterAgentRun + persistCliTurnTranscript / persistAcpTurnTranscript
}
```

persistTextTurnTranscript (dist/attempt-execution-DwY67bt5.js L110) is the only caller that emits role: "user" to the transcript, and it writes BOTH the user prompt and assistant reply in a single post-run pass. When the attempt throws, neither is written, so the inbound user message is silently dropped from the transcript even though it was received, dispatched, and counted in token usage.

Reproduce

  1. Create a session whose persisted totalTokens exceeds contextTokens for the chosen model (or stub the provider call to throw).
  2. Send any inbound (Telegram, CLI, ACP).
  3. Observe: *.jsonl gains no role:"user" line; session metadata still records the run.
  4. Once the session has only assistant-role lines and is over-cap, every subsequent inbound fails preflight compaction with Preflight compaction required but failed: no real conversation messages, wedging the routing key permanently.

Suggested fix

Persist the user message before the agent attempt (in its own write-locked append), independent of the success/failure of the model call. Options:

  • Add appendSessionTranscriptMessage({ role: "user", … }) immediately after the inbound is accepted and before runAgentAttempt, then on success append only the assistant reply.
  • Or wrap the attempt+persistence block in a try/finally so persistence runs whether the attempt threw or not, with the assistant entry replaced by an error stub when applicable.

Either fix preserves transcript integrity for restart-recovery, history-replay, and compaction (which currently see ghost assistant turns with no preceding user prompt on failure-heavy sessions).

Affected symbols

  • appendSessionTranscriptMessage, appendSessionTranscriptMessageLockedtranscript-BA0Ngd-A.js
  • persistTextTurnTranscript, persistCliTurnTranscript, persistAcpTurnTranscriptattempt-execution-DwY67bt5.js
  • agent attempt loop in agent-command-QBBzz2Au.js (~L1136-1250)

Related

Filed alongside a separate issue for status: done session resurrection via the restart-sentinel resume path, which makes this bug terminal (the resurrected, empty session can never be unwedged without manual intervention on sessions.json).

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1High-priority user-facing bug, regression, or broken workflow.clawsweeper:fix-shape-clearClawSweeper found a clear likely implementation shape for this issue.clawsweeper:queueable-fixClawSweeper marked this issue as an existing queue_fix_pr work candidate.clawsweeper:source-reproClawSweeper found a high-confidence source-level issue reproduction.impact:data-lossCan lose, corrupt, or silently drop user/session/config data.impact:message-lossChannel message delivery can be lost, duplicated, or misrouted.impact:session-stateSession, memory, transcript, context, or agent state can drift or corrupt.issue-rating: 🦞 diamond lobsterVery strong issue quality with high-confidence source-level or clear reproduction.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions