claude-cli Telegram DM context loss from missing-transcript race before transcript flush #77974

## Bug Description

Telegram/DM sessions backed by `claude-cli` can intermittently lose conversation context because OpenClaw invalidates an otherwise valid Claude CLI session as `missing-transcript` before Claude Code has finished flushing the previous turn transcript to disk.

In practice this looks like "every other Telegram message forgets context": one turn resumes correctly, then a queued/rapid follow-up turn starts fresh with no resume and no OpenClaw history reseed.

## Impact

High. The context loss is user-visible and behaves like data loss:

- Active Telegram DM loses conversational continuity mid-session.
- Gateway logs show a fresh Claude session being started (`useResume=false`, `session=none`) even though the prior Claude CLI transcript exists shortly afterward and contains assistant messages.
- If the OpenClaw session transcript file is also missing/stale, fallback reseeding is unavailable (`historyPrompt=none`), compounding the amnesia.

## Environment

- OpenClaw: `2026.5.4` (`325df3e`)
- Service: `openclaw-gateway.service`
- Channel: Telegram DM
- Agent runtime/provider: `claude-cli`
- Model: `claude-opus-4-7`
- Platform: Linux/systemd user service
- Claude CLI invocation: uses the stream-json / replay-user-messages / live-session path

## Evidence / Logs

Representative sanitized log sequence:

```text
[agent/cli-backend] cli exec: provider=claude-cli model=opus promptChars=640 trigger=user useResume=true session=present resumeSession=<fp> reuse=reusable historyPrompt=none
[telegram] sendMessage ok chat=<redacted> message=15992

[agent/cli-backend] cli session reset: provider=claude-cli reason=missing-transcript
[agent/cli-backend] cli exec: provider=claude-cli model=opus promptChars=420 trigger=user useResume=false session=none resumeSession=none reuse=invalidated:missing-transcript historyPrompt=none
[telegram] sendMessage ok chat=<redacted> message=15993

[agent/cli-backend] cli exec: provider=claude-cli model=opus promptChars=398 trigger=user useResume=true session=present resumeSession=<fp> reuse=reusable historyPrompt=none
[telegram] sendMessage ok chat=<redacted> message=15994
```

The same reset was observed multiple times within one session, for example later:

```text
[agent/cli-backend] cli session reset: provider=claude-cli reason=missing-transcript
[agent/cli-backend] cli exec: provider=claude-cli model=opus promptChars=417 trigger=user useResume=false session=none resumeSession=none reuse=invalidated:missing-transcript historyPrompt=none
```

and:

```text
[agent/cli-backend] cli session reset: provider=claude-cli reason=missing-transcript
[agent/cli-backend] cli exec: provider=claude-cli model=opus promptChars=440 trigger=user useResume=false session=none resumeSession=none reuse=invalidated:missing-transcript historyPrompt=none
```

The gateway also reported queued/stuck session state during the same window:

```text
[diagnostic] stuck session: sessionId=unknown sessionKey=agent:main:direct:<peer> state=processing age=123s queueDepth=1 reason=queued_work_without_active_run classification=stale_session_state recovery=checking
[diagnostic] stuck session recovery skipped: reason=active_reply_work action=keep_lane sessionId=<uuid> sessionKey=agent:main:direct:<peer> age=123s queueDepth=1 activeSessionId=<uuid>
```

## Verification

The session store had a direct DM entry with a Claude CLI binding:

```json
{
  "sessionKey": "agent:main:direct:<peer>",
  "modelProvider": "claude-cli",
  "cliSessionIds": { "claude-cli": "<uuid>" },
  "cliSessionBindings": {
    "claude-cli": { "sessionId": "<uuid>", "authEpochVersion": 4 }
  }
}
```

At the time of investigation, the OpenClaw transcript path referenced by that store entry did not exist, so `openclaw sessions cleanup --dry-run --fix-missing` planned to prune the active DM session:

```text
Would prune missing transcripts: 24
prune-missing agent:main:direct:<peer>  8m ago  claude-opus-4-7 system id:<uuid>
```

However, the corresponding Claude CLI transcript under `~/.claude/projects/.../<cliSessionId>.jsonl` did exist and contained assistant messages. Examples from the same period:

```text
<cli session A> records=98 assistant=48
<cli session B> records=46 assistant=19
<cli session C> records=74 assistant=35
<cli session D> records=76 assistant=37
<cli session E> records=76 assistant=36
```

This indicates that OpenClaw sometimes declares `missing-transcript` too early; the Claude session itself is not genuinely absent.

## Likely Root Cause

`claudeCliSessionTranscriptHasContent()` appears to perform a one-shot check for a Claude Code JSONL transcript containing an assistant message:

- It scans `~/.claude/projects/*/<sessionId>.jsonl`.
- If no assistant message is visible at that exact moment, it returns false.
- The caller invalidates the reusable CLI session with `reason=missing-transcript`.

For queued/rapid Telegram turns, this can race Claude Code's transcript flush. The next turn begins before the previous turn's JSONL transcript is visible/populated, so OpenClaw unnecessarily starts a fresh Claude session.
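
The race can be illustrated with a simplified model of the content check. This is a sketch, not the actual `claudeCliSessionTranscriptHasContent()` implementation: the function name is hypothetical, and it assumes each JSONL record is an object with a top-level `type` field whose value is `"assistant"` for assistant turns.

```typescript
// Simplified model of a one-shot transcript content check.
// Assumption: assistant records carry a top-level `type: "assistant"`.
function transcriptHasAssistantMessage(jsonlText: string): boolean {
  for (const line of jsonlText.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const record: unknown = JSON.parse(trimmed);
      if (
        typeof record === "object" &&
        record !== null &&
        (record as { type?: unknown }).type === "assistant"
      ) {
        return true;
      }
    } catch {
      // A partially written trailing line is not valid JSON yet; skip it.
    }
  }
  return false;
}

// Snapshot read before the assistant record is fully written.
const midFlush = '{"type":"user"}\n{"type":"assis';
// The same file a moment later, after the flush completes.
const flushed = '{"type":"user"}\n{"type":"assistant"}\n';

console.log(transcriptHasAssistantMessage(midFlush)); // false
console.log(transcriptHasAssistantMessage(flushed));  // true
```

Under this model, a single read at the wrong moment returns `false` for a session that becomes valid shortly afterward, which matches the observed `missing-transcript` resets.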

## Expected Behavior

Before invalidating a `claude-cli` session as `missing-transcript`, OpenClaw should tolerate transcript flush latency, for example by:

1. retrying the transcript-content check briefly with backoff/jitter;
2. distinguishing "transcript not flushed yet" from "session genuinely unavailable";
3. using OpenClaw transcript history/reseed when invalidating resume;
4. never producing `useResume=false session=none historyPrompt=none` for a continuing DM when the session store has usable history/bindings.
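
One way to express points 2 to 4 is a small decision function that separates "not flushed yet" from "unavailable" and falls back to a reseed before ever starting a bare fresh session. Everything in this sketch is hypothetical: the type names, the `FLUSH_GRACE_MS` value, and the idea of using the time since the previous turn ended as the signal are illustrative assumptions, not OpenClaw behavior.

```typescript
// All names below are hypothetical; none are OpenClaw identifiers.
type TranscriptState = "has-content" | "not-flushed-yet" | "unavailable";
type TurnPlan = "resume" | "wait-and-recheck" | "fresh-with-reseed" | "fresh";

interface TurnContext {
  transcriptHasContent: boolean;     // result of the one-shot content check
  msSincePreviousTurnEnded: number;  // age of the previous turn's completion
  hasOpenClawHistory: boolean;       // OpenClaw transcript usable for reseeding
}

// Assumed grace window during which a missing transcript is treated
// as flush latency rather than a genuinely absent session.
const FLUSH_GRACE_MS = 2000;

function classifyTranscript(ctx: TurnContext): TranscriptState {
  if (ctx.transcriptHasContent) return "has-content";
  return ctx.msSincePreviousTurnEnded < FLUSH_GRACE_MS
    ? "not-flushed-yet"
    : "unavailable";
}

function planTurn(ctx: TurnContext): TurnPlan {
  const state = classifyTranscript(ctx);
  if (state === "has-content") return "resume";
  if (state === "not-flushed-yet") return "wait-and-recheck";
  // Resume is invalidated: reseed from OpenClaw history whenever it exists,
  // so a bare fresh start only happens when there is nothing to reseed from.
  return ctx.hasOpenClawHistory ? "fresh-with-reseed" : "fresh";
}
```

With this shape, the `useResume=false session=none historyPrompt=none` combination corresponds to the `"fresh"` plan, which is reachable only when the grace window has expired and no OpenClaw history is available.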

## Actual Behavior

A continuing Telegram DM sometimes gets:

```text
cli session reset: provider=claude-cli reason=missing-transcript
cli exec ... useResume=false session=none resumeSession=none reuse=invalidated:missing-transcript historyPrompt=none
```

The assistant then behaves as if context was lost.

## Local Mitigation Tested

A local mitigation stopped the immediate issue:

1. Reconstructed the missing OpenClaw direct-session transcript from the matching Claude CLI JSONL transcript.
2. Patched the installed dist helper so `claudeCliSessionTranscriptHasContent()` retries briefly before returning false:
   - 9 attempts
   - 250ms delay between attempts
3. Restarted `openclaw-gateway.service`.
4. Verified the active direct session transcript exists and cleanup no longer prunes it as missing.

This is only a local dist patch and will be overwritten by npm upgrades, but it strongly suggests a small upstream retry/debounce around the transcript check is the right fix.
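
The retry in step 2 has roughly the following shape. This is a minimal sketch rather than the patched dist code: `checkWithRetry` and `RetryOptions` are hypothetical names, and `check` stands in for the existing one-shot transcript check.

```typescript
// Hypothetical retry wrapper around a one-shot boolean check.
interface RetryOptions {
  attempts: number;                       // total checks, including the first
  delayMs: number;                        // pause between consecutive checks
  sleep?: (ms: number) => Promise<void>;  // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function checkWithRetry(
  check: () => boolean | Promise<boolean>,
  options: RetryOptions,
): Promise<boolean> {
  const sleep = options.sleep ?? defaultSleep;
  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    // Succeed as soon as the transcript content becomes visible.
    if (await check()) return true;
    // Do not sleep after the final failed attempt.
    if (attempt < options.attempts) await sleep(options.delayMs);
  }
  return false;
}
```

With the values used locally (`attempts: 9`, `delayMs: 250`), a genuinely missing transcript adds at most 8 × 250 ms = 2 s before the session is invalidated, while the common case returns on the first successful check. An upstream version could replace the fixed delay with backoff and jitter, as suggested under Expected Behavior.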

## Related Issues

Possibly related, but not identical and not specific to this regression:

- #70177 — Telegram DM amnesia with missing backing transcript
- #69973 — claude-cli fallback turn loses prior context despite Claude transcript history
- #76986 — resume chain broken / transcriptPath null

This report is specifically about the `missing-transcript` invalidation race in `2026.5.4` with queued/rapid Telegram DM turns.

