Bug Description
Telegram/DM sessions backed by claude-cli can intermittently lose conversation context because OpenClaw invalidates an otherwise valid Claude CLI session as missing-transcript before Claude Code has finished flushing the previous turn's transcript to disk.
In practice this looks like "every other Telegram message forgets context": one turn resumes correctly, then a queued/rapid follow-up turn starts fresh with no resume and no OpenClaw history reseed.
Impact
High / user-visible data-loss behavior:
- Active Telegram DM loses conversational continuity mid-session.
- Gateway logs show a fresh Claude session being started (useResume=false, session=none) even though the prior Claude CLI transcript exists shortly afterward and contains assistant messages.
- If the OpenClaw session transcript file is also missing/stale, fallback reseeding is unavailable (historyPrompt=none), compounding the amnesia.
Environment
- OpenClaw: 2026.5.4 (325df3e)
- Service: openclaw-gateway.service
- Channel: Telegram DM
- Agent runtime/provider: claude-cli
- Model: claude-opus-4-7
- Platform: Linux/systemd user service
- Claude CLI invocation includes stream-json/replay-user-messages/live-session path
Evidence / Logs
Representative sanitized log sequence:
[agent/cli-backend] cli exec: provider=claude-cli model=opus promptChars=640 trigger=user useResume=true session=present resumeSession=<fp> reuse=reusable historyPrompt=none
[telegram] sendMessage ok chat=<redacted> message=15992
[agent/cli-backend] cli session reset: provider=claude-cli reason=missing-transcript
[agent/cli-backend] cli exec: provider=claude-cli model=opus promptChars=420 trigger=user useResume=false session=none resumeSession=none reuse=invalidated:missing-transcript historyPrompt=none
[telegram] sendMessage ok chat=<redacted> message=15993
[agent/cli-backend] cli exec: provider=claude-cli model=opus promptChars=398 trigger=user useResume=true session=present resumeSession=<fp> reuse=reusable historyPrompt=none
[telegram] sendMessage ok chat=<redacted> message=15994
Observed multiple times in one session, e.g. later:
[agent/cli-backend] cli session reset: provider=claude-cli reason=missing-transcript
[agent/cli-backend] cli exec: provider=claude-cli model=opus promptChars=417 trigger=user useResume=false session=none resumeSession=none reuse=invalidated:missing-transcript historyPrompt=none
and:
[agent/cli-backend] cli session reset: provider=claude-cli reason=missing-transcript
[agent/cli-backend] cli exec: provider=claude-cli model=opus promptChars=440 trigger=user useResume=false session=none resumeSession=none reuse=invalidated:missing-transcript historyPrompt=none
The gateway also reported queued/stuck session state during the same window:
[diagnostic] stuck session: sessionId=unknown sessionKey=agent:main:direct:<peer> state=processing age=123s queueDepth=1 reason=queued_work_without_active_run classification=stale_session_state recovery=checking
[diagnostic] stuck session recovery skipped: reason=active_reply_work action=keep_lane sessionId=<uuid> sessionKey=agent:main:direct:<peer> age=123s queueDepth=1 activeSessionId=<uuid>
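The affected turns can be picked out of a gateway log mechanically. The following is a minimal sketch, assuming the `cli exec:` lines keep the space-separated key=value layout shown above; the helper names (`parseCliExecLine`, `isAmnesicTurn`, `countAmnesicTurns`) are hypothetical and not part of OpenClaw.

```typescript
// Hypothetical helpers for triaging gateway logs; not OpenClaw APIs.
type CliExecFields = Record<string, string>;

// Extract the key=value fields that follow "cli exec:" on a log line.
// Returns null for lines that are not "cli exec" lines.
function parseCliExecLine(line: string): CliExecFields | null {
  const marker = "cli exec:";
  const idx = line.indexOf(marker);
  if (idx === -1) return null;
  const fields: CliExecFields = {};
  for (const token of line.slice(idx + marker.length).trim().split(/\s+/)) {
    const eq = token.indexOf("=");
    // Split on the first "=" only, so values such as
    // "invalidated:missing-transcript" are kept whole.
    if (eq > 0) fields[token.slice(0, eq)] = token.slice(eq + 1);
  }
  return fields;
}

// A turn loses context when it neither resumes the CLI session
// nor receives a reseeded history prompt.
function isAmnesicTurn(fields: CliExecFields): boolean {
  return fields["useResume"] === "false" && fields["historyPrompt"] === "none";
}

// Count the context-losing turns in a block of log text.
function countAmnesicTurns(logText: string): number {
  let count = 0;
  for (const line of logText.split("\n")) {
    const fields = parseCliExecLine(line);
    if (fields !== null && isAmnesicTurn(fields)) count += 1;
  }
  return count;
}
```

Running `countAmnesicTurns` over a log window gives a quick measure of how often a continuing DM was started without resume or reseed; turns with useResume=true and historyPrompt=none are not counted because the resumed CLI session carries the context.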
Verification
The session store had a direct DM entry with a Claude CLI binding:
{
  "sessionKey": "agent:main:direct:<peer>",
  "modelProvider": "claude-cli",
  "cliSessionIds": { "claude-cli": "<uuid>" },
  "cliSessionBindings": {
    "claude-cli": { "sessionId": "<uuid>", "authEpochVersion": 4 }
  }
}
At the time of investigation, the OpenClaw transcript path referenced by that store entry did not exist, so openclaw sessions cleanup --dry-run --fix-missing planned to prune the active DM session:
Would prune missing transcripts: 24
prune-missing agent:main:direct:<peer> 8m ago claude-opus-4-7 system id:<uuid>
However, the corresponding Claude CLI transcript under ~/.claude/projects/.../<cliSessionId>.jsonl did exist and contained assistant messages. Examples from the same period:
<cli session A> records=98 assistant=48
<cli session B> records=46 assistant=19
<cli session C> records=74 assistant=35
<cli session D> records=76 assistant=37
<cli session E> records=76 assistant=36
This indicates OpenClaw is sometimes declaring missing-transcript too early, rather than the Claude session being genuinely absent.
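Counts like the records/assistant figures above can be reproduced with a small JSONL scan. This is a sketch only: it assumes each transcript line is a JSON object whose `type` field is "assistant" for assistant turns, which matches the transcripts inspected here but is not a documented contract. The function name is hypothetical.

```typescript
interface TranscriptCounts {
  records: number;   // lines that parsed as JSON
  assistant: number; // parsed records whose type is "assistant" (assumed field)
}

// Count records and assistant messages in the text of a JSONL transcript.
// Lines that fail to parse are skipped, since a transcript that is still
// being flushed may end in a partially written line.
function countTranscriptRecords(jsonl: string): TranscriptCounts {
  const counts: TranscriptCounts = { records: 0, assistant: 0 };
  for (const raw of jsonl.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // truncated or partially flushed line
    }
    counts.records += 1;
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as { type?: unknown }).type === "assistant"
    ) {
      counts.assistant += 1;
    }
  }
  return counts;
}
```

The function takes the file contents as a string so it stays independent of how the transcript is read. A result with `assistant > 0` is the condition the one-shot check appears to look for.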
Likely Root Cause
claudeCliSessionTranscriptHasContent() appears to perform a one-shot check for a Claude Code JSONL transcript containing an assistant message:
- It scans ~/.claude/projects/*/<sessionId>.jsonl.
- If no assistant message is visible at that exact moment, it returns false.
- The caller invalidates the reusable CLI session with reason=missing-transcript.
For queued/rapid Telegram turns, this can race Claude Code's transcript flush. The next turn begins before the previous turn's JSONL transcript is visible/populated, so OpenClaw unnecessarily starts a fresh Claude session.
Expected Behavior
Before invalidating a claude-cli session as missing-transcript, OpenClaw should tolerate transcript flush latency, for example by:
- retrying the transcript-content check briefly with backoff/jitter;
- distinguishing "transcript not flushed yet" from "session genuinely unavailable";
- using OpenClaw transcript history/reseed when invalidating resume;
- never producing useResume=false session=none historyPrompt=none for a continuing DM when the session store has usable history/bindings.
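The decision described by the points above could be sketched as a pure function. Everything here is hypothetical and for illustration only: the `TurnInputs` fields, the `TurnPlan` variants, and the idea of using the time since the previous turn ended as the signal for "not flushed yet" are assumptions, not OpenClaw's actual data model.

```typescript
// Hypothetical inputs; field names are illustrative, not OpenClaw's.
interface TurnInputs {
  boundSessionId: string | null;           // CLI session bound in the session store
  transcriptHasContent: boolean;           // result of the transcript-content check
  msSincePreviousTurnEnded: number | null; // null when no previous turn is known
  flushGraceMs: number;                    // how long a flush is allowed to take
  openclawHistory: string | null;          // reseed text from OpenClaw's own transcript
}

type TurnPlan =
  | { kind: "resume"; sessionId: string }
  | { kind: "wait-and-recheck"; sessionId: string }
  | { kind: "fresh-with-history"; historyPrompt: string }
  | { kind: "fresh" };

function planTurn(input: TurnInputs): TurnPlan {
  const { boundSessionId, transcriptHasContent, msSincePreviousTurnEnded } = input;
  if (boundSessionId !== null) {
    if (transcriptHasContent) {
      return { kind: "resume", sessionId: boundSessionId };
    }
    // An empty transcript right after a turn ended is more likely a pending
    // flush than a lost session, so recheck instead of invalidating.
    if (msSincePreviousTurnEnded !== null && msSincePreviousTurnEnded < input.flushGraceMs) {
      return { kind: "wait-and-recheck", sessionId: boundSessionId };
    }
  }
  // Resume is unavailable: reseed from OpenClaw history when there is any.
  if (input.openclawHistory !== null && input.openclawHistory.length > 0) {
    return { kind: "fresh-with-history", historyPrompt: input.openclawHistory };
  }
  return { kind: "fresh" };
}
```

With this shape, the plain "fresh" outcome (the equivalent of useResume=false session=none historyPrompt=none) is reachable only when there is neither a resumable session nor any stored history.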
Actual Behavior
A continuing Telegram DM sometimes gets:
cli session reset: provider=claude-cli reason=missing-transcript
cli exec ... useResume=false session=none resumeSession=none reuse=invalidated:missing-transcript historyPrompt=none
The assistant then behaves as if context was lost.
Local Mitigation Tested
A local mitigation stopped the immediate issue:
- Reconstructed the missing OpenClaw direct-session transcript from the matching Claude CLI JSONL transcript.
- Patched the installed dist helper so claudeCliSessionTranscriptHasContent() retries briefly before returning false:
  - 9 attempts
  - 250ms delay between attempts
- Restarted openclaw-gateway.service.
- Verified the active direct session transcript exists and cleanup no longer prunes it as missing.
This is only a local dist patch and will be overwritten by npm upgrades, but it strongly suggests a small upstream retry/debounce around the transcript check is the right fix.
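For reference, a retry of this shape can be sketched as a generic wrapper. This is not the actual dist patch: `retryUntilTrue`, its options, and the injectable `sleep` are hypothetical names, and the real check is passed in as a callback because its signature is not shown in this report.

```typescript
interface RetryOptions {
  attempts: number;                       // total number of checks, e.g. 9
  delayMs: number;                        // pause between checks, e.g. 250
  sleep?: (ms: number) => Promise<void>;  // injectable for tests (hypothetical)
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-run `check` until it returns true or the attempts are exhausted.
// The delay is applied only between attempts, never after the last one.
async function retryUntilTrue(
  check: () => boolean | Promise<boolean>,
  options: RetryOptions,
): Promise<boolean> {
  const sleep = options.sleep ?? defaultSleep;
  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    if (await check()) return true;
    if (attempt < options.attempts) await sleep(options.delayMs);
  }
  return false;
}
```

With 9 attempts and a 250 ms delay applied only between attempts, the worst case adds 8 × 250 ms = 2 s of waiting (plus the time of the checks themselves) before a session is declared missing-transcript, while a transcript that is already flushed returns on the first check with no delay.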
Related Issues
Possibly related but not identical/regression-specific:
This report is specifically about the missing-transcript invalidation race in 2026.5.4 with queued/rapid Telegram DM turns.