Summary
Companion to #64386 — that issue covers mcpConfigHash drift on gateway restart. This issue reports the second failure mode of the same function: extraSystemPromptHash drifts on every turn transition in any group-style channel (Discord channels, Telegram groups, etc.), completely independent of restarts, causing every turn-2 reply to be generated against a fresh claude -p with no memory of turn 1.
Affects any deployment using the claude-cli backend where the agent replies more than once in a group channel. With default settings this is "all Discord/Telegram group users."
Users experience this as "the agent has amnesia within seconds" — which they often misattribute to model behaviour or config rather than a session-plumbing bug.
Repro (does not require a restart)
- Configure any agent on the claude-cli backend in a Discord guild channel (channel_type=text).
- Mention the agent twice in succession, ~10 seconds apart, within the 20-minute idle window.
- Watch ~/.openclaw/logs/gateway.log:
  ```
  [HH:MM:00] cli session reset: provider=claude-cli reason=system-prompt
  ```
- Turn 2's reply reads as if it never saw turn 1. The transcript .jsonl on disk shows two separate session_ids, not one resumed session.
Expected: turn 2 reuses the session and references turn 1.
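To check the repro without scanning gateway.log by eye, the reset events can be tallied by provider and reason. This is a minimal sketch that assumes the log line format quoted in this issue (`cli session reset: provider=<p> reason=<r>`); `countResetReasons` is a hypothetical helper, not part of openclaw.

```typescript
// Hypothetical helper: tallies "cli session reset" events by provider and reason.
// Assumes the gateway.log line format quoted in this issue.
const RESET_LINE = /cli session reset: provider=(\S+) reason=(\S+)/;

function countResetReasons(logText: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of logText.split("\n")) {
    const match = RESET_LINE.exec(line);
    if (!match) continue; // not a reset event
    const key = `${match[1]} ${match[2]}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Sample lines copied from the evidence section of this issue.
const sample = [
  "2026-04-19T11:38:10 [agent] cli session reset: provider=claude-cli reason=system-prompt",
  "2026-04-19T11:51:02 [agent] cli session reset: provider=claude-cli reason=system-prompt",
  "2026-04-19T13:15:00 [agent] cli session reset: provider=claude-cli reason=mcp",
  "2026-04-19T18:18:55 [agent] cli session reset: provider=claude-cli reason=mcp",
].join("\n");

const counts = countResetReasons(sample);
console.log(counts);
```

Against a real gateway, the text would come from `readFileSync(path, "utf8")` (node:fs) on the log file instead of the inline sample.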
Root cause
In src/auto-reply/reply/route-reply.ts (bundled as get-reply-*.js in releases), runPreparedReply assembles extraSystemPromptParts:
```ts
const shouldInjectGroupIntro = Boolean(
  isGroupChat && (isFirstTurnInSession || sessionEntry?.groupActivationNeedsSystemIntro)
);
const groupIntro = shouldInjectGroupIntro
  ? buildGroupIntro({ cfg, sessionCtx, sessionEntry, defaultActivation, silentToken: SILENT_REPLY_TOKEN })
  : "";
// ...
const extraSystemPromptParts = [
  buildInboundMetaSystemPrompt(...),
  groupChatContext,
  groupIntro, // <-- non-empty only on first turn / re-intro
  groupSystemPrompt,
  buildExecOverridePromptHint(...)
].filter(Boolean);
```
groupIntro (src/auto-reply/reply/groups.ts:buildGroupIntro) emits a ~200–400 char block describing the activation mode ("Activation: always-on (you receive every group message)...", etc.). By design it is present on turn 1 and absent on turn 2+.
That assembled text is joined and hashed in src/agents/cli-runner/prepare.ts:
```ts
const extraSystemPrompt = params.extraSystemPrompt?.trim() ?? "";
const extraSystemPromptHash = hashCliSessionText(extraSystemPrompt);
```
```
Turn 1 hash: sha256(inboundMeta + groupChatContext + groupIntro + groupSystemPrompt + execHint)
Turn 2 hash: sha256(inboundMeta + groupChatContext + "" + groupSystemPrompt + execHint)
```
Different bytes → different hash. On turn 2, resolveCliSessionReuse (src/agents/cli-session.ts) hits:
```ts
if (normalizeOptionalString(binding?.extraSystemPromptHash) !== currentExtraSystemPromptHash)
  return { invalidatedReason: "system-prompt" };
```
→ runCliWithSession(undefined) → claude -p (fresh) → amnesia.
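The drift can be reproduced in isolation. This sketch assumes hashCliSessionText is a sha256 hex digest (matching the sha256(...) notation above) and that the parts are joined with a blank line; the separator and the placeholder part strings are assumptions, not the real builders' output.

```typescript
import { createHash } from "node:crypto";

// Assumption: hashCliSessionText is a plain sha256 hex digest of the text.
function hashCliSessionText(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Assumption: parts are joined with a blank line; the real separator may differ.
// filter(Boolean) drops empty parts, as in runPreparedReply.
function buildExtraSystemPrompt(parts: Array<string | undefined>): string {
  return parts.filter(Boolean).join("\n\n").trim();
}

// Placeholder strings standing in for the real builder output.
const inboundMeta = "Inbound: discord guild text channel";
const groupChatContext = "Group: #general";
const groupIntro = "Activation: always-on (you receive every group message)";
const groupSystemPrompt = "Be concise.";
const execHint = "";

// Turn 1 includes groupIntro; turn 2 passes "" in its place.
const turn1 = hashCliSessionText(
  buildExtraSystemPrompt([inboundMeta, groupChatContext, groupIntro, groupSystemPrompt, execHint]),
);
const turn2 = hashCliSessionText(
  buildExtraSystemPrompt([inboundMeta, groupChatContext, "", groupSystemPrompt, execHint]),
);

// The binding stored on turn 1 holds turn1; turn 2 compares against turn2.
const invalidated = turn1 !== turn2;
console.log({ turn1: turn1.slice(0, 12), turn2: turn2.slice(0, 12), invalidated });
```

Nothing about the conversation changed between the two calls; only the presence of the intro block did, and that alone is enough to flip the comparator.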
Why this is the wrong invalidation key
This is the same category of bug as #64386 and the underlying architectural mistake is worth calling out together:
The system prompt is re-sent to claude-cli on every invocation via --system-prompt / --append-system-prompt. It does not live inside the --resume transcript. A different system prompt on turn 2 is not a corruption — it's the normal case (context drift, new tools, different user flags).
By the same token, mcpConfigHash (per #64386) hashes --mcp-config content that is also re-read per invocation.
So both extraSystemPromptHash and mcpConfigHash as session-reuse keys optimize for an imagined failure mode (resume-with-stale-env corrupts the transcript) that does not exist in the CLI runtime. They only produce false-positive invalidations.
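To make the point concrete: both the system prompt and the MCP config are arguments to each claude -p call, while --resume only selects the stored conversation. The sketch below assumes an argv built from the flags named above (-p, --resume, --append-system-prompt, --mcp-config); `buildClaudeArgs` and the exact argument order openclaw uses are hypothetical.

```typescript
// Hypothetical per-invocation argv for `claude -p`. The flags are the ones cited in
// this issue; the argument order and this helper are assumptions.
interface Invocation {
  resumeSessionId?: string;
  appendSystemPrompt: string;
  mcpConfigPath?: string;
  userMessage: string;
}

function buildClaudeArgs(inv: Invocation): string[] {
  const args = ["-p"];
  // --resume only picks which stored transcript to continue.
  if (inv.resumeSessionId) args.push("--resume", inv.resumeSessionId);
  // The system prompt is passed fresh on every call, so turn 2's prompt applies
  // even when the turn-1 transcript is resumed.
  if (inv.appendSystemPrompt) args.push("--append-system-prompt", inv.appendSystemPrompt);
  // The MCP config file is likewise read per invocation.
  if (inv.mcpConfigPath) args.push("--mcp-config", inv.mcpConfigPath);
  args.push(inv.userMessage);
  return args;
}

const turn2Args = buildClaudeArgs({
  resumeSessionId: "session-from-turn-1",
  appendSystemPrompt: "Inbound: discord guild text channel\n\nBe concise.", // no groupIntro on turn 2
  userMessage: "second mention",
});
console.log(turn2Args);
```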
The only legitimate invalidation keys are the two that remain: authProfileId and authEpoch — a genuine auth rotation means the stored sessionId likely belongs to a different account and shouldn't be resumed.
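A reuse check keyed only on auth identity might look like the sketch below. The field names follow this issue; the types, the `resolveReuseAuthOnly` name, and the reason strings are hypothetical, and this is not the current resolveCliSessionReuse.

```typescript
// Hypothetical shapes: field names follow this issue, the types are assumptions.
interface CliSessionBinding {
  sessionId: string;
  authProfileId?: string;
  authEpoch?: number;
  extraSystemPromptHash?: string;
  mcpConfigHash?: string;
}

interface CurrentAuth {
  authProfileId?: string;
  authEpoch?: number;
}

type ReuseDecision =
  | { kind: "reuse"; sessionId: string }
  | { kind: "invalidate"; reason: "auth-profile" | "auth-epoch" };

// Only a change of auth identity blocks --resume. The prompt and MCP hashes may
// still be stored on the binding, but they are intentionally not compared.
function resolveReuseAuthOnly(binding: CliSessionBinding, current: CurrentAuth): ReuseDecision {
  if (binding.authProfileId !== current.authProfileId) {
    return { kind: "invalidate", reason: "auth-profile" };
  }
  if (binding.authEpoch !== current.authEpoch) {
    return { kind: "invalidate", reason: "auth-epoch" };
  }
  return { kind: "reuse", sessionId: binding.sessionId };
}

const binding: CliSessionBinding = {
  sessionId: "session-1",
  authProfileId: "claude-pro",
  authEpoch: 1,
  extraSystemPromptHash: "hash-from-turn-1", // stale on turn 2, but ignored
};

console.log(resolveReuseAuthOnly(binding, { authProfileId: "claude-pro", authEpoch: 1 }));
console.log(resolveReuseAuthOnly(binding, { authProfileId: "claude-pro", authEpoch: 2 }));
```

The first call reuses session-1 despite the stale prompt hash; the second is invalidated because the auth epoch rotated.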
Additional aggravator in the writer
setCliSessionBinding (src/agents/cli-session.ts) stores optional fields via spread-conditional:
```ts
...normalizeOptionalString(binding.extraSystemPromptHash)
  ? { extraSystemPromptHash: normalizeOptionalString(binding.extraSystemPromptHash) }
  : {}
```
If a binding was first written under a dist that didn't populate the field, or by a turn that happened to produce an empty prompt, the stored binding lacks the field. The comparator then treats undefined !== <hex> as a mismatch, invalidating every subsequent turn for the lifetime of that binding.
In this environment I observed two live Discord channel bindings in sessions.json with exactly this shape:
```jsonc
"cliSessionBindings": {
  "claude-cli": {
    "sessionId": "ab533298-...",
    "mcpConfigHash": "452116cfc1..."
    // no authProfileId, no extraSystemPromptHash
  }
}
```
Every turn on those channels produced reason=system-prompt until the fix was applied.
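The interaction between the spread-conditional writer and the comparator can be shown in a few lines. This is a sketch: `normalizeOptionalString` here is a hypothetical stand-in (trim, empty becomes undefined) whose real behaviour is assumed, and `writeBinding`, `strictMismatch`, and `tolerantMismatch` are illustrative names.

```typescript
// Hypothetical stand-in for normalizeOptionalString: trims, and maps empty to undefined.
function normalizeOptionalString(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

interface StoredBinding {
  sessionId: string;
  extraSystemPromptHash?: string;
}

// Writer pattern described above: the field is only written when it normalizes to a value.
function writeBinding(sessionId: string, extraSystemPromptHash?: string): StoredBinding {
  const normalized = normalizeOptionalString(extraSystemPromptHash);
  return { sessionId, ...(normalized ? { extraSystemPromptHash: normalized } : {}) };
}

// Current behaviour: undefined !== "<hex>" is a mismatch, on every turn.
function strictMismatch(stored: string | undefined, current: string): boolean {
  return normalizeOptionalString(stored) !== current;
}

// Proposed behaviour: a legacy binding without the field skips this axis.
function tolerantMismatch(stored: string | undefined, current: string): boolean {
  const normalized = normalizeOptionalString(stored);
  return normalized !== undefined && normalized !== current;
}

const legacy = writeBinding("session-1", undefined); // field absent, like the sessions.json entries above
const currentHash = "9f2c"; // placeholder for a real sha256 hex digest
console.log(strictMismatch(legacy.extraSystemPromptHash, currentHash)); // true
console.log(tolerantMismatch(legacy.extraSystemPromptHash, currentHash)); // false
```

Tolerance only fixes the legacy-binding case; a binding that does carry a turn-1 hash still mismatches on turn 2 unless the hashed inputs change or the axis is dropped.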
Evidence from a live gateway
```
2026-04-19T11:38:10 [agent] cli session reset: provider=claude-cli reason=system-prompt <- turn-2 after fresh session
2026-04-19T11:51:02 [agent] cli session reset: provider=claude-cli reason=system-prompt <- turn-3 in a separate channel
2026-04-19T13:15:00 [agent] cli session reset: provider=claude-cli reason=mcp <- #64386 after restart
2026-04-19T18:18:55 [agent] cli session reset: provider=claude-cli reason=mcp <- #64386 after another restart
```
With the comparator change described below applied locally, there were zero reason=system-prompt events in 24h across five active Discord channels.
Why tests did not catch it
src/agents/cli-session.test.ts tests resolveCliSessionReuse with hand-crafted binding/params pairs, but has no end-to-end test that computes extraSystemPromptHash across two consecutive turn transitions in a group context. A test as narrow as:
```ts
it("reuses session across turn-1 → turn-2 in a group channel", async () => {
  const turn1Hash = await runTurnAndReturnHash({ isFirstTurn: true, isGroup: true });
  const turn2Hash = await runTurnAndReturnHash({ isFirstTurn: false, isGroup: true });
  expect(turn2Hash).toBe(turn1Hash); // would fail today
});
```
would have caught this.
Suggested fix
Two options, in order of preference:
1. Drop extraSystemPromptHash and mcpConfigHash from resolveCliSessionReuse entirely. Keep only authProfileId and authEpoch. This fixes both this issue and #64386 in one patch, and removes a class of future regressions when anyone adds a new part to extraSystemPromptParts or merges additional ephemeral state into mergedConfig.
2. If the hashes must be retained for some reason not yet documented, normalize the hashed inputs to strip turn-variable content:
   - extraSystemPromptHash: hash only buildInboundMetaSystemPrompt + groupSystemPrompt. Explicitly exclude groupIntro, groupChatContext (varies with member list), and buildExecOverridePromptHint (varies with per-message elevation state).
   - mcpConfigHash: compute on the user-authored mergedConfig before additionalConfig merges the loopback entry, per #64386 ("Claude CLI sessions reset on every gateway restart due to ephemeral loopback port in mcpConfigHash").

Either way, make the comparator tolerant: if binding[field] is undefined (legacy binding), skip that axis rather than invalidating.
Workaround in use
Local patch replaces the resolveCliSessionReuse body with option (1) + tolerant auth comparison. It is applied via a gateway-launcher hook (~/.openclaw/bin/apply-hermes-dist-patches.sh) that re-applies the patch after auto-updates and detects upstream refactors, so it fails loudly in a log rather than silently.
Happy to open a PR against src/agents/cli-session.ts with the comparator change + a regression test covering the turn-1 → turn-2 group-channel scenario if maintainers would accept it.
Environment
- openclaw 2026.4.15 (installed via npm i -g openclaw)
- claude-cli backend, OAuth auth profile (Claude Pro)
- macOS 14, Node 22, launchd-managed gateway on port 18789
- Discord channels (guild text, not DM)