Claude CLI sessions reset on every turn in group channels due to groupIntro drift in extraSystemPromptHash

## Summary

Companion to #64386 — that issue covers `mcpConfigHash` drift on gateway restart. This issue reports the second failure mode of the same function: `extraSystemPromptHash` drifts **on every turn transition in any group-style channel** (Discord channels, Telegram groups, etc.), completely independent of restarts, causing every turn-2 reply to be generated against a fresh `claude -p` with no memory of turn 1.

Affects any deployment using the `claude-cli` backend where the agent replies more than once in a group channel. With default settings this is "all Discord/Telegram group users."

Users experience this as "the agent has amnesia within seconds" — which they often misattribute to model behaviour or config rather than a session-plumbing bug.

## Repro (does not require a restart)

1. Configure any agent on `claude-cli` backend in a Discord guild channel (`channel_type=text`).
2. Mention the agent twice in succession — ~10 seconds apart, within the 20-minute idle window.
3. Watch `~/.openclaw/logs/gateway.log`:

```
[HH:MM:00] cli session reset: provider=claude-cli reason=system-prompt
```

4. Turn 2's reply reads as if it never saw turn 1. Transcript `.jsonl` on disk shows two separate `session_id`s, not one resumed session.

Expected: turn 2 reuses the session and references turn 1.

## Root cause

In `src/auto-reply/reply/route-reply.ts` (bundled as `get-reply-*.js` in releases), `runPreparedReply` assembles `extraSystemPromptParts`:

```ts
const shouldInjectGroupIntro = Boolean(
  isGroupChat && (isFirstTurnInSession || sessionEntry?.groupActivationNeedsSystemIntro)
);
const groupIntro = shouldInjectGroupIntro
  ? buildGroupIntro({ cfg, sessionCtx, sessionEntry, defaultActivation, silentToken: SILENT_REPLY_TOKEN })
  : "";
// ...
const extraSystemPromptParts = [
  buildInboundMetaSystemPrompt(...),
  groupChatContext,
  groupIntro,           // <-- non-empty only on first turn / re-intro
  groupSystemPrompt,
  buildExecOverridePromptHint(...)
].filter(Boolean);
```

`groupIntro` (`src/auto-reply/reply/groups.ts:buildGroupIntro`) emits a ~200–400 character block describing the activation mode ("Activation: always-on (you receive every group message)...", etc.). By design it is present on turn 1 and absent on turn 2+.

That assembled text is joined and hashed in `src/agents/cli-runner/prepare.ts`:

```ts
const extraSystemPrompt = params.extraSystemPrompt?.trim() ?? "";
const extraSystemPromptHash = hashCliSessionText(extraSystemPrompt);
```

Turn 1 hash: `sha256(inboundMeta + groupChatContext + groupIntro + groupSystemPrompt + execHint)`  
Turn 2 hash: `sha256(inboundMeta + groupChatContext + ""                  + groupSystemPrompt + execHint)`

Different bytes → different hash. On turn 2, `resolveCliSessionReuse` (`src/agents/cli-session.ts`) hits:

```ts
if (normalizeOptionalString(binding?.extraSystemPromptHash) !== currentExtraSystemPromptHash)
  return { invalidatedReason: "system-prompt" };
```

→ `runCliWithSession(undefined)` → `claude -p` (fresh) → amnesia.
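
For anyone who wants to see the drift in isolation, here is a minimal sketch. `hashTextSketch` and `assemblePromptSketch` are hypothetical stand-ins for `hashCliSessionText` and the join step. The SHA-256 digest and the `"\n\n"` separator are assumptions, and the prompt strings are placeholders rather than real gateway output.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for hashCliSessionText (assumed here to be SHA-256 hex).
function hashTextSketch(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Hypothetical stand-in for the assembly step: drop empty parts, join, trim.
// The "\n\n" separator is an assumption.
function assemblePromptSketch(parts: string[]): string {
  return parts.filter(Boolean).join("\n\n").trim();
}

// Placeholder prompt parts; only groupIntro differs between the two turns.
const inboundMetaPart = "inbound meta";
const groupChatContextPart = "group chat context";
const groupIntroPart = "Activation: always-on (you receive every group message)";
const groupSystemPromptPart = "group system prompt";
const execHintPart = "exec hint";

const turn1Hash = hashTextSketch(
  assemblePromptSketch([
    inboundMetaPart,
    groupChatContextPart,
    groupIntroPart, // present on turn 1
    groupSystemPromptPart,
    execHintPart,
  ]),
);

const turn2Hash = hashTextSketch(
  assemblePromptSketch([
    inboundMetaPart,
    groupChatContextPart,
    "", // groupIntro is empty on turn 2+
    groupSystemPromptPart,
    execHintPart,
  ]),
);

// The two digests differ, so a strict hash comparison invalidates the session.
console.log(turn1Hash === turn2Hash); // false
```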

## Why this is the wrong invalidation key

This is the same category of bug as #64386, and the architectural mistake underlying both is worth calling out:

The system prompt is **re-sent** to `claude-cli` on every invocation via `--system-prompt` / `--append-system-prompt`. It does not live inside the `--resume` transcript. A different system prompt on turn 2 is not a corruption — it's the normal case (context drift, new tools, different user flags).

By the same token, `mcpConfigHash` (per #64386) hashes `--mcp-config` content that is also re-read per invocation.

So both `extraSystemPromptHash` and `mcpConfigHash`, when used as session-reuse keys, guard against an imagined failure mode (resume-with-stale-env corrupts the transcript) that does not exist in the CLI runtime. They only produce false-positive invalidations.
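
To illustrate the point, here is a rough sketch of how an argument list for one turn might be built. `buildClaudeArgsSketch` and `TurnInputSketch` are hypothetical and are not the gateway's actual code; the sketch only uses the `-p`, `--append-system-prompt`, and `--resume` flags mentioned in this issue. The system prompt is passed on every call, whether or not a session is resumed.

```typescript
// Hypothetical input shape for one turn; not taken from the gateway source.
interface TurnInputSketch {
  prompt: string;
  extraSystemPrompt: string;
  resumeSessionId?: string;
}

// Hypothetical argument builder: the system prompt travels with every
// invocation, independently of whether a stored session is resumed.
function buildClaudeArgsSketch(input: TurnInputSketch): string[] {
  const args: string[] = ["-p", input.prompt];
  if (input.extraSystemPrompt) {
    args.push("--append-system-prompt", input.extraSystemPrompt);
  }
  if (input.resumeSessionId) {
    args.push("--resume", input.resumeSessionId);
  }
  return args;
}

// Turn 2 resumes the stored session and still sends its own (shorter) prompt.
const turn2Args = buildClaudeArgsSketch({
  prompt: "second mention",
  extraSystemPrompt: "group system prompt",
  resumeSessionId: "stored-session-id",
});
console.log(turn2Args.join(" "));
```

Under this model, a changed prompt on turn 2 simply replaces the value passed on that call; nothing in the resumed transcript depends on the turn-1 value.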

The only legitimate invalidation keys are the two that remain: `authProfileId` and `authEpoch` — a genuine auth rotation means the stored `sessionId` likely belongs to a different account and shouldn't be resumed.

## Additional aggravator in the writer

`setCliSessionBinding` (`src/agents/cli-session.ts`) stores optional fields via spread-conditional:

```ts
...normalizeOptionalString(binding.extraSystemPromptHash)
  ? { extraSystemPromptHash: normalizeOptionalString(binding.extraSystemPromptHash) }
  : {}
```

If a binding was first written under a dist that didn't populate the field, **or** by a turn that happened to produce an empty prompt, the stored binding lacks the field. The comparator then treats `undefined !== <hex>` as a mismatch, invalidating every subsequent turn for the lifetime of that binding.

In this environment I observed two live Discord channel bindings in `sessions.json` with exactly this shape:

```jsonc
"cliSessionBindings": {
  "claude-cli": {
    "sessionId": "ab533298-...",
    "mcpConfigHash": "452116cfc1..."
    // no authProfileId, no extraSystemPromptHash
  }
}
```

Every turn on those channels produced `reason=system-prompt` until the fix was applied.
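
The interaction between the writer and the comparator can be reproduced in a few lines. This is a simplified sketch: `BindingSketch`, `normalizeOptionalSketch`, `writeBindingSketch`, and `strictMismatchSketch` are hypothetical reductions of the real functions, assuming `normalizeOptionalString` maps empty or whitespace-only strings to `undefined`.

```typescript
// Hypothetical, reduced binding shape.
interface BindingSketch {
  sessionId: string;
  extraSystemPromptHash?: string;
}

// Assumed behaviour of normalizeOptionalString: blank strings become undefined.
function normalizeOptionalSketch(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Writer: the field is only stored when it normalizes to a non-empty value.
function writeBindingSketch(sessionId: string, hash?: string): BindingSketch {
  const normalized = normalizeOptionalSketch(hash);
  return {
    sessionId,
    ...(normalized ? { extraSystemPromptHash: normalized } : {}),
  };
}

// Strict comparator: a missing stored field never equals a real hash.
function strictMismatchSketch(binding: BindingSketch, currentHash: string): boolean {
  return normalizeOptionalSketch(binding.extraSystemPromptHash) !== currentHash;
}

// A binding written without a hash omits the field entirely...
const legacyBinding = writeBindingSketch("session-a", "");
console.log("extraSystemPromptHash" in legacyBinding); // false

// ...so every later turn with a real hash is treated as a mismatch.
console.log(strictMismatchSketch(legacyBinding, "abc123")); // true
```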

## Evidence from a live gateway

```
2026-04-19T11:38:10 [agent] cli session reset: provider=claude-cli reason=system-prompt  <- turn-2 after fresh session
2026-04-19T11:51:02 [agent] cli session reset: provider=claude-cli reason=system-prompt  <- turn-3 in a separate channel
2026-04-19T13:15:00 [agent] cli session reset: provider=claude-cli reason=mcp              <- #64386 after restart
2026-04-19T18:18:55 [agent] cli session reset: provider=claude-cli reason=mcp              <- #64386 after another restart
```

With the comparator change described below applied locally, zero `reason=system-prompt` events in 24h across five active Discord channels.

## Why tests did not catch it

`src/agents/cli-session.test.ts` tests `resolveCliSessionReuse` with hand-crafted `binding`/`params` pairs, but has no end-to-end test that computes `extraSystemPromptHash` across two consecutive turns in a group context. A test as narrow as:

```ts
it("reuses session across turn-1 → turn-2 in a group channel", async () => {
  const turn1Hash = await runTurnAndReturnHash({ isFirstTurn: true, isGroup: true });
  const turn2Hash = await runTurnAndReturnHash({ isFirstTurn: false, isGroup: true });
  expect(turn2Hash).toBe(turn1Hash); // would fail today
});
```

would have caught this.

## Suggested fix

Two options, in order of preference:

**1. Drop `extraSystemPromptHash` and `mcpConfigHash` from `resolveCliSessionReuse` entirely.** Keep only `authProfileId` and `authEpoch`. This fixes both this issue and #64386 in one patch, and removes a class of future regressions when anyone adds a new part to `extraSystemPromptParts` or merges additional ephemeral state into `mergedConfig`.

**2. If the hashes must be retained** for some reason not yet documented, normalize the hashed inputs to strip turn-variable content:
   - For `extraSystemPromptHash`: hash only `buildInboundMetaSystemPrompt` + `groupSystemPrompt`. Explicitly exclude `groupIntro`, `groupChatContext` (varies with member list), and `buildExecOverridePromptHint` (varies with per-message elevation state).
   - For `mcpConfigHash`: compute on the user-authored `mergedConfig` **before** `additionalConfig` merges the loopback entry, per #64386.
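
A minimal sketch of the `extraSystemPromptHash` half of option 2, assuming the prompt parts are available as separate strings at hash time. `PromptPartsSketch` and `stablePromptHashSketch` are hypothetical names, and the SHA-256 digest and `"\n\n"` separator are assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical container for the individual prompt parts.
interface PromptPartsSketch {
  inboundMeta: string;
  groupChatContext: string;
  groupIntro: string;
  groupSystemPrompt: string;
  execHint: string;
}

// Hash only the parts expected to stay stable across turns; groupIntro,
// groupChatContext and execHint are deliberately left out of the digest.
function stablePromptHashSketch(parts: PromptPartsSketch): string {
  const stableText = [parts.inboundMeta, parts.groupSystemPrompt]
    .filter(Boolean)
    .join("\n\n");
  return createHash("sha256").update(stableText).digest("hex");
}

const turn1Parts: PromptPartsSketch = {
  inboundMeta: "inbound meta",
  groupChatContext: "members: a, b",
  groupIntro: "Activation: always-on",
  groupSystemPrompt: "group system prompt",
  execHint: "",
};

// Turn 2: no intro, and the member list and elevation hint have changed.
const turn2Parts: PromptPartsSketch = {
  ...turn1Parts,
  groupChatContext: "members: a, b, c",
  groupIntro: "",
  execHint: "elevated",
};

console.log(stablePromptHashSketch(turn1Parts) === stablePromptHashSketch(turn2Parts)); // true
```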

Either way, make the comparator tolerant: if `binding[field]` is undefined (legacy binding), skip that axis rather than invalidating.
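
A rough sketch of what option (1) combined with a tolerant comparator could look like. The types, `resolveReuseSketch`, and the `"auth-profile"` / `"auth-epoch"` reason strings are hypothetical, `authEpoch` is assumed to be a string, and this is not the actual patch.

```typescript
// Hypothetical, reduced shapes; the real binding and params carry more fields.
interface AuthBindingSketch {
  sessionId: string;
  authProfileId?: string;
  authEpoch?: string;
}

interface CurrentAuthSketch {
  authProfileId?: string;
  authEpoch?: string;
}

interface ReuseDecisionSketch {
  sessionId?: string;
  invalidatedReason?: string;
}

// Tolerant axis check: a field missing from a legacy binding is skipped
// instead of being counted as a mismatch.
function axisMismatchSketch(stored: string | undefined, current: string | undefined): boolean {
  if (stored === undefined) return false;
  return stored !== current;
}

// Only the auth axes can invalidate; prompt and MCP hashes are not consulted.
function resolveReuseSketch(
  binding: AuthBindingSketch | undefined,
  current: CurrentAuthSketch,
): ReuseDecisionSketch {
  if (!binding) return {};
  if (axisMismatchSketch(binding.authProfileId, current.authProfileId)) {
    return { invalidatedReason: "auth-profile" };
  }
  if (axisMismatchSketch(binding.authEpoch, current.authEpoch)) {
    return { invalidatedReason: "auth-epoch" };
  }
  return { sessionId: binding.sessionId };
}

// A legacy binding with no auth fields is reused rather than invalidated.
const legacyDecision = resolveReuseSketch(
  { sessionId: "s1" },
  { authProfileId: "p1", authEpoch: "1" },
);
console.log(legacyDecision.sessionId); // "s1"

// A genuine profile change still invalidates.
const rotatedDecision = resolveReuseSketch(
  { sessionId: "s2", authProfileId: "p1" },
  { authProfileId: "p2" },
);
console.log(rotatedDecision.invalidatedReason); // "auth-profile"
```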

## Workaround in use

The local patch replaces the body of `resolveCliSessionReuse` with option (1) plus a tolerant auth comparison. It is applied via a gateway-launcher hook (`~/.openclaw/bin/apply-hermes-dist-patches.sh`) that re-applies the patch after auto-updates and, if an upstream refactor stops the patch from applying, fails loudly in a log rather than silently.

Happy to open a PR against `src/agents/cli-session.ts` with the comparator change + a regression test covering the turn-1 → turn-2 group-channel scenario if maintainers would accept it.

## Environment

- `openclaw` 2026.4.15 (install via `npm i -g openclaw`)
- `claude-cli` backend, OAuth auth profile (Claude Pro)
- macOS 14, Node 22, launchd-managed gateway on port 18789
- Discord channels (guild text, not DM)

