fix: stabilize claude-cli extraSystemPromptHash across group turns (#69118) #91157

Open: sumitaich1998 wants to merge 1 commit into openclaw:main from sumitaich1998:fix/issue-69118-cli-groupintro-hash

Conversation

@sumitaich1998

Summary

On the claude-cli backend, agent sessions reset on every turn transition in
group-style channels (Discord channels, Telegram groups, etc.). Turn 2 is generated
against a fresh claude -p with no memory of turn 1, so the agent appears to have
"amnesia within seconds." This is independent of gateway restarts and is the companion
to #64386 (which covered mcpConfigHash drift on restart).

Root cause

claude-cli session reuse is gated by a fingerprint (extraSystemPromptHash) computed
in src/agents/cli-runner/prepare.ts over the static extra system prompt
(extraSystemPromptStatic). When the stored hash differs from the freshly computed one,
resolveCliSessionReuse() (src/agents/cli-session.ts) returns
invalidatedReason: "system-prompt", logged as:

cli session reset: provider=claude-cli reason=system-prompt
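
The gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/agents/cli-session.ts: the names hashStaticPrompt and resolveReuseSketch, the SHA-256 digest, and the result shape are assumptions made for the example. Only the "system-prompt" reason string is taken from the description above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical result shape; the real resolveCliSessionReuse() may return more fields.
type ReuseDecision =
  | { reuse: true }
  | { reuse: false; invalidatedReason: "system-prompt" };

// Assumption: a SHA-256 hex digest stands in for whatever digest prepare.ts computes.
function hashStaticPrompt(staticPrompt: string): string {
  return createHash("sha256").update(staticPrompt, "utf8").digest("hex");
}

// Reuse the stored CLI session only when the fingerprint is unchanged.
function resolveReuseSketch(storedHash: string, currentHash: string): ReuseDecision {
  if (storedHash !== currentHash) {
    return { reuse: false, invalidatedReason: "system-prompt" };
  }
  return { reuse: true };
}

// Placeholder prompt strings, not the real prompt text.
const storedHash = hashStaticPrompt("GROUP-CHAT-CONTEXT\n\nGROUP-INTRO");
const sameInput = resolveReuseSketch(storedHash, hashStaticPrompt("GROUP-CHAT-CONTEXT\n\nGROUP-INTRO"));
const driftedInput = resolveReuseSketch(storedHash, hashStaticPrompt("GROUP-CHAT-CONTEXT"));
console.log(sameInput, driftedInput);
```

Any change to the hashed bytes, however small, flips the decision, which is why the hash input has to be limited to text that is stable across turns.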

In src/auto-reply/reply/get-reply-run.ts runPreparedReply(), the hash input
extraSystemPromptStaticParts included groupIntro. But groupIntro is intentionally
first-turn-only:

const shouldInjectGroupIntro = Boolean(
  isGroupChat && (isFirstTurnInSession || sessionEntry?.groupActivationNeedsSystemIntro),
);

So the intro is present in the hash input on turn 1 and absent on turn 2 → the fingerprint
drifts between turns → claude-cli resets the session on every group turn.

The fix

Exclude the volatile, first-turn-only groupIntro from extraSystemPromptStaticParts
(the session-reuse hash input), while still injecting groupIntro into the live prompt
(extraSystemPromptParts) when appropriate. The static fingerprint is now identical across
turn 1 (intro present in the prompt) and turn 2 (intro absent), so the session is reused and
memory is preserved — consistent with how #64386 stabilized mcpConfigHash.

  • src/auto-reply/reply/get-reply-run.ts: drop groupIntro from extraSystemPromptStaticParts;
    the live extraSystemPromptParts still includes it. Added an inline comment documenting the invariant.

This also removes spurious resets on legitimate re-intros (groupActivationNeedsSystemIntro):
the re-intro text is still injected into the prompt but no longer forces a session reset.
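
A minimal sketch of the split described above, assuming the parts are joined with a blank line and empty parts are dropped. composePromptParts, TurnState, PromptParts, and the placeholder strings are hypothetical and only mirror the flags quoted earlier; the actual code in get-reply-run.ts composes more parts than shown here.

```typescript
// Hypothetical per-turn inputs; field names mirror the flags quoted in the root cause.
interface TurnState {
  isGroupChat: boolean;
  isFirstTurnInSession: boolean;
  groupActivationNeedsSystemIntro?: boolean;
}

interface PromptParts {
  live: string; // what the model sees on this turn
  staticForHash: string; // what feeds the session-reuse fingerprint
}

// Placeholder strings; the real prompt text is not part of this description.
const GROUP_CHAT_CONTEXT = "GROUP-CHAT-CONTEXT";
const GROUP_INTRO = "GROUP-INTRO";

function composePromptParts(turn: TurnState): PromptParts {
  const shouldInjectGroupIntro = Boolean(
    turn.isGroupChat && (turn.isFirstTurnInSession || turn.groupActivationNeedsSystemIntro),
  );
  const groupIntro = shouldInjectGroupIntro ? GROUP_INTRO : "";

  // Live prompt: the intro is included whenever it should be injected.
  const liveParts = [GROUP_CHAT_CONTEXT, groupIntro];
  // Static hash input: the volatile intro is deliberately left out.
  const staticParts = [GROUP_CHAT_CONTEXT];

  // Assumption: empty parts are dropped and the rest joined with a blank line.
  const join = (parts: string[]): string => parts.filter(Boolean).join("\n\n");
  return { live: join(liveParts), staticForHash: join(staticParts) };
}

const turn1 = composePromptParts({ isGroupChat: true, isFirstTurnInSession: true });
const turn2 = composePromptParts({ isGroupChat: true, isFirstTurnInSession: false });
console.log(turn1.staticForHash === turn2.staticForHash); // true
console.log(turn1.live === turn2.live); // false
```

Under these assumptions, a re-intro turn (groupActivationNeedsSystemIntro set) changes only the live string, so the fingerprint stays the same there as well.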

Verification

Work was done in an isolated linked git worktree on fix/issue-69118-cli-groupintro-hash
(based on latest upstream/main).

Added a focused colocated regression test in
src/auto-reply/reply/get-reply-run.media-only.test.ts that drives runPreparedReply for a
first turn (groupIntro injected) and a later turn (groupIntro absent), asserting that:

  • the live prompt contains the intro on turn 1 only, and
  • extraSystemPromptStatic is identical across turns and never contains the intro.
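
The two assertions above can be pictured with a small dependency-free checker. This is only an illustrative sketch, not the Vitest test added in the PR: CapturedTurn, findInvariantViolations, and the placeholder strings are hypothetical, and the real test drives runPreparedReply rather than hand-built captures.

```typescript
// Hypothetical capture of the prompt values produced for one turn.
interface CapturedTurn {
  livePrompt: string; // prompt text sent for this turn
  staticPrompt: string; // text that feeds the session-reuse hash
}

const INTRO_MARKER = "GROUP-INTRO"; // placeholder intro text

// Returns the list of violated invariants; an empty list means the turns pass.
function findInvariantViolations(turns: CapturedTurn[]): string[] {
  const violations: string[] = [];
  turns.forEach((turn, index) => {
    const liveHasIntro = turn.livePrompt.includes(INTRO_MARKER);
    if (index === 0 && !liveHasIntro) {
      violations.push("turn 1 live prompt is missing the intro");
    }
    if (index > 0 && liveHasIntro) {
      violations.push(`turn ${index + 1} live prompt still has the intro`);
    }
    if (turn.staticPrompt.includes(INTRO_MARKER)) {
      violations.push(`turn ${index + 1} static prompt contains the intro`);
    }
    if (turn.staticPrompt !== turns[0].staticPrompt) {
      violations.push(`turn ${index + 1} static prompt differs from turn 1`);
    }
  });
  return violations;
}

// Captures as they would look with the fix applied.
const fixedTurns: CapturedTurn[] = [
  { livePrompt: "GROUP-CHAT-CONTEXT\n\nGROUP-INTRO", staticPrompt: "GROUP-CHAT-CONTEXT" },
  { livePrompt: "GROUP-CHAT-CONTEXT", staticPrompt: "GROUP-CHAT-CONTEXT" },
];
// Captures as they would look with groupIntro back in the static parts.
const revertedTurns: CapturedTurn[] = [
  { livePrompt: "GROUP-CHAT-CONTEXT\n\nGROUP-INTRO", staticPrompt: "GROUP-CHAT-CONTEXT\n\nGROUP-INTRO" },
  { livePrompt: "GROUP-CHAT-CONTEXT", staticPrompt: "GROUP-CHAT-CONTEXT" },
];
console.log(findInvariantViolations(fixedTurns));
console.log(findInvariantViolations(revertedTurns));
```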

Command (run from inside the worktree):

node scripts/run-vitest.mjs src/auto-reply/reply/get-reply-run.media-only.test.ts

Results:

  • With the fix: Test Files 1 passed (1) / Tests 81 passed (81), including the new test.
  • With the one-line fix reverted (groupIntro re-added to the static parts), the new test fails with
    expected 'GROUP-CHAT-CONTEXT\n\nGROUP-INTRO' not to contain 'GROUP-INTRO', confirming it is a
    true regression test.

Honest scope of testing:

  • Unit tests pass locally (Vitest).
  • I did not run OpenClaw locally (no gateway/CLI/pnpm dev/openclaw); the live claude-cli
    group-channel path was not exercised end-to-end.
  • No version bumps and no CHANGELOG.md edits.

Fixes #69118

openclaw-barnacle (bot) added labels on Jun 7, 2026:

  • size: S
  • triage: needs-real-behavior-proof: Candidate: external PR needs after-fix proof from a real setup.
@clawsweeper

clawsweeper Bot commented Jun 7, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed June 7, 2026, 7:41 AM ET / 11:41 UTC.

Summary
The PR removes first-turn groupIntro from the CLI session-reuse static prompt hash while keeping it in the live prompt, and adds a focused runPreparedReply regression test.

PR surface: Source +1, Tests +65. Total +66 across 2 files.

Reproducibility: yes, at source level. Current main builds groupIntro only on first/re-intro group turns but includes it in the static prompt hash that gates CLI session reuse. I did not run a live claude-cli gateway reproduction in this read-only review.

Review metrics: 1 noteworthy metric.

  • Closing References: 1 closing reference. Merging the PR as written would automatically close the broader linked tracker even though the diff only covers the groupIntro trigger.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🧂 unranked krab
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add redacted real claude-cli Discord or Telegram group proof showing two consecutive turns reuse session context after the patch.
  • Change the PR body's closing reference to a non-closing reference unless maintainers split or resolve the remaining canonical issue triggers.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body reports focused Vitest output but no after-fix real claude-cli group-channel proof; the contributor should add redacted terminal/log/screenshot or recording proof in the PR body, which should trigger a fresh ClawSweeper review. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Mantis proof suggestion
A live Telegram group transcript would materially prove that the second claude-cli group turn retains context instead of starting fresh. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

telegram live: verify two consecutive claude-cli replies in a Telegram group reuse the same session and the second reply remembers the first.

Risk before merge

Maintainer options:

  1. Keep The Canonical Issue Open (recommended)
    Before merge, replace the closing reference with a non-closing reference or split a narrow groupIntro issue so the broader extraSystemPromptHash tracker is not auto-closed.
  2. Accept Canonical Closure Deliberately
    Maintainers can keep the closing reference only if they first decide the remaining issue comments are already separately tracked or intentionally out of scope.

Next step before merge

  • [P1] Human review should resolve the missing real behavior proof and the canonical-issue closing semantics; automation cannot provide the contributor's real environment proof.

Security
Cleared: The diff only changes prompt-composition logic and a colocated Vitest test; it does not touch dependencies, secrets, CI, package metadata, or code execution surfaces.

Review details

Best possible solution:

Land the narrow groupIntro fix only after real group-channel proof, and keep #69118 open or split the remaining extraSystemPromptHash triggers into explicit follow-ups before merge.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level: current main builds groupIntro only on first/re-intro group turns but includes it in the static prompt hash that gates CLI session reuse. I did not run a live claude-cli gateway reproduction in this read-only review.

Is this the best way to solve the issue?

Partly yes: excluding first-turn-only groupIntro from the static hash is the narrow owner-boundary fix for this trigger while preserving the live prompt. It is not sufficient to close the broader canonical issue because comments document other extraSystemPromptHash invalidation paths.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 66b91d78feb3.

Label changes

Label changes:

  • add P1: The PR addresses a user-facing claude-cli session-state regression where group-channel turns can lose prior conversation context.
  • add merge-risk: 🚨 automation: The PR body's closing reference can cause GitHub automation to close the broader canonical issue before all documented triggers are fixed or split.
  • add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🧂 unranked krab and patch quality is 🐚 platinum hermit.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body reports focused Vitest output but no after-fix real claude-cli group-channel proof; the contributor should add redacted terminal/log/screenshot or recording proof in the PR body, which should trigger a fresh ClawSweeper review. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
  • add mantis: telegram-visible-proof: Mantis should capture Telegram visible proof. The generic group reply change can affect visible Telegram group conversation memory and is suitable for a short Telegram proof run.

Label justifications:

  • P1: The PR addresses a user-facing claude-cli session-state regression where group-channel turns can lose prior conversation context.
  • merge-risk: 🚨 automation: The PR body's closing reference can cause GitHub automation to close the broader canonical issue before all documented triggers are fixed or split.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🧂 unranked krab and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body reports focused Vitest output but no after-fix real claude-cli group-channel proof; the contributor should add redacted terminal/log/screenshot or recording proof in the PR body, which should trigger a fresh ClawSweeper review. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
  • mantis: telegram-visible-proof: Mantis should capture Telegram visible proof. The generic group reply change can affect visible Telegram group conversation memory and is suitable for a short Telegram proof run.
Evidence reviewed

PR surface:

Source +1, Tests +65. Total +66 across 2 files.

PR surface stats:

Area       Files  Added  Removed  Net
Source     1      3      2        +1
Tests      1      65     0        +65
Docs       0      0      0        0
Config     0      0      0        0
Generated  0      0      0        0
Other      0      0      0        0
Total      2      68     2        +66

What I checked:

  • Current main reproduces the groupIntro hash drift from source: On current main, groupIntro is only built when isFirstTurnInSession or groupActivationNeedsSystemIntro is true, yet it is also included in extraSystemPromptStaticParts, so turn 1 and later turns can hash different static prompt bytes. (src/auto-reply/reply/get-reply-run.ts:572, 66b91d78feb3)
  • CLI session reuse invalidates on static prompt hash mismatch: prepareCliRunContext hashes extraSystemPromptStatic when present, then resolveCliSessionReuse returns invalidatedReason: "system-prompt" when the stored and current extraSystemPromptHash values differ. (src/agents/cli-session.ts:201, 66b91d78feb3)
  • PR patch preserves live groupIntro but removes it from static hash input: The diff deletes groupIntro only from extraSystemPromptStaticParts, leaving extraSystemPromptParts unchanged, and adds a regression test that checks first-turn live prompt injection plus stable static prompt across the second turn. (src/auto-reply/reply/get-reply-run.ts:629, 85acdd415334)
  • Latest release still contains the affected static hash input: The v2026.6.1 release commit still includes groupIntro in extraSystemPromptStaticParts, matching the reported shipped behavior and showing this PR is not obsolete on current release. (src/auto-reply/reply/get-reply-run.ts:612, 2e08f0f4221f)
  • Canonical issue has broader remaining triggers: The linked issue discussion includes later reports for subagent announce delivery and dmScope channel switching as separate extraSystemPromptHash false-invalidation triggers, while the PR body still uses closing syntax for that canonical issue.
  • Maintainer note context checked: The Telegram maintainer note asks for real Telegram proof when Telegram behavior touches reply context; this generic group reply change is Telegram-visible even though it is not Telegram-specific. (.agents/maintainer-notes/telegram.md:1, 66b91d78feb3)

Likely related people:

  • vincentkoc: Recent commits added the current split get-reply-run file and current CLI session/static prompt implementation visible in blame and history for the affected files. (role: recent area contributor; confidence: medium; commits: 9fb8d87f91f8, 2e08f0f4221f; files: src/auto-reply/reply/get-reply-run.ts, src/agents/cli-session.ts, src/agents/cli-runner/prepare.ts)
  • steipete: The older auto-reply pipeline split commit carried forward the groupIntro prompt composition path before the later static hash split. (role: feature history contributor; confidence: medium; commits: ea018a68ccb9; files: src/auto-reply/reply/get-reply-run.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

clawsweeper (bot) added labels on Jun 7, 2026:

  • rating: 🦪 silver shellfish: Thin PR readiness signal; proof, validation, or implementation needs work.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask.
  • mantis: telegram-visible-proof: Mantis should capture Telegram visible proof.
  • P1: High-priority user-facing bug, regression, or broken workflow.
  • merge-risk: 🚨 automation: 🚨 May affect CI, automerge, proof capture, label sync, or maintainer automation.

Labels

  • mantis: telegram-visible-proof: Mantis should capture Telegram visible proof.
  • merge-risk: 🚨 automation: 🚨 May affect CI, automerge, proof capture, label sync, or maintainer automation.
  • P1: High-priority user-facing bug, regression, or broken workflow.
  • rating: 🦪 silver shellfish: Thin PR readiness signal; proof, validation, or implementation needs work.
  • size: S
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask.
  • triage: needs-real-behavior-proof: Candidate: external PR needs after-fix proof from a real setup.

Development

Successfully merging this pull request may close these issues.

Claude CLI sessions reset on every turn in group channels due to groupIntro drift in extraSystemPromptHash

1 participant