
fix(cli-runner): drop volatile systemPromptHash from claude-cli live fingerprint #81047

Open
benjamin1492 wants to merge 1 commit into openclaw:main from benjamin1492:fix/cli-fingerprint-drop-volatile-system-prompt

Conversation

benjamin1492 (Contributor) commented May 12, 2026

Summary

Closes #81041.

The buildClaudeLiveFingerprint helper in src/agents/cli-runner/claude-live-session.ts hashes context.systemPrompt as one of the keys that decide whether the active claude-cli subprocess is still valid. On chat channels (Telegram, Discord, Slack, iMessage, Signal, WhatsApp, etc.) OpenClaw injects per-turn-volatile content into that same systemPrompt: inbound metadata ([Tue 2026-05-12 09:33 EDT], message id, sender envelope), heartbeat poll text, channel guidance, and the runtime status banner. Every inbound turn therefore produces a fresh hash, the fingerprint diverges, and the live session is torn down and rebuilt. The new subprocess has no memory of the prior turn, so the agent looks amnesiac to the user (e.g. the user refers to the agent's last message and the agent replies "what are you talking about?").

extraSystemPromptHash already covers the static side of the system prompt (the agent's persona, SOUL.md, USER.md, AGENTS.md, etc.) via extraSystemPromptStatic — see prepare.ts:151. So removing the volatile systemPromptHash field doesn't weaken the integrity check; it just stops the fingerprint from rejecting valid live sessions on every turn.
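To make the mechanism concrete, here is a minimal sketch of the before/after behavior. It is not the code from claude-live-session.ts: SketchContext, fingerprintWithPromptHash, fingerprintWithoutPromptHash, and the /tmp/workspace path are hypothetical stand-ins, and the real fingerprint includes more keys (auth, MCP, skills, argv, env).

```typescript
import { createHash } from "node:crypto";

// Hypothetical, trimmed-down context; the real PreparedCliRunContext has more fields.
interface SketchContext {
  systemPrompt: string; // full prompt, including per-turn metadata
  extraSystemPromptHash: string; // hash of the static portion only
  normalizedModel: string;
  workspaceDir: string;
}

const sha256 = (value: string): string =>
  createHash("sha256").update(value).digest("hex");

// Old behavior: the full (volatile) prompt participates in the fingerprint.
function fingerprintWithPromptHash(ctx: SketchContext): string {
  return JSON.stringify({
    systemPromptHash: sha256(ctx.systemPrompt),
    extraSystemPromptHash: ctx.extraSystemPromptHash,
    normalizedModel: ctx.normalizedModel,
    workspaceDir: ctx.workspaceDir,
  });
}

// Proposed behavior: only the stable keys remain.
function fingerprintWithoutPromptHash(ctx: SketchContext): string {
  return JSON.stringify({
    extraSystemPromptHash: ctx.extraSystemPromptHash,
    normalizedModel: ctx.normalizedModel,
    workspaceDir: ctx.workspaceDir,
  });
}

const staticPart = "You are Leonard.";
const base = {
  extraSystemPromptHash: sha256(staticPart),
  normalizedModel: "claude-opus-4-7",
  workspaceDir: "/tmp/workspace", // hypothetical path
};

// Two consecutive turns that differ only in the injected timestamp.
const turnA: SketchContext = { ...base, systemPrompt: `${staticPart}\n[Tue 2026-05-12 09:33 EDT]` };
const turnB: SketchContext = { ...base, systemPrompt: `${staticPart}\n[Tue 2026-05-12 09:34 EDT]` };

const oldRotates = fingerprintWithPromptHash(turnA) !== fingerprintWithPromptHash(turnB);
const newRotates = fingerprintWithoutPromptHash(turnA) !== fingerprintWithoutPromptHash(turnB);
console.log({ oldRotates, newRotates });
```

In this sketch the old fingerprint rotates between the two turns and the new one does not, which is the behavior change the PR is after.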

Diff

-    systemPromptHash: sha256(params.context.systemPrompt),

The helper is also exported so the new unit test can exercise it directly (matches the pattern of resetClaudeLiveSessionsForTest already in the same module).

Test plan

New file src/agents/cli-runner/claude-live-session.fingerprint.test.ts covers:

  1. Two builds with different context.systemPrompt (per-turn volatile content) but everything else identical → fingerprints equal. This is the regression.
  2. Changing context.extraSystemPromptHash → fingerprints differ. Proves the static-config integrity check is still in force.
  3. Changing context.normalizedModel → fingerprints differ. Proves model swaps still rotate the session.
  4. Changing context.workspaceDir → fingerprints differ. Proves workspace identity still gates reuse.

$ pnpm test src/agents/cli-runner/claude-live-session.fingerprint.test.ts
 Test Files  1 passed (1)
      Tests  4 passed (4)

pnpm tsgo and pnpm check:test-types both pass.
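The four cases in the test plan can be pictured as a small table of single-field overrides. The following is a sketch using plain checks rather than the actual Vitest file: Ctx, buildFingerprintSketch, and all fixture values are hypothetical, and the stand-in fingerprint covers only the fields the test plan names.

```typescript
// Hypothetical stand-in for the exported helper; field names follow the PR text.
type Ctx = {
  systemPrompt: string;
  extraSystemPromptHash: string;
  normalizedModel: string;
  workspaceDir: string;
};

function buildFingerprintSketch(ctx: Ctx): string {
  // systemPrompt is deliberately left out, mirroring the one-line removal.
  const { extraSystemPromptHash, normalizedModel, workspaceDir } = ctx;
  return JSON.stringify({ extraSystemPromptHash, normalizedModel, workspaceDir });
}

const baseline: Ctx = {
  systemPrompt: "persona\n[Tue 2026-05-12 09:33 EDT]",
  extraSystemPromptHash: "static-hash-a",
  normalizedModel: "claude-opus-4-7",
  workspaceDir: "/tmp/workspace-a",
};

// Each case overrides one field and states whether the fingerprint should change.
const cases: Array<{ name: string; override: Partial<Ctx>; expectChange: boolean }> = [
  { name: "volatile systemPrompt", override: { systemPrompt: "persona\n[Tue 2026-05-12 09:34 EDT]" }, expectChange: false },
  { name: "extraSystemPromptHash", override: { extraSystemPromptHash: "static-hash-b" }, expectChange: true },
  { name: "normalizedModel", override: { normalizedModel: "another-model" }, expectChange: true },
  { name: "workspaceDir", override: { workspaceDir: "/tmp/workspace-b" }, expectChange: true },
];

const results = cases.map(({ name, override, expectChange }) => {
  const changed =
    buildFingerprintSketch(baseline) !== buildFingerprintSketch({ ...baseline, ...override });
  return { name, passed: changed === expectChange };
});

const allPassed = results.every((r) => r.passed);
console.log(results, { allPassed });
```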

Affected scope

  • claude-cli backend only. API backends (Anthropic Messages, OpenAI, OpenRouter, Vertex, Bedrock) do not maintain a fingerprinted persistent process and so are unaffected.
  • Multi-turn chat channels with per-turn metadata injection are the primary beneficiaries; one-shot CLI runs (openclaw run) don't expose the multi-turn symptom but are also covered.

Risk

The narrow theoretical risk is: if there exists a scenario where two distinct systemPrompt values intentionally need separate live sessions while sharing the same extraSystemPromptHash (and same model, workspace, auth, MCP, skills, argv, env), this change would let them share one subprocess. I searched the codebase for that case — systemPromptHash is referenced only at the one site removed here (no readers anywhere else in src/, extensions/, or test/), and extraSystemPromptHash is purpose-built (see prepare.ts:148-154) to capture the static portion of the system prompt while excluding per-message metadata, which is exactly the dimension callers need to fingerprint on. I believe no such scenario exists. If reviewers know of one I missed, please flag it.

No public API change (the helper was previously private; exporting it for tests is the only signature delta and is internal-only).


Real Behavior Proof

Behavior or issue addressed: Closes #81041. On the claude-cli backend with chat channels (Telegram/Discord/etc.), buildClaudeLiveFingerprint was hashing the entire system prompt including OpenClaw's per-turn-volatile metadata block (timestamps, inbound envelope, heartbeat text, channel guidance). Every turn produced a different hash, so the runtime decided the live CLI subprocess no longer matched the requested session and rotated to a fresh subprocess — losing all cross-turn context.

Real environment tested: Live OpenClaw 2026.5.7 install on Ubuntu 24.04 (Linux 6.17.0-23-generic x86_64), Node v22.22.0, claude-cli backend (Claude Max subscription), Telegram bot channel. Same change as proposed in this PR was first applied as a runtime patch to the bundled dist/claude-live-session-C0vmXU_W.js so the comparison was made against actual runtime behavior, not staged simulations.

Exact steps or command run after this patch:

  1. Apply the one-line removal directly to the bundled buildClaudeLiveFingerprint (matches this PR's diff verbatim — systemPromptHash field removed from the returned object).
  2. systemctl --user restart openclaw-gateway
  3. Resume normal Telegram chat use with the agent (Leonard) for ~75 minutes spanning ~30 turns of multi-step work (issue triage, fork/clone via gh CLI, GitHub API calls, drafting this PR description). Mix of short turns and long turns including some with multi-minute gaps.
  4. After the patch window, query the gateway journal for any cli/live-session restart events:
journalctl --user -u openclaw-gateway --since "2026-05-12 09:11" \
  | grep -E "(cli-backend|claude-live|live session)" \
  | grep -E "reset|close|restart|rotate|reason="

Evidence after fix: Redacted live runtime journal output, captured from journalctl --user -u openclaw-gateway on the actual install.

Pre-patch window (08:00–09:11 EDT, fingerprint bug active) — every Telegram turn that included new metadata triggered a fingerprint divergence, with downstream session restarts visible as reason=missing-transcript (companion bug #81042) cascading into reason=restart:

2026-05-12T08:13:06.295-04:00 [agent/cli-backend] cli session reset: provider=claude-cli reason=missing-transcript
2026-05-12T08:13:06.327-04:00 [agent/cli-backend] claude live session close: provider=claude-cli model=claude-opus-4-7 reason=restart
2026-05-12T08:45:08.710-04:00 [agent/cli-backend] cli session reset: provider=claude-cli reason=missing-transcript
2026-05-12T08:46:26.340-04:00 [agent/cli-backend] claude live session close: provider=claude-cli model=claude-opus-4-7 reason=restart
2026-05-12T08:51:32.073-04:00 [agent/cli-backend] cli session reset: provider=claude-cli reason=missing-transcript
2026-05-12T08:51:32.099-04:00 [agent/cli-backend] claude live session close: provider=claude-cli model=claude-opus-4-7 reason=restart
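For readers who want to tally these events rather than eyeball them, the two grep stages can be approximated in TypeScript. This is a sketch only: countByReason is a hypothetical helper, and the input is the six log lines quoted above, not a live journal.

```typescript
// The six pre-patch lines quoted above, verbatim.
const journal: string[] = [
  "2026-05-12T08:13:06.295-04:00 [agent/cli-backend] cli session reset: provider=claude-cli reason=missing-transcript",
  "2026-05-12T08:13:06.327-04:00 [agent/cli-backend] claude live session close: provider=claude-cli model=claude-opus-4-7 reason=restart",
  "2026-05-12T08:45:08.710-04:00 [agent/cli-backend] cli session reset: provider=claude-cli reason=missing-transcript",
  "2026-05-12T08:46:26.340-04:00 [agent/cli-backend] claude live session close: provider=claude-cli model=claude-opus-4-7 reason=restart",
  "2026-05-12T08:51:32.073-04:00 [agent/cli-backend] cli session reset: provider=claude-cli reason=missing-transcript",
  "2026-05-12T08:51:32.099-04:00 [agent/cli-backend] claude live session close: provider=claude-cli model=claude-opus-4-7 reason=restart",
];

// Same patterns as the two grep -E stages in the journal query.
const sourceFilter = /(cli-backend|claude-live|live session)/;
const eventFilter = /reset|close|restart|rotate|reason=/;

// Count lines that pass both filters, grouped by their reason= value.
function countByReason(lines: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of lines) {
    if (!sourceFilter.test(line) || !eventFilter.test(line)) continue;
    const reason = /reason=(\S+)/.exec(line)?.[1] ?? "(none)";
    counts[reason] = (counts[reason] ?? 0) + 1;
  }
  return counts;
}

const counts = countByReason(journal);
console.log(counts);
```

Applied to the excerpt, this counts three missing-transcript resets and three restart closes. An empty result corresponds to the "(no matches)" output of the query.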

Post-patch window (09:11+ EDT, this PR's change live in the bundle) — full output of the same journal query:

$ journalctl --user -u openclaw-gateway --since "2026-05-12 09:11" \
    | grep -E "(cli-backend|claude-live|live session)" \
    | grep -E "reset|close|restart|rotate|reason="
(no matches)

User-visible Telegram conversation excerpt confirming the symptom pre-patch (agent forgetting its own message from 6 minutes earlier in the same thread):

[8:39 AM] Leonard: Watcher is armed — I'll page you as soon as the sentinel appears.
[8:44 AM] User:    still working on it?
[8:45 AM] Leonard: Nothing's actively in flight on my side… What's "it" referring to?

The 8:45 turn corresponds 1:1 with the 08:45:08 reason=missing-transcript log line above.

Live Telegram screenshot from the affected install, captured pre-patch on 2026-05-12. The user asks "still working on it?" 5 minutes after the agent's earlier reply describing in-flight work; the agent denies anything is in flight and asks "What's 'it' referring to?". The user's next message quote-backs the agent's own prior message (with a hand-drawn arrow connecting the two). Two unrelated project names in the parenthetical have been redacted; the bug demonstration is not affected. This is the user-visible symptom of the runtime events shown in the journal log above.

[Image: Telegram screenshot showing claude-cli context loss between turns]

Observed result after fix: Zero fingerprint-driven session restarts in the post-patch window. Cross-turn context is retained across multi-minute gaps in the same Telegram conversation. The agent now references its own messages from 5+ minutes earlier without losing the thread. This very PR description was drafted across multiple Telegram turns spanning >30 minutes, with all cross-turn references intact.

What was not tested: Other chat surfaces (Discord, Slack, iMessage, Signal, WhatsApp) — the bug analysis predicts they're also affected because they inject similar per-turn metadata, but only Telegram was reproduced and verified on the live runtime here. API-backend providers (Anthropic Messages, OpenAI, OpenRouter) were not tested because they don't go through buildClaudeLiveFingerprint.

openclaw-barnacle (Bot) added the labels "agents", "size: S", and "triage: needs-real-behavior-proof" on May 12, 2026
clawsweeper (Bot, Contributor) commented May 12, 2026

Codex review: found issues before merge. Reviewed May 29, 2026, 12:22 PM ET / 16:22 UTC.

Summary
The PR exports buildClaudeLiveFingerprint, removes the full system prompt hash from the Claude CLI live-session fingerprint, and adds focused Vitest coverage for volatile prompt reuse and remaining rotation keys.

PR surface: Source -1, Tests +119. Total +118 across 2 files.

Reproducibility: yes. The current source path hashes the full per-turn systemPrompt, and the PR body supplies live Telegram plus journal evidence showing resets before the change and no reset after a matching runtime patch; I did not rerun the live scenario myself.

Review metrics: 1 noteworthy metric.

  • Live-session reuse keys: 1 removed. The diff removes the only full-system-prompt key from the Claude live-process fingerprint, so maintainers need to review the replacement session-state boundary before merge.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🦞 diamond lobster
Patch quality: 🧂 unranked krab
Result: blocked by patch quality or review findings.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Preserve static system-prompt invalidation while excluding only per-turn volatile metadata from the reuse key.
  • [P2] Add hadSessionFile and contextEngineConfig to the fingerprint test fixture and rerun the focused test plus test-type check.

Mantis proof suggestion
Independent Telegram proof would materially help confirm that the claude-cli live session keeps context across turns without hiding restart logs. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

telegram desktop proof: verify claude-cli Telegram retains context across two turns after the fingerprint change and include any live-session restart logs.

Risk before merge

  • [P1] Merging as-is can keep a warm claude-cli process after static system prompt, config, bootstrap, skills, or hook-system-prompt changes, so existing users may get stale instructions until idle timeout or restart.
  • [P2] The added fingerprint test fixture is out of sync with the current PreparedCliRunContext contract and needs repair before repo-approved test type checks can validate the regression coverage.

Maintainer options:

  1. Preserve Static Prompt Invalidation (recommended)
    Change the fingerprint boundary so volatile per-turn metadata does not rotate the live process, but static system prompt, config, bootstrap, skills, and hook-system-prompt changes still do.
  2. Accept Warm-Session Stale Prompt Behavior
    Maintainers could intentionally accept that static prompt changes wait for idle timeout or restart, but that compatibility behavior should be explicit and proven before merge.
  3. Pause Until Session Boundary Is Settled
    If the right static-versus-volatile prompt boundary is not clear, pause this branch and resolve the live-session prompt contract before landing a partial fix.

Next step before merge

  • [P2] Needs maintainer decision on the correct static-versus-volatile system-prompt reuse boundary before an automated repair should land; the fixture fix alone is mechanical.

Security
Cleared: The diff does not change dependencies, workflows, credentials, permissions, or supply-chain surfaces; the stale-prompt concern is tracked as a functional session-state merge risk.

Review findings

  • [P1] Preserve static prompt invalidation — src/agents/cli-runner/claude-live-session.ts:329
  • [P2] Add the missing context fields to the fixture — src/agents/cli-runner/claude-live-session.fingerprint.test.ts:22-52
Review details

Best possible solution:

Keep the warm-session reuse fix, but replace the removed full prompt hash with a stable static prompt fingerprint that excludes per-turn inbound metadata while still rotating on static prompt/config/bootstrap/hook changes; then fix the fixture and rerun focused test plus test-type proof.

Do we have a high-confidence way to reproduce the issue?

Yes. The current source path hashes the full per-turn systemPrompt, and the PR body supplies live Telegram plus journal evidence showing resets before the change and no reset after a matching runtime patch; I did not rerun the live scenario myself.

Is this the best way to solve the issue?

No. Removing the entire full-system-prompt hash fixes the reported volatility but is too broad because reused live sessions do not receive updated system prompts; the safer fix is a static prompt fingerprint that excludes only per-turn metadata.

Full review comments:

  • [P1] Preserve static prompt invalidation — src/agents/cli-runner/claude-live-session.ts:329
    After this removal, the fingerprint no longer changes when the static parts of the full system prompt change. A reused Claude live session is started once with argv, and later turns only call writeTurnInput(prompt), so config, bootstrap, skills, hook system prompt, or other static instruction updates can be ignored until the process closes. Keep a prompt fingerprint that excludes only volatile inbound metadata, or otherwise send updated system instructions per turn.
    Confidence: 0.89
  • [P2] Add the missing context fields to the fixture — src/agents/cli-runner/claude-live-session.fingerprint.test.ts:22-52
    buildContext returns PreparedCliRunContext, but the returned object does not include required hadSessionFile and contextEngineConfig fields from types.ts. Add those fixture values, then rerun the focused fingerprint test and the repo-approved test type check.
    Confidence: 0.96
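One way to picture the "static prompt fingerprint" recommended in the first finding is sketched below. It rests on an assumption that is not in the PR: that the prompt is assembled from sections that can each be tagged static or volatile. PromptSection, staticPromptHash, and the sample texts are hypothetical; OpenClaw's actual prompt assembly may look quite different.

```typescript
import { createHash } from "node:crypto";

// Hypothetical model: each prompt section is tagged as static (persona, config,
// skills) or volatile (timestamps, message ids, sender envelope).
interface PromptSection {
  kind: "static" | "volatile";
  text: string;
}

// Hash only the static sections, so per-turn metadata cannot rotate the session
// while edits to persona or config text still do.
function staticPromptHash(sections: PromptSection[]): string {
  const hash = createHash("sha256");
  for (const section of sections) {
    if (section.kind === "static") {
      hash.update(section.text);
      hash.update("\0"); // separator so section boundaries affect the digest
    }
  }
  return hash.digest("hex");
}

const persona: PromptSection = { kind: "static", text: "You are Leonard." };
const editedPersona: PromptSection = { kind: "static", text: "You are Leonard. Be brief." };

const turn1: PromptSection[] = [persona, { kind: "volatile", text: "[Tue 2026-05-12 09:33 EDT]" }];
const turn2: PromptSection[] = [persona, { kind: "volatile", text: "[Tue 2026-05-12 09:34 EDT]" }];
const afterEdit: PromptSection[] = [editedPersona, { kind: "volatile", text: "[Tue 2026-05-12 09:35 EDT]" }];

const sameAcrossTurns = staticPromptHash(turn1) === staticPromptHash(turn2);
const changedOnEdit = staticPromptHash(turn1) !== staticPromptHash(afterEdit);
console.log({ sameAcrossTurns, changedOnEdit });
```

Under these assumptions the hash stays constant across turns that differ only in volatile metadata and changes when static instructions are edited, which is the boundary the finding asks for.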

Overall correctness: patch is incorrect
Overall confidence: 0.9

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 21bcc0e94251.

Label changes

Label changes:

  • add merge-risk: 🚨 compatibility: Existing static prompt, config, bootstrap, or hook changes may stop rotating a live process until idle timeout or restart unless the PR adds a narrower static fingerprint.
  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes live runtime logs from a patched OpenClaw install showing pre-patch session resets and post-patch absence of reset events, plus a redacted Telegram screenshot of the user-visible symptom.
  • add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🦞 diamond lobster and patch quality is 🧂 unranked krab.
  • remove rating: 🦐 gold shrimp: Current PR rating is rating: 🧂 unranked krab, so this older rating label is no longer current.

Label justifications:

  • P1: The PR targets a user-visible claude-cli chat context-loss regression that can break real multi-turn channel workflows.
  • merge-risk: 🚨 session-state: Removing the full prompt hash changes when a warm Claude live process is reused and can leave stale session instructions attached to later turns.
  • merge-risk: 🚨 compatibility: Existing static prompt, config, bootstrap, or hook changes may stop rotating a live process until idle timeout or restart unless the PR adds a narrower static fingerprint.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🦞 diamond lobster and patch quality is 🧂 unranked krab.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (logs): The PR body includes live runtime logs from a patched OpenClaw install showing pre-patch session resets and post-patch absence of reset events, plus a redacted Telegram screenshot of the user-visible symptom.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes live runtime logs from a patched OpenClaw install showing pre-patch session resets and post-patch absence of reset events, plus a redacted Telegram screenshot of the user-visible symptom.
  • mantis: telegram-visible-proof: Mantis should capture Telegram visible proof. The PR changes user-visible Telegram multi-turn memory behavior, which can be demonstrated in a short Telegram Desktop or live transcript proof.
Evidence reviewed

PR surface:

Source -1, Tests +119. Total +118 across 2 files.

Area | Files | Added | Removed | Net
Source | 1 | 1 | 2 | -1
Tests | 1 | 119 | 0 | +119
Docs | 0 | 0 | 0 | 0
Config | 0 | 0 | 0 | 0
Generated | 0 | 0 | 0 | 0
Other | 0 | 0 | 0 | 0
Total | 2 | 120 | 2 | +118

What I checked:

Likely related people:

  • joshavant: Current blame on the Claude live-session fingerprint, prepared context type, and extra prompt hashing points to commit b5f8191887b9d47ea3c572372c0a7ac7ea45daf7. (role: recent area contributor; confidence: medium; commits: b5f8191887b9; files: src/agents/cli-runner/claude-live-session.ts, src/agents/cli-runner/prepare.ts, src/agents/cli-runner/types.ts)
  • steipete: History and shortlog show repeated cli-runner/session work, including the runner split and stale CLI session invalidation changes that shaped this area. (role: heavy adjacent contributor; confidence: high; commits: 48ae97633303, e5023cc141e2, c2f9de3935c3; files: src/agents/cli-runner/claude-live-session.ts, src/agents/cli-runner/prepare.ts, src/agents/cli-runner.ts)
  • vincentkoc: Recent history shows CLI runner type/auth refactors on the same prepared context and auth/session surfaces. (role: adjacent cli/auth contributor; confidence: medium; commits: 859eb0666282, 78288e37ed58, a0182574873e; files: src/agents/cli-runner/prepare.ts, src/agents/cli-runner/types.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

clawsweeper (Bot) added the label "mantis: telegram-visible-proof" on May 12, 2026
openclaw-barnacle (Bot) added the label "proof: supplied" and removed "triage: needs-real-behavior-proof" on May 12, 2026
clawsweeper (Bot) added the label "proof: sufficient" on May 12, 2026
openclaw-barnacle (Bot) removed the label "proof: sufficient" on May 12, 2026
benjamin1492 marked this pull request as ready for review on May 12, 2026, 15:08
clawsweeper (Bot) added the label "proof: sufficient" on May 12, 2026
@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

openclaw-barnacle (Bot) added the label "stale" on May 27, 2026
clawsweeper (Bot) added the label "rating: 🌊 off-meta tidepool" and removed "proof: sufficient" and "mantis: telegram-visible-proof" on May 27, 2026
clawsweeper (Bot, Contributor) commented May 27, 2026

ClawSweeper PR egg: 🔥 warming; proof passed, review follow-up or readiness checks remain. Hatch with @clawsweeper hatch when eligible.

Rules and details

Hatchability:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

About:

  • Eggs appear after real-behavior proof passes. They are collectible flavor only.
  • Review momentum changes the shell state: follow-up work warms it, re-review makes it wobble, and a clean final review lets it hatch.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

openclaw-barnacle (Bot) removed the label "stale" on May 28, 2026
clawsweeper (Bot) added the labels "proof: sufficient", "rating: 🦐 gold shrimp", "status: ⏳ waiting on author", "mantis: telegram-visible-proof", "P1", and "merge-risk: 🚨 session-state", and removed "rating: 🌊 off-meta tidepool" on May 28, 2026
RomneyDa (Member) commented:

Heads up: this PR needs to be updated against current main before the new required Dependency Guard check can pass.

benjamin1492 force-pushed the fix/cli-fingerprint-drop-volatile-system-prompt branch from 79ddc27 to 9daea40 on May 29, 2026, 16:13
openclaw-barnacle (Bot) removed the label "proof: sufficient" on May 29, 2026
clawsweeper (Bot) added the labels "proof: sufficient", "rating: 🧂 unranked krab", and "merge-risk: 🚨 compatibility", and removed "rating: 🦐 gold shrimp" on May 29, 2026

Labels

  • agents (Agent runtime and tooling)
  • mantis: telegram-visible-proof (Mantis should capture Telegram visible proof.)
  • merge-risk: 🚨 compatibility (May break existing users, config, migrations, defaults, or upgrade paths.)
  • merge-risk: 🚨 session-state (May lose, corrupt, stale, or mis-associate session, agent, or context state.)
  • P1 (High-priority user-facing bug, regression, or broken workflow.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • size: S
  • status: ⏳ waiting on author (ClawSweeper has contributor-facing work open and is waiting for author action.)

Development

Successfully merging this pull request may close these issues.

[Bug]: systemPromptHash in buildClaudeLiveFingerprint causes phantom claude-cli session restarts on every turn for chat channels

2 participants