fix(cli-runner): drop volatile systemPromptHash from claude-cli live fingerprint#81047
Conversation
|
Codex review: found issues before merge. Reviewed May 29, 2026, 12:22 PM ET / 16:22 UTC. Summary PR surface: Source -1, Tests +119. Total +118 across 2 files. Reproducibility: yes. The current source path hashes the full per-turn Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Mantis proof suggestion Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review detailsBest possible solution: Keep the warm-session reuse fix, but replace the removed full prompt hash with a stable static prompt fingerprint that excludes per-turn inbound metadata while still rotating on static prompt/config/bootstrap/hook changes; then fix the fixture and rerun focused test plus test-type proof. Do we have a high-confidence way to reproduce the issue? Yes. The current source path hashes the full per-turn Is this the best way to solve the issue? No. Removing the entire full-system-prompt hash fixes the reported volatility but is too broad because reused live sessions do not receive updated system prompts; the safer fix is a static prompt fingerprint that excludes only per-turn metadata. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 21bcc0e94251. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source -1, Tests +119. Total +118 across 2 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
This pull request has been automatically marked as stale due to inactivity. |
|
ClawSweeper PR egg: 🔥 warming; proof passed, review follow-up or readiness checks remain. Hatch with Rules and detailsHatchability:
About:
|
|
Heads up: this PR needs to be updated against current |
79ddc27 to
9daea40
Compare
Summary
Closes #81041.
The
buildClaudeLiveFingerprinthelper insrc/agents/cli-runner/claude-live-session.tshashescontext.systemPromptas one of the keys deciding whether the active claude-cli subprocess is still valid. On chat channels (Telegram, Discord, Slack, iMessage, Signal, WhatsApp, etc.) OpenClaw injects per-turn-volatile content into that samesystemPrompt— inbound metadata ([Tue 2026-05-12 09:33 EDT], message id, sender envelope), heartbeat poll text, channel guidance, runtime status banner. Every inbound turn produces a fresh hash, the fingerprint diverges, and the live session is torn down and rebuilt. The new subprocess has no memory of the prior turn, so the agent looks amnesiac to the user (e.g. references its own last message and gets "what are you talking about?").extraSystemPromptHashalready covers the static side of the system prompt (the agent's persona, SOUL.md, USER.md, AGENTS.md, etc.) viaextraSystemPromptStatic— seeprepare.ts:151. So removing the volatilesystemPromptHashfield doesn't weaken the integrity check; it just stops the fingerprint from rejecting valid live sessions on every turn.Diff
- systemPromptHash: sha256(params.context.systemPrompt),The helper is also exported so the new unit test can exercise it directly (matches the pattern of
resetClaudeLiveSessionsForTestalready in the same module).Test plan
New file
src/agents/cli-runner/claude-live-session.fingerprint.test.tscovers:context.systemPrompt(per-turn volatile content) but everything else identical → fingerprints equal. This is the regression.context.extraSystemPromptHash→ fingerprints differ. Proves the static-config integrity check is still in force.context.normalizedModel→ fingerprints differ. Proves model swaps still rotate the session.context.workspaceDir→ fingerprints differ. Proves workspace identity still gates reuse.pnpm tsgoandpnpm check:test-typesboth pass.Affected scope
openclaw run) don't expose the multi-turn symptom but are also covered.Risk
The narrow theoretical risk is: if there exists a scenario where two distinct
systemPromptvalues intentionally need separate live sessions while sharing the sameextraSystemPromptHash(and same model, workspace, auth, MCP, skills, argv, env), this change would let them share one subprocess. I searched the codebase for that case —systemPromptHashis referenced only at the one site removed here (no readers anywhere else insrc/,extensions/, ortest/), andextraSystemPromptHashis purpose-built (seeprepare.ts:148-154) to capture the static portion of the system prompt while excluding per-message metadata, which is exactly the dimension callers need to fingerprint on. I believe no such scenario exists. If reviewers know of one I missed, please flag it.No public API change (the helper was previously private; exporting it for tests is the only signature delta and is internal-only).
Real Behavior Proof
Behavior or issue addressed: Closes #81041. On the
claude-clibackend with chat channels (Telegram/Discord/etc.),buildClaudeLiveFingerprintwas hashing the entire system prompt including OpenClaw's per-turn-volatile metadata block (timestamps, inbound envelope, heartbeat text, channel guidance). Every turn produced a different hash, so the runtime decided the live CLI subprocess no longer matched the requested session and rotated to a fresh subprocess — losing all cross-turn context.Real environment tested: Live OpenClaw 2026.5.7 install on Ubuntu 24.04 (
Linux 6.17.0-23-generic x86_64), Node v22.22.0, claude-cli backend (Claude Max subscription), Telegram bot channel. Same change as proposed in this PR was first applied as a runtime patch to the bundleddist/claude-live-session-C0vmXU_W.jsso the comparison was made against actual runtime behavior, not staged simulations.Exact steps or command run after this patch:
buildClaudeLiveFingerprint(matches this PR's diff verbatim —systemPromptHashfield removed from the returned object).systemctl --user restart openclaw-gatewayEvidence after fix: Redacted live runtime journal output, captured from
journalctl --user -u openclaw-gatewayon the actual install.Pre-patch window (08:00–09:11 EDT, fingerprint bug active) — every Telegram turn that included new metadata triggered a fingerprint divergence, with downstream session restarts visible as
reason=missing-transcript(companion bug #81042) cascading intoreason=restart:Post-patch window (09:11+ EDT, this PR's change live in the bundle) — full output of the same journal query:
User-visible Telegram conversation excerpt confirming the symptom pre-patch (agent forgetting its own message from 6 minutes earlier in the same thread):
The 8:45 turn corresponds 1:1 with the
08:45:08 reason=missing-transcriptlog line above.Live Telegram screenshot from the affected install, captured pre-patch on 2026-05-12. The user asks "still working on it?" 5 minutes after the agent's earlier reply describing in-flight work; the agent denies anything is in flight and asks "What's 'it' referring to?". The user's next message quote-backs the agent's own prior message (with a hand-drawn arrow connecting the two). Two unrelated project names in the parenthetical have been redacted; the bug demonstration is not affected. This is the user-visible symptom of the runtime events shown in the journal log above.
Observed result after fix: Zero fingerprint-driven session restarts in the post-patch window. Cross-turn context is retained across multi-minute gaps in the same Telegram conversation. The agent now references its own messages from 5+ minutes earlier without losing the thread. This very PR description was drafted across multiple Telegram turns spanning >30 minutes, with all cross-turn references intact.
What was not tested: Other chat surfaces (Discord, Slack, iMessage, Signal, WhatsApp) — the bug analysis predicts they're also affected because they inject similar per-turn metadata, but only Telegram was reproduced and verified on the live runtime here. API-backend providers (Anthropic Messages, OpenAI, OpenRouter) were not tested because they don't go through
buildClaudeLiveFingerprint.