Skip to content

fix(agents): adopt rotated session id and file after embedded compaction rotation#88106

Closed
coder999999999 wants to merge 2 commits into
openclaw:mainfrom
coder999999999:fix/88040-session-rotation
Closed

fix(agents): adopt rotated session id and file after embedded compaction rotation#88106
coder999999999 wants to merge 2 commits into
openclaw:mainfrom
coder999999999:fix/88040-session-rotation

Conversation

@coder999999999

@coder999999999 coder999999999 commented May 29, 2026

Copy link
Copy Markdown
Contributor

Summary

When the embedded agent runner hits its context limit mid-run, auto-compaction with truncateAfterCompaction rotates the transcript to a brand-new session id + file (compaction-successor-transcript.ts mints a fresh randomUUID(); run/attempt.ts updates sessionIdUsed/sessionFileUsed; run.ts surfaces both on result.meta.agentMeta).

But updateSessionStoreAfterAgentRun persisted the pre-rotation sessionId and never wrote sessionFile. So after a rotation sessions.json kept pointing at the stale pre-rotation transcript. Because resolveSessionTranscriptFile prefers the stored sessionEntry.sessionFile, the next turn reopened the stale file and blocked on its write lock — the deadlock / crash-loop reported in #88040.

Fix

updateSessionStoreAfterAgentRun now detects a rotation and adopts the new identity (id and file):

  • Rotation is recognized only when agentMeta.sessionFile is present and agentMeta.sessionId differs from the run's sessionId.
  • sessionStartedAt resets when the id actually changes (a rotated transcript is a new session).

Why the guard matters (CLI safety): only the embedded runner sets agentMeta.sessionFile. For CLI providers, agentMeta.sessionId is the external CLI tool's session id (see cli-runner.ts) and sessionFile is unset — so the guard never rewrites a CLI session's openclaw identity. A naive agentMeta.sessionId ?? sessionId adoption would corrupt every CLI run's session id and still leave sessionFile stale; this fix avoids both.

Linked context

Closes #88040

Real behavior proof (required for external PRs)

Behavior addressed: after embedded auto-compaction rotated the transcript to a new session id/file, sessions.json kept the pre-rotation id and stale sessionFile, so the next turn blocked on the stale transcript's write lock (deadlock). After this patch the on-disk store adopts the rotated id + file; CLI session identity is left untouched.

Real environment tested: local macOS checkout, branch fix/88040-session-rotation based on current upstream/main. The proof drives the real production function updateSessionStoreAfterAgentRun against a real on-disk session store (temp dir, no store mock) with a result.meta.agentMeta carrying the rotated sessionId+sessionFile the embedded runner emits after rotation, then reloads sessions.json from disk to assert the persisted contents. A second case feeds a CLI-shaped agentMeta (external sessionId, no sessionFile) to prove the openclaw id is preserved.

Exact steps or command run after this patch: I isolated the fix by reverting only the production file to unpatched upstream/main while keeping the new tests, ran the production code path with node, then restored the fix and re-ran. Commands:

git checkout upstream/main -- src/agents/command/session-store.ts
node scripts/run-vitest.mjs src/agents/command/session-store.test.ts -t "rotat"
git checkout HEAD -- src/agents/command/session-store.ts
node scripts/run-vitest.mjs src/agents/command/session-store.test.ts

Evidence after fix: copied live console output from node scripts/run-vitest.mjs. Against the unpatched production file the run reproduces the deadlock condition (the on-disk store keeps the stale identity); with the fix restored it resolves —

UNPATCHED production file (bug reproduces):
  FAIL  updateSessionStoreAfterAgentRun > adopts the rotated session id and file after embedded auto-compaction rotation
  AssertionError: expected 'pre-rotation-session' to be 'post-rotation-session'
  Expected: "post-rotation-session"
  Received: "pre-rotation-session"   <- sessions.json kept the stale pre-rotation identity and sessionFile

PATCHED production file (this branch):
  Test Files  2 passed (2)
  Tests  70 passed (70)

Observed result after fix: with the fix restored, the node scripts/run-vitest.mjs run shows the reloaded on-disk sessions.json adopts the rotated sessionId and sessionFile (no stale pointer, so the next turn does not block on the stale transcript's write lock). The CLI case keeps sessionId equal to the openclaw id with sessionFile unchanged and records the external tool id under cliSessionIds.

What was not tested: an end-to-end live agent run driven all the way to an in-flight context-limit compaction rotation (that requires a long real model run, infeasible offline). The proof instead exercises the identical post-rotation production path (updateSessionStoreAfterAgentRun writing/reloading a real sessions.json) with the exact agentMeta shape the embedded runner emits after rotation.

Tests and validation

  • node scripts/run-vitest.mjs src/agents/command/session-store.test.ts -> 70 passed (2 new: embedded-rotation adoption of id+file with sessionStartedAt reset; CLI run preserves openclaw id and leaves sessionFile untouched).
  • Before/after isolation above: the rotation test fails on the unpatched production file and passes with the fix.
  • oxfmt --check clean, oxlint clean, git diff --check clean, tsgo:core exit 0, tsgo:core:test exit 0.

Note on #88055

I opened this after noticing #88055 targets the same issue with a one-line sessionId: result?.meta?.agentMeta?.sessionId ?? sessionId at the call site. That approach has two problems this PR avoids: (1) for CLI providers agentMeta.sessionId is the external tool's id, so it would overwrite the openclaw session id on every CLI run; (2) it never updates sessionFile, so resolveSessionTranscriptFile still resolves the stale transcript and the deadlock persists. Happy to defer to maintainers on which to take.

…ion rotation

Embedded auto-compaction with truncateAfterCompaction rotates the
transcript to a new session id + file mid-run. updateSessionStoreAfterAgentRun
previously persisted the pre-rotation sessionId and never updated sessionFile,
so sessions.json kept pointing at the stale transcript whose write lock the
next turn blocked on (deadlock / crash loop).

Adopt the rotated identity, gated on agentMeta.sessionFile being present so
CLI runs (where agentMeta.sessionId is the external tool's id and sessionFile
is unset) never have their openclaw session identity rewritten.

Closes openclaw#88040

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 29, 2026 18:52
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 29, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR ensures the session store tracks embedded-run transcript rotation (session id + file) caused by auto-compaction, preventing sessions.json from referencing a stale, lock-held transcript file.

Changes:

  • Detect embedded auto-compaction rotation from agentMeta.sessionId + agentMeta.sessionFile and persist the rotated identity in the session store entry.
  • Reset sessionStartedAt when the session identity changes due to rotation.
  • Add tests covering embedded rotation adoption and ensuring CLI runs don’t overwrite the OpenClaw session id.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/agents/command/session-store.ts Adopts rotated session id/file for embedded-run transcript rotation and updates start timestamp behavior.
src/agents/command/session-store.test.ts Adds test coverage for rotation adoption and verifies CLI runs keep the OpenClaw session identity.

Comment on lines +121 to +132
// Embedded auto-compaction with `truncateAfterCompaction` rotates the transcript
// to a new session id + file mid-run. Adopt the rotated identity so sessions.json
// stops pointing at the stale pre-rotation file, whose write lock the next turn
// would otherwise block on. Only the embedded runner sets agentMeta.sessionFile;
// CLI runs leave it unset and reuse agentMeta.sessionId for the external tool's
// id, so this guard never rewrites a CLI session's openclaw identity.
const rotatedSessionId = normalizeOptionalString(result.meta.agentMeta?.sessionId);
const rotatedSessionFile = normalizeOptionalString(result.meta.agentMeta?.sessionFile);
const sessionRotated = Boolean(
rotatedSessionFile && rotatedSessionId && rotatedSessionId !== sessionId,
);
const effectiveSessionId = sessionRotated && rotatedSessionId ? rotatedSessionId : sessionId;
Comment on lines +151 to +153
if (sessionRotated && rotatedSessionFile) {
next.sessionFile = rotatedSessionFile;
}
Comment on lines +129 to +132
const sessionRotated = Boolean(
rotatedSessionFile && rotatedSessionId && rotatedSessionId !== sessionId,
);
const effectiveSessionId = sessionRotated && rotatedSessionId ? rotatedSessionId : sessionId;
Comment on lines +230 to +240
await updateSessionStoreAfterAgentRun({
cfg,
sessionId,
sessionKey,
storePath,
contextTokensOverride: 200_000,
sessionStore,
defaultProvider: "openai",
defaultModel: "gpt-5.5",
result,
});
@clawsweeper

clawsweeper Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed May 29, 2026, 3:19 PM ET / 19:19 UTC.

Summary
The PR updates updateSessionStoreAfterAgentRun to adopt embedded compaction's rotated session id/file and adds session-store regression tests for normal, CLI, and preserved-state cases.

PR surface: Source +27, Tests +173. Total +200 across 2 files.

Reproducibility: yes. for source-level reproduction: current main rotates embedded compaction to a new id/file and reports both through agentMeta, while the existing session-store update path persists the caller id. I did not establish a live in-flight compaction repro in this read-only review.

Review metrics: 2 noteworthy metrics.

  • Persisted session identity fields: 3 changed: sessionId, sessionFile, sessionStartedAt. These fields determine which transcript future turns and session views attach to, so session-family behavior needs explicit review before merge.
  • Rotation regression cases: 3 added. The tests now cover normal embedded rotation, CLI id preservation, and preserved-state heartbeat rotation.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🦪 silver shellfish
Patch quality: 🦐 gold shrimp
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add usage-family persistence and regression assertions for rotated embedded sessions, including preserve mode.
  • [P1] Add after-fix live or diagnostic proof that a real embedded compaction rotation leaves sessions.json on the rotated id/file.
  • Refresh or explain the currently failing checks once they are not attributable to unrelated base churn.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body provides copied terminal output from targeted Vitest runs against a temp on-disk store, but external PR proof still needs a live or diagnostic embedded compaction run after the patch with private paths, IPs, tokens, phone numbers, and other private data redacted; updating the PR body should trigger re-review, or a maintainer can comment @clawsweeper re-review. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] The PR adopts a new persisted embedded session id/file but does not update usageFamilyKey or usageFamilySessionIds, so family usage/list views can split the pre-rotation transcript from the current session.
  • [P1] Real behavior proof is still a targeted Vitest run with constructed agentMeta, not a live or diagnostic embedded compaction rotation through the full agent runner path.
  • [P1] The exact head currently has failing checks, although the exposed annotations point outside the touched agents files.

Maintainer options:

  1. Complete rotation persistence before merge (recommended)
    Add usageFamilyKey and usageFamilySessionIds when sessionRotated is true, carry those fields through the preserved-state patch, and add the matching regression assertion before merge.
  2. Accept the narrow deadlock repair
    Maintainers could deliberately land the id/file repair now and track the usage-family split as follow-up if the deadlock risk outweighs the dashboard/session-history risk.
  3. Replace both open fix attempts
    If maintainers want one consolidated branch, pause this PR and the sibling one-line PR in favor of a current-main replacement covering id, file, CLI safety, preserve-state writes, usage family, and proof.

Next step before merge

  • [P1] The PR needs contributor or maintainer follow-up for real embedded-compaction proof plus a small session-family repair before merge, so it is not a safe repair-lane candidate while the proof gate remains mock-only.

Security
Cleared: The diff only changes TypeScript session-store logic and colocated tests; I found no concrete security or supply-chain regression.

Review findings

  • [P2] Carry usage family across embedded rotations — src/agents/command/session-store.ts:151-153
Review details

Best possible solution:

Land a current-main patch that adopts the rotated id/file for normal and preserved-state runs, preserves usage family across the old and new session ids, and includes real diagnostic embedded-compaction proof.

Do we have a high-confidence way to reproduce the issue?

Yes for source-level reproduction: current main rotates embedded compaction to a new id/file and reports both through agentMeta, while the existing session-store update path persists the caller id. I did not establish a live in-flight compaction repro in this read-only review.

Is this the best way to solve the issue?

No: the PR is in the right helper and now covers preserved-state writes, but it should also preserve usage-family fields across the old and new session ids and still needs real embedded-run proof.

Full review comments:

  • [P2] Carry usage family across embedded rotations — src/agents/command/session-store.ts:151-153
    When sessionRotated is true this branch rewrites the store entry to the new session id/file, but it never records the old and new ids in usageFamilySessionIds. The sibling CLI compaction path does this on session-id changes, and usage family grouping relies on those ids to fold the old transcript into the current session instead of showing it as a separate unnamed session. Please update the rotation patch, including the preserved-state metadata path, and cover it in the regression test.
    Confidence: 0.8

Overall correctness: patch is incorrect
Overall confidence: 0.82

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a39c2d784efd.

Label changes

Label justifications:

  • P2: This is a bounded but real session-store fix for embedded compaction rotations that can affect future turns after transcript rotation.
  • merge-risk: 🚨 session-state: The PR rewrites persisted session identity and currently omits the usage-family state that sibling session-id rotation paths maintain.
  • merge-risk: 🚨 availability: The touched path is the recovery path for stale transcript locks after compaction, so an incomplete merge can leave the deadlock class insufficiently proven.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🦪 silver shellfish and patch quality is 🦐 gold shrimp.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body provides copied terminal output from targeted Vitest runs against a temp on-disk store, but external PR proof still needs a live or diagnostic embedded compaction run after the patch with private paths, IPs, tokens, phone numbers, and other private data redacted; updating the PR body should trigger re-review, or a maintainer can comment @clawsweeper re-review. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +27, Tests +173. Total +200 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 30 3 +27
Tests 1 173 0 +173
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 203 3 +200

What I checked:

Likely related people:

  • shakkernerd: Blame and git history tie the current session-store helper plus embedded compaction runner files to the agent code move and current implementation shape. (role: feature-history owner; confidence: medium; commits: a6df6838b9c3; files: src/agents/command/session-store.ts, src/agents/embedded-agent-runner/compaction-successor-transcript.ts, src/agents/embedded-agent-runner/run.ts)
  • steipete: Recent commits changed session-store maintenance and single-entry persistence around the same helper and merge result surface. (role: recent area contributor; confidence: high; commits: 1ca7f5c0a057, 467b068fdc7b; files: src/agents/command/session-store.ts, src/config/sessions/store.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. labels May 29, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 29, 2026
ClawSweeper review of openclaw#88106 flagged that the preserveUserFacingSessionModelState
(heartbeat) path builds metadataPatch from only updatedAt/lastInteractionAt, so a
rotated sessionId/sessionFile assigned on `next` never reached mergeSessionEntry.
A heartbeat turn runs through the embedded runner and can hit a compaction
rotation, so that path would reintroduce the openclaw#88040 deadlock.

Carry the rotated identity (sessionId + sessionFile + sessionStartedAt) into the
preserve-path patch too: a physical transcript rotation is filesystem identity,
not user-facing model state. Add a preserved-state rotation regression test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coder999999999

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

Addressed the needs-human finding in 40e58f9: the preserveUserFacingSessionModelState (heartbeat) path built metadataPatch from only updatedAt/lastInteractionAt, so the rotated sessionId/sessionFile assigned on next never reached mergeSessionEntry. A heartbeat turn runs through the embedded runner and can hit a compaction rotation, so that path would have reintroduced the #88040 deadlock.

The rotated identity (sessionId + sessionFile + sessionStartedAt) is now carried into the preserve-path patch as well — a physical transcript rotation is filesystem identity, not user-facing model state. Added a preserved-state rotation regression test (the missing coverage you flagged).

Validation: node scripts/run-vitest.mjs src/agents/command/session-store.test.ts → 72 passed; oxfmt --check, oxlint, tsgo:core, tsgo:core:test all green.

@clawsweeper

clawsweeper Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@coder999999999

Copy link
Copy Markdown
Contributor Author

Closing as a duplicate — #88040 was fixed on main by #88055 (commit 9601172, "fix(agents): preserve rotated compaction session identity"), which merged while this was in review.

The landed fix covers the same rotated-identity adoption plus usage-family tracking, and handles the CLI-vs-embedded session-id distinction. No reason to carry a second, now-conflicting change for the same issue. Thanks @1052326311 — closing in favor of the merged fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P2 Normal backlog priority with limited blast radius. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: M status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: sessions.json stale after compaction transcript rotation (deadlock)

2 participants