Skip to content

fix: refresh prompt fence after compaction writes#90775

Merged
jalehman merged 1 commit into
mainfrom
openclaw-62b/issue-90729-compaction-fence
Jun 6, 2026
Merged

fix: refresh prompt fence after compaction writes#90775
jalehman merged 1 commit into
mainfrom
openclaw-62b/issue-90729-compaction-fence

Conversation

@jalehman

@jalehman jalehman commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Fixes embedded attempts that falsely report session takeover after OpenClaw-owned auto-compaction writes a type: "compaction" session entry while the prompt fence is released.
  • Threads compaction persistence through the existing guarded SessionManager owned-write callback path, matching the existing message-persisted fence refresh behavior.
  • Keeps external raw compaction-looking JSONL edits rejected; this does not broaden the benign rewrite whitelist.
  • Reviewers should focus on the ownership boundary in guardSessionManager and the session-lock regression coverage.

Linked context

Closes #90729

Related #88348 and #88653 for prior benign rewrite handling; those do not cover same-file owned compaction appends.

Requested by Josh in this maintainer thread.

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: OpenClaw-owned compaction appends no longer trip EmbeddedAttemptSessionTakeoverError when the embedded prompt fence is released.
  • Real environment tested: Blacksmith Testbox through Crabbox, provider blacksmith-testbox.
  • Exact steps or command run after this patch: node scripts/crabbox-wrapper.mjs run --provider blacksmith-testbox -- env OPENCLAW_CHECK_CHANGED_REMOTE_CHILD=1 OPENCLAW_CHANGED_LANES_RAW_SYNC=1 corepack pnpm check:changed
  • Evidence after fix: Testbox tbx_01ktcyq66htgrbs2yer89p0s7f, Actions run https://github.com/openclaw/openclaw/actions/runs/27043607767, exit 0.
  • Observed result after fix: changed gate selected core, coreTests; core typecheck, core test typecheck, lint on changed files, import-cycle guard, and related runtime guards passed.
  • What was not tested: no live provider turn was run after the patch; this change is covered by focused session-lock regression tests and remote type/lint/guard proof.
  • Proof limitations or environment constraints: direct AWS Crabbox was blocked earlier by coordinator auth, so the broad remote proof used delegated Blacksmith Testbox through the Crabbox wrapper.
  • Before evidence: the original issue was reproduced before the patch on Testbox tbx_01ktcnjft8yk11w7ar9mdnv1yr, Actions run https://github.com/openclaw/openclaw/actions/runs/27036822896, with EmbeddedAttemptSessionTakeoverError.

Tests and validation

  • node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.session-lock.test.ts -- --reporter=verbose
  • node scripts/run-vitest.mjs src/agents/session-tool-result-guard.test.ts src/agents/session-tool-result-guard.transcript-events.test.ts -- --reporter=verbose
  • git diff --check
  • node scripts/crabbox-wrapper.mjs run --provider blacksmith-testbox -- env OPENCLAW_CHECK_CHANGED_REMOTE_CHILD=1 OPENCLAW_CHANGED_LANES_RAW_SYNC=1 corepack pnpm check:changed
  • .agents/skills/autoreview/scripts/autoreview --mode local

Regression coverage added:

  • Owned SessionManager.appendCompaction(...) refreshes the prompt fence before post-prompt cleanup reacquires the session lock.
  • Unowned external raw type: "compaction" JSONL appends still throw EmbeddedAttemptSessionTakeoverError.

Failed before fix:

  • The RED regression failed with EmbeddedAttemptSessionTakeoverError before the compaction persistence hook was wired.
  • Autoreview initially found the regression fixture was too weak after a test simplification; the fixture now seeds a persisted assistant entry before compaction and the final autoreview is clean.

Risk checklist

Did user-visible behavior change? (Yes/No)

Yes. Auto-compaction during embedded attempts should no longer produce a false takeover error.

Did config, environment, or migration behavior change? (Yes/No)

No.

Did security, auth, secrets, network, or tool execution behavior change? (Yes/No)

No.

What is the highest-risk area?

The session-file ownership boundary while an embedded attempt has released the prompt lock.

How is that risk mitigated?

The hook only marks writes made through the guarded SessionManager.appendCompaction path as owned. The negative test proves raw external compaction-looking writes remain takeover errors.

Current review state

What is the next action?

Review and CI.

What is still waiting on author, maintainer, CI, or external proof?

Waiting on PR CI after creation.

Which bot or reviewer comments were addressed?

Local autoreview P3 finding about a weak compaction persistence fixture was accepted and fixed. Final autoreview reported no accepted/actionable findings.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S maintainer Maintainer-authored PR labels Jun 5, 2026
@clawsweeper

clawsweeper Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed June 5, 2026, 7:59 PM ET / 23:59 UTC.

Summary
The branch threads SessionManager.appendCompaction(...) through the guarded session-manager path so owned compaction appends can refresh the embedded prompt fence, with positive and negative session-lock regression coverage.

PR surface: Source +97, Tests +270. Total +367 across 5 files.

Reproducibility: yes. source inspection gives a clear reproduction path: current main refreshes the prompt fence for message persistence but not for appendCompaction while the prompt lock is released. The linked report and PR regression describe the failure, but I did not run tests in this read-only review.

Review metrics: none identified.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add redacted live provider, CLI, or transport proof that threshold auto-compaction completes without a takeover error.
  • Get an explicit maintainer proof override if the focused Testbox regression is considered sufficient.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body provides before/after Testbox and regression-test proof, but no live provider, CLI, or transport run after the patch; add redacted proof and update the PR body to trigger re-review, or ask a maintainer to comment @clawsweeper re-review.

Risk before merge

  • [P1] Contributor proof is still remote Testbox and focused regression-test proof; no live provider, CLI, or transport run after the patch shows threshold auto-compaction completing without EmbeddedAttemptSessionTakeoverError.
  • [P1] The diff changes the session takeover boundary, so maintainers should explicitly accept that only guarded SessionManager.appendCompaction(...) writes are recorded as owned.

Maintainer options:

  1. Require live compaction proof or override (recommended)
    Ask for redacted live provider, CLI, or transport output showing threshold auto-compaction completes without a takeover error, or have a maintainer explicitly override the proof gate.
  2. Accept focused regression coverage
    Maintainers can intentionally accept the Testbox before/after regression plus negative external-edit coverage as enough for this protected boundary.
  3. Pause for owner review
    If proof cannot be added, keep the PR paused until a session-boundary owner reviews the owned-write hook and validates the security boundary.

Next step before merge

  • [P1] No automated repair is needed; the remaining action is maintainer review and proof handling for a protected session-state/security-boundary PR with test-only real behavior proof.

Security
Cleared: No concrete security or supply-chain defect was found in the diff; the takeover-boundary sensitivity remains a maintainer merge-risk decision.

Review details

Best possible solution:

Land the narrow guarded compaction-owned write path after maintainer review plus redacted live compaction proof or an explicit proof override, while preserving rejection of raw external session edits.

Do we have a high-confidence way to reproduce the issue?

Yes, source inspection gives a clear reproduction path: current main refreshes the prompt fence for message persistence but not for appendCompaction while the prompt lock is released. The linked report and PR regression describe the failure, but I did not run tests in this read-only review.

Is this the best way to solve the issue?

Yes, the proposed shape is the best fix I found: it threads compaction persistence through the existing guarded SessionManager owned-write path and validates the exact append before trusting it. Broadening the benign rewrite whitelist would be a weaker security-boundary tradeoff.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a4f7e4cbb946.

Label changes

Label justifications:

  • P1: The PR targets a false takeover error that can kill an active embedded agent turn during threshold auto-compaction.
  • merge-risk: 🚨 session-state: The diff changes how owned compaction writes refresh the prompt fence for session JSONL state during an embedded attempt.
  • merge-risk: 🚨 security-boundary: The guard distinguishes OpenClaw-owned session writes from external raw edits, so an overly broad whitelist would affect takeover protection.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body provides before/after Testbox and regression-test proof, but no live provider, CLI, or transport run after the patch; add redacted proof and update the PR body to trigger re-review, or ask a maintainer to comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +97, Tests +270. Total +367 across 5 files.

View PR surface stats
Area Files Added Removed Net
Source 4 98 1 +97
Tests 1 270 0 +270
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 5 368 1 +367

What I checked:

Likely related people:

  • steipete: The available all-ref shortlog shows Peter Steinberger as the most frequent contributor across the touched session-lock and guard files. (role: heavy adjacent area contributor; confidence: medium; commits: 0c278bb93c2d, 5586b3fd19cd, ee03ade0d63f; files: src/agents/embedded-agent-runner/run/attempt.session-lock.ts, src/agents/embedded-agent-runner/run/attempt.ts, src/agents/session-tool-result-guard.ts)
  • obviyus: Current main blame for the session-lock fence, guarded session-manager call site, and session tool guard points to Ayaan Zaidi's recent commit. (role: recent area contributor; confidence: medium; commits: 7c885528ba25; files: src/agents/embedded-agent-runner/run/attempt.session-lock.ts, src/agents/embedded-agent-runner/run/attempt.ts, src/agents/session-tool-result-guard.ts)
  • Takhoffman: Recent history on the touched files includes context-window and tool-result truncation work that shares the session/context boundary involved here. (role: adjacent context-window contributor; confidence: medium; commits: 4f00b769251d, 7fc1a74ee9e3; files: src/agents/embedded-agent-runner/run/attempt.ts, src/agents/session-tool-result-guard.ts)
  • jalehman: Beyond authoring this PR, Josh Lehman has prior merged history on context engine transcript maintenance touching this area. (role: related prior feature contributor; confidence: medium; commits: 751d5b7849ca, 054ea49f5754; files: src/agents/embedded-agent-runner/run/attempt.ts, src/agents/session-tool-result-guard.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels Jun 5, 2026
@jalehman jalehman force-pushed the openclaw-62b/issue-90729-compaction-fence branch from 327a3e1 to 054ea49 Compare June 5, 2026 23:52
@jalehman

jalehman commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Updated proof for the current head SHA 054ea49f57548eeac9d213a7cbf1a0744b2fd2d2.

Behavior addressed: owned auto-compaction session-manager appends no longer trip embedded prompt-fence takeover detection after the prompt lock is released, while unowned or interleaved external session-file edits are still rejected.

Real environment tested: Blacksmith Testbox through Crabbox on Linux.

Exact steps or command run after this patch:

  • node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.session-lock.test.ts -- --reporter=verbose
  • node scripts/run-vitest.mjs src/agents/session-tool-result-guard.test.ts src/agents/session-tool-result-guard.transcript-events.test.ts -- --reporter=verbose
  • .agents/skills/autoreview/scripts/autoreview --mode local
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main
  • Crabbox/Testbox runtime proof using production modules: SessionManager.open, guardSessionManager, createEmbeddedAttemptSessionLockController, and acquireSessionWriteLock.

Evidence after fix:

  • Testbox provider/id: blacksmith-testbox, tbx_01ktd3ckct31kq04bxw1500chg
  • Actions run: https://github.com/openclaw/openclaw/actions/runs/27046304204
  • Remote proof SHA: 054ea49f57548eeac9d213a7cbf1a0744b2fd2d2
  • Remote proof output:
    • LIVE_PROOF owned-compaction: pass
    • LIVE_PROOF interleaved-external: pass
    • LIVE_PROOF other-owned-writer: pass
    • LIVE_PROOF result: pass

Observed result after fix: focused session-lock regression tests passed (49 passed), guard tests passed (35 passed and 4 passed), both autoreview passes reported no accepted/actionable findings, and the Testbox runtime proof exited 0.

What was not tested: this proof does not drive a full provider/model embedded agent run all the way through threshold-triggered auto-compaction. It directly exercises the production session manager, compaction guard, embedded lock controller, and session write-lock paths that own the issue invariant.

@jalehman

jalehman commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Land-ready verification for head 054ea49f57548eeac9d213a7cbf1a0744b2fd2d2.

  • PR state: mergeable and clean against main.
  • GitHub checks: passing or intentionally skipped as of the latest gh pr checks watch.
  • Focused local tests:
    • node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.session-lock.test.ts -- --reporter=verbose -> 49 passed
    • node scripts/run-vitest.mjs src/agents/session-tool-result-guard.test.ts src/agents/session-tool-result-guard.transcript-events.test.ts -- --reporter=verbose -> 35 passed and 4 passed
  • Formatting sanity: git diff --check -> clean.
  • Structured review:
    • .agents/skills/autoreview/scripts/autoreview --mode local -> clean, no accepted/actionable findings.
    • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main -> clean, no accepted/actionable findings.
  • Remote runtime proof: Blacksmith Testbox through Crabbox, tbx_01ktd3ckct31kq04bxw1500chg, Actions run https://github.com/openclaw/openclaw/actions/runs/27046304204, exit 0.

Known proof scope: the Testbox proof directly exercises the production session manager, compaction guard, embedded lock controller, and session write-lock paths. It does not run a full live provider/model turn through threshold auto-compaction.

@jalehman jalehman merged commit bbfe8cc into main Jun 6, 2026
169 of 172 checks passed
@jalehman jalehman deleted the openclaw-62b/issue-90729-compaction-fence branch June 6, 2026 00:05
@jalehman

jalehman commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Landed via squash merge onto main.

  • Head commit before merge: 054ea49f57548eeac9d213a7cbf1a0744b2fd2d2
  • Merge commit: bbfe8ccaf60e87e84a3c695e56f4bee7c687c5b0
  • Gate: latest PR checks passed or were intentionally skipped before merge.
  • Focused proof: session-lock regression tests, guard tests, git diff --check, local + branch autoreview.
  • Remote runtime proof: Blacksmith Testbox through Crabbox tbx_01ktd3ckct31kq04bxw1500chg, Actions run https://github.com/openclaw/openclaw/actions/runs/27046304204.

Thanks @jalehman.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling maintainer Maintainer-authored PR merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: M status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EmbeddedAttemptSessionTakeoverError: auto-compaction at reason=threshold trips fence on rewritten session jsonl

1 participant