Release embedded session write lock before model I/O#82891
Conversation
|
Codex review: passed. Summary Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt setup until cleanup, and the PR body includes Blacksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Real behavior proof Next step before merge Security Review detailsBest possible solution: Land the repaired PR through the existing automerge/check gates, preserving the controller-based release/fence design and the new skipped-prompt/context-engine regression coverage. Do we have a high-confidence way to reproduce the issue? Yes. Current main source still holds the embedded session write lock from early attempt setup until cleanup, and the PR body includes Blacksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Is this the best way to solve the issue? Yes. The repaired head releases before provider/cache I/O, fences short transcript-write and cleanup sections, and adds regression coverage for the skipped-prompt path that the prior review flagged. What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 3dd8bcb41969. |
3b87580 to
037323c
Compare
|
/clawsweeper automerge |
|
ClawSweeper 🐠 automerge status This repair pass finished without changing the PR. ClawSweeper checked the branch and found no safe patch to push this time. Executor outcome: no planned fix actions. Worker actions:
Nothing moved downstream from this pass: no branch update, replacement PR, merge, or re-review. fish notes: model gpt-5.5, reasoning high. Automerge progress:
|
a53710b to
4c6dd7e
Compare
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7ed6e560aa8149aaa00c39f45f3f36253d4. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7ed6e560aa8149aaa00c39f45f3f36253d4 Review: openclaw/openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7ed6e560aa8149aaa00c39f45f3f36253d4. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7ed6e560aa8149aaa00c39f45f3f36253d4 Review: openclaw/openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7e. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7e Review: openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7e. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7e Review: openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7e. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7e Review: openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7e. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7e Review: openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7e. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7e Review: openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7e. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7e Review: openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7e. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7e Review: openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7e. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7e Review: openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7e. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7e Review: openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7e. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7e Review: openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7e. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7e Review: openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary: - The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces. - Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock - PR branch already contained follow-up commit before automerge: fix session lock runner build types - PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289… Validation: - ClawSweeper review passed for head 4c6dd7e. - Required merge gates passed before the squash merge. Prepared head SHA: 4c6dd7e Review: openclaw#82891 (comment) Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Summary
Fixes #13744 by narrowing the embedded run's session transcript write-lock lifecycle:
session.writeLock.staleMsandsession.writeLock.maxHoldMspolicy with env overrides, and propagate those options through transcript writers, doctor, gateway startup cleanup, SDK runtime, Codex transcript mirroring, compaction, and rewrite/truncation pathsReproduction
I reproduced the issue on unmodified main in Blacksmith Testbox through Crabbox.
blacksmith-testboxtbx_01krsajemdgtkm2288545hcsfeembedded-lock-spans-prompt acquire=68758 prompt=113525 cleanup=180285same-process-contention timeoutElapsedMs=26 message=session file locked (timeout 25ms): pid=4306 /tmp/openclaw-lock-repro-yYhC6d/session.jsonl.lockThat proved the embedded run still held the session write lock across prompt/model I/O and caused another same-process transcript writer to time out while the model turn was in flight.
Verification
Current head:
a53710bba15daf53e8e2dd19c2a9b23e5f7431f6.Local live provider/channel E2E on the user's local machine:
Result:
pnpm buildpassed after the final runner type cleanup..artifacts/qa-e2e/matrix-local-live-20260517T064732Z/matrix-qa-output.log.artifacts/qa-e2e/matrix-local-live-20260517T064732Z/matrix-qa-report.md.artifacts/qa-e2e/matrix-local-live-20260517T064732Z/matrix-qa-summary.jsonghcr.io/matrix-construct/tuwunel:v1.5.1, servermatrix-qa.testlive-frontier,openai/gpt-5.412016ms15/15passed,0failed, total163055ms108matrix-thread-follow-up,matrix-thread-isolation,matrix-top-level-reply-shape,matrix-reaction-notification,matrix-approval-exec-metadata-single-event,matrix-approval-exec-metadata-chunked,matrix-restart-resume,matrix-mention-gating,matrix-allowbots-default-block,matrix-allowbots-mentions-mentioned-room,matrix-allowlist-block,matrix-e2ee-basic-replyFocused local regression proof on the same head:
Result:
Test Files 21 passed (21),Tests 310 passed (310), diff check clean.Earlier local baseline proof before the final live E2E type cleanup:
Result: both generated baseline checks OK.
Final Blacksmith Testbox through Crabbox proof:
blacksmith-testboxtbx_01krt8tz8kf8ddzktzj01yy63gCI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 OPENCLAW_TESTBOX=1 OPENCLAW_TESTBOX_REMOTE_RUN=1 pnpm test ...focused session-lock/context-engine suite...passed 6 Vitest shards in 9.13s, exit 0{"provider":"blacksmith-testbox","leaseId":"tbx_01krt8tz8kf8ddzktzj01yy63g","syncDelegated":true,"commandMs":51926,"totalMs":54468,"exitCode":0}Additional review/proof completed during iteration:
blacksmith-testbox, idtbx_01krt2k7k76gtfn65725vdj4kn, Actions https://github.com/openclaw/openclaw/actions/runs/25981164105,Test Files 19 passed (19),Tests 272 passed (272), exit 0.Real behavior proof
Behavior addressed: embedded PI runs no longer hold the session transcript write lock across model/provider I/O; context-engine post-turn hooks no longer run under the post-prompt transcript lock; stale attempts are fenced before live output, tool execution, transcript writes, and cleanup after a takeover.
Real environment tested: local macOS source checkout with Docker Desktop and a disposable Tuwunel Matrix homeserver, using the user's local OpenAI provider token for
live-frontier/openai/gpt-5.4; Blacksmith Testbox through Crabbox for remote Linux focused regression proof.Exact steps or command run after this patch:
pnpm build; local Matrix QAfastprofile withOPENCLAW_LIVE_OPENAI_KEY=<redacted from local OPENAI_API_KEY>,providerMode: "live-frontier",primaryModel: "openai/gpt-5.4",alternateModel: "openai/gpt-5.4"; the focusednode scripts/run-vitest.mjs ...command above;git diff --check; and the focused suite in Blacksmith Testbox through Crabbox.Evidence after fix: live Matrix provider/channel E2E passed with
15/15checks/scenarios,0failed, canary pass in12016ms,108observed Matrix events, report at.artifacts/qa-e2e/matrix-local-live-20260517T064732Z/matrix-qa-report.md; local regression proof passed with 21 test files and 310 tests; Testbox proof passed with providerblacksmith-testbox, idtbx_01krt8tz8kf8ddzktzj01yy63g, 6 Vitest shards passed, exit 0.Observed result after fix: a real Matrix channel and live OpenAI provider turn completed end to end, followed by the Matrix fast-profile contract scenarios for thread replies, top-level reply shape, reaction observation, approval metadata, restart/resume, mention gating, allow-bot policy, sender allowlist blocking, and encrypted-room E2EE reply. Lock lifecycle tests verify prompt-time release, fenced reacquire, takeover no-op cleanup, live-output fencing, tool-call fencing, external-hook locking, context-engine rewrite-only locking, and stale/max-hold policy propagation.
What was not tested: full repository gates beyond the targeted session-lock, embedded-run, context-engine maintenance, config/docs baseline, SDK runtime, Codex transcript-mirror surface, and the Matrix live provider/channel E2E described above.