fix(agents): persist subagent registry before returning accepted (#83132) #83238
Native subagent spawn could return accepted while the run entry was absent from ~/.openclaw/subagents/runs.json, so subagents list and completion delivery lost track of the run. persistSubagentRunsToDisk swallowed the save error, hiding the missing-registry symptom. Make the initial registration fail closed: persist via a strict variant that propagates write errors, and roll the in-memory entry back on failure so spawn returns an actionable error instead of accepted. Subsequent lifecycle updates keep the best-effort writer. Refs #83132
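A minimal sketch of the fail-closed registration described above, assuming a hypothetical in-memory registry with an injected save function. `SubagentRegistry`, `persistOrThrow`, `persistBestEffort`, and `SpawnResult` are illustrative names and not the project's actual API; only the ordering (add the entry, persist strictly, roll back on failure) comes from the description.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the real module.
type RunEntry = { runId: string; task: string };
type SaveFn = (runs: ReadonlyMap<string, RunEntry>) => void;

type SpawnResult =
  | { status: "accepted"; runId: string }
  | { status: "error"; runId: string; error: string };

class SubagentRegistry {
  private readonly runs = new Map<string, RunEntry>();
  private readonly save: SaveFn;

  constructor(save: SaveFn) {
    this.save = save;
  }

  // Best-effort writer: later lifecycle updates log and continue on failure.
  persistBestEffort(): void {
    try {
      this.save(this.runs);
    } catch (err) {
      console.warn("registry save failed (ignored):", err);
    }
  }

  // Strict writer: write errors propagate to the caller.
  persistOrThrow(): void {
    this.save(this.runs);
  }

  // Initial registration fails closed: no "accepted" without a saved entry.
  register(entry: RunEntry): SpawnResult {
    this.runs.set(entry.runId, entry);
    try {
      this.persistOrThrow();
    } catch (err) {
      // Roll back so memory never holds a run that the disk does not.
      this.runs.delete(entry.runId);
      const message = err instanceof Error ? err.message : String(err);
      return {
        status: "error",
        runId: entry.runId,
        error: `failed to persist subagent run: ${message}`,
      };
    }
    return { status: "accepted", runId: entry.runId };
  }

  has(runId: string): boolean {
    return this.runs.has(runId);
  }
}
```

With a save function that throws (for example an ENOSPC-style error), `register` returns an `error` result and `has(runId)` is `false` afterwards, so a caller cannot observe an accepted run that is missing from the registry.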
Codex review: passed.
Workflow note: Future ClawSweeper reviews update this same comment in place.

Summary
Reproducibility: yes. Source inspection on current main shows registry save failures are swallowed after the run is added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.

Review details
Best possible solution: Land this narrow fail-closed initial-registration fix once exact-head checks pass; leave broader orphan reconstruction and final-delivery recovery to the existing follow-up track.
Do we have a high-confidence way to reproduce the issue? Yes. Source inspection on current main shows registry save failures are swallowed after the run is added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.
Is this the best way to solve the issue? Yes. The PR uses strict persistence only for the initial acceptance invariant while preserving best-effort lifecycle updates, which is the narrow maintainable fix for this bug.

Codex review notes: model gpt-5.5, reasoning high; reviewed against f36a1b0c8120.
🦞✅ The automerge loop is complete.
* fix(gateway): clear CLI bindings on session reset
* fix(gateway): preserve spawned sessions in configured lists
* fix(channels): clear canonical stale routes
* fix(telegram): preserve forum topic origin targets
* fix(agents): skip fallback for session coordination errors
* fix(agents): persist subagent registry before returning accepted (openclaw#83132) (openclaw#83238)
* fix(memory): catch up stale sessions on startup (openclaw#82341)
* fix(memory): preserve qmd lexical search for hyphenated queries (openclaw#81423)
* fix(anthropic): preserve Claude image capability (openclaw#83756)
* fix(agents): exclude tool result details from guard budget (openclaw#75525)
* fix(provider): use Together video API endpoint
* fix(telegram): preserve implicit default account (openclaw#82794)
* fix(gateway): allow trusted-proxy local-direct password fallback (openclaw#82953)
* fix(discord): return subagent thread delivery origin
* fix: add missing prerequisites for upstream-ported fixes
  Add SessionWriteLockTimeoutError class and hasSessionWriteLockTimeout helper needed by the ported fix(agents) skip-fallback commit. Remove route property references from session-delivery.ts that don't exist in gemmaclaw's SessionEntry type. Add authorizePasswordAuth helper that was present in upstream but missing from gemmaclaw's auth.ts.
* fix: remove route assertions incompatible with gemmaclaw SessionEntry
  Remove test assertions using .route property that exists in upstream's SessionEntry type but not in gemmaclaw's, restoring typecheck green.
* fix(memory-core): yield event loop during fallback vector search (openclaw#81172) (openclaw#83758)
  Summary:
  - The branch changes memory-core fallback vector search to scan chunks in 256-row rowid batches with `setImmediate` yields, updates regression tests, and adds a changelog entry.
  - Reproducibility: yes. from source and supplied live output. Current main synchronously scans fallback vector ... and the PR body shows the before/after heartbeat behavior through the actual `searchVector` fallback path.
  Automerge notes:
  - PR branch already contained follow-up commit before automerge: test(memory-core): add boundary, parity, and concurrent-insert covera…
  - PR branch already contained follow-up commit before automerge: fix(memory-core): yield event loop during fallback vector search (#81…
  Validation:
  - ClawSweeper review passed for head 0ede3d7.
  - Required merge gates passed before the squash merge.
  Prepared head SHA: 0ede3d7
  Review: openclaw#83758 (comment)
  Co-authored-by: NW <nitinwadhawan66@gmail.com>
  Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
  Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
  Approved-by: takhoffman
  Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
* fix(subagents): collect unresolved announce batches (openclaw#83701)
  Summary:
  - The PR changes collect-mode follow-up queue routing so unresolved-origin items can batch with a single resolved route and later compatible items can resume batching after a true cross-channel drain.
  - Reproducibility: yes. at source level: current main treats unkeyed-plus-same-keyed queue items as cross-chan ... failing path is directly visible in `src/utils/queue-helpers.ts` and `src/auto-reply/reply/queue/drain.ts`.
  Automerge notes:
  - PR branch already contained follow-up commit before automerge: Merge remote-tracking branch 'origin/main' into maint-83701-20260518
  Validation:
  - ClawSweeper review passed for head e6ad029.
  - Required merge gates passed before the squash merge.
  Prepared head SHA: e6ad029
  Review: openclaw#83701 (comment)
  Co-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com>
  Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
  Approved-by: takhoffman
  Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
* fix(config): accept gateway remote port
* fix: restore Array<{}> closing bracket in manager-search.ts
  Cherry-pick 68b3729 accidentally dropped the '>' from '}>', producing a syntax error. Restore '}>;' as it was in origin/main.
* fix: add remotePort to GatewayRemoteConfig and GatewayRemoteConfigSchema
* fix(agents): prioritize manual session turns (openclaw#82765)
  * fix(agents): prioritize manual session turns
  * docs: update changelog for session priority
  ---------
  Co-authored-by: Galin Iliev <Galin.Iliev@microsoft.com>
* revert: fix(agents): prioritize manual session turns (openclaw#82765) - upstream deps not in gemmaclaw
* fix: resolve undefined variable errors in cherry-picked extension code
* fix(tui): preserve draft while chat is busy
* fix(tui): add pendingChatRunId to TuiStateAccess for cherry-picked tui commit
* fix(memory-wiki): make wiki_lint tool output path-safe (openclaw#83687)
* fix(ui): render session-scoped tool events (openclaw#83734)
* chore: regenerate base config schema after upstream cherry-picks
* fix(agents): add persistSubagentRunsToDiskOrThrow to subagent-registry test mock
  New export added to subagent-registry-state.ts was missing from the vi.mock definition, causing all tests in the suite to skip and the module to fail to load.
* fix(telegram): wire buildTelegramInboundOriginTarget into session context
  Cherry-pick 675e053 added the helper and the test assertion but did not update bot-message-context.session.ts to use it. OriginatingTo now correctly includes :topic:<id> for forum groups.
* fix(memory): correct session path format in startup-catchup test
  sessionPathForFile returns sessions/<basename> (no agent dir), but the cherry-picked test used sessions/main/<basename>. The clean-file test always failed because the path mismatch made every file look unindexed.
* fix(together): update video generation test URL from v1 to v2
  The source uses TOGETHER_VIDEO_BASE_URL = https://api.together.xyz/v2 but the cherry-picked test still asserted the old v1 URL.
---------
Co-authored-by: nitinjwadhawan <nitinwadhawan66@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
Co-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Co-authored-by: Galin Iliev <iliev@galcho.com>
Co-authored-by: Galin Iliev <Galin.Iliev@microsoft.com>
Co-authored-by: Harry Xie <harryhsieh963@yahoo.com>
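The memory-core entry in the commit list above mentions scanning in 256-row batches with `setImmediate` yields. A rough sketch of that batching idea follows, assuming an in-memory array in place of the real rowid-ordered chunk table; `scanInBatches`, `yieldToEventLoop`, and the `Row` shape are hypothetical names for illustration only.

```typescript
// Hypothetical sketch: an in-memory stand-in for a rowid-batched table scan.
type Row = { rowid: number; score: number };

const BATCH_SIZE = 256; // batch size named in the commit summary

// Resolve on a later turn of the event loop so timers and I/O can run.
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => {
    setImmediate(() => resolve());
  });
}

async function scanInBatches(
  rows: readonly Row[],
  keep: (row: Row) => boolean,
): Promise<{ hits: Row[]; yields: number }> {
  const hits: Row[] = [];
  let yields = 0;
  for (let start = 0; start < rows.length; start += BATCH_SIZE) {
    // Process one batch synchronously.
    for (const row of rows.slice(start, start + BATCH_SIZE)) {
      if (keep(row)) hits.push(row);
    }
    // Yield between batches, but not after the final one.
    if (start + BATCH_SIZE < rows.length) {
      await yieldToEventLoop();
      yields += 1;
    }
  }
  return { hits, yields };
}
```

The design choice is that each batch stays synchronous (cheap per-row work) while the pause between batches bounds how long other event-loop work, such as a heartbeat timer, can be starved.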
…132) (#83238)
Summary:
- This PR adds a strict initial subagent registry persistence path, rolls back failed registrations, updates affected test seams, adds a regression test, and records the fix in the changelog.
- Reproducibility: yes. Source inspection on current main shows registry save failures are swallowed after the ... s added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.
Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): persist subagent registry before returning accepted (#83…
Validation:
- ClawSweeper review passed for head d564ef051d37595b08e08bd47f81e11b388909b4.
- Required merge gates passed before the squash merge.
Prepared head SHA: d564ef051d37595b08e08bd47f81e11b388909b4
Review: openclaw/openclaw#83238 (comment)
Co-authored-by: yetval <yetvald@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
…nclaw#83132) (openclaw#83238)
Summary:
- This PR adds a strict initial subagent registry persistence path, rolls back failed registrations, updates affected test seams, adds a regression test, and records the fix in the changelog.
- Reproducibility: yes. Source inspection on current main shows registry save failures are swallowed after the ... s added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.
Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): persist subagent registry before returning accepted (openclaw#83…
Validation:
- ClawSweeper review passed for head d564ef0.
- Required merge gates passed before the squash merge.
Prepared head SHA: d564ef0
Review: openclaw#83238 (comment)
Co-authored-by: yetval <yetvald@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
…qlite divergence

updateTask and deleteTaskRecordById committed the in-memory tasks / taskDeliveryStates maps (and owner/parentFlow/relatedSession/runId indexes) before calling the sqlite persist helpers, and neither persist call was wrapped in a try/catch. The default store always binds the transactional variants (upsertTaskWithDeliveryState / deleteTaskWithDeliveryState), so persist always goes through withWriteTransaction(BEGIN IMMEDIATE), which ROLLBACKs and re-throws on SQLITE_BUSY (multi-writer busy_timeout exceeded), SQLITE_FULL (disk full), or SQLITE_IOERR. The unhandled throw left the in-memory state mutated while sqlite kept the prior row, so the two stores diverged: reload/restart resurrected deleted tasks (lost-delete), and the sqlite-direct reader listFreshTasksForOwnerKey (used by media-generation-task-status-shared) returned rows that contradicted the in-memory path.

Persist to the store first and only mutate the in-memory maps and indexes after persist succeeds, so a persist throw leaves the in-memory state untouched and consistent with sqlite.

Make the delete persist a single atomic store operation. The composite deleteTaskWithDeliveryState already removes the task and its delivery state in one transaction, so the previously trailing persistTaskDeliveryStateDelete was a redundant, non-transactional write that, if it threw after the composite committed, dropped the task from sqlite while memory still held it. Stores without a composite delete now persist the removal of both records through one projected snapshot rather than separate deleteTask / deleteDeliveryState calls, which would leave a delivery-state row behind or reintroduce a two-write divergence window.

The snapshot fallbacks in the persist helpers project the pending change so they stay correct under the persist-before-in-memory ordering. Store contracts are unchanged. Mirrors the persist-before-in-memory fix-shape of openclaw#83238.

Adds regression coverage for the upsert divergence, the composite delete (no redundant second delete), the snapshot-only delete (no resurrection), and non-composite stores with separate delete methods (both records removed atomically).

AI-assisted: drafted with claude code (claude-opus-4-8).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
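The persist-before-in-memory ordering in this commit message can be sketched as follows. This is an illustration under stated assumptions: `TaskStore`, `TaskRegistry`, and the `Task` shape are hypothetical simplifications, and the store is a plain object, not the sqlite-backed store the message refers to.

```typescript
// Hypothetical sketch: simplified types, not the project's task registry.
type Task = { taskId: string; status: string };

interface TaskStore {
  upsertTask(task: Task): void; // may throw, e.g. on a full disk
  deleteTask(taskId: string): void; // may throw
}

class TaskRegistry {
  private readonly tasks = new Map<string, Task>();
  private readonly store: TaskStore;

  constructor(store: TaskStore) {
    this.store = store;
  }

  updateTask(task: Task): void {
    // Persist first: if this throws, the in-memory map is never touched.
    this.store.upsertTask(task);
    this.tasks.set(task.taskId, task);
  }

  deleteTask(taskId: string): void {
    // Same ordering for deletes, so a failed delete cannot leave memory
    // claiming the task is gone while the store still holds it.
    this.store.deleteTask(taskId);
    this.tasks.delete(taskId);
  }

  get(taskId: string): Task | undefined {
    return this.tasks.get(taskId);
  }
}
```

If `store.deleteTask` throws, the error reaches the caller and `get(taskId)` still returns the task, which matches what the store holds. Unlike the rollback approach, nothing has to be undone, because the in-memory mutation only happens after the store write has succeeded.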
…qlite divergence updateTask and deleteTaskRecordById committed the in-memory tasks / taskDeliveryStates maps (and owner/parentFlow/relatedSession/runId indexes) before calling the sqlite persist helpers, and neither persist call was wrapped in a try/catch. The default store always binds the transactional variants (upsertTaskWithDeliveryState / deleteTaskWithDeliveryState), so persist always goes through withWriteTransaction(BEGIN IMMEDIATE), which ROLLBACKs and re-throws on SQLITE_BUSY (multi-writer busy_timeout exceeded), SQLITE_FULL (disk full), or SQLITE_IOERR. The unhandled throw left the in-memory state mutated while sqlite kept the prior row, so the two stores diverged: reload/restart resurrected deleted tasks (lost-delete), and the sqlite-direct reader listFreshTasksForOwnerKey (used by media-generation-task-status-shared) returned rows that contradicted the in-memory path. Persist to the store first and only mutate the in-memory maps and indexes after persist succeeds, so a persist throw leaves the in-memory state untouched and consistent with sqlite. Make the delete persist a single atomic store operation. The composite deleteTaskWithDeliveryState already removes the task and its delivery state in one transaction, so the previously trailing persistTaskDeliveryStateDelete was a redundant, non-transactional write that, if it threw after the composite committed, dropped the task from sqlite while memory still held it. Stores without a composite delete now persist the removal of both records through one projected snapshot rather than separate deleteTask / deleteDeliveryState calls, which would leave a delivery-state row behind or reintroduce a two-write divergence window. The snapshot fallbacks in the persist helpers project the pending change so they stay correct under the persist-before-in-memory ordering. Store contracts are unchanged. Mirrors the persist-before-in-memory fix-shape of openclaw#83238. 
Adds regression coverage for the upsert divergence, the composite delete (no redundant second delete), the snapshot-only delete (no resurrection), and non-composite stores with separate delete methods (both records removed atomically). AI-assisted: drafted with claude code (claude-opus-4-8). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…qlite divergence updateTask and deleteTaskRecordById committed the in-memory tasks / taskDeliveryStates maps (and owner/parentFlow/relatedSession/runId indexes) before calling the sqlite persist helpers, and neither persist call was wrapped in a try/catch. The default store always binds the transactional variants (upsertTaskWithDeliveryState / deleteTaskWithDeliveryState), so persist always goes through withWriteTransaction(BEGIN IMMEDIATE), which ROLLBACKs and re-throws on SQLITE_BUSY (multi-writer busy_timeout exceeded), SQLITE_FULL (disk full), or SQLITE_IOERR. The unhandled throw left the in-memory state mutated while sqlite kept the prior row, so the two stores diverged: reload/restart resurrected deleted tasks (lost-delete), and the sqlite-direct reader listFreshTasksForOwnerKey (used by media-generation-task-status-shared) returned rows that contradicted the in-memory path. Persist to the store first and only mutate the in-memory maps and indexes after persist succeeds, so a persist throw leaves the in-memory state untouched and consistent with sqlite. Make the delete persist a single atomic store operation. The composite deleteTaskWithDeliveryState already removes the task and its delivery state in one transaction, so the previously trailing persistTaskDeliveryStateDelete was a redundant, non-transactional write that, if it threw after the composite committed, dropped the task from sqlite while memory still held it. Stores without a composite delete now persist the removal of both records through one projected snapshot rather than separate deleteTask / deleteDeliveryState calls, which would leave a delivery-state row behind or reintroduce a two-write divergence window. The snapshot fallbacks in the persist helpers project the pending change so they stay correct under the persist-before-in-memory ordering. Store contracts are unchanged. Mirrors the persist-before-in-memory fix-shape of openclaw#83238. 
Adds regression coverage for the upsert divergence, the composite delete (no redundant second delete), the snapshot-only delete (no resurrection), and non-composite stores with separate delete methods (both records removed atomically). AI-assisted: drafted with claude code (claude-opus-4-8). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…qlite divergence updateTask and deleteTaskRecordById committed the in-memory tasks / taskDeliveryStates maps (and owner/parentFlow/relatedSession/runId indexes) before calling the sqlite persist helpers, and neither persist call was wrapped in a try/catch. The default store always binds the transactional variants (upsertTaskWithDeliveryState / deleteTaskWithDeliveryState), so persist always goes through withWriteTransaction(BEGIN IMMEDIATE), which ROLLBACKs and re-throws on SQLITE_BUSY (multi-writer busy_timeout exceeded), SQLITE_FULL (disk full), or SQLITE_IOERR. The unhandled throw left the in-memory state mutated while sqlite kept the prior row, so the two stores diverged: reload/restart resurrected deleted tasks (lost-delete), and the sqlite-direct reader listFreshTasksForOwnerKey (used by media-generation-task-status-shared) returned rows that contradicted the in-memory path. Persist to the store first and only mutate the in-memory maps and indexes after persist succeeds, so a persist throw leaves the in-memory state untouched and consistent with sqlite. Make the delete persist a single atomic store operation. The composite deleteTaskWithDeliveryState already removes the task and its delivery state in one transaction, so the previously trailing persistTaskDeliveryStateDelete was a redundant, non-transactional write that, if it threw after the composite committed, dropped the task from sqlite while memory still held it. Stores without a composite delete now persist the removal of both records through one projected snapshot rather than separate deleteTask / deleteDeliveryState calls, which would leave a delivery-state row behind or reintroduce a two-write divergence window. The snapshot fallbacks in the persist helpers project the pending change so they stay correct under the persist-before-in-memory ordering. Store contracts are unchanged. Mirrors the persist-before-in-memory fix-shape of openclaw#83238. 
Adds regression coverage for the upsert divergence, the composite delete (no redundant second delete), the snapshot-only delete (no resurrection), and non-composite stores with separate delete methods (both records removed atomically). AI-assisted: drafted with claude code (claude-opus-4-8). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…qlite divergence updateTask and deleteTaskRecordById committed the in-memory tasks / taskDeliveryStates maps (and owner/parentFlow/relatedSession/runId indexes) before calling the sqlite persist helpers, and neither persist call was wrapped in a try/catch. The default store always binds the transactional variants (upsertTaskWithDeliveryState / deleteTaskWithDeliveryState), so persist always goes through withWriteTransaction(BEGIN IMMEDIATE), which ROLLBACKs and re-throws on SQLITE_BUSY (multi-writer busy_timeout exceeded), SQLITE_FULL (disk full), or SQLITE_IOERR. The unhandled throw left the in-memory state mutated while sqlite kept the prior row, so the two stores diverged: reload/restart resurrected deleted tasks (lost-delete), and the sqlite-direct reader listFreshTasksForOwnerKey (used by media-generation-task-status-shared) returned rows that contradicted the in-memory path. Persist to the store first and only mutate the in-memory maps and indexes after persist succeeds, so a persist throw leaves the in-memory state untouched and consistent with sqlite. Make the delete persist a single atomic store operation. The composite deleteTaskWithDeliveryState already removes the task and its delivery state in one transaction, so the previously trailing persistTaskDeliveryStateDelete was a redundant, non-transactional write that, if it threw after the composite committed, dropped the task from sqlite while memory still held it. Stores without a composite delete now persist the removal of both records through one projected snapshot rather than separate deleteTask / deleteDeliveryState calls, which would leave a delivery-state row behind or reintroduce a two-write divergence window. The snapshot fallbacks in the persist helpers project the pending change so they stay correct under the persist-before-in-memory ordering. Store contracts are unchanged. Mirrors the persist-before-in-memory fix-shape of openclaw#83238. 
Adds regression coverage for the upsert divergence, the composite delete (no redundant second delete), the snapshot-only delete (no resurrection), and non-composite stores with separate delete methods (both records removed atomically). AI-assisted: drafted with claude code (claude-opus-4-8). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nclaw#83132) (openclaw#83238)

Summary:
- This PR adds a strict initial subagent registry persistence path, rolls back failed registrations, updates affected test seams, adds a regression test, and records the fix in the changelog.
- Reproducibility: yes. Source inspection on current main shows registry save failures are swallowed after the ... s added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): persist subagent registry before returning accepted (openclaw#83…

Validation:
- ClawSweeper review passed for head d564ef0.
- Required merge gates passed before the squash merge.

Prepared head SHA: d564ef0
Review: openclaw#83238 (comment)

Co-authored-by: yetval <yetvald@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
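For readers skimming the timeline, the fail-closed registration with rollback that this summary describes can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: RunEntry, SpawnResult, SaveFn, persistStrict, persistBestEffort, and registerRun are hypothetical names, and the save callback stands in for whatever writes the registry to disk.

```typescript
// Hypothetical shapes; the real registry entry and spawn result types may differ.
type RunEntry = { runId: string; task: string };
type SpawnResult =
  | { status: "accepted"; runId: string }
  | { status: "error"; error: string };
type SaveFn = (runs: ReadonlyMap<string, RunEntry>) => void;

// Best-effort writer: swallows save errors. Suitable for later lifecycle updates.
function persistBestEffort(runs: Map<string, RunEntry>, save: SaveFn): void {
  try {
    save(runs);
  } catch {
    // Intentionally ignored in this sketch.
  }
}

// Strict writer: lets save errors propagate to the caller.
function persistStrict(runs: Map<string, RunEntry>, save: SaveFn): void {
  save(runs);
}

// Initial registration fails closed: if the strict persist throws, the
// in-memory entry is rolled back and the caller gets an error, not "accepted".
function registerRun(
  runs: Map<string, RunEntry>,
  entry: RunEntry,
  save: SaveFn,
): SpawnResult {
  runs.set(entry.runId, entry);
  try {
    persistStrict(runs, save);
  } catch (err) {
    runs.delete(entry.runId);
    const reason = err instanceof Error ? err.message : String(err);
    return {
      status: "error",
      error: `failed to persist subagent run ${entry.runId}: ${reason}`,
    };
  }
  return { status: "accepted", runId: entry.runId };
}
```

The split between the two writers keeps the stricter behavior limited to the one point where "accepted" is promised to the caller, while later updates stay tolerant of transient write failures.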
Makes #83146 merge-ready for the ClawSweeper automerge loop.
The edit pass should inspect the live PR diff, review comments, and failing checks; rebase if needed; keep the contributor branch credited; and stop only when validation is green or an external blocker is proven.
ClawSweeper 🐠 replacement reef notes:
Co-author credit kept:
fish notes: model gpt-5.5, reasoning high; reviewed against d564ef0.