fix(agents): persist subagent registry before returning accepted (#83132) #83238

Merged
clawsweeper[bot] merged 2 commits into main from clawsweeper/automerge-openclaw-openclaw-83146 on May 17, 2026

Conversation

clawsweeper Bot commented May 17, 2026

Contributor

Makes #83146 merge-ready for the ClawSweeper automerge loop.
The edit pass should inspect the live PR diff, review comments, and failing checks; rebase if needed; keep the contributor branch credited; and stop only when validation is green or an external blocker is proven.

ClawSweeper 🐠 replacement reef notes:

  • Repair fallback: GitHub rejected the repair branch push because it updates workflow files and the ClawSweeper app token does not have workflows permission

Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against d564ef0.

yetval and others added 2 commits May 17, 2026 18:48
Native subagent spawn could return accepted while the run entry was
absent from ~/.openclaw/subagents/runs.json, so subagents list and
completion delivery lost track of the run. persistSubagentRunsToDisk
swallowed the save error, hiding the missing-registry symptom.

Make the initial registration fail closed: persist via a strict variant
that propagates write errors, and roll the in-memory entry back on
failure so spawn returns an actionable error instead of accepted.
Subsequent lifecycle updates keep the best-effort writer.

Refs #83132
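
The fail-closed registration described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `RunRegistry`, `RunEntry`, `SaveFn`, and the method names other than `persistOrThrow` are hypothetical, and the save step is injected as a plain function so the sketch stays self-contained rather than writing to `~/.openclaw/subagents/runs.json`.

```typescript
// Hypothetical shapes for illustration; the real types live in the openclaw source.
type RunEntry = { runId: string; task: string };
type SaveFn = (runs: Map<string, RunEntry>) => void;

class RunRegistry {
  private readonly runs = new Map<string, RunEntry>();
  private readonly save: SaveFn;

  constructor(save: SaveFn) {
    this.save = save;
  }

  // Best-effort writer: used for later lifecycle updates, swallows save errors.
  private persistBestEffort(): void {
    try {
      this.save(this.runs);
    } catch {
      // Ignored on purpose in this sketch; a later update can write again.
    }
  }

  // Strict writer: used only for the initial registration, propagates errors.
  private persistOrThrow(): void {
    this.save(this.runs);
  }

  // Initial registration fails closed: no durable save, no in-memory entry.
  register(entry: RunEntry): void {
    this.runs.set(entry.runId, entry);
    try {
      this.persistOrThrow();
    } catch (err) {
      // Roll back so the registry never lists a run that was not saved.
      this.runs.delete(entry.runId);
      throw err;
    }
  }

  // Lifecycle update: keeps the best-effort behavior.
  updateTask(runId: string, task: string): void {
    const entry = this.runs.get(runId);
    if (!entry) return;
    entry.task = task;
    this.persistBestEffort();
  }

  list(): RunEntry[] {
    return Array.from(this.runs.values());
  }
}
```

With a `save` function that throws (for example an ENOSPC-style error), `register` rethrows and `list()` stays empty; with a working `save`, the run is listed as before.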
clawsweeper Bot added the labels agents (Agent runtime and tooling), size: XS, clawsweeper:automerge (Maintainer opted this PR into bounded ClawSweeper-reviewed automerge), and proof: supplied (External PR includes structured after-fix real behavior proof) on May 17, 2026
clawsweeper Bot added the label proof: sufficient (ClawSweeper judged the real behavior proof convincing) on May 17, 2026
clawsweeper Bot added the labels P1 (High-priority user-facing bug, regression, or broken workflow), impact:session-state (Session, memory, transcript, context, or agent state can drift or corrupt), impact:message-loss (Channel message delivery can be lost, duplicated, or misrouted), and clawsweeper (Tracked by ClawSweeper automation) on May 17, 2026
clawsweeper Bot commented May 17, 2026

Contributor Author

Codex review: passed.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by maintainer comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors can comment @clawsweeper re-review or @clawsweeper re-run on their own open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
This PR adds a strict initial subagent registry persistence path, rolls back failed registrations, updates affected test seams, adds a regression test, and records the fix in the changelog.

Reproducibility: yes. Source inspection on current main shows registry save failures are swallowed after the run is added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.

Real behavior proof
Sufficient (terminal): The linked source PR includes copied live terminal output from a local checkout showing the strict persistence failure path rolls back and the happy path still lists the run.

Next step before merge
No repair lane needed; review found no actionable patch defect, so the automerge path should rely on exact-head checks.

Security
Cleared: The diff changes local subagent registry persistence and tests only; it does not add dependencies, workflow permissions, secret handling, or new code-execution paths.

Review details

Best possible solution:

Land this narrow fail-closed initial-registration fix once exact-head checks pass; leave broader orphan reconstruction and final-delivery recovery to the existing follow-up track.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main shows registry save failures are swallowed after the run is added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.

Is this the best way to solve the issue?

Yes. The PR uses strict persistence only for the initial acceptance invariant while preserving best-effort lifecycle updates, which is the narrow maintainable fix for this bug.

What I checked:

  • Current-main persistence gap: Current main still routes subagent registry saves through a helper that catches and ignores save failures, so a failed registry write can be hidden from the spawn acceptance path. (src/agents/subagent-registry-state.ts:7, f36a1b0c8120)
  • PR fail-closed registration path: The PR head adds persistOrThrow to the run manager and deletes the just-added in-memory entry before rethrowing when the initial durable save fails. (src/agents/subagent-registry-run-manager.ts:515, d564ef051d37)
  • Existing spawn rollback contract: The native subagent spawn caller already catches registerSubagentRun failures, rolls back prepared context and child session state, and returns an error instead of accepted. (src/agents/subagent-spawn.ts:1268, f36a1b0c8120)
  • Regression coverage: The PR adds coverage that injects a strict persistence failure, expects the registration call to throw, and verifies the run is not listed afterward. (src/agents/subagent-registry.test.ts:738, d564ef051d37)
  • Commit structure: The branch carries the contributor code commit 2a00c4a4a05561fd92f44a9026ad4f2164b35752 and a ClawSweeper changelog commit on top; the latter is the current PR head. (2a00c4a4a055)
  • Related proof from source PR: The linked source PR includes copied terminal output from a local checkout showing an ENOSPC-style strict persistence error is propagated and the in-memory entry is rolled back, with the happy path still listed. (2a00c4a4a055)
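
The spawn rollback contract noted in the list above (catch the registration failure, undo prepared state, return an error instead of accepted) could look something like the sketch below. All names here (`SpawnResult`, `SpawnDeps`, `spawnSubagent`, and the dependency methods) are hypothetical stand-ins, not the signatures in `src/agents/subagent-spawn.ts`.

```typescript
// Hypothetical result shape: either the run was accepted or spawn failed.
type SpawnResult =
  | { status: "accepted"; runId: string }
  | { status: "error"; error: string };

// Hypothetical dependencies, injected so the sketch stays self-contained.
interface SpawnDeps {
  prepareChildSession(runId: string): void;
  discardChildSession(runId: string): void;
  // May throw when the strict initial save fails.
  registerRun(runId: string): void;
}

function spawnSubagent(runId: string, deps: SpawnDeps): SpawnResult {
  deps.prepareChildSession(runId);
  try {
    deps.registerRun(runId);
  } catch (err) {
    // Undo the prepared child state so nothing dangles after a failed spawn.
    deps.discardChildSession(runId);
    const message = err instanceof Error ? err.message : String(err);
    return { status: "error", error: `subagent registration failed: ${message}` };
  }
  // Only reached when the registry entry was saved.
  return { status: "accepted", runId };
}
```

The point of the shape is that `accepted` is only reachable after registration returns normally, so a caller never sees an accepted run that the registry does not know about.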

Likely related people:

  • Peter Steinberger: Introduced early subagent registry/archive behavior and later split the subagent run manager that this PR edits; shortlog also shows the highest commit count in these files. (role: feature owner / major refactor author; confidence: high; commits: 75c66acfd828, cfbef8035dd1, 3b2db583cdb1; files: src/agents/subagent-registry.ts, src/agents/subagent-registry-run-manager.ts)
  • @Takhoffman: Has a dense run of recent subagent registry cleanup, registration-failure, and session-key fixes, and requested automerge for this replacement PR. (role: recent area contributor; confidence: high; commits: 79ef86c305ef, a77f76b4d07d, c90ae1ee7f63; files: src/agents/subagent-registry.ts, src/agents/subagent-registry-run-manager.ts, src/agents/subagent-spawn.ts)
  • @vincentkoc: Recent task-run and subagent hook/bootstrap refactors touch the same registration and completion-management surface. (role: adjacent owner; confidence: medium; commits: ec13f6d73eb5, f17fd735efd9, 66b8de9c8361; files: src/agents/subagent-registry-run-manager.ts, src/agents/subagent-registry.ts)
  • @yetval: Authored the source fix carried into this replacement PR and is co-credited on the already-merged adjacent subagent registry fix for the same bug family. (role: recent adjacent contributor; confidence: medium; commits: 2a00c4a4a055, 3e765263dd6a; files: src/agents/subagent-registry-run-manager.ts, src/agents/subagent-registry-state.ts, src/agents/subagent-registry.test.ts)

Remaining risk / open question:

  • This read-only review did not execute tests; exact-head CI should still gate automerge.

Codex review notes: model gpt-5.5, reasoning high; reviewed against f36a1b0c8120.

clawsweeper Bot commented May 17, 2026

Contributor Author

🦞✅
ClawSweeper merged this PR after the passing review.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=d564ef051d37595b08e08bd47f81e11b388909b4)
Merge status: merged by ClawSweeper automerge
Merged at: 2026-05-17T19:11:02Z
Merge commit: 214f718be7b3

What merged:

  • This PR adds a strict initial subagent registry persistence path, rolls back failed registrations, updates affected test seams, adds a regression test, and records the fix in the changelog.
  • Reproducibility: yes. Source inspection on current main shows registry save failures are swallowed after the ... s added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.

Automerge notes:

The automerge loop is complete.

Automerge progress:

  • 2026-05-17 19:04:05 UTC review queued d564ef051d37 (queued)
  • 2026-05-17 19:10:51 UTC review passed d564ef051d37 (structured ClawSweeper verdict: pass (sha=d564ef051d37595b08e08bd47f81e11b38890...)
  • 2026-05-17 19:11:05 UTC merged d564ef051d37 (merged by ClawSweeper automerge)

openclaw-barnacle Bot removed the labels proof: supplied (External PR includes structured after-fix real behavior proof) and proof: sufficient (ClawSweeper judged the real behavior proof convincing) on May 17, 2026
clawsweeper Bot added the label proof: sufficient (ClawSweeper judged the real behavior proof convincing) on May 17, 2026
clawsweeper Bot merged commit 214f718 into main on May 17, 2026
195 of 207 checks passed
clawsweeper Bot deleted the clawsweeper/automerge-openclaw-openclaw-83146 branch on May 17, 2026 19:11
frankhli843 added a commit to gemmaclaw/gemmaclaw that referenced this pull request May 19, 2026
* fix(gateway): clear CLI bindings on session reset

* fix(gateway): preserve spawned sessions in configured lists

* fix(channels): clear canonical stale routes

* fix(telegram): preserve forum topic origin targets

* fix(agents): skip fallback for session coordination errors

* fix(agents): persist subagent registry before returning accepted (openclaw#83132) (openclaw#83238)

* fix(memory): catch up stale sessions on startup (openclaw#82341)

* fix(memory): preserve qmd lexical search for hyphenated queries (openclaw#81423)

* fix(anthropic): preserve Claude image capability (openclaw#83756)

* fix(agents): exclude tool result details from guard budget (openclaw#75525)

* fix(provider): use Together video API endpoint

* fix(telegram): preserve implicit default account (openclaw#82794)

* fix(gateway): allow trusted-proxy local-direct password fallback (openclaw#82953)

* fix(discord): return subagent thread delivery origin

* fix: add missing prerequisites for upstream-ported fixes

Add SessionWriteLockTimeoutError class and hasSessionWriteLockTimeout
helper needed by the ported fix(agents) skip-fallback commit. Remove
route property references from session-delivery.ts that don't exist in
gemmaclaw's SessionEntry type. Add authorizePasswordAuth helper that was
present in upstream but missing from gemmaclaw's auth.ts.

* fix: remove route assertions incompatible with gemmaclaw SessionEntry

Remove test assertions using .route property that exists in upstream's
SessionEntry type but not in gemmaclaw's, restoring typecheck green.

* fix(memory-core): yield event loop during fallback vector search (openclaw#81172) (openclaw#83758)

Summary:
- The branch changes memory-core fallback vector search to scan chunks in 256-row rowid batches with `setImmediate` yields, updates regression tests, and adds a changelog entry.
- Reproducibility: yes. from source and supplied live output. Current main synchronously scans fallback vector ...  and the PR body shows the before/after heartbeat behavior through the actual `searchVector` fallback path.

Automerge notes:
- PR branch already contained follow-up commit before automerge: test(memory-core): add boundary, parity, and concurrent-insert covera…
- PR branch already contained follow-up commit before automerge: fix(memory-core): yield event loop during fallback vector search (#81…

Validation:
- ClawSweeper review passed for head 0ede3d7.
- Required merge gates passed before the squash merge.

Prepared head SHA: 0ede3d7
Review: openclaw#83758 (comment)

Co-authored-by: NW <nitinwadhawan66@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>

* fix(subagents): collect unresolved announce batches (openclaw#83701)

Summary:
- The PR changes collect-mode follow-up queue routing so unresolved-origin items can batch with a single resolved route and later compatible items can resume batching after a true cross-channel drain.
- Reproducibility: yes. at source level: current main treats unkeyed-plus-same-keyed queue items as cross-chan ... failing path is directly visible in `src/utils/queue-helpers.ts` and `src/auto-reply/reply/queue/drain.ts`.

Automerge notes:
- PR branch already contained follow-up commit before automerge: Merge remote-tracking branch 'origin/main' into maint-83701-20260518

Validation:
- ClawSweeper review passed for head e6ad029.
- Required merge gates passed before the squash merge.

Prepared head SHA: e6ad029
Review: openclaw#83701 (comment)

Co-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>

* fix(config): accept gateway remote port

* fix: restore Array<{}> closing bracket in manager-search.ts

Cherry-pick 68b3729 accidentally dropped the '>' from '}>',
producing a syntax error. Restore '}>;' as it was in origin/main.

* fix: add remotePort to GatewayRemoteConfig and GatewayRemoteConfigSchema

* fix(agents): prioritize manual session turns (openclaw#82765)

* fix(agents): prioritize manual session turns

* docs: update changelog for session priority

---------

Co-authored-by: Galin Iliev <Galin.Iliev@microsoft.com>

* revert: fix(agents): prioritize manual session turns (openclaw#82765) - upstream deps not in gemmaclaw

* fix: resolve undefined variable errors in cherry-picked extension code

* fix(tui): preserve draft while chat is busy

* fix(tui): add pendingChatRunId to TuiStateAccess for cherry-picked tui commit

* fix(memory-wiki): make wiki_lint tool output path-safe (openclaw#83687)

* fix(ui): render session-scoped tool events (openclaw#83734)

* chore: regenerate base config schema after upstream cherry-picks

* fix(agents): add persistSubagentRunsToDiskOrThrow to subagent-registry test mock

New export added to subagent-registry-state.ts was missing from the
vi.mock definition, causing all tests in the suite to skip and the
module to fail to load.

* fix(telegram): wire buildTelegramInboundOriginTarget into session context

Cherry-pick 675e053 added the helper and the test assertion but did not
update bot-message-context.session.ts to use it. OriginatingTo now
correctly includes :topic:<id> for forum groups.

* fix(memory): correct session path format in startup-catchup test

sessionPathForFile returns sessions/<basename> (no agent dir), but the
cherry-picked test used sessions/main/<basename>. The clean-file test
always failed because the path mismatch made every file look unindexed.

* fix(together): update video generation test URL from v1 to v2

The source uses TOGETHER_VIDEO_BASE_URL = https://api.together.xyz/v2
but the cherry-picked test still asserted the old v1 URL.

---------

Co-authored-by: nitinjwadhawan <nitinwadhawan66@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
Co-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Co-authored-by: Galin Iliev <iliev@galcho.com>
Co-authored-by: Galin Iliev <Galin.Iliev@microsoft.com>
Co-authored-by: Harry Xie <harryhsieh963@yahoo.com>
markfietje pushed a commit to markfietje/openclaw that referenced this pull request May 20, 2026
…132) (#83238)

Summary:
- This PR adds a strict initial subagent registry persistence path, rolls back failed registrations, updates affected test seams, adds a regression test, and records the fix in the changelog.
- Reproducibility: yes. Source inspection on current main shows registry save failures are swallowed after the ... s added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): persist subagent registry before returning accepted (#83…

Validation:
- ClawSweeper review passed for head d564ef051d37595b08e08bd47f81e11b388909b4.
- Required merge gates passed before the squash merge.

Prepared head SHA: d564ef051d37595b08e08bd47f81e11b388909b4
Review: openclaw/openclaw#83238 (comment)

Co-authored-by: yetval <yetvald@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 20, 2026
…nclaw#83132) (openclaw#83238)

Summary:
- This PR adds a strict initial subagent registry persistence path, rolls back failed registrations, updates affected test seams, adds a regression test, and records the fix in the changelog.
- Reproducibility: yes. Source inspection on current main shows registry save failures are swallowed after the ... s added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): persist subagent registry before returning accepted (openclaw#83…

Validation:
- ClawSweeper review passed for head d564ef0.
- Required merge gates passed before the squash merge.

Prepared head SHA: d564ef0
Review: openclaw#83238 (comment)

Co-authored-by: yetval <yetvald@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
…nclaw#83132) (openclaw#83238)

Summary:
- This PR adds a strict initial subagent registry persistence path, rolls back failed registrations, updates affected test seams, adds a regression test, and records the fix in the changelog.
- Reproducibility: yes. Source inspection on current main shows registry save failures are swallowed after the ... s added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): persist subagent registry before returning accepted (openclaw#83…

Validation:
- ClawSweeper review passed for head d564ef0.
- Required merge gates passed before the squash merge.

Prepared head SHA: d564ef0
Review: openclaw#83238 (comment)

Co-authored-by: yetval <yetvald@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
…nclaw#83132) (openclaw#83238)

Summary:
- This PR adds a strict initial subagent registry persistence path, rolls back failed registrations, updates affected test seams, adds a regression test, and records the fix in the changelog.
- Reproducibility: yes. Source inspection on current main shows registry save failures are swallowed after the ... s added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): persist subagent registry before returning accepted (openclaw#83…

Validation:
- ClawSweeper review passed for head d564ef0.
- Required merge gates passed before the squash merge.

Prepared head SHA: d564ef0
Review: openclaw#83238 (comment)

Co-authored-by: yetval <yetvald@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026
…nclaw#83132) (openclaw#83238)

Summary:
- This PR adds a strict initial subagent registry persistence path, rolls back failed registrations, updates affected test seams, adds a regression test, and records the fix in the changelog.
- Reproducibility: yes. Source inspection on current main shows registry save failures are swallowed after the ... s added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): persist subagent registry before returning accepted (openclaw#83…

Validation:
- ClawSweeper review passed for head d564ef0.
- Required merge gates passed before the squash merge.

Prepared head SHA: d564ef0
Review: openclaw#83238 (comment)

Co-authored-by: yetval <yetvald@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
…nclaw#83132) (openclaw#83238)

Summary:
- This PR adds a strict initial subagent registry persistence path, rolls back failed registrations, updates affected test seams, adds a regression test, and records the fix in the changelog.
- Reproducibility: yes. Source inspection on current main shows registry save failures are swallowed after the ... s added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): persist subagent registry before returning accepted (openclaw#83…

Validation:
- ClawSweeper review passed for head d564ef0.
- Required merge gates passed before the squash merge.

Prepared head SHA: d564ef0
Review: openclaw#83238 (comment)

Co-authored-by: yetval <yetvald@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
Feelw00 added a commit to Feelw00/openclaw that referenced this pull request May 29, 2026
…qlite divergence

updateTask and deleteTaskRecordById committed the in-memory tasks /
taskDeliveryStates maps (and owner/parentFlow/relatedSession/runId
indexes) before calling the sqlite persist helpers, and neither persist
call was wrapped in a try/catch. The default store always binds the
transactional variants (upsertTaskWithDeliveryState /
deleteTaskWithDeliveryState), so persist always goes through
withWriteTransaction(BEGIN IMMEDIATE), which ROLLBACKs and re-throws on
SQLITE_BUSY (multi-writer busy_timeout exceeded), SQLITE_FULL (disk
full), or SQLITE_IOERR. The unhandled throw left the in-memory state
mutated while sqlite kept the prior row, so the two stores diverged:
reload/restart resurrected deleted tasks (lost-delete), and the
sqlite-direct reader listFreshTasksForOwnerKey (used by
media-generation-task-status-shared) returned rows that contradicted the
in-memory path.

Persist to the store first and only mutate the in-memory maps and indexes
after persist succeeds, so a persist throw leaves the in-memory state
untouched and consistent with sqlite.

Make the delete persist a single atomic store operation. The composite
deleteTaskWithDeliveryState already removes the task and its delivery
state in one transaction, so the previously trailing
persistTaskDeliveryStateDelete was a redundant, non-transactional write
that, if it threw after the composite committed, dropped the task from
sqlite while memory still held it. Stores without a composite delete now
persist the removal of both records through one projected snapshot rather
than separate deleteTask / deleteDeliveryState calls, which would leave a
delivery-state row behind or reintroduce a two-write divergence window.
The snapshot fallbacks in the persist helpers project the pending change
so they stay correct under the persist-before-in-memory ordering.
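
A rough sketch of the single-write delete described above, under stated assumptions: `deleteTaskWithDeliveryState` is the composite named in this message, while `saveSnapshot`, `SketchDeleteStore`, and `deleteTaskRecordSketch` are hypothetical names chosen for illustration. The fallback branch projects the pending removal into one snapshot instead of issuing two separate deletes.

```typescript
type SketchTaskRow = { id: string };
type SketchDeliveryRow = { taskId: string; state: string };
type SketchSnapshot = { tasks: SketchTaskRow[]; deliveryStates: SketchDeliveryRow[] };

interface SketchDeleteStore {
  // Optional composite delete: removes the task and its delivery state in one transaction.
  deleteTaskWithDeliveryState?(taskId: string): void;
  // Hypothetical fallback: replace the persisted state with one full snapshot.
  saveSnapshot(snapshot: SketchSnapshot): void;
}

function deleteTaskRecordSketch(
  store: SketchDeleteStore,
  tasks: Map<string, SketchTaskRow>,
  deliveryStates: Map<string, SketchDeliveryRow>,
  taskId: string,
): boolean {
  if (!tasks.has(taskId)) return false;

  if (store.deleteTaskWithDeliveryState) {
    // One atomic store operation; no trailing delivery-state delete.
    store.deleteTaskWithDeliveryState(taskId);
  } else {
    // Project the pending removal without touching the live maps, so a
    // throw here leaves memory exactly as it was.
    store.saveSnapshot({
      tasks: Array.from(tasks.values()).filter((t) => t.id !== taskId),
      deliveryStates: Array.from(deliveryStates.values()).filter((d) => d.taskId !== taskId),
    });
  }

  // Only reached when the single persist call succeeded.
  tasks.delete(taskId);
  deliveryStates.delete(taskId);
  return true;
}
```

Because each branch performs exactly one store call before the maps change, there is no point at which one record has been removed from the store and the other has not.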

Store contracts are unchanged. Mirrors the persist-before-in-memory
fix-shape of openclaw#83238. Adds regression coverage for the upsert divergence,
the composite delete (no redundant second delete), the snapshot-only
delete (no resurrection), and non-composite stores with separate delete
methods (both records removed atomically).

AI-assisted: drafted with claude code (claude-opus-4-8).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Feelw00 added a commit to Feelw00/openclaw that referenced this pull request May 31, 2026
…qlite divergence

steipete pushed a commit to Feelw00/openclaw that referenced this pull request May 31, 2026
…qlite divergence

jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
…nclaw#83132) (openclaw#83238)

Summary:
- This PR adds a strict initial subagent registry persistence path, rolls back failed registrations, updates affected test seams, adds a regression test, and records the fix in the changelog.
- Reproducibility: yes. Source inspection on current main shows registry save failures are swallowed after the ... s added, and the linked source PR provides an ENOSPC-style after-fix terminal proof for the corrected path.
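
The fail-closed registration summarized above might look roughly like the sketch below. All names here (`SubagentRegistrySketch`, `SketchRun`, `SketchSpawnResult`, the `save` callback) are hypothetical stand-ins rather than the repository's identifiers; the sketch only illustrates a strict first write with rollback alongside a best-effort writer for later updates.

```typescript
type SketchRun = { runId: string; status: "running" | "done" };
type SketchSpawnResult =
  | { status: "accepted"; runId: string }
  | { status: "error"; error: string };

class SubagentRegistrySketch {
  readonly runs = new Map<string, SketchRun>();
  private readonly save: (runs: SketchRun[]) => void;

  // `save` stands in for writing the registry file; it may throw (e.g. ENOSPC).
  constructor(save: (runs: SketchRun[]) => void) {
    this.save = save;
  }

  // Strict variant: write errors propagate to the caller.
  private persistStrict(): void {
    this.save(Array.from(this.runs.values()));
  }

  // Best-effort variant kept for later lifecycle updates: errors are swallowed.
  private persistBestEffort(): void {
    try {
      this.persistStrict();
    } catch {
      // Intentionally ignored in this sketch.
    }
  }

  register(run: SketchRun): SketchSpawnResult {
    this.runs.set(run.runId, run);
    try {
      this.persistStrict();
    } catch (err) {
      // Roll the in-memory entry back so memory and disk agree,
      // and report an error instead of "accepted".
      this.runs.delete(run.runId);
      return { status: "error", error: `failed to persist subagent run: ${String(err)}` };
    }
    return { status: "accepted", runId: run.runId };
  }

  markDone(runId: string): void {
    const run = this.runs.get(runId);
    if (!run) return;
    run.status = "done";
    this.persistBestEffort();
  }
}
```

Only the initial registration is strict in this sketch, which matches the split described in the PR: a caller never sees `accepted` for a run that was not written, while later status updates do not fail the run on a transient write error.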

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): persist subagent registry before returning accepted (openclaw#83…

Validation:
- ClawSweeper review passed for head d564ef0.
- Required merge gates passed before the squash merge.

Prepared head SHA: d564ef0
Review: openclaw#83238 (comment)

Co-authored-by: yetval <yetvald@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
…nclaw#83132) (openclaw#83238)

sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
…nclaw#83132) (openclaw#83238)

Labels

- `agents`: Agent runtime and tooling
- `clawsweeper:automerge`: Maintainer opted this PR into bounded ClawSweeper-reviewed automerge
- `clawsweeper`: Tracked by ClawSweeper automation
- `impact:message-loss`: Channel message delivery can be lost, duplicated, or misrouted.
- `impact:session-state`: Session, memory, transcript, context, or agent state can drift or corrupt.
- `P1`: High-priority user-facing bug, regression, or broken workflow.
- `proof: sufficient`: ClawSweeper judged the real behavior proof convincing.
- `size: S`
