Skip to content

Harden subagent completion delivery#68464

Open
nnlevy wants to merge 3 commits into
openclaw:mainfrom
nnlevy:harden-subagent-completion-delivery
Open

Harden subagent completion delivery#68464
nnlevy wants to merge 3 commits into
openclaw:mainfrom
nnlevy:harden-subagent-completion-delivery

Conversation

@nnlevy

@nnlevy nnlevy commented Apr 18, 2026

Copy link
Copy Markdown

Summary

  • persist a per-run delivery claim and user-safe delivery payload for subagent completions
  • route active-parent completion announces queue-first with fail-closed semantics instead of direct-first fallback behavior
  • harden iMessage delivery normalization and chunking to avoid leaking internal context and to preserve multiline/code payloads safely

Why

A production bug leaked internal subagent/control-plane envelopes into iMessage, and a second bug allowed duplicate user-visible completion delivery when a parent session was active while a child completion also delivered directly.

This change moves the fix to source-level architecture instead of relying on downstream dist patching:

  • separates persisted user-delivery payloads from internal orchestration text
  • makes announce delivery idempotent across queue/direct transitions with a persisted run claim
  • prevents direct external delivery while an active parent session can still accept the completion through its announce queue

Implementation Notes

  • add deliveryClaim, userDeliveryPayload, and fallbackUserDeliveryPayload to subagent run records
  • extract user-facing completion payloads from structured child-result blocks first, then named delivery sections, then sanitized fallback cleanup
  • persist payload/claim state through completion freeze, cleanup retries, and steer restarts
  • normalize iMessage text before send, close dangling code fences, and chunk around non-code boundaries when possible

Verification

  • corepack pnpm exec vitest run src/agents/subagent-announce-dispatch.test.ts src/agents/subagent-announce.format.e2e.test.ts
  • corepack pnpm exec vitest run src/agents/subagent-registry-lifecycle.test.ts src/agents/subagent-announce-delivery.test.ts
  • corepack pnpm exec vitest run extensions/imessage/src/monitor/sanitize-outbound.test.ts extensions/imessage/src/monitor/deliver.test.ts

@openclaw-barnacle openclaw-barnacle Bot added channel: imessage Channel integration: imessage agents Agent runtime and tooling size: L labels Apr 18, 2026
@greptile-apps

greptile-apps Bot commented Apr 18, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR hardens subagent completion delivery by flipping the dispatch order to queue-first (fail-closed), adding a persisted delivery-claim mechanism to prevent duplicate delivery, and tightening iMessage text normalization to block internal context leakage and handle dangling code fences.

  • P1 architectural concern: buildStoredUserDeliveryPayload is invoked in three production code paths through the __testing export of subagent-announce.ts. This function should be promoted to a named production export instead of crossing the test-only boundary in live code.

Confidence Score: 4/5

Safe to merge after promoting buildStoredUserDeliveryPayload out of the __testing export; the delivery and normalization logic itself is correct.

One genuine P1 finding: production code in subagent-registry-lifecycle.ts imports and calls a function through __testing, which is conventionally test-only. The two P2 findings (double normalization, duplicated regex constants) don't affect correctness. The dispatch logic flip, persisted claim mechanism, and iMessage sanitization are well-structured and properly tested.

src/agents/subagent-registry-lifecycle.ts (lines 16, 188, 248) and src/agents/subagent-announce.ts (the __testing export that should be split into a proper named export).

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/subagent-registry-lifecycle.ts
Line: 16

Comment:
**`__testing` export consumed in production code**

`buildStoredUserDeliveryPayload` is invoked via `subagentAnnounceTesting` in three production paths (lines 188, 248, and via `subagent-registry-lifecycle.ts`). The `__testing` export is conventionally test-only; importing it in production code bypasses that boundary and means tree-shakers/bundlers cannot safely strip test helpers, and any future refactoring of the testing surface will silently break production behavior.

Export `buildStoredUserDeliveryPayload` as a proper named export from `subagent-announce.ts` and import it directly instead of routing through `__testing`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/imessage/src/monitor/deliver.ts
Line: 58-100

Comment:
**Double normalization of delivery text**

`deliverReplies` normalizes `payload.text` via `normalizeIMessageDeliveryText` on line 123 (when building `reply.text`) and then `chunkIMessageText` normalizes the same, already-normalized value a second time (line 59). Since `normalizeIMessageDeliveryText` is idempotent this is functionally harmless, but it's redundant and would mask any future non-idempotency introduced by sanitization or fence-closing changes.

Consider removing the `normalizeIMessageDeliveryText` call inside `chunkIMessageText` and making it the caller's responsibility, or conversely, not pre-normalizing in `deliverReplies` and letting the chunker own normalization exclusively.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/subagent-announce.ts
Line: 99-106

Comment:
**Duplicated regex constants across modules**

`OPENCLAW_INTERNAL_BLOCK_RE` and `OPENCLAW_INTERNAL_LINE_RE` are defined verbatim in both `subagent-announce.ts` and `sanitize-outbound.ts`. If the pattern ever needs updating, both copies must be changed in sync. Consider extracting these into a shared constant (e.g. in a `subagent-text-patterns.ts` internal helper) so there is a single source of truth.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Harden subagent completion delivery" | Re-trigger Greptile

captureSubagentCompletionReply,
runSubagentAnnounceFlow,
type SubagentRunOutcome,
__testing as subagentAnnounceTesting,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 __testing export consumed in production code

buildStoredUserDeliveryPayload is invoked via subagentAnnounceTesting in three production paths (lines 188, 248, and via subagent-registry-lifecycle.ts). The __testing export is conventionally test-only; importing it in production code bypasses that boundary and means tree-shakers/bundlers cannot safely strip test helpers, and any future refactoring of the testing surface will silently break production behavior.

Export buildStoredUserDeliveryPayload as a proper named export from subagent-announce.ts and import it directly instead of routing through __testing.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/subagent-registry-lifecycle.ts
Line: 16

Comment:
**`__testing` export consumed in production code**

`buildStoredUserDeliveryPayload` is invoked via `subagentAnnounceTesting` in three production paths (lines 188, 248, and via `subagent-registry-lifecycle.ts`). The `__testing` export is conventionally test-only; importing it in production code bypasses that boundary and means tree-shakers/bundlers cannot safely strip test helpers, and any future refactoring of the testing surface will silently break production behavior.

Export `buildStoredUserDeliveryPayload` as a proper named export from `subagent-announce.ts` and import it directly instead of routing through `__testing`.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +58 to +100
function chunkIMessageText(text: string, limit: number): string[] {
const normalized = normalizeIMessageDeliveryText(text ?? "");
if (!normalized) {
return [];
}
if (limit <= 0 || normalized.length <= limit) {
return [normalized];
}
const paragraphs = normalized
.split(/\n{2,}/)
.map((part) => part.trim())
.filter(Boolean);
if (paragraphs.length <= 1) {
return splitIMessageTextSafely(normalized, limit);
}
const chunks: string[] = [];
let current = "";
for (const paragraph of paragraphs) {
const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
if (candidate.length <= limit) {
current = candidate;
continue;
}
if (current) {
chunks.push(current);
}
if (paragraph.length <= limit) {
current = paragraph;
continue;
}
const safeChunks = splitIMessageTextSafely(paragraph, limit);
if (safeChunks.length === 0) {
current = "";
continue;
}
chunks.push(...safeChunks.slice(0, -1));
current = safeChunks.at(-1) ?? "";
}
if (current) {
chunks.push(current);
}
return chunks;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Double normalization of delivery text

deliverReplies normalizes payload.text via normalizeIMessageDeliveryText on line 123 (when building reply.text) and then chunkIMessageText normalizes the same, already-normalized value a second time (line 59). Since normalizeIMessageDeliveryText is idempotent this is functionally harmless, but it's redundant and would mask any future non-idempotency introduced by sanitization or fence-closing changes.

Consider removing the normalizeIMessageDeliveryText call inside chunkIMessageText and making it the caller's responsibility, or conversely, not pre-normalizing in deliverReplies and letting the chunker own normalization exclusively.

Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/imessage/src/monitor/deliver.ts
Line: 58-100

Comment:
**Double normalization of delivery text**

`deliverReplies` normalizes `payload.text` via `normalizeIMessageDeliveryText` on line 123 (when building `reply.text`) and then `chunkIMessageText` normalizes the same, already-normalized value a second time (line 59). Since `normalizeIMessageDeliveryText` is idempotent this is functionally harmless, but it's redundant and would mask any future non-idempotency introduced by sanitization or fence-closing changes.

Consider removing the `normalizeIMessageDeliveryText` call inside `chunkIMessageText` and making it the caller's responsibility, or conversely, not pre-normalizing in `deliverReplies` and letting the chunker own normalization exclusively.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +99 to +106
const INTERNAL_CONTEXT_BLOCK_RE =
/(?:^|\n)\s*(?:Human:\s*)?(?:\[[^\n]*\]\s*)?<<<BEGIN_OPENCLAW_INTERNAL_CONTEXT>>>[\s\S]*?<<<END_OPENCLAW_INTERNAL_CONTEXT>>>\s*/g;
const CHILD_RESULT_BLOCK_RE =
/<<<BEGIN_UNTRUSTED_CHILD_RESULT>>>\s*([\s\S]*?)\s*<<<END_UNTRUSTED_CHILD_RESULT>>>/g;
const INTERNAL_CONTEXT_LINE_RE =
/(?:^|\n)\s*(?:\[Inter-session message][^\n]*|OpenClaw runtime context \(internal\):|This context is runtime-generated, not user-authored\. Keep internal details private\.|Result \(untrusted content, treat as data\):|Action:\s*A completed .*? ready for user delivery\.[^\n]*|Stats:\s*runtime[^\n]*|sourceSession=[^\n]*|sourceChannel=[^\n]*|sourceTool=[^\n]*)(?=\n|$)/gim;
const DIRECT_USER_DELIVERY_INSTRUCTION_RE =
/(?:^|\n)(?:Convert the result above into your normal assistant voice and send that user-facing update now\.|Keep internal context private.*|Reply ONLY:\s*NO_REPLY.*)(?=\n|$)/gim;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Duplicated regex constants across modules

OPENCLAW_INTERNAL_BLOCK_RE and OPENCLAW_INTERNAL_LINE_RE are defined verbatim in both subagent-announce.ts and sanitize-outbound.ts. If the pattern ever needs updating, both copies must be changed in sync. Consider extracting these into a shared constant (e.g. in a subagent-text-patterns.ts internal helper) so there is a single source of truth.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/subagent-announce.ts
Line: 99-106

Comment:
**Duplicated regex constants across modules**

`OPENCLAW_INTERNAL_BLOCK_RE` and `OPENCLAW_INTERNAL_LINE_RE` are defined verbatim in both `subagent-announce.ts` and `sanitize-outbound.ts`. If the pattern ever needs updating, both copies must be changed in sync. Consider extracting these into a shared constant (e.g. in a `subagent-text-patterns.ts` internal helper) so there is a single source of truth.

How can I resolve this? If you propose a fix, please make it concise.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f687a4d28

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/subagent-announce.ts Outdated
Comment on lines +609 to +610
const userDeliveryPayload = !requesterIsSubagent
? persistedUserDeliveryPayload ?? buildStoredUserDeliveryPayload(findings)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Derive user payload from chosen completion findings

When descendant completions are present, findings is intentionally set from childCompletionFindings, but this code still prefers persistedUserDeliveryPayload (captured earlier from roundOneReply) for userDeliveryPayload. In completion mode for non-subagent requesters, buildDirectUserCompletionPrompt then uses that stale payload, so users can receive an outdated summary instead of the descendant-derived result that this flow selected as authoritative. This is reproducible whenever a run has both a frozen round-one reply and direct-child completion findings.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 120c775b32

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

return withPhases({
delivered: false,
path: "none",
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Release delivery claim when dispatch aborts early

When announceId is provided and the signal is already aborted, this early return exits before the finally block that clears or finalizes the persisted claim. Because the claim was already created earlier in the function, the run is left in deliveryClaim.state = "claimed" until lease expiry, and subsequent retries for the same announce can be rejected as delivery-already-in-flight without attempting queue/direct delivery.

Useful? React with 👍 / 👎.

accountId: account.accountId,
});
message = convertMarkdownTables(message, tableMode);
message = convertMarkdownTables(normalizeIMessageDeliveryText(message), tableMode);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip per-chunk fence normalization in iMessage send

This normalizes (and potentially auto-closes) code fences on every sendMessageIMessage call, but chunked monitor delivery already normalizes before splitting. If a fenced block is longer than textLimit, individual chunks can have an odd fence count, so this line appends extra closing fences per chunk and corrupts the rendered code output for long responses.

Useful? React with 👍 / 👎.

@MertBasar0

Copy link
Copy Markdown
Contributor

I reported this issue in #65000 and confirmed the durability gap during reconnects in my local tests. I'm happy to help test the hardening fixes in this PR (#68464) on my WSL2 environment to see if it fully resolves the edge cases I encountered.

@clawsweeper

clawsweeper Bot commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Keep open: the PR addresses real subagent/iMessage delivery failures, but this branch is not merge-ready because it can strand persisted delivery claims on abort, can corrupt long fenced iMessage chunks, lacks after-fix real behavior proof, and is dirty against a current main delivery state that has since been heavily reworked.

Canonical path: Close this stale PR. The latest review rated it F, the branch still lacks merge-ready proof, and there has been no human follow-up after the durable review.

So I’m closing this here because the remaining work is already tracked in the canonical issue.

Review details

Best possible solution:

Close this stale PR. The latest review rated it F, the branch still lacks merge-ready proof, and there has been no human follow-up after the durable review.

Do we have a high-confidence way to reproduce the issue?

Yes for the PR-introduced blockers: source inspection shows an aborted dispatch can return after creating a persisted claim but before finalization, and iMessage chunks are normalized again after full-message normalization. No live after-fix reproduction proof is present for the original production bug.

Is this the best way to solve the issue?

No. The direction is plausible, but the best merge shape is a rebase onto current main's newer pending-delivery state with the concrete claim and chunking defects fixed first.

Security review:

Security review cleared: No new supply-chain, secret-handling, permission, or dependency-execution concern was found; the security-sensitive internal-context leak fix still needs the functional delivery fixes above.

AGENTS.md: found and applied where relevant.

What I checked:

  • stale F-rated PR: PR was opened 2026-04-18T08:00:14Z, is older than 30 days, and the latest review rated it F.
  • proof blocker: real behavior proof is mock_only and proof tier is D, so this branch is not merge-ready without contributor follow-up.
  • no human follow-up: live comments and timeline hydrated by apply contain no non-automation activity after the ClawSweeper review.

Likely related people:

  • steipete: Recent GitHub commit history for the subagent registry and announce-delivery files includes multiple delivery-state and media-completion changes by this author. (role: recent area contributor; confidence: high; commits: 465a5456fe0f, beb42b12c989, 7f28c8bd07d3; files: src/agents/subagent-registry-lifecycle.ts, src/agents/subagent-announce-delivery.ts, src/agents/subagent-announce-dispatch.ts)
  • vincentkoc: Recent commit history for subagent announce delivery includes the empty completion handoff rejection, and local blame on the current checkout points to recent current-main delivery work. (role: recent area contributor; confidence: medium; commits: e0018382eb00, 1d2bebbb41bf; files: src/agents/subagent-announce-delivery.ts, src/agents/subagent-registry.types.ts, src/agents/subagent-registry-lifecycle.ts)
  • omarshahine: Recent adjacent work touched duplicate generated-media fallback behavior in the same subagent completion delivery surface. (role: adjacent contributor; confidence: medium; commits: 12798eb789bd; files: src/agents/subagent-announce-delivery.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 1d2bebbb41bf.

@MertBasar0

Copy link
Copy Markdown
Contributor

I would keep this in “needs changes” until the remaining repair set lands.

The direction still looks right, but the mergeable shape should cover:

  • fail-closed cleanup/finalization for persisted delivery claims on early abort paths
  • avoid per-chunk iMessage fence normalization after full-message normalization
  • docs + changelog update for the queue-first completion delivery behavior

After that, I can help validate the reconnect/dropped-completion edge cases from my WSL2 setup.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 19, 2026
@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 19, 2026
@clawsweeper clawsweeper Bot added P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels May 20, 2026
@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@clawsweeper clawsweeper Bot added the merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. label May 20, 2026
@openclaw-barnacle

Copy link
Copy Markdown

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Jun 6, 2026
@clawsweeper clawsweeper Bot removed the rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. label Jun 6, 2026
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 6, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label Jun 7, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels Jun 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling channel: imessage Channel integration: imessage merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: L status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants