Harden subagent completion delivery by nnlevy · Pull Request #68464 · openclaw/openclaw

nnlevy · 2026-04-18T08:00:14Z

Summary

persist a per-run delivery claim and user-safe delivery payload for subagent completions
route active-parent completion announces queue-first with fail-closed semantics instead of direct-first fallback behavior
harden iMessage delivery normalization and chunking to avoid leaking internal context and to preserve multiline/code payloads safely

Why

A production bug leaked internal subagent/control-plane envelopes into iMessage, and a second bug allowed duplicate user-visible completion delivery when a parent session was active while a child completion also delivered directly.

This change moves the fix to source-level architecture instead of relying on downstream dist patching:

separates persisted user-delivery payloads from internal orchestration text
makes announce delivery idempotent across queue/direct transitions with a persisted run claim
prevents direct external delivery while an active parent session can still accept the completion through its announce queue

Implementation Notes

add deliveryClaim, userDeliveryPayload, and fallbackUserDeliveryPayload to subagent run records
extract user-facing completion payloads from structured child-result blocks first, then named delivery sections, then sanitized fallback cleanup
persist payload/claim state through completion freeze, cleanup retries, and steer restarts
normalize iMessage text before send, close dangling code fences, and chunk around non-code boundaries when possible

Verification

corepack pnpm exec vitest run src/agents/subagent-announce-dispatch.test.ts src/agents/subagent-announce.format.e2e.test.ts
corepack pnpm exec vitest run src/agents/subagent-registry-lifecycle.test.ts src/agents/subagent-announce-delivery.test.ts
corepack pnpm exec vitest run extensions/imessage/src/monitor/sanitize-outbound.test.ts extensions/imessage/src/monitor/deliver.test.ts

greptile-apps · 2026-04-18T08:04:20Z

Greptile Summary

This PR hardens subagent completion delivery by flipping the dispatch order to queue-first (fail-closed), adding a persisted delivery-claim mechanism to prevent duplicate delivery, and tightening iMessage text normalization to block internal context leakage and handle dangling code fences.

P1 architectural concern: buildStoredUserDeliveryPayload is invoked in three production code paths through the __testing export of subagent-announce.ts. This function should be promoted to a named production export instead of crossing the test-only boundary in live code.

Confidence Score: 4/5

Safe to merge after promoting buildStoredUserDeliveryPayload out of the __testing export; the delivery and normalization logic itself is correct.

One genuine P1 finding: production code in subagent-registry-lifecycle.ts imports and calls a function through __testing, which is conventionally test-only. The two P2 findings (double normalization, duplicated regex constants) don't affect correctness. The dispatch logic flip, persisted claim mechanism, and iMessage sanitization are well-structured and properly tested.

src/agents/subagent-registry-lifecycle.ts (lines 16, 188, 248) and src/agents/subagent-announce.ts (the __testing export that should be split into a proper named export).

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/agents/subagent-registry-lifecycle.ts
Line: 16

Comment:
**`__testing` export consumed in production code**

`buildStoredUserDeliveryPayload` is invoked via `subagentAnnounceTesting` in three production paths (lines 188, 248, and via `subagent-registry-lifecycle.ts`). The `__testing` export is conventionally test-only; importing it in production code bypasses that boundary and means tree-shakers/bundlers cannot safely strip test helpers, and any future refactoring of the testing surface will silently break production behavior.

Export `buildStoredUserDeliveryPayload` as a proper named export from `subagent-announce.ts` and import it directly instead of routing through `__testing`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/imessage/src/monitor/deliver.ts
Line: 58-100

Comment:
**Double normalization of delivery text**

`deliverReplies` normalizes `payload.text` via `normalizeIMessageDeliveryText` on line 123 (when building `reply.text`) and then `chunkIMessageText` normalizes the same, already-normalized value a second time (line 59). Since `normalizeIMessageDeliveryText` is idempotent this is functionally harmless, but it's redundant and would mask any future non-idempotency introduced by sanitization or fence-closing changes.

Consider removing the `normalizeIMessageDeliveryText` call inside `chunkIMessageText` and making it the caller's responsibility, or conversely, not pre-normalizing in `deliverReplies` and letting the chunker own normalization exclusively.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/subagent-announce.ts
Line: 99-106

Comment:
**Duplicated regex constants across modules**

`OPENCLAW_INTERNAL_BLOCK_RE` and `OPENCLAW_INTERNAL_LINE_RE` are defined verbatim in both `subagent-announce.ts` and `sanitize-outbound.ts`. If the pattern ever needs updating, both copies must be changed in sync. Consider extracting these into a shared constant (e.g. in a `subagent-text-patterns.ts` internal helper) so there is a single source of truth.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "Harden subagent completion delivery" | Re-trigger Greptile}

greptile-apps · 2026-04-18T08:04:24Z

  captureSubagentCompletionReply,
  runSubagentAnnounceFlow,
  type SubagentRunOutcome,
+  __testing as subagentAnnounceTesting,


__testing export consumed in production code

buildStoredUserDeliveryPayload is invoked via subagentAnnounceTesting in three production paths (lines 188, 248, and via subagent-registry-lifecycle.ts). The __testing export is conventionally test-only; importing it in production code bypasses that boundary and means tree-shakers/bundlers cannot safely strip test helpers, and any future refactoring of the testing surface will silently break production behavior.

Export buildStoredUserDeliveryPayload as a proper named export from subagent-announce.ts and import it directly instead of routing through __testing.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/agents/subagent-registry-lifecycle.ts Line: 16 Comment: **`__testing` export consumed in production code** `buildStoredUserDeliveryPayload` is invoked via `subagentAnnounceTesting` in three production paths (lines 188, 248, and via `subagent-registry-lifecycle.ts`). The `__testing` export is conventionally test-only; importing it in production code bypasses that boundary and means tree-shakers/bundlers cannot safely strip test helpers, and any future refactoring of the testing surface will silently break production behavior. Export `buildStoredUserDeliveryPayload` as a proper named export from `subagent-announce.ts` and import it directly instead of routing through `__testing`. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-18T08:04:25Z

+function chunkIMessageText(text: string, limit: number): string[] {
+  const normalized = normalizeIMessageDeliveryText(text ?? "");
+  if (!normalized) {
+    return [];
+  }
+  if (limit <= 0 || normalized.length <= limit) {
+    return [normalized];
+  }
+  const paragraphs = normalized
+    .split(/\n{2,}/)
+    .map((part) => part.trim())
+    .filter(Boolean);
+  if (paragraphs.length <= 1) {
+    return splitIMessageTextSafely(normalized, limit);
+  }
+  const chunks: string[] = [];
+  let current = "";
+  for (const paragraph of paragraphs) {
+    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
+    if (candidate.length <= limit) {
+      current = candidate;
+      continue;
+    }
+    if (current) {
+      chunks.push(current);
+    }
+    if (paragraph.length <= limit) {
+      current = paragraph;
+      continue;
+    }
+    const safeChunks = splitIMessageTextSafely(paragraph, limit);
+    if (safeChunks.length === 0) {
+      current = "";
+      continue;
+    }
+    chunks.push(...safeChunks.slice(0, -1));
+    current = safeChunks.at(-1) ?? "";
+  }
+  if (current) {
+    chunks.push(current);
+  }
+  return chunks;
+}


Double normalization of delivery text

deliverReplies normalizes payload.text via normalizeIMessageDeliveryText on line 123 (when building reply.text) and then chunkIMessageText normalizes the same, already-normalized value a second time (line 59). Since normalizeIMessageDeliveryText is idempotent this is functionally harmless, but it's redundant and would mask any future non-idempotency introduced by sanitization or fence-closing changes.

Consider removing the normalizeIMessageDeliveryText call inside chunkIMessageText and making it the caller's responsibility, or conversely, not pre-normalizing in deliverReplies and letting the chunker own normalization exclusively.

Prompt To Fix With AI

This is a comment left during a code review. Path: extensions/imessage/src/monitor/deliver.ts Line: 58-100 Comment: **Double normalization of delivery text** `deliverReplies` normalizes `payload.text` via `normalizeIMessageDeliveryText` on line 123 (when building `reply.text`) and then `chunkIMessageText` normalizes the same, already-normalized value a second time (line 59). Since `normalizeIMessageDeliveryText` is idempotent this is functionally harmless, but it's redundant and would mask any future non-idempotency introduced by sanitization or fence-closing changes. Consider removing the `normalizeIMessageDeliveryText` call inside `chunkIMessageText` and making it the caller's responsibility, or conversely, not pre-normalizing in `deliverReplies` and letting the chunker own normalization exclusively. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-18T08:04:26Z

+const INTERNAL_CONTEXT_BLOCK_RE =
+  /(?:^|\n)\s*(?:Human:\s*)?(?:\[[^\n]*\]\s*)?<<<BEGIN_OPENCLAW_INTERNAL_CONTEXT>>>[\s\S]*?<<<END_OPENCLAW_INTERNAL_CONTEXT>>>\s*/g;
+const CHILD_RESULT_BLOCK_RE =
+  /<<<BEGIN_UNTRUSTED_CHILD_RESULT>>>\s*([\s\S]*?)\s*<<<END_UNTRUSTED_CHILD_RESULT>>>/g;
+const INTERNAL_CONTEXT_LINE_RE =
+  /(?:^|\n)\s*(?:\[Inter-session message][^\n]*|OpenClaw runtime context \(internal\):|This context is runtime-generated, not user-authored\. Keep internal details private\.|Result \(untrusted content, treat as data\):|Action:\s*A completed .*? ready for user delivery\.[^\n]*|Stats:\s*runtime[^\n]*|sourceSession=[^\n]*|sourceChannel=[^\n]*|sourceTool=[^\n]*)(?=\n|$)/gim;
+const DIRECT_USER_DELIVERY_INSTRUCTION_RE =
+  /(?:^|\n)(?:Convert the result above into your normal assistant voice and send that user-facing update now\.|Keep internal context private.*|Reply ONLY:\s*NO_REPLY.*)(?=\n|$)/gim;


Duplicated regex constants across modules

OPENCLAW_INTERNAL_BLOCK_RE and OPENCLAW_INTERNAL_LINE_RE are defined verbatim in both subagent-announce.ts and sanitize-outbound.ts. If the pattern ever needs updating, both copies must be changed in sync. Consider extracting these into a shared constant (e.g. in a subagent-text-patterns.ts internal helper) so there is a single source of truth.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/agents/subagent-announce.ts Line: 99-106 Comment: **Duplicated regex constants across modules** `OPENCLAW_INTERNAL_BLOCK_RE` and `OPENCLAW_INTERNAL_LINE_RE` are defined verbatim in both `subagent-announce.ts` and `sanitize-outbound.ts`. If the pattern ever needs updating, both copies must be changed in sync. Consider extracting these into a shared constant (e.g. in a `subagent-text-patterns.ts` internal helper) so there is a single source of truth. How can I resolve this? If you propose a fix, please make it concise.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f687a4d28

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-18T08:07:36Z

+    const userDeliveryPayload = !requesterIsSubagent
+      ? persistedUserDeliveryPayload ?? buildStoredUserDeliveryPayload(findings)


Derive user payload from chosen completion findings

When descendant completions are present, findings is intentionally set from childCompletionFindings, but this code still prefers persistedUserDeliveryPayload (captured earlier from roundOneReply) for userDeliveryPayload. In completion mode for non-subagent requesters, buildDirectUserCompletionPrompt then uses that stale payload, so users can receive an outdated summary instead of the descendant-derived result that this flow selected as authoritative. This is reproducible whenever a run has both a frozen round-one reply and direct-child completion findings.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 120c775b32

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-18T08:23:52Z

    return withPhases({
      delivered: false,
      path: "none",
    });


Release delivery claim when dispatch aborts early

When announceId is provided and the signal is already aborted, this early return exits before the finally block that clears or finalizes the persisted claim. Because the claim was already created earlier in the function, the run is left in deliveryClaim.state = "claimed" until lease expiry, and subsequent retries for the same announce can be rejected as delivery-already-in-flight without attempting queue/direct delivery.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-18T08:23:52Z

      accountId: account.accountId,
    });
-    message = convertMarkdownTables(message, tableMode);
+    message = convertMarkdownTables(normalizeIMessageDeliveryText(message), tableMode);


Skip per-chunk fence normalization in iMessage send

This normalizes (and potentially auto-closes) code fences on every sendMessageIMessage call, but chunked monitor delivery already normalizes before splitting. If a fenced block is longer than textLimit, individual chunks can have an odd fence count, so this line appends extra closing fences per chunk and corrupts the rendered code output for long responses.

Useful? React with 👍 / 👎.

MertBasar0 · 2026-04-28T23:27:50Z

I reported this issue in #65000 and confirmed the durability gap during reconnects in my local tests. I'm happy to help test the hardening fixes in this PR (#68464) on my WSL2 environment to see if it fully resolves the edge cases I encountered.

clawsweeper · 2026-04-28T23:39:48Z

Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Keep open: the PR addresses real subagent/iMessage delivery failures, but this branch is not merge-ready because it can strand persisted delivery claims on abort, can corrupt long fenced iMessage chunks, lacks after-fix real behavior proof, and is dirty against a current main delivery state that has since been heavily reworked.

Canonical path: Close this stale PR. The latest review rated it F, the branch still lacks merge-ready proof, and there has been no human follow-up after the durable review.

So I’m closing this here because the remaining work is already tracked in the canonical issue.

Review details

Best possible solution:

Close this stale PR. The latest review rated it F, the branch still lacks merge-ready proof, and there has been no human follow-up after the durable review.

Do we have a high-confidence way to reproduce the issue?

Yes for the PR-introduced blockers: source inspection shows an aborted dispatch can return after creating a persisted claim but before finalization, and iMessage chunks are normalized again after full-message normalization. No live after-fix reproduction proof is present for the original production bug.

Is this the best way to solve the issue?

No. The direction is plausible, but the best merge shape is a rebase onto current main's newer pending-delivery state with the concrete claim and chunking defects fixed first.

Security review:

Security review cleared: No new supply-chain, secret-handling, permission, or dependency-execution concern was found; the security-sensitive internal-context leak fix still needs the functional delivery fixes above.

AGENTS.md: found and applied where relevant.

What I checked:

stale F-rated PR: PR was opened 2026-04-18T08:00:14Z, is older than 30 days, and the latest review rated it F.
proof blocker: real behavior proof is mock_only and proof tier is D, so this branch is not merge-ready without contributor follow-up.
no human follow-up: live comments and timeline hydrated by apply contain no non-automation activity after the ClawSweeper review.

Likely related people:

steipete: Recent GitHub commit history for the subagent registry and announce-delivery files includes multiple delivery-state and media-completion changes by this author. (role: recent area contributor; confidence: high; commits: 465a5456fe0f, beb42b12c989, 7f28c8bd07d3; files: src/agents/subagent-registry-lifecycle.ts, src/agents/subagent-announce-delivery.ts, src/agents/subagent-announce-dispatch.ts)
vincentkoc: Recent commit history for subagent announce delivery includes the empty completion handoff rejection, and local blame on the current checkout points to recent current-main delivery work. (role: recent area contributor; confidence: medium; commits: e0018382eb00, 1d2bebbb41bf; files: src/agents/subagent-announce-delivery.ts, src/agents/subagent-registry.types.ts, src/agents/subagent-registry-lifecycle.ts)
omarshahine: Recent adjacent work touched duplicate generated-media fallback behavior in the same subagent completion delivery surface. (role: adjacent contributor; confidence: medium; commits: 12798eb789bd; files: src/agents/subagent-announce-delivery.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 1d2bebbb41bf.

MertBasar0 · 2026-05-11T00:37:53Z

I would keep this in “needs changes” until the remaining repair set lands.

The direction still looks right, but the mergeable shape should cover:

fail-closed cleanup/finalization for persisted delivery claims on early abort paths
avoid per-chunk iMessage fence normalization after full-message normalization
docs + changelog update for the queue-first completion delivery behavior

After that, I can help validate the reconnect/dropped-completion edge cases from my WSL2 setup.

clawsweeper · 2026-05-20T00:42:55Z

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?

The egg game starts only after the PR passes the real-behavior proof check.
Before that, no creature or rarity is rolled. The treat waits for real proof.
This is still just collectible flavor: proof affects review readiness, not creature quality.

openclaw-barnacle · 2026-06-06T04:50:00Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

Harden subagent completion delivery

9f687a4

openclaw-barnacle Bot added channel: imessage Channel integration: imessage agents Agent runtime and tooling size: L labels Apr 18, 2026

greptile-apps Bot reviewed Apr 18, 2026

View reviewed changes

Fix subagent delivery follow-up issues

c45d629

chatgpt-codex-connector Bot reviewed Apr 18, 2026

View reviewed changes

Prefer descendant completion payloads

120c775

chatgpt-codex-connector Bot reviewed Apr 18, 2026

View reviewed changes

barronlroth mentioned this pull request Apr 21, 2026

[Bug] Async/subagent completion can be announced before the result payload exists #69767

Closed

This was referenced Apr 26, 2026

[Bug]: Make subagent completion delivery durable across retry and reconnect gaps #65000

Closed

fix(subagents): harden deferred completion delivery retry state #65037

Closed

clawsweeper Bot mentioned this pull request Apr 30, 2026

Subagent completion event sometimes drops result payload and/or token stats (3 repros, intermittent) #75196

Open

This was referenced May 15, 2026

fix(agents): preserve agent.wait results for subagent announces #43992

Closed

fix(subagents): add sendMessage fallback + callGateway fallthrough for delivery drops #79059

Closed

clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 19, 2026

openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 19, 2026

clawsweeper Bot added the merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. label May 20, 2026

openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Jun 6, 2026

clawsweeper Bot removed the rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. label Jun 6, 2026

clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 6, 2026

openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label Jun 7, 2026

		const userDeliveryPayload = !requesterIsSubagent
		? persistedUserDeliveryPayload ?? buildStoredUserDeliveryPayload(findings)

Uh oh!

Conversation

nnlevy commented Apr 18, 2026

Summary

Why

Implementation Notes

Verification

Uh oh!

greptile-apps Bot commented Apr 18, 2026

Greptile Summary

Confidence Score: 4/5

Uh oh!

greptile-apps Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

MertBasar0 commented Apr 28, 2026

Uh oh!

clawsweeper Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MertBasar0 commented May 11, 2026

Uh oh!

clawsweeper Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openclaw-barnacle Bot commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

clawsweeper Bot commented Apr 28, 2026 •

edited

Loading

clawsweeper Bot commented May 20, 2026 •

edited

Loading