Harden subagent completion delivery#68464
Conversation
Greptile SummaryThis PR hardens subagent completion delivery by flipping the dispatch order to queue-first (fail-closed), adding a persisted delivery-claim mechanism to prevent duplicate delivery, and tightening iMessage text normalization to block internal context leakage and handle dangling code fences.
Confidence Score: 4/5Safe to merge after promoting One genuine P1 finding: production code in
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/agents/subagent-registry-lifecycle.ts
Line: 16
Comment:
**`__testing` export consumed in production code**
`buildStoredUserDeliveryPayload` is invoked via `subagentAnnounceTesting` in three production paths (lines 188, 248, and via `subagent-registry-lifecycle.ts`). The `__testing` export is conventionally test-only; importing it in production code bypasses that boundary and means tree-shakers/bundlers cannot safely strip test helpers, and any future refactoring of the testing surface will silently break production behavior.
Export `buildStoredUserDeliveryPayload` as a proper named export from `subagent-announce.ts` and import it directly instead of routing through `__testing`.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: extensions/imessage/src/monitor/deliver.ts
Line: 58-100
Comment:
**Double normalization of delivery text**
`deliverReplies` normalizes `payload.text` via `normalizeIMessageDeliveryText` on line 123 (when building `reply.text`) and then `chunkIMessageText` normalizes the same, already-normalized value a second time (line 59). Since `normalizeIMessageDeliveryText` is idempotent this is functionally harmless, but it's redundant and would mask any future non-idempotency introduced by sanitization or fence-closing changes.
Consider removing the `normalizeIMessageDeliveryText` call inside `chunkIMessageText` and making it the caller's responsibility, or conversely, not pre-normalizing in `deliverReplies` and letting the chunker own normalization exclusively.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/subagent-announce.ts
Line: 99-106
Comment:
**Duplicated regex constants across modules**
`OPENCLAW_INTERNAL_BLOCK_RE` and `OPENCLAW_INTERNAL_LINE_RE` are defined verbatim in both `subagent-announce.ts` and `sanitize-outbound.ts`. If the pattern ever needs updating, both copies must be changed in sync. Consider extracting these into a shared constant (e.g. in a `subagent-text-patterns.ts` internal helper) so there is a single source of truth.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Harden subagent completion delivery" | Re-trigger Greptile |
| captureSubagentCompletionReply, | ||
| runSubagentAnnounceFlow, | ||
| type SubagentRunOutcome, | ||
| __testing as subagentAnnounceTesting, |
There was a problem hiding this comment.
__testing export consumed in production code
buildStoredUserDeliveryPayload is invoked via subagentAnnounceTesting in three production paths (lines 188, 248, and via subagent-registry-lifecycle.ts). The __testing export is conventionally test-only; importing it in production code bypasses that boundary and means tree-shakers/bundlers cannot safely strip test helpers, and any future refactoring of the testing surface will silently break production behavior.
Export buildStoredUserDeliveryPayload as a proper named export from subagent-announce.ts and import it directly instead of routing through __testing.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/subagent-registry-lifecycle.ts
Line: 16
Comment:
**`__testing` export consumed in production code**
`buildStoredUserDeliveryPayload` is invoked via `subagentAnnounceTesting` in three production paths (lines 188, 248, and via `subagent-registry-lifecycle.ts`). The `__testing` export is conventionally test-only; importing it in production code bypasses that boundary and means tree-shakers/bundlers cannot safely strip test helpers, and any future refactoring of the testing surface will silently break production behavior.
Export `buildStoredUserDeliveryPayload` as a proper named export from `subagent-announce.ts` and import it directly instead of routing through `__testing`.
How can I resolve this? If you propose a fix, please make it concise.| function chunkIMessageText(text: string, limit: number): string[] { | ||
| const normalized = normalizeIMessageDeliveryText(text ?? ""); | ||
| if (!normalized) { | ||
| return []; | ||
| } | ||
| if (limit <= 0 || normalized.length <= limit) { | ||
| return [normalized]; | ||
| } | ||
| const paragraphs = normalized | ||
| .split(/\n{2,}/) | ||
| .map((part) => part.trim()) | ||
| .filter(Boolean); | ||
| if (paragraphs.length <= 1) { | ||
| return splitIMessageTextSafely(normalized, limit); | ||
| } | ||
| const chunks: string[] = []; | ||
| let current = ""; | ||
| for (const paragraph of paragraphs) { | ||
| const candidate = current ? `${current}\n\n${paragraph}` : paragraph; | ||
| if (candidate.length <= limit) { | ||
| current = candidate; | ||
| continue; | ||
| } | ||
| if (current) { | ||
| chunks.push(current); | ||
| } | ||
| if (paragraph.length <= limit) { | ||
| current = paragraph; | ||
| continue; | ||
| } | ||
| const safeChunks = splitIMessageTextSafely(paragraph, limit); | ||
| if (safeChunks.length === 0) { | ||
| current = ""; | ||
| continue; | ||
| } | ||
| chunks.push(...safeChunks.slice(0, -1)); | ||
| current = safeChunks.at(-1) ?? ""; | ||
| } | ||
| if (current) { | ||
| chunks.push(current); | ||
| } | ||
| return chunks; | ||
| } |
There was a problem hiding this comment.
Double normalization of delivery text
deliverReplies normalizes payload.text via normalizeIMessageDeliveryText on line 123 (when building reply.text) and then chunkIMessageText normalizes the same, already-normalized value a second time (line 59). Since normalizeIMessageDeliveryText is idempotent this is functionally harmless, but it's redundant and would mask any future non-idempotency introduced by sanitization or fence-closing changes.
Consider removing the normalizeIMessageDeliveryText call inside chunkIMessageText and making it the caller's responsibility, or conversely, not pre-normalizing in deliverReplies and letting the chunker own normalization exclusively.
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/imessage/src/monitor/deliver.ts
Line: 58-100
Comment:
**Double normalization of delivery text**
`deliverReplies` normalizes `payload.text` via `normalizeIMessageDeliveryText` on line 123 (when building `reply.text`) and then `chunkIMessageText` normalizes the same, already-normalized value a second time (line 59). Since `normalizeIMessageDeliveryText` is idempotent this is functionally harmless, but it's redundant and would mask any future non-idempotency introduced by sanitization or fence-closing changes.
Consider removing the `normalizeIMessageDeliveryText` call inside `chunkIMessageText` and making it the caller's responsibility, or conversely, not pre-normalizing in `deliverReplies` and letting the chunker own normalization exclusively.
How can I resolve this? If you propose a fix, please make it concise.| const INTERNAL_CONTEXT_BLOCK_RE = | ||
| /(?:^|\n)\s*(?:Human:\s*)?(?:\[[^\n]*\]\s*)?<<<BEGIN_OPENCLAW_INTERNAL_CONTEXT>>>[\s\S]*?<<<END_OPENCLAW_INTERNAL_CONTEXT>>>\s*/g; | ||
| const CHILD_RESULT_BLOCK_RE = | ||
| /<<<BEGIN_UNTRUSTED_CHILD_RESULT>>>\s*([\s\S]*?)\s*<<<END_UNTRUSTED_CHILD_RESULT>>>/g; | ||
| const INTERNAL_CONTEXT_LINE_RE = | ||
| /(?:^|\n)\s*(?:\[Inter-session message][^\n]*|OpenClaw runtime context \(internal\):|This context is runtime-generated, not user-authored\. Keep internal details private\.|Result \(untrusted content, treat as data\):|Action:\s*A completed .*? ready for user delivery\.[^\n]*|Stats:\s*runtime[^\n]*|sourceSession=[^\n]*|sourceChannel=[^\n]*|sourceTool=[^\n]*)(?=\n|$)/gim; | ||
| const DIRECT_USER_DELIVERY_INSTRUCTION_RE = | ||
| /(?:^|\n)(?:Convert the result above into your normal assistant voice and send that user-facing update now\.|Keep internal context private.*|Reply ONLY:\s*NO_REPLY.*)(?=\n|$)/gim; |
There was a problem hiding this comment.
Duplicated regex constants across modules
OPENCLAW_INTERNAL_BLOCK_RE and OPENCLAW_INTERNAL_LINE_RE are defined verbatim in both subagent-announce.ts and sanitize-outbound.ts. If the pattern ever needs updating, both copies must be changed in sync. Consider extracting these into a shared constant (e.g. in a subagent-text-patterns.ts internal helper) so there is a single source of truth.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/subagent-announce.ts
Line: 99-106
Comment:
**Duplicated regex constants across modules**
`OPENCLAW_INTERNAL_BLOCK_RE` and `OPENCLAW_INTERNAL_LINE_RE` are defined verbatim in both `subagent-announce.ts` and `sanitize-outbound.ts`. If the pattern ever needs updating, both copies must be changed in sync. Consider extracting these into a shared constant (e.g. in a `subagent-text-patterns.ts` internal helper) so there is a single source of truth.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f687a4d28
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const userDeliveryPayload = !requesterIsSubagent | ||
| ? persistedUserDeliveryPayload ?? buildStoredUserDeliveryPayload(findings) |
There was a problem hiding this comment.
Derive user payload from chosen completion findings
When descendant completions are present, findings is intentionally set from childCompletionFindings, but this code still prefers persistedUserDeliveryPayload (captured earlier from roundOneReply) for userDeliveryPayload. In completion mode for non-subagent requesters, buildDirectUserCompletionPrompt then uses that stale payload, so users can receive an outdated summary instead of the descendant-derived result that this flow selected as authoritative. This is reproducible whenever a run has both a frozen round-one reply and direct-child completion findings.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 120c775b32
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return withPhases({ | ||
| delivered: false, | ||
| path: "none", | ||
| }); |
There was a problem hiding this comment.
Release delivery claim when dispatch aborts early
When announceId is provided and the signal is already aborted, this early return exits before the finally block that clears or finalizes the persisted claim. Because the claim was already created earlier in the function, the run is left in deliveryClaim.state = "claimed" until lease expiry, and subsequent retries for the same announce can be rejected as delivery-already-in-flight without attempting queue/direct delivery.
Useful? React with 👍 / 👎.
| accountId: account.accountId, | ||
| }); | ||
| message = convertMarkdownTables(message, tableMode); | ||
| message = convertMarkdownTables(normalizeIMessageDeliveryText(message), tableMode); |
There was a problem hiding this comment.
Skip per-chunk fence normalization in iMessage send
This normalizes (and potentially auto-closes) code fences on every sendMessageIMessage call, but chunked monitor delivery already normalizes before splitting. If a fenced block is longer than textLimit, individual chunks can have an odd fence count, so this line appends extra closing fences per chunk and corrupts the rendered code output for long responses.
Useful? React with 👍 / 👎.
|
Thanks for the context here. I swept through the related work, and this is now duplicate or superseded. Keep open: the PR addresses real subagent/iMessage delivery failures, but this branch is not merge-ready because it can strand persisted delivery claims on abort, can corrupt long fenced iMessage chunks, lacks after-fix real behavior proof, and is dirty against a current main delivery state that has since been heavily reworked. Canonical path: Close this stale PR. The latest review rated it F, the branch still lacks merge-ready proof, and there has been no human follow-up after the durable review. So I’m closing this here because the remaining work is already tracked in the canonical issue. Review detailsBest possible solution: Close this stale PR. The latest review rated it F, the branch still lacks merge-ready proof, and there has been no human follow-up after the durable review. Do we have a high-confidence way to reproduce the issue? Yes for the PR-introduced blockers: source inspection shows an aborted dispatch can return after creating a persisted claim but before finalization, and iMessage chunks are normalized again after full-message normalization. No live after-fix reproduction proof is present for the original production bug. Is this the best way to solve the issue? No. The direction is plausible, but the best merge shape is a rebase onto current main's newer pending-delivery state with the concrete claim and chunking defects fixed first. Security review: Security review cleared: No new supply-chain, secret-handling, permission, or dependency-execution concern was found; the security-sensitive internal-context leak fix still needs the functional delivery fixes above. AGENTS.md: found and applied where relevant. What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 1d2bebbb41bf. |
|
I would keep this in “needs changes” until the remaining repair set lands. The direction still looks right, but the mergeable shape should cover:
After that, I can help validate the reconnect/dropped-completion edge cases from my WSL2 setup. |
|
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
|
|
This pull request has been automatically marked as stale due to inactivity. |
Summary
Why
A production bug leaked internal subagent/control-plane envelopes into iMessage, and a second bug allowed duplicate user-visible completion delivery when a parent session was active while a child completion also delivered directly.
This change moves the fix to source-level architecture instead of relying on downstream
distpatching:Implementation Notes
deliveryClaim,userDeliveryPayload, andfallbackUserDeliveryPayloadto subagent run recordsVerification
corepack pnpm exec vitest run src/agents/subagent-announce-dispatch.test.ts src/agents/subagent-announce.format.e2e.test.tscorepack pnpm exec vitest run src/agents/subagent-registry-lifecycle.test.ts src/agents/subagent-announce-delivery.test.tscorepack pnpm exec vitest run extensions/imessage/src/monitor/sanitize-outbound.test.ts extensions/imessage/src/monitor/deliver.test.ts