WhatsApp: send commentary live#57289
Conversation
Greptile SummaryThis PR adds live WhatsApp commentary delivery via a Key design decisions that are working correctly:
Three P2 observations:
Confidence Score: 5/5Safe to merge — no P0/P1 issues found; all three observations are P2 design/style notes that do not affect correctness. The implementation is well-scoped behind a new config flag with a safe default, backed by comprehensive seam tests covering abort, timeout, compaction retry, async drain, and cross-turn deduplication. Abort/cleanup semantics are correct, the delivered-commentary filter is explicitly tested, and the channel boundary is guarded by a Telegram boundary test. All findings are P2.
|
| Filename | Overview |
|---|---|
| src/agents/pi-embedded-commentary.ts | New file: extracts structured assistant output segments (commentary vs final_answer) with stable IDs from textSignature or auto-generated fallbacks. Segment grouping and phase resolution logic is correct. |
| src/agents/pi-embedded-subscribe.ts | Adds commentary delivery queue with per-segment abort controllers, generation-based staleness tracking, and compaction-safe cleanup. The seam API is well-structured. |
| src/agents/pi-embedded-runner/run/attempt.ts | Adds waitForCommentaryDeliveryBounded with per-round timeout scaling; wires assistantOutputs and deliveredCommentarySegmentIds into the attempt result. Minor: 15_000 fallback duplicates the constant in process-message.ts. |
| src/agents/pi-embedded-runner/run/payloads.ts | Filters delivered commentary from the final payload using assistantOutputs + deliveredCommentarySegmentIds. Removes lastAssistant text fallback — intentional by design but changes behavior when all outputs are delivered commentary. |
| extensions/whatsapp/src/auto-reply/deliver-reply.ts | Adds abortSignal/timeoutMs support; replaces sendMediaWithLeadingCaption with an inline loop; adds sleepWithAbort and awaitWithAbort helpers. Media error fallback behavior preserved. Abort cleanup in finally is correct. |
| extensions/whatsapp/src/auto-reply/monitor/process-message.ts | Wires onCommentaryReply only when commentaryDelivery === 'live'; extracts sendWhatsAppPayload helper shared by commentary and final paths; updates !queuedFinal return to reflect live-sent commentary correctly. |
| src/auto-reply/reply/reply-delivery.ts | Adds normalizeDirectiveAliases and trimLeadingWhitespace to normalizeReplyPayloadDirectives. Alias normalization applies globally to all channels — confirm this is intentional. |
| src/agents/pi-embedded-subscribe.commentary.test.ts | New test file with 8 scenarios covering: no-callback no-op, live emission after non-terminal detection, message_end delivery, unsigned ID stability, cross-turn distinctness, async drain correctness, and compaction-retry abort/replay suppression. |
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/auto-reply/reply/reply-delivery.ts
Line: 26-28
Comment:
**`normalizeDirectiveAliases` applies globally to all channels**
`normalizeDirectiveAliases` is now called unconditionally inside `normalizeReplyPayloadDirectives`, meaning camelCase directive aliases (`[[replyToCurrent]]`, `[[audioAsVoice]]`, `[[replyTo:...]]`) are normalized for *all* channels and callers — not just the new WhatsApp commentary path.
This may be intentional as a general cleanup (the test `"normalizes commentary directives before delivery"` explicitly verifies it), but it is an undocumented side-effect on Telegram, Slack, and other channels that is outside the stated PR scope. If the model was previously emitting `[[replyToCurrent]]` on any channel and that literal token was expected to pass through unchanged, it will now be silently rewritten to `[[reply_to_current]]`.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 1290-1292
Comment:
**Hardcoded `15_000` timeout duplicated across two files**
`COMMENTARY_REPLY_TIMEOUT_MS = 15_000` in `process-message.ts` is wired as `blockReplyTimeoutMs`, and `attempt.ts` has `params.blockReplyTimeoutMs ?? 15_000` as the fallback inside `waitForCommentaryDeliveryBounded`. Both constants must be kept in sync manually. If the timeout in `process-message.ts` is updated but the fallback here is not, the scaling logic (`commentaryDeliveryTimeoutMs * pendingCount`) would silently use a different cap for callers that omit `blockReplyTimeoutMs`.
Consider exporting the constant from a shared location or adding a comment cross-referencing both sites.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/payloads.ts
Line: 248-270
Comment:
**`assistantTexts` no longer consulted as fallback when `assistantOutputs` is non-empty but entirely filtered**
The old fallback chain was: `assistantTexts || extractAssistantText(lastAssistant) || []`. The new logic uses `assistantOutputs` when non-empty, but if all entries are commentary segments that were already delivered live, `filteredAssistantOutputs` becomes `[]` — and `assistantTexts` is *not* consulted.
The test `"does not fall back to assistantTexts when delivered commentary strips all assistant outputs"` confirms this is intentional. The design decision is sound, but operators should be aware: when the model emits only `phase: "commentary"` blocks (no `phase: "final_answer"`) and all are delivered live, the final WhatsApp reply will be empty.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "WhatsApp: send commentary live" | Re-trigger Greptile
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f7200a5830
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Addressed the open review items and pushed 2931754962. What changed:
Validation:
I left the |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2931754962
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Addressed the remaining unresolved bot threads in What changed:
Verification:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6997165d4f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Addressed the two remaining bot threads in 5508a7e796. What changed:
Validation:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5508a7e796
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
5508a7e to
0bea12b
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0bea12bb89
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Superseded by #57484. This replacement PR carries the same feature on a fresh current-main branch with clean history, plus the latest follow-up fix for the |
🔒 Aisle Security AnalysisWe found 3 potential security issue(s) in this PR:
Vulnerabilities1. 🟠 Unbounded growth of commentary segment tracking (Map/Set) from model-controlled segmentId leading to memory/CPU DoS
DescriptionThe live commentary delivery implementation tracks delivered/pending segments in process memory keyed by
A malicious or buggy model (potentially influenced by prompt injection) can continuously emit new unique Vulnerable code (memory growth on successful delivery): state.deliveredCommentarySegmentIds.add(segment.segmentId);
state.deliveredCommentarySegmentTexts.set(
segment.segmentId,
options?.deliveredText ?? segment.text,
);Model-controlled const parsed = JSON.parse(trimmed) as { id?: unknown; phase?: unknown };
...
return { id: parsed.id, ... };While RecommendationIntroduce strict bounds and cleanup for commentary segment tracking:
Example (simple hard limits): const MAX_SEGMENTS = 500;
const MAX_SEGMENT_ID_LEN = 128;
const MAX_DELIVERED_TEXT_LEN = 10_000;
function normalizeSegmentId(id: string): string | null {
const trimmed = id.trim();
if (trimmed.length === 0 || trimmed.length > MAX_SEGMENT_ID_LEN) return null;
return trimmed;
}
function rememberDelivered(segmentId: string, text: string) {
if (state.deliveredCommentarySegmentIds.size >= MAX_SEGMENTS) {
// evict oldest / use LRU structure
const oldest = state.deliveredCommentarySegmentIds.values().next().value;
state.deliveredCommentarySegmentIds.delete(oldest);
state.deliveredCommentarySegmentTexts.delete(oldest);
}
state.deliveredCommentarySegmentIds.add(segmentId);
state.deliveredCommentarySegmentTexts.set(segmentId, text.slice(0, MAX_DELIVERED_TEXT_LEN));
}Also consider applying similar bounds when parsing 2. 🟡 Unbounded commentary delivery timeout multiplication enables denial-of-service and timer overflow
Description
Vulnerable code: const pendingCount = Math.max(1, getPendingCommentaryDeliveryCount());
const roundTimeoutMs = commentaryDeliveryTimeoutMs * pendingCount;
...
timer = setTimeout(() => reject(timeoutError), roundTimeoutMs);RecommendationCap both the queue-derived multiplier and the resulting timeout to a safe maximum, and avoid relying on Example: const MAX_PENDING_MULTIPLIER = 10; // or another small constant
const MAX_TIMEOUT_MS = 60_000; // e.g., 60s, product decision
const rawPending = getPendingCommentaryDeliveryCount();
const pendingCount = Math.max(1, Math.min(MAX_PENDING_MULTIPLIER, rawPending));
const roundTimeoutMs = Math.min(
MAX_TIMEOUT_MS,
commentaryDeliveryTimeoutMs * pendingCount,
);Additionally, consider using a fixed per-round timeout (not multiplied), and separately limit/flush the commentary queue size to prevent unbounded growth. 3. 🟡 Live WhatsApp commentary delivery can expose internal reasoning/tool output to end users
Description
This can result in unintended information disclosure of internal model reasoning, tool results, or system details to WhatsApp recipients when the feature is enabled. RecommendationTreat live commentary as unsafe-by-default for external channels unless the content is guaranteed user-safe. Suggested mitigations:
Example (block free-form commentary on WhatsApp): onCommentaryReply:
commentaryDelivery === "live"
? async (_payload, _context) => {
// Intentionally do not forward model commentary to WhatsApp.
// Optionally send a fixed typing/progress indicator instead.
}
: undefined,Or if forwarding is required, add an explicit allowlist and suppress any payload flagged/heuristically detected as reasoning: if (payload.isReasoning) return;
if (!isAllowedCommentary(payload.text ?? "")) return;Analyzed PR: #57289 at commit Last updated on: 2026-03-30T09:55:14Z |
|
Addressed the latest bot feedback and pushed What changed:
Validation:
Note: the first plain On the Aisle item about live commentary exposure: this PR keeps free-form WhatsApp commentary as an explicit opt-in ( |
Summary
onCommentaryReplyseam in the embedded runner/subscribe path, wired it only for WhatsApp behindchannels.whatsapp.commentaryDelivery: "off" | "live", and filtered already-delivered commentary out of the final payload assembly.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause / Regression History (if applicable)
N/A
git blame, prior PR, issue, or refactor if known): previous attempts lived in WhatsApp: send commentary updates live #43731 and the rebased-but-stale replacement WhatsApp: send commentary live #57279.Regression Test Plan (if applicable)
src/agents/pi-embedded-subscribe.commentary.test.tssrc/agents/pi-embedded-runner/run/payloads.test.tsextensions/whatsapp/src/auto-reply/deliver-reply.test.tsUser-visible / Behavior Changes
channels.whatsapp.commentaryDelivery: "off" | "live""live", commentary is sent to the WhatsApp chat in real time and the final answer is still delivered separately.Diagram (if applicable)
Security Impact (required)
Yes, explain risk + mitigation: N/ARepro + Verification
Environment
channels.whatsapp.commentaryDelivery: "live"Steps
channels.whatsapp.commentaryDelivery: "live"for the WhatsApp account/channel.Expected
Actual
Evidence
Attach at least one:
Human Verification (required)
What you personally verified (not just CI), and how:
0behind currentorigin/mainpnpm checkpassed during commit hooks, includingpnpm tsgoand lint gatespnpm buildcompleted successfully on this branchsrc/agents/pi-embedded-subscribe.commentary.test.tssrc/agents/pi-embedded-runner/run/payloads.test.tsextensions/whatsapp/src/auto-reply/deliver-reply.test.tsextensions/telegram/src/bot-message-dispatch.test.tsextensions/whatsapp/src/auto-reply/monitor/process-message.inbound-context.test.tsandsrc/config/validation.allowed-values.test.tshit an existing Vitest worker hang in this workspace when run via the wrapper, so I am not claiming clean completion for those two files here.Review Conversations
Compatibility / Migration
channels.whatsapp.commentaryDeliveryto"live"Risks and Mitigations
.finally.