Skip to content

WhatsApp: send commentary live#57289

Closed
tristanmanchester wants to merge 4 commits into
openclaw:mainfrom
tristanmanchester:codex/whatsapp-commentary-main-clean
Closed

WhatsApp: send commentary live#57289
tristanmanchester wants to merge 4 commits into
openclaw:mainfrom
tristanmanchester:codex/whatsapp-commentary-main-clean

Conversation

@tristanmanchester

@tristanmanchester tristanmanchester commented Mar 29, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: WhatsApp only returned commentary once the final answer arrived, so agent progress updates were held back until the end of the turn.
  • Why it matters: live commentary is useful on chat surfaces only if it is delivered during the run, while still keeping the final answer separate and preserving existing behavior for other channels.
  • What changed: added a channel-agnostic onCommentaryReply seam in the embedded runner/subscribe path, wired it only for WhatsApp behind channels.whatsapp.commentaryDelivery: "off" | "live", and filtered already-delivered commentary out of the final payload assembly.
  • What did NOT change (scope boundary): this does not enable live commentary for Telegram, Slack, or any other channel, and it does not route commentary through generic block streaming.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause / Regression History (if applicable)

N/A

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/agents/pi-embedded-subscribe.commentary.test.ts
    • src/agents/pi-embedded-runner/run/payloads.test.ts
    • extensions/whatsapp/src/auto-reply/deliver-reply.test.ts
  • Scenario the test should lock in: live commentary emits once per finalized commentary segment, already-delivered commentary is filtered from the final reply, WhatsApp delivery honors timeout/abort semantics, and final replies remain separate.
  • Why this is the smallest reliable guardrail: the feature is implemented at the embedded subscribe/runner seam and only channel-wired in WhatsApp, so seam tests plus WhatsApp delivery tests cover the real behavior without requiring a full live gateway run.
  • Existing test that already covers this (if any): Telegram boundary coverage continues to assert that other channels remain unwired.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • New WhatsApp config: channels.whatsapp.commentaryDelivery: "off" | "live"
  • Default behavior stays final-only.
  • When set to "live", commentary is sent to the WhatsApp chat in real time and the final answer is still delivered separately.
  • Commentary already sent live is not repeated in the final reply.

Diagram (if applicable)

Before:
[assistant commentary] -> [buffer until run end] -> [commentary + final answer delivered together]

After:
[assistant commentary] -> [subscribe seam emits commentary] -> [WhatsApp sends live commentary]
[final answer] -> [final payload assembly filters delivered commentary] -> [final answer only]

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 24 / pnpm workspace
  • Model/provider: N/A
  • Integration/channel (if any): WhatsApp
  • Relevant config (redacted): channels.whatsapp.commentaryDelivery: "live"

Steps

  1. Enable channels.whatsapp.commentaryDelivery: "live" for the WhatsApp account/channel.
  2. Send a prompt that produces assistant commentary before the final answer.
  3. Observe the WhatsApp thread during the run and after completion.

Expected

  • Commentary appears in the chat as it is produced.
  • The final answer still arrives as a separate reply.
  • Commentary already delivered live is not appended again to the final reply.

Actual

  • Matches expected behavior in manual WhatsApp validation.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • rebased branch is 0 behind current origin/main
    • pnpm check passed during commit hooks, including pnpm tsgo and lint gates
    • pnpm build completed successfully on this branch
    • targeted tests passed:
      • src/agents/pi-embedded-subscribe.commentary.test.ts
      • src/agents/pi-embedded-runner/run/payloads.test.ts
      • extensions/whatsapp/src/auto-reply/deliver-reply.test.ts
      • extensions/telegram/src/bot-message-dispatch.test.ts
  • Edge cases checked:
    • delivered live commentary is filtered from final payloads
    • empty finalized commentary outputs still fall back to streamed assistant text for timeout/abort paths
    • commentary delivery uses abort/timeout cleanup without mutating upstream errors
  • What you did not verify:
    • I did not personally re-run a live WhatsApp chat on this rebased branch; the end-to-end WhatsApp behavior was manually validated by the operator.
    • extensions/whatsapp/src/auto-reply/monitor/process-message.inbound-context.test.ts and src/config/validation.allowed-values.test.ts hit an existing Vitest worker hang in this workspace when run via the wrapper, so I am not claiming clean completion for those two files here.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (Yes)
  • Migration needed? (No)
  • If yes, exact upgrade steps:
    • optional: set channels.whatsapp.commentaryDelivery to "live"

Risks and Mitigations

  • Risk: commentary and final payload reconciliation could double-send or drop text on unusual run-end timing.
    • Mitigation: run-scoped segment IDs, commentary queue versioning, delivered-commentary filtering, and regression coverage for timeout/abort fallback behavior.
  • Risk: commentary callback cleanup could leak abort listeners during long runs.
    • Mitigation: each queued delivery now removes the upstream abort listener in .finally.

@openclaw-barnacle openclaw-barnacle Bot added channel: telegram Channel integration: telegram channel: whatsapp-web Channel integration: whatsapp-web agents Agent runtime and tooling size: XL labels Mar 29, 2026
@greptile-apps

greptile-apps Bot commented Mar 29, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds live WhatsApp commentary delivery via a commentaryDelivery: "off" | "live" config flag. While an agent run is in progress, intermediate commentary-phase assistant messages are now sent to the WhatsApp thread in real time, and already-delivered commentary is filtered out of the final reply to avoid duplication. The implementation is gated entirely behind a new config flag, is backward-compatible (default stays "off"), and does not affect any other channel.

Key design decisions that are working correctly:

  • A serial per-run commentary delivery queue with generation-based staleness tracking, per-segment AbortController wiring, and upstream-abort forwarding prevents double-delivery or leaked listeners.
  • seenLiveCommentarySegmentIds prevents re-queuing a streaming segment already picked up from message_update; deliveredCommentarySegmentIds survives compaction retry to suppress replay of the same segment ID.
  • waitForCommentaryDeliveryBounded in the attempt runner uses a pendingCount-scaled timeout and catches runner-abort errors cleanly before proceeding to final payload assembly.
  • The !queuedFinal → return didSendReply change in process-message.ts correctly surfaces true when live commentary was the only output from a turn.

Three P2 observations:

  • normalizeDirectiveAliases is injected into normalizeReplyPayloadDirectives globally, affecting all channels — confirm this is intentional.
  • The 15_000 ms timeout is hardcoded in two places and must be kept in sync manually.
  • When all assistantOutputs are delivered commentary, the final reply is intentionally empty — operators should be aware a turn can produce a zero-text final reply with commentaryDelivery: "live".

Confidence Score: 5/5

Safe to merge — no P0/P1 issues found; all three observations are P2 design/style notes that do not affect correctness.

The implementation is well-scoped behind a new config flag with a safe default, backed by comprehensive seam tests covering abort, timeout, compaction retry, async drain, and cross-turn deduplication. Abort/cleanup semantics are correct, the delivered-commentary filter is explicitly tested, and the channel boundary is guarded by a Telegram boundary test. All findings are P2.

src/auto-reply/reply/reply-delivery.ts — confirm global normalizeDirectiveAliases application is intentional. src/agents/pi-embedded-runner/run/attempt.ts and extensions/whatsapp/src/auto-reply/monitor/process-message.ts — duplicate 15_000 constant needs to stay in sync.

Important Files Changed

Filename Overview
src/agents/pi-embedded-commentary.ts New file: extracts structured assistant output segments (commentary vs final_answer) with stable IDs from textSignature or auto-generated fallbacks. Segment grouping and phase resolution logic is correct.
src/agents/pi-embedded-subscribe.ts Adds commentary delivery queue with per-segment abort controllers, generation-based staleness tracking, and compaction-safe cleanup. The seam API is well-structured.
src/agents/pi-embedded-runner/run/attempt.ts Adds waitForCommentaryDeliveryBounded with per-round timeout scaling; wires assistantOutputs and deliveredCommentarySegmentIds into the attempt result. Minor: 15_000 fallback duplicates the constant in process-message.ts.
src/agents/pi-embedded-runner/run/payloads.ts Filters delivered commentary from the final payload using assistantOutputs + deliveredCommentarySegmentIds. Removes lastAssistant text fallback — intentional by design but changes behavior when all outputs are delivered commentary.
extensions/whatsapp/src/auto-reply/deliver-reply.ts Adds abortSignal/timeoutMs support; replaces sendMediaWithLeadingCaption with an inline loop; adds sleepWithAbort and awaitWithAbort helpers. Media error fallback behavior preserved. Abort cleanup in finally is correct.
extensions/whatsapp/src/auto-reply/monitor/process-message.ts Wires onCommentaryReply only when commentaryDelivery === 'live'; extracts sendWhatsAppPayload helper shared by commentary and final paths; updates !queuedFinal return to reflect live-sent commentary correctly.
src/auto-reply/reply/reply-delivery.ts Adds normalizeDirectiveAliases and trimLeadingWhitespace to normalizeReplyPayloadDirectives. Alias normalization applies globally to all channels — confirm this is intentional.
src/agents/pi-embedded-subscribe.commentary.test.ts New test file with 8 scenarios covering: no-callback no-op, live emission after non-terminal detection, message_end delivery, unsigned ID stability, cross-turn distinctness, async drain correctness, and compaction-retry abort/replay suppression.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/auto-reply/reply/reply-delivery.ts
Line: 26-28

Comment:
**`normalizeDirectiveAliases` applies globally to all channels**

`normalizeDirectiveAliases` is now called unconditionally inside `normalizeReplyPayloadDirectives`, meaning camelCase directive aliases (`[[replyToCurrent]]`, `[[audioAsVoice]]`, `[[replyTo:...]]`) are normalized for *all* channels and callers — not just the new WhatsApp commentary path.

This may be intentional as a general cleanup (the test `"normalizes commentary directives before delivery"` explicitly verifies it), but it is an undocumented side-effect on Telegram, Slack, and other channels that is outside the stated PR scope. If the model was previously emitting `[[replyToCurrent]]` on any channel and that literal token was expected to pass through unchanged, it will now be silently rewritten to `[[reply_to_current]]`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 1290-1292

Comment:
**Hardcoded `15_000` timeout duplicated across two files**

`COMMENTARY_REPLY_TIMEOUT_MS = 15_000` in `process-message.ts` is wired as `blockReplyTimeoutMs`, and `attempt.ts` has `params.blockReplyTimeoutMs ?? 15_000` as the fallback inside `waitForCommentaryDeliveryBounded`. Both constants must be kept in sync manually. If the timeout in `process-message.ts` is updated but the fallback here is not, the scaling logic (`commentaryDeliveryTimeoutMs * pendingCount`) would silently use a different cap for callers that omit `blockReplyTimeoutMs`.

Consider exporting the constant from a shared location or adding a comment cross-referencing both sites.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/payloads.ts
Line: 248-270

Comment:
**`assistantTexts` no longer consulted as fallback when `assistantOutputs` is non-empty but entirely filtered**

The old fallback chain was: `assistantTexts || extractAssistantText(lastAssistant) || []`. The new logic uses `assistantOutputs` when non-empty, but if all entries are commentary segments that were already delivered live, `filteredAssistantOutputs` becomes `[]` — and `assistantTexts` is *not* consulted.

The test `"does not fall back to assistantTexts when delivered commentary strips all assistant outputs"` confirms this is intentional. The design decision is sound, but operators should be aware: when the model emits only `phase: "commentary"` blocks (no `phase: "final_answer"`) and all are delivered live, the final WhatsApp reply will be empty.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "WhatsApp: send commentary live" | Re-trigger Greptile

Comment thread src/auto-reply/reply/reply-delivery.ts
Comment thread src/agents/pi-embedded-runner/run/attempt.ts
Comment thread src/agents/pi-embedded-runner/run/payloads.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f7200a5830

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-commentary.ts
@openclaw-barnacle openclaw-barnacle Bot added the docs Improvements or additions to documentation label Mar 29, 2026
@tristanmanchester

Copy link
Copy Markdown
Contributor Author

Addressed the open review items and pushed 2931754962.

What changed:

  • scoped camelCase directive alias normalization to the WhatsApp commentary delivery path instead of applying it globally
  • shared the commentary reply timeout constant between the WhatsApp caller and the runner fallback
  • merged repeated same-signature assistant text blocks before storing/delivering them, with coverage for the OpenAI Responses-style split-block case
  • regenerated the plugin SDK API baseline after exporting the shared timeout constant

Validation:

  • pnpm test -- src/auto-reply/reply/reply-delivery.test.ts src/agents/pi-embedded-subscribe.commentary.test.ts src/agents/pi-embedded-runner/run/payloads.test.ts
  • pnpm check
  • pnpm build

I left the assistantTexts fallback thread open for now because that behavior still looks intentional: if we reintroduce the fallback when commentary-only segments were already delivered live, we risk duplicating text instead of producing a true final-only payload.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2931754962

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-subscribe.handlers.messages.ts Outdated
Comment thread extensions/whatsapp/src/auto-reply/deliver-reply.ts
@tristanmanchester

Copy link
Copy Markdown
Contributor Author

Addressed the remaining unresolved bot threads in 6997165d4f.

What changed:

  • restore assistantTexts fallback when live-delivered commentary strips all finalized assistant outputs
  • track delivered commentary text per segment so later updates send only the appended suffix and message_end flushes any remaining tail
  • propagate AbortError from later WhatsApp media sends so truncated commentary is not marked as delivered

Verification:

  • pnpm test -- src/agents/pi-embedded-runner/run/payloads.test.ts src/agents/pi-embedded-subscribe.commentary.test.ts extensions/whatsapp/src/auto-reply/deliver-reply.test.ts
  • pnpm check

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6997165d4f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/payloads.ts Outdated
Comment thread src/agents/pi-embedded-subscribe.handlers.messages.ts
@tristanmanchester

Copy link
Copy Markdown
Contributor Author

Addressed the two remaining bot threads in 5508a7e796.

What changed:

  • preserve undelivered commentary suffixes in the final payload by carrying delivered commentary text through the runner result
  • queue a follow-up live commentary delta after an in-flight send completes if the same segment grew while pending, including the message_end case with no later events

Validation:

  • pnpm test -- src/agents/pi-embedded-subscribe.commentary.test.ts src/agents/pi-embedded-runner/run/payloads.test.ts
  • pnpm build

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5508a7e796

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run.ts
@tristanmanchester tristanmanchester force-pushed the codex/whatsapp-commentary-main-clean branch from 5508a7e to 0bea12b Compare March 30, 2026 04:11

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0bea12bb89

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/payloads.ts
@tristanmanchester

Copy link
Copy Markdown
Contributor Author

Superseded by #57484.

This replacement PR carries the same feature on a fresh current-main branch with clean history, plus the latest follow-up fix for the lastAssistant timeout/abort fallback.

@aisle-research-bot

aisle-research-bot Bot commented Mar 30, 2026

Copy link
Copy Markdown

🔒 Aisle Security Analysis

We found 3 potential security issue(s) in this PR:

# Severity Title
1 🟠 High Unbounded growth of commentary segment tracking (Map/Set) from model-controlled segmentId leading to memory/CPU DoS
2 🟡 Medium Unbounded commentary delivery timeout multiplication enables denial-of-service and timer overflow
3 🟡 Medium Live WhatsApp commentary delivery can expose internal reasoning/tool output to end users
Vulnerabilities

1. 🟠 Unbounded growth of commentary segment tracking (Map/Set) from model-controlled segmentId leading to memory/CPU DoS

Property Value
Severity High
CWE CWE-400
Location src/agents/pi-embedded-subscribe.ts:730-734

Description

The live commentary delivery implementation tracks delivered/pending segments in process memory keyed by segmentId, which can be controlled by the model via textSignature (either raw string or JSON with an id). There are no limits on:

  • Number of unique segmentId values
  • Length of segmentId strings
  • Amount of text stored per segment (deliveredCommentarySegmentTexts stores full text per id)

A malicious or buggy model (potentially influenced by prompt injection) can continuously emit new unique textSignature IDs, causing deliveredCommentarySegmentIds/deliveredCommentarySegmentTexts to grow without bound for the lifetime of the subscription/session, eventually exhausting memory. Additionally, each unique segment can enqueue work onto the serialized commentaryDeliveryQueue, amplifying CPU/time usage.

Vulnerable code (memory growth on successful delivery):

state.deliveredCommentarySegmentIds.add(segment.segmentId);
state.deliveredCommentarySegmentTexts.set(
  segment.segmentId,
  options?.deliveredText ?? segment.text,
);

Model-controlled segmentId source (textSignature):

const parsed = JSON.parse(trimmed) as { id?: unknown; phase?: unknown };
...
return { id: parsed.id, ... };

While abortCommentaryDelivery() clears pendingCommentarySegmentIds, it does not clear deliveredCommentarySegmentIds or deliveredCommentarySegmentTexts, so growth persists across aborts/retries within a session.

Recommendation

Introduce strict bounds and cleanup for commentary segment tracking:

  1. Validate and constrain segmentId (length + allowed charset), and ignore/replace invalid IDs.
  2. Cap the number of tracked segments (LRU/TTL) for both delivered IDs and delivered texts.
  3. Avoid storing full delivered text indefinitely; store only an offset/length or a hash, or limit stored text size.
  4. Clear delivered segment state on abort/unsubscribe and when starting a new attempt/session.

Example (simple hard limits):

const MAX_SEGMENTS = 500;
const MAX_SEGMENT_ID_LEN = 128;
const MAX_DELIVERED_TEXT_LEN = 10_000;

function normalizeSegmentId(id: string): string | null {
  const trimmed = id.trim();
  if (trimmed.length === 0 || trimmed.length > MAX_SEGMENT_ID_LEN) return null;
  return trimmed;
}

function rememberDelivered(segmentId: string, text: string) {
  if (state.deliveredCommentarySegmentIds.size >= MAX_SEGMENTS) {// evict oldest / use LRU structure
    const oldest = state.deliveredCommentarySegmentIds.values().next().value;
    state.deliveredCommentarySegmentIds.delete(oldest);
    state.deliveredCommentarySegmentTexts.delete(oldest);
  }
  state.deliveredCommentarySegmentIds.add(segmentId);
  state.deliveredCommentarySegmentTexts.set(segmentId, text.slice(0, MAX_DELIVERED_TEXT_LEN));
}

Also consider applying similar bounds when parsing textSignature in pi-embedded-commentary.ts so oversized IDs never enter the system.


2. 🟡 Unbounded commentary delivery timeout multiplication enables denial-of-service and timer overflow

Property Value
Severity Medium
CWE CWE-400
Location src/agents/pi-embedded-runner/run/attempt.ts:1309-1317

Description

waitForCommentaryDeliveryBounded() scales its timeout linearly with the number of pending commentary segments.

  • pendingCount is derived from getPendingCommentaryDeliveryCount() (size of an in-memory queue of commentary segment IDs).
  • The code computes roundTimeoutMs = commentaryDeliveryTimeoutMs * pendingCount and passes that to setTimeout().
  • If many segments are queued (which can be influenced by agent/model output producing many commentary segments), the runner can be blocked for an excessively long time waiting for commentary delivery.
  • If roundTimeoutMs exceeds Node.js's maximum supported timeout (~2^31-1 ms), setTimeout() overflows/clamps, causing the timeout behavior to become unreliable (potentially firing immediately or much sooner/later than intended), undermining the safety mechanism.

Vulnerable code:

const pendingCount = Math.max(1, getPendingCommentaryDeliveryCount());
const roundTimeoutMs = commentaryDeliveryTimeoutMs * pendingCount;
...
timer = setTimeout(() => reject(timeoutError), roundTimeoutMs);

Recommendation

Cap both the queue-derived multiplier and the resulting timeout to a safe maximum, and avoid relying on setTimeout() with values above the platform limit.

Example:

const MAX_PENDING_MULTIPLIER = 10; // or another small constant
const MAX_TIMEOUT_MS = 60_000;     // e.g., 60s, product decision
const rawPending = getPendingCommentaryDeliveryCount();
const pendingCount = Math.max(1, Math.min(MAX_PENDING_MULTIPLIER, rawPending));

const roundTimeoutMs = Math.min(
  MAX_TIMEOUT_MS,
  commentaryDeliveryTimeoutMs * pendingCount,
);

Additionally, consider using a fixed per-round timeout (not multiplied), and separately limit/flush the commentary queue size to prevent unbounded growth.


3. 🟡 Live WhatsApp commentary delivery can expose internal reasoning/tool output to end users

Property Value
Severity Medium
CWE CWE-200
Location extensions/whatsapp/src/auto-reply/monitor/process-message.ts:520-530

Description

processMessage() enables a new delivery path that forwards assistant commentary segments to WhatsApp when commentaryDelivery === "live".

  • The callback replyOptions.onCommentaryReply sends interim payloads via sendWhatsAppPayload().
  • Unlike the final-reply path, there is no guard to suppress reasoning/commentary content (e.g. payload.isReasoning suppression only exists in deliver() for info.kind === "final").
  • Commentary payloads are constructed from assistant message segments marked with phase: "commentary" (derived from the model output / phase field / textSignature JSON). This can be influenced by prompt injection (user-controlled input affecting model output), causing the model to emit chain-of-thought-like content or tool/transcript details.
  • The sanitization in extractAssistantOutputCandidates() removes some structured markers (<thinking> tags, special tokens, tool-call XML), but it does not guarantee removal of sensitive internal content if it appears as plain text (e.g., tool outputs, hidden instructions, step-by-step reasoning).

This can result in unintended information disclosure of internal model reasoning, tool results, or system details to WhatsApp recipients when the feature is enabled.

Recommendation

Treat live commentary as unsafe-by-default for external channels unless the content is guaranteed user-safe.

Suggested mitigations:

  1. Hard-block reasoning/commentary by default (even if configured) unless an explicit, high-friction allowlist is enabled.
  2. Add a strict filter for commentary payloads before sending to WhatsApp (e.g., only allow predefined status messages like "Working…", never free-form model text).
  3. If free-form commentary must be supported, enforce a separate model output channel that is guaranteed not to contain chain-of-thought/tool output, and add a final safety pass.

Example (block free-form commentary on WhatsApp):

onCommentaryReply:
  commentaryDelivery === "live"
    ? async (_payload, _context) => {// Intentionally do not forward model commentary to WhatsApp.// Optionally send a fixed typing/progress indicator instead.
      }
    : undefined,

Or if forwarding is required, add an explicit allowlist and suppress any payload flagged/heuristically detected as reasoning:

if (payload.isReasoning) return;
if (!isAllowedCommentary(payload.text ?? "")) return;

Analyzed PR: #57289 at commit 0bea12b

Last updated on: 2026-03-30T09:55:14Z

@tristanmanchester

Copy link
Copy Markdown
Contributor Author

Addressed the latest bot feedback and pushed 4ad996ebb1 to the PR branch.

What changed:

  • restored the lastAssistant fallback when finalized assistant outputs are empty (12ab6b9a05), resolving the remaining Codex thread
  • bounds live-commentary segment IDs from model-provided textSignature values and falls back to generated segment IDs for invalid/oversized IDs
  • caps delivered-commentary tracking to 500 segment IDs and stores only a bounded text prefix plus exact delivered length, so suffix preservation still works without retaining unbounded text
  • clamps the commentary delivery wait multiplier to prevent unbounded timeout growth

Validation:

  • pnpm test -- src/agents/pi-embedded-subscribe.commentary.test.ts src/agents/pi-embedded-runner/run/payloads.test.ts
  • pnpm check
  • npm_config_cache=/tmp/openclaw-npm-cache pnpm build

Note: the first plain pnpm build hit a local npm cache ownership problem (EPERM under /Users/tristan/.npm/_cacache), so I reran it with a temporary npm cache. The temporary-cache build passed; it still printed the existing local config warning for channels.telegram.streaming.

On the Aisle item about live commentary exposure: this PR keeps free-form WhatsApp commentary as an explicit opt-in (commentaryDelivery: "live", default off) and continues to restrict delivery to sanitized phase: "commentary" assistant output rather than reasoning/block streams. I treated that as a product/security tradeoff rather than replacing the feature with fixed status text in this follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling channel: telegram Channel integration: telegram channel: whatsapp-web Channel integration: whatsapp-web docs Improvements or additions to documentation size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant