Skip to content

WhatsApp: send commentary updates live#43731

Closed
tristanmanchester wants to merge 11 commits into
openclaw:mainfrom
tristanmanchester:codex/whatsapp-live-commentary-clean
Closed

WhatsApp: send commentary updates live#43731
tristanmanchester wants to merge 11 commits into
openclaw:mainfrom
tristanmanchester:codex/whatsapp-live-commentary-clean

Conversation

@tristanmanchester

@tristanmanchester tristanmanchester commented Mar 12, 2026

Copy link
Copy Markdown
Contributor

This replaces the earlier dirty-base attempt from #42489 with the final WhatsApp-only version of the feature.

Summary

This PR makes WhatsApp optionally deliver assistant commentary updates live during a run, while keeping every other channel unchanged.

The implementation was tightened to keep the product boundary clear:

  • WhatsApp-only user-facing behavior via channels.whatsapp.commentaryDelivery = "off" | "live"
  • shared assistant-output metadata kept only where needed to identify commentary vs final_answer
  • commentary tracking moved into the embedded subscribe/message lifecycle instead of being reconstructed later in run/attempt.ts
  • WhatsApp delivery normalization and bookkeeping kept local to the WhatsApp monitor
  • no Telegram, Discord, LINE, Slack, Signal, or extension rollout in this PR
  • docs added for the new WhatsApp config

What changed

  • Preserved phase and stable text-signature metadata on OpenAI WebSocket partial assistant text blocks so live commentary can be identified before response.completed.
  • Simplified src/agents/assistant-output.ts down to extraction and per-message fallback-id helpers only.
  • Moved commentary ownership into subscribeEmbeddedPiSession():
    • collects finalized assistantOutputs
    • tracks deliveredCommentarySegmentIds
    • emits live commentary only for completed non-terminal commentary segments
    • retries missed commentary on message_end
    • keeps undelivered commentary eligible for the final payload
  • Removed the polling/reconciliation loop from src/agents/pi-embedded-runner/run/attempt.ts.
  • Removed the shared deliverNormalizedReplyPayload() abstraction and kept the WhatsApp send path local in src/web/auto-reply/monitor/process-message.ts.
  • Removed the broad system-prompt guidance that implied commentary/final-answer behavior across all messaging surfaces.
  • Documented channels.whatsapp.commentaryDelivery in the WhatsApp channel docs and configuration reference.

Scope boundary

Included:

  • WhatsApp live commentary delivery
  • minimum shared metadata/plumbing required to support it
  • Telegram regression coverage to prove other channels remain unchanged

Explicitly not included:

  • cross-channel live commentary rollout
  • global commentaryDelivery config
  • new generic dispatcher kind for commentary

User-visible behavior

  • Default behavior is unchanged: WhatsApp still sends only the final reply when channels.whatsapp.commentaryDelivery is unset or off.
  • With channels.whatsapp.commentaryDelivery: "live", interim assistant commentary updates are sent as separate WhatsApp messages during the run.
  • The final WhatsApp reply still arrives separately.
  • Commentary already delivered live is stripped from the terminal payload; commentary that was not delivered live remains eligible for the final payload.

Verification

Manual / behavior checks:

  • verified WhatsApp live commentary delivery remains opt-in and final replies still send separately
  • verified reply directives are normalized before live commentary send
  • verified silent commentary with no media is suppressed
  • verified media-only commentary is still deliverable
  • verified Telegram remains unwired for commentary callbacks

Tests run:

  • pnpm exec vitest run src/agents/openai-ws-stream.test.ts
  • pnpm exec vitest run src/agents/pi-embedded-subscribe*.test.ts
  • pnpm exec vitest run src/agents/assistant-output.test.ts src/auto-reply/reply/reply-delivery.test.ts src/agents/pi-embedded-runner/run/payloads.test.ts src/web/auto-reply/monitor/process-message.inbound-contract.test.ts src/telegram/bot-message-dispatch.test.ts
  • pnpm format
  • pnpm tsgo

Risk / rollback

  • Fast rollback: set channels.whatsapp.commentaryDelivery=off
  • Main risks reviewers should watch for: duplicate live commentary, commentary directives leaking into WhatsApp, or final replies missing after live commentary delivery

@openclaw-barnacle openclaw-barnacle Bot added channel: telegram Channel integration: telegram channel: whatsapp-web Channel integration: whatsapp-web agents Agent runtime and tooling labels Mar 12, 2026
@greptile-apps

greptile-apps Bot commented Mar 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces WhatsApp live-commentary delivery by wiring onCommentaryReply through the embedded runner and extracting shared reply-delivery infrastructure. The implementation is thorough: a new assistant-output.ts module provides phase-aware segment extraction and two-path reconciliation (live stream and finalized snapshot), the WebSocket stream now carries phase/textSignature metadata on partial deltas, the payload builder correctly filters already-delivered commentary from the terminal reply, and the Telegram regression test enforces the scope boundary.

Key changes and observations:

  • New assistant-output.ts: extractAssistantOutputCandidates correctly handles signed vs. unsigned segments, boundary splitting on non-text blocks, and isTerminal gating to avoid sending still-growing commentary. Compaction cursor recovery via resolveCompactedPromptBoundaryIndex is a well-reasoned solution for the message-history compaction edge case.
  • liveAssistantFallbackMessageIdsByFingerprint (module-level Map): Never cleared. Grows for every unique content fingerprint encountered during the server's lifetime. The WeakMap counterpart is safe, but this strong-reference Map accumulates indefinitely. For long-running servers with diverse message content, a size cap or eviction strategy should be considered.
  • Polling loop in attempt.ts: The 250 ms commentary reconciler loop starts unconditionally for every run, even when onCommentaryReply is undefined. No delivery occurs without a callback, but the loop still scans messages every tick. Guarding loop startup on params.onCommentaryReply would avoid unnecessary work when commentary delivery is disabled.
  • Return value of processMessage: When !queuedFinal, the function now returns didSendReply instead of the previous hard-coded false. This is semantically correct (a reply was sent via commentary), and current callers discard the return value, so there is no live regression. The implicit contract change is worth documenting.
  • reply-delivery.ts extraction: Clean abstraction. Directive normalization, silent suppression, and per-kind bookkeeping hooks are all reusable and well-tested.
  • Test coverage is extensive: 731-line assistant-output.test.ts, new reply-delivery.test.ts, updated openai-ws-stream.test.ts, updated payloads.test.ts, and the Telegram regression guard.

Confidence Score: 4/5

  • Safe to merge with minor style concerns; no correctness regressions found and scope is well-bounded.
  • The core reconciliation logic is correct, the dual-set deduplication design prevents double delivery, and the fallback/filter path in payload construction is sound. The main open items are: (1) the unbounded liveAssistantFallbackMessageIdsByFingerprint Map which could grow over a server's lifetime, and (2) the unconditional polling loop when commentary delivery is disabled. Neither is a correctness bug for the current usage — the Map entries are small strings and the polling is inexpensive — but both warrant attention before this pattern is reused in higher-throughput contexts.
  • src/agents/assistant-output.ts (module-level Map growth) and src/agents/pi-embedded-runner/run/attempt.ts (unconditional polling loop).

Comments Outside Diff (2)

  1. src/agents/pi-embedded-runner/run/attempt.ts, line 1757-1766 (link)

    Reconciler loop runs unconditionally regardless of commentary delivery setting

    The commentaryReconciler polling loop starts for every run, even when params.onCommentaryReply is undefined (i.e., when commentaryDelivery is "off" or the channel doesn't wire it). The reconcilers themselves are guarded by params.onCommentaryReply for actual delivery, but the 250 ms polling loop still executes reconcileLiveStreamCommentary() and reconcileFinalizedAssistantOutputs() on every tick, doing work — fingerprint hashing, message scanning, set lookups — that has no observable effect when delivery is disabled.

    Consider skipping the loop entirely when there is no consumer:

    const commentaryReconciler = params.onCommentaryReply
      ? (async () => {
          while (!stopCommentaryReconciler) {
            await reconcileLiveStreamCommentary();
            await reconcileFinalizedAssistantOutputs(activeSession.messages);
            if (stopCommentaryReconciler) break;
            await sleep(250);
          }
        })()
      : Promise.resolve();

    This also slightly reduces the surface area for the loop-related races if live commentary is never used for the current channel.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/agents/pi-embedded-runner/run/attempt.ts
    Line: 1757-1766
    
    Comment:
    **Reconciler loop runs unconditionally regardless of commentary delivery setting**
    
    The `commentaryReconciler` polling loop starts for every run, even when `params.onCommentaryReply` is `undefined` (i.e., when `commentaryDelivery` is `"off"` or the channel doesn't wire it). The reconcilers themselves are guarded by `params.onCommentaryReply` for actual delivery, but the 250 ms polling loop still executes `reconcileLiveStreamCommentary()` and `reconcileFinalizedAssistantOutputs()` on every tick, doing work — fingerprint hashing, message scanning, set lookups — that has no observable effect when delivery is disabled.
    
    Consider skipping the loop entirely when there is no consumer:
    
    ```ts
    const commentaryReconciler = params.onCommentaryReply
      ? (async () => {
          while (!stopCommentaryReconciler) {
            await reconcileLiveStreamCommentary();
            await reconcileFinalizedAssistantOutputs(activeSession.messages);
            if (stopCommentaryReconciler) break;
            await sleep(250);
          }
        })()
      : Promise.resolve();
    ```
    
    This also slightly reduces the surface area for the loop-related races if live commentary is never used for the current channel.
    
    How can I resolve this? If you propose a fix, please make it concise.
  2. src/web/auto-reply/monitor/process-message.ts, line 507-514 (link)

    Return value semantics changed when no final reply was queued

    Previously, the if (!queuedFinal) branch always returned false. It now returns didSendReply, which will be true if at least one live commentary message was successfully delivered — even if no final reply was ever produced by the agent (e.g., because the run ended silently or with an error after commentary was already sent).

    Callers in broadcast.ts and on-message.ts currently discard the return value, so this is not a live regression. But the semantics shift is worth documenting: processMessage previously meant "a final reply was sent to the user" and now means "any reply (commentary or final) was sent." If future callers distinguish these cases, the implicit contract change could cause subtle bugs.

    A comment on the return type clarifying the new semantics would help:

    // Returns true if any reply (including live commentary) was delivered to the user.
    export async function processMessage(...)
    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/web/auto-reply/monitor/process-message.ts
    Line: 507-514
    
    Comment:
    **Return value semantics changed when no final reply was queued**
    
    Previously, the `if (!queuedFinal)` branch always returned `false`. It now returns `didSendReply`, which will be `true` if at least one live commentary message was successfully delivered — even if no final reply was ever produced by the agent (e.g., because the run ended silently or with an error after commentary was already sent).
    
    Callers in `broadcast.ts` and `on-message.ts` currently discard the return value, so this is not a live regression. But the semantics shift is worth documenting: `processMessage` previously meant "a _final_ reply was sent to the user" and now means "any reply (commentary or final) was sent." If future callers distinguish these cases, the implicit contract change could cause subtle bugs.
    
    A comment on the return type clarifying the new semantics would help:
    
    ```ts
    // Returns true if any reply (including live commentary) was delivered to the user.
    export async function processMessage(...)
    ```
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 1757-1766

Comment:
**Reconciler loop runs unconditionally regardless of commentary delivery setting**

The `commentaryReconciler` polling loop starts for every run, even when `params.onCommentaryReply` is `undefined` (i.e., when `commentaryDelivery` is `"off"` or the channel doesn't wire it). The reconcilers themselves are guarded by `params.onCommentaryReply` for actual delivery, but the 250 ms polling loop still executes `reconcileLiveStreamCommentary()` and `reconcileFinalizedAssistantOutputs()` on every tick, doing work — fingerprint hashing, message scanning, set lookups — that has no observable effect when delivery is disabled.

Consider skipping the loop entirely when there is no consumer:

```ts
const commentaryReconciler = params.onCommentaryReply
  ? (async () => {
      while (!stopCommentaryReconciler) {
        await reconcileLiveStreamCommentary();
        await reconcileFinalizedAssistantOutputs(activeSession.messages);
        if (stopCommentaryReconciler) break;
        await sleep(250);
      }
    })()
  : Promise.resolve();
```

This also slightly reduces the surface area for the loop-related races if live commentary is never used for the current channel.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/web/auto-reply/monitor/process-message.ts
Line: 507-514

Comment:
**Return value semantics changed when no final reply was queued**

Previously, the `if (!queuedFinal)` branch always returned `false`. It now returns `didSendReply`, which will be `true` if at least one live commentary message was successfully delivered — even if no final reply was ever produced by the agent (e.g., because the run ended silently or with an error after commentary was already sent).

Callers in `broadcast.ts` and `on-message.ts` currently discard the return value, so this is not a live regression. But the semantics shift is worth documenting: `processMessage` previously meant "a _final_ reply was sent to the user" and now means "any reply (commentary or final) was sent." If future callers distinguish these cases, the implicit contract change could cause subtle bugs.

A comment on the return type clarifying the new semantics would help:

```ts
// Returns true if any reply (including live commentary) was delivered to the user.
export async function processMessage(...)
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/assistant-output.ts
Line: 25-26

Comment:
**Unbounded module-level fingerprint Map**

`liveAssistantFallbackMessageIdsByFingerprint` is a module-level `Map` that is never cleared. Its purpose is correctenabling stable fallback ID reuse when the same message content appears under different JS object identities across the live-stream path and the finalized-snapshot path. However, on a long-running server handling many diverse conversations, each unique content fingerprint produces a permanent entry that is never evicted.

The `liveAssistantFallbackMessageIds` WeakMap is fine since entries are GC'd once the message object is released. But the fingerprint `Map` holds strong string keys indefinitely. For a server processing thousands of structurally distinct messages over its lifetime, this accumulates without bound.

Consider adding a size cap with LRU-style eviction (e.g., delete the oldest entry once the map exceeds a threshold like 10,000 entries), or add a comment explaining the expected growth bound if it is genuinely small in practice.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 8e32ebf

Comment thread src/agents/assistant-output.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8e32ebf15e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/assistant-output.ts Outdated
@tristanmanchester

Copy link
Copy Markdown
Contributor Author

Addressed the two summary-only review notes in 94f3e9505ddec09135ba5d6021a094f18a7545d8:

  • run/attempt.ts now skips the live-commentary polling loop entirely unless onCommentaryReply is wired for the current channel.
  • process-message.ts now documents that processMessage() returns true when any user-visible reply was delivered, including live commentary.

The same commit also replaces the old module-global fallback-id cache with per-run reconciliation state and adds a regression for repeated identical unsigned assistant messages.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94f3e9505d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/assistant-output.ts Outdated
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation gateway Gateway runtime labels Mar 13, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fbff0b8064

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-subscribe.handlers.messages.ts Outdated
Comment thread src/agents/pi-embedded-runner/run/attempt.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 553c0248e2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-subscribe.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 69085551fa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread extensions/whatsapp/src/auto-reply/deliver-reply.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c769105a9a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/payloads.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 89867ce38d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/attempt.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5bc1a90593

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-subscribe.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd971099a0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +207 to +210
throwIfAborted();
const caption = index === 0 ? remainingText.shift() || undefined : undefined;
try {
const media = await loadWebMedia(mediaUrl, {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Apply commentary abort timeout to media fetches

The new commentary abort/timeout wiring does not cover media loading: loadWebMedia(...) is awaited directly after a one-time throwIfAborted() check, so if the media source stalls (for example, a hanging HTTP response or slow local read), timeoutMs/abortSignal cannot interrupt this await. In live-commentary runs with media payloads, this leaves a callback running past its deadline and can let stale commentary work outlive the run instead of being promptly canceled.

Useful? React with 👍 / 👎.

@tristanmanchester

Copy link
Copy Markdown
Contributor Author

Closing this in favor of #57279.

The feature is still wanted, but this branch is too stale relative to current review expectations. #57279 is the clean replacement PR with the WhatsApp live commentary implementation on a fresh branch and verified WhatsApp behavior.

@tristanmanchester tristanmanchester deleted the codex/whatsapp-live-commentary-clean branch March 29, 2026 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling channel: telegram Channel integration: telegram channel: whatsapp-web Channel integration: whatsapp-web docs Improvements or additions to documentation gateway Gateway runtime size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant