
Draft stream preview garbles long replies on Telegram (streamMode: block) #8537

@ksylvan

Description


When streamMode is set to "block" on the Telegram channel, long model replies arrive on Telegram as a single final message with garbled/corrupted text (missing spaces, words merged, broken markdown). The draft stream preview visible in the local Dashboard GUI shows progressively longer replacements of the message, but Telegram itself does not show progressive updates — it only receives the final (already garbled) message. The garbled final state in the dashboard matches what Telegram receives.

Steps to Reproduce

  1. Configure Telegram channel with streamMode: "block" (private chat)
  2. Send a message that triggers a long response (e.g., a code analysis question)
  3. Observe the local dashboard showing progressively longer message updates
  4. The final message in both dashboard and Telegram is garbled

Observed Behavior

The initial short reply appears clean. As the model continues generating, the dashboard draft preview grows. The final delivered message contains:

  • Code snippets merged into prose without spaces
  • Internal analysis/tool output leaking into reply text
  • Markdown syntax broken (backticks without spaces, headers merged with text)

Root Cause Analysis

Architecture Overview

When streamMode: "block" is active in a private Telegram chat:

  1. Draft stream is created (createTelegramDraftStream) with a 4096-char cap
  2. Draft chunker (EmbeddedBlockChunker) with minChars: 200, maxChars: 800, breakPreference: "paragraph" controls when the draft preview updates
  3. Block streaming is disabled (disableBlockStreaming = true, lines 156-158 in bot-message-dispatch.ts) because the draft stream handles the preview instead
  4. The final reply comes from deliverReplies() using the accumulated assistantTexts payloads

The Draft Stream Path (Preview Only)

In bot-message-dispatch.ts (lines 99-136), updateDraftFromPartial works as follows:

  1. Each onPartialReply callback receives the cumulative cleaned text from the agent
  2. A delta is extracted: delta = text.slice(lastPartialText.length) (line 113)
  3. The delta is fed to draftChunker.append(delta) (line 128)
  4. The chunker drains to draftText += chunk and calls draftStream.update(draftText) (lines 131-133)
  5. The draft stream is throttled (300ms) and capped at 4096 chars (lines 45-49 in draft-stream.ts)

Key: When draftText exceeds 4096 chars, sendDraft sets stopped = true and logs a warning. After this, the draft stream silently stops updating. The last preview in the dashboard is whatever was sent before the cap was hit.
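The freeze can be illustrated with a minimal sketch (the class shape, field names, and DRAFT_CAP constant here are assumptions for illustration, not the actual draft-stream.ts code):

```typescript
// Hypothetical sketch of the cap behavior described above.
const DRAFT_CAP = 4096;

class DraftStreamSketch {
  private stopped = false;
  lastSent = "";

  update(text: string): void {
    if (this.stopped) return; // silently ignore all further updates
    if (text.length > DRAFT_CAP) {
      this.stopped = true; // freeze: the preview keeps the previous state
      return;
    }
    this.lastSent = text;
  }
}

const stream = new DraftStreamSketch();
stream.update("a".repeat(4000)); // under the cap: preview updates
stream.update("a".repeat(5000)); // over the cap: stopped = true, no update
stream.update("a".repeat(100));  // even short updates are now ignored
console.log(stream.lastSent.length); // 4000 — frozen at the pre-cap state
```

Once stopped flips, every later update is a no-op, which is why the dashboard preview can lag arbitrarily far behind the real response.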

The Final Reply Path

Since disableBlockStreaming = true, the block reply pipeline is not active. Instead:

  1. The embedded agent accumulates text in assistantTexts[] via pi-embedded-subscribe.handlers.messages.ts
  2. On message_end, the full text is extracted from the assistant message
  3. buildReplyPayloads() processes the final payloads
  4. Since blockStreamingEnabled is false (due to draft stream), shouldDropFinalPayloads is false — the final payloads are used
  5. deliverReplies() sends the final text through markdownToTelegramChunks → sendTelegramText

Where Garbling Occurs — Three Suspect Paths

Path A: Draft Stream Truncation → Stale Dashboard Preview

When the response exceeds 4096 chars in the draft stream:

  • stopped = true freezes the preview at a partial state
  • But the dashboard UI may show the frozen preview as the "current" message
  • The actual final delivery via deliverReplies() sends the correct full text
  • If the dashboard is showing the draft preview (not the delivered message), the garbling is a display issue

Path B: Non-Monotonic Stream Handling (the likely culprit)

In updateDraftFromPartial (lines 112-118):

```ts
if (text.startsWith(lastPartialText)) {
  delta = text.slice(lastPartialText.length);
} else {
  // Streaming buffer reset (or non-monotonic stream). Start fresh.
  draftChunker?.reset();
  draftText = "";
}
lastPartialText = text;
```

When the provider's streaming is non-monotonic (the new cumulative text doesn't start with the previous), the chunker resets and draftText is cleared. But lastPartialText is set to the new (different) text. On the next delta:

  • text.startsWith(lastPartialText) may succeed
  • But draftText was reset to "", so the next chunk starts from scratch
  • The gap between the old draftText and the new accumulation creates the garbling

This happens when:

  • Auto-compaction triggers mid-stream (the provider rewrites the text buffer)
  • Tool calls complete and the agent produces a new text block
  • The AI SDK resets/replaces the text content block
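A stripped-down simulation of the append/reset logic reproduces the gap (the chunker, throttle, and cap are omitted; only the branch from the excerpt above is kept):

```typescript
// Minimal reproduction of the reset path in updateDraftFromPartial.
let lastPartialText = "";
let draftText = "";

function updateDraftFromPartial(text: string): void {
  if (text.startsWith(lastPartialText)) {
    draftText += text.slice(lastPartialText.length); // append the delta
  } else {
    draftText = ""; // non-monotonic: chunker reset, accumulated text lost
  }
  lastPartialText = text;
}

updateDraftFromPartial("Hello world.");           // monotonic: ok
updateDraftFromPartial("Rewritten buffer");       // non-monotonic: reset
updateDraftFromPartial("Rewritten buffer, more"); // delta resumes mid-stream
console.log(draftText); // ", more" — everything before the reset is gone
```

Note that after the reset, the very next monotonic partial resumes delta extraction against the new lastPartialText, so draftText picks up mid-stream with everything before the reset missing.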

Path C: sanitizeUserFacingText stripping

The normalizeStreamingText function in agent-runner-execution.ts (line 130) runs sanitizeUserFacingText() on the partial text. This function:

  • Strips <final> tags
  • Collapses consecutive duplicate paragraphs
  • Sanitizes HTTP error codes

If sanitizeUserFacingText modifies the text in a way that makes it non-monotonic relative to the previous partial, the updateDraftFromPartial function triggers the reset path (Path B), causing the garbling cascade.

Summary of the Bug

The combination of:

  1. Draft stream chunker accumulating draftText by appending chunks
  2. Non-monotonic partial replies (from tool calls, compaction, or text sanitization) triggering the reset path
  3. No reconciliation between the reset draftText and what was previously accumulated

...means the final draftText can have gaps or overwrites, producing garbled output in the draft preview. If the final delivery is ALSO garbled (not just the draft), then the issue is in the assistantTexts accumulation in pi-embedded-subscribe.handlers.messages.ts, potentially in the text_end handling where:

```ts
if (content.startsWith(ctx.state.deltaBuffer)) {
  chunk = content.slice(ctx.state.deltaBuffer.length);
} else if (ctx.state.deltaBuffer.startsWith(content)) {
  chunk = "";
} else if (!ctx.state.deltaBuffer.includes(content)) {
  chunk = content;
}
```

If none of the conditions match (partial overlap), the text_end content is silently dropped, causing missing text in the final output.
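Wrapping the excerpt in a standalone function (the `extractChunk` name is chosen here for illustration) makes the uncovered branch explicit:

```typescript
// Demonstrates the uncovered case in the text_end delta extraction
// (logic copied from the excerpt above; surrounding handler omitted).
function extractChunk(deltaBuffer: string, content: string): string | undefined {
  let chunk: string | undefined;
  if (content.startsWith(deltaBuffer)) {
    chunk = content.slice(deltaBuffer.length);
  } else if (deltaBuffer.startsWith(content)) {
    chunk = "";
  } else if (!deltaBuffer.includes(content)) {
    chunk = content;
  }
  return chunk; // undefined when content overlaps the buffer mid-string
}

console.log(extractChunk("abc", "abcdef")); // "def" — clean delta
console.log(extractChunk("abcdef", "abc")); // ""    — already emitted
console.log(extractChunk("abc", "xyz"));    // "xyz" — disjoint, append whole
console.log(extractChunk("abcdef", "cde")); // undefined — silently dropped
```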

Relevant Files

| File | Role |
| --- | --- |
| src/telegram/bot-message-dispatch.ts | Draft stream setup + updateDraftFromPartial |
| src/telegram/draft-stream.ts | Draft stream with 4096-char cap |
| src/telegram/draft-chunking.ts | Chunking config (200-800 chars, paragraph break) |
| src/agents/pi-embedded-block-chunker.ts | EmbeddedBlockChunker implementation |
| src/agents/pi-embedded-subscribe.handlers.messages.ts | text_delta/text_end accumulation |
| src/auto-reply/reply/agent-runner-execution.ts | normalizeStreamingText + sanitizeUserFacingText |
| src/auto-reply/reply/agent-runner-payloads.ts | buildReplyPayloads final assembly |
| src/telegram/bot/delivery.ts | deliverReplies final Telegram send |

Suggested Fixes

  1. Path B fix: When the non-monotonic reset triggers, reconstruct draftText from the full text instead of clearing to empty. The text parameter already contains the full cleaned response — use it directly as draftText.

  2. Draft stream cap: When draftText exceeds 4096 chars, instead of stopped = true (which freezes the preview), truncate the preview text with "..." and continue tracking internally.

  3. Add monotonicity guard in text_end: The partial overlap case in handleMessageUpdate silently drops content. Add a fallback that uses the full content when no clean delta can be extracted.

  4. Diagnostic logging: Log when non-monotonic resets occur, with before/after text lengths, to aid debugging.
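Fix 1 could be sketched like this (an assumed shape, not a patch against the repo): since text already carries the full cleaned response, adopt it wholesale on a non-monotonic reset instead of clearing:

```typescript
// Sketch of suggested fix 1: rebuild the preview from the full partial
// text rather than emptying it when the stream is non-monotonic.
let lastPartialText = "";
let draftText = "";

function updateDraftFromPartialFixed(text: string): void {
  if (text.startsWith(lastPartialText)) {
    draftText += text.slice(lastPartialText.length);
  } else {
    // text already holds the full cleaned response: adopt it wholesale
    draftText = text;
  }
  lastPartialText = text;
}

updateDraftFromPartialFixed("Hello world.");
updateDraftFromPartialFixed("Rewritten buffer");       // reset case
updateDraftFromPartialFixed("Rewritten buffer, more");
console.log(draftText); // "Rewritten buffer, more" — no gap
```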
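Fix 2 might look like the following sketch (constant and function names assumed here): keep tracking the full draft internally and only truncate what is sent to Telegram, instead of freezing the stream:

```typescript
// Sketch of suggested fix 2: truncate the preview at the cap with an
// ellipsis marker rather than setting stopped = true.
const DRAFT_CAP = 4096;

function previewText(fullDraft: string): string {
  if (fullDraft.length <= DRAFT_CAP) return fullDraft;
  // keep room for the ellipsis marker inside the Telegram limit
  return fullDraft.slice(0, DRAFT_CAP - 3) + "...";
}

console.log(previewText("short"));                 // unchanged
console.log(previewText("x".repeat(5000)).length); // 4096 — capped, not frozen
```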
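Fix 3 could collapse the extraction branches so the partial-overlap case emits the full content instead of nothing (a sketch only; the real handler in handleMessageUpdate carries more state):

```typescript
// Sketch of suggested fix 3: a monotonicity guard in text_end handling.
function extractChunkWithFallback(deltaBuffer: string, content: string): string {
  if (content.startsWith(deltaBuffer)) return content.slice(deltaBuffer.length);
  if (deltaBuffer.startsWith(content)) return "";
  // Fallback (the suggested guard): when no clean delta exists — including
  // the partial-overlap case that was previously dropped — emit the full
  // content; duplicated text is preferable to silently missing text.
  return content;
}

console.log(extractChunkWithFallback("abc", "abcdef")); // "def" — clean delta
console.log(extractChunkWithFallback("abcdef", "abc")); // ""    — already emitted
console.log(extractChunkWithFallback("abcdef", "cde")); // "cde" — no longer dropped
```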

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working), stale (Marked as stale due to inactivity)
