fix(telegram): split long messages at word boundaries instead of mid-word #56595

Merged
hydro13 merged 1 commit into openclaw:main from hydro13:fix/telegram-message-split-word-boundary
Mar 28, 2026

Conversation

hydro13 (Member) commented Mar 28, 2026

Summary

  • Replace proportional text estimate with binary search for the largest text prefix whose rendered Telegram HTML fits the character limit
  • Split at the last whitespace boundary within the verified prefix
  • Single words longer than the limit still hard-split (unavoidable)
  • Markdown formatting stays balanced across split points
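The steps in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from format.ts: `renderHtml`, `largestPrefixWithinLimit`, and `splitAtWordBoundary` are hypothetical names, the renderer only escapes `&`, `<`, and `>`, and Markdown formatting is ignored entirely. It also assumes rendered length never shrinks as the prefix grows.

```typescript
// Hypothetical stand-in for the real renderer: it only escapes &, <, and >.
function renderHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Largest prefix length whose rendered HTML fits htmlLimit (0 if none fits).
// Relies on rendered length never shrinking as the prefix grows.
function largestPrefixWithinLimit(text: string, htmlLimit: number): number {
  let low = 1;
  let high = text.length;
  let best = 0;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    if (renderHtml(text.slice(0, mid)).length <= htmlLimit) {
      best = mid;
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return best;
}

// Split text into chunks, cutting at the last whitespace inside the verified prefix.
// Whitespace is kept (at the start of the next chunk), so chunks.join("") === text.
function splitAtWordBoundary(text: string, htmlLimit: number): string[] {
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > 0) {
    if (renderHtml(rest).length <= htmlLimit) {
      chunks.push(rest);
      break;
    }
    // Always take at least one character so the loop makes progress.
    const fit = Math.max(1, largestPrefixWithinLimit(rest, htmlLimit));
    let cut = fit; // hard split unless a whitespace boundary is found
    for (let i = fit; i >= 1; i--) {
      if (/\s/.test(rest.charAt(i))) {
        cut = i;
        break;
      }
    }
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut);
  }
  return chunks;
}
```

With this sketch, `splitAtWordBoundary("hello world foo", 8)` cuts before each space, while a single word longer than the limit is still hard-split because no whitespace exists inside the verified prefix.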

Root Cause

splitTelegramChunkByHtmlLimit in extensions/telegram/src/format.ts used a proportional estimate from rendered HTML length. When HTML escaping expanded characters (e.g. < becomes &lt;), the estimate window was too short to reach the next whitespace, and findMarkdownIRPreservedSplitIndex fell back to a hard cut at maxEnd, that is, mid-word.
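A small worked example may make the failure mode concrete. The formula below is only an assumed form of a proportional estimate (text length scaled by limit over HTML length), and `escapeHtml`, `proportionalEstimate`, and `exactLargestPrefix` are hypothetical names rather than functions from the repository.

```typescript
// Hypothetical stand-in renderer: escapes &, <, and > only.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Assumed shape of a proportional estimate: scale the text length by the
// ratio of the HTML limit to the fully rendered HTML length.
function proportionalEstimate(text: string, htmlLimit: number): number {
  const htmlLength = escapeHtml(text).length;
  return Math.max(1, Math.floor((text.length * htmlLimit) / htmlLength));
}

// Exact answer by brute force, for comparison only.
function exactLargestPrefix(text: string, htmlLimit: number): number {
  let best = 0;
  for (let n = 1; n <= text.length; n++) {
    if (escapeHtml(text.slice(0, n)).length <= htmlLimit) best = n;
  }
  return best;
}

// Ten letters, one space, then ten "<" that each expand to four characters.
const sample = "abcdefghij <<<<<<<<<<";
const htmlLimit = 20;

const estimate = proportionalEstimate(sample, htmlLimit); // 21 * 20 / 51, floored
const exact = exactLargestPrefix(sample, htmlLimit);

// True when the first n characters of the sample contain whitespace.
const windowHasWhitespace = (n: number): boolean => /\s/.test(sample.slice(0, n));
```

Here the expansion is concentrated at the end of the text, so the estimate (8) undershoots the exact answer (13). The 8-character window contains no whitespace, which forces a mid-word cut, whereas the exact prefix reaches the space after the first word.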

Change Type

  • Bug fix

Testing

41 tests pass. New regression tests for:

  • Word-boundary split when HTML escaping shrinks the retry window
  • Single long word exceeding the limit (hard split)
  • Formatted text splitting at word boundary with balanced <b>...</b> tags
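The balanced-tag check in the last test could be expressed with a small helper along these lines. This is a sketch under stated assumptions, not the assertion used in format.wrap-md.test.ts: `hasBalancedTags` is a hypothetical name, and the helper assumes every tag has an explicit closing tag (no void or self-closing elements).

```typescript
// Returns true when every opening tag in the HTML has a matching closing tag
// in the correct nesting order. Escaped text such as "&lt;b&gt;" is ignored
// because it contains no literal "<".
function hasBalancedTags(html: string): boolean {
  const stack: string[] = [];
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*>/g;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(html)) !== null) {
    const isClosing = match[1] === "/";
    const name = match[2].toLowerCase();
    if (!isClosing) {
      stack.push(name);
    } else if (stack.pop() !== name) {
      return false; // closing tag without a matching opener
    }
  }
  return stack.length === 0; // anything left on the stack was never closed
}
```

A regression test could then assert `hasBalancedTags(chunkHtml)` for each chunk alongside the length check on the rendered HTML.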

Fixes #36644

openclaw-barnacle (bot) added the labels channel: telegram, size: S, and maintainer on Mar 28, 2026
greptile-apps (bot, Contributor) commented Mar 28, 2026

Greptile Summary

This PR replaces the proportional-estimate split heuristic in splitTelegramChunkByHtmlLimit with a binary search for the largest text prefix whose rendered Telegram HTML fits within the character limit, then delegates to the existing whitespace-boundary splitter. This correctly handles the case where HTML escaping (e.g. < becomes &lt;) caused the proportional window to fall short of the next whitespace, resulting in mid-word cuts.

Key changes:

  • New findLargestTelegramChunkTextLengthWithinHtmlLimit does a binary search over text-character prefixes, using the proportional estimate as an optimistic starting point to shrink the initial search range.
  • splitTelegramChunkByHtmlLimit is simplified to one call of splitMarkdownIRPreserveWhitespace using the binary-search result.
  • Three new regression tests cover: escaped-HTML shrinking the retry window, single words exceeding the limit (hard-split fallback), and formatting preserved across a word-boundary split.

The implementation is correct and well-tested. The one implicit assumption is that renderTelegramChunkHtml is monotonically non-decreasing in text length; this holds for any reasonable Markdown-to-HTML renderer and is never violated in practice here.
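The seeded search described in this summary might look roughly like the sketch below. It is an illustration only: `findLargestPrefixSeeded`, `Render`, and `escapeOnly` are hypothetical names, the renderer is passed in as a parameter, and the exact way the real function uses the estimate is not shown in this thread. The sketch probes the estimate once to discard half of the range, searches only prefixes strictly shorter than the input, and falls back to one character when nothing fits.

```typescript
type Render = (text: string) => string;

// Largest prefix length in [1, text.length - 1] whose rendered HTML fits htmlLimit.
// Assumes rendered length is non-decreasing in prefix length.
function findLargestPrefixSeeded(text: string, htmlLimit: number, render: Render): number {
  // Never return the whole text, so the caller always makes progress.
  let high = text.length - 1;
  if (high < 1) return 1;
  let low = 1;
  let best = 0;
  const fits = (n: number): boolean => render(text.slice(0, n)).length <= htmlLimit;

  // Probe the proportional estimate once to narrow the search range.
  const fullHtmlLength = Math.max(1, render(text).length);
  const rawEstimate = Math.floor((text.length * htmlLimit) / fullHtmlLength);
  const estimate = Math.min(high, Math.max(1, rawEstimate));
  if (fits(estimate)) {
    best = estimate;
    low = estimate + 1;
  } else {
    high = estimate - 1;
  }

  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    if (fits(mid)) {
      best = mid;
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  // If not even one character fits, still consume one so splitting terminates.
  return Math.max(1, best);
}

// Hypothetical stand-in renderer that only escapes &, <, and >.
const escapeOnly: Render = (text) =>
  text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
```

The correctness of both the probe and the loop rests on the monotonicity assumption stated in the summary; if a longer prefix can render shorter than a shorter one, discarding half of the range is no longer safe.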

Confidence Score: 5/5

Safe to merge — the binary search is correct, the proportional-estimate optimisation is sound, edge cases are handled, and 41 tests pass.

No P0 or P1 issues found. The logic is monotone-safe (more text → more HTML bytes), the best = 0 fallback terminates correctly, and the termination of splitMarkdownIRPreserveWhitespace with limit = 1 is guaranteed. All remaining observations are P2 or lower.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| extensions/telegram/src/format.ts | Adds findLargestTelegramChunkTextLengthWithinHtmlLimit (binary search over text-prefix lengths) and simplifies splitTelegramChunkByHtmlLimit to use it. Logic is correct: the proportional estimate is used as an optimistic lower bound, the search is bounded by currentTextLength - 1 to guarantee progress, and the Math.max(1, best) fallback is safe. |
| extensions/telegram/src/format.wrap-md.test.ts | Adds three targeted regression tests covering the escaped-HTML window bug, single-word hard-split fallback, and bold-formatting preservation across a word-boundary split. Tests are well-structured and provide concrete assertions on both text content and HTML length invariants. |

Reviews (1): Last reviewed commit: "fix(telegram): split long messages at wo..."

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edc3827ca8

Comment thread extensions/telegram/src/format.ts Outdated
Comment on lines +582 to +583
}
high = mid - 1;

P1: Replace binary search with non-monotonic-safe split search

The new binary search assumes renderTelegramChunkHtml(slice(..., 0, mid)).length <= htmlLimit is monotonic as mid increases, but this formatter has non-monotonic prefixes: truncated auto-linked file refs render as long <a ...> tags, while the completed ref is de-linkified/wrapped shorter (buildTelegramLink + wrapFileReferencesInHtml). In cases like README.md z with a limit between those sizes, high = mid - 1 can prune valid larger prefixes and return 1; the retry loop then emits 1-character chunks and accepts them via the chunk.text.length <= 1 escape hatch, so resulting chunk HTML can still exceed Telegram’s limit.
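The concern can be reproduced with a toy renderer. Everything below is an assumption made for illustration: `toyRender` only imitates the described behaviour (a truncated reference such as README.m renders as a long link, the completed README.md renders as a shorter code span) and is not the real buildTelegramLink or wrapFileReferencesInHtml logic. The downward linear scan is one possible non-monotonic-safe alternative, not necessarily what the PR adopted.

```typescript
// Toy renderer (assumption): "name.md" becomes a short <code> span, while the
// truncated form "name.m" becomes a much longer <a> tag.
function toyRender(text: string): string {
  return text.replace(/\b\w+\.(\w+)\b/g, (match: string, ext: string): string => {
    if (ext === "md") return `<code>${match}</code>`;
    if (ext === "m") return `<a href="http://${match}">${match}</a>`;
    return match;
  });
}

const fitsLimit = (text: string, n: number, limit: number): boolean =>
  toyRender(text.slice(0, n)).length <= limit;

// Plain binary search over prefix lengths 1..length-1; correct only if fit is monotonic.
function binarySearchPrefix(text: string, limit: number): number {
  let low = 1;
  let high = text.length - 1;
  let best = 0;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    if (fitsLimit(text, mid, limit)) {
      best = mid;
      low = mid + 1;
    } else {
      high = mid - 1; // may prune valid larger prefixes when fit is non-monotonic
    }
  }
  return Math.max(1, best);
}

// Non-monotonic-safe alternative: scan from the longest candidate downward.
// Costs up to one render per candidate instead of a logarithmic number.
function safeLargestPrefix(text: string, limit: number): number {
  for (let n = text.length - 1; n >= 1; n--) {
    if (fitsLimit(text, n, limit)) return n;
  }
  return 1;
}

const fileRefText = "README.md zzzzzz";
const binaryResult = binarySearchPrefix(fileRefText, 25);
const safeResult = safeLargestPrefix(fileRefText, 25);
```

With a limit of 25, the first midpoint lands on the 8-character prefix README.m, whose toy rendering is too long, so the binary search discards everything from 8 upward and settles on 7. The downward scan finds that the 12-character prefix also fits.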

…word

Replace proportional text estimate with binary search for the largest
text prefix whose rendered Telegram HTML fits the character limit, then
split at the last whitespace boundary within that verified prefix.

Single words longer than the limit still hard-split (unavoidable).
Markdown formatting stays balanced across split points.

Fixes openclaw#36644
@hydro13 hydro13 force-pushed the fix/telegram-message-split-word-boundary branch from edc3827 to 00be04e Compare March 28, 2026 20:16
chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00be04e7ed

Comment thread extensions/telegram/src/format.ts
@hydro13 hydro13 merged commit ab2ef7b into openclaw:main Mar 28, 2026
34 of 35 checks passed
hydro13 added a commit that referenced this pull request Mar 28, 2026
alexcode-cc pushed a commit to alexcode-cc/clawdbot that referenced this pull request Mar 30, 2026
alexjiang1 pushed a commit to alexjiang1/openclaw that referenced this pull request Mar 31, 2026
pgondhi987 pushed a commit to pgondhi987/openclaw that referenced this pull request Mar 31, 2026
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
Tardisyuan pushed a commit to Tardisyuan/openclaw that referenced this pull request Apr 30, 2026
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
Development

Successfully merging this pull request may close these issues.

Telegram message splitting breaks mid-word