Refactor/base layer message split by Huaaudio · Pull Request #436 · sipeed/picoclaw

Huaaudio · 2026-02-18T20:55:21Z

📝 Description

This PR extracts the Discord message splitting logic from pkg/channels/discord.go into the reusable base-layer utility in pkg/utils/message.go. This enables other channels (Telegram, WhatsApp, etc.) to share the same message splitting functionality. For other channel usage: please import pkg/utils folder (for most channels this is already imported) and use chunks := utils.SplitMessage(msg.Content, <lenth limit>) . You can refer to pkg/channels/discord.go and #251 for details on handling message chunks.

This PR also fixes the edge case(Long code blocks) mentioned in #143 .

🗣️ Type of Change

🐞 Bug fix (non-breaking change which fixes an issue)
[ √ ] ✨ New feature (non-breaking change which adds functionality)
📖 Documentation update
[ √ ] ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

🤖 Fully AI-generated (100% AI, 0% Human)
🛠️ Mostly AI-generated (AI draft, Human verified/modified)
👨‍💻 Mostly Human-written (Human lead, AI assisted or none)
[√] Mostly CTRL-X and CTRL-V

🔗 Related Issue

Refactor from #143 , related #251 , related discussion #244 and #285

📚 Technical Context (Skip for Docs)

Reference URL:
Reasoning:

🧪 Test Environment

Hardware: Vultr 1c/1G VPS and PC
OS: Ubuntu 24.04 and WSL
Model/Provider: Zhipu/Z.AI
Channels: Discord

📸 Evidence (Optional)

Click to view Logs/Screenshots

☑️ Checklist

[ √] My code/docs follow the style of this project.
[√ ] I have performed a self-review of my own changes.
[√ ] I have updated the documentation accordingly.

- Move FindLast, findLast, and SplitMessage from discord.go to pkg/utils/message.go - Update discord.go to use utils.SplitMessage() - Makes splitting logic reusable across other channels

- Accept hard upper limit (maxLen) instead of pre-subtracted value - Caller now passes actual platform limit (e.g., 2000 for Discord) - Internal buffer of 500 chars is handled within message.go - Preferred split at maxLen - 500, may extend to maxLen for code blocks - Never exceeds maxLen, no more mental math for callers

Copilot

Pull request overview

This PR moves Discord message chunking/splitting logic into a reusable pkg/utils helper so other channel adapters can share the same behavior.

Changes:

Added utils.SplitMessage (and related helpers) to split long outbound messages while trying to preserve fenced code blocks.
Updated the Discord channel to use utils.SplitMessage and removed the now-inlined Discord-only implementation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
pkg/utils/message.go	Introduces shared message-splitting utility intended for reuse across channels.
pkg/channels/discord.go	Replaces local Discord splitting logic with the shared utility.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pkg/utils/message.go

pkg/channels/discord.go

pkg/utils/message.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pkg/utils/message.go

Copilot · 2026-02-18T21:20:16Z

pkg/utils/message.go

+// SplitMessage splits long messages into chunks, preserving code block integrity.
+// The function prefers to split at maxLen - defaultCodeBlockBuffer to leave room for code blocks,
+// but may extend up to maxLen when needed to avoid breaking incomplete code blocks.
+// Please refer to pkg/channels/discord.go for usage.
+func SplitMessage(content string, maxLen int) []string {
+	var messages []string
+	codeBlockBuffer := defaultCodeBlockBuffer
+
+	for len(content) > 0 {
+		if len(content) <= maxLen {
+			messages = append(messages, content)
+			break
+		}
+
+		// Effective split point: maxLen minus buffer, to leave room for code blocks
+		effectiveLimit := maxLen - codeBlockBuffer
+		if effectiveLimit < maxLen/2 {
+			effectiveLimit = maxLen / 2
+		}
+
+		// Find natural split point within the effective limit
+		msgEnd := FindLastNewline(content[:effectiveLimit], 200)
+		if msgEnd <= 0 {
+			msgEnd = FindLastSpace(content[:effectiveLimit], 100)
+		}
+		if msgEnd <= 0 {
+			msgEnd = effectiveLimit
+		}
+
+		// Check if this would end with an incomplete code block
+		candidate := content[:msgEnd]
+		unclosedIdx := FindLastUnclosedCodeBlock(candidate)
+
+		if unclosedIdx >= 0 {
+			// Message would end with incomplete code block
+			// Try to extend up to maxLen to include the closing ```
+			if len(content) > msgEnd {
+				closingIdx := FindNextClosingCodeBlock(content, msgEnd)
+				if closingIdx > 0 && closingIdx <= maxLen {
+					// Extend to include the closing ```
+					msgEnd = closingIdx
+				} else {
+					// Can't find closing within maxLen, split before the code block
+					msgEnd = FindLastNewline(content[:unclosedIdx], 200)
+					if msgEnd <= 0 {
+						msgEnd = FindLastSpace(content[:unclosedIdx], 100)
+					}
+					if msgEnd <= 0 {
+						msgEnd = unclosedIdx
+					}
+				}
+			}
+		}
+
+		if msgEnd <= 0 {
+			msgEnd = effectiveLimit
+		}
+
+		messages = append(messages, content[:msgEnd])
+		content = strings.TrimSpace(content[msgEnd:])
+	}
+
+	return messages
+}


SplitMessage introduces non-trivial behavior (natural boundaries, code block fence handling, and the 500-char buffer) but there are no unit tests verifying chunk sizes, code-block integrity, and edge cases (e.g., long code blocks without closing fences, no whitespace, Unicode input). Given the repo has existing Go unit tests, please add focused tests in pkg/utils for this helper.

pkg/utils/message.go

Copilot · 2026-02-18T21:20:16Z

pkg/utils/message.go

+	for len(content) > 0 {
+		if len(content) <= maxLen {
+			messages = append(messages, content)
+			break
+		}
+
+		// Effective split point: maxLen minus buffer, to leave room for code blocks
+		effectiveLimit := maxLen - codeBlockBuffer
+		if effectiveLimit < maxLen/2 {
+			effectiveLimit = maxLen / 2
+		}
+
+		// Find natural split point within the effective limit
+		msgEnd := FindLastNewline(content[:effectiveLimit], 200)
+		if msgEnd <= 0 {
+			msgEnd = FindLastSpace(content[:effectiveLimit], 100)
+		}
+		if msgEnd <= 0 {
+			msgEnd = effectiveLimit
+		}
+
+		// Check if this would end with an incomplete code block
+		candidate := content[:msgEnd]
+		unclosedIdx := FindLastUnclosedCodeBlock(candidate)


SplitMessage uses byte-based len() and byte indices when slicing (e.g., content[:msgEnd]). If msgEnd lands in the middle of a multi-byte UTF-8 rune, this will split the string into invalid UTF-8 and can produce garbled output or API rejections. Consider operating on []rune (or otherwise ensuring msgEnd aligns to rune boundaries) so chunk boundaries are Unicode-safe and the maxLen semantics align with character-based limits.

pkg/utils/message.go

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-18T21:42:11Z

pkg/utils/message.go

+		// Find natural split point within the effective limit
+		msgEnd := findLastNewline(content[:effectiveLimit], 200)
+		if msgEnd <= 0 {
+			msgEnd = findLastSpace(content[:effectiveLimit], 100)
+		}
+		if msgEnd <= 0 {
+			msgEnd = effectiveLimit
+		}


This function computes split points using len(content) and slices strings by byte index (e.g., content[:effectiveLimit] / msgEnd = effectiveLimit). When there’s no newline/space, msgEnd can land in the middle of a multi-byte UTF-8 rune, producing invalid UTF-8 chunks and potentially garbling output. Consider doing all length accounting/splitting in runes (or adjust msgEnd back to a valid rune boundary using utf8 helpers).

Copilot · 2026-02-18T21:42:12Z

pkg/utils/message.go

+						messages = append(messages, strings.TrimRight(content[:msgEnd], " \t\n\r")+"\n```")
+						content = strings.TrimSpace(header + "\n" + content[msgEnd:])
+						continue


When splitting inside a code block, content = strings.TrimSpace(header + "\n" + content[msgEnd:]) removes all leading whitespace from the reopened chunk. If the code block content starts with indentation (common in code), this changes the message content rather than just splitting it. Consider avoiding TrimSpace here (e.g., only trimming leading newlines) to preserve code formatting.

Copilot · 2026-02-18T21:42:12Z

pkg/utils/message.go

+							msgEnd = maxLen - 5
+							messages = append(messages, strings.TrimRight(content[:msgEnd], " \t\n\r")+"\n```")
+							content = strings.TrimSpace(header + "\n" + content[msgEnd:])
+							continue


Same as above: strings.TrimSpace(header + "\n" + content[msgEnd:]) in this branch can strip meaningful leading indentation/whitespace from code content after the reopened fence, altering output. Prefer preserving the original whitespace (or only trimming the newline(s) introduced by the split).

Copilot · 2026-02-18T21:42:12Z

pkg/utils/message.go

+					headerEnd := strings.Index(content[unclosedIdx:], "\n")
+					if headerEnd == -1 {
+						headerEnd = unclosedIdx + 3
+					} else {
+						headerEnd += unclosedIdx
+					}
+					header := strings.TrimSpace(content[unclosedIdx:headerEnd])
+


If the opening fence line doesn’t contain a newline (e.g., lang immediately followed by content), the `headerEnd == -1` branch sets `headerEnd = unclosedIdx + 3`, so `header` becomes just "" and the language tag is lost when reopening. Consider falling back to the next whitespace/end-of-string (or msgEnd) so the language specifier is preserved when present.

Copilot · 2026-02-18T21:42:12Z

pkg/utils/message.go

+// SplitMessage splits long messages into chunks, preserving code block integrity.
+// The function prefers to split at maxLen - defaultCodeBlockBuffer to leave room for code blocks,
+// but may extend up to maxLen when needed to avoid breaking incomplete code blocks.
+// Please refer to pkg/channels/discord.go for usage.
+func SplitMessage(content string, maxLen int) []string {
+	var messages []string
+	codeBlockBuffer := defaultCodeBlockBuffer


SplitMessage introduces fairly intricate splitting behavior (code-fence detection, fence injection/reopening, buffer logic). Adding unit tests (e.g., plain text, long code blocks, missing closing fence, and Unicode/multi-byte input) would help ensure this shared utility stays stable as other channels start using it.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pkg/utils/message.go

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pkg/utils/message.go

Copilot · 2026-02-18T22:25:54Z

pkg/utils/message_test.go

+				if !strings.HasSuffix(chunks[0], "\n```") {
+					t.Error("First chunk should end with injected closing fence")
+				}
+				// Check that second chunk starts with execution header


The comment says "Check that second chunk starts with execution header" but should say "Check that second chunk starts with code block header" for consistency with the error message on line 76.

Suggested change

// Check that second chunk starts with execution header

// Check that second chunk starts with code block header

Copilot · 2026-02-18T22:25:54Z

pkg/utils/message.go

+		messages = append(messages, content[:msgEnd])
+		content = strings.TrimSpace(content[msgEnd:])


The SplitMessage function uses byte-based indexing (string slicing) which can split multi-byte UTF-8 characters, potentially creating invalid UTF-8 sequences. When slicing at positions found by findLastNewline, findLastSpace, or at arbitrary positions like effectiveLimit or innerLimit, the function may split in the middle of a multi-byte character. Consider using rune-based indexing or validating split points to ensure they fall on character boundaries. The test case "Preserve Unicode characters" acknowledges this issue but doesn't verify the output is valid UTF-8.

Copilot · 2026-02-18T22:25:54Z

pkg/utils/message.go

+
+		if msgEnd <= 0 {
+			msgEnd = effectiveLimit
+		}


There is a potential infinite loop when msgEnd is set to 0. This can occur on line 95 when unclosedIdx is 0 (code block starts at the beginning), or potentially from other code paths. When msgEnd is 0, line 112 performs content = strings.TrimSpace(content[msgEnd:]) which doesn't reduce the content length (it's just trimming the same string), causing an infinite loop. The guard on lines 107-109 sets msgEnd to effectiveLimit only if msgEnd <= 0, but this happens AFTER the code block handling logic which may have set msgEnd to 0 and then taken a continue path earlier. Consider adding validation that msgEnd must be > 0 before line 111, or ensure line 95 sets msgEnd to at least 1.

Suggested change

}

}

// Ensure msgEnd is a valid, positive index so the loop always makes progress.

if msgEnd <= 0 {

msgEnd = 1

} else if msgEnd > len(content) {

msgEnd = len(content)

}

Copilot · 2026-02-18T22:25:54Z

pkg/utils/message_test.go

+			checkContent: func(t *testing.T, chunks []string) {
+				// Check that first chunk ends with closing fence
+				if !strings.HasSuffix(chunks[0], "\n```") {
+					t.Error("First chunk should end with injected closing fence")
+				}
+				// Check that second chunk starts with execution header
+				if !strings.HasPrefix(chunks[1], "```go") {
+					t.Error("Second chunk should start with injected code block header")
+				}
+			},


The test cases only verify that specific chunks don't exceed maxLen, but don't comprehensively check all chunks. For example, in the "Simple split regular text" test (lines 36-39), only chunk 0 is checked, not chunk 1. In the "Long code block split" test (lines 69-78), neither chunk is explicitly checked for length. Consider adding a check that iterates through all chunks and verifies each one is <= maxLen to ensure the function never produces oversized chunks.

Copilot · 2026-02-18T22:25:54Z

pkg/utils/message_test.go

+			checkContent: func(t *testing.T, chunks []string) {
+				// Just verify we didn't panic and got valid strings.
+				// Go strings are UTF-8, if we split mid-rune it would be bad,
+				// but standard slicing might do that.
+				// Let's assume standard behavior is acceptable or check if it produces invalid rune?
+				if !strings.Contains(chunks[0], "\u4e16") {
+					t.Error("Chunk should contain unicode characters")
+				}
+			},


The test case acknowledges that splitting mid-rune (multi-byte UTF-8 character) would be bad, but doesn't verify that this doesn't happen. The test only checks that the chunk contains at least one instance of the Unicode character, but doesn't validate that all chunks contain valid UTF-8. Consider adding validation using utf8.ValidString() to ensure all chunks are valid UTF-8 sequences.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-18T22:30:43Z

pkg/channels/discord.go

 	runes := []rune(msg.Content)
 	if len(runes) == 0 {
 		return nil
 	}

-	chunks := splitMessage(msg.Content, 1500) // Discord has a limit of 2000 characters per message, leave 500 for natural split e.g. code blocks
+	chunks := utils.SplitMessage(msg.Content, 2000) // Split messages into chunks, Discord length limit: 2000 chars


Send converts the full message to []rune only to check emptiness (len(runes) == 0), which allocates and can be expensive for long messages. This can be replaced with a simple msg.Content == "" (or len(msg.Content)==0) check without changing behavior.

Copilot · 2026-02-18T22:30:44Z

pkg/utils/message.go

+		// Find natural split point within the effective limit
+		msgEnd := findLastNewline(content[:effectiveLimit], 200)
+		if msgEnd <= 0 {
+			msgEnd = findLastSpace(content[:effectiveLimit], 100)
+		}
+		if msgEnd <= 0 {
+			msgEnd = effectiveLimit
+		}


SplitMessage computes split points using byte indexes (len, content[:effectiveLimit], content[:msgEnd]). When there are no ASCII split characters near the boundary (e.g., long CJK/emoji text), msgEnd can land in the middle of a UTF-8 sequence, producing invalid UTF-8 in a chunk (and JSON encoding will replace invalid bytes). Consider adjusting msgEnd to the nearest prior UTF-8 rune boundary (or operating on []rune) before slicing.

…plit Refactor/base layer message split from sipeed#143

Hua and others added 3 commits February 18, 2026 20:01

refactor: extract message splitting logic to shared utils

94a1b86

- Move FindLast, findLast, and SplitMessage from discord.go to pkg/utils/message.go - Update discord.go to use utils.SplitMessage() - Makes splitting logic reusable across other channels

update documents

e35a827

Copilot AI review requested due to automatic review settings February 18, 2026 20:55

Copilot started reviewing on behalf of Huaaudio February 18, 2026 20:55 View session

Copilot AI reviewed Feb 18, 2026

View reviewed changes

pkg/utils/message.go Outdated Show resolved Hide resolved

pkg/utils/message.go Outdated Show resolved Hide resolved

pkg/channels/discord.go Show resolved Hide resolved

pkg/utils/message.go Show resolved Hide resolved

pkg/utils/message.go Show resolved Hide resolved

Huaaudio and others added 2 commits February 18, 2026 22:03

Merge branch 'sipeed:main' into feat/base-layer-message-split

0a9d24e

Update pkg/utils/message.go

4ccee85

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings February 18, 2026 21:16

Copilot started reviewing on behalf of Huaaudio February 18, 2026 21:16 View session

Copilot AI reviewed Feb 18, 2026

View reviewed changes

Huaaudio added 2 commits February 18, 2026 22:31

Update to support extra long code blocks

f38ce0d

Privated function

82a2fae

Copilot AI review requested due to automatic review settings February 18, 2026 21:37

Copilot started reviewing on behalf of Huaaudio February 18, 2026 21:38 View session

Copilot AI reviewed Feb 18, 2026

View reviewed changes

Update pkg/utils/message.go

dfc3dff

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings February 18, 2026 21:43

Copilot started reviewing on behalf of Huaaudio February 18, 2026 21:44 View session

Copilot AI reviewed Feb 18, 2026

View reviewed changes

pkg/utils/message.go Show resolved Hide resolved

Huaaudio added 3 commits February 18, 2026 23:14

update message test, change dynamic buffer

7d8894d

update dynamic buffer

a46fe14

remove unicode

98afd39

Copilot AI review requested due to automatic review settings February 18, 2026 22:18

Copilot started reviewing on behalf of Huaaudio February 18, 2026 22:18 View session

Copilot AI reviewed Feb 18, 2026

View reviewed changes

Update pkg/utils/message.go

0d6b22f

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings February 18, 2026 22:26

Copilot started reviewing on behalf of Huaaudio February 18, 2026 22:27 View session

Huaaudio merged commit 59fd391 into sipeed:main Feb 18, 2026
8 checks passed

Huaaudio deleted the feat/base-layer-message-split branch February 18, 2026 22:29

Copilot AI reviewed Feb 18, 2026

View reviewed changes

Huaaudio mentioned this pull request Feb 18, 2026

Fix Telegram send failures for oversized messages (#244) #251

Closed

7 tasks

SebastianBoehler pushed a commit to SebastianBoehler/picoclaw that referenced this pull request Feb 22, 2026

Merge pull request sipeed#436 from Huaaudio/feat/base-layer-message-s…

72be1ca

…plit Refactor/base layer message split from sipeed#143

nikolasdehor mentioned this pull request Feb 28, 2026

Feat: Add Whatsapp channel #99

Closed

	// Check that second chunk starts with execution header
	// Check that second chunk starts with code block header

		messages = append(messages, content[:msgEnd])
		content = strings.TrimSpace(content[msgEnd:])

-		}
+		}
+		// Ensure msgEnd is a valid, positive index so the loop always makes progress.
+		if msgEnd <= 0 {
+			msgEnd = 1
+		} else if msgEnd > len(content) {
+			msgEnd = len(content)
+		}

Conversation

Huaaudio commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📝 Description

🗣️ Type of Change

🤖 AI Code Generation

🔗 Related Issue

📚 Technical Context (Skip for Docs)

🧪 Test Environment

📸 Evidence (Optional)

☑️ Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Huaaudio commented Feb 18, 2026 •

edited

Loading