Skip to content

Refactor/base layer message split#436

Merged
Huaaudio merged 12 commits intosipeed:mainfrom
Huaaudio:feat/base-layer-message-split
Feb 18, 2026
Merged

Refactor/base layer message split#436
Huaaudio merged 12 commits intosipeed:mainfrom
Huaaudio:feat/base-layer-message-split

Conversation

@Huaaudio
Copy link
Collaborator

@Huaaudio Huaaudio commented Feb 18, 2026

📝 Description

This PR extracts the Discord message splitting logic from pkg/channels/discord.go into the reusable base-layer utility in pkg/utils/message.go. This enables other channels (Telegram, WhatsApp, etc.) to share the same message splitting functionality. For other channel usage: please import pkg/utils folder (for most channels this is already imported) and use chunks := utils.SplitMessage(msg.Content, <lenth limit>) . You can refer to pkg/channels/discord.go and #251 for details on handling message chunks.

This PR also fixes the edge case(Long code blocks) mentioned in #143 .

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • [ √ ] ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • [ √ ] ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)
  • [√] Mostly CTRL-X and CTRL-V

🔗 Related Issue

Refactor from #143 , related #251 , related discussion #244 and #285

📚 Technical Context (Skip for Docs)

  • Reference URL:
  • Reasoning:

🧪 Test Environment

  • Hardware: Vultr 1c/1G VPS and PC
  • OS: Ubuntu 24.04 and WSL
  • Model/Provider: Zhipu/Z.AI
  • Channels: Discord

📸 Evidence (Optional)

Click to view Logs/Screenshots

☑️ Checklist

  • [ √] My code/docs follow the style of this project.
  • [√ ] I have performed a self-review of my own changes.
  • [√ ] I have updated the documentation accordingly.

Hua and others added 3 commits February 18, 2026 20:01
- Move FindLast, findLast, and SplitMessage from discord.go to pkg/utils/message.go
- Update discord.go to use utils.SplitMessage()
- Makes splitting logic reusable across other channels
- Accept hard upper limit (maxLen) instead of pre-subtracted value
- Caller now passes actual platform limit (e.g., 2000 for Discord)
- Internal buffer of 500 chars is handled within message.go
- Preferred split at maxLen - 500, may extend to maxLen for code blocks
- Never exceeds maxLen, no more mental math for callers
Copilot AI review requested due to automatic review settings February 18, 2026 20:55
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR moves Discord message chunking/splitting logic into a reusable pkg/utils helper so other channel adapters can share the same behavior.

Changes:

  • Added utils.SplitMessage (and related helpers) to split long outbound messages while trying to preserve fenced code blocks.
  • Updated the Discord channel to use utils.SplitMessage and removed the now-inlined Discord-only implementation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
pkg/utils/message.go Introduces shared message-splitting utility intended for reuse across channels.
pkg/channels/discord.go Replaces local Discord splitting logic with the shared utility.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Huaaudio and others added 2 commits February 18, 2026 22:03
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 18, 2026 21:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +9 to +72
// SplitMessage splits long messages into chunks, preserving code block integrity.
// The function prefers to split at maxLen - defaultCodeBlockBuffer to leave room for code blocks,
// but may extend up to maxLen when needed to avoid breaking incomplete code blocks.
// Please refer to pkg/channels/discord.go for usage.
func SplitMessage(content string, maxLen int) []string {
var messages []string
codeBlockBuffer := defaultCodeBlockBuffer

for len(content) > 0 {
if len(content) <= maxLen {
messages = append(messages, content)
break
}

// Effective split point: maxLen minus buffer, to leave room for code blocks
effectiveLimit := maxLen - codeBlockBuffer
if effectiveLimit < maxLen/2 {
effectiveLimit = maxLen / 2
}

// Find natural split point within the effective limit
msgEnd := FindLastNewline(content[:effectiveLimit], 200)
if msgEnd <= 0 {
msgEnd = FindLastSpace(content[:effectiveLimit], 100)
}
if msgEnd <= 0 {
msgEnd = effectiveLimit
}

// Check if this would end with an incomplete code block
candidate := content[:msgEnd]
unclosedIdx := FindLastUnclosedCodeBlock(candidate)

if unclosedIdx >= 0 {
// Message would end with incomplete code block
// Try to extend up to maxLen to include the closing ```
if len(content) > msgEnd {
closingIdx := FindNextClosingCodeBlock(content, msgEnd)
if closingIdx > 0 && closingIdx <= maxLen {
// Extend to include the closing ```
msgEnd = closingIdx
} else {
// Can't find closing within maxLen, split before the code block
msgEnd = FindLastNewline(content[:unclosedIdx], 200)
if msgEnd <= 0 {
msgEnd = FindLastSpace(content[:unclosedIdx], 100)
}
if msgEnd <= 0 {
msgEnd = unclosedIdx
}
}
}
}

if msgEnd <= 0 {
msgEnd = effectiveLimit
}

messages = append(messages, content[:msgEnd])
content = strings.TrimSpace(content[msgEnd:])
}

return messages
}
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SplitMessage introduces non-trivial behavior (natural boundaries, code block fence handling, and the 500-char buffer) but there are no unit tests verifying chunk sizes, code-block integrity, and edge cases (e.g., long code blocks without closing fences, no whitespace, Unicode input). Given the repo has existing Go unit tests, please add focused tests in pkg/utils for this helper.

Copilot uses AI. Check for mistakes.
Comment on lines +17 to +40
for len(content) > 0 {
if len(content) <= maxLen {
messages = append(messages, content)
break
}

// Effective split point: maxLen minus buffer, to leave room for code blocks
effectiveLimit := maxLen - codeBlockBuffer
if effectiveLimit < maxLen/2 {
effectiveLimit = maxLen / 2
}

// Find natural split point within the effective limit
msgEnd := FindLastNewline(content[:effectiveLimit], 200)
if msgEnd <= 0 {
msgEnd = FindLastSpace(content[:effectiveLimit], 100)
}
if msgEnd <= 0 {
msgEnd = effectiveLimit
}

// Check if this would end with an incomplete code block
candidate := content[:msgEnd]
unclosedIdx := FindLastUnclosedCodeBlock(candidate)
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SplitMessage uses byte-based len() and byte indices when slicing (e.g., content[:msgEnd]). If msgEnd lands in the middle of a multi-byte UTF-8 rune, this will split the string into invalid UTF-8 and can produce garbled output or API rejections. Consider operating on []rune (or otherwise ensuring msgEnd aligns to rune boundaries) so chunk boundaries are Unicode-safe and the maxLen semantics align with character-based limits.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings February 18, 2026 21:37
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +29 to +36
// Find natural split point within the effective limit
msgEnd := findLastNewline(content[:effectiveLimit], 200)
if msgEnd <= 0 {
msgEnd = findLastSpace(content[:effectiveLimit], 100)
}
if msgEnd <= 0 {
msgEnd = effectiveLimit
}
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function computes split points using len(content) and slices strings by byte index (e.g., content[:effectiveLimit] / msgEnd = effectiveLimit). When there’s no newline/space, msgEnd can land in the middle of a multi-byte UTF-8 rune, producing invalid UTF-8 chunks and potentially garbling output. Consider doing all length accounting/splitting in runes (or adjust msgEnd back to a valid rune boundary using utf8 helpers).

Copilot uses AI. Check for mistakes.
Comment on lines +71 to +73
messages = append(messages, strings.TrimRight(content[:msgEnd], " \t\n\r")+"\n```")
content = strings.TrimSpace(header + "\n" + content[msgEnd:])
continue
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When splitting inside a code block, content = strings.TrimSpace(header + "\n" + content[msgEnd:]) removes all leading whitespace from the reopened chunk. If the code block content starts with indentation (common in code), this changes the message content rather than just splitting it. Consider avoiding TrimSpace here (e.g., only trimming leading newlines) to preserve code formatting.

Copilot uses AI. Check for mistakes.
Comment on lines +88 to +91
msgEnd = maxLen - 5
messages = append(messages, strings.TrimRight(content[:msgEnd], " \t\n\r")+"\n```")
content = strings.TrimSpace(header + "\n" + content[msgEnd:])
continue
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: strings.TrimSpace(header + "\n" + content[msgEnd:]) in this branch can strip meaningful leading indentation/whitespace from code content after the reopened fence, altering output. Prefer preserving the original whitespace (or only trimming the newline(s) introduced by the split).

Copilot uses AI. Check for mistakes.
Comment on lines +53 to +60
headerEnd := strings.Index(content[unclosedIdx:], "\n")
if headerEnd == -1 {
headerEnd = unclosedIdx + 3
} else {
headerEnd += unclosedIdx
}
header := strings.TrimSpace(content[unclosedIdx:headerEnd])

Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the opening fence line doesn’t contain a newline (e.g., lang immediately followed by content), the `headerEnd == -1` branch sets `headerEnd = unclosedIdx + 3`, so `header` becomes just "" and the language tag is lost when reopening. Consider falling back to the next whitespace/end-of-string (or msgEnd) so the language specifier is preserved when present.

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +15
// SplitMessage splits long messages into chunks, preserving code block integrity.
// The function prefers to split at maxLen - defaultCodeBlockBuffer to leave room for code blocks,
// but may extend up to maxLen when needed to avoid breaking incomplete code blocks.
// Please refer to pkg/channels/discord.go for usage.
func SplitMessage(content string, maxLen int) []string {
var messages []string
codeBlockBuffer := defaultCodeBlockBuffer
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SplitMessage introduces fairly intricate splitting behavior (code-fence detection, fence injection/reopening, buffer logic). Adding unit tests (e.g., plain text, long code blocks, missing closing fence, and Unicode/multi-byte input) would help ensure this shared utility stays stable as other channels start using it.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 18, 2026 21:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings February 18, 2026 22:18
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if !strings.HasSuffix(chunks[0], "\n```") {
t.Error("First chunk should end with injected closing fence")
}
// Check that second chunk starts with execution header
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says "Check that second chunk starts with execution header" but should say "Check that second chunk starts with code block header" for consistency with the error message on line 76.

Suggested change
// Check that second chunk starts with execution header
// Check that second chunk starts with code block header

Copilot uses AI. Check for mistakes.
Comment on lines +111 to +112
messages = append(messages, content[:msgEnd])
content = strings.TrimSpace(content[msgEnd:])
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SplitMessage function uses byte-based indexing (string slicing) which can split multi-byte UTF-8 characters, potentially creating invalid UTF-8 sequences. When slicing at positions found by findLastNewline, findLastSpace, or at arbitrary positions like effectiveLimit or innerLimit, the function may split in the middle of a multi-byte character. Consider using rune-based indexing or validating split points to ensure they fall on character boundaries. The test case "Preserve Unicode characters" acknowledges this issue but doesn't verify the output is valid UTF-8.

Copilot uses AI. Check for mistakes.

if msgEnd <= 0 {
msgEnd = effectiveLimit
}
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a potential infinite loop when msgEnd is set to 0. This can occur on line 95 when unclosedIdx is 0 (code block starts at the beginning), or potentially from other code paths. When msgEnd is 0, line 112 performs content = strings.TrimSpace(content[msgEnd:]) which doesn't reduce the content length (it's just trimming the same string), causing an infinite loop. The guard on lines 107-109 sets msgEnd to effectiveLimit only if msgEnd <= 0, but this happens AFTER the code block handling logic which may have set msgEnd to 0 and then taken a continue path earlier. Consider adding validation that msgEnd must be > 0 before line 111, or ensure line 95 sets msgEnd to at least 1.

Suggested change
}
}
// Ensure msgEnd is a valid, positive index so the loop always makes progress.
if msgEnd <= 0 {
msgEnd = 1
} else if msgEnd > len(content) {
msgEnd = len(content)
}

Copilot uses AI. Check for mistakes.
Comment on lines +69 to +78
checkContent: func(t *testing.T, chunks []string) {
// Check that first chunk ends with closing fence
if !strings.HasSuffix(chunks[0], "\n```") {
t.Error("First chunk should end with injected closing fence")
}
// Check that second chunk starts with execution header
if !strings.HasPrefix(chunks[1], "```go") {
t.Error("Second chunk should start with injected code block header")
}
},
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test cases only verify that specific chunks don't exceed maxLen, but don't comprehensively check all chunks. For example, in the "Simple split regular text" test (lines 36-39), only chunk 0 is checked, not chunk 1. In the "Long code block split" test (lines 69-78), neither chunk is explicitly checked for length. Consider adding a check that iterates through all chunks and verifies each one is <= maxLen to ensure the function never produces oversized chunks.

Copilot uses AI. Check for mistakes.
Comment on lines +85 to +93
checkContent: func(t *testing.T, chunks []string) {
// Just verify we didn't panic and got valid strings.
// Go strings are UTF-8, if we split mid-rune it would be bad,
// but standard slicing might do that.
// Let's assume standard behavior is acceptable or check if it produces invalid rune?
if !strings.Contains(chunks[0], "\u4e16") {
t.Error("Chunk should contain unicode characters")
}
},
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case acknowledges that splitting mid-rune (multi-byte UTF-8 character) would be bad, but doesn't verify that this doesn't happen. The test only checks that the chunk contains at least one instance of the Unicode character, but doesn't validate that all chunks contain valid UTF-8. Consider adding validation using utf8.ValidString() to ensure all chunks are valid UTF-8 sequences.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 18, 2026 22:26
@Huaaudio Huaaudio merged commit 59fd391 into sipeed:main Feb 18, 2026
8 checks passed
@Huaaudio Huaaudio deleted the feat/base-layer-message-split branch February 18, 2026 22:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 103 to +108
runes := []rune(msg.Content)
if len(runes) == 0 {
return nil
}

chunks := splitMessage(msg.Content, 1500) // Discord has a limit of 2000 characters per message, leave 500 for natural split e.g. code blocks
chunks := utils.SplitMessage(msg.Content, 2000) // Split messages into chunks, Discord length limit: 2000 chars
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Send converts the full message to []rune only to check emptiness (len(runes) == 0), which allocates and can be expensive for long messages. This can be replaced with a simple msg.Content == "" (or len(msg.Content)==0) check without changing behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +43
// Find natural split point within the effective limit
msgEnd := findLastNewline(content[:effectiveLimit], 200)
if msgEnd <= 0 {
msgEnd = findLastSpace(content[:effectiveLimit], 100)
}
if msgEnd <= 0 {
msgEnd = effectiveLimit
}
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SplitMessage computes split points using byte indexes (len, content[:effectiveLimit], content[:msgEnd]). When there are no ASCII split characters near the boundary (e.g., long CJK/emoji text), msgEnd can land in the middle of a UTF-8 sequence, producing invalid UTF-8 in a chunk (and JSON encoding will replace invalid bytes). Consider adjusting msgEnd to the nearest prior UTF-8 rune boundary (or operating on []rune) before slicing.

Copilot uses AI. Check for mistakes.
SebastianBoehler pushed a commit to SebastianBoehler/picoclaw that referenced this pull request Feb 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants