fix: make Ellipsis operate on runes instead of bytes to prevent Unicode truncation by veeceey · Pull Request #796 · samber/lo

veeceey · 2026-02-10T08:13:41Z

Summary

Fixes Ellipsis: avoid Unicode character truncation #520: Ellipsis previously used byte-based length counting (len(str)) and byte-based slicing (str[:length-3]), which could split multi-byte Unicode characters (emoji, CJK, accented letters) in the middle, producing garbled output.
Changed Ellipsis to use []rune conversion so the length parameter counts Unicode code points (runes) instead of bytes. Multi-byte characters are now never split.
Added comprehensive Unicode test cases covering CJK characters, emoji, and accented characters.
Updated documentation in README.md, string.go godoc, and docs/data/core-ellipsis.md.

Details

The core change is minimal (3 lines in string.go):

// Before (byte-based, breaks Unicode):
if len(str) > length {
    return strings.TrimSpace(str[:length-3]) + "..."
}

// After (rune-based, Unicode-safe):
runes := []rune(str)
if len(runes) > length {
    return strings.TrimSpace(string(runes[:length-3])) + "..."
}

Note on grapheme clusters: This fix addresses rune-level truncation. Complex grapheme clusters composed of multiple code points (e.g., 🧜‍♂️ which uses zero-width joiners) may still be split between their constituent runes. Full grapheme segmentation would require either an external dependency or a substantial internal implementation, which the maintainer indicated should be avoided. The rune-based approach fixes the vast majority of real-world Unicode truncation issues.

Test plan

All existing ASCII-based TestEllipsis tests continue to pass (rune count == byte count for ASCII)
New test cases verify correct behavior with CJK characters (世界, 你好)
New test cases verify correct behavior with emoji (🏠🐶🐱🌟🎉🌈)
New test cases verify correct behavior with accented characters (café)
Full test suite passes: go test ./...

🤖 Generated with Claude Code

…de truncation The Ellipsis function previously used byte-based length counting (len(str)) and byte-based slicing (str[:length-3]), which could split multi-byte Unicode characters in the middle, producing garbled output. This changes the function to use []rune conversion so the length parameter counts Unicode code points instead of bytes. Emoji, CJK ideographs, and other multi-byte characters are now never split in the middle. Fixes samber#520 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

d-enk · 2026-02-10T11:58:11Z

Optimized code (without convertions)

func Ellipsis(str string, length int) string {
	str = strings.TrimSpace(str)

	const ellipsis = "..."

	cutPosition := 0
	for i := range str {
		if length == len(ellipsis) {
			cutPosition = i
		}

		if length--; length < 0 {
			return strings.TrimSpace(str[:cutPosition]) + ellipsis
		}
	}

	return str
}

veeceey · 2026-02-12T02:52:04Z

Nice optimization! Using range to iterate runes avoids the []rune(str) allocation entirely. I've adopted your approach with one tweak: added an early return for the length < 3 edge case before the loop to keep the intent explicit, rather than relying on cutPosition staying at 0. Updated the PR -- thanks for the suggestion!

Use range-based iteration to count runes without allocating a []rune slice, per reviewer suggestion. The early-return for length < 3 is kept explicit for clarity.

d-enk · 2026-02-12T08:27:51Z

In my opinion

An early return that almost completely duplicates the main algorithm
for the sake of an extremely rare use case seems unnecessary.

I even abandoned the check if cutPosition == 0 {}

however, at least reuse the const ellipsis = "..."

veeceey · 2026-02-14T03:35:23Z

Fair point -- the early return does duplicate logic for a rare edge case. I've simplified it: removed the early return and just use the ellipsis const for the fallback. Cleaner this way, thanks for the feedback.

… const

veeceey · 2026-02-17T08:50:29Z

Hi @samber, just a gentle ping on this one — I've incorporated the optimized rune iteration approach that @d-enk suggested and simplified the edge case handling. The fix prevents Unicode truncation in Ellipsis by operating on runes instead of bytes. Would love your review!

samber · 2026-02-21T18:29:06Z

Thanks guys, for the work on this 👏

codecov · 2026-02-21T18:29:49Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.30%. Comparing base (d269b33) to head (ad37e77).
⚠️ Report is 9 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #796   +/-   ##
=======================================
  Coverage   94.29%   94.30%           
=======================================
  Files          18       18           
  Lines        2857     2860    +3     
=======================================
+ Hits         2694     2697    +3     
  Misses        149      149           
  Partials       14       14

Flag	Coverage Δ
unittests	`94.30% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ccoVeille · 2026-02-21T19:15:57Z

I had surprise by using runes

Here the code won't do what is expected with Unicode character made of multiple Unicode char

https://go.dev/play/p/s-pb7mlpQVj

ccoVeille · 2026-02-21T19:34:21Z

As mentioned here:

This bug could be fixed by counting the string grapheme cluster at github.com/rivo/uniseg .

If this policy is acceptable, I'll work on a PR for the fix!

Originally posted by @yasuaki640 in #520 (comment)

Using @rivo uniseg could be a solution. But as stated before adding a dependency is not recorded by @samber, so here checking what uniseg code do abd replicate it could be a solution

samber · 2026-02-21T21:41:38Z

Another option would be to add github.com/rivo/uniseg in the comment of the helper for advanced cases.

@ccoVeille i'm going to add your example as a commented test case. If you want to make a PR to try to fix it, please do !

refactor: avoid rune slice allocation in Ellipsis

3279fbf

Use range-based iteration to count runes without allocating a []rune slice, per reviewer suggestion. The early-return for length < 3 is kept explicit for clarity.

Simplify Ellipsis: remove early return for length < 3, reuse ellipsis…

ad37e77

… const

samber merged commit 0b4623d into samber:master Feb 21, 2026
10 of 11 checks passed

samber mentioned this pull request Feb 21, 2026

test: adding missing test cases to ellipsis #809

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: make Ellipsis operate on runes instead of bytes to prevent Unicode truncation#796

fix: make Ellipsis operate on runes instead of bytes to prevent Unicode truncation#796
samber merged 3 commits intosamber:masterfrom
veeceey:fix/ellipsis-unicode-truncation

veeceey commented Feb 10, 2026

Uh oh!

d-enk commented Feb 10, 2026

Uh oh!

veeceey commented Feb 12, 2026

Uh oh!

d-enk commented Feb 12, 2026

Uh oh!

veeceey commented Feb 14, 2026

Uh oh!

veeceey commented Feb 17, 2026

Uh oh!

samber commented Feb 21, 2026

Uh oh!

Uh oh!

codecov bot commented Feb 21, 2026 •

edited

Loading

Uh oh!

ccoVeille commented Feb 21, 2026

Uh oh!

ccoVeille commented Feb 21, 2026

Uh oh!

samber commented Feb 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

veeceey commented Feb 10, 2026

Summary

Details

Test plan

Uh oh!

d-enk commented Feb 10, 2026

Uh oh!

veeceey commented Feb 12, 2026

Uh oh!

d-enk commented Feb 12, 2026

Uh oh!

veeceey commented Feb 14, 2026

Uh oh!

veeceey commented Feb 17, 2026

Uh oh!

samber commented Feb 21, 2026

Uh oh!

Uh oh!

codecov bot commented Feb 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ccoVeille commented Feb 21, 2026

Uh oh!

ccoVeille commented Feb 21, 2026

Uh oh!

samber commented Feb 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov bot commented Feb 21, 2026 •

edited

Loading