
Optimize RuneWidth and StringWidth performance #95

mattn merged 1 commit into master from optimize-runewidth-performance on Apr 8, 2026.

mattn commented on Apr 8, 2026

The non-LUT path of RuneWidth performed several binary searches per rune, one for each of the combining, nonprint, doublewidth, and other tables. Merging, at init time, the tables that yield the same width reduces the number of searches. Specifically, combining and nonprint are merged into a single zerowidth table, and ambiguous and doublewidth are merged into a widewidth table that is used only in the EastAsian path, where both yield width 2.
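The init-time merge can be sketched as follows. This is a minimal illustration, not the library's code: the `interval` and `table` types, `mergeTables`, `inTable`, and the ranges in the example tables are all hypothetical stand-ins for the generated data. Ranges from both inputs are sorted and coalesced when they overlap or touch, so one binary search answers "is r in either table?".

```go
package main

import (
	"fmt"
	"sort"
)

// interval is a closed rune range [first, last] (hypothetical type for this sketch).
type interval struct{ first, last rune }

// table is a list of intervals sorted by first, with no overlaps.
type table []interval

// mergeTables builds one sorted table from several, coalescing ranges that
// overlap or are adjacent. It is meant to run once (e.g. at init), so its
// own cost, including the variadic call, is not on the per-rune hot path.
func mergeTables(tables ...table) table {
	var all table // fresh slice, so the input tables are never mutated
	for _, t := range tables {
		all = append(all, t...)
	}
	sort.Slice(all, func(i, j int) bool { return all[i].first < all[j].first })

	var out table
	for _, iv := range all {
		n := len(out)
		if n > 0 && iv.first <= out[n-1].last+1 {
			// Overlapping or adjacent: extend the previous range instead.
			if iv.last > out[n-1].last {
				out[n-1].last = iv.last
			}
			continue
		}
		out = append(out, iv)
	}
	return out
}

// inTable reports whether r lies in one of t's ranges, using binary search.
func inTable(r rune, t table) bool {
	i := sort.Search(len(t), func(i int) bool { return t[i].last >= r })
	return i < len(t) && t[i].first <= r
}

// Illustrative ranges only; the real generated tables are far larger.
var (
	combining = table{{0x0300, 0x036F}, {0x0483, 0x0489}}
	nonprint  = table{{0x2028, 0x2029}, {0x202A, 0x202E}}
	zerowidth = mergeTables(combining, nonprint)
)

func main() {
	fmt.Println(len(zerowidth))             // 3: the two nonprint ranges coalesce
	fmt.Println(inTable(0x0301, zerowidth)) // true
	fmt.Println(inTable('a', zerowidth))    // false
}
```

Because the merged table is derived from the existing ones, a later regeneration of `combining` or `nonprint` flows through automatically.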

The redundant narrow table lookup in the non-EastAsian path was also removed, since its result (width 1) is identical to the default case. The variadic inTables function, which allocated a slice on each call, has been replaced with direct || expressions.
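The resulting lookup order can be sketched like this. It is a simplified, hypothetical model, not the actual RuneWidth body: the tables are tiny made-up excerpts, `widewidth` is written out by hand instead of being merged at init, and `inTables` only mirrors the shape of the old variadic helper. It shows one zero-width search, one wide search per path, no narrow-table search, and the old variadic call next to the equivalent direct `||` form.

```go
package main

import (
	"fmt"
	"sort"
)

type interval struct{ first, last rune }
type table []interval

// inTable reports whether r lies in one of t's sorted ranges (binary search).
func inTable(r rune, t table) bool {
	i := sort.Search(len(t), func(i int) bool { return t[i].last >= r })
	return i < len(t) && t[i].first <= r
}

// Hypothetical excerpts, not the generated data.
var (
	zerowidth   = table{{0x0300, 0x036F}, {0x2028, 0x202E}} // combining + nonprint
	doublewidth = table{{0x1100, 0x115F}, {0x4E00, 0x9FFF}}
	// widewidth = ambiguous + doublewidth, written out by hand for the sketch.
	widewidth = table{{0x00A1, 0x00A1}, {0x1100, 0x115F}, {0x2460, 0x24E9}, {0x4E00, 0x9FFF}}
)

// inTables mirrors the old variadic shape: every call packs its
// arguments into a []table before searching.
func inTables(r rune, ts ...table) bool {
	for _, t := range ts {
		if inTable(r, t) {
			return true
		}
	}
	return false
}

// runeWidth sketches the post-change lookup order.
func runeWidth(r rune, eastAsian bool) int {
	switch {
	case r < 0x20, r >= 0x7F && r <= 0x9F:
		return 0 // control characters
	case r < 0x7F:
		return 1 // printable ASCII
	case inTable(r, zerowidth): // one search instead of combining, then nonprint
		return 0
	}
	if eastAsian {
		if inTable(r, widewidth) { // ambiguous or doublewidth: both width 2
			return 2
		}
		return 1
	}
	if inTable(r, doublewidth) {
		return 2
	}
	// No narrow-table search: it could only return 1, the default anyway.
	return 1
}

func main() {
	for _, r := range []rune{'a', 0x0301, 0x2460, 0x4E00} {
		fmt.Printf("U+%04X default=%d eastAsian=%d\n", r, runeWidth(r, false), runeWidth(r, true))
	}
	r := rune(0x4E00)
	// Old variadic form vs. direct || form: same answer, no argument slice.
	fmt.Println(inTables(r, zerowidth, doublewidth), inTable(r, zerowidth) || inTable(r, doublewidth))
}
```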

StringWidth now has a fast path for pure-ASCII strings that bypasses the grapheme segmenter entirely. Wrap now builds its result with strings.Builder instead of repeated string concatenation, reducing its cost from O(n²) to O(n).
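An ASCII fast path of this kind might look as follows. This is a sketch under assumptions, not the PR's code: `asciiWidth`, `slowWidth`, and `stringWidth` are hypothetical names, ASCII control bytes are assumed to count as width 0 (the way RuneWidth treats control characters), and `slowWidth` is a toy per-rune sum standing in for grapheme segmentation plus RuneWidth.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// asciiWidth is the fast path. If every byte of s is ASCII, the width is
// the count of printable bytes (control bytes count 0, an assumption of
// this sketch) and ok is true. It bails out at the first non-ASCII byte.
func asciiWidth(s string) (w int, ok bool) {
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c >= utf8.RuneSelf {
			return 0, false
		}
		if c >= 0x20 && c != 0x7F {
			w++
		}
	}
	return w, true
}

// slowWidth is a hypothetical stand-in for grapheme segmentation plus
// RuneWidth; it only sums a toy per-rune width.
func slowWidth(s string) int {
	w := 0
	for _, r := range s {
		switch {
		case r < 0x20, r >= 0x7F && r <= 0x9F, r >= 0x0300 && r <= 0x036F:
			// controls and combining marks: width 0
		case r >= 0x4E00 && r <= 0x9FFF: // CJK ideographs
			w += 2
		default:
			w++
		}
	}
	return w
}

// stringWidth tries the byte-level fast path first and falls back otherwise.
func stringWidth(s string) int {
	if w, ok := asciiWidth(s); ok {
		return w
	}
	return slowWidth(s)
}

func main() {
	fmt.Println(stringWidth("hello"))   // 5, fast path
	fmt.Println(stringWidth("日本"))      // 4, slow path
	fmt.Println(stringWidth("e\u0301")) // 1, slow path
}
```

The check is a single byte scan with no decoding, so the common all-ASCII case never pays for segmentation.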

The merged tables are computed at init from the existing generated tables, so no manual maintenance is needed when the Unicode version is updated. TestRuneWidthChecksums confirms via SHA256 that all rune widths remain identical.
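The checksum idea can be sketched as hashing one width byte per code point from U+0000 to U+10FFFF, so that any change to any rune's width changes the digest. In this hypothetical sketch, `widthChecksum`, `oldWidth`, `newWidth`, and their ranges are made up; the real test presumably compares against recorded digests, while here two equivalent implementations (two tables vs. their merged form) are compared directly.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"unicode"
)

// widthChecksum hashes width(r) for every code point, one byte each, in order.
func widthChecksum(width func(rune) int) string {
	buf := make([]byte, 0, unicode.MaxRune+1)
	for r := rune(0); r <= unicode.MaxRune; r++ {
		buf = append(buf, byte(width(r)))
	}
	sum := sha256.Sum256(buf)
	return hex.EncodeToString(sum[:])
}

// in reports whether r lies in any of the closed ranges (linear scan).
func in(r rune, ranges [][2]rune) bool {
	for _, rg := range ranges {
		if r >= rg[0] && r <= rg[1] {
			return true
		}
	}
	return false
}

// Hypothetical tables: the "old" pair and its hand-merged equivalent.
var (
	combining = [][2]rune{{0x0300, 0x036F}}
	nonprint  = [][2]rune{{0x2028, 0x2029}, {0x202A, 0x202E}}
	zerowidth = [][2]rune{{0x0300, 0x036F}, {0x2028, 0x202E}}
)

// oldWidth consults two tables; newWidth consults the merged one.
func oldWidth(r rune) int {
	if in(r, combining) || in(r, nonprint) {
		return 0
	}
	return 1
}

func newWidth(r rune) int {
	if in(r, zerowidth) {
		return 0
	}
	return 1
}

func main() {
	fmt.Println(widthChecksum(oldWidth) == widthChecksum(newWidth)) // true
}
```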

                                    │     old      │            new             │
                                    │    sec/op    │   sec/op     vs base       │
RuneWidthAll/regular-16               25.28m ± 5%   16.56m ± 4%  -34.49%
RuneWidthAll/lut-16                   5.479m ± 4%   4.487m ± 2%  -18.10%
RuneWidthAllEastAsian/regular-16      47.15m ±22%   26.03m ± 9%  -44.79%
RuneWidthAllEastAsian/lut-16          5.414m ± 2%   4.410m ± 4%  -18.54%
RuneWidth768/lut-16                   3.720µ ±10%   2.788µ ± 6%  -25.06%
RuneWidth768EastAsian/regular-16      25.72µ ± 3%   22.72µ ± 4%  -11.66%
RuneWidth768EastAsian/lut-16          3.601µ ± 5%   2.916µ ± 6%  -19.04%
String1WidthAll/regular-16            46.28m ±11%   34.47m ± 7%  -25.53%
String1WidthAll/lut-16                23.57m ± 9%   21.20m ± 7%  -10.04%
String1WidthAllEastAsian/regular-16   58.22m ± 2%   43.19m ± 2%  -25.81%
String1WidthAllEastAsian/lut-16       23.22m ± 4%   20.73m ± 3%  -10.69%
String1Width768/lut-16                14.45µ ± 2%   13.15µ ± 5%   -8.96%
String1Width768EastAsian/regular-16   37.55µ ± 4%   33.38µ ± 7%  -11.11%
String1Width768EastAsian/lut-16       14.38µ ± 5%   12.82µ ± 6%  -10.83%
geomean                               481.0µ        395.9µ       -17.69%

- Merge combining+nonprint into a single zerowidth table at init,
  reducing two binary searches to one for zero-width rune detection
- Merge ambiguous+doublewidth into widewidth table for EastAsian path
- Remove redundant narrow table check in non-EastAsian path
- Eliminate inTables variadic function overhead by using direct calls
- Simplify EastAsian StrictEmojiNeutral dead code path
- Add ASCII fast path in StringWidth to bypass grapheme segmenter
- Use strings.Builder in Wrap to avoid O(n²) string concatenation
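The Wrap change in the last item can be illustrated with a hypothetical `wrap` that takes the width function as a parameter; it is not runewidth's Wrap, and the toy `cjkWidth` is an assumption. Concatenating with `out += string(r)` copies the whole result on every rune (O(n²) overall), while appending to a `strings.Builder` is amortized O(1) per write, so one pass is O(n).

```go
package main

import (
	"fmt"
	"strings"
)

// wrap inserts a newline before any rune that would push the current line
// past limit columns. Existing newlines reset the column count.
func wrap(s string, limit int, width func(rune) int) string {
	var b strings.Builder
	b.Grow(len(s)) // size hint; inserted newlines may grow it slightly
	cur := 0
	for _, r := range s {
		if r == '\n' {
			b.WriteRune(r)
			cur = 0
			continue
		}
		cw := width(r)
		if cur > 0 && cur+cw > limit {
			b.WriteByte('\n')
			cur = 0
		}
		b.WriteRune(r)
		cur += cw
	}
	return b.String()
}

// cjkWidth is a toy width: 2 for CJK ideographs, 1 otherwise.
func cjkWidth(r rune) int {
	if r >= 0x4E00 && r <= 0x9FFF {
		return 2
	}
	return 1
}

func main() {
	fmt.Printf("%q\n", wrap("abcdef", 3, cjkWidth)) // "abc\ndef"
	fmt.Printf("%q\n", wrap("日本語", 4, cjkWidth))    // "日本\n語"
}
```

The `cur > 0` guard keeps a single over-wide rune from producing an empty line in front of it.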

Benchstat (8 samples):

  RuneWidthAll/regular           -34.49%
  RuneWidthAllEastAsian/regular  -44.79%
  String1WidthAll/regular        -25.53%
  geomean                        -17.69%

mattn merged commit 17a7a03 into master on Apr 8, 2026 (9 checks passed).
mattn deleted the optimize-runewidth-performance branch on April 8, 2026 at 12:37.