Optimize RuneWidth and StringWidth performance #95
Merged
- Merge combining+nonprint into a single zerowidth table at init, reducing two binary searches to one for zero-width rune detection
- Merge ambiguous+doublewidth into widewidth table for EastAsian path
- Remove redundant narrow table check in non-EastAsian path
- Eliminate inTables variadic function overhead by using direct calls
- Simplify EastAsian StrictEmojiNeutral dead code path
- Add ASCII fast path in StringWidth to bypass grapheme segmenter
- Use strings.Builder in Wrap to avoid O(n²) string concatenation

Benchstat (8 samples):

RuneWidthAll/regular            -34.49%
RuneWidthAllEastAsian/regular   -44.79%
String1WidthAll/regular         -25.53%
geomean                         -17.69%
The non-LUT path of RuneWidth was performing multiple binary searches per rune across combining, nonprint, doublewidth, and other tables. By merging tables that return the same width at init time, the number of searches is reduced. Specifically, combining and nonprint are merged into a single zerowidth table, and ambiguous and doublewidth are merged into a widewidth table used only in the EastAsian path where both return width 2.
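The table merge described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `interval` and `table` types, the `mergeTables` and `inTable` helpers, and the tiny sample ranges are all assumptions chosen to keep the example self-contained.

```go
package main

import (
	"fmt"
	"sort"
)

// interval is an inclusive rune range [first, last].
type interval struct{ first, last rune }

// table is a list of intervals, sorted by first and non-overlapping.
type table []interval

// Hypothetical stand-ins for the generated tables (tiny subsets only).
var (
	combining = table{{0x0300, 0x036F}, {0x20D0, 0x20F0}}
	nonprint  = table{{0x0000, 0x001F}, {0x007F, 0x009F}, {0x200B, 0x200F}}
)

// mergeTables concatenates the inputs, sorts them by start, and coalesces
// overlapping or adjacent intervals so the result stays binary-searchable.
func mergeTables(ts ...table) table {
	var all table
	for _, t := range ts {
		all = append(all, t...)
	}
	sort.Slice(all, func(i, j int) bool { return all[i].first < all[j].first })

	var out table
	for _, iv := range all {
		if n := len(out); n > 0 && iv.first <= out[n-1].last+1 {
			if iv.last > out[n-1].last {
				out[n-1].last = iv.last
			}
			continue
		}
		out = append(out, iv)
	}
	return out
}

// inTable reports whether r falls inside any interval of t (one binary search).
func inTable(r rune, t table) bool {
	i := sort.Search(len(t), func(i int) bool { return t[i].last >= r })
	return i < len(t) && t[i].first <= r
}

// zerowidth is built once at init, so each lookup needs a single search.
var zerowidth table

func init() { zerowidth = mergeTables(combining, nonprint) }

func main() {
	fmt.Println(inTable(0x0301, zerowidth), inTable(0x200B, zerowidth), inTable('a', zerowidth))
}
```

Because the merged table is derived from the source tables at startup, regenerating the source tables is enough to keep it current in this sketch.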
The redundant narrow table lookup in the non-EastAsian path was also removed since its result (width 1) is identical to the default case. The variadic inTables function, which allocated a slice on each call, has been replaced with direct || expressions.
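To illustrate the change from a variadic helper to direct calls, here is a small sketch of the two shapes side by side. The names `tableA`, `tableB`, `inTablesVariadic`, and `inEitherDirect` are hypothetical and the sample ranges are placeholders; only the general pattern is taken from the description above.

```go
package main

import (
	"fmt"
	"sort"
)

type interval struct{ first, last rune }
type table []interval

// Hypothetical sample tables; the real ones are generated from Unicode data.
var (
	tableA = table{{0x0300, 0x036F}}
	tableB = table{{0x200B, 0x200F}}
)

// inTable does one binary search over a sorted interval table.
func inTable(r rune, t table) bool {
	i := sort.Search(len(t), func(i int) bool { return t[i].last >= r })
	return i < len(t) && t[i].first <= r
}

// inTablesVariadic mirrors the older shape: the tables are passed as a
// variadic argument, which packs them into a slice for every call.
func inTablesVariadic(r rune, ts ...table) bool {
	for _, t := range ts {
		if inTable(r, t) {
			return true
		}
	}
	return false
}

// inEitherDirect is the replacement shape: direct calls joined by ||,
// which short-circuit and need no argument slice.
func inEitherDirect(r rune) bool {
	return inTable(r, tableA) || inTable(r, tableB)
}

func main() {
	for _, r := range []rune{0x0301, 0x200C, 'x'} {
		fmt.Printf("%U variadic=%v direct=%v\n",
			r, inTablesVariadic(r, tableA, tableB), inEitherDirect(r))
	}
}
```

Both forms return the same answer for every rune; the direct form simply avoids the per-call slice.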
StringWidth now has a fast path for pure ASCII strings that bypasses the grapheme segmenter entirely. Wrap replaces repeated string concatenation with strings.Builder, reducing the cost of building the output from O(n²) to O(n).
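The two ideas in this paragraph can be sketched as below. Both helpers are hypothetical and simplified: `asciiWidth` assumes control bytes (below 0x20, and 0x7F) count as width 0 and every other ASCII byte as width 1, and `wrapASCII` is a toy byte-based wrapper, not the library's Wrap.

```go
package main

import (
	"fmt"
	"strings"
)

// asciiWidth returns the display width of s and true when s is pure ASCII.
// It returns (0, false) as soon as a non-ASCII byte is seen, signalling that
// the caller should fall back to the slower grapheme-aware path.
// Assumption: control bytes have width 0, all other ASCII bytes width 1.
func asciiWidth(s string) (int, bool) {
	width := 0
	for i := 0; i < len(s); i++ {
		b := s[i]
		if b >= 0x80 {
			return 0, false
		}
		if b >= 0x20 && b != 0x7F {
			width++
		}
	}
	return width, true
}

// wrapASCII inserts a newline once a line reaches w columns. Output is
// appended to a strings.Builder, so earlier output is not recopied the way
// it is with repeated `out += ...` concatenation.
func wrapASCII(s string, w int) string {
	if w <= 0 {
		return s
	}
	var sb strings.Builder
	col := 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c == '\n' {
			sb.WriteByte(c)
			col = 0
			continue
		}
		if col == w {
			sb.WriteByte('\n')
			col = 0
		}
		sb.WriteByte(c)
		col++
	}
	return sb.String()
}

func main() {
	fmt.Println(asciiWidth("hello"))
	fmt.Println(asciiWidth("日本"))
	fmt.Printf("%q\n", wrapASCII("abcdefg", 3))
}
```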
The merged tables are computed at init from the existing generated tables, so no manual maintenance is needed when the Unicode version is updated. TestRuneWidthChecksums confirms via SHA256 that all rune widths remain identical.
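A checksum-style regression check in the spirit of the one mentioned above might look like the following sketch. It does not reproduce TestRuneWidthChecksums; `oldWidth`, `newWidth`, and `widthChecksum` are hypothetical, and the width rules are deliberately tiny. The idea is to hash the width of every code point and compare digests between two implementations.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"unicode"
)

// oldWidth is a toy "before" implementation with two separate checks.
func oldWidth(r rune) int {
	if r >= 0x0300 && r <= 0x036F { // combining (sample range)
		return 0
	}
	if r < 0x20 || (r >= 0x7F && r < 0xA0) { // non-printing (sample range)
		return 0
	}
	return 1
}

// newWidth is a toy "after" implementation with the checks merged into one.
func newWidth(r rune) int {
	if r < 0x20 || (r >= 0x7F && r < 0xA0) || (r >= 0x0300 && r <= 0x036F) {
		return 0
	}
	return 1
}

// widthChecksum records the width of every code point up to unicode.MaxRune
// as one byte each and returns the SHA-256 digest of that sequence.
func widthChecksum(width func(rune) int) [sha256.Size]byte {
	buf := make([]byte, unicode.MaxRune+1)
	for r := rune(0); r <= unicode.MaxRune; r++ {
		buf[r] = byte(width(r))
	}
	return sha256.Sum256(buf)
}

func main() {
	before, after := widthChecksum(oldWidth), widthChecksum(newWidth)
	fmt.Printf("before=%x\nafter =%x\nidentical=%v\n", before, after, before == after)
}
```

A single differing width anywhere in the code point range changes the digest, which is what makes this kind of check useful for a refactor that should be behavior-preserving.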