Pull request overview
This PR adds an ASCII fast path optimization to grapheme cluster iteration by inlining the iterator implementation directly into the graphemes package. Printable ASCII bytes (0x20-0x7E) are treated as their own graphemes when not followed by non-ASCII bytes, avoiding the overhead of full Unicode grapheme cluster parsing for common cases.
Changes:
- Inlined iterator implementation with ASCII hot path optimization in graphemes/iterator.go
- Added ASCII-specific benchmark to measure optimization effectiveness
- Bumped minimum Go version from 1.18 to 1.20
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| graphemes/iterator.go | Replaced dependency on internal/iterators with inline implementation featuring ASCII fast path for printable ASCII characters |
| graphemes/comparative/comparative_test.go | Added BenchmarkGraphemesASCII to measure performance on pure ASCII text, improved existing benchmark structure |
| go.mod | Bumped minimum Go version requirement from 1.18 to 1.20 |
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
Co-Authored-By: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.
graphemes
Printable ASCII bytes are their own graphemes, no need to call the real `splitFunc` for those. Gotta check the next byte to ensure it's not a combining mark or something where the real grapheme logic would legitimately join it. To do this, `graphemes` gets its own iterator.
Looks like a 20% perf improvement for multilingual text and 3x improvement for pure ASCII.
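A minimal sketch of the check described above, not the code in this PR. The helper names `isPrintableASCII` and `asciiGrapheme` are hypothetical, and the sketch assumes `data` holds all of the remaining input, so reaching the end of the slice means nothing else can follow.

```go
package main

import "fmt"

// isPrintableASCII reports whether b is in the printable range 0x20..0x7E.
func isPrintableASCII(b byte) bool {
	return b >= 0x20 && b <= 0x7E
}

// asciiGrapheme is a hypothetical helper. It reports whether the first
// byte of data can be emitted as a one-byte grapheme without running the
// full grapheme splitter.
func asciiGrapheme(data []byte) bool {
	if len(data) == 0 || !isPrintableASCII(data[0]) {
		return false
	}
	// Combining marks and other joining characters are all non-ASCII,
	// so an ASCII next byte (or end of input) means nothing can attach
	// to data[0].
	return len(data) == 1 || data[1] < 0x80
}

func main() {
	fmt.Println(asciiGrapheme([]byte("ab")))      // true
	fmt.Println(asciiGrapheme([]byte("e\u0301"))) // false: U+0301 follows
	fmt.Println(asciiGrapheme([]byte("\r\n")))    // false: CR is not printable
}
```

When the helper returns false, the caller would fall back to the real splitter for that position, so correctness never depends on the fast path.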
words
Apply a similar optimization to words: runs of adjacent ASCII alphanumerics followed by an ASCII space. Looks like 2x for pure ASCII, around 5% for multilingual.
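The word-level check can be sketched the same way. This is an illustration under assumptions, not the PR's implementation: `isASCIIAlnum` and `asciiWordRun` are hypothetical names, and "ASCII space" is taken to mean the single byte 0x20.

```go
package main

import "fmt"

// isASCIIAlnum reports whether b is an ASCII letter or digit.
func isASCIIAlnum(b byte) bool {
	return (b >= '0' && b <= '9') || (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
}

// asciiWordRun is a hypothetical helper. It returns the length of a
// leading run of ASCII alphanumerics, but only when that run is
// immediately followed by an ASCII space. Any other follower (an
// apostrophe, a period, a non-ASCII byte, end of data) is left to the
// full word-break logic.
func asciiWordRun(data []byte) (n int, ok bool) {
	i := 0
	for i < len(data) && isASCIIAlnum(data[i]) {
		i++
	}
	if i > 0 && i < len(data) && data[i] == ' ' {
		return i, true
	}
	return 0, false
}

func main() {
	fmt.Println(asciiWordRun([]byte("hello world"))) // 5 true
	fmt.Println(asciiWordRun([]byte("can't stop")))  // 0 false
}
```

Requiring the trailing space keeps the fast path conservative: sequences such as `can't` or `3.14`, where the word rules may join across punctuation, still go through the real splitter.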
phrases
ASCII optimization for runs of alphanumeric or space.
sentences
ASCII optimization for runs of alphanumeric or space.
Also ASCII optimizations for `First` methods in all packages, with tests.