feat: Support Unicode 17.0.0 by Martin005 · Pull Request #157 · unicode-rs/unicode-segmentation

Martin005 · 2026-03-24T15:24:53Z

This PR updates Unicode support to version 17.0.0 and introduces fixes to word boundary detection, particularly for emoji and Zero Width Joiner (ZWJ) handling. The changes enhance Unicode compliance and fix edge cases in grapheme and word segmentation logic.

Unicode version update:

Updated the UNICODE_VERSION in scripts/unicode.py from 16.0.0 to 17.0.0.
Regenerated src/tables.rs and tests/testdata/mod.rs from the Unicode 17.0.0 data files.

Word boundary and emoji handling improvements:

Added a new method next_significant_is_emoji to UWordBounds in src/word.rs, which checks if the next significant character is an emoji, skipping over Extend and Format characters.
Modified the double-ended iterator for UWordBounds to skip ZWJ characters that are followed by an emoji using the next_significant_is_emoji method.
Adjusted the state machine in UWordBounds to move the handling of emoji characters to a later match case, ensuring correct state transitions for emoji.
Those modifications fixed the failing test_words test.
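
The lookahead described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from src/word.rs: the `WordCat` enum and the `classify` function are hypothetical stand-ins for the crate's generated tables, and the code point ranges are a crude approximation of the real Word_Break and Extended_Pictographic properties. Only the name `next_significant_is_emoji` comes from the PR description, and here it is a free function rather than a method on UWordBounds.

```rust
/// Simplified word-break categories (hypothetical; the real crate derives
/// these from generated Unicode tables).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum WordCat {
    Extend,
    Format,
    Zwj,
    ExtendedPictographic,
    Other,
}

/// Crude classifier covering just enough code points for the example.
/// Assumption: the emoji range below only approximates Extended_Pictographic.
fn classify(c: char) -> WordCat {
    match c {
        '\u{200D}' => WordCat::Zwj,
        // Combining diacritics and VARIATION SELECTOR-16 are Word_Break=Extend.
        '\u{0300}'..='\u{036F}' | '\u{FE0F}' => WordCat::Extend,
        // SOFT HYPHEN is Word_Break=Format.
        '\u{00AD}' => WordCat::Format,
        '\u{1F300}'..='\u{1FAFF}' => WordCat::ExtendedPictographic,
        _ => WordCat::Other,
    }
}

/// Returns true if the first character of `rest` that is neither Extend
/// nor Format is an emoji. An empty or all-ignorable string yields false.
fn next_significant_is_emoji(rest: &str) -> bool {
    rest.chars()
        .map(classify)
        .find(|cat| !matches!(cat, WordCat::Extend | WordCat::Format))
        .map_or(false, |cat| cat == WordCat::ExtendedPictographic)
}
```

In this sketch, a caller that has just seen a ZWJ would pass the remainder of the string and treat a `true` result as "do not break here", which is the shape of rule WB3c in UAX #29 (no break between ZWJ and an Extended_Pictographic character). Note that the sketch follows the description literally and skips only Extend and Format, so a second ZWJ counts as significant.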

Grapheme segmentation fix:

Corrected the logic in GraphemeCursor (in src/grapheme.rs) to update incb_linker_count instead of ris_count. This fixed the failing test_grapheme test.
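
To illustrate why the two counters must not be mixed up, here is a small sketch under stated assumptions. It is not the GraphemeCursor implementation: the `InCb` enum, the `LookbehindCounts` struct, and its methods are hypothetical, and only the field names `ris_count` and `incb_linker_count` are taken from the PR description. The idea is that the regional-indicator count feeds rules GB12/GB13 of UAX #29 (flag pairs), while the linker count feeds rule GB9c (Indic conjuncts), so writing one result into the other field would make the wrong rule fire.

```rust
/// Simplified Indic_Conjunct_Break classes (hypothetical enum).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum InCb {
    Consonant,
    Linker,
    Extend,
    Other,
}

/// Hypothetical cache of lookbehind results, one slot per rule family.
#[derive(Default, Debug)]
struct LookbehindCounts {
    /// Number of Regional_Indicator characters immediately before the cursor.
    ris_count: Option<usize>,
    /// Number of InCB=Linker characters between the cursor and the
    /// preceding InCB=Consonant (0 if no such consonant was found).
    incb_linker_count: Option<usize>,
}

impl LookbehindCounts {
    /// Scans backwards over the classes preceding the cursor and records
    /// the linker count in `incb_linker_count` (never in `ris_count`).
    fn scan_incb(&mut self, before: &[InCb]) {
        let mut linkers = 0;
        for &class in before.iter().rev() {
            match class {
                InCb::Linker => linkers += 1,
                InCb::Extend => {}
                InCb::Consonant => {
                    self.incb_linker_count = Some(linkers);
                    return;
                }
                InCb::Other => break,
            }
        }
        // No consonant reached: the conjunct rule cannot apply.
        self.incb_linker_count = Some(0);
    }

    /// Records the length of the run of regional indicators before the cursor.
    fn scan_ris(&mut self, ri_run_len: usize) {
        self.ris_count = Some(ri_run_len);
    }

    /// GB9c shape: no break before a consonant if at least one linker
    /// separates it from the previous consonant.
    fn gb9c_joins(&self) -> bool {
        self.incb_linker_count.map_or(false, |n| n > 0)
    }

    /// GB12/GB13 shape: no break if an odd number of regional indicators
    /// precedes the next regional indicator.
    fn gb12_13_joins(&self) -> bool {
        self.ris_count.map_or(false, |n| n % 2 == 1)
    }
}
```

For example, scanning `[Consonant, Linker, Extend]` (such as a Devanagari consonant followed by a virama and a combining mark) sets `incb_linker_count` to `Some(1)` and leaves `ris_count` untouched, so only the conjunct rule reports a join.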

Supersedes #156

src/word.rs

Manishearth

Oh, also, could you add comments on the added code explaining their relationship with the rules? Especially the out-of-state-machine stuff around emoji.

But I can still land if you'd just prefer to land this, this is correct as is as far as I can tell.

Martin005 · 2026-03-24T16:11:28Z

@Manishearth Done: 3a391bf, hope it is 100% correct ✅

Manishearth · 2026-03-24T16:11:33Z

Thank you so much!

Manishearth · 2026-03-24T16:26:25Z

And published a new version

Martin005 · 2026-03-24T16:33:03Z

@Manishearth Awesome, thank you so much! By the way, I see you didn't update the changelog in the README (it's also missing the entry for version 1.12.0), and you haven't created a tag for 1.13.0.

Manishearth · 2026-03-24T20:27:40Z

Ah, hadn't pushed the tag. Opened #159 for the changelog.

feat: Support Unicode 17.0.0

4790e25

Manishearth reviewed Mar 24, 2026

Review comment on src/word.rs (outdated, resolved)

feat: Store right_significant_is_emoji state instead

e035894

Martin005 force-pushed the unicode-17 branch from 344dd01 to e035894 Compare March 24, 2026 15:53

Manishearth reviewed Mar 24, 2026

Review comment on src/word.rs (outdated, resolved)

fix: Move check for right_significant_is_emoji above state machine

c9ec06a

Manishearth reviewed Mar 24, 2026

doc: Add comments explaining connection to word boundary rules

3a391bf

Manishearth approved these changes Mar 24, 2026

Manishearth merged commit 13862d8 into unicode-rs:master Mar 24, 2026
2 checks passed

Martin005 deleted the unicode-17 branch March 24, 2026 16:20

Martin005 mentioned this pull request Mar 25, 2026

chore(deps): update unicode-segmentation to v1.13.2 WeblateOrg/unicode-segmentation-rs#154

Merged

orhun mentioned this pull request Mar 30, 2026

build(deps): bump unicode-segmentation from 1.12.0 to 1.13.2 ratatui/ratatui#2468

Open

Manishearth merged 4 commits into unicode-rs:master from Martin005:unicode-17
