
Try to fix Unicode 17 issues #156

Closed
Manishearth wants to merge 2 commits into master from unicode-17


@Manishearth
Member

The state machines for the reverse iterators are tricky. I managed to fix the specific test_words failure: when no emoji ZWJ sequence is actually present, preemptively checking for one made us lose the iterator's real state.
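A minimal sketch of the general idea, not the crate's actual code: when a reverse iterator has to look behind the current character (here for UAX #29 rule WB3c, ZWJ × Extended_Pictographic), it can peek on a clone of the iterator so the real iterator's position and any cached state stay untouched. `is_ext_pict` is a hypothetical stand-in that recognizes only a few emoji rather than the full Extended_Pictographic property, and `reverse_zwj_joins` is a hypothetical helper.

```rust
/// Hypothetical stand-in for Extended_Pictographic: only a few emoji are recognized.
fn is_ext_pict(ch: char) -> bool {
    matches!(ch, '\u{1F468}' | '\u{1F469}' | '\u{1F467}' | '\u{2764}')
}

const ZWJ: char = '\u{200D}';

/// Scans `s` backwards and returns the byte offsets (between a ZWJ and the
/// following Extended_Pictographic character) where WB3c forbids a break.
fn reverse_zwj_joins(s: &str) -> Vec<usize> {
    let mut chars = s.char_indices();
    let mut joins = Vec::new();
    while let Some((idx, ch)) = chars.next_back() {
        if is_ext_pict(ch) {
            // Look behind on a clone: `chars` itself is not advanced, so the
            // ZWJ is still visited by the next loop iteration and no state is lost.
            if let Some((_, prev)) = chars.clone().next_back() {
                if prev == ZWJ {
                    joins.push(idx);
                }
            }
        }
    }
    joins
}
```

Cloning `CharIndices` is cheap (it only copies the iterator's position), so the look-behind costs little more than re-reading one character.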

Unfortunately, this change makes the quickcheck test fail reliably.
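The quickcheck test presumably checks a property along these lines: segmenting a string forwards, and segmenting it backwards and then reversing the result, must yield the same pieces. A minimal generic sketch of that property; `forward_matches_reverse` is a hypothetical helper, and the quickcheck harness that generates random strings is omitted.

```rust
/// Returns true if iterating forwards and iterating backwards (then reversing)
/// produce the same sequence of items.
fn forward_matches_reverse<I>(iter: I) -> bool
where
    I: DoubleEndedIterator + Clone,
    I::Item: PartialEq,
{
    let forward: Vec<I::Item> = iter.clone().collect();
    let mut backward: Vec<I::Item> = iter.rev().collect();
    backward.reverse();
    forward == backward
}
```

With unicode-segmentation one would pass an iterator such as `s.graphemes(true)` or `s.split_word_bounds()`, assuming it implements `Clone`; a reverse state machine that loses state shows up as a mismatch between the two vectors.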

I haven't looked at the grapheme issue. I don't plan on working on this soon, so anyone who wants to take a look is welcome to.

@Martin005
Contributor

Martin005 commented Mar 20, 2026

@Manishearth The test_grapheme test fails because of a mistake in the following lines:

                if let Some(incb_linker_count) = self.incb_linker_count {
                    self.ris_count = if incb_linker_count > 0 && crate::tables::is_incb_linker(ch) {
                        Some(incb_linker_count - 1)
                    } else if crate::tables::derived_property::InCB_Extend(ch) {
                        Some(incb_linker_count)
                    } else {
                        None
                    };
                }

Instead of self.ris_count, the code should assign the value to self.incb_linker_count. As written, it corrupts the ris_count cache and leaves incb_linker_count stale, which causes incorrect GB9c boundary decisions when iterating backwards through Balinese (and other InCB=Consonant/Linker) sequences.

After that modification, test_grapheme passes.
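A minimal self-contained model of the corrected update, assuming a simplified reverse-iterator state holding only the two cached fields involved. `ReverseState`, `is_incb_linker` and `is_incb_extend` are hypothetical stand-ins for the crate's own state and `crate::tables` predicates; the predicates recognize only a couple of Devanagari code points and ZWJ rather than the full InCB property, and the `usize` counter type is also an assumption.

```rust
/// Hypothetical stand-in: DEVANAGARI SIGN VIRAMA is InCB=Linker (real table is larger).
fn is_incb_linker(ch: char) -> bool {
    matches!(ch, '\u{094D}')
}

/// Hypothetical stand-in: DEVANAGARI SIGN NUKTA and ZWJ are InCB=Extend (real table is larger).
fn is_incb_extend(ch: char) -> bool {
    matches!(ch, '\u{093C}' | '\u{200D}')
}

#[derive(Debug, Default)]
struct ReverseState {
    ris_count: Option<usize>,
    incb_linker_count: Option<usize>,
}

impl ReverseState {
    fn update_incb(&mut self, ch: char) {
        if let Some(incb_linker_count) = self.incb_linker_count {
            // Fixed: write back to incb_linker_count, leaving ris_count alone.
            self.incb_linker_count = if incb_linker_count > 0 && is_incb_linker(ch) {
                Some(incb_linker_count - 1)
            } else if is_incb_extend(ch) {
                Some(incb_linker_count)
            } else {
                None
            };
        }
    }
}
```

The key point is that this update never writes ris_count, so the regional-indicator cache survives a backward walk through an InCB sequence.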
