Unicode 17 support #43
Closed
Description
Hey @clipperhouse
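Thank you for the awesome library. Bubble Tea v2 and other Charm projects depend heavily on it!
I just noticed that Unicode 17 has been out for some time now, and many popular libraries such as libutf8proc and libgrapheme have updated their specs to support version 17. I tried to generate the new specs from the 17.0.0 data files with internal/gen, but the graphemes tests then fail on the same four cases (Balinese and Khmer conjunct sequences), while the words, sentences, and phrases packages still pass: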
~/Source/clipperhouse/uax29 master*
› cd internal/gen
~/Source/clipperhouse/uax29/internal/gen master*
› go run .
https://www.unicode.org/Public/17.0.0/ucd/emoji/emoji-data.txt
https://www.unicode.org/Public/17.0.0/ucd/DerivedCoreProperties.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/WordBreakProperty.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/WordBreakTest.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/WordBreakProperty.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/GraphemeBreakProperty.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/GraphemeBreakTest.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/SentenceBreakProperty.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/SentenceBreakTest.txt
~/Source/clipperhouse/uax29/internal/gen master*
› cd -
~/Source/clipperhouse/uax29
~/Source/clipperhouse/uax29 master*
› go test ./...
? github.com/clipperhouse/uax29/v2 [no test files]
--- FAIL: TestBytesUnicode (0.00s)
bytes_test.go:29:
for input [225 172 146 225 172 129 225 172 178 225 173 132 225 172 175 225 172 178 225 173 132 225 172 162 225 173 132 225 172 172 225 172 178 225 173 132 225 172 162 225 172 184]
expected [[225 172 146 225 172 129] [225 172 178 225 173 132 225 172 175] [225 172 178 225 173 132 225 172 162 225 173 132 225 172 172] [225 172 178 225 173 132 225 172 162 225 172 184]]
got [[225 172 146 225 172 129] [225 172 178 225 173 132 225 172 175 225 172 178 225 173 132 225 172 162 225 173 132 225 172 172 225 172 178 225 173 132 225 172 162 225 172 184]]
spec ÷ [0.2] BALINESE LETTER OKARA TEDUNG (XXmLinkingConsonantmExtPict) × [9.0] BALINESE SIGN ULU CANDRA (Extend_ConjunctExtendermConjunctLinker) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER WA (LinkingConsonant) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER TA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER YA (LinkingConsonant) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER TA (LinkingConsonant) × [9.0] BALINESE VOWEL SIGN SUKU (Extend_ConjunctExtendermConjunctLinker) ÷ [0.3]
bytes_test.go:29:
for input [225 172 167 225 172 147 225 173 132 225 172 139 225 172 139 225 172 132]
expected [[225 172 167] [225 172 147 225 173 132 225 172 139] [225 172 139 225 172 132]]
got [[225 172 167] [225 172 147 225 173 132 225 172 139 225 172 139 225 172 132]]
spec ÷ [0.2] BALINESE LETTER PA (LinkingConsonant) ÷ [999.0] BALINESE LETTER KA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER RA REPA (LinkingConsonant) ÷ [999.0] BALINESE LETTER RA REPA (LinkingConsonant) × [9.1] BALINESE SIGN BISAH (SpacingMark) ÷ [0.3]
bytes_test.go:29:
for input [225 158 149 225 159 146 225 158 175 225 158 152]
expected [[225 158 149 225 159 146 225 158 175] [225 158 152]]
got [[225 158 149 225 159 146 225 158 175 225 158 152]]
spec ÷ [0.2] KHMER LETTER PHA (LinkingConsonant) × [9.0] KHMER SIGN COENG (Extend_ConjunctLinker) × [9.3] KHMER INDEPENDENT VOWEL QE (LinkingConsonant) ÷ [999.0] KHMER LETTER MO (LinkingConsonant) ÷ [0.3]
bytes_test.go:29:
for input [225 158 160 225 159 146 225 158 171 225 158 145 225 159 144 225 158 153]
expected [[225 158 160 225 159 146 225 158 171] [225 158 145 225 159 144] [225 158 153]]
got [[225 158 160 225 159 146 225 158 171 225 158 145 225 159 144 225 158 153]]
spec ÷ [0.2] KHMER LETTER HA (LinkingConsonant) × [9.0] KHMER SIGN COENG (Extend_ConjunctLinker) × [9.3] KHMER INDEPENDENT VOWEL RY (LinkingConsonant) ÷ [999.0] KHMER LETTER TO (LinkingConsonant) × [9.0] KHMER SIGN SAMYOK SANNYA (Extend_ConjunctExtendermConjunctLinker) ÷ [999.0] KHMER LETTER YO (LinkingConsonant) ÷ [0.3]
--- FAIL: TestStringUnicode (0.00s)
string_test.go:34:
for input [225 172 146 225 172 129 225 172 178 225 173 132 225 172 175 225 172 178 225 173 132 225 172 162 225 173 132 225 172 172 225 172 178 225 173 132 225 172 162 225 172 184]
expected [[225 172 146 225 172 129] [225 172 178 225 173 132 225 172 175] [225 172 178 225 173 132 225 172 162 225 173 132 225 172 172] [225 172 178 225 173 132 225 172 162 225 172 184]]
got [ᬒᬁ ᬲ᭄ᬯᬲ᭄ᬢ᭄ᬬᬲ᭄ᬢᬸ]
spec ÷ [0.2] BALINESE LETTER OKARA TEDUNG (XXmLinkingConsonantmExtPict) × [9.0] BALINESE SIGN ULU CANDRA (Extend_ConjunctExtendermConjunctLinker) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER WA (LinkingConsonant) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER TA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER YA (LinkingConsonant) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER TA (LinkingConsonant) × [9.0] BALINESE VOWEL SIGN SUKU (Extend_ConjunctExtendermConjunctLinker) ÷ [0.3]
string_test.go:34:
for input [225 172 167 225 172 147 225 173 132 225 172 139 225 172 139 225 172 132]
expected [[225 172 167] [225 172 147 225 173 132 225 172 139] [225 172 139 225 172 132]]
got [ᬧ ᬓ᭄ᬋᬋᬄ]
spec ÷ [0.2] BALINESE LETTER PA (LinkingConsonant) ÷ [999.0] BALINESE LETTER KA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER RA REPA (LinkingConsonant) ÷ [999.0] BALINESE LETTER RA REPA (LinkingConsonant) × [9.1] BALINESE SIGN BISAH (SpacingMark) ÷ [0.3]
string_test.go:34:
for input [225 158 149 225 159 146 225 158 175 225 158 152]
expected [[225 158 149 225 159 146 225 158 175] [225 158 152]]
got [ផ្ឯម]
spec ÷ [0.2] KHMER LETTER PHA (LinkingConsonant) × [9.0] KHMER SIGN COENG (Extend_ConjunctLinker) × [9.3] KHMER INDEPENDENT VOWEL QE (LinkingConsonant) ÷ [999.0] KHMER LETTER MO (LinkingConsonant) ÷ [0.3]
string_test.go:34:
for input [225 158 160 225 159 146 225 158 171 225 158 145 225 159 144 225 158 153]
expected [[225 158 160 225 159 146 225 158 171] [225 158 145 225 159 144] [225 158 153]]
got [ហ្ឫទ័យ]
spec ÷ [0.2] KHMER LETTER HA (LinkingConsonant) × [9.0] KHMER SIGN COENG (Extend_ConjunctLinker) × [9.3] KHMER INDEPENDENT VOWEL RY (LinkingConsonant) ÷ [999.0] KHMER LETTER TO (LinkingConsonant) × [9.0] KHMER SIGN SAMYOK SANNYA (Extend_ConjunctExtendermConjunctLinker) ÷ [999.0] KHMER LETTER YO (LinkingConsonant) ÷ [0.3]
--- FAIL: TestScannerUnicode (0.01s)
reader_test.go:36:
for input [225 172 146 225 172 129 225 172 178 225 173 132 225 172 175 225 172 178 225 173 132 225 172 162 225 173 132 225 172 172 225 172 178 225 173 132 225 172 162 225 172 184]
expected [[225 172 146 225 172 129] [225 172 178 225 173 132 225 172 175] [225 172 178 225 173 132 225 172 162 225 173 132 225 172 172] [225 172 178 225 173 132 225 172 162 225 172 184]]
got [[225 172 146 225 172 129] [225 172 178 225 173 132 225 172 175 225 172 178 225 173 132 225 172 162 225 173 132 225 172 172 225 172 178 225 173 132 225 172 162 225 172 184]]
spec ÷ [0.2] BALINESE LETTER OKARA TEDUNG (XXmLinkingConsonantmExtPict) × [9.0] BALINESE SIGN ULU CANDRA (Extend_ConjunctExtendermConjunctLinker) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER WA (LinkingConsonant) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER TA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER YA (LinkingConsonant) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER TA (LinkingConsonant) × [9.0] BALINESE VOWEL SIGN SUKU (Extend_ConjunctExtendermConjunctLinker) ÷ [0.3]
reader_test.go:36:
for input [225 172 167 225 172 147 225 173 132 225 172 139 225 172 139 225 172 132]
expected [[225 172 167] [225 172 147 225 173 132 225 172 139] [225 172 139 225 172 132]]
got [[225 172 167] [225 172 147 225 173 132 225 172 139 225 172 139 225 172 132]]
spec ÷ [0.2] BALINESE LETTER PA (LinkingConsonant) ÷ [999.0] BALINESE LETTER KA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER RA REPA (LinkingConsonant) ÷ [999.0] BALINESE LETTER RA REPA (LinkingConsonant) × [9.1] BALINESE SIGN BISAH (SpacingMark) ÷ [0.3]
reader_test.go:36:
for input [225 158 149 225 159 146 225 158 175 225 158 152]
expected [[225 158 149 225 159 146 225 158 175] [225 158 152]]
got [[225 158 149 225 159 146 225 158 175 225 158 152]]
spec ÷ [0.2] KHMER LETTER PHA (LinkingConsonant) × [9.0] KHMER SIGN COENG (Extend_ConjunctLinker) × [9.3] KHMER INDEPENDENT VOWEL QE (LinkingConsonant) ÷ [999.0] KHMER LETTER MO (LinkingConsonant) ÷ [0.3]
reader_test.go:36:
for input [225 158 160 225 159 146 225 158 171 225 158 145 225 159 144 225 158 153]
expected [[225 158 160 225 159 146 225 158 171] [225 158 145 225 159 144] [225 158 153]]
got [[225 158 160 225 159 146 225 158 171 225 158 145 225 159 144 225 158 153]]
spec ÷ [0.2] KHMER LETTER HA (LinkingConsonant) × [9.0] KHMER SIGN COENG (Extend_ConjunctLinker) × [9.3] KHMER INDEPENDENT VOWEL RY (LinkingConsonant) ÷ [999.0] KHMER LETTER TO (LinkingConsonant) × [9.0] KHMER SIGN SAMYOK SANNYA (Extend_ConjunctExtendermConjunctLinker) ÷ [999.0] KHMER LETTER YO (LinkingConsonant) ÷ [0.3]
reader_test.go:45: passed 762, failed 4
FAIL
FAIL github.com/clipperhouse/uax29/v2/graphemes 0.577s
ok github.com/clipperhouse/uax29/v2/internal/iterators (cached)
? github.com/clipperhouse/uax29/v2/internal/iterators/filter [no test files]
ok github.com/clipperhouse/uax29/v2/phrases (cached)
ok github.com/clipperhouse/uax29/v2/sentences (cached)
ok github.com/clipperhouse/uax29/v2/words (cached)
FAIL
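To make the byte dumps above easier to read, here is a minimal standard-library sketch that decodes the failing Khmer input and its expected clusters into code points. It does not call uax29 at all; `decodeRunes` is a hypothetical helper introduced only for this illustration, and the byte values are copied from the test output above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeRunes is a hypothetical helper (not part of uax29) that turns a
// UTF-8 byte slice into its code points.
func decodeRunes(b []byte) []rune {
	var out []rune
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		out = append(out, r)
		b = b[size:]
	}
	return out
}

func main() {
	// Input and expected clusters copied from the failing Khmer case
	// (PHA, COENG, INDEPENDENT VOWEL QE, MO) in the log above.
	input := []byte{225, 158, 149, 225, 159, 146, 225, 158, 175, 225, 158, 152}
	expected := [][]byte{
		{225, 158, 149, 225, 159, 146, 225, 158, 175},
		{225, 158, 152},
	}

	fmt.Printf("input: %U\n", decodeRunes(input))
	for i, cluster := range expected {
		fmt.Printf("expected cluster %d: %U %q\n", i, decodeRunes(cluster), cluster)
	}
}
```

The input decodes to U+1795 U+17D2 U+17AF U+1798. The 17.0.0 test data expects a break before U+1798 (KHMER LETTER MO), whereas the current code returns all four code points as a single cluster.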