Unicode 17 support #43

@aymanbagabas

Hey @clipperhouse

Thank you for the awesome library! Bubble Tea v2 and other Charm projects depend heavily on it.

I just noticed that Unicode 17 has been out for some time now, and popular libraries such as libutf8proc and libgrapheme have already been updated to support it. I tried regenerating the tables from the 17.0.0 data files, but some grapheme tests fail (4 of 766 cases, all of them Balinese or Khmer sequences):

~/Source/clipperhouse/uax29 master*
› cd internal/gen

~/Source/clipperhouse/uax29/internal/gen master*
› go run .
https://www.unicode.org/Public/17.0.0/ucd/emoji/emoji-data.txt
https://www.unicode.org/Public/17.0.0/ucd/DerivedCoreProperties.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/WordBreakProperty.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/WordBreakTest.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/WordBreakProperty.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/GraphemeBreakProperty.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/GraphemeBreakTest.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/SentenceBreakProperty.txt
https://www.unicode.org/Public/17.0.0/ucd/auxiliary/SentenceBreakTest.txt

~/Source/clipperhouse/uax29/internal/gen master*
› cd -
~/Source/clipperhouse/uax29

~/Source/clipperhouse/uax29 master*
› go test ./...
?   	github.com/clipperhouse/uax29/v2	[no test files]
--- FAIL: TestBytesUnicode (0.00s)
    bytes_test.go:29:
        	for input [225 172 146 225 172 129 225 172 178 225 173 132 225 172 175 225 172 178 225 173 132 225 172 162 225 173 132 225 172 172 225 172 178 225 173 132 225 172 162 225 172 184]
        	expected  [[225 172 146 225 172 129] [225 172 178 225 173 132 225 172 175] [225 172 178 225 173 132 225 172 162 225 173 132 225 172 172] [225 172 178 225 173 132 225 172 162 225 172 184]]
        	got       [[225 172 146 225 172 129] [225 172 178 225 173 132 225 172 175 225 172 178 225 173 132 225 172 162 225 173 132 225 172 172 225 172 178 225 173 132 225 172 162 225 172 184]]
        	spec      ÷ [0.2] BALINESE LETTER OKARA TEDUNG (XXmLinkingConsonantmExtPict) × [9.0] BALINESE SIGN ULU CANDRA (Extend_ConjunctExtendermConjunctLinker) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER WA (LinkingConsonant) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER TA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER YA (LinkingConsonant) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER TA (LinkingConsonant) × [9.0] BALINESE VOWEL SIGN SUKU (Extend_ConjunctExtendermConjunctLinker) ÷ [0.3]
    bytes_test.go:29:
        	for input [225 172 167 225 172 147 225 173 132 225 172 139 225 172 139 225 172 132]
        	expected  [[225 172 167] [225 172 147 225 173 132 225 172 139] [225 172 139 225 172 132]]
        	got       [[225 172 167] [225 172 147 225 173 132 225 172 139 225 172 139 225 172 132]]
        	spec      ÷ [0.2] BALINESE LETTER PA (LinkingConsonant) ÷ [999.0] BALINESE LETTER KA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER RA REPA (LinkingConsonant) ÷ [999.0] BALINESE LETTER RA REPA (LinkingConsonant) × [9.1] BALINESE SIGN BISAH (SpacingMark) ÷ [0.3]
    bytes_test.go:29:
        	for input [225 158 149 225 159 146 225 158 175 225 158 152]
        	expected  [[225 158 149 225 159 146 225 158 175] [225 158 152]]
        	got       [[225 158 149 225 159 146 225 158 175 225 158 152]]
        	spec      ÷ [0.2] KHMER LETTER PHA (LinkingConsonant) × [9.0] KHMER SIGN COENG (Extend_ConjunctLinker) × [9.3] KHMER INDEPENDENT VOWEL QE (LinkingConsonant) ÷ [999.0] KHMER LETTER MO (LinkingConsonant) ÷ [0.3]
    bytes_test.go:29:
        	for input [225 158 160 225 159 146 225 158 171 225 158 145 225 159 144 225 158 153]
        	expected  [[225 158 160 225 159 146 225 158 171] [225 158 145 225 159 144] [225 158 153]]
        	got       [[225 158 160 225 159 146 225 158 171 225 158 145 225 159 144 225 158 153]]
        	spec      ÷ [0.2] KHMER LETTER HA (LinkingConsonant) × [9.0] KHMER SIGN COENG (Extend_ConjunctLinker) × [9.3] KHMER INDEPENDENT VOWEL RY (LinkingConsonant) ÷ [999.0] KHMER LETTER TO (LinkingConsonant) × [9.0] KHMER SIGN SAMYOK SANNYA (Extend_ConjunctExtendermConjunctLinker) ÷ [999.0] KHMER LETTER YO (LinkingConsonant) ÷ [0.3]
--- FAIL: TestStringUnicode (0.00s)
    string_test.go:34:
        	for input [225 172 146 225 172 129 225 172 178 225 173 132 225 172 175 225 172 178 225 173 132 225 172 162 225 173 132 225 172 172 225 172 178 225 173 132 225 172 162 225 172 184]
        	expected  [[225 172 146 225 172 129] [225 172 178 225 173 132 225 172 175] [225 172 178 225 173 132 225 172 162 225 173 132 225 172 172] [225 172 178 225 173 132 225 172 162 225 172 184]]
        	got       [ᬒᬁ ᬲ᭄ᬯᬲ᭄ᬢ᭄ᬬᬲ᭄ᬢᬸ]
        	spec      ÷ [0.2] BALINESE LETTER OKARA TEDUNG (XXmLinkingConsonantmExtPict) × [9.0] BALINESE SIGN ULU CANDRA (Extend_ConjunctExtendermConjunctLinker) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER WA (LinkingConsonant) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER TA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER YA (LinkingConsonant) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER TA (LinkingConsonant) × [9.0] BALINESE VOWEL SIGN SUKU (Extend_ConjunctExtendermConjunctLinker) ÷ [0.3]
    string_test.go:34:
        	for input [225 172 167 225 172 147 225 173 132 225 172 139 225 172 139 225 172 132]
        	expected  [[225 172 167] [225 172 147 225 173 132 225 172 139] [225 172 139 225 172 132]]
        	got       [ᬧ ᬓ᭄ᬋᬋᬄ]
        	spec      ÷ [0.2] BALINESE LETTER PA (LinkingConsonant) ÷ [999.0] BALINESE LETTER KA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER RA REPA (LinkingConsonant) ÷ [999.0] BALINESE LETTER RA REPA (LinkingConsonant) × [9.1] BALINESE SIGN BISAH (SpacingMark) ÷ [0.3]
    string_test.go:34:
        	for input [225 158 149 225 159 146 225 158 175 225 158 152]
        	expected  [[225 158 149 225 159 146 225 158 175] [225 158 152]]
        	got       [ផ្ឯម]
        	spec      ÷ [0.2] KHMER LETTER PHA (LinkingConsonant) × [9.0] KHMER SIGN COENG (Extend_ConjunctLinker) × [9.3] KHMER INDEPENDENT VOWEL QE (LinkingConsonant) ÷ [999.0] KHMER LETTER MO (LinkingConsonant) ÷ [0.3]
    string_test.go:34:
        	for input [225 158 160 225 159 146 225 158 171 225 158 145 225 159 144 225 158 153]
        	expected  [[225 158 160 225 159 146 225 158 171] [225 158 145 225 159 144] [225 158 153]]
        	got       [ហ្ឫទ័យ]
        	spec      ÷ [0.2] KHMER LETTER HA (LinkingConsonant) × [9.0] KHMER SIGN COENG (Extend_ConjunctLinker) × [9.3] KHMER INDEPENDENT VOWEL RY (LinkingConsonant) ÷ [999.0] KHMER LETTER TO (LinkingConsonant) × [9.0] KHMER SIGN SAMYOK SANNYA (Extend_ConjunctExtendermConjunctLinker) ÷ [999.0] KHMER LETTER YO (LinkingConsonant) ÷ [0.3]
--- FAIL: TestScannerUnicode (0.01s)
    reader_test.go:36:
        	for input [225 172 146 225 172 129 225 172 178 225 173 132 225 172 175 225 172 178 225 173 132 225 172 162 225 173 132 225 172 172 225 172 178 225 173 132 225 172 162 225 172 184]
        	expected  [[225 172 146 225 172 129] [225 172 178 225 173 132 225 172 175] [225 172 178 225 173 132 225 172 162 225 173 132 225 172 172] [225 172 178 225 173 132 225 172 162 225 172 184]]
        	got       [[225 172 146 225 172 129] [225 172 178 225 173 132 225 172 175 225 172 178 225 173 132 225 172 162 225 173 132 225 172 172 225 172 178 225 173 132 225 172 162 225 172 184]]
        	spec      ÷ [0.2] BALINESE LETTER OKARA TEDUNG (XXmLinkingConsonantmExtPict) × [9.0] BALINESE SIGN ULU CANDRA (Extend_ConjunctExtendermConjunctLinker) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER WA (LinkingConsonant) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER TA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER YA (LinkingConsonant) ÷ [999.0] BALINESE LETTER SA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER TA (LinkingConsonant) × [9.0] BALINESE VOWEL SIGN SUKU (Extend_ConjunctExtendermConjunctLinker) ÷ [0.3]
    reader_test.go:36:
        	for input [225 172 167 225 172 147 225 173 132 225 172 139 225 172 139 225 172 132]
        	expected  [[225 172 167] [225 172 147 225 173 132 225 172 139] [225 172 139 225 172 132]]
        	got       [[225 172 167] [225 172 147 225 173 132 225 172 139 225 172 139 225 172 132]]
        	spec      ÷ [0.2] BALINESE LETTER PA (LinkingConsonant) ÷ [999.0] BALINESE LETTER KA (LinkingConsonant) × [9.0] BALINESE ADEG ADEG (Extend_ConjunctLinker) × [9.3] BALINESE LETTER RA REPA (LinkingConsonant) ÷ [999.0] BALINESE LETTER RA REPA (LinkingConsonant) × [9.1] BALINESE SIGN BISAH (SpacingMark) ÷ [0.3]
    reader_test.go:36:
        	for input [225 158 149 225 159 146 225 158 175 225 158 152]
        	expected  [[225 158 149 225 159 146 225 158 175] [225 158 152]]
        	got       [[225 158 149 225 159 146 225 158 175 225 158 152]]
        	spec      ÷ [0.2] KHMER LETTER PHA (LinkingConsonant) × [9.0] KHMER SIGN COENG (Extend_ConjunctLinker) × [9.3] KHMER INDEPENDENT VOWEL QE (LinkingConsonant) ÷ [999.0] KHMER LETTER MO (LinkingConsonant) ÷ [0.3]
    reader_test.go:36:
        	for input [225 158 160 225 159 146 225 158 171 225 158 145 225 159 144 225 158 153]
        	expected  [[225 158 160 225 159 146 225 158 171] [225 158 145 225 159 144] [225 158 153]]
        	got       [[225 158 160 225 159 146 225 158 171 225 158 145 225 159 144 225 158 153]]
        	spec      ÷ [0.2] KHMER LETTER HA (LinkingConsonant) × [9.0] KHMER SIGN COENG (Extend_ConjunctLinker) × [9.3] KHMER INDEPENDENT VOWEL RY (LinkingConsonant) ÷ [999.0] KHMER LETTER TO (LinkingConsonant) × [9.0] KHMER SIGN SAMYOK SANNYA (Extend_ConjunctExtendermConjunctLinker) ÷ [999.0] KHMER LETTER YO (LinkingConsonant) ÷ [0.3]
    reader_test.go:45: passed 762, failed 4
FAIL
FAIL	github.com/clipperhouse/uax29/v2/graphemes	0.577s
ok  	github.com/clipperhouse/uax29/v2/internal/iterators	(cached)
?   	github.com/clipperhouse/uax29/v2/internal/iterators/filter	[no test files]
ok  	github.com/clipperhouse/uax29/v2/phrases	(cached)
ok  	github.com/clipperhouse/uax29/v2/sentences	(cached)
ok  	github.com/clipperhouse/uax29/v2/words	(cached)
FAIL
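
To make the byte dumps above easier to read, here is a minimal sketch (standard library only, not part of uax29) that decodes the third failing case into code points. The helper names `codePoints` and `clusters` are made up for illustration; the byte values are copied from the `expected` and `got` lines of the log.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// codePoints renders UTF-8 bytes as space-separated U+XXXX code points.
// Hypothetical helper, for reading the test log only.
func codePoints(b []byte) string {
	var parts []string
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		parts = append(parts, fmt.Sprintf("U+%04X", r))
		b = b[size:]
	}
	return strings.Join(parts, " ")
}

// clusters renders each grapheme cluster, separated by " | ".
func clusters(cs [][]byte) string {
	parts := make([]string, 0, len(cs))
	for _, c := range cs {
		parts = append(parts, codePoints(c))
	}
	return strings.Join(parts, " | ")
}

func main() {
	// Third failing case in the log (KHMER LETTER PHA ...).
	expected := [][]byte{
		{225, 158, 149, 225, 159, 146, 225, 158, 175},
		{225, 158, 152},
	}
	got := [][]byte{
		{225, 158, 149, 225, 159, 146, 225, 158, 175, 225, 158, 152},
	}

	fmt.Printf("expected: %s\n", clusters(expected))
	fmt.Printf("got:      %s\n", clusters(got))
	// Output:
	// expected: U+1795 U+17D2 U+17AF | U+1798
	// got:      U+1795 U+17D2 U+17AF U+1798
}
```

The only difference in this case is the missing break before U+1798 (KHMER LETTER MO), which the spec line marks as `÷ [999.0]`.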
