vim-patch:9.1.1276: inline word diff treats multibyte chars as word char #33323

Merged
zeertzjq merged 1 commit into neovim:master from zeertzjq:vim-9.1.1276
Apr 5, 2025

@zeertzjq zeertzjq commented Apr 5, 2025

vim-patch:9.1.1276: inline word diff treats multibyte chars as word char

Problem: inline word diff treats multibyte chars as word char
(after 9.1.1243)
Solution: treat all non-alphanumeric characters as non-word characters
(Yee Cheng Chin)

Previously, inline word diff simply used Vim's definition of a keyword to
determine what counts as a word. As a result, multi-byte character classes
such as emojis and CJK (Chinese/Japanese/Korean) characters were all
classified as word characters, so entire sentences were grouped as a
single word, which provides no meaningful information in a diff
highlight.

Fix this by treating all non-alphanumeric characters (with class number
above 2) as non-word characters, as there is usually no benefit in using
word diff on them. These include CJK characters, emojis, and also
subscript/superscript numbers. Meanwhile, multi-byte characters such as
Cyrillic and Greek letters will continue to be considered words.
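
To illustrate the difference between the two rules, here is a minimal Python sketch. It is not the Vim or Neovim implementation: `char_class`, its code point ranges, and `split_words` are hypothetical stand-ins, and the sketch assumes each non-word character becomes its own token. It only shows how "class 2 or above is a word" groups a CJK phrase into one token, while "only class 2 is a word" splits it per character and leaves Cyrillic or Greek words intact.

```python
def char_class(ch: str) -> int:
    """Simplified, hypothetical stand-in for a Vim-style character class.

    0 = blank, 1 = punctuation, 2 = alphanumeric word character,
    3 = everything this sketch lumps together as "class above 2".
    """
    if ch.isspace():
        return 0
    cp = ord(ch)
    # Assumed ranges, chosen for illustration only: super/subscripts,
    # kana, CJK ideographs, Hangul syllables, and a common emoji block.
    other_ranges = (
        (0x2070, 0x209F),
        (0x3040, 0x30FF),
        (0x4E00, 0x9FFF),
        (0xAC00, 0xD7AF),
        (0x1F300, 0x1FAFF),
    )
    if any(lo <= cp <= hi for lo, hi in other_ranges):
        return 3
    if ch.isalnum() or ch == "_":
        return 2
    return 1


def is_word_old(ch: str) -> bool:
    # Rule described as the previous behavior: any class >= 2 is a word.
    return char_class(ch) >= 2


def is_word_new(ch: str) -> bool:
    # Rule described by the patch: only alphanumeric (class 2) is a word.
    return char_class(ch) == 2


def split_words(text: str, is_word) -> list[str]:
    """Split text into diff units: runs of word characters stay together,
    every non-word character becomes a unit of its own (an assumption)."""
    tokens = []
    current = ""
    for ch in text:
        if is_word(ch):
            current += ch
        else:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
    if current:
        tokens.append(current)
    return tokens


print(split_words("hello 世界", is_word_old))  # ['hello', ' ', '世界']
print(split_words("hello 世界", is_word_new))  # ['hello', ' ', '世', '界']
print(split_words("привет мир", is_word_new))  # ['привет', ' ', 'мир']
```

With the newer rule, a changed CJK character can be compared on its own instead of marking the whole sentence as one changed word.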

Note that this is slightly inconsistent with how words are defined
elsewhere, as Vim usually considers any character with class >=2 to be
a "word".

related: vim/vim#16881 (diff inline highlight)
closes: vim/vim#17050

vim/vim@9aa120f

Co-authored-by: Yee Cheng Chin <ychin.git@gmail.com>

@github-actions github-actions bot added the diff label Apr 5, 2025
@zeertzjq zeertzjq changed the title from "vim-patch:9.1.1276" to "vim-patch:9.1.1276: inline word diff treats multibyte chars as word char" Apr 5, 2025
@github-actions github-actions bot added the vim-patch label (see https://neovim.io/doc/user/dev_vimpatch.html) Apr 5, 2025
@github-actions github-actions bot requested a review from lewis6991 April 5, 2025 00:06
@zeertzjq zeertzjq merged commit e8785c2 into neovim:master Apr 5, 2025
34 checks passed
@zeertzjq zeertzjq deleted the vim-9.1.1276 branch April 5, 2025 01:42
@github-actions github-actions bot removed the request for review from lewis6991 April 5, 2025 01:42