make unicode_width() understand more Unicode characters #679
jonas merged 1 commit into jonas:master
10b6b24 to
42173c7
It looks like this also "further" fixes the emoji test. (OK, I didn't QA that one very well.) BTW, I looked into switching to https://github.com/JuliaLang/utf8proc at some point. Do you know that one? While it is quite large/heavy it would also improve support for
Odd that I can't duplicate the Travis failure on OS X. Force-pushing a hack now where control characters are left as before. I haven't used utf8proc. libicu is very complete, which is nice because Unicode is entirely made of edge cases. But libicu is mostly not UTF-8-oriented; you have to convert to/from UTF-16.
c7c9091 to
e38be72
* several more width-2 characters
* many more width-0 characters
* change control characters to width-0
* don't change NUL but make it explicit with notes
* doc some apparent bugs
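The kind of lookup this commit message describes can be sketched roughly as follows. This is not the code from the patch: the name `sketch_width()`, the `struct interval` type, and the tiny, deliberately incomplete range tables are all hypothetical, and the NUL handling (width 0) is an assumption borrowed from Markus Kuhn's `wcwidth()` rather than something stated in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of code points. */
struct interval {
	uint32_t first;
	uint32_t last;
};

/* Illustrative, incomplete tables. Each must stay sorted and non-overlapping
 * because in_table() uses binary search. */
static const struct interval zero_width[] = {
	{ 0x0300, 0x036F },	/* Combining Diacritical Marks */
	{ 0x0591, 0x05BD },	/* Hebrew accents and points */
	{ 0x200B, 0x200F },	/* zero width space, joiners, direction marks */
	{ 0xFE00, 0xFE0F },	/* variation selectors */
};

static const struct interval double_width[] = {
	{ 0x1100, 0x115F },	/* Hangul Jamo initial consonants */
	{ 0x4E00, 0x9FFF },	/* CJK Unified Ideographs */
	{ 0xAC00, 0xD7A3 },	/* Hangul Syllables */
	{ 0xFE10, 0xFE19 },	/* Vertical Forms */
};

/* Return 1 if cp falls inside any interval of the sorted table. */
static int in_table(uint32_t cp, const struct interval *table, size_t len)
{
	size_t lo = 0, hi = len;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (cp > table[mid].last)
			lo = mid + 1;
		else if (cp < table[mid].first)
			hi = mid;
		else
			return 1;
	}
	return 0;
}

/* Hypothetical width guess for a single code point. */
static int sketch_width(uint32_t cp)
{
	/* NUL is handled explicitly; width 0 is an assumption here. */
	if (cp == 0)
		return 0;
	/* C0 and C1 control characters are treated as width 0. */
	if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
		return 0;
	if (in_table(cp, zero_width, sizeof(zero_width) / sizeof(zero_width[0])))
		return 0;
	if (in_table(cp, double_width, sizeof(double_width) / sizeof(double_width[0])))
		return 2;
	return 1;
}
```

With these sample tables, `sketch_width(0xFE16)` yields 2 and `sketch_width(0x0591)` yields 0. A real table would need far more ranges, and the terminal remains the final authority on what is actually drawn.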
e38be72 to
9c80109
Your emoji test is great because it catches the issue which is now worked around and marked with a BUG comment. It may be a difficult bug to solve but it shouldn't be hard to narrow it down to a TODO test. I have noted some platforms, but it could easily be related to libiconv version, locale environment vars, etc. In the meantime this patch should only improve correctness, where correctness means guessing what the terminal is going to do.
This is amazing. Most of the Unicode/UTF-8 code was copied from ELinks with very minimal changes. Very nice to have this improved.
Synced up to a newer version of Markus Kuhn's wcwidth().

Example improvements:

* ︖ "PRESENTATION FORM FOR VERTICAL QUESTION MARK" was width 1, now 2
* ט֑ Tet composed with "HEBREW ACCENT ETNAHTA" was width 2, now 1
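To see why the two examples come out as 2 and 1, here is a minimal sketch that decodes a UTF-8 string and sums per-code-point widths. Everything in it is hypothetical and not taken from the patch: `decode_utf8()`, `sketch_cp_width()`, and `sketch_string_width()` are illustrative names, the width function knows only the two ranges needed for these examples, and the decoder assumes a NUL-terminated string and does not reject overlong encodings.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from a NUL-terminated UTF-8 string.
 * Returns the number of bytes consumed. Malformed input yields U+FFFD
 * and consumes one byte. Overlong forms are not rejected in this sketch. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	}
	if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
		*cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
		return 2;
	}
	if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
	    (s[2] & 0xC0) == 0x80) {
		*cp = ((uint32_t)(s[0] & 0x0F) << 12) |
		      ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
		return 3;
	}
	if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 &&
	    (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
		*cp = ((uint32_t)(s[0] & 0x07) << 18) |
		      ((uint32_t)(s[1] & 0x3F) << 12) |
		      ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
		return 4;
	}
	*cp = 0xFFFD;
	return 1;
}

/* Stand-in width function covering only the ranges used by the examples. */
static int sketch_cp_width(uint32_t cp)
{
	if (cp >= 0x0591 && cp <= 0x05BD)	/* Hebrew accents and points */
		return 0;
	if (cp >= 0xFE10 && cp <= 0xFE19)	/* Vertical Forms */
		return 2;
	return 1;
}

/* Sum the guessed widths of all code points in a UTF-8 string. */
static int sketch_string_width(const char *text)
{
	const unsigned char *s = (const unsigned char *)text;
	int width = 0;

	while (*s) {
		uint32_t cp;

		s += decode_utf8(s, &cp);
		width += sketch_cp_width(cp);
	}
	return width;
}
```

Under these assumptions, U+FE16 (bytes `EF B8 96`) counts as two columns, while Tet followed by Etnahta (bytes `D7 98 D6 91`) counts as one, because the accent combines with the base letter and adds no column of its own.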