I have already checked this document:

http://www.unicode.org/Public/8.0.0/ucd/EastAsianWidth.txt

and this one:

https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

(Actually, wcwidth.c implements the Unicode 5.0 version of East Asian Width.)

But some characters, such as ‧ ("\u2027" in C++), have an ambiguous East Asian Width.

On Linux, it is as narrow as a plain Latin letter (such as a).

On Windows, however, it is as wide as a Chinese character.

Other characters, such as ® ("\u00ae" in C++), also have an ambiguous East Asian Width.

But ® is as narrow as a plain Latin letter on both Linux and Windows.

How can I correctly determine whether a Unicode character with ambiguous width is rendered wide or narrow on a given platform?

  • Windows does not deal in UTF-8 to begin with. Its APIs operate on strings encoded in either ANSI (using a local user-defined charset) or UTF-16, whereas Linux is primarily UTF-8. You need to pay attention to the charset encoding that you are working with. ANSI, UTF-8, and UTF-16 all encode the same data in different ways. See "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". Commented Jun 10, 2016 at 20:10
  • What's the problem? If the width of a character is "ambiguous" in EastAsianWidth.txt, you need to look at the surrounding context to determine the width: unicode.org/reports/tr11/#Ambiguous Commented Jun 11, 2016 at 3:07
  • @RemyLebeau, thanks, but the problem is not encoding. What I want to know is how many columns a Unicode character occupies in the terminal after it is printed with cout or printf. I am not asking how to determine the byte count of a Unicode character. But thank you for replying. Commented Jun 11, 2016 at 3:08
  • @一二三 I will read the paragraph more carefully. Commented Jun 11, 2016 at 3:11
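
To make the "look at the context" advice from the comments concrete, here is a minimal sketch in C++11, not a definitive implementation. It assumes the caller already knows whether the output context is an East Asian legacy one and passes that in as a policy. The names `AmbiguousPolicy`, `column_width`, `is_wide_sample`, and `is_ambiguous_sample` are hypothetical, and the two lookup functions cover only the characters mentioned in the question; a real program would generate full tables from EastAsianWidth.txt. Note that wcwidth.c takes a similar approach by offering `mk_wcwidth_cjk` next to `mk_wcwidth`.

```cpp
// How the output context treats East_Asian_Width=Ambiguous characters.
// Hypothetical type: the caller must decide this, e.g. from the locale.
enum class AmbiguousPolicy { Narrow, Wide };

// Illustrative subset only: CJK Unified Ideographs U+4E00..U+9FFF are Wide.
constexpr bool is_wide_sample(char32_t cp) {
    return cp >= 0x4E00 && cp <= 0x9FFF;
}

// Illustrative subset only: the two Ambiguous characters from the question,
// U+00AE (registered sign) and U+2027 (hyphenation point).
constexpr bool is_ambiguous_sample(char32_t cp) {
    return cp == 0x00AE || cp == 0x2027;
}

// Returns the assumed number of terminal columns for one code point.
// Wide characters take 2 columns; Ambiguous ones take 2 only under the
// Wide policy; everything else in this sketch is treated as 1 column.
constexpr int column_width(char32_t cp, AmbiguousPolicy policy) {
    return is_wide_sample(cp) ? 2
         : (is_ambiguous_sample(cp) && policy == AmbiguousPolicy::Wide) ? 2
         : 1;
}

// Compile-time checks of the behaviour described above.
static_assert(column_width(U'a', AmbiguousPolicy::Wide) == 1, "Latin letter is narrow");
static_assert(column_width(0x2027, AmbiguousPolicy::Narrow) == 1, "ambiguous, narrow policy");
static_assert(column_width(0x2027, AmbiguousPolicy::Wide) == 2, "ambiguous, wide policy");
```

A single policy cannot reproduce the observation in the question that ‧ is wide but ® is narrow on the same Windows console. That difference most likely comes from the terminal and font, so the result of a function like this should be treated as an estimate.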
