Closed
Description
Let's say I have two null-terminated C strings containing UTF-8 encoded text. The case-insensitive UTF-8 compare function does not compare these two strings correctly. Test case:
// Create test string "A"
const wchar_t* wa = L"åäö";
std::string sa;
UnicodeConverter::toUTF8(wa, sa);
const char* ca = sa.c_str();
// Create test string "B"
const wchar_t* wb = L"ÅÄÖ";
std::string sb;
UnicodeConverter::toUTF8(wb, sb);
const char* cb = sb.c_str();
// Comparing the std::strings works as expected
bool sr = UTF8::icompare(sa, sb) == 0;
poco_assert (sr);
// Comparing the raw null-terminated strings does not work as expected
bool cr = UTF8::icompare(ca, cb) == 0;
poco_assert (cr);
The reason is quite obvious when looking at the source code for the icompare function. In the first case (both std::string), TextIterators are used for both arguments, so both are iterated by character (not by byte).
In the second case (null-terminated strings), a TextIterator is only used for the first argument. Hence the first argument is iterated by character and the second by byte, which cannot work for multi-byte UTF-8 characters. The overload in question:
int UTF8::icompare(const std::string& str,
std::string::size_type pos,
std::string::size_type n,
const std::string::value_type* ptr)
Perhaps this is "as-designed", but it seems a bit strange to me.
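To make the mismatch concrete without depending on Poco, here is a minimal standalone sketch. All helper names (decodeUtf8, toLowerLatin1, icompareBoth, icompareMixed) are hypothetical and are not Poco's API or its actual implementation. The decoder assumes well-formed UTF-8, and the case folding only covers ASCII and Latin-1, which is enough for the test strings above. icompareMixed only imitates the behaviour described (first argument by character, second by byte).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Poco): decode well-formed UTF-8 into code points.
// No validation is done; malformed input is out of scope for this sketch.
static std::vector<std::uint32_t> decodeUtf8(const char* s)
{
    std::vector<std::uint32_t> out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    while (*p)
    {
        std::uint32_t cp;
        int extra; // number of continuation bytes that follow the lead byte
        if (*p < 0x80)      { cp = *p;        extra = 0; }
        else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; }
        else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; }
        else                { cp = *p & 0x07; extra = 3; }
        ++p;
        while (extra-- > 0 && *p) cp = (cp << 6) | (*p++ & 0x3F);
        out.push_back(cp);
    }
    return out;
}

// Hypothetical helper: lower-casing for ASCII and Latin-1 letters only.
static std::uint32_t toLowerLatin1(std::uint32_t cp)
{
    if (cp >= 'A' && cp <= 'Z') return cp + 32;
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7) return cp + 32;
    return cp;
}

// Both arguments are decoded, so they are compared character by character.
static int icompareBoth(const char* a, const char* b)
{
    const std::vector<std::uint32_t> ca = decodeUtf8(a);
    const std::vector<std::uint32_t> cb = decodeUtf8(b);
    for (std::size_t i = 0; i < ca.size() && i < cb.size(); ++i)
    {
        const std::uint32_t x = toLowerLatin1(ca[i]);
        const std::uint32_t y = toLowerLatin1(cb[i]);
        if (x != y) return x < y ? -1 : 1;
    }
    if (ca.size() == cb.size()) return 0;
    return ca.size() < cb.size() ? -1 : 1;
}

// Imitates the reported behaviour: first argument by character,
// second argument by raw byte.
static int icompareMixed(const char* a, const char* b)
{
    const std::vector<std::uint32_t> ca = decodeUtf8(a);
    const unsigned char* q = reinterpret_cast<const unsigned char*>(b);
    std::size_t i = 0;
    for (; i < ca.size() && *q; ++i, ++q)
    {
        const std::uint32_t x = toLowerLatin1(ca[i]);
        const std::uint32_t y = toLowerLatin1(*q);
        if (x != y) return x < y ? -1 : 1;
    }
    if (i == ca.size() && !*q) return 0;
    return i == ca.size() ? -1 : 1;
}
```

With "åäö" written as the bytes "\xC3\xA5\xC3\xA4\xC3\xB6" and "ÅÄÖ" as "\xC3\x85\xC3\x84\xC3\x96", icompareBoth reports the strings as equal, while icompareMixed already differs at the first step because it compares the code point U+00E5 against the lead byte 0xC3.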