UTF8::icompare unexpected behavior #254

Let's say I have two null-terminated C strings containing UTF-8 encoded text. The case-insensitive UTF-8 compare function, UTF8::icompare, does not compare these two strings correctly. Test case:

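``` c++
#include <string>
#include "Poco/Bugcheck.h"         // poco_assert
#include "Poco/UTF8String.h"       // Poco::UTF8
#include "Poco/UnicodeConverter.h" // Poco::UnicodeConverter

using Poco::UTF8;
using Poco::UnicodeConverter;

// Create test string "A"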
const wchar_t* wa = L"åäö";
std::string sa;
UnicodeConverter::toUTF8(wa, sa);
const char* ca = sa.c_str();

// Create test string "B"
const wchar_t* wb = L"ÅÄÖ";
std::string sb;
UnicodeConverter::toUTF8(wb, sb);
const char* cb = sb.c_str();

// Comparing the std::strings works as expected
bool sr = UTF8::icompare(sa, sb) == 0;
poco_assert (sr);

// Comparing the raw null-terminated strings does not work as expected
bool cr = UTF8::icompare(ca, cb) == 0;
poco_assert (cr);
```

The reason becomes clear from the source code of the icompare function. In the first case (both arguments are std::string), a TextIterator is used for each argument, so both arguments are iterated by character, not by byte.

In the second case (null-terminated strings), a TextIterator is used only for the first argument. The first argument is therefore iterated by character while the second is iterated by byte, which cannot work for multi-byte UTF-8 characters. This is the overload in question:

``` c++
int UTF8::icompare(const std::string& str, 
    std::string::size_type pos, 
    std::string::size_type n, 
    const std::string::value_type* ptr)
```
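To make the mismatch concrete, here is a minimal standalone sketch that does not use Poco at all. The names `decodeUtf8`, `foldToy`, `icompareByCodePoint` and `icompareMixed` are hypothetical helpers written for illustration, and `foldToy` only lowercases ASCII and Latin-1 letters. `icompareMixed` imitates the behavior described above (first argument decoded by character, second read byte by byte); it is not Poco's actual implementation.

``` c++
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode UTF-8 into code points. Assumes well-formed input; no validation.
std::vector<std::uint32_t> decodeUtf8(const std::string& s)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size())
    {
        unsigned char lead = static_cast<unsigned char>(s[i++]);
        int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        std::uint32_t cp = extra == 0 ? lead : (lead & (0x3F >> extra));
        for (; extra > 0 && i < s.size(); --extra)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}

// Toy lowercase mapping: ASCII A-Z and Latin-1 U+00C0..U+00DE (except U+00D7).
std::uint32_t foldToy(std::uint32_t cp)
{
    if (cp >= 0x41 && cp <= 0x5A) return cp + 32;
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7) return cp + 32;
    return cp;
}

// Both sides iterated by character (code point).
int icompareByCodePoint(const std::string& a, const std::string& b)
{
    std::vector<std::uint32_t> ca = decodeUtf8(a);
    std::vector<std::uint32_t> cb = decodeUtf8(b);
    std::size_t n = std::min(ca.size(), cb.size());
    for (std::size_t i = 0; i < n; ++i)
    {
        std::uint32_t x = foldToy(ca[i]);
        std::uint32_t y = foldToy(cb[i]);
        if (x < y) return -1;
        if (x > y) return 1;
    }
    if (ca.size() == cb.size()) return 0;
    return ca.size() < cb.size() ? -1 : 1;
}

// First side iterated by character, second side by byte (the reported mismatch).
int icompareMixed(const std::string& a, const char* ptr)
{
    for (std::uint32_t cp : decodeUtf8(a))
    {
        if (*ptr == '\0') return 1;
        std::uint32_t x = foldToy(cp);
        std::uint32_t y = foldToy(static_cast<unsigned char>(*ptr++));
        if (x < y) return -1;
        if (x > y) return 1;
    }
    return *ptr == '\0' ? 0 : -1;
}
```

With "åäö" and "ÅÄÖ" encoded as UTF-8, `icompareByCodePoint` reports equality, while `icompareMixed` compares the code point U+00E5 against the single lead byte 0xC3 and reports a difference. For pure ASCII input the two agree, which is probably why the problem is easy to miss.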

Perhaps this is "as-designed", but it seems a bit strange to me.

