
ICU: OrdinalIgnoreCase comparison functions convert to upper instead of using case folding #26961

@GrabYourPitchforks

Description

The CompareInfo.IndexOf(..., CompareOptions.OrdinalIgnoreCase) functions on ICU use u_toupper, though they should really use u_foldCase. Case mapping (e.g., u_toupper and u_tolower) is meant for converting strings to a standard casing. Case folding (u_foldCase) is meant for comparing strings for ordinal, non-linguistic equality. In particular, we should use simple case folding rather than full case folding: simple folding maps each code point to exactly one code point, so string lengths and match indices are preserved, whereas full folding can expand one character into several (e.g., U+00DF folds to "ss").
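The difference between simple and full case folding can be sketched in a few lines of C. This is a minimal illustration, not ICU's implementation: the helpers simple_fold and full_fold are hypothetical, and their tables are a tiny hand-copied subset of Unicode's CaseFolding.txt (ASCII A-Z plus the two sharp-s characters). Real code would call ICU's u_foldCase(c, U_FOLD_CASE_DEFAULT) for simple folding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately tiny subset of CaseFolding.txt covering only
 * ASCII A-Z, U+00DF (small sharp s) and U+1E9E (capital sharp s).
 * A real implementation would use ICU (u_foldCase) instead of these tables. */

/* Simple case folding (statuses C and S): always exactly one code point. */
static uint32_t simple_fold(uint32_t c) {
    if (c >= 'A' && c <= 'Z') return c + 0x20;
    if (c == 0x1E9E) return 0x00DF; /* CAPITAL SHARP S -> SMALL SHARP S */
    return c;                       /* U+00DF has no simple folding: unchanged */
}

/* Full case folding (statuses C and F): may expand to several code points.
 * Writes up to 3 code points into out and returns how many were written. */
static size_t full_fold(uint32_t c, uint32_t out[3]) {
    if (c == 0x00DF || c == 0x1E9E) { /* both sharp s forms fold to "ss" */
        out[0] = 's';
        out[1] = 's';
        return 2;
    }
    out[0] = simple_fold(c);          /* within this subset, C-status entries only */
    return 1;
}
```

For example, simple_fold(0x1E9E) returns 0x00DF (one code point), while full_fold(0x1E9E, buf) writes "ss" and returns 2. An ordinal comparer that walks two strings code point by code point and reports indices stays in lockstep only with the simple, length-preserving variant.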

Example line that exhibits the problem:

return u_toupper(one) == u_toupper(two);

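This means that, for instance, the strings "ß" (U+00DF LATIN SMALL LETTER SHARP S) and "ẞ" (U+1E9E LATIN CAPITAL LETTER SHARP S) will be treated as unequal under an OrdinalIgnoreCase comparer: neither character has a single-code-point uppercase mapping, so u_toupper leaves both unchanged, whereas simple case folding maps U+1E9E to U+00DF.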
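The ß/ẞ mismatch can be reproduced with a small self-contained C sketch. The functions upper_simple and fold_simple are hypothetical stand-ins for ICU's u_toupper and u_foldCase(c, U_FOLD_CASE_DEFAULT), restricted to ASCII letters and the two sharp-s characters; eq_via_upper mirrors the current comparison line and eq_via_fold the proposed one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for u_toupper (simple uppercase mapping), limited to
 * ASCII and the sharp s characters. U+00DF has no single-code-point uppercase
 * and U+1E9E is already uppercase, so both map to themselves. */
static uint32_t upper_simple(uint32_t c) {
    if (c >= 'a' && c <= 'z') return c - 0x20;
    return c;
}

/* Hypothetical stand-in for u_foldCase(c, U_FOLD_CASE_DEFAULT)
 * (simple case folding) over the same tiny subset. */
static uint32_t fold_simple(uint32_t c) {
    if (c >= 'A' && c <= 'Z') return c + 0x20;
    if (c == 0x1E9E) return 0x00DF; /* CAPITAL SHARP S folds to SMALL SHARP S */
    return c;
}

/* Current behavior: compare uppercase mappings. */
static bool eq_via_upper(uint32_t one, uint32_t two) {
    return upper_simple(one) == upper_simple(two);
}

/* Proposed behavior: compare simple case foldings. */
static bool eq_via_fold(uint32_t one, uint32_t two) {
    return fold_simple(one) == fold_simple(two);
}
```

With these helpers, eq_via_upper(0x00DF, 0x1E9E) is false while eq_via_fold(0x00DF, 0x1E9E) is true; both agree on ASCII pairs such as 'a' and 'A'.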

To be fair, the current behavior of performing an uppercase mapping does match the Windows NLS behavior. However, the NLS behavior is legacy and, for compatibility reasons, cannot be updated to match the Unicode best practices described in https://unicode.org/faq/casemap_charprop.html. In the ICU code paths we should follow the Unicode recommendations wherever we can.
