Bug in IndexOf with InvariantCultureIgnoreCase in .NET 5 and above? #108424

@rhalaly

Description

We found odd behavior in the IndexOf method of strings.

The official Unicode spec for Special Casing states that the upper case of the ﬆ character (U+FB06, LATIN SMALL LIGATURE ST) is ST.

As expected, the following comparison returns true.

string.Equals("est", "est", System.StringComparison.InvariantCultureIgnoreCase) // returns True

When we move to the IndexOf method, the following code returns 0, which is expected, since we saw that both strings are equivalent under the invariant culture when ignoring case.

"est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase) // returns 0

However, when we put any letters from the English alphabet or spaces at the beginning of the string, we start getting -1. If we use other letters instead (Cyrillic, Arabic, Hebrew, Latin with umlauts), it gives the proper result again. Examples are in the next section.

Reproduction Steps

"est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 0 ✅
" est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
" st".IndexOf("st", System.StringComparison.InvariantCultureIgnoreCase); // 1 ✅
"cccest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
"ćććest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"אאאest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"фффest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
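Ligatures are easily flattened to plain letters when code is copied around, so it may help to write the repro with explicit escapes. The following is a minimal sketch, not the reporter's code. It assumes the character under test is U+FB06 (LATIN SMALL LIGATURE ST), that it sits in the searched string while the pattern is plain ASCII, and that the accented prefix is U+0107. The names `IndexOfRepro`, `Cases`, `Probe`, and `Escape` are hypothetical.

```csharp
using System;
using System.Linq;

public static class IndexOfRepro
{
    // ASSUMPTION: the character under test is U+FB06 (LATIN SMALL LIGATURE ST).
    public const string Ligature = "\uFB06";

    // ASSUMPTION: the ligature is in the searched string; the pattern is plain ASCII.
    public static readonly (string Source, string Value)[] Cases =
    {
        ("e" + Ligature, "est"),
        (" e" + Ligature, "est"),
        (" " + Ligature, "st"),
        ("ccce" + Ligature, "est"),
        ("\u0107\u0107\u0107e" + Ligature, "est"), // Latin c with acute prefix
        ("\u05D0\u05D0\u05D0e" + Ligature, "est"), // Hebrew alef prefix
        ("\u0444\u0444\u0444e" + Ligature, "est"), // Cyrillic ef prefix
    };

    // Runs the same culture-aware, case-insensitive search used in the report.
    public static int Probe(string source, string value) =>
        source.IndexOf(value, StringComparison.InvariantCultureIgnoreCase);

    // Shows non-ASCII characters as \uXXXX so the output is unambiguous.
    public static string Escape(string s) =>
        string.Concat(s.Select(c => c < 128 ? c.ToString() : $"\\u{(int)c:X4}"));

    public static void Main()
    {
        // The result depends on the runtime and on whether ICU or NLS is in use.
        Console.WriteLine($"Runtime: {Environment.Version}");
        foreach (var (source, value) in Cases)
        {
            Console.WriteLine($"\"{Escape(source)}\".IndexOf(\"{value}\") = {Probe(source, value)}");
        }
    }
}
```

Running the same program on .NET Framework and on .NET 5 or later should make any difference between the two visible side by side, without relying on how the ligature renders in an editor.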

Expected behavior

The expected behavior is that IndexOf behaves the same way regardless of the other characters in the string. So for the examples where we got -1:

" est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
"cccest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)

we should get 1 and 3, respectively.

Actual behavior

The actual behavior is that IndexOf with InvariantCultureIgnoreCase is not consistent and may return the wrong result depending on the surrounding string content.

" est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
"cccest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)

Regression?

We get the observed results under .NET 5 and above. In .NET Framework 4.7.2 we get the expected results:

"est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 0 ✅
" est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 1 ✅
" st".IndexOf("st", System.StringComparison.InvariantCultureIgnoreCase); // 1 ✅
"cccest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"ćććest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"אאאest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"фффest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅

Known Workarounds

Turning off ICU solves the issue, but this is an unwanted workaround, since other parts of the code rely on the ICU logic.
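For reference, one documented way to turn off ICU on Windows is the `System.Globalization.UseNls` runtime setting, which switches the app back to NLS. The fragment below is a sketch of a `runtimeconfig.json` entry. It is an assumption that this is the switch meant here, since the report does not say which mechanism was used.

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Globalization.UseNls": true
    }
  }
}
```

The setting applies to the whole process, which is why it is not a good fit when other code depends on ICU behavior.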

Configuration

.NET 5, .NET 6, .NET 7, .NET 8
Windows 11
x64, x86

Other information

No response

Labels

area-System.Globalization, in-pr (There is an active PR which will close this issue when it is merged)
