Description
We found inconsistent behavior in the IndexOf method of strings.
The official Unicode Special Casing specification states that the uppercase of the ligature character st is ST.
As expected, the following comparison returns true.
string.Equals("est", "est", System.StringComparison.InvariantCultureIgnoreCase) // returns True
With the IndexOf method, the following code returns 0, which is expected, since both strings are equivalent under the invariant culture when ignoring case.
"est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase) // returns 0
However, when the string begins with letters from the English alphabet or with spaces, we start getting -1. With other leading letters (Cyrillic, Arabic, Hebrew, Latin with diacritics), it gives the proper result again. Examples are in the next section.
Reproduction Steps
"est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 0 ✅
" est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
" st".IndexOf("st", System.StringComparison.InvariantCultureIgnoreCase); // 1 ✅
"cccest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
"ćććest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"אאאest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"фффest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
Expected behavior
The expected behavior is that IndexOf behaves the same way regardless of the other characters in the string. So in the examples where we got -1:
" est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
"cccest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
we should get 1 and 3, respectively.
Actual behavior
The actual behavior is that IndexOf with InvariantCultureIgnoreCase is not consistent and may return the wrong result depending on the surrounding string content.
" est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
"cccest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
Regression?
We get the observed results under .NET 5 and above. In .NET Framework 4.7.2 we get the expected results:
"est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 0 ✅
" est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 1 ✅
" st".IndexOf("st", System.StringComparison.InvariantCultureIgnoreCase); // 1 ✅
"cccest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"ćććest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"אאאest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"фффest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
Known Workarounds
Turning off ICU resolves the issue, but this is an unwanted workaround, since other parts of the code rely on the ICU logic.
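For reference, a minimal sketch of one way ICU can be turned off on Windows, assuming the documented System.Globalization.UseNls runtime setting is placed in the app's runtimeconfig.json. The report does not say which mechanism was actually used, so this is an illustration only.

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Globalization.UseNls": true
    }
  }
}
```

The setting is process-wide, so it switches every culture-sensitive operation to NLS, which is why it is not acceptable for code that depends on ICU elsewhere.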
Configuration
.NET 5, .NET 6, .NET 7, .NET 8
Windows 11
x64, x86
Other information
No response