Why do we use CompareInfo.Invariant for Ordinal? #27738
Description
I am not deeply familiar with Unicode, and I was surprised to see that
public int IndexOf(char value, StringComparison comparisonType)
and all the string IndexOf overloads, such as
public int IndexOf(string value, int startIndex, int count, StringComparison comparisonType)
call CompareInfo.Invariant.IndexOf() for both InvariantCulture (InvariantCultureIgnoreCase) and Ordinal (OrdinalIgnoreCase).
At the same time, we have IndexOf overloads for chars that use a truly ordinal comparison from SpanHelpers with SIMD acceleration.
public int IndexOf(char value) => SpanHelpers.IndexOf(ref _firstChar, value, Length);
We even have [public static int CompareOrdinal(string strA, int indexA, string strB, int indexB, int length)](https://source.dot.net/#System.Private.CoreLib/shared/System/String.Comparison.cs,341906879aa2feb9), which internally uses SpanHelpers with SIMD acceleration.
I would expect the same for strings when StringComparison is Ordinal or OrdinalIgnoreCase.
In that case, we could implement very fast SIMD-accelerated versions of IndexOf(), LastIndexOf(), Replace(), and perhaps other methods.
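A rough sketch of how an ordinal string IndexOf could reuse the vectorized char search: scan for the needle's first char with the span `IndexOf(char)` extension (in .NET Core this is backed by the same kind of SIMD search as `SpanHelpers.IndexOf`), then confirm each candidate with `SequenceEqual`. `OrdinalIndexOf` is a hypothetical helper for illustration, not the actual `String.IndexOf` code path.

```csharp
using System;

// Hypothetical helper (not the real String.IndexOf): an ordinal substring
// search that leans on the vectorized single-char span search.
static int OrdinalIndexOf(string haystack, string needle)
{
    if (needle.Length == 0) return 0; // same result as String.IndexOf("")

    ReadOnlySpan<char> text = haystack.AsSpan();
    ReadOnlySpan<char> pattern = needle.AsSpan();
    char first = pattern[0];
    ReadOnlySpan<char> rest = pattern.Slice(1);
    int lastStart = text.Length - pattern.Length; // last index where a match can begin
    int offset = 0;

    while (offset <= lastStart)
    {
        // Span IndexOf(char): the SIMD-accelerated scan for a single char.
        int i = text.Slice(offset, lastStart - offset + 1).IndexOf(first);
        if (i < 0) return -1;

        int candidate = offset + i;
        // Ordinal check of the remaining chars: plain UTF-16 code unit equality.
        if (text.Slice(candidate + 1, rest.Length).SequenceEqual(rest))
            return candidate;

        offset = candidate + 1;
    }
    return -1;
}

Console.WriteLine(OrdinalIndexOf("Hello, World", "World")); // prints 7
```

When the first char is very common in the haystack, this degenerates into many short verifications, so a production version would probably need a better candidate filter; LastIndexOf would mirror the same idea by scanning from the end.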
For OrdinalIgnoreCase, we could implement Unicode simple case folding (#17233). With this enhancement, we could get significant wins in RegEx (see the Rust RegEx library) and simplify the Boyer-Moore implementation, which is also useful in IndexOf() and Replace() (see #20674 and https://github.com/dotnet/coreclr/issues/6918).
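To illustrate why a 1:1 case mapping simplifies Boyer-Moore: if every char folds to exactly one char, an ignore-case search can fold the pattern once, fold each text char as it is read, and otherwise run the ordinal algorithm unchanged. This sketch uses Boyer-Moore-Horspool (a simplified Boyer-Moore variant with only the bad-character table), and `Fold` uses `char.ToUpperInvariant` as a stand-in for a real simple case-folding table like the one proposed in #17233; the two mappings differ for some characters. `HorspoolIndexOf` and `Fold` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: Boyer-Moore-Horspool over UTF-16 code units with an
// optional 1:1 case fold. Folding happens per char, so the skip table and the
// right-to-left comparison are the same as in the ordinal search.
static int HorspoolIndexOf(string text, string pattern, bool ignoreCase)
{
    int m = pattern.Length;
    if (m == 0) return 0;
    if (m > text.Length) return -1;

    // Fold the pattern once up front.
    var p = new char[m];
    for (int k = 0; k < m; k++) p[k] = Fold(pattern[k], ignoreCase);

    // Bad-character table: for each char in p[0..m-2], the distance from its
    // last occurrence to the end of the pattern. Absent chars shift by m.
    var shift = new Dictionary<char, int>();
    for (int k = 0; k < m - 1; k++) shift[p[k]] = m - 1 - k;

    int i = 0; // text index where the current alignment starts
    while (i <= text.Length - m)
    {
        int j = m - 1;
        while (j >= 0 && Fold(text[i + j], ignoreCase) == p[j]) j--; // right to left
        if (j < 0) return i;

        // Shift by the entry for the text char under the pattern's last position.
        char last = Fold(text[i + m - 1], ignoreCase);
        i += shift.TryGetValue(last, out int s) ? s : m;
    }
    return -1;
}

// Stand-in for Unicode simple case folding: invariant uppercasing, which is
// also a 1:1 char mapping but does not match CaseFolding.txt for every char.
// Surrogate pairs are not folded here.
static char Fold(char c, bool ignoreCase) => ignoreCase ? char.ToUpperInvariant(c) : c;

Console.WriteLine(HorspoolIndexOf("Get-ChildItem -Path .", "CHILDITEM", ignoreCase: true)); // prints 4
```

Full case folding (where one char can map to several, e.g. "ß" to "ss") would break the fixed-length alignment this loop relies on, which is why the simple (1:1) mapping is the one that keeps the algorithm unchanged.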
Languages with case-insensitive identifiers, such as PowerShell, could also get big wins in identifier comparisons.