Why we use CompareInfo.Invariant for Ordinal? #27738

@iSazonov

I am not deeply familiar with Unicode, and I was surprised to see that
public int IndexOf(char value, StringComparison comparisonType)
and all IndexOf overloads for strings, such as
public int IndexOf(string value, int startIndex, int count, StringComparison comparisonType)
call CompareInfo.Invariant.IndexOf() for both InvariantCulture (InvariantCultureIgnoreCase) and Ordinal (OrdinalIgnoreCase).

At the same time, we have IndexOf overloads for chars that use a true ordinal comparison from SpanHelpers with SIMD acceleration:
public int IndexOf(char value) => SpanHelpers.IndexOf(ref _firstChar, value, Length);

We even have [public static int CompareOrdinal(string strA, int indexA, string strB, int indexB, int length)](https://source.dot.net/#System.Private.CoreLib/shared/System/String.Comparison.cs,341906879aa2feb9),
which internally uses SpanHelpers with SIMD acceleration.


I'd expect the same for strings too if StringComparison is Ordinal or OrdinalIgnoreCase.
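
To make the expectation concrete, here is a minimal sketch of an ordinal substring search that never goes through CompareInfo. The class name OrdinalSearch and its IndexOf method are hypothetical helpers for illustration, not existing BCL members; the only library call used is the span-based MemoryExtensions.IndexOf, which compares UTF-16 code units directly. Whether and how the runtime vectorizes that call is up to the implementation.

```csharp
using System;

// Hypothetical helper (not part of the BCL): an ordinal substring search
// that bypasses CompareInfo entirely.
public static class OrdinalSearch
{
    public static int IndexOf(string source, string value)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (value is null) throw new ArgumentNullException(nameof(value));

        // The span overload compares raw UTF-16 code units one by one,
        // with no culture data and no collation rules involved.
        return source.AsSpan().IndexOf(value.AsSpan());
    }
}
```

For example, OrdinalSearch.IndexOf("hello world", "world") returns 6, and a search for a value that does not occur returns -1.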


In that case we could implement very fast SIMD-accelerated methods such as IndexOf(), LastIndexOf(), Replace(), and perhaps some more.

For OrdinalIgnoreCase we could implement Unicode simple case folding (#17233). With this enhancement we could get great wins in RegEx (see the Rust RegEx lib) and simplify the Boyer-Moore implementation (which is also useful in IndexOf() and Replace(); see #20674 and https://github.com/dotnet/coreclr/issues/6918).
Some languages (like PowerShell) could also get great wins in identifier comparisons.
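
As a rough illustration of the OrdinalIgnoreCase idea, the sketch below folds each UTF-16 code unit before comparing it. Everything here is an assumption for illustration: OrdinalIgnoreCaseSearch and Fold are hypothetical names, and Fold uses char.ToUpperInvariant as a stand-in for a real Unicode simple case folding table, which it is not. The search itself is a naive scan, not Boyer-Moore and not SIMD-accelerated.

```csharp
using System;

// Hypothetical helper (not part of the BCL): case-insensitive ordinal search
// built on a per-code-unit fold instead of CompareInfo.
public static class OrdinalIgnoreCaseSearch
{
    // Assumption: invariant uppercasing stands in for a true
    // Unicode simple case folding table.
    private static char Fold(char c) => char.ToUpperInvariant(c);

    public static int IndexOf(string source, string value)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (value is null) throw new ArgumentNullException(nameof(value));
        if (value.Length == 0) return 0;

        // Naive scan: try every start position and compare folded code units.
        for (int i = 0; i <= source.Length - value.Length; i++)
        {
            int j = 0;
            while (j < value.Length && Fold(source[i + j]) == Fold(value[j]))
            {
                j++;
            }
            if (j == value.Length) return i;
        }
        return -1;
    }
}
```

Because the fold is a one-to-one mapping on code units, the folded text has the same length as the original, which is what would let a table-driven or vectorized search reuse the ordinal code path.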
