[API Proposal]: string.GetHashCodeNonRandomized #77679

@stephentoub

Background and motivation

In .NET Core, string hash codes are always randomized. This is critical for defending against hash-flooding attacks when arbitrary, untrusted inputs are added to types like Dictionary<,> and HashSet<>. For trusted inputs, however, randomized hash codes are measurably more expensive to compute than their non-randomized counterparts. As such, Dictionary<,> and HashSet<>, when keyed by string, both start out with non-randomized hash codes and only upgrade to randomized ones when enough collisions are detected. Such a capability is valuable for other collection types as well, but the raw primitives (the non-randomized hash code implementations) aren't trivial to implement efficiently and aren't exposed.
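
To make the "start non-randomized, upgrade on collisions" strategy concrete, here is a minimal sketch of a custom string set that follows it. Everything in it is an assumption made for illustration: the AdaptiveStringSet type, the threshold of 8, the simple 31-multiplier hash standing in for a real non-randomized implementation, and the use of bucket chain length as the collision signal. It is not how Dictionary<,> or HashSet<> are actually implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not the real Dictionary<,>/HashSet<> code.
public sealed class AdaptiveStringSet
{
    private const int CollisionThreshold = 8;   // assumed value for this sketch
    private List<string>[] _buckets = new List<string>[16];
    private bool _randomized;                   // false: cheap deterministic hash
    private int _count;

    public bool IsRandomized => _randomized;
    public int Count => _count;

    // Stand-in non-randomized hash (simple polynomial, easy to attack).
    private static int SimpleHash(string s)
    {
        int hash = 0;
        foreach (char c in s) hash = unchecked(hash * 31 + c);
        return hash;
    }

    // string.GetHashCode() is the randomized hash on .NET Core.
    private int Hash(string s) => _randomized ? s.GetHashCode() : SimpleHash(s);

    private int BucketIndex(string s) =>
        (int)(unchecked((uint)Hash(s)) % (uint)_buckets.Length);

    public bool Add(string item)
    {
        int index = BucketIndex(item);
        List<string> bucket = _buckets[index] ??= new List<string>();
        if (bucket.Contains(item)) return false;
        bucket.Add(item);
        _count++;

        if (!_randomized && bucket.Count > CollisionThreshold)
        {
            // Too many entries in one chain: switch hash functions and rehash.
            _randomized = true;
            Rehash(_buckets.Length);
        }
        else if (_count > _buckets.Length * 2)
        {
            Rehash(_buckets.Length * 2);        // ordinary growth
        }
        return true;
    }

    public bool Contains(string item) =>
        _buckets[BucketIndex(item)]?.Contains(item) ?? false;

    private void Rehash(int newSize)
    {
        List<string>[] old = _buckets;
        _buckets = new List<string>[newSize];
        foreach (List<string> bucket in old)
        {
            if (bucket == null) continue;
            foreach (string s in bucket)
            {
                (_buckets[BucketIndex(s)] ??= new List<string>()).Add(s);
            }
        }
    }
}
```

A collection author writing this today has to supply SimpleHash themselves, which is the gap the proposed primitives would fill.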

API Proposal

namespace System
{
    public sealed class String
    {
        public static int GetHashCode(ReadOnlySpan<char> value);
        public static int GetHashCode(ReadOnlySpan<char> value, StringComparison comparisonType);

+       public static int GetHashCodeNonRandomized(ReadOnlySpan<char> value);
+       public static int GetHashCodeNonRandomized(ReadOnlySpan<char> value, StringComparison comparisonType);
    }
}

API Usage

int hashcode = string.GetHashCodeNonRandomized(value, StringComparison.OrdinalIgnoreCase);

Alternative Designs

We could instead or in addition expose StringComparer singletons:

namespace System
{
    public abstract class StringComparer
    {
        public static StringComparer Ordinal { get; }
        public static StringComparer OrdinalIgnoreCase { get; }

+       public static StringComparer OrdinalNonRandomized { get; }
+       public static StringComparer OrdinalIgnoreCaseNonRandomized { get; }
    }
}

If we did that instead of the proposed APIs, we should also consider adding Equals/GetHashCode overloads for ReadOnlySpan<char> to StringComparer (something we might want to do anyway as part of #27229).
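
For a sense of what such a comparer looks like from the consumer's side, here is a rough sketch of one that can be written today. The type name OrdinalNonRandomizedComparer and the FNV-1a hash are assumptions chosen for illustration, not part of the proposal; the span-based Equals/GetHashCode overloads mirror the kind of additions mentioned above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer; the name and hash function are not from the proposal.
public sealed class OrdinalNonRandomizedComparer : IEqualityComparer<string>
{
    public static readonly OrdinalNonRandomizedComparer Instance =
        new OrdinalNonRandomizedComparer();

    public bool Equals(string x, string y) =>
        string.Equals(x, y, StringComparison.Ordinal);

    public int GetHashCode(string obj) => GetHashCode(obj.AsSpan());

    // Span-based overloads so callers can hash and compare without allocating a string.
    public bool Equals(ReadOnlySpan<char> x, ReadOnlySpan<char> y) =>
        x.SequenceEqual(y);

    public int GetHashCode(ReadOnlySpan<char> value)
    {
        // FNV-1a over UTF-16 code units: deterministic across runs, not attack resistant.
        uint hash = 2166136261;
        foreach (char c in value)
        {
            hash = unchecked((hash ^ c) * 16777619);
        }
        return unchecked((int)hash);
    }
}
```

It can be passed to a collection as usual, for example `new Dictionary<string, int>(OrdinalNonRandomizedComparer.Instance)`. As far as I know, supplying a custom comparer like this means the collection will not upgrade to randomized hashing on its own, so it is only appropriate for trusted inputs.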

Risks

A risk could be developers defaulting to using these instead of the randomized implementations in situations where the randomized implementations are warranted. However, some developers are already writing their own hash implementations to avoid the randomized overhead, and their implementations may be worse or less efficient than what's already in the box.
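
As an illustration of the kind of hand-written code this refers to, a typical case-insensitive hash might look like the sketch below. The HandRolledHash type and the djb2-style mixing are assumptions for the example, not code taken from any particular project.

```csharp
using System;

// Hypothetical example of a hand-rolled, non-randomized, case-insensitive hash.
public static class HandRolledHash
{
    public static int OrdinalIgnoreCase(ReadOnlySpan<char> value)
    {
        uint hash = 5381;
        foreach (char c in value)
        {
            // Fold case one UTF-16 code unit at a time, then mix (djb2-style).
            hash = unchecked(((hash << 5) + hash) ^ char.ToUpperInvariant(c));
        }
        return unchecked((int)hash);
    }
}
```

Code like this works one char at a time, is not guaranteed to agree with StringComparison.OrdinalIgnoreCase equality for every input, and is likely slower and more collision-prone than a tuned built-in implementation.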
