Description
Background and motivation
In .NET Core, string hash codes are always randomized. This is critical to avoid certain kinds of attacks when adding arbitrary, untrusted inputs into types like Dictionary<,> and HashSet<>. However, for situations where the inputs are trusted, the overhead of these randomized hash codes makes them measurably more expensive than their non-randomized counterparts. As such, Dictionary<,> and HashSet<> both start out with non-randomized hash codes and only upgrade to randomized ones when enough collisions are detected. Such a capability is valuable for other collection types as well, but the raw primitives (the non-randomized hash code implementations) aren't trivial to implement efficiently and aren't exposed.
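The "start non-randomized, upgrade on collisions" strategy described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Dictionary<,> or HashSet<> are actually implemented: the type AdaptiveStringSet, the method NonRandomizedHash, the CollisionThreshold value, and the fixed bucket count are all hypothetical, and FNV-1a stands in for whatever non-randomized algorithm the runtime uses internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration: a tiny chained string set that starts with a cheap
// deterministic hash and rebuilds itself with the randomized string.GetHashCode()
// once any single bucket chain grows past a threshold.
public sealed class AdaptiveStringSet
{
    private const int CollisionThreshold = 100; // assumed value for this sketch

    // Fixed bucket count (no resizing) to keep the sketch short.
    private List<string>[] _buckets = new List<string>[64];
    private bool _randomized;

    public bool IsRandomized => _randomized;

    // FNV-1a over UTF-16 code units: a stand-in non-randomized hash,
    // not the algorithm the runtime uses internally.
    public static int NonRandomizedHash(ReadOnlySpan<char> value)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in value)
            {
                hash = (hash ^ c) * 16777619;
            }
            return (int)hash;
        }
    }

    private int BucketIndex(string s)
    {
        int hash = _randomized ? s.GetHashCode() : NonRandomizedHash(s);
        return (hash & int.MaxValue) % _buckets.Length;
    }

    public bool Add(string value)
    {
        int index = BucketIndex(value);
        List<string> chain = _buckets[index] ??= new List<string>();
        if (chain.Contains(value)) return false;
        chain.Add(value);

        // Too many entries in one chain: assume hostile or unlucky input and
        // switch to the randomized hash for all current and future entries.
        if (!_randomized && chain.Count > CollisionThreshold)
        {
            UpgradeToRandomized();
        }
        return true;
    }

    public bool Contains(string value)
    {
        List<string> chain = _buckets[BucketIndex(value)];
        return chain != null && chain.Contains(value);
    }

    private void UpgradeToRandomized()
    {
        List<string>[] old = _buckets;
        _buckets = new List<string>[old.Length];
        _randomized = true;
        foreach (List<string> chain in old)
        {
            if (chain == null) continue;
            foreach (string s in chain)
            {
                (_buckets[BucketIndex(s)] ??= new List<string>()).Add(s);
            }
        }
    }
}
```

The point of the sketch is the shape of the trade-off: trusted inputs pay only for the cheap hash, while a long collision chain triggers a one-time rehash with the randomized implementation.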
API Proposal
namespace System
{
    public sealed class String
    {
        public static int GetHashCode(ReadOnlySpan<char> value);
        public static int GetHashCode(ReadOnlySpan<char> value, StringComparison comparisonType);
+       public static int GetHashCodeNonRandomized(ReadOnlySpan<char> value);
+       public static int GetHashCodeNonRandomized(ReadOnlySpan<char> value, StringComparison comparisonType);
    }
}
API Usage
int hashcode = string.GetHashCodeNonRandomized(value, StringComparison.OrdinalIgnoreCase);
Alternative Designs
Instead of, or in addition to, the methods above, we could expose StringComparer singletons:
namespace System
{
    public abstract class StringComparer
    {
        public static StringComparer Ordinal { get; }
        public static StringComparer OrdinalIgnoreCase { get; }
+       public static StringComparer OrdinalNonRandomized { get; }
+       public static StringComparer OrdinalIgnoreCaseNonRandomized { get; }
    }
}
If we did that instead of the proposed APIs, we should also consider adding Equals/GetHashCode overloads for ReadOnlySpan<char> to StringComparer (something we might want to do anyway as part of #27229).
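To make the comparer-based alternative concrete, here is a minimal sketch of what a caller can write today to approximate a non-randomized ordinal comparer. Everything in it is an assumption for illustration: the type NonRandomizedOrdinalComparer is hypothetical, it implements IEqualityComparer<string> rather than deriving from StringComparer, and its FNV-1a hash is a stand-in, not the runtime's internal non-randomized algorithm.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a non-randomized ordinal comparer singleton.
public sealed class NonRandomizedOrdinalComparer : IEqualityComparer<string>
{
    public static readonly NonRandomizedOrdinalComparer Instance =
        new NonRandomizedOrdinalComparer();

    private NonRandomizedOrdinalComparer() { }

    public bool Equals(string x, string y) =>
        string.Equals(x, y, StringComparison.Ordinal);

    public int GetHashCode(string obj) => GetHashCode(obj.AsSpan());

    // Span-based overload, similar in spirit to the ReadOnlySpan<char>
    // overloads discussed above. FNV-1a over UTF-16 code units; deterministic
    // across runs, so only suitable for trusted inputs.
    public int GetHashCode(ReadOnlySpan<char> value)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in value)
            {
                hash = (hash ^ c) * 16777619;
            }
            return (int)hash;
        }
    }
}

public static class ComparerUsage
{
    public static Dictionary<string, int> BuildTrustedLookup()
    {
        // Keys here are assumed to come from a trusted source.
        var lookup = new Dictionary<string, int>(NonRandomizedOrdinalComparer.Instance);
        lookup["get"] = 1;
        lookup["set"] = 2;
        return lookup;
    }
}
```

A collection given a custom comparer like this has no randomized implementation to fall back on when collisions pile up, which is one reason hand-rolled comparers are riskier than a built-in option.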
Risks
The main risk is that developers default to these APIs in situations where the randomized implementations are warranted. However, some developers already write their own hash implementations to avoid the overhead of randomization, and those implementations may be lower quality or less efficient than what's already in the box.