Proposal: Add System.HashCode to make it easier to generate good hash codes. #19621

@jamesqo

Update 6/16/17: Looking for volunteers

The API shape has been finalized. However, we're still deciding on the best hash algorithm out of a list of candidates to use for the implementation, and we need someone to help us measure the throughput/distribution of each algorithm. If you'd like to take that role up, please leave a comment below and @karelz will assign this issue to you.

Update 6/13/17: Proposal accepted!

Here's the API that was approved by @terrajobst at https://github.com/dotnet/corefx/issues/14354#issuecomment-308190321:

// Will live in the core assembly
// .NET Framework : mscorlib
// .NET Core      : System.Runtime / System.Private.CoreLib
namespace System
{
    public struct HashCode
    {
        public static int Combine<T1>(T1 value1);
        public static int Combine<T1, T2>(T1 value1, T2 value2);
        public static int Combine<T1, T2, T3>(T1 value1, T2 value2, T3 value3);
        public static int Combine<T1, T2, T3, T4>(T1 value1, T2 value2, T3 value3, T4 value4);
        public static int Combine<T1, T2, T3, T4, T5>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5);
        public static int Combine<T1, T2, T3, T4, T5, T6>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5, T6 value6);
        public static int Combine<T1, T2, T3, T4, T5, T6, T7>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5, T6 value6, T7 value7);
        public static int Combine<T1, T2, T3, T4, T5, T6, T7, T8>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5, T6 value6, T7 value7, T8 value8);

        public void Add<T>(T value);
        public void Add<T>(T value, IEqualityComparer<T> comparer);

        [Obsolete("Use ToHashCode to retrieve the computed hash code.", error: true)]
        [EditorBrowsable(EditorBrowsableState.Never)]
        public override int GetHashCode();

        public int ToHashCode();
    }
}
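
As a minimal sketch of how the approved surface might be used: the `Person` type, its properties, and the `GetCaseInsensitiveHashCode` method below are hypothetical and chosen only for illustration. A type can either call the static `Combine` helper or accumulate values with `Add` and finish with `ToHashCode`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type, for illustration only.
public sealed class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Age { get; set; }

    // One-shot form: Combine accepts between one and eight values.
    public override int GetHashCode() => HashCode.Combine(FirstName, LastName, Age);

    // Incremental form: Add values one at a time, optionally with a comparer,
    // then call ToHashCode (not GetHashCode) to read the result.
    public int GetCaseInsensitiveHashCode()
    {
        var hash = new HashCode();
        hash.Add(FirstName, StringComparer.OrdinalIgnoreCase);
        hash.Add(LastName, StringComparer.OrdinalIgnoreCase);
        hash.Add(Age);
        return hash.ToHashCode();
    }
}
```

The incremental form is the one to reach for when a comparer is needed or when there are more than eight values to combine.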

The original text of this proposal follows.

Rationale

Generating a good hash code should not require ugly magic constants and bit twiddling in our code. It should be less tempting to write a bad-but-concise GetHashCode implementation such as

class Person
{
    public override int GetHashCode() => FirstName.GetHashCode() + LastName.GetHashCode();
}
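
To illustrate one reason the additive version is weak, here is a small sketch. The `AdditiveHashDemo` class and its `AdditiveHash` helper are hypothetical names that simply reproduce the expression above. Because integer addition is commutative, swapping the two names always yields the same hash code.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class AdditiveHashDemo
{
    // Same combination as the concise GetHashCode above.
    public static int AdditiveHash(string first, string last)
        => first.GetHashCode() + last.GetHashCode();

    // Always true: a + b == b + a, so ("John", "Smith") collides with ("Smith", "John").
    public static bool SwappedNamesCollide()
        => AdditiveHash("John", "Smith") == AdditiveHash("Smith", "John");
}
```

Any data set in which the same strings appear in different positions will therefore produce systematic collisions, regardless of how good the underlying string hash is.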

Proposal

We should add a HashCode type to encapsulate hash code creation so that developers do not have to deal with the messy details. Here is my proposal, which is based on https://github.com/dotnet/corefx/issues/14354#issuecomment-305019329, with a few minor revisions.

// Will live in the core assembly
// .NET Framework : mscorlib
// .NET Core      : System.Runtime / System.Private.CoreLib
namespace System
{
    public struct HashCode
    {
        public static int Combine<T1>(T1 value1);
        public static int Combine<T1, T2>(T1 value1, T2 value2);
        public static int Combine<T1, T2, T3>(T1 value1, T2 value2, T3 value3);
        public static int Combine<T1, T2, T3, T4>(T1 value1, T2 value2, T3 value3, T4 value4);
        public static int Combine<T1, T2, T3, T4, T5>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5);
        public static int Combine<T1, T2, T3, T4, T5, T6>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5, T6 value6);
        public static int Combine<T1, T2, T3, T4, T5, T6, T7>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5, T6 value6, T7 value7);
        public static int Combine<T1, T2, T3, T4, T5, T6, T7, T8>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5, T6 value6, T7 value7, T8 value8);

        public void Add<T>(T value);
        public void Add<T>(T value, IEqualityComparer<T> comparer);
        public void AddRange<T>(T[] values);
        public void AddRange<T>(T[] values, int index, int count);
        public void AddRange<T>(T[] values, int index, int count, IEqualityComparer<T> comparer);

        [Obsolete("Use ToHashCode to retrieve the computed hash code.", error: true)]
        public override int GetHashCode();

        public int ToHashCode();
    }
}
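
The `AddRange` overloads appear only in this original proposal and not in the approved API shown earlier. As a rough sketch of what they might do, assuming (this is not stated in the proposal) that `AddRange` is equivalent to calling `Add` once per element in order, the behavior could be approximated on top of `Add` with a hypothetical extension method:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the proposal or of System.HashCode.
public static class HashCodeRangeSketch
{
    // Assumption: adding a range means adding each element in order.
    // "ref this" lets the extension mutate the caller's HashCode struct (C# 7.2+).
    public static void AddRange<T>(ref this HashCode hash, T[] values, int index, int count,
                                   IEqualityComparer<T> comparer = null)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (index < 0 || count < 0 || index > values.Length - count)
            throw new ArgumentOutOfRangeException();

        for (int i = index; i < index + count; i++)
        {
            if (comparer == null) hash.Add(values[i]);
            else hash.Add(values[i], comparer);
        }
    }
}
```

With this in scope, `hash.AddRange(items, 0, items.Length)` on a local `HashCode` variable produces the same result as a loop of `hash.Add(item)` calls.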

Remarks

See @terrajobst's comment at https://github.com/dotnet/corefx/issues/14354#issuecomment-305019329 for the goals of this API; all of his remarks are valid. I would like to point out these ones in particular, however:

  • The API does not need to produce a strong cryptographic hash
  • The API will provide "a" hash code, but not guarantee a particular hash code algorithm. This allows us to use a different algorithm later or use different algorithms on different architectures.
  • The API will guarantee that within a given process the same values will yield the same hash code. Different instances of the same app will likely produce different hash codes due to randomization. This allows us to ensure that consumers cannot persist hash values and accidentally rely on them being stable across runs (or worse, versions of the platform).
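
A short sketch of what these guarantees mean in practice; the `HashStabilityDemo` class is a hypothetical name used only for illustration. Equal inputs can be compared by hash code inside one process, but the numeric value itself should be treated as opaque.

```csharp
using System;

// Hypothetical demo type, for illustration only.
public static class HashStabilityDemo
{
    // Within a single process, the same values yield the same hash code.
    public static bool SameWithinProcess()
        => HashCode.Combine("abc", 42) == HashCode.Combine("abc", 42);

    // The returned value may differ between runs, processes, or platform versions,
    // so it should not be written to disk or sent over the wire.
    public static int CurrentProcessOnlyHash()
        => HashCode.Combine("abc", 42);
}
```

Code that needs a value that is stable across runs, such as a persisted key or a checksum, needs a separately specified algorithm rather than this API.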

Metadata

Labels

  • api-approved: API was approved in API review, it can be implemented
  • area-System.Numerics
  • help wanted: [up-for-grabs] Good issue for external contributors
