Skip to content

✨ Implement SegmentedHashSet<T>#54574

Merged
sharwell merged 12 commits intodotnet:mainfrom
sharwell:segmented-hashset
May 12, 2022
Merged

✨ Implement SegmentedHashSet<T>#54574
sharwell merged 12 commits intodotnet:mainfrom
sharwell:segmented-hashset

Conversation

@sharwell
Copy link
Copy Markdown
Contributor

@sharwell sharwell commented Jul 2, 2021

This is a drop-in replacement for HashSet<T>, with equivalent semantics and algorithmic complexity but does not require the use of the Large Object Heap.

@sharwell sharwell requested a review from a team as a code owner July 2, 2021 21:56
@ghost ghost added the Area-Compilers label Jul 2, 2021
@dotnet dotnet deleted a comment from azure-pipelines bot Jul 30, 2021
@sharwell
Copy link
Copy Markdown
Contributor Author

@dotnet/roslyn-compiler for reviews

@sharwell
Copy link
Copy Markdown
Contributor Author

@dotnet/roslyn-compiler for reviews

1 similar comment
@sharwell
Copy link
Copy Markdown
Contributor Author

@dotnet/roslyn-compiler for reviews

@RikkiGibson RikkiGibson self-assigned this Aug 25, 2021
@sharwell
Copy link
Copy Markdown
Contributor Author

@dotnet/roslyn-compiler for reviews


public int GetHashCode(SegmentedHashSet<T>? obj)
{
var hashCode = 0; // default to 0 for null/empty set
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would System.HashCode be a good fit here here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

➡️ This implementation is taken from the link at the top of the file, and only changed where necessary to build in Roslyn. The original code for this method does not use System.HashCode, so this question is probably better for dotnet/runtime.

{
if (collection == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is not nameof(collection) (applies to all ExceptionArgument usages)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

➡️ This implementation is taken from the link at the top of the file, and only changed where necessary to build in Roslyn. The original code for this method does not use nameof(collection), so this question is probably better for dotnet/runtime.

}

// Equals method for the comparer itself.
public override bool Equals(object? obj) => obj is SegmentedHashSetEqualityComparer<T>;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public override bool Equals(object? obj) => obj is SegmentedHashSetEqualityComparer<T>;
public override bool Equals([NotNullWhen(true)] object? obj) => obj is SegmentedHashSetEqualityComparer<T>;

This was added in dotnet/runtime:

https://github.com/dotnet/runtime/blob/64245ab3966abde394767fbd6531e7edbea2802e/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSetEqualityComparer.cs#L75

Note: requires using System.Diagnostics.CodeAnalysis;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

➡️ This implementation is taken from the link at the top of this file. The 5.0.7 tag does not contain this attribute; it will be added during the update step if/when a newer version is adopted.

}
else if (ReferenceEquals(_comparer, StringComparer.OrdinalIgnoreCase))
{
_comparer = (IEqualityComparer<T>)NonRandomizedStringEqualityComparer.WrappedAroundStringComparerOrdinalIgnoreCase;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were these special cases eliminated?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They require porting a huge amount of platform-specific culture code. The same change was adopted for SegmentedDictionary<TKey, TValue>.

@RikkiGibson
Copy link
Copy Markdown
Member

It looks like the important commit here is c0c39b1.

@sharwell
Copy link
Copy Markdown
Contributor Author

@dotnet/roslyn-compiler for second review

@dotnet dotnet deleted a comment from azure-pipelines bot May 11, 2022
@dotnet dotnet deleted a comment from azure-pipelines bot May 11, 2022
return false;
}

var defaultComparer = EqualityComparer<T>.Default;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could move this to line 33 to avoid .Default on unnecessary code paths

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would typically avoid this primarily so we can keep the file as close to the reference source as possible (i.e. if the code changes in dotnet/runtime, we would adopt that change in the next update here per the link in the comment at the top of the file).

}
}

return true;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if x has every element in y and one additional element? That will pass the test above and fall through to return true even though the x has more elements.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the expected behavior here. This is not documented on docs.microsoft.com for HashSet<T>.CreateSetComparer(), and is the same behavior seen here:
https://github.com/dotnet/runtime/blob/6ca8c9bc0c4a5fc1082c690b6768ab3be8761b11/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSetEqualityComparer.cs#L25-L51

Copy link
Copy Markdown
Member

@jaredpar jaredpar May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just seems like a bug in the original source code.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dotnet/runtime#69218

Can we reference that bug here? If they decide to fix that in their source we can follow up with a fix here.

@sharwell sharwell enabled auto-merge May 12, 2022 14:16
@sharwell sharwell merged commit ef54147 into dotnet:main May 12, 2022
@ghost ghost added this to the Next milestone May 12, 2022
@sharwell sharwell deleted the segmented-hashset branch May 12, 2022 18:43
@Cosifne Cosifne modified the milestones: Next, 17.3 P2 May 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants