Improve get/set performance in HashSet<T>#40106
Conversation
Similar optimization in Dictionary<,> has been ported to HashSet<>.
`_comparer` has been made nullable, and default by `null`. This
makes us possible to check if it's null for faster branches, instead
of comparing it to `EqualityComparer<T>.Default`. Since `_comparer`
is `null` by default, the constructors and the `OnDeserialization`
method have been modified accordingly to save `null` instead of the
`EqualityComparer<T>.Default` if it's been passed/saved. Also, the
places that use `_comparer` directly much do a null check and switch
to `EqualityComparer<T>.Default` instead.
The major optimization is under `Contains` and `AddIfNotPresent`
methods. Basically the code was:
``
// Code uses _comparer
``
Now it becomes:
```
var comparer = _comparer;
if (comparer == null)
{
if (default(T) != null)
{
// Code uses EqualityComparer<T>.Default for de-virtualizing.
}
else
{
var defaultComparer = EqualityComparer<T>.Default;
// Code uses cached defaultComparer
}
}
else
{
// Code uses comparer
}
```
I've intentionally kept the origin implementation and comments as
much as possible, and the optimized code is not exactly the same as
in `Dictionary<T>`. Further micro-optimizations might be possible
but I think it's cleaner to keep it that way in the first step.
The `Remove` method hasn't been optimized since it's complicated,
and not been done in `Dictionary` either. Some other methods like
`AddOrGetLocation` or `InternalIndexOf` are not optimized since
they're only used in less-common scenarios, and not in the scope
of the current issue.
The benchmark shows there're obvious improvements in get and set
operations for all cases, especialy if the item is a value type.
I was actually expecting to see some pull back for "slower" case,
like if we use a custom comparer, but it turns out it's getting
faster too, since we have cached the comparer instead of using
`_comparer` directly in the loop.
The benchmark also shows the optimized `HashSet` has close or
better performance comparing to `Dictionary` in almost all cases,
except for `HashSet<string>` with default comparer. The default
comparer in `Dictionary<string>` is a "non-random" implementation
that performs better. Unfortunately the comparer is internal to
the CoreLib.
The optimized `HashSet<string>` is still faster since it's using
cached comparer of type `EqualityComparer<string>`, which uses
virtual method dispatching instead of interface dispatching. It's
just not as faster as `Dictionary<string, T>` yet.
Fix #36448
|
The benchmark code can be found in https://github.com/JeffreyZhao/HashSetPerf, in which the Result: BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17763.615 (1809/October2018Update/Redstone5)
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview7-012821
[Host] : .NET Core 3.0.0-preview7-27912-14 (CoreCLR 4.700.19.32702, CoreFX 4.700.19.36209), 64bit RyuJIT
DefaultJob : .NET Core 3.0.0-preview7-27912-14 (CoreCLR 4.700.19.32702, CoreFX 4.700.19.36209), 64bit RyuJIT
|
|
The check failed at "Send to Helix". Is it a glitch on check server? Can we re-run the checks? |
|
You should be able to see the test issue by clicking through the log. It's apparently unrelated Http failure. I'll open issue. |
|
Thanks for hte contribution. Do we have sufficient micro benchmark coverage in https://github.com/dotnet/performance ? If not would you consider improving coverage there? This is how we typically measure and prevent regression. cc @adamsitnik |
|
It would not be part of this change, but another pending backport to HashSet is dotnet/coreclr#23591. |
Thanks, I've noticed that now. BTW why it's not part of previous step (compile and test) but "Send to Helix", and what's that step?
It seems we don't have any coverage for
Yes, I've noticed that too. That's what I mean by this:
There're actually more optimizations can be put in |
| private int _lastIndex; | ||
| private int _freeList; | ||
| private IEqualityComparer<T> _comparer = default!; | ||
| private IEqualityComparer<T>? _comparer = default!; |
There was a problem hiding this comment.
Do you still need the !? I don't think so.
There was a problem hiding this comment.
You're right. Actually the whole default! isn't required. Do you think I should fix it now, or commit with other changes (if any) when the review is complete? Thanks.
There was a problem hiding this comment.
Let's do it in a separate PR 😄 -- feel free to start preparing it.
@ViktorHofer would this break serialization scenarios? |
No it doesn't break serialization as during serialization the default equality comparer is used if the comparer is null. The description is a bit misleading but the change is fine. |
After deserialization the `_comparer` will be set to a different instance of default comparer, so we need to use `Equals` to compare `_comparer` with `EqualityComparer<T>.Default` instead of reference comparison by `==`.
Previously we tried to reset `_comparer` in `OnDeserialization` if it's `EqualityComparer<T>.Default`. But after discussion in #40150 we decide to follow the same approach as in `Dictionary<,>` to make it simple.
| Slot[] slots = _slots; | ||
| // see note at "HashSet" level describing why "- 1" appears in for loop | ||
| for (int i = _buckets[hashCode % _buckets.Length] - 1; i >= 0; i = slots[i].next) | ||
|
|
| if (slots[i].hashCode == hashCode && _comparer.Equals(slots[i].value, item)) | ||
| int hashCode = item == null ? 0 : InternalGetHashCode(item.GetHashCode()); | ||
|
|
||
| if (default(T)! != null) // TODO-NULLABLE: default(T) == null warning (https://github.com/dotnet/roslyn/issues/34757) |
There was a problem hiding this comment.
Let's remove this trailing comment.
There was a problem hiding this comment.
@safern, I don't have a strong opinion, but why do you want to remove the comment? We still have this in a bunch of places in coreclr... do you want to remove it from all of those (separate from this PR)?
There was a problem hiding this comment.
This was a new line added in this PR, I was suggesting to not include a comment based on past discussion here: #40651 (comment) -- but I don't have a strong opinion on this either.
If we'd like to remove all the comments and then address the removal of ! later I can happily submit a PR.
| if (_buckets != null) | ||
| { | ||
| int hashCode = InternalGetHashCode(item); | ||
| IEqualityComparer<T> comparer = _comparer ?? EqualityComparer<T>.Default; |
There was a problem hiding this comment.
Do these null checks introduced, impact perf?
There was a problem hiding this comment.
I just realize that the code here may not be optimal.
In Dictionary<T> it uses code like below to get hash code and compare two keys to leverage EqualityComparer<TKey>.Default intrinsic some other methods (e.g. Remove):
// get hash code
comparer?.GetHashCode(key) ?? key.GetHashCode()
// compare keys
comparer?.Equals(entry.key, key) ?? EqualityComparer<TKey>.Default.Equals(entry.key, key)It seems like having saved one more indirection by checking if the comparer is null, but it also requires null check for each call. I'll do a quick benchmark to confirm this micro-optimization and submit another commit.
There was a problem hiding this comment.
The benchmark calls GetHashCode once and Equals twice to simulate a lookup in methods like Remove.
[GenericTypeArguments(typeof(int))]
[GenericTypeArguments(typeof(object))]
public class NullComparerPerf<T>
{
private IEqualityComparer<T> _comparer;
public T Value1;
public T Value2;
public T Key = Activator.CreateInstance<T>();
[Benchmark]
public int AssignDefault()
{
var comparer = _comparer ?? EqualityComparer<T>.Default;
var hashCode = comparer.GetHashCode(Key);
var b1 = comparer.Equals(Value1, Key);
var b2 = comparer.Equals(Value2, Key);
return hashCode;
}
[Benchmark]
public int NullCheck()
{
var comparer = _comparer;
var hashCode = comparer?.GetHashCode(Key) ?? Key.GetHashCode();
var b1 = comparer?.Equals(Value1, Key) ?? EqualityComparer<T>.Default.Equals(Value1, Key);
var b2 = comparer?.Equals(Value2, Key) ?? EqualityComparer<T>.Default.Equals(Value2, Key);
return hashCode;
}
}Benchmark shows for int it takes 75% less time, but for object it takes 25% more:
BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17763.805 (1809/October2018Update/Redstone5)
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100
[Host] : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT
DefaultJob : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT| Type | Method | Mean | Error | StdDev |
|---|---|---|---|---|
| NullComparerPerf<Int32> | AssignDefault | 6.536 ns | 0.0768 ns | 0.0680 ns |
| NullComparerPerf<Int32> | NullCheck | 1.450 ns | 0.0439 ns | 0.0411 ns |
| NullComparerPerf<Object> | AssignDefault | 27.053 ns | 0.2639 ns | 0.2469 ns |
| NullComparerPerf<Object> | NullCheck | 33.090 ns | 0.5051 ns | 0.4477 ns |
In theory we can do the same thing for Remove as in lookup but that's what Dictionary<,> currently doing so I just keep it the same.
safern
left a comment
There was a problem hiding this comment.
Overall, looks good. Are you including benchmark tests into https://github.com/dotnet/performance/tree/master/src/benchmarks?
I'd like to get @stephentoub input on this as well.
|
@JeffreyZhao any update on this? |
…insic. Same to the Equals calls.
@safern I'll move my benchmark to performance project when I finish learning how to do it properly. |
|
@JeffreyZhao thank you for continuing to drive this along, it is likely to be a significant customer benefit. |
|
@adamsitnik for that question. |
Make methods like "Remove" faster.
|
@danmosemsft @safern Find some more information regarding
Anyway, I've added some more benchmarks in my fork of dotnet/performance: https://github.com/JeffreyZhao/performance/tree/Collection-Remove-Benchmarks I've added two new set of benchmarks ( I've also found the method call of I've made three builds of CoreFx for benchmark:
Expand to see benchmark results.BenchmarkDotNet=v0.11.5.1159-nightly, OS=Windows 10.0.17763.805 (1809/October2018Update/Redstone5)
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100
[Host] : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
Job-QFCKJK : .NET Core ? (CoreCLR 5.0.19.38002, CoreFX 5.0.19.51701), X64 RyuJIT
Job-TMFJJO : .NET Core ? (CoreCLR 5.0.19.38002, CoreFX 5.0.19.51001), X64 RyuJIT
Job-FUBAAA : .NET Core ? (CoreCLR 5.0.19.38002, CoreFX 5.0.19.51701), X64 RyuJIT
|
|
Cc @adamsitnik |
|
It seems like |
Yes, it's explained by the benchmark I posted above. Basically the original code is like: private readonly IEqualityComparer<T> _comparer = EqualityComparer<T>.Default; // not null
int hashCode = _comparer.GetHashCode(key); // GetHashCode
bool equals = _comparer.Equals(key, item); // EqualsAnd after the "improvement" (with regression for private readonly IEqualityComparer<T>? _comparer; // nullable
var comparer = _comparer;
int hashCode = comparer?.GetHashCode(key) ?? key.GetHashCode(); // GetHashCode
bool equals = comparer?.Equals(key, item) ?? EqualityComparer<T>.Default.Equals(key, item); // EqualsFor value types like This code pattern is actually used by current There's no "quick" way to avoid the trade-off by minimum code changes, and initially this pull request is for get/set operations so the More benchmarksSince I think the dotnet/performance project doesn't have enough benchmarks for https://github.com/JeffreyZhao/performance/tree/Improve-Collection-Benchmarks I've added There're two reasons for adding those two types:
Benchmark result for
|
| Type | Method | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CopyAndRemove<CustomObject> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 14.798 us | 0.1527 us | 0.1428 us | 14.774 us | 14.552 us | 15.149 us | 0.98 | 3.4624 | - | - | 10536 B |
| CopyAndRemove<CustomObject> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 15.093 us | 0.1118 us | 0.1045 us | 15.099 us | 14.883 us | 15.267 us | 1.00 | 3.4524 | - | - | 10536 B |
| CopyAndRemove<CustomValue> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 6.422 us | 0.0536 us | 0.0501 us | 6.426 us | 6.331 us | 6.495 us | 0.67 | 4.5900 | - | - | 8456 B |
| CopyAndRemove<CustomValue> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 9.660 us | 0.0951 us | 0.0843 us | 9.676 us | 9.537 us | 9.813 us | 1.00 | 4.5685 | - | - | 8456 B |
| CopyAndRemove<Int32> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 6.655 us | 0.0662 us | 0.0619 us | 6.652 us | 6.561 us | 6.770 us | 0.70 | 4.5833 | - | - | 8456 B |
| CopyAndRemove<Int32> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 9.470 us | 0.1155 us | 0.1080 us | 9.474 us | 9.321 us | 9.702 us | 1.00 | 4.5945 | - | - | 8456 B |
| CopyAndRemove<String> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 28.187 us | 0.2034 us | 0.1903 us | 28.169 us | 27.834 us | 28.509 us | 1.04 | 3.4091 | - | - | 10536 B |
| CopyAndRemove<String> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 27.064 us | 0.2087 us | 0.1952 us | 27.031 us | 26.824 us | 27.377 us | 1.00 | 3.4542 | - | - | 10536 B |
| CreateAddAndRemove<CustomObject> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 32.130 us | 0.3202 us | 0.2839 us | 32.176 us | 31.388 us | 32.539 us | 0.95 | 10.9148 | - | - | 34480 B |
| CreateAddAndRemove<CustomObject> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 33.922 us | 0.3267 us | 0.3056 us | 34.029 us | 33.462 us | 34.349 us | 1.00 | 10.9816 | - | - | 34480 B |
| CreateAddAndRemove<CustomValue> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 16.088 us | 0.1815 us | 0.1515 us | 16.134 us | 15.771 us | 16.313 us | 0.70 | 8.7617 | - | - | 27712 B |
| CreateAddAndRemove<CustomValue> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 23.018 us | 0.2179 us | 0.2038 us | 23.010 us | 22.690 us | 23.398 us | 1.00 | 8.7719 | - | - | 27712 B |
| CreateAddAndRemove<Int32> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 16.182 us | 0.1129 us | 0.1056 us | 16.178 us | 15.981 us | 16.388 us | 0.71 | 8.7810 | - | - | 27712 B |
| CreateAddAndRemove<Int32> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 22.651 us | 0.2656 us | 0.2485 us | 22.562 us | 22.274 us | 23.200 us | 1.00 | 8.7356 | - | - | 27712 B |
| CreateAddAndRemove<String> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 56.573 us | 0.5230 us | 0.4892 us | 56.661 us | 55.265 us | 57.185 us | 1.01 | 10.8303 | - | - | 34480 B |
| CreateAddAndRemove<String> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 55.952 us | 0.6334 us | 0.5925 us | 56.232 us | 54.980 us | 56.727 us | 1.00 | 10.8599 | - | - | 34480 B |
| RemoveFalse<CustomObject> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 8.866 us | 0.0474 us | 0.0444 us | 8.867 us | 8.791 us | 8.938 us | 0.87 | - | - | - | - |
| RemoveFalse<CustomObject> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 10.191 us | 0.1157 us | 0.1082 us | 10.148 us | 10.058 us | 10.385 us | 1.00 | - | - | - | - |
| RemoveFalse<CustomValue> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 3.196 us | 0.0306 us | 0.0271 us | 3.201 us | 3.151 us | 3.233 us | 0.33 | - | - | - | - |
| RemoveFalse<CustomValue> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 9.779 us | 0.1318 us | 0.1233 us | 9.767 us | 9.633 us | 10.031 us | 1.00 | - | - | - | - |
| RemoveFalse<Int32> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 3.235 us | 0.0436 us | 0.0408 us | 3.243 us | 3.165 us | 3.294 us | 0.47 | - | - | - | - |
| RemoveFalse<Int32> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 6.937 us | 0.0672 us | 0.0628 us | 6.955 us | 6.857 us | 7.036 us | 1.00 | - | - | - | - |
| RemoveFalse<String> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 20.515 us | 0.2362 us | 0.2209 us | 20.584 us | 20.143 us | 20.829 us | 1.00 | 0.00 | - | - | - |
| RemoveFalse<String> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 20.817 us | 0.1720 us | 0.1609 us | 20.833 us | 20.534 us | 21.116 us | 1.01 | 0.01 | - | - | - |
The benchmark result shows all cases become faster with the optimization, except CopyAndRemove<String>, which is a little bit slower. I haven't had an answer for that, especially CopyAndRemove<CustomObject> is always faster than original. But even for CopyAndRemove<String> it's actually quite close, and I've seen it with smaller gap, or even a little bit faster in some runs. I'll keep looking into this.
Benchmark result for TryGetValue method
There's no benchmarks for TryGetValue for HashSet, although there're some for other collections like Dictionary<,>. I've added benchmarks into TryGetValueTrue and TryGetValueFalse for HashSet.
dotnet run -c Release -f netcoreapp3.0 --filter *TryGetValue*.HashSet --join --coreRun \
C:\Projects\dotnet\netcoreapp-original\coreRun.exe \
C:\Projects\dotnet\netcoreapp-hashset\coreRun.exe
Expand to see benchmark result for TryGetValue method.
BenchmarkDotNet=v0.11.5.1159-nightly, OS=Windows 10.0.17763.805 (1809/October2018Update/Redstone5)
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100
[Host] : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
Job-YSRJTI : .NET Core ? (CoreCLR 5.0.19.38002, CoreFX 5.0.19.52301), X64 RyuJIT
Job-JTOUAA : .NET Core ? (CoreCLR 5.0.19.38002, CoreFX 5.0.19.51701), X64 RyuJIT| Type | Method | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TryGetValueFalse<CustomObject, CustomObject> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 11.983 us | 0.0940 us | 0.0833 us | 11.971 us | 11.880 us | 12.115 us | 0.90 | - | - | - | - |
| TryGetValueFalse<CustomObject, CustomObject> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 13.324 us | 0.1154 us | 0.1079 us | 13.323 us | 13.179 us | 13.509 us | 1.00 | - | - | - | - |
| TryGetValueFalse<CustomValue, CustomValue> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 6.120 us | 0.1153 us | 0.1078 us | 6.113 us | 5.945 us | 6.278 us | 0.63 | - | - | - | - |
| TryGetValueFalse<CustomValue, CustomValue> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 9.688 us | 0.1215 us | 0.1136 us | 9.683 us | 9.525 us | 9.940 us | 1.00 | - | - | - | - |
| TryGetValueFalse<Int32, Int32> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 6.190 us | 0.0730 us | 0.0648 us | 6.199 us | 6.025 us | 6.276 us | 0.78 | - | - | - | - |
| TryGetValueFalse<Int32, Int32> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 7.913 us | 0.0741 us | 0.0656 us | 7.898 us | 7.835 us | 8.048 us | 1.00 | - | - | - | - |
| TryGetValueFalse<String, String> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 22.196 us | 0.1752 us | 0.1638 us | 22.118 us | 22.011 us | 22.505 us | 0.86 | - | - | - | - |
| TryGetValueFalse<String, String> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 25.755 us | 0.1675 us | 0.1485 us | 25.719 us | 25.496 us | 26.012 us | 1.00 | - | - | - | - |
| TryGetValueTrue<CustomObject, CustomObject> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 15.858 us | 0.1389 us | 0.1299 us | 15.830 us | 15.727 us | 16.191 us | 0.98 | - | - | - | - |
| TryGetValueTrue<CustomObject, CustomObject> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 16.138 us | 0.1624 us | 0.1520 us | 16.160 us | 15.836 us | 16.384 us | 1.00 | - | - | - | - |
| TryGetValueTrue<CustomValue, CustomValue> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 7.272 us | 0.0645 us | 0.0603 us | 7.268 us | 7.198 us | 7.415 us | 0.73 | - | - | - | - |
| TryGetValueTrue<CustomValue, CustomValue> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 9.973 us | 0.1102 us | 0.1031 us | 9.955 us | 9.817 us | 10.141 us | 1.00 | - | - | - | - |
| TryGetValueTrue<Int32, Int32> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 7.061 us | 0.0953 us | 0.0891 us | 7.029 us | 6.936 us | 7.236 us | 0.70 | - | - | - | - |
| TryGetValueTrue<Int32, Int32> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 10.078 us | 0.0540 us | 0.0505 us | 10.074 us | 9.998 us | 10.161 us | 1.00 | - | - | - | - |
| TryGetValueTrue<String, String> | HashSet | \netcoreapp-hashset\coreRun.exe | 512 | 24.457 us | 0.2453 us | 0.2295 us | 24.478 us | 24.053 us | 24.884 us | 0.86 | - | - | - | - |
| TryGetValueTrue<String, String> | HashSet | \netcoreapp-original\coreRun.exe | 512 | 28.455 us | 0.2050 us | 0.1818 us | 28.420 us | 28.172 us | 28.839 us | 1.00 | - | - | - | - |
The benchmark result shows that it's always faster with the optimization, including the cases for string.
Before end
Currently only the SymmetricExceptWith has the non-optimized implementation. Do I need to do the same thing? It doesn't have benchmark and I'm quite sure it's not used a lot.
It a little bit strange to me that why there're several duplications in the code. For example, Contains has it's own implementation, which is so close to InternalIndexOf that it can be implemented as return InternalIndexOf(item) >= 0;. Probably it's a about saving a call instruction, isn't it?
Yes, Dictionary is so perf critical that the perf justifies that code duplication. cc @adamsitnik for this PR. Thanks for continuing to move it forward. |
|
(Previous comment removed as I realized I was running with a different data set.) Today's runs of Expand to see benchmark results.
We can see that the optimized version is always faster. |
|
@jkotas any feedback? |
| return hashCode & Lower31BitMask; | ||
| } | ||
|
|
||
| [MethodImpl(MethodImplOptions.AggressiveInlining)] |
There was a problem hiding this comment.
Does this need really need to be marked with aggressive inlining?
There was a problem hiding this comment.
May not necessary for this one, but just keep it the same with other Internal* methods.
| } | ||
|
|
||
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| private bool InternalEquals(T x, T y, IEqualityComparer<T>? comparer) |
There was a problem hiding this comment.
Wrappers like this add overhead for larger structs; even with AggressiveInlining.
There was a problem hiding this comment.
The fine-tuned Dictionary implementation avoids them for a good reason, so you may want to avoid them too - if you are shooting for HashSet to be fine-tuned in the same way.
There was a problem hiding this comment.
I see. Actually it's only used in SymmetricExceptWith now. All the other usages (in Remove, TryGetValue etc.) have been replace by the optimization. I'll manually inline the method.
|
Thanks @JeffreyZhao for your work and patience here. Let's try to get this merged before 11/13 so we don't have to move it to the new repo: dotnet/announcements#133 |
safern
left a comment
There was a problem hiding this comment.
LGTM, thanks @JeffreyZhao -- @jkotas do you have any other feedback.
| /// <returns>hash code</returns> | ||
| private int InternalGetHashCode(T item) | ||
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| private static int InternalGetHashCode(T item, IEqualityComparer<T>? comparer) |
There was a problem hiding this comment.
Wrappers like this add overhead for larger structs; even with AggressiveInlining.
(similar comment as the other one you have fixed already)
|
Dictionary keeps the hashCode as unsigned to get another bit of performance. Do the same for HashSet? |
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| private int InternalGetHashCode(int hashCode) | ||
| { | ||
| return hashCode & Lower31BitMask; |
There was a problem hiding this comment.
E.g. this masking is not necessary once you switch to uint hashCode.
There was a problem hiding this comment.
@jkotas we mentioned that above, I believe @JeffreyZhao was thinking of looking at that next. #40106 (comment)
|
Merged. Thank you @JeffreyZhao ! Normally PR's do not take so long, but we are specially careful with such core types that are performance sensitive. If you are still interested, it would be good to bring over dotnet/coreclr#23591 |
|
Thanks for everyone's help for reviewing! @danmosemsft Yes, I will do the backport after the repository consolidation. There's actually another improvement on my list is to use |
|
Sounds good. Do you think we have all the right coverage in the perf repo now? |
I think we should add more cases for removal operations. I've added some two benchmarks I've made some changes into forked repository in Improve-Collection-Benchmarks branch. Do you think it's a good time to create new PR as the repo consolidation coming up? I don't find perf repo being mentioned in the announcement, so I assume it's not affected, is it? |
|
The performance repo is not part of the consolidation and will not be affected. Yes a PR there would be great. |
|
@danmosemsft A PR (dotnet/performance#1010) has been created. Thanks |
Similar optimization in Dictionary<,> has been ported to HashSet<>.
_comparerhas been made nullable, and default bynull. Thismakes us possible to check if it's null for faster branches, instead
of comparing it to
EqualityComparer<T>.Default. Since_compareris
nullby default, the constructors and theOnDeserializationmethod have been modified accordingly to save
nullinstead of theEqualityComparer<T>.Defaultif it's been passed/saved. Also, theplaces that use
_comparerdirectly much do a null check and switchto
EqualityComparer<T>.Defaultinstead.The major optimization is under
ContainsandAddIfNotPresentmethods. Basically the code was:
// Code uses _comparerNow it becomes:
I've intentionally kept the origin implementation and comments as
much as possible, and the optimized code is not exactly the same as
in
Dictionary<T>. Further micro-optimizations might be possiblebut I think it's cleaner to keep it that way in the first step.
The
Removemethod hasn't been optimized since it's complicated,and not been done in
Dictionaryeither. Some other methods likeAddOrGetLocationorInternalIndexOfare not optimized sincethey're only used in less-common scenarios, and not in the scope
of the current issue.
The benchmark shows there're obvious improvements in get and set
operations for all cases, especialy if the item is a value type.
I was actually expecting to see some pull back for "slower" case,
like if we use a custom comparer, but it turns out it's getting
faster too, since we have cached the comparer instead of using
_comparerdirectly in the loop.The benchmark also shows the optimized
HashSethas close orbetter performance comparing to
Dictionaryin almost all cases,except for
HashSet<string>with default comparer. The defaultcomparer in
Dictionary<string>is a "non-random" implementationthat performs better. Unfortunately the comparer is internal to
the CoreLib.
The optimized
HashSet<string>is still faster since it's usingcached comparer of type
EqualityComparer<string>, which usesvirtual method dispatching instead of interface dispatching. It's
just not as faster as
Dictionary<string, T>yet.Fix #36448