Skip to content

Improve collection benchmarks#1010

Open
JeffreyZhao wants to merge 4 commits intodotnet:mainfrom
JeffreyZhao:Improve-Collection-Benchmarks
Open

Improve collection benchmarks#1010
JeffreyZhao wants to merge 4 commits intodotnet:mainfrom
JeffreyZhao:Improve-Collection-Benchmarks

Conversation

@JeffreyZhao
Copy link

I've made some changes to the collection benchmarks when I was working on HashSet improvement (dotnet/corefx#40106).

Currently there's only one benchmark CreateAddAndRemove for HashSet.Remove. The problem of CreateAddAndRemove is that it has the “Create" and "Add" part which usually take more time than the "Remove" part so it doesn't like a proper benchmark for removal operations to me.

I've created two additional benchmarks RemoveFalse and CopyAndRemove. RemoveFalse has no side-affect cause nothing's been removed. CopyAndRemove would create a copy of collection from constructor and remove all the items from it one by one. Making a self-copy from constructor is much faster than through Add method (for most collections) so the benchmark is more focusing on removal operations.

I also introduced two new types CustomValue (a struct) and CustomObject (a class). I want to make sure the optimizations applied to custom structs (by using CustomValue) and the current reference generic argument String for collection benchmarks doesn't seem like a good target to me as it has a relatively complex/slow implementation. Those new types have the simplest implementations that meet the minimum requirement of benchmark so the benchmark result would be sorely affected by collection implementation.

Those two new types have been applied to collection benchmarks (not limit to new ones mentioned above) as generic arguments.

HashSet<TKey> collection = _hashSet;
TKey[] notFound = _notFound;
for (int i = 0; i < notFound.Length; i++)
result ^= collection.TryGetValue(notFound[i], out _);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI fails with:

 error CS1061: 'HashSet<TKey>' does not contain a definition for 'TryGetValue' and no accessible extension method 'TryGetValue' accepting a first argument of type 'HashSet<TKey>'

The method got introduced int .NET 4.7.2.

When we were starting the performance repo project, we decided to target .NET 4.6.1 mostly to allow users to open the solution file with old VS. Since using netcoreapp3.0 requires VS 2019, most our users most probably already have .NET 4.7.2|4.8 SDK installed.

@DrewScoggins @billwert should we update the micro benchmarks project to target net48 instead of net461?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JeffreyZhao could you please use #if !NETFRAMEWORK for benchmarks that call HashSet.TryGetValue? this should fix the failing CI

#if !NETFRAMEWORK
        [Benchmark]
        ...
#endif

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the long term Adam I am fine with moving the targeting to .NET 4.8.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I still need to add #if !NETFRAMEWORK around TryGetValue benchmarks?

@adamsitnik
Copy link
Member

Hi @JeffreyZhao

thank you for excellent benchmarks and your improvements in HashSet<T>!

Could you please share results of a benchmark where the performance of int and string are much different than CustomValue and CustomObject? Adding these two generic type arguments doubles the number of System.Collections benchmarks (and the time it takes to run them). I want to make sure that it's worth it.

@DrewScoggins
Copy link
Member

I have a question about the CopyAndRemove benchmarks that are being added? Is that a common scenario that we see with user code? If it is than I think the test makes a lot of sense, but if all we are trying to do is measure the time for removing objects from various collection types it seems like we should try and move the copying of the data outside of the timing loop.

@JeffreyZhao
Copy link
Author

@DrewScoggins Is that a common scenario that we see with user code? It seems like we should try and move the copying of the data outside of the timing loop.

No, I don't think it's a common scenarios.

A benchmark is required to be no side-effect (so that BenchmarkDotNet could run it repeatedly to get an accurate result), so we have to make a copy before actual removal from a collection. I believe it's the same reason that currently we don't have a dedicated benchmark for removal operation, instead, we have to use CreateAddAndRemove for the same reason. I've tried to applied [IterationSetup] but unfortunately BenchmarkDotNet will only run the actual benchmark once if there's a setup for a specific benchmark method.

However, we have RemoveFalse for removing items are not included in the collection, since trying to remove something not there has no side-effect.

@JeffreyZhao
Copy link
Author

JeffreyZhao commented Nov 6, 2019

@adamsitnik

The benchmark results forint and CustomValue are close in all cases. The question is that we only know it when we have the benchmark. My concern is that we don't know if code works the same good or bad for int (a special type close to the runtime), and for another custom struct (even it has the same size as an int, e.g., CustomValue).

The case for string and CustomObject is more interesting since sometimes we treat string differently in .NET.

For example:

public Dictionary(int capacity, IEqualityComparer<TKey>? comparer)
{
    if (capacity < 0) ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.capacity);
    if (capacity > 0) Initialize(capacity);
    if (comparer != EqualityComparer<TKey>.Default)
    {
        _comparer = comparer;
    }

    if (typeof(TKey) == typeof(string) && _comparer == null)
    {
        // To start, move off default comparer for string which is randomised
        _comparer = (IEqualityComparer<TKey>)NonRandomizedStringEqualityComparer.Default;
    }
}

In Dictoinary we'll use NonRandomizedStringEqualityComparer (which runs faster than the default comparer) instead of EqualityComparer<String>.Default for string. It means we'll never run into the code branch prepared for a reference type with no comparer specified. So at least for Dictionary class, a benchmark for string cannot represent another reference type, and existing benchmarks haven't covered that case at all.

Also I can imagine more cases that hasn't been covered by existing benchmarks. For example, larger structs.

We know copying large structs (frequently) hurts performance, but sometimes we would make copies in code naturally (e.g., using temporary variable, which is considered a good code practice in many places), and we need to modify the code as accessing struct array by index repeatedly, or using ref/in etc. Using large struct in collection might not be a typical scenario for everyday use, but I do remember seeing code like using 24-byte struct as the key of a dictionary in many occasions, especially now ValueTuple has been included in .NET so people tend to use them more often.

So I have feelings that using fundamental types like int or string (which are too close to the runtime) only might not be enough.

@JeffreyZhao
Copy link
Author

JeffreyZhao commented Nov 7, 2019

And regarding the benchmark result. Here's an example of ContainsTrue for HashSet. As I mentioned earlier, String and CustomObject doesn't go to same code branch in Dictionary, so I use HashSet as the example here.

dotnet run -c Release -f netcoreapp3.0 --filter "*ContainsTrue*.HashSet" --join
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.17763.805 (1809/October2018Update/Redstone5)
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100
  [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
  Job-HEYZLQ : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250.0000 ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1
Type Method Size Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
ContainsTrue<CustomValue> HashSet 512 8.168 us 0.1614 us 0.1348 us 8.194 us 7.948 us 8.499 us - - - -
ContainsTrue<Int32> HashSet 512 7.858 us 0.1499 us 0.1540 us 7.813 us 7.630 us 8.199 us - - - -
ContainsTrue<String> HashSet 512 27.092 us 0.2991 us 0.2798 us 27.094 us 26.454 us 27.549 us - - - -
ContainsTrue<CustomObject> HashSet 512 14.858 us 0.1473 us 0.1377 us 14.848 us 14.651 us 15.163 us - - - -

So int and CustomValue are close but CustomObject is 45% faster than string. Since they're going to the same code branch, the difference can be considered as the cost of GetHashCode and Equals implementations for string. So as I stated in the initial post, when we using string we're not only benchmarking collection operations.

@adamsitnik
Copy link
Member

A benchmark is required to be no side-effect (so that BenchmarkDotNet could run it repeatedly to get an accurate result), so we have to make a copy before actual removal from a collection. I believe it's the same reason that currently we don't have a dedicated benchmark for removal operation, instead, we have to use CreateAddAndRemove for the same reason.

I agree, it's more a BenchmarkDotNet design limitation

@adamsitnik
Copy link
Member

The benchmark results for int and CustomValue are close in all cases.
We know copying large structs (frequently) hurts performance

I think that we should then increase the size of CustomValue to something like 3 long fields so we are going to have 4 scenarios covered:

  • int - a very commonly used primitive type, small copying does not hurt
  • CustomValue - big value type, copying hurts
  • string - a very commonly used reference type with dedicated execution path for Equals and GetHashCode
  • CustomObject - simple reference type that implements IEqutable

Increasing the size of CustomValue would add value to my Linux vs Windows performance investigation which has already shown some perf issues with structs on Linux.

@JeffreyZhao
Copy link
Author

@adamsitnik

May I ask why BenchmarkDotNet choose to run a benchmark only once if there's a [IterationSetup] for it?

I think that we should then increase the size of CustomValue to something like 3 long fields so we are going to have 4 scenarios covered:

I think that's a good idea. And what do you think of making the names more obvious, like, LargeValue (or even Value24Bytes) and SmallObject?

result &= copy.Remove(items[i]);

if (!result)
throw new InvalidOperationException("All items must be in the collection.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be throwing inside the timing loop. We don't want to be doing any unnecessary work inside of the timing loop, and we shouldn't be doing correctness checks in performance tests. I know that this is a pattern in other parts of the benchmarks, but we should be working to remove those as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask why we shouldn't do that? The check is per iteration instead of per inner-loop of each Remove call so it shouldn't affect the benchmark result. The code is to ensure that the data prepared is what we want, since like RemoveTrue and RemoveFalse are two distinct cases that we want to test, so IMHO the code should be able catch the issue here.

HashSet<TKey> collection = _hashSet;
TKey[] notFound = _notFound;
for (int i = 0; i < notFound.Length; i++)
result ^= collection.TryGetValue(notFound[i], out _);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the long term Adam I am fine with moving the targeting to .NET 4.8.

Base automatically changed from master to main March 18, 2021 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants