Adding benchmarks covering float.IsNaN and double.IsNaN by tannergooding · Pull Request #952 · dotnet/performance

tannergooding · 2019-10-17T23:17:42Z

This just adds basic benchmarks for float.IsNaN and double.IsNaN.

adamsitnik

LGTM, thank you!

adamsitnik · 2019-10-18T12:27:10Z

src/benchmarks/micro/corefx/System.Runtime/Perf.Double.cs

+
+            bool result = false;
+
+            for (int i = 0; i < 1000000; i++)


In general, we try to avoid loops in the benchmarks: https://github.com/dotnet/performance/blob/master/docs/microbenchmark-design-guidelines.md#Loops

But I remember from an earlier discussion that you mentioned the JIT might in the future remember the results of some Math operations, so testing the values from 0 to 1000000 is a good idea here. 👍

BenchmarkDotNet was also not doing a good job of automatically determining how many times to run IsNaN here.

Once inlined, IsNaN is two instructions (a ucomis and setp), which is not enough for any kind of reliable measurement.

I don't believe BenchmarkDotNet has any feature which would allow accurate measurement of APIs like this without the author explicitly looping.

> BenchmarkDotNet was also not doing a good job of automatically determining how many times to run IsNaN here

Are you sure? I've tried it myself and it was working fine.

> Once inlined, IsNaN is two instructions (a ucomis and setp), which is not enough for any kind of reliable measurement.
> I don't believe BenchmarkDotNet has any feature which would allow accurate measurement of APIs like this without the author explicitly looping.

BenchmarkDotNet prevents inlining of the benchmark by wrapping it in a delegate. It also performs manual loop unrolling and a few other things to make such measurements accurate.

Some time ago I created the following example to show the "evolution" of such a benchmark, using Math.Abs as the example. The last method (SimpleLoop_Overhead_Passed_NoInline_Volatile_Unroll) is more or less what BDN does.

    using System;
    using System.Diagnostics;
    using System.Runtime.CompilerServices;
    using System.Threading;
    using BenchmarkDotNet.Attributes;

    namespace BenchmarkDotNet.Samples
    {
        public class MathNano
        {
            [Benchmark]
            [Arguments(-1.0)]
            public double AbsDoubleBenchmark(double value) => Math.Abs(value);
        }

        public class StepByStep
        {
            private double _doubleHolder;

            public double AbsDoubleBenchmark(double value) => Math.Abs(value);

            public double Empty(double value) => value;

            public double SimpleLoop_Hardcoded(long operations)
            {
                double result = 0;
                const double hardcoded = -1.0;

                for (int iteration = 0; iteration < 20; iteration++)
                {
                    var actual = Stopwatch.StartNew();
                    for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                        result = AbsDoubleBenchmark(hardcoded);
                    actual.Stop();

                    double total = actual.Elapsed.TotalMilliseconds;
                    double perOperation = total / operations;
                    double nanosecondPerOperation = perOperation * 1_000_000.0;

                    Console.WriteLine($"SimpleLoop_Hardcoded {nanosecondPerOperation} ns/op");
                }

                Console.WriteLine();
                Console.WriteLine();
                Console.WriteLine();

                return result;
            }

            public double SimpleLoop_Overhead_Hardcoded(long operations)
            {
                double result = 0;
                const double hardcoded = -1.0;

                for (int iteration = 0; iteration < 20; iteration++)
                {
                    var actual = Stopwatch.StartNew();
                    for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                        result = AbsDoubleBenchmark(hardcoded);
                    actual.Stop();

                    var overhead = Stopwatch.StartNew();
                    for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                        result = Empty(hardcoded);
                    overhead.Stop();

                    double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
                    double perOperation = diff / operations;
                    double nanosecondPerOperation = perOperation * 1_000_000.0;

                    Console.WriteLine($"SimpleLoop_Overhead_Hardcoded {nanosecondPerOperation} ns/op");
                }

                Console.WriteLine();
                Console.WriteLine();
                Console.WriteLine();

                return result;
            }

            public double SimpleLoop_Overhead_Passed(long operations, double value)
            {
                double result = 0;

                for (int iteration = 0; iteration < 20; iteration++)
                {
                    var actual = Stopwatch.StartNew();
                    for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                        result = AbsDoubleBenchmark(value);
                    actual.Stop();

                    var overhead = Stopwatch.StartNew();
                    for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                        result = Empty(value);
                    overhead.Stop();

                    double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
                    double perOperation = diff / operations;
                    double nanosecondPerOperation = perOperation * 1_000_000.0;

                    Console.WriteLine($"SimpleLoop_Overhead_Passed {nanosecondPerOperation} ns/op");
                }

                Console.WriteLine();
                Console.WriteLine();
                Console.WriteLine();

                return result;
            }

            [MethodImpl(MethodImplOptions.NoInlining)]
            public double AbsDoubleBenchmarkNoInline(double value) => Math.Abs(value);

            [MethodImpl(MethodImplOptions.NoInlining)]
            public double EmptyNoInline(double value) => value;

            public double SimpleLoop_Overhead_Passed_NoInline(long operations, double value)
            {
                double result = 0;

                for (int iteration = 0; iteration < 20; iteration++)
                {
                    var actual = Stopwatch.StartNew();
                    for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                        result = AbsDoubleBenchmarkNoInline(value);
                    actual.Stop();

                    var overhead = Stopwatch.StartNew();
                    for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                        result = EmptyNoInline(value);
                    overhead.Stop();

                    double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
                    double perOperation = diff / operations;
                    double nanosecondPerOperation = perOperation * 1_000_000.0;

                    Console.WriteLine($"SimpleLoop_Overhead_Passed_NoInline {nanosecondPerOperation} ns/op");
                }

                Console.WriteLine();
                Console.WriteLine();
                Console.WriteLine();

                return result;
            }

            public void SimpleLoop_Overhead_Passed_NoInline_Volatile(long operations, double value)
            {
                for (int iteration = 0; iteration < 20; iteration++)
                {
                    var actual = Stopwatch.StartNew();
                    for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                    actual.Stop();

                    var overhead = Stopwatch.StartNew();
                    for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                    overhead.Stop();

                    double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
                    double perOperation = diff / operations;
                    double nanosecondPerOperation = perOperation * 1_000_000.0;

                    Console.WriteLine($"SimpleLoop_Overhead_Passed_NoInline_Volatile {nanosecondPerOperation} ns/op");
                }

                Console.WriteLine();
                Console.WriteLine();
                Console.WriteLine();
            }

            public void SimpleLoop_Overhead_Passed_NoInline_Volatile_Unroll(long operations, double value)
            {
                for (int iteration = 0; iteration < 20; iteration++)
                {
                    var actual = Stopwatch.StartNew();
                    for (long innerIteration = 0; innerIteration < operations / 16; innerIteration++)
                    {
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                        Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                    }
                    actual.Stop();

                    var overhead = Stopwatch.StartNew();
                    for (long innerIteration = 0; innerIteration < operations / 16; innerIteration++)
                    {
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                        Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                    }
                    overhead.Stop();

                    double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
                    double perOperation = diff / operations;
                    double nanosecondPerOperation = perOperation * 1_000_000.0;

                    Console.WriteLine($"SimpleLoop_Overhead_Passed_NoInline_Volatile_Unroll {nanosecondPerOperation} ns/op");
                }
            }
        }
    }

> Are you sure? I've tried it myself and it was working fine.

Positive. Changing the test to just do:

    [Benchmark]
    [Arguments(0.0)]
    [Arguments(double.NaN)]
    public bool IsNaN(double value) => double.IsNaN(value);

Results in:

| Method | Toolchain | value | Mean | Error | StdDev | Median | Min | Max | Ratio | MannWhitney(3%) | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------- |---------- |------ |-----:|------:|-------:|-------:|----:|----:|------:|---------------- |------:|------:|------:|----------:|
| IsNaN | \coreclr\bin\tests\Windows_NT.x64.Release\Tests\Core_Root\CoreRun.exe | NaN | 1.238 ns | 0.0012 ns | 0.0010 ns | 1.238 ns | 1.236 ns | 1.239 ns | 0.99 | Same | - | - | - | - |
| IsNaN | \coreclr_base\bin\tests\Windows_NT.x64.Release\Tests\Core_Root\CoreRun.exe | NaN | 1.254 ns | 0.0019 ns | 0.0016 ns | 1.254 ns | 1.252 ns | 1.257 ns | 1.00 | Base | - | - | - | - |
| IsNaN | \coreclr\bin\tests\Windows_NT.x64.Release\Tests\Core_Root\CoreRun.exe | 0 | 1.237 ns | 0.0029 ns | 0.0026 ns | 1.237 ns | 1.235 ns | 1.244 ns | 0.99 | Same | - | - | - | - |
| IsNaN | \coreclr_base\bin\tests\Windows_NT.x64.Release\Tests\Core_Root\CoreRun.exe | 0 | 1.255 ns | 0.0029 ns | 0.0024 ns | 1.255 ns | 1.252 ns | 1.260 ns | 1.00 | Base | - | - | - | - |

It also sometimes results in the new implementation being reported as slower, there are often 4-6 outliers being removed, and BDN sometimes complains about the results not being reported correctly.

> BenchmarkDotNet prevents inlining of the benchmark by wrapping it in a delegate. It also performs manual loop unrolling and a few other things to make such measurements accurate.

Yes, I understand this part, but I was referring to IsNaN being inlined into the thing marked [Benchmark].

The root problem here is that the core of IsNaN(double value) takes ~5 cycles to execute. So the method prologue/epilogue and the call itself take significantly longer to execute than the thing we want to be measuring. This functionally means that BDN is attempting to determine an inner iteration count based on what is functionally just "noise" (because the amount of time the "core" method takes to execute is significantly less than the precision of the timer 😄).

So, the only way to correctly measure the cost of something like IsNaN or Sqrt is to "boost" the numbers such that the thing you are testing takes longer than a single tick of the hardware timer.
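A rough sketch of that "boosting" idea, assuming a plain Stopwatch harness rather than BenchmarkDotNet. The class name, method name, sink field and iteration count are all illustrative and do not come from the PR. It prints the length of one Stopwatch tick next to the averaged per-call cost of a batched loop:

```csharp
using System;
using System.Diagnostics;

public static class BoostSketch
{
    // Hypothetical sink so the accumulated result is observably used.
    private static bool s_sink;

    // Times `iterations` IsNaN calls inside one Stopwatch interval and returns
    // the average cost per call in nanoseconds (loop overhead is included).
    public static double MeasureNanosecondsPerCall(double seed, int iterations)
    {
        bool result = false;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Vary the input and OR the results so the work depends on the data.
            result |= double.IsNaN(seed + i);
        }
        stopwatch.Stop();
        s_sink = result;
        return stopwatch.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }

    public static void Main()
    {
        // Length of one Stopwatch tick on this machine, in nanoseconds.
        double nanosecondsPerTick = 1_000_000_000.0 / Stopwatch.Frequency;
        Console.WriteLine($"Timer tick: {nanosecondsPerTick} ns");
        Console.WriteLine($"IsNaN: ~{MeasureNanosecondsPerCall(0.0, 1_000_000)} ns/call (includes loop overhead)");
    }
}
```

The numbers will vary by machine and runtime. The point is only that the timed region spans many ticks, whereas a single call would not.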

EgorBo · 2019-10-19T18:39:13Z

@tannergooding @adamsitnik
Can we at least make something like

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int GetIterations() => 100000;

This new benchmark is simply xor eax, eax for Mono-LLVM...

tannergooding · 2019-10-19T18:43:58Z

> This new benchmark is simply xor eax, eax for Mono-LLVM.

That sounds like a bug that needs to be addressed in Mono-LLVM. This function is an explicit IsNaN check and it should not be optimized that way.

There needs to be some way to block fast math optimizations for certain methods, if nothing else. Otherwise, you flip a switch and can break arbitrary code that isn't yours.

EgorBo · 2019-10-19T18:47:35Z

@tannergooding it's not the fast math mode. IsNaN is now inlined as x != x and I guess LLVM simply pre-calculated the whole loop as it's simple.
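As a minimal sketch of the self-comparison mentioned above: under IEEE 754, NaN is the only value that compares unequal to itself, so `x != x` behaves like an IsNaN check. The class and method names are hypothetical and are not taken from the runtime source.

```csharp
using System;

public static class NaNCheckSketch
{
    // Hypothetical helper: NaN is the only double that is not equal to itself.
    public static bool IsNaNViaSelfCompare(double x)
    {
#pragma warning disable CS1718 // comparison made to same variable (intentional here)
        return x != x;
#pragma warning restore CS1718
    }

    public static void Main()
    {
        double[] samples = { 0.0, -1.5, double.PositiveInfinity, double.NaN };
        foreach (double sample in samples)
        {
            // The two results should agree for every input.
            Console.WriteLine($"{sample}: self-compare={IsNaNViaSelfCompare(sample)}, double.IsNaN={double.IsNaN(sample)}");
        }
    }
}
```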

EgorBo · 2019-10-19T18:50:08Z

@tannergooding here is the IR we emit: https://godbolt.org/z/Crz1o7

tannergooding · 2019-10-19T18:50:45Z

> IsNaN is now inlined as x != x and I guess LLVM simply pre-calculated the whole loop as it's simple.

That still sounds like a bug: this method takes an arbitrary input (both 0.0 and NaN are tested, depending on the call), so it should be impossible for LLVM to fold the result to either constant. At best, it could reduce the loop to a single IsNaN check rather than always returning false.

EgorBo · 2019-10-19T18:54:03Z

@tannergooding C++ compilers seem to do the same: https://godbolt.org/z/huc8am (without the unsafe math)

tannergooding · 2019-10-19T18:56:19Z

I see the issue. The code should be doing result |= rather than result &=

&= is optimized away because it starts as false, so it can never become true.
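To make the difference concrete, here is a small sketch. It is a hypothetical stand-in for the benchmark body, not the code from the PR. An AND-accumulator seeded with false stays false for every input, so a compiler is free to fold the whole loop to a constant, while an OR-accumulator depends on the input:

```csharp
using System;

public static class AccumulatorSketch
{
    // Shape of the problematic version: false & anything is false, so the
    // result never depends on `value` and the loop can be folded away.
    public static bool AndAccumulate(double value, int iterations)
    {
        bool result = false;
        for (int i = 0; i < iterations; i++)
        {
            result &= double.IsNaN(value);
        }
        return result;
    }

    // Shape of the fix: the result is true exactly when `value` is NaN
    // (given at least one iteration), so the check cannot be discarded.
    public static bool OrAccumulate(double value, int iterations)
    {
        bool result = false;
        for (int i = 0; i < iterations; i++)
        {
            result |= double.IsNaN(value);
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(AndAccumulate(double.NaN, 1000)); // False
        Console.WriteLine(OrAccumulate(double.NaN, 1000));  // True
        Console.WriteLine(OrAccumulate(0.0, 1000));         // False
    }
}
```

Even with the OR form, an optimizer may still hoist a loop-invariant check out of the loop, as noted earlier in the thread, so varying the input per iteration is a reasonable extra precaution.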

EgorBo · 2019-10-19T19:05:21Z

@tannergooding yeah, LLVM is smart :) and it did it in just one pass (sparse conditional constant propagation).

Adding benchmarks covering float.IsNaN and double.IsNaN

f44ea62

tannergooding requested a review from adamsitnik October 17, 2019 23:17

tannergooding mentioned this pull request Oct 17, 2019

Improve codegen for IsNan dotnet/coreclr#27272

Merged

adamsitnik approved these changes Oct 18, 2019

adamsitnik merged commit 89a3ff1 into dotnet:master Oct 18, 2019

tannergooding mentioned this pull request Oct 19, 2019

Fixing Perf_Single.IsNaN and Perf_Double.IsNaN to do result |= rather than result &= #960

Merged
