Adding benchmarks covering float.IsNaN and double.IsNaN#952
Adding benchmarks covering float.IsNaN and double.IsNaN#952adamsitnik merged 1 commit intodotnet:masterfrom tannergooding:isnan
Conversation
|
|
||
| bool result = false; | ||
|
|
||
| for (int i = 0; i < 1000000; i++) |
There was a problem hiding this comment.
In general, we try to avoid loops in the benchmarks: https://github.com/dotnet/performance/blob/master/docs/microbenchmark-design-guidelines.md#Loops
But I remember our discussion from the past when you have mentioned that JIT might in future remember the results of some Math operations, so testing the values from 0 to 1000000 is a good idea here. 👍
There was a problem hiding this comment.
Benchmark.NET was also not doing a good job auto determining how many times to run IsNaN here.
Once inlined, IsNaN is two instructions (a ucomis and setp), which is not enough for any kind of reliable measurement.
I dont believe Benchmark.NET has any feature which would allow accurate measurement of APIs like this without the author explicitly looping.
There was a problem hiding this comment.
Benchmark.NET was also not doing a good job auto determining how many times to run IsNaN here
Are you sure? I've tried it myself it was working fine.
Once inlined, IsNaN is two instructions (a ucomis and setp), which is not enough for any kind of reliable measurement.
I dont believe Benchmark.NET has any feature which would allow accurate measurement of APIs like this without the author explicitly looping.
BenchmarkDotNet prevents from inlining of the benchmark by wrapping it with a delegate. It also performs manual loop unrolling and few other things to make such measurement accurate.
Some time ago I've created the following example to show the "evolution" of such a benchmark, where as an example I've used Math.Abs. The last method (SimpleLoop_Overhead_Passed_NoInline_Volatile_Unroll) is more or less what BDN does.
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;
using BenchmarkDotNet.Attributes;
namespace BenchmarkDotNet.Samples
{
public class MathNano
{
[Benchmark]
[Arguments(-1.0)]
public double AbsDoubleBenchmark(double value) => Math.Abs(value);
}
public class StepByStep
{
private double _doubleHolder;
public double AbsDoubleBenchmark(double value) => Math.Abs(value);
public double Empty(double value) => value;
public double SimpleLoop_Hardcoded(long operations)
{
double result = 0;
const double hardcoded = -1.0;
for (int iteration = 0; iteration < 20; iteration++)
{
var actual = Stopwatch.StartNew();
for (long innerIteration = 0; innerIteration < operations; innerIteration++)
result = AbsDoubleBenchmark(hardcoded);
actual.Stop();
double total = actual.Elapsed.TotalMilliseconds;
double perOperation = total / operations;
double nanosecondPerOperation = perOperation * 1_000_000.0;
Console.WriteLine($"SimpleLoop_Hardcoded {nanosecondPerOperation} ns/op");
}
Console.WriteLine(); Console.WriteLine(); Console.WriteLine();
return result;
}
public double SimpleLoop_Overhead_Hardcoded(long operations)
{
double result = 0;
const double hardcoded = -1.0;
for (int iteration = 0; iteration < 20; iteration++)
{
var actual = Stopwatch.StartNew();
for (long innerIteration = 0; innerIteration < operations; innerIteration++)
result = AbsDoubleBenchmark(hardcoded);
actual.Stop();
var overhead = Stopwatch.StartNew();
for (long innerIteration = 0; innerIteration < operations; innerIteration++)
result = Empty(hardcoded);
overhead.Stop();
double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
double perOperation = diff / operations;
double nanosecondPerOperation = perOperation * 1_000_000.0;
Console.WriteLine($"SimpleLoop_Overhead_Hardcoded {nanosecondPerOperation} ns/op");
}
Console.WriteLine(); Console.WriteLine(); Console.WriteLine();
return result;
}
public double SimpleLoop_Overhead_Passed(long operations, double value)
{
double result = 0;
for (int iteration = 0; iteration < 20; iteration++)
{
var actual = Stopwatch.StartNew();
for (long innerIteration = 0; innerIteration < operations; innerIteration++)
result = AbsDoubleBenchmark(value);
actual.Stop();
var overhead = Stopwatch.StartNew();
for (long innerIteration = 0; innerIteration < operations; innerIteration++)
result = Empty(value);
overhead.Stop();
double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
double perOperation = diff / operations;
double nanosecondPerOperation = perOperation * 1_000_000.0;
Console.WriteLine($"SimpleLoop_Overhead_Passed {nanosecondPerOperation} ns/op");
}
Console.WriteLine(); Console.WriteLine(); Console.WriteLine();
return result;
}
[MethodImpl(MethodImplOptions.NoInlining)]
public double AbsDoubleBenchmarkNoInline(double value) => Math.Abs(value);
[MethodImpl(MethodImplOptions.NoInlining)]
public double EmptyNoInline(double value) => value;
public double SimpleLoop_Overhead_Passed_NoInline(long operations, double value)
{
double result = 0;
for (int iteration = 0; iteration < 20; iteration++)
{
var actual = Stopwatch.StartNew();
for (long innerIteration = 0; innerIteration < operations; innerIteration++)
result = AbsDoubleBenchmarkNoInline(value);
actual.Stop();
var overhead = Stopwatch.StartNew();
for (long innerIteration = 0; innerIteration < operations; innerIteration++)
result = EmptyNoInline(value);
overhead.Stop();
double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
double perOperation = diff / operations;
double nanosecondPerOperation = perOperation * 1_000_000.0;
Console.WriteLine($"SimpleLoop_Overhead_Passed_NoInline {nanosecondPerOperation} ns/op");
}
Console.WriteLine(); Console.WriteLine(); Console.WriteLine();
return result;
}
public void SimpleLoop_Overhead_Passed_NoInline_Volatile(long operations, double value)
{
for (int iteration = 0; iteration < 20; iteration++)
{
var actual = Stopwatch.StartNew();
for (long innerIteration = 0; innerIteration < operations; innerIteration++)
Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
actual.Stop();
var overhead = Stopwatch.StartNew();
for (long innerIteration = 0; innerIteration < operations; innerIteration++)
Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
overhead.Stop();
double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
double perOperation = diff / operations;
double nanosecondPerOperation = perOperation * 1_000_000.0;
Console.WriteLine($"SimpleLoop_Overhead_Passed_NoInline_Volatile {nanosecondPerOperation} ns/op");
}
Console.WriteLine(); Console.WriteLine(); Console.WriteLine();
}
public void SimpleLoop_Overhead_Passed_NoInline_Volatile_Unroll(long operations, double value)
{
for (int iteration = 0; iteration < 20; iteration++)
{
var actual = Stopwatch.StartNew();
for (long innerIteration = 0; innerIteration < operations / 16; innerIteration++)
{
Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
}
actual.Stop();
var overhead = Stopwatch.StartNew();
for (long innerIteration = 0; innerIteration < operations / 16; innerIteration++)
{
Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
}
overhead.Stop();
double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
double perOperation = diff / operations;
double nanosecondPerOperation = perOperation * 1_000_000.0;
Console.WriteLine($"SimpleLoop_Overhead_Passed_NoInline_Volatile_Unroll {nanosecondPerOperation} ns/op");
}
}
}
}There was a problem hiding this comment.
Are you sure? I've tried it myself it was working fine.
Positive. Changing the test to just do:
[Benchmark]
[Arguments(0.0)]
[Arguments(double.NaN)]
public bool IsNaN(double value) => double.IsNaN(value);Results in:
| Method | Toolchain | value | Mean | Error | StdDev | Median | Min | Max | Ratio | MannWhitney(3%) | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------- |--------------------------------------------------------------------------- |------ |---------:|----------:|----------:|---------:|---------:|---------:|------:|---------------- |------:|------:|------:|----------:|
| IsNaN | \coreclr\bin\tests\Windows_NT.x64.Release\Tests\Core_Root\CoreRun.exe | NaN | 1.238 ns | 0.0012 ns | 0.0010 ns | 1.238 ns | 1.236 ns | 1.239 ns | 0.99 | Same | - | - | - | - |
| IsNaN | \coreclr_base\bin\tests\Windows_NT.x64.Release\Tests\Core_Root\CoreRun.exe | NaN | 1.254 ns | 0.0019 ns | 0.0016 ns | 1.254 ns | 1.252 ns | 1.257 ns | 1.00 | Base | - | - | - | - |
| | | | | | | | | | | | | | | |
| IsNaN | \coreclr\bin\tests\Windows_NT.x64.Release\Tests\Core_Root\CoreRun.exe | 0 | 1.237 ns | 0.0029 ns | 0.0026 ns | 1.237 ns | 1.235 ns | 1.244 ns | 0.99 | Same | - | - | - | - |
| IsNaN | \coreclr_base\bin\tests\Windows_NT.x64.Release\Tests\Core_Root\CoreRun.exe | 0 | 1.255 ns | 0.0029 ns | 0.0024 ns | 1.255 ns | 1.252 ns | 1.260 ns | 1.00 | Base | - | - | - | - |
It also sometimes results in the new implementation being reported as slower, there are often 4-6 outliers being removed, and BDN sometimes complains about the results not being reported correctly.
BenchmarkDotNet prevents from inlining of the benchmark by wrapping it with a delegate. It also performs manual loop unrolling and few other things to make such measurement accurate.
Yes, I understand this part, but I was referring to IsNaN being inlined into the thing marked [Benchmark].
The root problem here is that the core of IsNaN(double value) takes ~5 cycles to execute. So the method prologue/epilogue and the call itself take significantly longer to execute than the thing we want to be measuring. This functionally means that BDN is attempting to determine an inner iteration count based on what is functionally just "noise" (because the amount of time the "core" method takes to execute is significantly less than the precision of the timer 😄).
So, the only way to correctly measure the cost of something like IsNaN or Sqrt is to "boost" the numbers such that thing you are testing takes at least longer than a single tick of the hardware timer.
|
@tannergooding @adamsitnik [MethodImpl(MethodImplOptions.NoInlining)]
static int GetIterations() => 100000;This new benchmark is simply |
That sounds like a bug that needs to be addressed in Mono-LLVM. This function is an explicit There needs to be some way to block |
|
@tannergooding it's not the fast math mode. IsNaN is now inlined as |
|
@tannergooding here is the IR we emit: https://godbolt.org/z/Crz1o7 |
That still sounds like a bug, this method takes an arbitrary input (testing both |
|
@tannergooding C++ compilers seem do the same: https://godbolt.org/z/huc8am (without the unsafe math) |
|
I see the issue. The code should be doing
|
|
@tannergooding yeah, llvm is smart :) and it did in just one pass (sparse conditional constant propogation): |

This just adds basic benchmarks for
float.IsNaNanddouble.IsNaN.