Skip to content

Adding benchmarks covering float.IsNaN and double.IsNaN#952

Merged
adamsitnik merged 1 commit intodotnet:masterfrom
tannergooding:isnan
Oct 18, 2019
Merged

Adding benchmarks covering float.IsNaN and double.IsNaN#952
adamsitnik merged 1 commit intodotnet:masterfrom
tannergooding:isnan

Conversation

@tannergooding
Copy link
Member

This just adds basic benchmarks for float.IsNaN and double.IsNaN.

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!


bool result = false;

for (int i = 0; i < 1000000; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, we try to avoid loops in the benchmarks: https://github.com/dotnet/performance/blob/master/docs/microbenchmark-design-guidelines.md#Loops

But I remember our discussion from the past when you have mentioned that JIT might in future remember the results of some Math operations, so testing the values from 0 to 1000000 is a good idea here. 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark.NET was also not doing a good job auto determining how many times to run IsNaN here.

Once inlined, IsNaN is two instructions (a ucomis and setp), which is not enough for any kind of reliable measurement.

I dont believe Benchmark.NET has any feature which would allow accurate measurement of APIs like this without the author explicitly looping.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark.NET was also not doing a good job auto determining how many times to run IsNaN here

Are you sure? I've tried it myself it was working fine.

Once inlined, IsNaN is two instructions (a ucomis and setp), which is not enough for any kind of reliable measurement.
I dont believe Benchmark.NET has any feature which would allow accurate measurement of APIs like this without the author explicitly looping.

BenchmarkDotNet prevents from inlining of the benchmark by wrapping it with a delegate. It also performs manual loop unrolling and few other things to make such measurement accurate.

Some time ago I've created the following example to show the "evolution" of such a benchmark, where as an example I've used Math.Abs. The last method (SimpleLoop_Overhead_Passed_NoInline_Volatile_Unroll) is more or less what BDN does.

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;
using BenchmarkDotNet.Attributes;

namespace BenchmarkDotNet.Samples
{
    public class MathNano
    {
        [Benchmark]
        [Arguments(-1.0)]
        public double AbsDoubleBenchmark(double value) => Math.Abs(value);
    }
    
    public class StepByStep
    {
        private double _doubleHolder;
        
        public double AbsDoubleBenchmark(double value) => Math.Abs(value);
        
        public double Empty(double value) => value;
        
        public double SimpleLoop_Hardcoded(long operations)
        {
            double result = 0;
            const double hardcoded = -1.0;

            for (int iteration = 0; iteration < 20; iteration++)
            {
                var actual = Stopwatch.StartNew();

                for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                    result = AbsDoubleBenchmark(hardcoded);
                
                actual.Stop();

                double total = actual.Elapsed.TotalMilliseconds;
                double perOperation = total / operations;
                double nanosecondPerOperation = perOperation * 1_000_000.0;
                
                Console.WriteLine($"SimpleLoop_Hardcoded {nanosecondPerOperation} ns/op");
            }
            
            Console.WriteLine(); Console.WriteLine(); Console.WriteLine();
            return result;
        }
        
        public double SimpleLoop_Overhead_Hardcoded(long operations)
        {
            double result = 0;
            const double hardcoded = -1.0;
            
            for (int iteration = 0; iteration < 20; iteration++)
            {
                var actual = Stopwatch.StartNew();

                for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                    result = AbsDoubleBenchmark(hardcoded);
                
                actual.Stop();
                
                var overhead = Stopwatch.StartNew();

                for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                    result = Empty(hardcoded);
                
                overhead.Stop();

                double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
                double perOperation = diff / operations;
                double nanosecondPerOperation = perOperation * 1_000_000.0;
                
                Console.WriteLine($"SimpleLoop_Overhead_Hardcoded {nanosecondPerOperation} ns/op");
            }
            
            Console.WriteLine(); Console.WriteLine(); Console.WriteLine();
            return result;
        }
        
        public double SimpleLoop_Overhead_Passed(long operations, double value)
        {
            double result = 0;
            
            for (int iteration = 0; iteration < 20; iteration++)
            {
                var actual = Stopwatch.StartNew();

                for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                    result = AbsDoubleBenchmark(value);
                
                actual.Stop();
                
                var overhead = Stopwatch.StartNew();

                for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                    result = Empty(value);
                
                overhead.Stop();

                double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
                double perOperation = diff / operations;
                double nanosecondPerOperation = perOperation * 1_000_000.0;
                
                Console.WriteLine($"SimpleLoop_Overhead_Passed {nanosecondPerOperation} ns/op");
            }
            
            Console.WriteLine(); Console.WriteLine(); Console.WriteLine();
            return result;
        }
        
        [MethodImpl(MethodImplOptions.NoInlining)]
        public double AbsDoubleBenchmarkNoInline(double value) => Math.Abs(value);
        
        [MethodImpl(MethodImplOptions.NoInlining)]
        public double EmptyNoInline(double value) => value;

        
        public double SimpleLoop_Overhead_Passed_NoInline(long operations, double value)
        {
            double result = 0;
            
            for (int iteration = 0; iteration < 20; iteration++)
            {
                var actual = Stopwatch.StartNew();

                for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                    result = AbsDoubleBenchmarkNoInline(value);
                
                actual.Stop();
                
                var overhead = Stopwatch.StartNew();

                for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                    result = EmptyNoInline(value);
                
                overhead.Stop();

                double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
                double perOperation = diff / operations;
                double nanosecondPerOperation = perOperation * 1_000_000.0;
                
                Console.WriteLine($"SimpleLoop_Overhead_Passed_NoInline {nanosecondPerOperation} ns/op");
            }
            
            Console.WriteLine(); Console.WriteLine(); Console.WriteLine();
            return result;
        }
        
        public void SimpleLoop_Overhead_Passed_NoInline_Volatile(long operations, double value)
        {
            for (int iteration = 0; iteration < 20; iteration++)
            {
                var actual = Stopwatch.StartNew();

                for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                    Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); 
                
                actual.Stop();
                
                var overhead = Stopwatch.StartNew();

                for (long innerIteration = 0; innerIteration < operations; innerIteration++)
                    Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); 
                
                overhead.Stop();

                double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
                double perOperation = diff / operations;
                double nanosecondPerOperation = perOperation * 1_000_000.0;
                
                Console.WriteLine($"SimpleLoop_Overhead_Passed_NoInline_Volatile {nanosecondPerOperation} ns/op");
            }
            
            Console.WriteLine(); Console.WriteLine(); Console.WriteLine();
        }
        
        public void SimpleLoop_Overhead_Passed_NoInline_Volatile_Unroll(long operations, double value)
        {
            for (int iteration = 0; iteration < 20; iteration++)
            {
                var actual = Stopwatch.StartNew();

                for (long innerIteration = 0; innerIteration < operations / 16; innerIteration++)
                {
                    Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                    Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                    Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                    Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                    
                    Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                    Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                    Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                    Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value)); Volatile.Write(ref _doubleHolder, AbsDoubleBenchmarkNoInline(value));
                }
                
                actual.Stop();

                var overhead = Stopwatch.StartNew();

                for (long innerIteration = 0; innerIteration < operations / 16; innerIteration++)
                {
                    Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                    Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                    Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                    Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                    
                    Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                    Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                    Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                    Volatile.Write(ref _doubleHolder, EmptyNoInline(value)); Volatile.Write(ref _doubleHolder, EmptyNoInline(value));
                }

                overhead.Stop();

                double diff = actual.Elapsed.TotalMilliseconds - overhead.Elapsed.TotalMilliseconds;
                double perOperation = diff / operations;
                double nanosecondPerOperation = perOperation * 1_000_000.0;
                
                Console.WriteLine($"SimpleLoop_Overhead_Passed_NoInline_Volatile_Unroll {nanosecondPerOperation} ns/op");
            }
        }
    }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? I've tried it myself it was working fine.

Positive. Changing the test to just do:

[Benchmark]
[Arguments(0.0)]
[Arguments(double.NaN)]
public bool IsNaN(double value) => double.IsNaN(value);

Results in:

| Method |                                                                  Toolchain | value |     Mean |     Error |    StdDev |   Median |      Min |      Max | Ratio | MannWhitney(3%) | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------- |--------------------------------------------------------------------------- |------ |---------:|----------:|----------:|---------:|---------:|---------:|------:|---------------- |------:|------:|------:|----------:|
|  IsNaN |      \coreclr\bin\tests\Windows_NT.x64.Release\Tests\Core_Root\CoreRun.exe |   NaN | 1.238 ns | 0.0012 ns | 0.0010 ns | 1.238 ns | 1.236 ns | 1.239 ns |  0.99 |            Same |     - |     - |     - |         - |
|  IsNaN | \coreclr_base\bin\tests\Windows_NT.x64.Release\Tests\Core_Root\CoreRun.exe |   NaN | 1.254 ns | 0.0019 ns | 0.0016 ns | 1.254 ns | 1.252 ns | 1.257 ns |  1.00 |            Base |     - |     - |     - |         - |
|        |                                                                            |       |          |           |           |          |          |          |       |                 |       |       |       |           |
|  IsNaN |      \coreclr\bin\tests\Windows_NT.x64.Release\Tests\Core_Root\CoreRun.exe |     0 | 1.237 ns | 0.0029 ns | 0.0026 ns | 1.237 ns | 1.235 ns | 1.244 ns |  0.99 |            Same |     - |     - |     - |         - |
|  IsNaN | \coreclr_base\bin\tests\Windows_NT.x64.Release\Tests\Core_Root\CoreRun.exe |     0 | 1.255 ns | 0.0029 ns | 0.0024 ns | 1.255 ns | 1.252 ns | 1.260 ns |  1.00 |            Base |     - |     - |     - |         - |

It also sometimes results in the new implementation being reported as slower, there are often 4-6 outliers being removed, and BDN sometimes complains about the results not being reported correctly.

BenchmarkDotNet prevents from inlining of the benchmark by wrapping it with a delegate. It also performs manual loop unrolling and few other things to make such measurement accurate.

Yes, I understand this part, but I was referring to IsNaN being inlined into the thing marked [Benchmark].

The root problem here is that the core of IsNaN(double value) takes ~5 cycles to execute. So the method prologue/epilogue and the call itself take significantly longer to execute than the thing we want to be measuring. This functionally means that BDN is attempting to determine an inner iteration count based on what is functionally just "noise" (because the amount of time the "core" method takes to execute is significantly less than the precision of the timer 😄).

So, the only way to correctly measure the cost of something like IsNaN or Sqrt is to "boost" the numbers such that thing you are testing takes at least longer than a single tick of the hardware timer.

@adamsitnik adamsitnik merged commit 89a3ff1 into dotnet:master Oct 18, 2019
@EgorBo
Copy link
Member

EgorBo commented Oct 19, 2019

@tannergooding @adamsitnik
can we at least make something like

[MethodImpl(MethodImplOptions.NoInlining)]
static int GetIterations() => 100000;

This new benchmark is simply xor eax eax for Mono-LLVM...

@tannergooding
Copy link
Member Author

This new benchmark is simply xor eax eax for Mono-LLVM.

That sounds like a bug that needs to be addressed in Mono-LLVM. This function is an explicit IsNaN check and it should not be optimized that way.

There needs to be some way to block fast math optimizations for certain methods, if nothing else. Otherwise, you flip a switch and can break arbitrary code that isn't yours.

@EgorBo
Copy link
Member

EgorBo commented Oct 19, 2019

@tannergooding it's not the fast math mode. IsNaN is now inlined as x != x and I guess LLVM simply pre-calculated the whole loop as it's simple.

@EgorBo
Copy link
Member

EgorBo commented Oct 19, 2019

@tannergooding here is the IR we emit: https://godbolt.org/z/Crz1o7

@tannergooding
Copy link
Member Author

IsNaN is now inlined as x != x and I guess LLVM simply pre-calculated the whole loop as it's simple.

That still sounds like a bug, this method takes an arbitrary input (testing both 0.0 and NaN, depending on the call) and so it should be impossible for LLVM to optimize to either or. At best, it could optimize it to a single IsNaN check and not always return false.

@EgorBo
Copy link
Member

EgorBo commented Oct 19, 2019

@tannergooding C++ compilers seem do the same: https://godbolt.org/z/huc8am (without the unsafe math)

@tannergooding
Copy link
Member Author

I see the issue. The code should be doing result |= rather than result &=

&= is optimized away because it starts as false, so it can never become true.

@EgorBo
Copy link
Member

EgorBo commented Oct 19, 2019

@tannergooding yeah, llvm is smart :) and it did in just one pass (sparse conditional constant propogation):
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants