Replace (val / 2) with (val * 0.5) in Jit by EgorBo · Pull Request #24584 · dotnet/coreclr

EgorBo · 2019-05-15T05:32:42Z

vmulss/vmulsd has both better latency and better throughput than vdivss/vdivsd, at least on the hardware I have, e.g. on my MacBook's Haswell:

vdivss (Latency: 10-20,  R.Throughput: 7-14)
vmulss (Latency:     5,  R.Throughput:  0.5)

So if a divisor is a constant power of two we can optimize it, e.g.:

a = b / 2;
becomes:
a = b * 0.5;

See https://godbolt.org/z/rz9h4E (clang, gcc, msvc, x86, AMD64, AArch64: this optimization is applied everywhere. Btw, LLVM also helps Mono to optimize this case for C#)
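Why the restriction to powers of two matters can be shown with a small experiment. The reciprocal of a power of two is exactly representable, so `x / 2` and `x * 0.5` round the same real number; for other divisors the reciprocal is itself rounded and the two results can differ. The C++ sketch below is not code from this PR: `countMismatches` and `checkPowerOfTwoRewrite` are hypothetical helpers, and it assumes IEEE 754 single precision evaluated without extended precision or fast-math flags.

```cpp
#include <cassert>

// Hypothetical helper: counts integers x in [1, n] for which
// x / divisor and x * (1 / divisor) produce different floats.
int countMismatches(float divisor, int n)
{
    const float reciprocal = 1.0f / divisor; // rounded unless divisor is a power of two
    int mismatches = 0;
    for (int i = 1; i <= n; i++)
    {
        const float x = static_cast<float>(i);
        if (x / divisor != x * reciprocal)
        {
            mismatches++;
        }
    }
    return mismatches;
}

// Hypothetical self-check: a power-of-two divisor never disagrees,
// while 3 does for some inputs (x = 5 is one of them).
void checkPowerOfTwoRewrite()
{
    assert(countMismatches(2.0f, 1000) == 0);
    assert(countMismatches(3.0f, 1000) > 0);
}
```

Calling `checkPowerOfTwoRewrite()` from a `main` function is enough to run the comparison.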

I wrote a small benchmark:

[Benchmark]
[Arguments(MathF.PI)]
public float Div_old(float value)
{
    for (int i = 0; i < 10; i++)
        value = value / 2f;
    return value;
}

[Benchmark(Baseline = true)]
[Arguments(MathF.PI)]
public float Div_new(float value)
{
    for (int i = 0; i < 10; i++)
        value = value * 0.5f;
    return value;
}

and the results are (Haswell):

BenchmarkDotNet=v0.11.5, OS=macOS Mojave 10.14 (18A391) [Darwin 18.0.0]
Intel Core i7-4980HQ CPU 2.80GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview5-011568
  [Host]   : .NET Core 3.0.0-preview5-27626-15 (CoreCLR 4.6.27622.75, CoreFX 4.700.19.22408), 64bit RyuJIT
  ShortRun : .NET Core 3.0.0-preview5-27626-15 (CoreCLR 4.6.27622.75, CoreFX 4.700.19.22408), 64bit RyuJIT

Job=ShortRun  IterationCount=3  LaunchCount=1
WarmupCount=3

|  Method |     value |      Mean |     Error |    StdDev | Ratio | RatioSD |
|-------- |---------- |----------:|----------:|----------:|------:|--------:|
| Div_old | 3.1415927 | 16.848 ns | 1.6733 ns | 0.0917 ns |  3.14 |    0.02 |
| Div_new | 3.1415927 |  5.359 ns | 0.5572 ns | 0.0305 ns |  1.00 |    0.00 |

/cc: @tannergooding

src/jit/morph.cpp

src/jit/utils.h

mikedn · 2019-05-19T18:48:04Z

FWIW, I tried this or a similar FP optimization in the past but didn't pursue it because it wasn't clear how useful it would be: the framework doesn't have a lot of FP code. But now we have WPF in .NET Core, and its PresentationCore.dll and PresentationFramework.dll contain a bit more than 100 hits. So it's quite useful; while many developers do this optimization manually, there are still enough opportunities.

I haven't noticed any regressions. In theory this can block CSE because VN doesn't understand that x * 0.5 == x / 2.0. That can be fixed later if necessary, VN has other similar issues anyway.

src/jit/utils.cpp

EgorBo · 2019-05-23T00:00:26Z

@mikedn @tannergooding sorry for the delayed response. I've refactored it to use frexp (I need to double check if it's 100% portable), based on your HasInverse.
The float / const pattern can be found in game engines too; I've already found several places.
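For readers unfamiliar with the frexp idea: `std::frexp` splits a finite nonzero value into a mantissa in [0.5, 1) and a binary exponent, so a power of two is exactly a value whose mantissa has magnitude 0.5. The following C++ is only a sketch of that idea, not the code from this PR (the commit list shows the frexp implementation was later rolled back). `hasExactReciprocal` and `checkExactReciprocal` are hypothetical names, and rejecting non-normal divisors and reciprocals is an assumption made here to stay conservative.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: true when 'divisor' is a normal power of two whose
// reciprocal is also a normal value, so x / divisor == x * (1.0 / divisor).
bool hasExactReciprocal(double divisor)
{
    if (!std::isnormal(divisor))
    {
        return false; // rejects 0, subnormals, infinities and NaN
    }

    int exponent = 0;
    const double mantissa = std::frexp(divisor, &exponent);
    if (std::fabs(mantissa) != 0.5)
    {
        return false; // not a power of two
    }

    // Conservatively reject divisors whose reciprocal falls out of the normal range.
    return std::isnormal(1.0 / divisor);
}

// Hypothetical self-check covering a few representative divisors.
void checkExactReciprocal()
{
    assert(hasExactReciprocal(2.0));
    assert(hasExactReciprocal(-8.0));
    assert(!hasExactReciprocal(3.0));
    assert(!hasExactReciprocal(0.0));
}
```

The final `std::isnormal(1.0 / divisor)` test is stricter than necessary in some edge cases, which seems a reasonable trade-off for a sketch.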

EgorBo · 2019-05-23T00:04:42Z

Also, I did some testing locally, e.g. https://gist.github.com/EgorBo/866a49334291c1ac3b108eb9341681ae (similar for double and for non-power-of-two constants)

src/jit/morph.cpp

src/pal/src/cruntime/math.cpp

EgorBo · 2019-05-23T14:31:46Z

@mikedn @tannergooding could you please take a look one more time? I think I handled all the cases. Also added a test.
isNormal implementation was copied from the managed double.IsNormal() impl.

Also: should I take care of big-endian platforms?

src/jit/morph.cpp

src/jit/utils.cpp

tests/src/JIT/opt/InstructionCombining/DivToMul.cs

EgorBo · 2019-06-19T18:19:53Z

A small Roslyn-based script to find places where this optimization can be applied (found some in various math/graphics related C# repositories): https://gist.github.com/EgorBo/74b034fe1936c43fcd0b42934322557c
Also, there are places in CoreFX where * 0.5 is used (I guess authors knew it's better than / 2.0 😄) e.g.: https://github.com/dotnet/corefx/blob/master/src/System.Runtime.Numerics/src/System/Numerics/Complex.cs#L418

mikedn · 2019-06-19T18:23:16Z

A small Roslyn-based script to find places where this optimization can be applied (found some in various math/graphics related C# repositories):

Erm, why do you need such a contraption? Perhaps you're not aware how to run JIT diffs?

EgorBo · 2019-06-19T18:26:52Z

@mikedn I wanted to quickly find such places without even downloading repositories (via HttpClient) 🙂. Also, I made a list of patterns that LLVM is able to optimize (InstCombine transforms) and I'm looking for them in those repos.

mikedn · 2019-06-19T18:40:22Z

I wanted to quickly find such places without even downloading repositories

That's not going to find the interesting case, where expressions like x / 2.0 appear as a result of other JIT optimizations (inlining, constant folding & propagation etc.). Cases where x / 2.0 literally appears in the code aren't very interesting; developers can easily change that to x * 0.5 if they so wish. Adding a zillion such expression transformations to the JIT might not be the best thing to do, especially in the current JIT state (morph is really a nightmare).

Anyway, here's an x64 FX diff:

Summary:
(Lower is better)
Total bytes of diff: 0 (0.00% of base)
0 total files with size differences (0 improved, 0 regressed), 129 unchanged.
0 total methods with size differences (0 improved, 0 regressed), 146772 unchanged.
3 files had text diffs but not size diffs.
Microsoft.Diagnostics.Tracing.TraceEvent.dasm had 12 diffs
System.Private.CoreLib.dasm had 8 diffs
Microsoft.DotNet.Cli.Utils.dasm had 4 diffs
Completed analysis in 28.14s

As I mentioned in a previous post, it's a bit of a strange case because divss and mulss instructions have the same size so nothing shows up in terms of size differences. Text diff shows that there are only 6 diffs in the entire framework. Not surprising as there's not a lot of floating point code in corelib and corefx.

EgorBo · 2019-06-19T18:56:18Z

@mikedn yeah, but as you mentioned earlier there are cases in dotnet/wpf repo:

ReachFramework\AlphaFlattener\BrushProxy.cs:
        (-B + root) / A / 2

ReachFramework\AlphaFlattener\BrushProxy.cs:
        (-B - root) / A / 2

ReachFramework\Serialization\DrawingContextFlattener.cs:
        (rRadSquared + rDot) / 2

ReachFramework\Serialization\DrawingContextFlattener.cs:
        (xEnd - xStart) / 2

ReachFramework\Serialization\DrawingContextFlattener.cs:
        (yEnd - yStart) / 2

ReachFramework\Serialization\DrawingContextFlattener.cs:
        (xEnd + xStart) / 2

ReachFramework\Serialization\DrawingContextFlattener.cs:
        (yEnd + yStart) / 2

ReachFramework\Serialization\VisualSerializer.cs:
        - (dstwidth - width) / 2

ReachFramework\Serialization\VisualSerializer.cs:
        (dstwidth - width) / 2

ReachFramework\Serialization\VisualSerializer.cs:
        - (dstheight - height) / 2

ReachFramework\Serialization\VisualSerializer.cs:
        (dstheight - height) / 2

PresentationCore\MS\internal\Ink\StrokeRenderer.cs:
        Math.PI * 3.0 / 2.0

PresentationCore\MS\internal\Shaping\CompositeTypefaceMetrics.cs:
        (-UnderlineOffsetDefaultInEm) / 2

PresentationCore\System\Windows\Media\CompositionTarget.cs:
        Double.MinValue / 2.0

PresentationCore\System\Windows\Media\CompositionTarget.cs:
        Double.MinValue / 2.0

PresentationCore\System\Windows\Media\GlyphRun.cs:
        (firstStopAdvance + currentAdvance) / 2.0

PresentationFramework\MS\Internal\documents\DocumentGrid.cs:
        (centerWidth - ExtentWidth) / 2.0

PresentationFramework\MS\Internal\documents\DocumentGrid.cs:
        (ViewportHeight - ExtentHeight) / 2.0

PresentationFramework\MS\Internal\documents\DocumentGrid.cs:
        (ExtentWidth - _lastRowChangeExtentWidth) / 2.0

PresentationFramework\MS\Internal\documents\TextBoxView.cs:
        (width - lineWidth) / 2

PresentationFramework\MS\Internal\Ink\PenCursorManager.cs:
        (width - originalWidth) / 2

PresentationFramework\MS\Internal\Ink\PenCursorManager.cs:
        (height - originalHeight) / 2

PresentationFramework\System\Windows\Controls\Grid.cs:
        (minPower + maxPower) / 2.0

PresentationFramework\System\Windows\Documents\CaretElement.cs:
        double.MaxValue/2

PresentationFramework\System\Windows\Documents\CaretElement.cs:
        double.MaxValue/2

PresentationCore\System\Windows\Media\Animation\KeySpline.cs:
        (_parameter + top) / 2

PresentationCore\System\Windows\Media\Animation\KeySpline.cs:
        (_parameter + bottom) / 2

PresentationCore\System\Windows\Media\Animation\KeySpline.cs:
        (bottom + top) / 2

mikedn · 2019-06-19T19:00:22Z

Unfortunately, running diffs on wpf assemblies is a bit more tricky at the moment, so I haven't done it again. The ~100 hits estimate from my previous post on the matter likely still stands.

sandreenko

The change looks good, thanks @EgorBo.

However, I am not 100% sure that it is worth taking (with the current morph state and the lack of a separated expression transformation optimizer), @BruceForstall?

sandreenko · 2019-09-04T22:31:51Z

src/jit/utils.cpp

+    return (bits < 0x7FF0000000000000) && (bits != 0) && ((bits & 0x7FF0000000000000) != 0);
+}
+
+bool FloatingPointUtils::isNormal(float x)


I do not like these bit checks, but without C++ isnormal I do not see any better solution.
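To make the bit checks easier to follow, here is a standalone C++ sketch of the double variant. It is an illustration rather than the PR's code: `isNormalBits` and `checkIsNormalBits` are hypothetical names, and masking off the sign bit before the comparison shown in the diff is an assumption taken from how managed double.IsNormal() behaves. The three conditions reject infinities and NaN (exponent all ones), zero, and subnormals (exponent all zeros).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper mirroring the bit test quoted in the diff.
bool isNormalBits(double x)
{
    uint64_t bits = 0;
    std::memcpy(&bits, &x, sizeof(bits)); // well-defined way to read the bit pattern
    bits &= 0x7FFFFFFFFFFFFFFFull;        // assumption: sign bit is ignored
    return (bits < 0x7FF0000000000000ull)            // not infinity or NaN
           && (bits != 0)                            // not +0 or -0
           && ((bits & 0x7FF0000000000000ull) != 0); // not subnormal
}

// Hypothetical self-check: the bit test should agree with std::isnormal.
void checkIsNormalBits()
{
    const double samples[] = {1.0, -2.5, 0.0, -0.0, INFINITY, -INFINITY, NAN,
                              std::ldexp(1.0, -1074), std::ldexp(1.0, -1022)};
    for (double sample : samples)
    {
        assert(isNormalBits(sample) == std::isnormal(sample));
    }
}
```

Because the value is copied with `std::memcpy` into an integer rather than read byte by byte, the check does not depend on byte order.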

tests/src/JIT/opt/InstructionCombining/DivToMul.cs

tests/src/JIT/opt/InstructionCombining/DivToMul.csproj

briansull · 2019-09-05T17:40:38Z

For the non power of two float (32-bit) divides it would also always be a win to perform a reciprocal multiply operation using a (64-bit) multiply and conversion back to float (32-bit)

Turns out it isn't a win :-(

tannergooding · 2019-09-05T17:51:12Z

For float (32-bit) divides it would also always be a win to perform a reciprocal multiply operation using a (64-bit) multiply and conversion back to float (32-bit)

Why would you do that rather than just doing the 32-bit reciprocal multiplication? For cases where it is known to be equivalent, just keeping it as a single-precision float would be more efficient.

briansull · 2019-09-05T17:59:29Z

It was to cover the non power of two divide by constant case:

        public static float DivGen_new(float value)
        {
            for (int i = 0; i < 10; i++)
            {
                double valueD  = (double) value;
                double resultD = valueD * (1.0 / 3.0);
                value = (float) resultD;
            }
            return value;
        }

However, it turns out that the convert instructions are pretty slow, so I believe that this transformation loses:

G_M2669_IG03:		;; bbWeight=10   
       F30F5AC0             cvtss2sd  xmm0, xmm0
       F20F590500000000     mulsd    xmm0, qword ptr [reloc @RWD00]
       F20F5AC0             cvtsd2ss  xmm0, xmm0
       FFC0                 inc      eax
       83F80A               cmp      eax, 10
       7CE9                 jl       SHORT G_M2669_IG03

RWD00  dq	3FD5555555555555h

EgorBo · 2019-09-05T18:54:45Z

@briansull nice try anyway 🙂

briansull

LGTM

tannergooding · 2019-09-05T21:04:49Z

src/jit/morph.cpp

+                {
+                    oper = GT_MUL;
+                    tree->ChangeOper(oper);
+                    op2->AsDblCon()->gtDconVal = 1.0 / divisor;


It might be worth noting that this is safe and doing the single operation in single precision isn't required.

The paper Innocuous Double Rounding of Basic Arithmetic Operations provides a proof that a single primitive operation done to at least twice the precision of the target format does not incur error due to double-rounding (hence for float divisor doing (float)(1.0 / divisor) is the same as (1.0f / divisor)).

This is not a safe thing to do across multiple operations (you must downcast back to float after each individual operation) nor is it safe if one of the inputs could not be exactly represented as a float (e.g. if you have double divisor and (float)divisor != divisor).
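The single-operation claim can be spot-checked empirically. The C++ sketch below is not part of the PR: `countDoubleRoundingMismatches` and `checkDoubleRounding` are hypothetical names, it only samples integer-valued float divisors (exact up to 2^24), and it assumes IEEE 754 float and double arithmetic without fast-math flags. It compares the reciprocal computed in double and then narrowed to float against the reciprocal computed directly in float.

```cpp
#include <cassert>

// Hypothetical helper: counts float divisors d in {1, 2, ..., n} for which
// (float)(1.0 / d) differs from 1.0f / d. Requires n <= 16777216 (2^24)
// so that every divisor is exactly representable as a float.
int countDoubleRoundingMismatches(int n)
{
    int mismatches = 0;
    for (int i = 1; i <= n; i++)
    {
        const float divisor = static_cast<float>(i);
        // One operation in double precision, then a single narrowing conversion.
        const float viaDouble = static_cast<float>(1.0 / static_cast<double>(divisor));
        // The same operation performed directly in single precision.
        const float viaFloat = 1.0f / divisor;
        if (viaDouble != viaFloat)
        {
            mismatches++;
        }
    }
    return mismatches;
}

// Hypothetical self-check: no sampled divisor shows a double-rounding difference.
void checkDoubleRounding()
{
    assert(countDoubleRoundingMismatches(1000000) == 0);
}
```

This only samples one million divisors, so it illustrates the result rather than proving it; the proof is in the cited paper.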

sandreenko · 2019-09-24T05:42:49Z

Thanks @EgorBo.

EgorBo added 3 commits May 14, 2019 21:42

Replace "val / dcon" with "val * (1.0 / dcon)"

fc400d3

fix formatting issue

5c11507

replace tree->gtOp.gtOp2 with op2

40bb91a

tannergooding reviewed May 15, 2019

src/jit/morph.cpp

tannergooding reviewed May 15, 2019

src/jit/morph.cpp

mikedn reviewed May 15, 2019

src/jit/morph.cpp

mikedn reviewed May 15, 2019

src/jit/morph.cpp

mikedn reviewed May 15, 2019

src/jit/morph.cpp

EgorBo added 3 commits May 15, 2019 14:32

Address feedback

3938761

fix mantissa calculations

9594249

fix double mantissa

0aa8c6c

tannergooding reviewed May 16, 2019

src/jit/utils.h

tannergooding reviewed May 16, 2019

src/jit/utils.h

tannergooding reviewed May 16, 2019

src/jit/utils.h

EgorBo added 2 commits May 23, 2019 01:56

use frexp+isnormal

bcd0bb7

use frexp

eef975d

tannergooding reviewed May 22, 2019

src/jit/utils.cpp

mikedn reviewed May 23, 2019

src/jit/morph.cpp

mikedn reviewed May 23, 2019

src/pal/src/cruntime/math.cpp

EgorBo added 5 commits May 23, 2019 05:22

rollback frexp impl

e8d997d

Add isNormal(float/double)

aae83cb

fix exponent in isPow2(double)

868f3b9

cleanup

95542d3

Add tests

cfa0414

cleanup

59d24d8

jkotas added the area-CodeGen label May 23, 2019

CarolEidt suggested changes Jun 19, 2019

src/jit/morph.cpp

src/jit/utils.cpp

tests/src/JIT/opt/InstructionCombining/DivToMul.cs

EgorBo added 3 commits June 19, 2019 20:08

add fixed seed to Random in DivToMul (to make potential failures reproducible)

a084401

use TestLibrary.Generator.GetDouble() instead of Random

4d68539

Remove Random-based values

11fb130

EgorBo added 2 commits June 20, 2019 10:35

Update utils.cpp

a5353ed

Update utils.cpp

a1d1dcf

EgorBo closed this Jun 28, 2019

EgorBo reopened this Jun 28, 2019

Merge remote-tracking branch 'dotnet/master' into opt-div-pow2

40380f9

sandreenko approved these changes Sep 4, 2019

EgorBo added 2 commits September 5, 2019 13:45

Merge remote-tracking branch 'dotnet/master' into opt-div-pow2

d750e9f

Address feedback

312c1d0

briansull mentioned this pull request Sep 5, 2019

Implement PerfScore #26456

Merged

briansull approved these changes Sep 5, 2019

tannergooding reviewed Sep 5, 2019

sandreenko merged commit 40faef6 into dotnet:master Sep 24, 2019

This was referenced Jan 31, 2020

Configurable floating-point behavior (Fast Math) dotnet/runtime#12753

Open

Optimize Math.Max/Min(X, C) in JIT dotnet/runtime#13372

Open

tannergooding mentioned this pull request May 4, 2020

RyuJIT: x*2 -> x+x; x*1 -> x (fp) dotnet/runtime#33024

Merged
