Skip to content

Add loop related optimizations to LLVM JIT#16436

Merged
marek-safar merged 6 commits intomono:masterfrom
EgorBo:improve-fpm
Sep 24, 2019
Merged

Add loop related optimizations to LLVM JIT#16436
marek-safar merged 6 commits intomono:masterfrom
EgorBo:improve-fpm

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Aug 23, 2019

Example:

[MethodImpl(MethodImplOptions.NoInlining)]
static long Test(long x, bool condition)
{
    for (int i=10; i<100000; i+=2)
    {
        if (condition)
            x++;
    }
    return x;
}

Before my change:

LLVM IR:

define i64 @"Program:Test (long,bool)"(i64 %arg_x, i32 %arg_condition) {
BB0:
  br label %BB3
BB3:                                              ; preds = %BB0
  br label %BB2
BB2:                                              ; preds = %BB3
  br label %BB4
BB4:                                              ; preds = %BB6, %BB2
  %0 = phi i32 [ 10, %BB2 ], [ %t37, %BB6 ]
  %1 = phi i64 [ %arg_x, %BB2 ], [ %4, %BB6 ]
  %2 = icmp slt i32 %0, 100000
  br i1 %2, label %BB5, label %BB8
BB5:                                              ; preds = %BB4
  %3 = icmp eq i32 %arg_condition, 0
  br i1 %3, label %BB6, label %BB7
BB8:                                              ; preds = %BB4
  br label %BB1
BB6:                                              ; preds = %BB7, %BB5
  %4 = phi i64 [ %1, %BB5 ], [ %t35, %BB7 ]
  %t37 = add i32 %0, 2
  br label %BB4
BB7:                                              ; preds = %BB5
  %t35 = add i64 %1, 1
  br label %BB6
BB1:                                              ; preds = %BB8
  ret i64 %1
}

ASM:

  mov eax, 10
  cmp eax, 99999
  jg .LBB0_3
.LBB0_2: 
  cmp esi, 1
  sbb rdi, -1
  add eax, 2
  cmp eax, 99999
  jle .LBB0_2
.LBB0_3: 
  mov rax, rdi
  ret

After my change:

LLVM IR:

define i64 @"Program:Test (long,bool)"(i64 %arg_x, i32 %arg_condition) {
  %0 = icmp eq i32 %arg_condition, 0
  %1 = select i1 %0, i64 0, i64 49995
  %2 = add i64 %1, %arg_x
  ret i64 %2
}

ASM:

  xor ecx, ecx
  test esi, esi
  mov eax, 49995
  cmove rax, rcx
  add rax, rdi
  ret

I spent some time analyzing the "Before/After" diffs generated by opt -O1 for our IR: https://godbolt.org/z/vRCpS3 in order to compose this set.
Some dotnet/performance benchmarks use such loops (with a constant)

Diffs:

-simplifycfg diff
-sroa (no changes)
-early-cse (no changes)
-instcombine diff
-correlated-propagation diff
-instcombine diff
-loop-simplify (no changes)
-lcssa diff
-loop-rotate diff
-licm diff
-instcombine diff
-lcssa diff
-indvars diff
-loop-deletion diff
-instcombine diff
(TODO: add -simplifycfg at the end to remove that redundant BB0 after previous pass) - diff

// FIXME: find optimal mono specific order of passes
// see https://llvm.org/docs/Frontend/PerformanceTips.html#pass-ordering
opts = " -simplifycfg -sroa -instcombine -gvn";
opts = " -simplifycfg -sroa -early-cse -instcombine -correlated-propagation -instcombine -loop-simplify -lcssa -loop-rotate -licm -instcombine -lcssa -indvars -loop-deletion -instcombine -simplifycfg";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: simplifycfg is there twice

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@filipnavara the last -simplifycfg removes a redundant block generated by previous pass (instcombine). The first -simplifycfg removes redundant blocks generated by mono 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EgorBo Maybe it's worth adding a comment in the code so nobody removes them in the future?

@EgorBo
Copy link
Member Author

EgorBo commented Aug 26, 2019

It seems we should also add the Lower 'expect' Intrinsics pass (to convert our llvm.expect calls into weights)

Copy link
Contributor

@Therzok Therzok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@lewurm
Copy link
Contributor

lewurm commented Aug 29, 2019

@monojenkins build failed

@EgorBo
Copy link
Member Author

EgorBo commented Aug 29, 2019

I am not 100% in this set yet, need to run the benchmarks and diffs one more time to make sure it doesn't regress anything. (however some passes are indeed should be added)

@lewing
Copy link
Member

lewing commented Sep 3, 2019

Looks like this needs a conflict resolved

@marek-safar
Copy link
Member

@EgorBo what is the plan with this PR?

@EgorBo
Copy link
Member Author

EgorBo commented Sep 9, 2019

@marek-safar yeah will update it soon, this one have two mistakes:
-early-cse for some reason slows down benchmarks - need to figure out why
-ipsccp should be replaced with -sccp (ipsccp is interproc)

@lewing
Copy link
Member

lewing commented Sep 16, 2019

Checking in again.

@EgorBo EgorBo requested a review from imhameed as a code owner September 19, 2019 01:12
@EgorBo
Copy link
Member Author

EgorBo commented Sep 19, 2019

Just updated the pass list.
My algorithm:
Run a benchmark in LLVM AOT mode with -print-before-all, -print-after-all and look for passes with useful changes -- make a list of those (some common sequence):

Benchmarks (the new pass order is a baseline):
image

(benchmarks: https://github.com/nxrighthere/BurstBenchmarks)

dotnet/benchmarks: mostly positive effect, a few 5-10% regressions:
image

@EgorBo
Copy link
Member Author

EgorBo commented Sep 19, 2019

@monojenkins build failed

@marek-safar marek-safar merged commit 8f85967 into mono:master Sep 24, 2019
ManickaP pushed a commit to ManickaP/runtime that referenced this pull request Jan 20, 2020
* Add loop-related optimizations

* Update llvm-jit.cpp

* Update llvm-jit.cpp

* Update llvm-jit.cpp


Commit migrated from mono/mono@8f85967
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants