Skip to content

Lower Or pattern without allocating place#111752

Merged
bors merged 5 commits intorust-lang:masterfrom
dingxiangfei2009:lower-or-pattern
Sep 1, 2023
Merged

Lower Or pattern without allocating place#111752
bors merged 5 commits intorust-lang:masterfrom
dingxiangfei2009:lower-or-pattern

Conversation

@dingxiangfei2009
Copy link
Contributor

@dingxiangfei2009 dingxiangfei2009 commented May 19, 2023

cc @azizghuloum @cjgillot

Related to #111583 and #111644

While reviewing #111644, it occurs to me that while we directly lower conjunctive predicates, which are connected with &&, into the desirable control flow, today we don't directly lower the disjunctive predicates, which are connected with ||, in the similar fashion. Instead, we allocate a place for the boolean temporary to hold the result of evaluating the || expression.

Usually I would expect optimization at later stages to "inline" the evaluation of boolean predicates into simple CFG, but #111583 is an example where && is failing to be optimized away and the assembly shows that both the expensive operands are evaluated. Therefore, I would like to make a small change to make the CFG a bit more straight-forward without invoking the as_temp machinery, and plus avoid allocating the place to hold the boolean result as well.

@rustbot
Copy link
Collaborator

rustbot commented May 19, 2023

r? @davidtwco

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 19, 2023
@rust-log-analyzer

This comment has been minimized.

@azizghuloum
Copy link
Contributor

@dingxiangfei2009 was going to come after || once done with the && case. There are also the ! and if cases that probably needlessly allocate as well. One step at a time I guess.

@dingxiangfei2009
Copy link
Contributor Author

@azizghuloum Okay, I will do the Not case too.

Copy link
Contributor

@cjgillot cjgillot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a few tests for this? For instance a mir-opt/built test to show the resulting MIR with temporaries to drop?
If you can, make the test a preceding commit, so we can see the diff with this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use the occasion to add a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do.

@rust-log-analyzer

This comment has been minimized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the ExprKind::And, ExprKind::Use, ExprKind::Not cases too?
And also a complex one with a let condition somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I will also incorprate the other one from #111644.

@dingxiangfei2009 dingxiangfei2009 marked this pull request as draft June 5, 2023 20:10
@dingxiangfei2009
Copy link
Contributor Author

I am trying another way to reduce the basic block count.

@dingxiangfei2009 dingxiangfei2009 force-pushed the lower-or-pattern branch 3 times, most recently from 48643ab to 268c712 Compare June 13, 2023 06:00
@rust-log-analyzer

This comment has been minimized.

@dingxiangfei2009
Copy link
Contributor Author

cc @cjgillot Good day! I am going to drop the draft status on this PR and requesting for another review.

The majority of excessive block production is due to scope maintenance. To reduce this, this new patch sees a bit of overhaul of the lowering process. To assist construction of scope maintenance, a helper ThenElseBreakPads is introduced, which keeps a stack of "landing pads" to be connected after exiting the scopes. These "landing pads" are the blocks at the tails of a chain of blocks produced in order to break out one or more scopes by dropping temps and storages. With this helper, we "recycle" clean blocks as much as we can and avoid creation of simple "goto" blocks if exiting some scopes are no-ops.

As a result, for the following Rust program,

if Droppy(0).0 > 0 || Droppy(1).0 > 1 {}

... we have this control flow in MIR, visualised.

g

I have to confess, this may not be the best way to do it. Any hint or pointer is very much welcome. 🙇

@dingxiangfei2009 dingxiangfei2009 marked this pull request as ready for review June 13, 2023 22:46
@bors
Copy link
Collaborator

bors commented Jun 14, 2023

☔ The latest upstream changes (presumably #112418) made this pull request unmergeable. Please resolve the merge conflicts.

@cjgillot
Copy link
Contributor

You'll need to bless coverage tests too.
r=me once that is done
@bors delegate+

@bors
Copy link
Collaborator

bors commented Aug 30, 2023

✌️ @dingxiangfei2009, you can now approve this pull request!

If @cjgillot told you to "r=me" after making some further change, please make that change, then do @bors r=@cjgillot

@dingxiangfei2009
Copy link
Contributor Author

@bors r=@cjgillot

I figured out the way to run the full test suite this time. Hopefully this would work.

@bors
Copy link
Collaborator

bors commented Aug 30, 2023

📌 Commit 97cf7eb234b9aa55cafe964e1f2ccfaa1c0e4771 has been approved by cjgillot

It is now in the queue for this repository.

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 30, 2023
@bors
Copy link
Collaborator

bors commented Aug 30, 2023

⌛ Testing commit 97cf7eb234b9aa55cafe964e1f2ccfaa1c0e4771 with merge e266e12b66468989a379299a29d9bc7cb0ab050f...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Collaborator

bors commented Aug 30, 2023

💔 Test failed - checks-actions

@cjgillot
Copy link
Contributor

cjgillot commented Sep 1, 2023

@bors r+

@bors
Copy link
Collaborator

bors commented Sep 1, 2023

📌 Commit 67553e8 has been approved by cjgillot

It is now in the queue for this repository.

@bors
Copy link
Collaborator

bors commented Sep 1, 2023

⌛ Testing commit 67553e8 with merge f4555ef...

@bors
Copy link
Collaborator

bors commented Sep 1, 2023

☀️ Test successful - checks-actions
Approved by: cjgillot
Pushing f4555ef to master...

Comment on lines 34 to 37
LL| | if
LL| 1| !
LL| | !
LL| 1| is_true
LL| 0| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a coverage perspective this seems wrong; the ! operator should have the same execution count as its operand.

However, at a glance I can't say whether this PR is actually doing something wrong, or just exposing a latent issue in the coverage instrumentor's heuristics.

(I'm not particularly concerned about this being a problem in practice; I just want to explicitly note it as something for me to keep my eye on.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be the case that because ! in if condition is now directly lowered into switchInts, by-passing the operator evaluation, the profiler might not pick it up. In other contexts, such as boolean expressions, ! should still show up I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. So if !cond { a } else { b } gets lowered as though it were if cond { b } else { a }. The ! doesn’t exist in MIR, and the coverage pass doesn’t have a special case to detect this, so it isn’t included in the coverage region.

This seems acceptable for now. The mappings are slightly worse, but they aren’t lying, and most coverage users probably won’t notice the difference.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (f4555ef): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.4% [0.2%, 1.0%] 8
Regressions ❌
(secondary)
0.9% [0.5%, 1.6%] 21
Improvements ✅
(primary)
-1.0% [-3.7%, -0.3%] 10
Improvements ✅
(secondary)
-0.4% [-0.5%, -0.3%] 3
All ❌✅ (primary) -0.3% [-3.7%, 1.0%] 18

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
4.6% [1.3%, 11.6%] 4
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-3.4% [-7.5%, -0.1%] 7
Improvements ✅
(secondary)
-2.8% [-3.5%, -2.5%] 3
All ❌✅ (primary) -0.5% [-7.5%, 11.6%] 11

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.4% [1.4%, 1.4%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.6% [-2.7%, -0.5%] 2
Improvements ✅
(secondary)
-2.3% [-2.4%, -2.2%] 3
All ❌✅ (primary) -0.6% [-2.7%, 1.4%] 3

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.0%, 0.4%] 3
Regressions ❌
(secondary)
0.3% [0.1%, 1.3%] 13
Improvements ✅
(primary)
-0.5% [-2.3%, -0.0%] 102
Improvements ✅
(secondary)
-0.4% [-0.7%, -0.0%] 17
All ❌✅ (primary) -0.5% [-2.3%, 0.4%] 105

Bootstrap: 629.909s -> 631.77s (0.30%)
Artifact size: 316.39 MiB -> 316.73 MiB (0.11%)

@dingxiangfei2009
Copy link
Contributor Author

dingxiangfei2009 commented Sep 3, 2023

I also owe here an explanation about a change in codegen test slice-as_chunks.rs, which expects lshr exact or udiv exact call in the LLVM IR.

It turns out that because the emitted instructions are re-arranged, leading to the optimizer to recognize the udiv exact .. 4 instruction as a target for common expression elimination with the first udiv ... 4. When InstCombine is run, udiv instructions are rewritten with bit operations and a lshr, but unfortunately not a lshr exact.

The IR related to this issue is attached here.

@nnethercote
Copy link
Contributor

Performance-wise, icounts/cycles/wall-times have a mix of wins and losses that more or less balance out. But binary size has lots of wins, which is nice.

@pnkfelix
Copy link
Contributor

pnkfelix commented Sep 6, 2023

Performance-wise, icounts/cycles/wall-times have a mix of wins and losses that more or less balance out. But binary size has lots of wins, which is nice.

@rustbot label: +perf-regression-triaged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. merged-by-bors This PR was explicitly merged by bors. needs-fcp This change is insta-stable, or significant enough to need a team FCP to proceed. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-lang Relevant to the language team

Projects

None yet

Development

Successfully merging this pull request may close these issues.