Cranelift: fix branch-of-icmp/fcmp regression: look through uextend.#5487
Merged
cfallin merged 1 commit intobytecodealliance:mainfrom Dec 22, 2022
Merged
Conversation
elliottt
approved these changes
Dec 22, 2022
Member
elliottt
left a comment
There was a problem hiding this comment.
This fix looks good to me! It's unfortunate that we can't implement maybe_uextend in ISLE currently.
Separately, it seems very reasonable to switch to having comparisons produce an i32 result, given that our primary consumer is wasm. As you mentioned, we can avoid materializing it in many cases, and it would cut out some unnecessary uextend instructions when we're compiling wasm.
In bytecodealliance#5031, we removed `bool` types from CLIF, using integers instead for "truthy" values. This greatly simplified the IR, and was generally an improvement. However, because x86's `SETcc` instruction sets only the low 8 bits of a register, we chose to use `i8` types as the result of `icmp` and `fcmp`, to avoid the need for a masking operation when materializing the result. Unfortunately this means that uses of truthy values often now have `uextend` operations, especially when coming from Wasm (where truthy values are naturally `i32`-typed). For example, where we previously had `(brz (icmp ...))`, we now have `(brz (uextend (icmp ...)))`. It's arguable whether or not we should switch to `i32` truthy values -- in most cases we can avoid materializing a value that's immediately used for a branch or select, so a mask would in most cases be unnecessary, and it would be a win at the IR level -- but irrespective of that, this change *did* regress our generated code quality: our backends had patterns for e.g. `(brz (icmp ...))` but not with the `uextend`, so we were *always* materializing truthy values. Many blocks thus ended with "cmp; setcc; cmp; test; branch" rather than "cmp; branch". In bytecodealliance#5391 we noticed this and fixed it on x64, but it was a general problem on aarch64 and riscv64 as well. This PR introduces a `maybe_uextend` extractor that "looks through" uextends, and uses it where we consume truthy values, thus fixing the regression. This PR also adds compile filetests to ensure we don't regress again. The riscv64 backend has not been updated here because doing so appears to trigger another issue in its branch handling; fixing that is TBD.
4622d4c to
6cde468
Compare
Member
Author
|
Reverted the riscv64 changes as there seems to be a deeper issue uncovered (branch in the middle of a basic block?). Someone can look at that separately but for now I'll merge the remainder of the PR so we have the aarch64 fixes in. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In #5031, we removed
booltypes from CLIF, using integers instead for "truthy" values. This greatly simplified the IR, and was generally an improvement.However, because x86's
SETccinstruction sets only the low 8 bits of a register, we chose to usei8types as the result oficmpandfcmp, to avoid the need for a masking operation when materializing the result.Unfortunately this means that uses of truthy values often now have
uextendoperations, especially when coming from Wasm (where truthy values are naturallyi32-typed). For example, where we previously had(brz (icmp ...)), we now have(brz (uextend (icmp ...))).It's arguable whether or not we should switch to
i32truthy values -- in most cases we can avoid materializing a value that's immediately used for a branch or select, so a mask would in most cases be unnecessary, and it would be a win at the IR level -- but irrespective of that, this change did regress our generated code quality: our backends had patterns for e.g.(brz (icmp ...))but not with theuextend, so we were always materializing truthy values. Many blocks thus ended with "cmp; setcc; cmp; test; branch" rather than "cmp; branch".In #5391 we noticed this and fixed it on x64, but it was a general problem on aarch64 and riscv64 as well. This PR introduces a
maybe_uextendextractor that "looks through" uextends, and uses it where we consume truthy values, thus fixing the regression. This PR also adds compile filetests to ensure we don't regress again.