-
-
Notifications
You must be signed in to change notification settings - Fork 14.3k
Optimize checked_ilog and pow when base is a power of two
#147250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Optimize checked_ilog and pow when base is a power of two
#147250
Conversation
8f19ce6 to
e5ca8cd
Compare
This comment has been minimized.
This comment has been minimized.
|
does this affect the codegen in practical circumstances? I would expect even a fairly weak optimizer to perform this optimization, making us hand-coding it irrelevant and potentially even harmful because we introduce more conditional logic to chew through. |
checked_ilog when base is a power of twochecked_ilog and pow when base is a power of two
It replaces a loop by some bit manipulations. Whether anyone actually calls either function with a compile-time known, power of 2 base is another question. But it can't hurt, since it is guarded by
No, in either function, LLVM is not able to see that the loop is performing a log/pow and apply the identities: https://godbolt.org/z/6vMsxc9Kh
They're guarded by |
2faa397 to
abb7f32
Compare
...then I'm kinda surprised! Nice catch. |
Would be good to have codegen tests to demonstrate what this is doing -- especially since that way there's a way for people to try removing the special cases later if they think that LLVM no longer needs them. |
abb7f32 to
9ab0f63
Compare
9ab0f63 to
1d0ac82
Compare
|
thanks for the codegen tests! |
1d0ac82 to
c84b99e
Compare
|
@rustbot ready |
|
@scottmcm ping? |
| #[no_mangle] | ||
| pub fn checked_ilog16(val: u32) -> Option<u32> { | ||
| // CHECK: %[[ICMP:.+]] = icmp ne i32 %val, 0 | ||
| // CHECK: %[[CTZ:.+]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 %val, i1 true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't really this PR's problem, but consider filing an LLVM bug (assuming it's also true in trunk) that this range is wider than necessary -- we're passing true for is_zero_poison https://llvm.org/docs/LangRef.html#llvm-ctlz-intrinsic so the range here should be range(i32 0, 32).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The range attribute is too large, but it seems LLVM is still able to propagate the knowledge:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here is looking good, but can you make sure there's normal runtime behaviour tests for it too? Notably, after the base conversation (that's fixed in the code) it made me think that that's not currently covered by any tests, so we should have something -- maybe just add to the # Examples that it returns None for base zero or one? (They seem like perfectly reasonable and helpful examples, in addition to giving coverage for this stuff.)
And maybe add some should_panic tests to ensure that the overflow checking is still correct for pow when overflow checks are enabled?
|
@scottmcm ping? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have concerns about the correctness of this; see above.
|
Reminder, once the PR becomes ready for a review, use |
53449c9 to
a3a4b22
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
be3ec71 to
5251791
Compare
This comment has been minimized.
This comment has been minimized.
e8a603e to
c2f36c5
Compare
This comment has been minimized.
This comment has been minimized.
if base == 2 ** k, then log(base, n) == log(2, n) / k
c2f36c5 to
2f902f7
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
This comment has been minimized.
This comment has been minimized.
Increase test coverage to check all interesting edge cases and all variants.
`strict_pow` can be implemented in terms of `checked_pow`, `wrapping_pow` can be implemented in terms of `overflowing_pow`, and `pow` can be implemented in terms of `strict_pow` or `wrapping_pow`.
Copy the optimization that unrolls the loop from `pow` to `checked_pow` and `overflowing_pow`.
2f902f7 to
659e612
Compare
if base == 2 ** k, then (2 ** k) ** n == 2 ** (k * n) == 1 << (k * n)
659e612 to
b1b3348
Compare
|
@rustbot ready |
| if intrinsics::is_val_statically_known(exp) { | ||
| while exp > 1 { | ||
| if (exp & 1) == 1 { | ||
| (acc, tmp_overflow) = acc.overflowing_mul(base); | ||
| overflow |= tmp_overflow; | ||
| } | ||
| exp /= 2; | ||
| (base, tmp_overflow) = base.overflowing_mul(base); | ||
| overflow |= tmp_overflow; | ||
| } | ||
|
|
||
| // since exp!=0, finally the exp must be 1. | ||
| // Deal with the final bit of the exponent separately, since | ||
| // squaring the base afterwards is not necessary and may cause a | ||
| // needless overflow. | ||
| (acc, tmp_overflow) = acc.overflowing_mul(base); | ||
| overflow |= tmp_overflow; | ||
| return (acc, overflow); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, here's an idea: If either input is statically known, checking for overflow could be folded into a single range check on the other input. Doing so would allow all of the variants to delegate to wrapping_pow.
Quick test that this would help:
https://rust.godbolt.org/z/oEWWGx4f9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. That should be easy enough when base is known: overflow = exp > $T::MAX.ilog(base). If the exp is known, that would be harder: overflow = base > $T::MAX.nth_root(exp), but we don't have a way of calculating the nth root of an integer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For any exp >= T::BITS, the condition is just base > 1, which means that a pre-computed array of length T::BITS would be sufficient. Computing them at compile time should be reasonable, but it could be too much additional complexity to this PR.
You could maybe factor out a helper like
const fn statically_cheap_overflow_condition(base, exp) -> Option<bool> {
if intrinsics::is_val_statically_known(base) {
Some(exp > u64::MAX.ilog(base))
} else {
None
}
}The known exp case would be easy to add later.
Callsites can then do
if let Some(overflow) = statically_cheap_overflow_condition(base, exp) {
// delegate to wrapping_pow where necessary, as overflow is already known
} else {
// the usual runtime thing
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although, I suppose it's really the known exponent cases that would benefit the most by removing the need for having the LLVM-unrollable blocks duplicated for each variant.
In case it's of use, here's the code I tested with: https://godbolt.org/z/5svnKM7hM
(Includes computing the roots for unsigned types. Signed would have some additional complications)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PR is getting complicated enough already. Let's get this merged first, then we can further optimise in a follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PR is getting complicated enough already. Let's get this merged first, then we can further optimise in a follow up PR
Optimize
checked_ilogandpowwhen the base is a power of two