perf(codegen): Eliminate size_of_val == 0 for DSTs with Non-zero-sized Prefix via NUW and Assume#152843
perf(codegen): Eliminate size_of_val == 0 for DSTs with Non-zero-sized Prefix via NUW and Assume#152843TKanX wants to merge 1 commit intorust-lang:mainfrom
size_of_val == 0 for DSTs with Non-zero-sized Prefix via NUW and Assume#152843Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
@rustbot label +A-LLVM +A-codegen +C-optimization +T-compiler |
|
r? codegen |
This comment has been minimized.
This comment has been minimized.
|
Reminder, once the PR becomes ready for a review, use |
a9ec27f to
8339cfe
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
@rustbot ready |
| // Alignment rounding can only increase the size, never decrease it: | ||
| // `round_up(x, a) >= x` for power-of-two `a`. With the `nuw` on the | ||
| // addition above, LLVM can therefore deduce | ||
| // `full_size >= unrounded_size >= offset`, which proves `full_size > 0` | ||
| // for types with a non-zero-sized prefix (#152788). | ||
| let size_ge = bx.icmp(IntPredicate::IntUGE, full_size, unrounded_size); | ||
| bx.assume(size_ge); |
There was a problem hiding this comment.
Can you elaborate on which things you tried and why this is the best one? Was it not enough to say that the alignment is a power-of-two? Or...
There was a problem hiding this comment.
I ask because most of the text in the OP is just useless LLM slop, and the updates to the tests make me suspicious.
There was a problem hiding this comment.
Can you elaborate on which things you tried and why this is the best one? Was it not enough to say that the alignment is a power-of-two? Or...
Tried nuw-only (unchecked_uadd) first. That gives LLVM unrounded >= offset > 0 but it stops at the rounding — LLVM can't prove (x + a-1) & -a >= x. Also checked whether feeding ctpop(align) == 1 would help, but there's no fold for "round-up is monotonic when alignment is pow2" in InstCombine/ValueTracking. So the assume tells LLVM the conclusion directly.
nsw (making it unchecked_suadd) is because unrounded ≤ rounded ≤ isize::MAX. Same reasoning as your #152867.
I ask because most of the text in the OP is just useless LLM slop, and the updates to the tests make me suspicious.
Sorry about the OP — English isn't my native language, I overwrite when trying to be precise. Will clean it up.
For the tests: CHECK-NOT: icmp broke because assume itself emits an icmp. The !range checks on the first two functions were dropped because the assume keeps the size computation alive, so there's now a size load before the alignment load — FileCheck hits the wrong one. Range metadata is still verified in align_load_from_align_of_val below. RANGE_META → ALIGN_RANGE since it only covers alignment loads now. Range value {1, 0} → {1, 0x20000001} is Align::max_for_target (same change as #152929).
Happy to close this if you'd rather land it as part of #152867.
There was a problem hiding this comment.
Landing this separately is great -- I opened the issue because this particular bit about what LLVM can prove is different enough from the point of layout_of_val that it's better to have the changes separated. (That's why I pulled out #152929 too 🙂 )
There was a problem hiding this comment.
Hmm, yeah, I experimented a bit https://llvm.godbolt.org/z/haGYz7aax and even getting lots of annotations on everything and assume it's still not able to understand what's happening properly.
(Also it's so annoying to see add nsw i64 %4, -1 since that used to be sub nuw nsw i64 %4, 1 but LLVM just insists on throwing that information away.)
|
r? scottmcm |
| // CHECK: load [[USIZE:i[0-9]+]], {{.+}} !range [[RANGE_META:![0-9]+]] | ||
| // CHECK: load [[USIZE:i[0-9]+]] | ||
| // CHECK-NOT: llvm.umax | ||
| // CHECK-NOT: icmp | ||
| // CHECK-NOT: select | ||
| // CHECK: ret |
There was a problem hiding this comment.
So the problem here is that if this was testing for "not icmp", just removing that check means this test is (potentially) no longer testing what it was trying to test before.
If there's an icmp now, probably what you want instead is something like
// CHECK-NOT: llvm.umax
// CHECK-NOT: icmp
// CHECK-NOT: select
// CHECK: [[DOES_NOT_SHRINK:%.+]] = icmp ... something here ...
// CHECK-NEXT: call void @llvm.assume(i1 [[DOES_NOT_SHRINK]])
// CHECK-NOT: llvm.umax
// CHECK-NOT: icmp
// CHECK-NOT: select
so that the test is that the only icmp is the expected one that's used for the assume.
Similarly, why remove the !range check? It's not being optimized out, is it? (If it is, that's also interesting.)
There was a problem hiding this comment.
Checked the emitted IR — the assume (and the entire size computation) gets DCE'd in these two functions at -O3, since they only need alignment for the field projection. So there's no extra icmp at all, and the alignment load is still the first one with !range. Restored the original patterns as-is; the file is now unchanged from main.
There was a problem hiding this comment.
This file is no longer unchanged, so this comment applies again.
@rustbot author
There was a problem hiding this comment.
currently failing on LLVM20, i added min-llvm-version: 21 thinking the ret i1 false fold would work there, but that was just an assumption. on LLVM20 the add nuw nsw + assume(icmp uge ...) survive in IR and can be checked directly. should i rewrite the test to check the emission side instead?
There was a problem hiding this comment.
I would suggest you survey what testing exists for it and what the intent of the various tests are.
In general, it's fine to limit desirable optimization tests to latest LLVM only, since that's what we ship and if people are using older LLVM then it's at least someone expected that things will optimize less well.
On the other hand, if it's "we're testing what rustc is doing" tests, then those should generally continue to pass on older LLVM because we don't want rustc to break on older LLVMs.
|
Hmm, why would aarch64 do anything different here? The codegen-llvm tests are running only the middle-end of llvm, not the backend, so it shouldn't matter... |
This comment has been minimized.
This comment has been minimized.
You're right — the architecture has nothing to do with it. I was testing locally against LLVM 22, which DCEs the assume entirely. I verified this just now: same unoptimized IR through |
|
Finished benchmarking commit (3cf407e): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary -0.8%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 483.037s -> 479.541s (-0.72%) |
|
Also, please cleanup the history here once you've addressed the comment above. Squashing all into one commit is fine, as is separating out into 2 meaningful ones if you prefer that, but we don't need 6 commits going back and forth. |
45b1d74 to
5423cd4
Compare
|
@rustbot ready |
This comment has been minimized.
This comment has been minimized.
5423cd4 to
a184e09
Compare
This comment has been minimized.
This comment has been minimized.
a184e09 to
c45ca83
Compare
This comment has been minimized.
This comment has been minimized.
c45ca83 to
7f42ac4
Compare
This comment has been minimized.
This comment has been minimized.
|
I would suggest you survey what testing exists for it and what the intent of the various tests are. In general, it's fine to limit desirable optimization tests to latest LLVM only, since that's what we ship and if people are using older LLVM then it's at least someone expected that things will optimize less well. On the other hand, if it's "we're testing what rustc is doing" tests, then those should generally continue to pass on older LLVM because we don't want rustc to break on older LLVMs. Another thing you could try would be whether |
7f42ac4 to
e5e8c07
Compare
…assume Co-authored-by: Scott McMurray <scottmcm@users.noreply.github.com>
e5e8c07 to
5eec5e3
Compare
View all comments
Summary:
Problem:
size_of_val(p) == 0fails to optimize away for DST types that have a statically-known non-zero-sized prefix:Foohas a 12-byte prefix, so its total size is always ≥ 12. Yet the comparison persists as a runtime computation in LLVM IR. This matters becauseBox<dyn T>drop emits this exact check to guard the deallocation call — for types with a guaranteed non-zero prefix, the branch should vanish but doesn't.The slice tail variant
Foo<[i32]>already optimized correctly;Foo<dyn Trait>andFoo<[u8]>did not.Root Cause:
In
size_and_align_of_dst(the ADT/Tuple branch), the size computation is:LLVM cannot prove
full_size > 0because:offset + unsized_sizeused plainadd— no overflow flags, so LLVM cannot conclude the result is ≥offset.(x + addend) & -align— LLVM has no fold to prove that alignment rounding never reduces the value belowx.Solution:
Two changes:
add nuw nswonoffset + unsized_size— the sum is bounded by the rounded size ≤isize::MAX, so neither signed nor unsigned overflow is possible. Tells LLVM:unrounded_size ≥ offset.assume(full_size ≥ unrounded_size)—round_up(x, a) ≥ xis a mathematical identity for power-of-twoa. Tells LLVM:full_size ≥ unrounded_size ≥ offset. Ifoffset > 0, the chain provesfull_size > 0.LLVM IR Comparison:
Foo<dyn Debug>— before (godbolt):Foo<dyn Debug>— after:Foo<[u8]>— before:Foo<[u8]>— after:Changes:
compiler/rustc_codegen_ssa/src/size_of_val.rs:add→unchecked_suadd(NUW+NSW) onoffset + unsized_size; addassume(full_size ≥ unrounded_size).tests/codegen-llvm/dst-size-of-val-not-zst.rs: new codegen test verifyingsize_of_val == 0folds toret i1 falseforFoo<dyn Debug>,Foo<[u8]>, andFoo<[i32]>.Fixes #152788.