perf(codegen): Eliminate `size_of_val == 0` for DSTs with Non-zero-sized Prefix via NUW and Assume by TKanX · Pull Request #152843 · rust-lang/rust

TKanX · 2026-02-19T11:40:00Z

Summary:

Problem:

The check size_of_val(p) == 0 is not optimized away for dynamically sized types (DSTs) that have a statically known non-zero-sized prefix:

pub struct Foo<T: ?Sized>(pub [u32; 3], pub T);

pub fn demo(p: &Foo<dyn std::fmt::Debug>) -> bool {
    std::mem::size_of_val(p) == 0  // always false, but LLVM can't prove it
}

Foo has a 12-byte prefix, so its total size is always ≥ 12. Yet the comparison persists as a runtime computation in LLVM IR. This matters because Box<dyn T> drop emits this exact check to guard the deallocation call — for types with a guaranteed non-zero prefix, the branch should vanish but doesn't.

The slice tail variant Foo<[i32]> already optimized correctly; Foo<dyn Trait> and Foo<[u8]> did not.
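To make the claim concrete, here is a minimal sketch that measures a `Foo<dyn Debug>` at runtime. The helper name `erased_size` is hypothetical and is not part of the PR. It only illustrates that the 12-byte `[u32; 3]` prefix puts a floor under the size, whatever the erased tail is.

```rust
use std::fmt::Debug;
use std::mem::size_of_val;

pub struct Foo<T: ?Sized>(pub [u32; 3], pub T);

/// Hypothetical helper: builds a `Foo<T>`, erases the tail to `dyn Debug`,
/// and returns the size computed at runtime from the vtable.
pub fn erased_size<T: Debug + 'static>(tail: T) -> usize {
    let concrete = Foo([0u32; 3], tail);
    // Unsizing coercion: &Foo<T> -> &Foo<dyn Debug>.
    let erased: &Foo<dyn Debug> = &concrete;
    size_of_val(erased)
}
```

Even a zero-sized tail such as `()` leaves the prefix in place, so `erased_size(())` is 12 and never 0.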

Root Cause:

In size_and_align_of_dst (the ADT/Tuple branch), the size computation is:

full_size = (offset + unsized_size + (align-1)) & -align

LLVM cannot prove full_size > 0 because:

1. offset + unsized_size used a plain add with no overflow flags, so LLVM cannot conclude the result is ≥ offset.
2. (x + addend) & -align: LLVM has no fold to prove that alignment rounding never reduces the value below x.
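The two gaps above can be reproduced with ordinary wrapping arithmetic. The sketch below is an illustration, not the rustc code: `full_size_wrapping` is a hypothetical name, and it uses `wrapping_add` to model an LLVM `add` without overflow flags.

```rust
/// Hypothetical model of the size formula with no overflow guarantees:
/// full_size = (offset + unsized_size + (align - 1)) & -align
/// `align` is assumed to be a non-zero power of two.
pub fn full_size_wrapping(offset: usize, unsized_size: usize, align: usize) -> usize {
    // A plain `add` may wrap, so the sum is not known to be >= offset.
    let unrounded = offset.wrapping_add(unsized_size);
    // Adding `align - 1` may wrap too, so rounding is not known to be >= unrounded.
    unrounded.wrapping_add(align - 1) & align.wrapping_neg()
}
```

For in-range inputs the formula behaves as expected, for example `full_size_wrapping(12, 5, 4)` is 20. If wrapping is allowed, a huge `unsized_size` such as `usize::MAX - 11` makes the sum wrap to 0 and the whole result becomes 0. That is the case LLVM has to rule out before it can fold the comparison.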

Solution:

Two changes:

1. add nuw nsw on offset + unsized_size: the sum is bounded by the rounded size, which is ≤ isize::MAX, so neither signed nor unsigned overflow is possible. This tells LLVM that unrounded_size ≥ offset.
2. assume(full_size ≥ unrounded_size): round_up(x, a) ≥ x holds for any power-of-two a as long as the rounding itself does not overflow. This tells LLVM that full_size ≥ unrounded_size ≥ offset, so if offset > 0 the chain proves full_size > 0.
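A source-level analogue of the two changes can be sketched in Rust. This is not the code in size_of_val.rs, which emits the flags and the assume through the codegen builder. The function names here are hypothetical, `usize::unchecked_add` only corresponds to the nuw half of the flags, and no claim is made about what LLVM does with this particular snippet.

```rust
use std::hint::assert_unchecked;

/// Hypothetical analogue of the patched size computation.
///
/// # Safety
/// `align` must be a non-zero power of two, and `offset + unsized_size`
/// rounded up to `align` must not exceed `isize::MAX`.
pub unsafe fn full_size_hinted(offset: usize, unsized_size: usize, align: usize) -> usize {
    // Analogue of `add nuw`: the caller promises the sum does not wrap,
    // so unrounded >= offset.
    let unrounded = unsafe { offset.unchecked_add(unsized_size) };
    let full = unrounded.wrapping_add(align - 1) & align.wrapping_neg();
    // Analogue of `assume(full_size >= unrounded_size)`.
    unsafe { assert_unchecked(full >= unrounded) };
    full
}

/// Brute-force check of the fact behind the assume: for power-of-two
/// alignments and small `x`, rounding up never yields less than `x`.
pub fn round_up_never_shrinks(limit: usize) -> bool {
    (0..=limit).all(|x| {
        (0..8u32).all(|shift| {
            let align = 1usize << shift;
            let rounded = (x + (align - 1)) & align.wrapping_neg();
            rounded >= x && rounded % align == 0
        })
    })
}
```

The brute-force helper covers only small values. It is a sanity check on the inequality, not a proof, and the safety contract on `full_size_hinted` is what excludes the overflowing cases.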

LLVM IR Comparison:

Foo<dyn Debug> — before (godbolt):

define noundef zeroext i1 @demo(ptr %p.0, ptr %p.1) {
start:
  %0 = getelementptr inbounds nuw i8, ptr %p.1, i64 8
  %1 = load i64, ptr %0, align 8, !range !3, !invariant.load !4
  %2 = getelementptr inbounds nuw i8, ptr %p.1, i64 16
  %3 = load i64, ptr %2, align 8, !range !5, !invariant.load !4
  %4 = tail call i64 @llvm.umax.i64(i64 %3, i64 4)
  %5 = add nuw i64 %1, 11
  %6 = add i64 %5, %4
  %7 = sub i64 0, %4
  %8 = and i64 %6, %7
  %_0 = icmp eq i64 %8, 0
  ret i1 %_0
}

Foo<dyn Debug> — after:

define noundef zeroext i1 @demo(ptr %p.0, ptr %p.1) {
start:
  ret i1 false
}

Foo<[u8]> — before:

define noundef zeroext i1 @demo_lessalignedslice(ptr %p.0, i64 %p.1) {
start:
  %0 = add i64 %p.1, 15
  %_0 = icmp ult i64 %0, 4
  ret i1 %_0
}

Foo<[u8]> — after:

define noundef zeroext i1 @demo_lessalignedslice(ptr %p.0, i64 %p.1) {
start:
  ret i1 false
}

Changes:

compiler/rustc_codegen_ssa/src/size_of_val.rs: add → unchecked_suadd (NUW+NSW) on offset + unsized_size; add assume(full_size ≥ unrounded_size).
tests/codegen-llvm/dst-size-of-val-not-zst.rs: new codegen test verifying size_of_val == 0 folds to ret i1 false for Foo<dyn Debug>, Foo<[u8]>, and Foo<[i32]>.

Fixes #152788.

TKanX · 2026-02-20T19:33:41Z

@rustbot label +A-LLVM +A-codegen +C-optimization +T-compiler

fmease · 2026-02-21T22:14:41Z

r? codegen

compiler/rustc_codegen_ssa/src/size_of_val.rs

rustbot · 2026-02-22T00:16:08Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

rustbot · 2026-02-22T05:32:48Z

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

TKanX · 2026-02-22T05:34:22Z

@rustbot ready

scottmcm · 2026-02-22T19:00:51Z

compiler/rustc_codegen_ssa/src/size_of_val.rs

+            // Alignment rounding can only increase the size, never decrease it:
+            // `round_up(x, a) >= x` for power-of-two `a`. With the `nuw` on the
+            // addition above, LLVM can therefore deduce
+            // `full_size >= unrounded_size >= offset`, which proves `full_size > 0`
+            // for types with a non-zero-sized prefix (#152788).
+            let size_ge = bx.icmp(IntPredicate::IntUGE, full_size, unrounded_size);
+            bx.assume(size_ge);


Can you elaborate on which things you tried and why this is the best one? Was it not enough to say that the alignment is a power-of-two? Or...

I ask because most of the text in the OP is just useless LLM slop, and the updates to the tests make me suspicious.

@scottmcm

Can you elaborate on which things you tried and why this is the best one? Was it not enough to say that the alignment is a power-of-two? Or...

Tried nuw-only (unchecked_uadd) first. That gives LLVM unrounded >= offset > 0 but it stops at the rounding — LLVM can't prove (x + a-1) & -a >= x. Also checked whether feeding ctpop(align) == 1 would help, but there's no fold for "round-up is monotonic when alignment is pow2" in InstCombine/ValueTracking. So the assume tells LLVM the conclusion directly.

nsw (making it unchecked_suadd) is because unrounded ≤ rounded ≤ isize::MAX. Same reasoning as your #152867.

I ask because most of the text in the OP is just useless LLM slop, and the updates to the tests make me suspicious.

Sorry about the OP. English isn't my native language, and I tend to overwrite when trying to be precise. Will clean it up.

For the tests: CHECK-NOT: icmp broke because assume itself emits an icmp. The !range checks on the first two functions were dropped because the assume keeps the size computation alive, so there's now a size load before the alignment load — FileCheck hits the wrong one. Range metadata is still verified in align_load_from_align_of_val below. RANGE_META → ALIGN_RANGE since it only covers alignment loads now. Range value {1, 0} → {1, 0x20000001} is Align::max_for_target (same change as #152929).

Happy to close this if you'd rather land it as part of #152867.

Landing this separately is great -- I opened the issue because this particular bit about what LLVM can prove is different enough from the point of layout_of_val that it's better to have the changes separated. (That's why I pulled out #152929 too 🙂 )

Hmm, yeah, I experimented a bit https://llvm.godbolt.org/z/haGYz7aax and even with lots of annotations on everything plus an assume, it's still not able to understand what's happening properly.

(Also it's so annoying to see add nsw i64 %4, -1 since that used to be sub nuw nsw i64 %4, 1 but LLVM just insists on throwing that information away.)

dianqk · 2026-02-22T19:06:58Z

r? scottmcm

scottmcm · 2026-02-22T21:02:43Z

tests/codegen-llvm/dst-vtable-align-nonzero.rs

-    // CHECK: load [[USIZE:i[0-9]+]], {{.+}} !range [[RANGE_META:![0-9]+]]
+    // CHECK: load [[USIZE:i[0-9]+]]
    // CHECK-NOT: llvm.umax
-    // CHECK-NOT: icmp
    // CHECK-NOT: select
    // CHECK: ret


So the problem here is that if this was testing for "not icmp", just removing that check means this test is (potentially) no longer testing what it was trying to test before.

If there's an icmp now, probably what you want instead is something like

// CHECK-NOT: llvm.umax
// CHECK-NOT: icmp
// CHECK-NOT: select
// CHECK: [[DOES_NOT_SHRINK:%.+]] = icmp ... something here ...
// CHECK-NEXT: call void @llvm.assume(i1 [[DOES_NOT_SHRINK]])
// CHECK-NOT: llvm.umax
// CHECK-NOT: icmp
// CHECK-NOT: select

so that the test is that the only icmp is the expected one that's used for the assume.

Similarly, why remove the !range check? It's not being optimized out, is it? (If it is, that's also interesting.)

Checked the emitted IR — the assume (and the entire size computation) gets DCE'd in these two functions at -O3, since they only need alignment for the field projection. So there's no extra icmp at all, and the alignment load is still the first one with !range. Restored the original patterns as-is; the file is now unchanged from main.

This file is no longer unchanged, so this comment applies again.

@rustbot author

Currently failing on LLVM 20. I added min-llvm-version: 21 thinking the ret i1 false fold would work there, but that was just an assumption. On LLVM 20 the add nuw nsw + assume(icmp uge ...) survive in the IR and can be checked directly. Should I rewrite the test to check the emission side instead?

@scottmcm

I would suggest you survey what testing exists for it and what the intent of the various tests are.

In general, it's fine to limit desirable optimization tests to latest LLVM only, since that's what we ship and if people are using older LLVM then it's at least somewhat expected that things will optimize less well.

On the other hand, if it's "we're testing what rustc is doing" tests, then those should generally continue to pass on older LLVM because we don't want rustc to break on older LLVMs.

@scottmcm similar to your suggestion: 5eec5e3

tests/codegen-llvm/dst-vtable-align-nonzero.rs

scottmcm · 2026-02-22T23:50:53Z

Hmm, why would aarch64 do anything different here? The codegen-llvm tests are running only the middle-end of llvm, not the backend, so it shouldn't matter...

rust-bors · 2026-02-23T00:10:21Z

☀️ Try build successful (CI)
Build commit: 3cf407e (3cf407e8b04f1a796bf7b9360afd7972896f340d, parent: 1500f0f47f5fe8ddcd6528f6c6c031b210b4eac5)

TKanX · 2026-02-23T00:14:10Z

Hmm, why would aarch64 do anything different here? The codegen-llvm tests are running only the middle-end of llvm, not the backend, so it shouldn't matter...

You're right — the architecture has nothing to do with it. I was testing locally against LLVM 22, which DCEs the assume entirely. I verified this just now: same unoptimized IR through opt -O3 with both target triples produces identical output on LLVM 22. The x86_64-gnu-llvm-20 job was cancelled (not passed), so it would have failed the same way.

@scottmcm

rust-timer · 2026-02-23T00:50:11Z

Finished benchmarking commit (3cf407e): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary -0.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	2.1%	[2.1%, 2.1%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.2%	[-2.5%, -1.9%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.8%	[-2.5%, 2.1%]	3

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 483.037s -> 479.541s (-0.72%)
Artifact size: 397.95 MiB -> 397.91 MiB (-0.01%)

scottmcm · 2026-03-03T20:43:40Z

Also, please cleanup the history here once you've addressed the comment above. Squashing all into one commit is fine, as is separating out into 2 meaningful ones if you prefer that, but we don't need 6 commits going back and forth.

TKanX · 2026-03-04T01:49:29Z

@rustbot ready

scottmcm · 2026-03-05T06:46:11Z

I would suggest you survey what testing exists for it and what the intent of the various tests are.

In general, it's fine to limit desirable optimization tests to latest LLVM only, since that's what we ship and if people are using older LLVM then it's at least somewhat expected that things will optimize less well.

On the other hand, if it's "we're testing what rustc is doing" tests, then those should generally continue to pass on older LLVM because we don't want rustc to break on older LLVMs.

Another thing you could try is checking whether -C opt-level=1 is still sufficient to meet the goals of the tests in question; because it tries less, the output might be more consistent between versions. I don't know.

rustbot assigned fmease Feb 19, 2026

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 19, 2026

rustbot added A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such labels Feb 20, 2026

rustbot assigned dianqk and unassigned fmease Feb 21, 2026

scottmcm requested changes Feb 22, 2026

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 22, 2026

TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from a9ec27f to 8339cfe Compare February 22, 2026 05:32

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 22, 2026

TKanX requested a review from scottmcm February 22, 2026 05:34

scottmcm reviewed Feb 22, 2026

rustbot assigned scottmcm and unassigned dianqk Feb 22, 2026

scottmcm reviewed Feb 22, 2026

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 23, 2026

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 3, 2026

TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from 45b1d74 to 5423cd4 Compare March 4, 2026 01:47

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 4, 2026

TKanX marked this pull request as draft March 4, 2026 02:41

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 4, 2026

TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from 5423cd4 to a184e09 Compare March 4, 2026 03:13

TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from a184e09 to c45ca83 Compare March 4, 2026 04:23

TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from c45ca83 to 7f42ac4 Compare March 4, 2026 07:51

TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from 7f42ac4 to e5e8c07 Compare March 7, 2026 09:32

perf(codegen): Eliminate size_of_val == 0 for non-ZST DSTs via nuw+…

5eec5e3

…assume Co-authored-by: Scott McMurray <scottmcm@users.noreply.github.com>

TKanX marked this pull request as ready for review March 10, 2026 05:14

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 10, 2026

TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from e5e8c07 to 5eec5e3 Compare March 10, 2026 05:16
