
Make bitset would_modify_words more vectorizer-friendly#153640

Draft
Zalathar wants to merge 3 commits into rust-lang:main from Zalathar:subchunk

Conversation

@Zalathar
Member

Currently this function compares a single pair of u64 at a time, which is potentially slower than comparing multiple words before each early-exit check, especially for the large chunks used by ChunkedBitSet.
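As a rough illustration of the idea (not the PR's actual code), the sketch below contrasts a per-word early exit with a version that accumulates differences over a small subchunk before checking. The function names, the `SUBCHUNK_WORDS` constant, and the closure-based `op` parameter are all assumptions made for this example.

```rust
type Word = u64;

/// Hypothetical subchunk length: 4 words = 32 bytes.
const SUBCHUNK_WORDS: usize = 4;

/// Baseline: checks for early exit after every single pair of words.
fn would_modify_scalar(out: &[Word], inp: &[Word], op: impl Fn(Word, Word) -> Word) -> bool {
    assert_eq!(out.len(), inp.len());
    for (&a, &b) in out.iter().zip(inp) {
        if op(a, b) != a {
            return true;
        }
    }
    false
}

/// Subchunked: ORs together the changed bits of several words, then
/// performs one early-exit check per subchunk. The inner loop has no
/// branches, which gives the optimizer a chance to vectorize it.
fn would_modify_subchunked(out: &[Word], inp: &[Word], op: impl Fn(Word, Word) -> Word) -> bool {
    assert_eq!(out.len(), inp.len());
    let mut out_chunks = out.chunks_exact(SUBCHUNK_WORDS);
    let mut in_chunks = inp.chunks_exact(SUBCHUNK_WORDS);
    for (oc, ic) in out_chunks.by_ref().zip(in_chunks.by_ref()) {
        let mut changed: Word = 0;
        for (&a, &b) in oc.iter().zip(ic) {
            // Nonzero exactly when `op` would change this word.
            changed |= op(a, b) ^ a;
        }
        if changed != 0 {
            return true;
        }
    }
    // Leftover words that do not fill a whole subchunk.
    would_modify_scalar(out_chunks.remainder(), in_chunks.remainder(), op)
}
```

Both functions should return the same answer for any input; the subchunked version only trades a slightly later exit for fewer branches. Whether the inner loop is actually vectorized depends on the compiler and target.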

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 10, 2026
@Zalathar
Member Author

@bors try @rust-timer queue

rust-bors bot pushed a commit that referenced this pull request Mar 10, 2026
Make bitset `would_modify_words` more vectorizer-friendly
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 10, 2026
@rust-bors
Contributor

rust-bors bot commented Mar 10, 2026

☀️ Try build successful (CI)
Build commit: af612eb (af612eb844c6b669b58fe4697341f163b33d231b, parent: 2d76d9bc76f27b03b4899e72ce561c7ac2c5cf6b)

@rust-timer
Collaborator

Finished benchmarking commit (af612eb): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.2% | [-1.8%, -0.7%] | 4 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.2% | [-1.8%, -0.7%] | 4 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (primary -2.3%, secondary 0.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.1% | [3.1%, 3.1%] | 1 |
| Improvements ✅ (primary) | -2.3% | [-2.3%, -2.3%] | 1 |
| Improvements ✅ (secondary) | -2.4% | [-2.4%, -2.4%] | 1 |
| All ❌✅ (primary) | -2.3% | [-2.3%, -2.3%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 479.112s -> 477.505s (-0.34%)
Artifact size: 395.06 MiB -> 396.97 MiB (0.48%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 10, 2026
@Zalathar
Member Author

Let's see what happens if we double the subchunk length from 32 bytes (4 words) to 64 bytes (8 words).
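One way to experiment with the subchunk length is to make it a compile-time parameter. The sketch below is an assumption-laden illustration rather than the PR's code: `union_would_modify` and its const parameter `N` are hypothetical names, and the operation is fixed to union for simplicity.

```rust
/// Returns true if `dst |= src` would change any word of `dst`.
/// `N` is the (hypothetical) subchunk length in words; it must be nonzero,
/// because `chunks_exact(0)` panics.
fn union_would_modify<const N: usize>(dst: &[u64], src: &[u64]) -> bool {
    assert_eq!(dst.len(), src.len());
    let mut dst_chunks = dst.chunks_exact(N);
    let mut src_chunks = src.chunks_exact(N);
    for (dc, sc) in dst_chunks.by_ref().zip(src_chunks.by_ref()) {
        let mut new_bits = 0u64;
        for (&d, &s) in dc.iter().zip(sc) {
            // Bits set in `src` that are not yet set in `dst`.
            new_bits |= s & !d;
        }
        // One early-exit check per N words.
        if new_bits != 0 {
            return true;
        }
    }
    // Handle the tail that does not fill a whole subchunk word by word.
    dst_chunks
        .remainder()
        .iter()
        .zip(src_chunks.remainder())
        .any(|(&d, &s)| s & !d != 0)
}
```

With this shape, `union_would_modify::<4>` corresponds to 32-byte subchunks and `union_would_modify::<8>` to 64-byte subchunks. Both should return identical results; only the granularity of the early-exit check differs.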

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 10, 2026
rust-bors bot pushed a commit that referenced this pull request Mar 10, 2026
Make bitset `would_modify_words` more vectorizer-friendly
@rust-bors
Contributor

rust-bors bot commented Mar 11, 2026

☀️ Try build successful (CI)
Build commit: b3e83b4 (b3e83b46f88ed6694cef6eadfcffb7f5b6d8e9d7, parent: 0c68443b0a0469e4211acca7e7b06e14f256ada8)

@rust-timer
Collaborator

Finished benchmarking commit (b3e83b4): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.5% | [-2.3%, -0.8%] | 4 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.5% | [-2.3%, -0.8%] | 4 |

Max RSS (memory usage)

Results (secondary 7.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 7.5% | [7.5%, 7.5%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results (secondary -2.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.3% | [-2.4%, -2.1%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 480.034s -> 484.684s (0.97%)
Artifact size: 394.90 MiB -> 396.92 MiB (0.51%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 11, 2026
@Zalathar
Member Author

Zalathar commented Mar 11, 2026

I was initially disappointed to see this only affect cranelift-codegen, but I guess that's the only benchmark in our suite where these paths are actually hot.

I wonder what code patterns cause these paths to be relevant.

@lqd
Member

lqd commented Mar 11, 2026

Probably its huge functions with a bunch of locals, exercising the move/init dataflow a lot?

@Zalathar
Member Author

If it has large functions with thousands of locals, then yeah I can imagine that stressing MixedBitSet in ways that most crates never come close to.

@lqd
Member

lqd commented Mar 11, 2026

My recollection is that it has indeed, with the usual suspect of using machine-generated code.


Labels

S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

4 participants