Make bitset `would_modify_words` more vectorizer-friendly by Zalathar · Pull Request #153640 · rust-lang/rust

Zalathar · 2026-03-10T05:35:06Z

Currently, this function compares a single pair of `u64` words at a time. That is potentially slower than comparing several words before each early-exit check, especially for the large chunks used by `ChunkedBitSet`.
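The following is a minimal sketch of the two loop shapes. It is not the PR's actual code. It assumes the function takes two equal-length word slices plus a word-wise operation such as union, and reports whether applying that operation would change any word of the first slice. The names `would_modify_words_scalar`, `would_modify_words_subchunked`, `Word`, and `SUBCHUNK_WORDS` are hypothetical.

```rust
/// Word type assumed for this sketch.
type Word = u64;

/// Hypothetical subchunk length: 4 words = 32 bytes.
const SUBCHUNK_WORDS: usize = 4;

/// Word-at-a-time shape: one early-exit branch per word.
fn would_modify_words_scalar(
    out_vec: &[Word],
    in_vec: &[Word],
    op: impl Fn(Word, Word) -> Word,
) -> bool {
    assert_eq!(out_vec.len(), in_vec.len());
    out_vec.iter().zip(in_vec).any(|(&out, &inp)| op(out, inp) != out)
}

/// Subchunked shape: OR together the per-word differences for a whole
/// subchunk, then take a single early-exit branch per subchunk.
fn would_modify_words_subchunked(
    out_vec: &[Word],
    in_vec: &[Word],
    op: impl Fn(Word, Word) -> Word,
) -> bool {
    assert_eq!(out_vec.len(), in_vec.len());
    let out_chunks = out_vec.chunks_exact(SUBCHUNK_WORDS);
    let in_chunks = in_vec.chunks_exact(SUBCHUNK_WORDS);
    // Leftover words that don't fill a whole subchunk.
    let (out_rest, in_rest) = (out_chunks.remainder(), in_chunks.remainder());

    for (out_sub, in_sub) in out_chunks.zip(in_chunks) {
        // No early exit inside the subchunk: a nonzero bit in `diff`
        // means at least one word would change.
        let diff = out_sub
            .iter()
            .zip(in_sub)
            .fold(0, |acc, (&out, &inp)| acc | (op(out, inp) ^ out));
        if diff != 0 {
            return true;
        }
    }
    would_modify_words_scalar(out_rest, in_rest, op)
}
```

Whether the inner fold is actually vectorized depends on the target and optimization settings. The design point is that the subchunked form takes one data-dependent branch per 4 words instead of one per word.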

Zalathar · 2026-03-10T05:35:16Z

@bors try @rust-timer queue

Make bitset `would_modify_words` more vectorizer-friendly

rust-bors · 2026-03-10T07:44:44Z

☀️ Try build successful (CI)
Build commit: af612eb (af612eb844c6b669b58fe4697341f163b33d231b, parent: 2d76d9bc76f27b03b4899e72ce561c7ac2c5cf6b)

rust-timer · 2026-03-10T08:26:39Z

Finished benchmarking commit (af612eb): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.2% | [-1.8%, -0.7%] | 4 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.2% | [-1.8%, -0.7%] | 4 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (primary -2.3%, secondary 0.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.1% | [3.1%, 3.1%] | 1 |
| Improvements ✅ (primary) | -2.3% | [-2.3%, -2.3%] | 1 |
| Improvements ✅ (secondary) | -2.4% | [-2.4%, -2.4%] | 1 |
| All ❌✅ (primary) | -2.3% | [-2.3%, -2.3%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 479.112s -> 477.505s (-0.34%)
Artifact size: 395.06 MiB -> 396.97 MiB (0.48%)

Zalathar · 2026-03-10T23:30:59Z

Let's see what happens if we double the subchunk length from 32 bytes (4 words) to 64 bytes (8 words).
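The subchunk length is a single tuning knob. One hypothetical way to experiment with it is to make it a const generic parameter, as in the sketch below. `would_modify_words_n` and `subchunk_bytes` are illustrative names, not rustc's. With `u64` words, `N = 4` gives 32-byte subchunks and `N = 8` gives 64-byte subchunks.

```rust
type Word = u64;

/// Hypothetical subchunked check with the subchunk length `N` (in words)
/// as a const generic parameter.
fn would_modify_words_n<const N: usize>(
    out_vec: &[Word],
    in_vec: &[Word],
    op: impl Fn(Word, Word) -> Word,
) -> bool {
    assert_eq!(out_vec.len(), in_vec.len());
    let out_chunks = out_vec.chunks_exact(N);
    let in_chunks = in_vec.chunks_exact(N);
    let (out_rest, in_rest) = (out_chunks.remainder(), in_chunks.remainder());
    for (out_sub, in_sub) in out_chunks.zip(in_chunks) {
        // `N` is a compile-time constant, so the optimizer sees a fixed
        // trip count that it may unroll or vectorize.
        let mut diff: Word = 0;
        for k in 0..N {
            diff |= op(out_sub[k], in_sub[k]) ^ out_sub[k];
        }
        if diff != 0 {
            return true;
        }
    }
    // Tail words that don't fill a whole subchunk, checked one at a time.
    out_rest.iter().zip(in_rest).any(|(&out, &inp)| op(out, inp) != out)
}

/// Size in bytes of one `N`-word subchunk.
const fn subchunk_bytes<const N: usize>() -> usize {
    N * std::mem::size_of::<Word>()
}
```

Under these assumptions, doubling `N` halves the number of early-exit branches in the worst case. The trade-off is that a change near the start of a subchunk is detected only after the whole subchunk has been processed.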

@bors try @rust-timer queue

Make bitset `would_modify_words` more vectorizer-friendly

rust-bors · 2026-03-11T01:40:06Z

☀️ Try build successful (CI)
Build commit: b3e83b4 (b3e83b46f88ed6694cef6eadfcffb7f5b6d8e9d7, parent: 0c68443b0a0469e4211acca7e7b06e14f256ada8)

rust-timer · 2026-03-11T02:20:20Z

Finished benchmarking commit (b3e83b4): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.5% | [-2.3%, -0.8%] | 4 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.5% | [-2.3%, -0.8%] | 4 |

Max RSS (memory usage)

Results (secondary 7.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 7.5% | [7.5%, 7.5%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results (secondary -2.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.3% | [-2.4%, -2.1%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 480.034s -> 484.684s (0.97%)
Artifact size: 394.90 MiB -> 396.92 MiB (0.51%)

Zalathar · 2026-03-11T02:53:29Z

I was initially disappointed to see this only affect cranelift-codegen, but I guess that's the only benchmark in our suite where these paths are actually hot.

I wonder what code patterns cause these paths to be relevant.

lqd · 2026-03-11T06:02:35Z

Probably its huge functions with a bunch of locals, exercising the move/init dataflow a lot?

Zalathar · 2026-03-11T06:59:06Z

If it has large functions with thousands of locals, then yeah I can imagine that stressing MixedBitSet in ways that most crates never come close to.
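Some rough arithmetic shows why only very large functions would make these loops hot. The helpers below are hypothetical. They assume one bit per tracked local and `u64` words, and they ignore how `ChunkedBitSet` splits its domain into chunks (a simplification). They count how many words one comparison may scan and how those words split into subchunks:

```rust
/// Bits per bitset word, assuming `u64` words.
const WORD_BITS: usize = 64;

/// Number of words needed to hold one bit per tracked local.
fn words_for_locals(num_locals: usize) -> usize {
    (num_locals + WORD_BITS - 1) / WORD_BITS
}

/// Split a word count into (full subchunks, leftover tail words).
fn subchunk_split(words: usize, subchunk_words: usize) -> (usize, usize) {
    (words / subchunk_words, words % subchunk_words)
}
```

Under these assumptions, a function with 5,000 tracked locals needs 79 words per bitset. In the worst case (no word changes), a word-at-a-time loop takes 79 early-exit branches per comparison, while 8-word subchunks need 9 branches plus a 7-word tail. A function with a few dozen locals fits in one or two words, where subchunking makes little difference.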

lqd · 2026-03-11T08:32:41Z

My recollection is that it does indeed, the usual suspect being machine-generated code.

rustbot added the S-waiting-on-author and T-compiler labels Mar 10, 2026

rust-bors bot pushed a commit that referenced this pull request Mar 10, 2026

Auto merge of #153640 - Zalathar:subchunk, r=<try>

af612eb

Make bitset `would_modify_words` more vectorizer-friendly

rustbot added the S-waiting-on-perf label Mar 10, 2026

rustbot removed the S-waiting-on-perf label Mar 10, 2026

Zalathar added 2 commits March 11, 2026 10:08

Rename some confusing bitset helper functions

313fe2a

Make bitset would_modify_words more vectorizer-friendly

e8758f5

Zalathar force-pushed the subchunk branch from 46bd3ac to e8758f5 March 10, 2026 23:08

Increase subchunk length to 64 bytes

8be68bf

rustbot added the S-waiting-on-perf label Mar 10, 2026

rust-bors bot pushed a commit that referenced this pull request Mar 10, 2026

Auto merge of #153640 - Zalathar:subchunk, r=<try>

b3e83b4

Make bitset `would_modify_words` more vectorizer-friendly

Zalathar mentioned this pull request Mar 10, 2026

Avoid the would-change check for bitset chunks that are unique #153680

Closed
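Judging only by its title, #153680 explored skipping the would-change check when a chunk's word storage is uniquely owned. The sketch below is a hypothetical illustration of that idea, not the code from that PR. It assumes chunk words live in an `Rc<Vec<Word>>` that is cloned on write, and `union_chunk` is an invented name. When `Rc::get_mut` succeeds, the union can be applied in place while tracking whether anything changed. Only shared storage then needs the read-only pre-check to avoid an unnecessary clone.

```rust
use std::rc::Rc;

type Word = u64;

/// Hypothetical copy-on-write chunk update: unions `other` into `chunk`
/// and returns true if any word changed.
fn union_chunk(chunk: &mut Rc<Vec<Word>>, other: &[Word]) -> bool {
    if let Some(words) = Rc::get_mut(chunk) {
        // Unique owner: mutate in place and detect changes during the
        // write, with no separate read-only pass.
        let mut changed = false;
        for (w, &o) in words.iter_mut().zip(other) {
            let new = *w | o;
            changed |= new != *w;
            *w = new;
        }
        changed
    } else {
        // Shared storage: check first, so it is only cloned if needed.
        let would_change = chunk.iter().zip(other).any(|(&w, &o)| (w | o) != w);
        if would_change {
            for (w, &o) in Rc::make_mut(chunk).iter_mut().zip(other) {
                *w |= o;
            }
        }
        would_change
    }
}
```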

rustbot removed the S-waiting-on-perf label Mar 11, 2026

4 participants
