Skip to content

Conversation

@quaternic
Copy link
Contributor

@quaternic quaternic commented Dec 4, 2025

Feature gate: #![feature(uint_gather_scatter_bits)]
Tracking issue: #149069
Accepted ACP: rust-lang/libs-team#695

Implements the methods using the parallel suffix strategy mentioned in the ACP discussion. The referenced source material provides C implementations, though this PR makes improvements over those, cutting the instruction count by a third:
https://rust.godbolt.org/z/rn5naYnK4 (this PR)
https://c.godbolt.org/z/WzYd5WbsY (Hacker's delight)

This was initially based on the code for gather_bits that @okaneco provided in rust-lang/libs-team#695 (comment) . I wanted to understand how it worked, and later on noticed some opportunities for improvement, which eventually led to this PR.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Dec 4, 2025
@rust-log-analyzer

This comment has been minimized.

@quaternic quaternic force-pushed the gather-scatter-bits-opt branch from 9549004 to 2c752af Compare December 4, 2025 23:05
@rust-log-analyzer

This comment has been minimized.

@quaternic quaternic force-pushed the gather-scatter-bits-opt branch from 2c752af to ac7e6c6 Compare December 4, 2025 23:49
@quaternic
Copy link
Contributor Author

quaternic commented Dec 5, 2025

Do take this with a grain of salt, since I authored the benchmarks with this in mind. In particular, since the new implementation doesn't have any input-dependent control-flow, it is easily vectorized which all of these benchmarks allow for.

Benchmarked locally on an Intel Core i7 920 @ 2.67GHz (from ~2009)

old (ns/iter) new (ns/iter) old/new
num::int_bits::u8::constant::gather_bits 134 193 0.69
num::int_bits::u8::constant::scatter_bits 133 143 0.93
num::int_bits::u8::invariant::gather_bits 8723 193 45.11
num::int_bits::u8::invariant::scatter_bits 10827 187 57.85
num::int_bits::u8::variable::gather_bits 17938 734 24.43
num::int_bits::u8::variable::scatter_bits 19809 838 23.64
num::int_bits::u16::constant::gather_bits 278 290 0.96
num::int_bits::u16::constant::scatter_bits 279 198 1.41
num::int_bits::u16::invariant::gather_bits 9401 231 40.66
num::int_bits::u16::invariant::scatter_bits 9480 233 40.71
num::int_bits::u16::variable::gather_bits 16398 932 17.59
num::int_bits::u16::variable::scatter_bits 14793 1083 13.66
num::int_bits::u32::constant::gather_bits 528 373 1.42
num::int_bits::u32::constant::scatter_bits 520 301 1.73
num::int_bits::u32::invariant::gather_bits 7699 284 27.11
num::int_bits::u32::invariant::scatter_bits 6670 295 22.6
num::int_bits::u32::variable::gather_bits 9993 1394 7.17
num::int_bits::u32::variable::scatter_bits 9051 1620 5.59
num::int_bits::u64::constant::gather_bits 1008 387 2.6
num::int_bits::u64::constant::scatter_bits 1015 377 2.69
num::int_bits::u64::invariant::gather_bits 7892 347 22.78
num::int_bits::u64::invariant::scatter_bits 6730 362 18.58
num::int_bits::u64::variable::gather_bits 8930 1930 4.63
num::int_bits::u64::variable::scatter_bits 7981 2238 3.57
num::int_bits::u128::constant::gather_bits 15690 721 21.77
num::int_bits::u128::constant::scatter_bits 11374 655 17.35
num::int_bits::u128::invariant::gather_bits 16542 856 19.33
num::int_bits::u128::invariant::scatter_bits 13316 864 15.4
num::int_bits::u128::variable::gather_bits 16984 4403 3.86
num::int_bits::u128::variable::scatter_bits 13721 4674 2.94

@quaternic quaternic marked this pull request as ready for review December 5, 2025 11:08
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 5, 2025
@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Dec 5, 2025
@rustbot
Copy link
Collaborator

rustbot commented Dec 5, 2025

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Copy link
Contributor

@okaneco okaneco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work.

I noticed spot-checking some random constant masks that this implementation and the HD implementation flip-flop on which has more instructions, but it's only a small difference count. On simpler masks, they seem to optimize similarly.

The dynamic mask output is a drastic improvement with this by a quarter/third reduction of instructions for gather and scatter.

alive2 showing that this implementation and the current implementation appear to be equivalent functions
gather - https://alive2.llvm.org/ce/z/wKxY2Z
scatter - https://alive2.llvm.org/ce/z/_98DbY
scratchpad for the LLVM IR output - https://rust.godbolt.org/z/brbePx44T

View changes since this review

@Mark-Simulacrum
Copy link
Member

@bors r+ rollup

@bors
Copy link
Collaborator

bors commented Dec 28, 2025

📌 Commit 79d792f has been approved by Mark-Simulacrum

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 28, 2025
JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request Dec 28, 2025
… r=Mark-Simulacrum

Optimized implementation for uN::{gather,scatter}_bits

Feature gate: #![feature(uint_gather_scatter_bits)]
Tracking issue: rust-lang#149069
Accepted ACP: rust-lang/libs-team#695

Implements the methods using the parallel suffix strategy mentioned in the ACP discussion. The referenced source material provides C implementations, though this PR makes improvements over those, cutting the instruction count by a third:
https://rust.godbolt.org/z/rn5naYnK4 (this PR)
https://c.godbolt.org/z/WzYd5WbsY (Hacker's delight)

This was initially based on the code for `gather_bits` that `@okaneco` provided in rust-lang/libs-team#695 (comment) . I wanted to understand how it worked, and later on noticed some opportunities for improvement, which eventually led to this PR.
bors added a commit that referenced this pull request Dec 28, 2025
…uwer

Rollup of 8 pull requests

Successful merges:

 - #148321 (parser/lexer: bump to Unicode 17, use faster unicode-ident)
 - #149540 (std: sys: fs: uefi: Implement readdir)
 - #149582 (Implement `Duration::div_duration_{floor,ceil}`)
 - #149663 (Optimized implementation for uN::{gather,scatter}_bits)
 - #149667 (Fix ICE by rejecting const blocks in patterns during AST lowering (closes #148138))
 - #149947 (add several older crashtests)
 - #150011 (Add more `unbounded_sh[lr]` examples)
 - #150411 (refactor `destructure_const`)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 8bd4d04 into rust-lang:main Dec 29, 2025
11 checks passed
@rustbot rustbot added this to the 1.94.0 milestone Dec 29, 2025
rust-timer added a commit that referenced this pull request Dec 29, 2025
Rollup merge of #149663 - quaternic:gather-scatter-bits-opt, r=Mark-Simulacrum

Optimized implementation for uN::{gather,scatter}_bits

Feature gate: #![feature(uint_gather_scatter_bits)]
Tracking issue: #149069
Accepted ACP: rust-lang/libs-team#695

Implements the methods using the parallel suffix strategy mentioned in the ACP discussion. The referenced source material provides C implementations, though this PR makes improvements over those, cutting the instruction count by a third:
https://rust.godbolt.org/z/rn5naYnK4 (this PR)
https://c.godbolt.org/z/WzYd5WbsY (Hacker's delight)

This was initially based on the code for `gather_bits` that ``@okaneco`` provided in rust-lang/libs-team#695 (comment) . I wanted to understand how it worked, and later on noticed some opportunities for improvement, which eventually led to this PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants