Skip to content

use BMI2 instruction pdep for faster access to nth bit of SmallBitSet#17

Merged
matthias314 merged 2 commits intomatthias314:masterfrom
GregPlowman:GregPlowman/pdep
Apr 5, 2026
Merged

use BMI2 instruction pdep for faster access to nth bit of SmallBitSet#17
matthias314 merged 2 commits intomatthias314:masterfrom
GregPlowman:GregPlowman/pdep

Conversation

@GregPlowman
Copy link
Copy Markdown
Contributor

Implement Random.nth using pdep when available to speed up random lookup into SmallBitSet.
Speeds up rand(s::SmallBitSet)

Some notes:
pdep could also be used to implement getindex for SmallBitSet. Not sure if that's useful for a set, unless it's treated as an ordered set I guess.

I originally tried to place Random.nth method into smallbitset.jl just after Random.rand definitions, as this seems the logical place for it.
However, I couldn't really resolve the circular dependencies cleanly.
I tried guarding the method definition with something like:

    if HAS_PEXT     # maybe HAS_BMI2 is a better name for this const?
    if Sys.ARCH in (:x86_64, :i686) && cpufeature(:BMI2)
    if isdefined(SmallCollections, :pdep)

but these are not yet defined because smallbitsets.jl is executed before arch/x86.jl
I tried rearranging the execution order but it just created a different set of circular dependencies.

I suspect there might be better ways to implement. I'm happy to take some guidance here, or drop the PR entirely.

Here's some benchmarking.

using SmallCollections, Random, BenchmarkTools
s = SmallBitSet([1, 5, 56])
rng = Xoshiro()
rand(rng, s)
@benchmark rand($rng, $s)

Without pdep (before PR):

BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  10.400 ns … 21.200 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.800 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.917 ns ±  0.967 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

With pdep (after PR):

BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  5.300 ns … 19.200 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.400 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.461 ns ±  0.673 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

Implement Random.nth using pdep when available to speed up random lookup into SmallBitSet
@matthias314
Copy link
Copy Markdown
Owner

That's a good idea!

I've noticed that Julia 1.13 has an Iterators.nth function, which might be a better place for this. I actually think that Iterators.nth should be used in Random, see JuliaLang/julia#61486. Let's see what comes out of that PR. If it is accepted, then we could focus on Iterators.nth for Julia 1.13 (which will be released soon, I guess) and higher.

adienes pushed a commit to JuliaLang/julia that referenced this pull request Apr 3, 2026
`Iterators.nth` has been added in #56580 for Julia 1.13. Currently
`Random` has its own (non-public) `nth` function, which is used for
`AbstractDict` and `AbstractSet`. This PR replaces it by
`Iterators.nth`. Apart from removing redundant code, this also allows
user-defined methods for `Iterators.nth` (which is public) to be used in
`rand`.

Note that I call `Iterators.nth` with `@inbounds`. This doesn't make any
difference for the default method (at least not for `AbstractDict` and
`AbstractSet`), but it may do so for user-defined methods. I'm having
`SmallBitSet` from my package SmallCollections.jl in mind, where `nth`
is just some bit juggling, see matthias314/SmallCollections.jl#17. When
filling a vector with random elements of a `SmallBitSet`, around 20% of
the runtime would be spent on checking bounds. Of course, I can remove
the `@inbounds` if you don't like it.

I've also dropped the `::eltype(iter)` return type indicator of
`Random.nth`. I've tested `Set{T}` where `T` is a `Union` with up to 6
elements, and also `Generator`. In all cases, `@code_typed` could
predict the return type.
@matthias314
Copy link
Copy Markdown
Owner

That was quick: rand now uses Iterators.nth in Julia master. I've renamed your function accordingly, moved it to the bottom of the file and added a test.

pdep could also be used to implement getindex for SmallBitSet

I prefer not to do it as it's not an Array.

maybe HAS_BMI2 is a better name for this const?

Could be, but I don't want to change it at the moment.

I suspect there might be better ways to implement

Your function looks totally fine to me. I've just added the @boundscheck that was not necessary for Random.nth.

some benchmarking

For a SmallBitSet with many elements the difference is even more striking!

I would say that the PR is now ready for merging. Do you agree?

From Julia 1.14 on we will see the effect also in rand. I think it's not worthwhile to mess around with internal Julia functions just for one Julia release.

@GregPlowman
Copy link
Copy Markdown
Contributor Author

GregPlowman commented Apr 5, 2026

OK thanks for the quick update.

I agree, defining Iterators.nth is preferable.

If I understand correctly, Iterators.nth will be implemented from Julia 1.13 and rand will use Iterators.nth from Julia v1.14.

I think it's not worthwhile to mess around with internal Julia functions just for one Julia release.

Of course that's your decision and I'll leave that up to you. However, I feel it's a shame to have wait until v1.14 and LTS (v1.10) misses out on the benefit.

How would you feel about adding Random.nth only for earlier versions of Julia? Presumably there's less chance for internal Julia functions to change in older released versions?

if VERSION < v"1.14-" && HAS_PEXT
	function Random.nth(s::SmallBitSet{U}, n::Integer) where U <: Union{UInt8,UInt16,UInt32,UInt}
	    b = pdep(unsafe_shl(one(U), n-1), bits(s))
	    return trailing_zeros(b) + 1
	end
end

@matthias314
Copy link
Copy Markdown
Owner

Iterators.nth will be implemented from Julia 1.13 and rand will use Iterators.nth from Julia v1.14

Correct, unless someone manages to convince the Julia maintainers to backport the change to Julia 1.13.

it's a shame to have wait until v1.14 and LTS (v1.10) misses out on the benefit.

I'm not really worried about the performance for LTS. As long as everything works, I'm fine. Julia 1.13 might be out earlier than the next release of SmallCollections.jl. Then users who cannot wait for Julia 1.14 (and don't want to use Julia nightly) can themselves add

Random.nth(s::SmallBitSet, n::Integer) = @inbounds Iterators.nth(s, n)

to their code.

code rearranged, new file src/arch/x86_init.jl created, test added
@matthias314 matthias314 merged commit 84205d0 into matthias314:master Apr 5, 2026
@matthias314
Copy link
Copy Markdown
Owner

I've rearranged the code by splitting the architecture-dependent file src/arch/x86.jl into two. I think that's conceptually better because now the new function can go to src/smallbitset.jl, where it belongs.

Thanks a lot for the contribution!

@GregPlowman
Copy link
Copy Markdown
Contributor Author

Excellent!
I like the splitting and reorganisation.

Thanks a lot for the contribution!

Ha, ha. You did most of the work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants