use BMI2 instruction pdep for faster access to nth bit of SmallBitSet#17
Conversation
Implement Random.nth using pdep when available to speed up random lookup into SmallBitSet
|
That's a good idea! I've noticed that Julia 1.13 has an |
`Iterators.nth` has been added in #56580 for Julia 1.13. Currently `Random` has its own (non-public) `nth` function, which is used for `AbstractDict` and `AbstractSet`. This PR replaces it by `Iterators.nth`. Apart from removing redundant code, this also allows user-defined methods for `Iterators.nth` (which is public) to be used in `rand`. Note that I call `Iterators.nth` with `@inbounds`. This doesn't make any difference for the default method (at least not for `AbstractDict` and `AbstractSet`), but it may do so for user-defined methods. I'm having `SmallBitSet` from my package SmallCollections.jl in mind, where `nth` is just some bit juggling, see matthias314/SmallCollections.jl#17. When filling a vector with random elements of a `SmallBitSet`, around 20% of the runtime would be spent on checking bounds. Of course, I can remove the `@inbounds` if you don't like it. I've also dropped the `::eltype(iter)` return type indicator of `Random.nth`. I've tested `Set{T}` where `T` is a `Union` with up to 6 elements, and also `Generator`. In all cases, `@code_typed` could predict the return type.
|
That was quick:
I prefer not to do it as it's not an
Could be, but I don't want to change it at the moment.
Your function looks totally fine to me. I've just added the
For a I would say that the PR is now ready for merging. Do you agree? From Julia 1.14 on we will see the effect also in |
25ce49f to
4b7e860
Compare
|
OK thanks for the quick update. I agree, defining If I understand correctly,
Of course that's your decision and I'll leave that up to you. However, I feel it's a shame to have wait until v1.14 and LTS (v1.10) misses out on the benefit. How would you feel about adding |
Correct, unless someone manages to convince the Julia maintainers to backport the change to Julia 1.13.
I'm not really worried about the performance for LTS. As long as everything works, I'm fine. Julia 1.13 might be out earlier than the next release of SmallCollections.jl. Then users who cannot wait for Julia 1.14 (and don't want to use Julia nightly) can themselves add to their code. |
code rearranged, new file src/arch/x86_init.jl created, test added
4b7e860 to
4726057
Compare
|
I've rearranged the code by splitting the architecture-dependent file src/arch/x86.jl into two. I think that's conceptually better because now the new function can go to src/smallbitset.jl, where it belongs. Thanks a lot for the contribution! |
|
Excellent!
Ha, ha. You did most of the work! |
Implement
Random.nthusingpdepwhen available to speed up random lookup intoSmallBitSet.Speeds up
rand(s::SmallBitSet)Some notes:
pdepcould also be used to implementgetindexforSmallBitSet. Not sure if that's useful for a set, unless it's treated as an ordered set I guess.I originally tried to place
Random.nthmethod intosmallbitset.jljust afterRandom.randdefinitions, as this seems the logical place for it.However, I couldn't really resolve the circular dependencies cleanly.
I tried guarding the method definition with something like:
but these are not yet defined because
smallbitsets.jlis executed beforearch/x86.jlI tried rearranging the execution order but it just created a different set of circular dependencies.
I suspect there might be better ways to implement. I'm happy to take some guidance here, or drop the PR entirely.
Here's some benchmarking.
Without
pdep(before PR):With
pdep(after PR):