Decide on new PRNG BitGenerator default #13635

@rkern

#13163 will be bringing in the long-awaited replacement of numpy's PRNG infrastructure. In the interest of keeping that PR manageable, we will merge it to master before all of the decisions are finalized, like which BitGenerator will be nominated as the default.

We must make a decision before the first release with the new infrastructure. Once released, we will be stuck with our choice for a while, so we should be sure that we are comfortable with our decision.

On the other hand, the choice of the default does not have that many consequences. We are not talking about the default BitGenerator underlying the numpy.random.* convenience functions. Per NEP 19, these remain aliases to the legacy RandomState, whose BitGenerator remains MT19937. The only place where the default comes in is when Generator() is instantiated without arguments, i.e. when a user requests a Generator with an arbitrary state, presumably to then call the .seed() method on it. This will probably be pretty rare, as it would be about as easy to just explicitly instantiate it with the seeded BitGenerator that they actually want. A legitimate choice here might actually be to nominate no default and always require the user to specify a BitGenerator.
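To make the point about the convenience functions concrete, here is a minimal sketch showing that the module-level functions behave like methods of a legacy RandomState backed by MT19937. The seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

# Seed the hidden global RandomState behind the numpy.random.* functions.
np.random.seed(42)
from_module = np.random.random_sample(3)

# An explicit legacy RandomState with the same seed yields the same stream.
legacy = np.random.RandomState(42)
from_instance = legacy.random_sample(3)

# The legacy state tuple names its underlying algorithm in its first field.
bit_generator_name = np.random.get_state()[0]
```

Whatever default is chosen for Generator, this legacy path is unaffected.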

Nonetheless, we will have recommendations as to which BitGenerator people should use most of the time, and while we can change recommendations fairly freely, whichever one has pride of place will probably get written about most in books, blogs, tutorials, and such.

IMO, there are a few main options (with my commentary, please feel free to disagree; I have not attempted to port over all the relevant comments from #13163):

No default

Always require Generator(ChosenBitGenerator(maybe_seed)). This is a little unfriendly, but as it's a pretty convenient way to get the generator properly initialized for reproducibility, people may end up doing this anyways, even if we do have a default.
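A short sketch of what the explicit form looks like, assuming the class names `Generator`, `PCG64`, and `MT19937` under `numpy.random` as they appear in released NumPy. The seed is an arbitrary illustrative value.

```python
import numpy as np
from numpy.random import Generator, PCG64, MT19937

seed = 12345  # arbitrary seed, chosen only for illustration

# Two generators built from identically seeded BitGenerators
# produce identical streams, which is the reproducibility benefit.
rng_a = Generator(PCG64(seed))
rng_b = Generator(PCG64(seed))
sample_a = rng_a.random(3)
sample_b = rng_b.random(3)

# Swapping the algorithm is a one-word change at the construction site.
rng_mt = Generator(MT19937(seed))
```

Because the BitGenerator is named at the call site, the code documents which algorithm produced the results, independent of any default.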

MT19937

This would be a good conservative choice. It is certainly no worse than the status quo. As the Mersenne Twister is still widely regarded as "the standard" choice, it might help academic users who need their papers to be reviewed by people who might question "non-standard" choices, regardless of the specific qualities of the PRNG. "No one ever got fired for hiring IBM." The main downsides of MT19937 are that it is slower than some of the available alternatives, due to its very large state, and that it fails some statistical quality tests. In choosing another PRNG, we have an opportunity (but not an obligation, IMO) to be opinionated here and try to move "the standard", if we wish.

PCG64

This is likely the one that I'll be using most often, personally. The main downside is that it uses 128-bit integer arithmetic, which is emulated in C if the compiler does not provide such an integer type. The two main platforms for which this is the case are 32-bit CPUs and 64-bit MSVC, which just does not support 128-bit integers even when the CPU does. Personally, I do not suggest letting the performance of increasingly rare 32-bit CPUs dictate our choices. The MSVC performance is important, though, since our Windows builds do need that compiler and not other Windows compilers. It can probably be addressed with some assembly/compiler intrinsics, but someone would have to write them. The fact that it's only MSVC that we have to do this for makes this somewhat more palatable than other times when we are confronted with assembly.
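To illustrate why emulation costs something, here is a rough Python sketch of a 128-bit multiply-add step (`state * mult + inc mod 2**128`) built only from 64-bit words and 32-bit partial products. This shows the general technique, not NumPy's actual C code. The function names and the test constants are hypothetical and are not PCG64's real parameters.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul64_wide(a, b):
    """64x64 -> 128-bit product as (hi, lo), via four 32-bit partial products."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    p0 = a_lo * b_lo
    p1 = a_lo * b_hi
    p2 = a_hi * b_lo
    p3 = a_hi * b_hi
    # Sum the middle column; its upper bits carry into the high word.
    mid = (p0 >> 32) + (p1 & MASK32) + (p2 & MASK32)
    lo = (p0 & MASK32) | ((mid & MASK32) << 32)
    hi = (p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32)) & MASK64
    return hi, lo


def lcg_step_128(state, mult, inc):
    """One multiply-add step mod 2**128 on (hi, lo) pairs of 64-bit words."""
    s_hi, s_lo = state
    m_hi, m_lo = mult
    i_hi, i_lo = inc
    hi, lo = mul64_wide(s_lo, m_lo)
    # Cross terms only affect the high word; masking mimics 64-bit wraparound.
    hi = (hi + s_hi * m_lo + s_lo * m_hi) & MASK64
    new_lo = (lo + i_lo) & MASK64
    carry = 1 if new_lo < lo else 0  # detect overflow of the low-word add
    new_hi = (hi + i_hi + carry) & MASK64
    return new_hi, new_lo


def split128(x):
    """Split a Python int into (hi, lo) 64-bit words."""
    return (x >> 64) & MASK64, x & MASK64


def join128(pair):
    """Recombine (hi, lo) 64-bit words into one Python int."""
    return (pair[0] << 64) | pair[1]
```

With a native 128-bit type the whole step is a single multiply and add. Here it takes several multiplies plus carry bookkeeping, which is the overhead that an intrinsic-based implementation for MSVC would aim to reduce.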

Xoshiro256

Another modern choice for a small, fast PRNG. It does have a few known statistical quirks, but they are unlikely to be a major factor for most uses. Those quirks make me shy away from it, but that's my personal choice for the code I'll be writing.
