Skip to content

API: Add rng.spawn(), bit_gen.spawn(), and bit_gen.seed_seq#23195

Merged
charris merged 5 commits intonumpy:mainfrom
seberg:public-rng-spawn
Feb 23, 2023
Merged

API: Add rng.spawn(), bit_gen.spawn(), and bit_gen.seed_seq#23195
charris merged 5 commits intonumpy:mainfrom
seberg:public-rng-spawn

Conversation

@seberg
Copy link
Copy Markdown
Member

@seberg seberg commented Feb 10, 2023

This makes the seed sequence interface more public facing by:

  1. Adding BitGenerator.seed_seq to give clear access to _seed_seq
  2. Add spawn() to both the generator and the bit generator as convenience methods for spawning new instances.

I somewhat remember that we always meant to consider making this more public and adding such convenient methods, but did not do so originally.
So, now, I do wonder whether it is time to make this fully public?

It would be nice to follow up at some point with a bit of best practices. This also doesn't add it to the RandomState, although doing it via RandomState._bit_generator is of course valid.
Can we define as this kind of access as stable enough that downstream libraries could use it? I fear that backcompat with RandomState might make adopting newer things like spawning hard for libraries?


This PR came out of a discussion I had today with @betatim from sklearn (not 100% related). I am not quite sure where we are on doing this (or something similar), but thought it may be time.

@seberg seberg added 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes component: numpy.random 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. labels Feb 10, 2023
@seberg seberg removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Feb 10, 2023
Copy link
Copy Markdown
Member

@rkern rkern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, 👍. I think it's a good time.

The main narrative documentation about spawning should be reworked to use Generator.spawn() primarily. That can contain code examples of how it works and relates to the underlying SeedSequence so that information can be removed from the docstrings here, where concision constrains us from really explaining it clearly.

raise TypeError(
"The underlying SeedSequence does not implement spawning. "
"You must ensure a custom SeedSequence used for initializing "
"the random state implements spawning (and registers it).")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom SeedSequences don't have to implement spawning. Most reasons for implementing a custom SeedSequence won't support spawning; the most common reason will be to match a seeding algorithm from some other system, like the legacy MT19937 seeding. They just won't support spawning. I'd cut this message off at the first sentence.

This is a convenience method to safely spawn new random number
generators via the `numpy.random.SeedSequence.spawn` mechanism.
The original seed sequence is used by the bit generator and accessible
as ``Generator.bit_generator.seed_seq``.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this sentence is trying to convey in this context. I'm not sure why the address of the SeedSequence is important here. On first reading, it read like it was claiming that the original SeedSequence was available on the spawned generators. I think this might work better as an Example rather than prose, if it needs to be said in this docstring.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Felt like pointing where things come from. But, I agree that it is probably easy enough to find when a user actually wants to (and the vast majority shouldn't).

Instead added a See Also section, that seems more useful.

I gave a shot expanding the docs a bit. The main info about spawning is currently in the bit generator, right now only inserted the new approach there.

(Noted jumped in BitGenerator.spawn(), not sure it is helpful, and maybe See Also of jumped should point back to spawn?)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main part of the narrative docs I was thinking of was the Parallel RNG page. It could wait for another doc-focused PR, though.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 sorry, gave it a shot to expand those a bit minimally, since that is really the right place to link to. Not sure that is the best approach so happy to try and redo more.

from numpy.random import PCG64, SeedSequence
# High quality initial entropy
entropy = 0x87351080e25cb0fad77a44a3be03b491
base_rng = PCG(entropy)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
base_rng = PCG(entropy)
base_rng = PCG64(entropy)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is a BitGenerator, not a Generator. Use default_rng()

This makes the seed sequence interface more public facing by:
1. Adding `BitGenerator.seed_seq` to give clear access to `_seed_seq`
2. Add `spawn()` to both the generator and the bit generator as
   convenience methods for spawning new instances.

I somewhat remember that we always meant to consider making this
more public and adding such convenient methods, but did not do
so originally.
So, now, I do wonder whether it is time to make this fully public?

It would be nice to follow up at some point with a bit of best practices.
This also doesn't add it to the `RandomState`, although doing it via
`RandomState._bit_generator` is of course valid.
Can we define as this kind of access as stable enough that downstream
libraries could use it?  I fear that backcompat with `RandomState`
might make adopting newer things like spawning hard for libraries?
Trying to address Robert Kerns review comments.
@charris charris merged commit d92cc2d into numpy:main Feb 23, 2023
@charris
Copy link
Copy Markdown
Member

charris commented Feb 23, 2023

Thanks Sebastian.

@lciti
Copy link
Copy Markdown

lciti commented Mar 12, 2023

Please note that the documentation for spawn does not mention the n_children argument (see here).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

30 - API 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. component: numpy.random

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants