Seeding

Initialising PRNGs correctly is hard. For casual users in particular, it should be easy to Do The Right Thing.

Typical things users should be able to do and the common solutions are:
1. **User:** initialise a PRNG without specifying a seed at all
   **Solution:** draw non-deterministic randomness from a system source to create a good seed
2. **User:** initialise a PRNG from a low entropy constant (e.g. small integers like 1337 or 42) without getting punished for it
   **Solution:** mix user-provided seed deterministically to turn low-entropy seed (e.g. with lots of zero bytes) into high-entropy seed
3. **User:** initialise multiple instances of a PRNG
   **Solution:** provide a seed sequence with a `spawn` function, see below

### Prior art

Some RNG APIs have a separate "seeding" layer that takes care of this, see https://github.com/idontgetoutmuch/random/issues/13.

#### C++ [`randutils`](https://www.pcg-random.org/posts/ease-of-use-without-loss-of-power.html#a-preview) / [`seed_seq`](https://en.cppreference.com/w/cpp/numeric/random/seed_seq)

- Initialising a PRNG without parameters uses a system randomness source to seed it ([source here](https://gist.github.com/imneme/540829265469e673d045))
- Builds on top of the `seed_seq` idea from the C++ standard library; this is a kind of low level PRNG that can only fill arrays of 32-bit numbers with random values, nothing else - its only purpose is to seed other (more convenient and faster) PRNGs.

#### numpy [`SeedSequence`](https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence)

- Initialising a `SeedSequence` without parameters uses a [system randomness](https://github.com/numpy/numpy/blob/acba24414487e0e171bedbc8d1ae3cfc63f4b2c7/numpy/random/_bit_generator.pyx#L304-L305) source to seed it
- [`SeedSequence.spawn`](https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.spawn.html#numpy.random.SeedSequence.spawn) initialises a number of new `SeedSequence` instances. The unique spawn key is hashed into the sequence's entropy
- [`SeedSequence.generate_state`](https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.generate_state.html#numpy.random.SeedSequence.generate_state) returns an array of random numbers for seeding
- PRNG constructors take a `SeedSequence` from which they draw their seed

#### [`mwc-random`](http://hackage.haskell.org/package/mwc-random-0.14.0.0/docs/System-Random-MWC.html#g:1)

- `create` and `initialize` combine a user-provided seed with a fixed default seed - this is a form of deterministic mixing
- `withSystemRandom` and `createSystemRandom` seed the PRNG with system randomness

#### [`splitmix`](https://hackage.haskell.org/package/splitmix-0.0.4/docs/System-Random-SplitMix.html#g:2)

- `seedSMGen` uses the given seeds without mixing
- `mkSMGen` mixes the given `Word64` before initialising the `SMGen`
- `initSMGen` uses system time
- `newSMGen` splits an new generator off from the global generator

#### `random` right now

- `mkStdGen` (public) throws away all but the lowest 32 bits of the given `Int` and then creates the `StdGen` via `mkStdGen32`. `createStdGen` (private) does the same, but takes an `Integer`.
- `split` is used to create new RNG instances

### Design considerations

#### Mixing user seeds

`random` as it is designed right now provides no method that deterministically mixes the user seed. This is easily fixed e.g. by [calling out to the relevant `splitmix` function](https://github.com/idontgetoutmuch/random/pull/2/files#diff-4a522d098452ebfc9be2d6291588f5bcR223).

#### Spawning and splitting

In the `random` API, `split` is part of the `RandomGen` class. This means that there is an assumption that each RNG must be split in its own way.
This is in contrast to the C++ and numpy APIs, where there is a source of seeds separate from any particular RNG implementation (`seed_seq` and `SeedSequence`, respectively) which is used to seed all RNGs.

For concreteness' sake, what I have in mind is something along those lines:

```haskell
-- | Size of the entropy pool, in bytes (default 4 or 8)
type PoolSize = Int

-- | Non-deterministic seed using system randomness source
systemSeed :: IO SeedSeq
systemSeedWithPoolSize :: PoolSize -> IO SeedSeq
-- | Deterministic seed generated from user specified input
fixedSeed :: Word64 -> SeedSeq
fixedSeedWithPoolSize :: PoolSize -> Word64 -> SeedSeq

-- | Seed a single 'RandomGen'
spawn :: RandomGen g => SeedSeq -> (g, SeedSeq)
-- | Seed multiple 'RandomGen'
spawnN :: RandomGen g => SeedSeq -> Int -> ([g], SeedSeq)
```

This is a simplified translation of the numpy / C++ approach. In the background, it would use something like [this `initialize` function in `RandomGen`](https://github.com/idontgetoutmuch/random/pull/12/files#diff-4a522d098452ebfc9be2d6291588f5bcR150-R151) to create RNGs.

**Pros** of such an API with a separate, opaque source of seeds:
- Allows us to build a 'good seed by default' API for any compatible RNG (see `systemSeed` and `fixedSeed` above)
- Intent is communicated clearly: `Seed` is meant to be used for seeding, so its design constraints are different: it must create reasonable seeds for any RNG, but it can be slower than other RNGs because in the expected use case, only a small number of seeds are required compared to the overall number of random values generated by RNGs over the course of a program

**Cons:**
- A bad source of seeds would lead to a 'bad seed by default' API
- It might interact weirdly with some particular RNG algorithm

#### Implementation notes

[`entropy`](https://hackage.haskell.org/package/entropy) could be used to get entropy from the system.
The PRNG behind `Seed` could be
- the non-cryptographic algorithm designed by O'Neill for `randutils`, which is also used by numpy: it passes statistical tests and is now widely used
- a cryptographic RNG

Thoughts?

Edit: renamed `Seed` to `SeedSeq` to avoid clashes with the existing definition of `Seed` inside `MonadRandom`.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Seeding #18

Prior art

C++ `randutils` / `seed_seq`

numpy `SeedSequence`

`mwc-random`

`splitmix`

`random` right now

Design considerations

Mixing user seeds

Spawning and splitting

Implementation notes

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Seeding #18

Description

Prior art

C++ randutils / seed_seq

numpy SeedSequence

mwc-random

splitmix

random right now

Design considerations

Mixing user seeds

Spawning and splitting

Implementation notes

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

C++ `randutils` / `seed_seq`

numpy `SeedSequence`

`mwc-random`

`splitmix`

`random` right now