Initialising PRNGs correctly is hard. For casual users in particular, it should be easy to Do The Right Thing.
Typical things users should be able to do and the common solutions are:
- User: initialise a PRNG without specifying a seed at all
Solution: draw non-deterministic randomness from a system source to create a good seed
- User: initialise a PRNG from a low entropy constant (e.g. small integers like 1337 or 42) without getting punished for it
Solution: mix user-provided seed deterministically to turn low-entropy seed (e.g. with lots of zero bytes) into high-entropy seed
- User: initialise multiple instances of a PRNG
Solution: provide a seed sequence with a spawn function, see below
Prior art
Some RNG APIs have a separate "seeding" layer that takes care of this, see #13.
- Initialising a PRNG without parameters uses a system randomness source to seed it (source here)
- Builds on top of the
seed_seq idea from the C++ standard library; this is a kind of low level PRNG that can only fill arrays of 32-bit numbers with random values, nothing else - its only purpose is to seed other (more convenient and faster) PRNGs.
- Initialising a
SeedSequence without parameters uses a system randomness source to seed it
SeedSequence.spawn initialises a number of new SeedSequence instances. The unique spawn key is hashed into the sequence's entropy
SeedSequence.generate_state returns an array of random numbers for seeding
- PRNG constructors take a
SeedSequence from which they draw their seed
create and initialize combine a user-provided seed with a fixed default seed - this is a form of deterministic mixing
withSystemRandom and createSystemRandom seed the PRNG with system randomness
seedSMGen uses the given seeds without mixing
mkSMGen mixes the given Word64 before initialising the SMGen
initSMGen uses system time
newSMGen splits an new generator off from the global generator
random right now
mkStdGen (public) throws away all but the lowest 32 bits of the given Int and then creates the StdGen via mkStdGen32. createStdGen (private) does the same, but takes an Integer.
split is used to create new RNG instances
Design considerations
Mixing user seeds
random as it is designed right now provides no method that deterministically mixes the user seed. This is easily fixed e.g. by calling out to the relevant splitmix function.
Spawning and splitting
In the random API, split is part of the RandomGen class. This means that there is an assumption that each RNG must be split in its own way.
This is in contrast to the C++ and numpy APIs, where there is a source of seeds separate from any particular RNG implementation (seed_seq and SeedSequence, respectively) which is used to seed all RNGs.
For concreteness' sake, what I have in mind is something along those lines:
-- | Size of the entropy pool, in bytes (default 4 or 8)
type PoolSize = Int
-- | Non-deterministic seed using system randomness source
systemSeed :: IO SeedSeq
systemSeedWithPoolSize :: PoolSize -> IO SeedSeq
-- | Deterministic seed generated from user specified input
fixedSeed :: Word64 -> SeedSeq
fixedSeedWithPoolSize :: PoolSize -> Word64 -> SeedSeq
-- | Seed a single 'RandomGen'
spawn :: RandomGen g => SeedSeq -> (g, SeedSeq)
-- | Seed multiple 'RandomGen'
spawnN :: RandomGen g => SeedSeq -> Int -> ([g], SeedSeq)
This is a simplified translation of the numpy / C++ approach. In the background, it would use something like this initialize function in RandomGen to create RNGs.
Pros of such an API with a separate, opaque source of seeds:
- Allows us to build a 'good seed by default' API for any compatible RNG (see
systemSeed and fixedSeed above)
- Intent is communicated clearly:
Seed is meant to be used for seeding, so its design constraints are different: it must create reasonable seeds for any RNG, but it can be slower than other RNGs because in the expected use case, only a small number of seeds are required compared to the overall number of random values generated by RNGs over the course of a program
Cons:
- A bad source of seeds would lead to a 'bad seed by default' API
- It might interact weirdly with some particular RNG algorithm
Implementation notes
entropy could be used to get entropy from the system.
The PRNG behind Seed could be
- the non-cryptographic algorithm designed by O'Neill for
randutils, which is also used by numpy: it passes statistical tests and is now widely used
- a cryptographic RNG
Thoughts?
Edit: renamed Seed to SeedSeq to avoid clashes with the existing definition of Seed inside MonadRandom.
Initialising PRNGs correctly is hard. For casual users in particular, it should be easy to Do The Right Thing.
Typical things users should be able to do and the common solutions are:
Solution: draw non-deterministic randomness from a system source to create a good seed
Solution: mix user-provided seed deterministically to turn low-entropy seed (e.g. with lots of zero bytes) into high-entropy seed
Solution: provide a seed sequence with a
spawnfunction, see belowPrior art
Some RNG APIs have a separate "seeding" layer that takes care of this, see #13.
C++
randutils/seed_seqseed_seqidea from the C++ standard library; this is a kind of low level PRNG that can only fill arrays of 32-bit numbers with random values, nothing else - its only purpose is to seed other (more convenient and faster) PRNGs.numpy
SeedSequenceSeedSequencewithout parameters uses a system randomness source to seed itSeedSequence.spawninitialises a number of newSeedSequenceinstances. The unique spawn key is hashed into the sequence's entropySeedSequence.generate_statereturns an array of random numbers for seedingSeedSequencefrom which they draw their seedmwc-randomcreateandinitializecombine a user-provided seed with a fixed default seed - this is a form of deterministic mixingwithSystemRandomandcreateSystemRandomseed the PRNG with system randomnesssplitmixseedSMGenuses the given seeds without mixingmkSMGenmixes the givenWord64before initialising theSMGeninitSMGenuses system timenewSMGensplits an new generator off from the global generatorrandomright nowmkStdGen(public) throws away all but the lowest 32 bits of the givenIntand then creates theStdGenviamkStdGen32.createStdGen(private) does the same, but takes anInteger.splitis used to create new RNG instancesDesign considerations
Mixing user seeds
randomas it is designed right now provides no method that deterministically mixes the user seed. This is easily fixed e.g. by calling out to the relevantsplitmixfunction.Spawning and splitting
In the
randomAPI,splitis part of theRandomGenclass. This means that there is an assumption that each RNG must be split in its own way.This is in contrast to the C++ and numpy APIs, where there is a source of seeds separate from any particular RNG implementation (
seed_seqandSeedSequence, respectively) which is used to seed all RNGs.For concreteness' sake, what I have in mind is something along those lines:
This is a simplified translation of the numpy / C++ approach. In the background, it would use something like this
initializefunction inRandomGento create RNGs.Pros of such an API with a separate, opaque source of seeds:
systemSeedandfixedSeedabove)Seedis meant to be used for seeding, so its design constraints are different: it must create reasonable seeds for any RNG, but it can be slower than other RNGs because in the expected use case, only a small number of seeds are required compared to the overall number of random values generated by RNGs over the course of a programCons:
Implementation notes
entropycould be used to get entropy from the system.The PRNG behind
Seedcould berandutils, which is also used by numpy: it passes statistical tests and is now widely usedThoughts?
Edit: renamed
SeedtoSeedSeqto avoid clashes with the existing definition ofSeedinsideMonadRandom.