Instances and warnings#58
Conversation
583b375 to
3b86554
Compare
| -- let rolls :: Int -> [Word] | ||
| -- rolls n = runGenState_ | ||
| -- (PCGen 17 29) | ||
| -- (\g -> randomListM g 10 >>= \xs -> return $ map ((+1) . (`mod` 6)) xs) |
There was a problem hiding this comment.
I added uniformListM as an exported function, it seems like it could be useful
There was a problem hiding this comment.
I'm a fan of keeping the API surface small, particularly given that this is a core package (for some definition of "core"). "seems like it could be useful" is not strong enough IMO. I would prefer it if we didn't export uniformListM.
There was a problem hiding this comment.
There is randoms, so I though it only made sense to add a monadic version of it. Also I think generating a list of values is gonna be an extremely common operation. Hell it even creeped up in documentation ;)
There was a problem hiding this comment.
It seems to be have been removed from the example - fine with me
There was a problem hiding this comment.
I removed it because using uniformR is a correct way of generating random die throws, rather than mod 6, cause it is biased that way.
| (l32, g') -> | ||
| case genWord32 g' of | ||
| (h32, g'') -> | ||
| ((fromIntegral h32 `unsafeShiftL` 32) .|. fromIntegral l32, g'') |
There was a problem hiding this comment.
This approach assumes that older instances of RandomGen class generate 32bits or less of randomness at one time. It will still work for 64bits too, but it will reduce the quality of RNG to 32bits.
Another approach would be to switch to this:
genWord64 = randomIvalIntegral (minBound, maxBound)But has terrible performance and is not good default implementation for RNG that define genWord32.
There was a problem hiding this comment.
But this is the default so users can always replace it with something better if it is available?
There was a problem hiding this comment.
@idontgetoutmuch That was my thinking as well.
| bitmaskWithRejection :: | ||
| (RandomGen g, FiniteBits a, Num a, Ord a, Random a) | ||
| uniformIntegerM :: (MonadRandom g m) => (Integer, Integer) -> g -> m Integer | ||
| uniformIntegerM (l, h) gen |
There was a problem hiding this comment.
I just converted randomIvalIntegral to the MonadRandom approach without trying to dive deeper of how the actual generation works, so if someone feels like analysing or improving the algorithm of Integer generation that would be great
There was a problem hiding this comment.
I (think) I understand the algorithm: it is a variant of Classic Modulo which can be badly biased. By multiplying by the arbitrary 1000 this biasing becomes a 1000 times less.
Example of biasing: in this example with 8 bits of entropy, some numbers get picked 4/3 as often, multiplying by 1000 \approxeq 10 bits, some numbers get picked 4473/4472 as often. Of course the underlying RNG then has to do more work as well as still being (a bit) biased.
Data.Map.fromListWith (const (+1)) $ Prelude.map ((\x -> (x, 0)) . (`mod` 57)) [0..255]
(`mod` 57)) [0..255]
fromList [(0,4),(1,4),(2,4),(3,4),(4,4),(5,4),(6,4),(7,4),(8,4),(9,4),(10,4),(11,4),(12,4),(13,4),(14,4),(15,4),(16,4),(17,4),(18,4),(19,4),(20,4),(21,4),(22,4),(23,4),(24,4),(25,4),(26,4),(27,4),(28,3),(29,3),(30,3),(31,3),(32,3),(33,3),(34,3),(35,3),(36,3),(37,3),(38,3),(39,3),(40,3),(41,3),(42,3),(43,3),(44,3),(45,3),(46,3),(47,3),(48,3),(49,3),(50,3),(51,3),(52,3),(53,3),(54,3),(55,3),(56,3)]
fromList [(0,4473),(1,4473),(2,4473),(3,4473),(4,4473),(5,4473),(6,4473),(7,4473),(8,4473),(9,4473),(10,4473),(11,4473),(12,4473),(13,4473),(14,4473),(15,4473),(16,4473),(17,4473),(18,4473),(19,4473),(20,4473),(21,4473),(22,4473),(23,4473),(24,4473),(25,4473),(26,4473),(27,4473),(28,4473),(29,4473),(30,4473),(31,4473),(32,4473),(33,4473),(34,4473),(35,4473),(36,4473),(37,4473),(38,4473),(39,4473),(40,4472),(41,4472),(42,4472),(43,4472),(44,4472),(45,4472),(46,4472),(47,4472),(48,4472),(49,4472),(50,4472),(51,4472),(52,4472),(53,4472),(54,4472),(55,4472),(56,4472)]
There was a problem hiding this comment.
I should also add this function takes a generator with entropy less than 42(!) bits and makes it produce 42 bits of entropy. So if e.g. you have a generator with 1 bit of entropy then it will be invoked 42 times by this function.
A 1 bit of entropy generator if you ever need it
data BinRNG = BinRNG StdGen
instance RandomGen BinRNG where
next (BinRNG g) = (x `mod` 2, BinRNG g')
where (x,g') = next g
split (BinRNG g) = (BinRNG g1, BinRNG g2)
where (g1,g2) = split g
genRange _ = (0,1)
| Minimal complete definition: 'randomR' and 'random'. | ||
|
|
||
| -} | ||
| {-# DEPRECATED randomR "In favor of `uniformR`" #-} |
There was a problem hiding this comment.
Why do you think we should un-deprecate randomR?
There was a problem hiding this comment.
I removed it for now, because it results in too many warnings everywhere. It looks like it is a very popular function. I think we should wait until next iteration on deprecating it.
| -- let rolls :: Int -> [Word] | ||
| -- rolls n = runGenState_ | ||
| -- (PCGen 17 29) | ||
| -- (\g -> randomListM g 10 >>= \xs -> return $ map ((+1) . (`mod` 6)) xs) |
There was a problem hiding this comment.
I'm a fan of keeping the API surface small, particularly given that this is a core package (for some definition of "core"). "seems like it could be useful" is not strong enough IMO. I would prefer it if we didn't export uniformListM.
| | otherwise = (bottom +) . fromUnsigned <$> | ||
| bitmaskWithRejectionM uniform range gen | ||
| where | ||
| range = toUnsigned top - toUnsigned bottom |
There was a problem hiding this comment.
I think it's worth explaining in a comment how this is exploiting overflow to get the correct behaviour.
There was a problem hiding this comment.
I'd love to explain it but I am not sure I can explain it with words. It's just the math used works out with modulo arithmetic, that's all.
There was a problem hiding this comment.
Okay, I'll give it a try:
-- The following works out thanks to modular arithmetic. Let @n@ be the
-- width of the signed and unsigned integer type in question.
--
-- Conceptually,
--
-- toUnsigned x | x >= 0 = x
-- toUnsigned x | x < 0 = 2^n + x
--
-- The following combinations are possible:
--
-- 1. @bottom >= 0@ and @top >= 0@
-- 2. @bottom < 0@ and @top >= 0@
-- 3. @bottom < 0@ and @top < 0@
--
-- Note that @bottom >= 0@ and @top < 0@ is impossible because of the
-- invariant @bottom < top@ enforced in this equation by the guard in the
-- function definition.
--
-- For any signed integer @i@ of width @n@, we have:
--
-- -2^(n-1) <= i <= 2^(n-1) - 1
--
-- Considering each combination in turn, we have
--
-- 1. @bottom >= 0@ and @top >= 0@
--
-- range = (toUnsigned top - toUnsigned bottom) `mod` 2^n
-- --^ top >= 0, so toUnsigned top == top
-- --^ bottom >= 0, so toUnsigned bottom == bottom
-- = (top - bottom) `mod` 2^n
-- --^ top <= 2^(n-1) - 1 and bottom >= 0
-- --^ top - bottom <= 2^(n-1) - 1
-- --^ 0 < top - bottom <= 2^(n-1) - 1
-- = top - bottom
--
-- 2. @bottom < 0@ and @top >= 0@
--
-- range = (toUnsigned top - toUnsigned bottom) `mod` 2^n
-- --^ top >= 0, so toUnsigned top == top
-- --^ bottom < 0, so toUnsigned bottom == 2^n + bottom
-- = (top - (2^n + bottom)) `mod` 2^n
-- --^ summand -2^n cancels out in calculation modulo 2^n
-- = (top - bottom) `mod` 2^n
-- --^ top <= 2^(n-1) - 1 and bottom >= -2^(n-1)
-- --^ top - bottom <= (2^(n-1) - 1) - (-2^(n-1)) = 2^n - 1
-- --^ 0 < top - bottom <= 2^n - 1
-- = top - bottom
--
-- 3. @bottom < 0@ and @top < 0@
--
-- range = (toUnsigned top - toUnsigned bottom) `mod` 2^n
-- --^ top < 0, so toUnsigned top == 2^n + top
-- --^ bottom < 0, so toUnsigned bottom == 2^n + bottom
-- = ((2^n + top) - (2^n + bottom)) `mod` 2^n
-- --^ summand 2^n cancels out in calculation modulo 2^n
-- = (top - bottom) `mod` 2^n
-- --^ top <= -1
-- --^ bottom >= -2^(n-1)
-- --^ top - bottom <= -1 - (-2^(n-1)) = 2^(n-1) - 1
-- --^ 0 < top - bottom <= 2^(n-1) - 1
-- = top - bottom
range = toUnsigned top - toUnsigned bottom
There was a problem hiding this comment.
That is the kind of comments that should never appear in functions, IMHO, since they hide the simplicity of the actual code in the complexity of the comment. What we can do though is add this as an appendix, possibly at the end of the module or something, for those that actually do care reading long comments like that. I have a friend Niklas that loves long commentaries like that, he is also German, is that a cultural thing? :)
| main = defaultMain $ testGroup "Spec" | ||
| [ bitmaskSpecWord32, bitmaskSpecWord64 | ||
| , rangeSpecWord32, rangeSpecDouble, rangeSpecFloat, rangeSpecInt | ||
| main = |
There was a problem hiding this comment.
Nice! Thanks for improving the tests here.
|
I pushed a tiny change to run spec tests for |
|
@curiousleo Not at all. Good catch. |
…prove documentation
…rove tests and coverage
ff9ba76 to
64c0667
Compare
curiousleo
left a comment
There was a problem hiding this comment.
LGTM, happy to see this merged.
I have two suggestions which I've put into separate PRs: #63 and #64. We can merge them into instances-and-warnings before merging instances-and-warnings into interface-to-performance, or we can merge this PR now and I'll re-target #63 and #64 at interface-to-performance. I don't mind.
Mark all bitmask functions with signed/unsigned
Explain why range is computed correctly
|
@curiousleo and I have looked at various algorithms for ranges and the bitmask approach is good but not the best - the unbiased integer multiplication approach seems to be about x2 better depending on the benchmark. These benchmarks are in C(++) so not directly comparable. For ranges for 32 bit integers you need 64 bit words and for ranges for 64 bit integers you need 128 bit words which C supports but Haskell doesn't (at least I couldn't find out how to do it efficiently - there are various libraries around e.g. https://hackage.haskell.org/package/largeword-1.2.5/docs/Data-LargeWord.html and maybe with enought INLINE and SPECIALIZE pragmas these can be made efficient but my brain is not big enough). My suggestion is we use the faster (unbiased integer multiplication) for 32 bit integers and use the bitmask approach for 64 bit integers. |
|
|
||
| genWord32R :: Word32 -> g -> (Word32, g) | ||
| genWord32R m = randomIvalIntegral (minBound, m) | ||
| genWord32R m g = runGenState g (unsignedBitmaskWithRejectionM uniformWord32 m) |
There was a problem hiding this comment.
I propose
genWord32R m g = runGenState g (unbiasedIntMult32 m)
unbiasedIntMult32 :: MonadRandom g m => Word32 -> g -> m Word32
unbiasedIntMult32 r g = go
where
t :: Word32
t = (-r) `mod` r -- Calculates 2^32 `mod` r!!!
go = do
x <- uniformWord32 g
let m :: Word64
m = (fromIntegral x) * (fromIntegral r)
l :: Word32
l = fromIntegral m
if (l >= t) then return (fromIntegral $ m `shiftR` 32) else go
Uncomment legacy tests
Resurrect legacy benchmarks
|
Ok, I think we all agree on changes in this PR, I am merging this one and retargeting all branches that depend on it |
I finally got some time to add more improvements to our initiative. There are quite a few changes in this PR. I'll add some comments directly into the code for some of the decisions I made so you guys can respond back if you don't like any of those decisions.
RandomGenandMonadRandom, while keeping it backwards compatible (I was able to compile and run the test suite for pcgen with no changes to the code)