BUG: Correct loop order in MT19937 jump#16153
Conversation
|
Thanks Kevin! @fzeiser and @dubzzz could you have a quick look whether this makes sense to you, and gives the expected jump values? Not that I don't trust that, but always good to have a second pair of eyes, and you are more knowledgeable. @bashtage I guess this should have a release note due to the changed stream if |
|
If anyone is interested, the code to produce the test values is in this commit. |
There was a problem hiding this comment.
I'd probably add another test covering the issue I originally opened:
{ # You already have this one
"seed": 0,
"steps": 10,
"initial": {"key_md5": "64eaf265d2203179fb5ffb73380cd589", "pos": 9},
"jumped": {"key_md5": "8cb7b061136efceef5217a9ce2cc9a5a", "pos": 598},
},
{ # But don't have this one
"seed": 0,
"steps": 11,
"initial": {"key_md5": "64eaf265d2203179fb5ffb73380cd589", "pos": 10},
"jumped": {"key_md5": "8cb7b061136efceef5217a9ce2cc9a5a", "pos": 599},
},The added test ensures that we have the following behaviour:
jump(next(mt19937)) == next(jump(mt19937)) - or something close
There was a problem hiding this comment.
The only guarantee that I think we should try and make is that the code is faithful to the original author's code. Whether this is true or not I think is a different issue.
There was a problem hiding this comment.
If I understand it correctly, the test you propose will not work (in the current implementation / original authors source code) / tried it out and it didn't work:
jump(next(mt19937)) == next(jump(mt19937))- or something close
However, if this is the case, then I don't think that this pull request closes #15394, does it?
Or put it the other way around: With the current state of the PR, shouldn't the documentation be adopted too?
Current documentation:
The state of the returned big generator is jumped as-if 2**(128 * jumps) random numbers have been generated.
On this note:
The only guarantee that I think we should try and make is that the code is faithful to the original author's code. Whether this is true or not I think is a different issue.
To what extend is this the policy? I totally understand that you may not have the time to check that all implementation are "true" - but still I think it's worrisome if the random generator behaves in a way we don't understand. So what should the documentation of jump say? If I exaggerate a little: the original author claims that it is as if 2*(128 * jumps) random numbers have been generated; but when we test it, that it not true`
There was a problem hiding this comment.
Another note: Shall we implement the same type of test
jump(next(mt19937)) == next(jump(mt19937))- or something close
in the tests for PCG64? There it should pass; but nice to have this ensured for the future, too.
There was a problem hiding this comment.
Depends on what the point of #15394 is. If it is to check the implementation, then I think this PR closes that. If it is to make jumped work in a way different from the original code, then it does not. I don't think the latter is realistic.
There was a problem hiding this comment.
My original point when opening the issue was the following:
As a user of the library I saw that the jump was supposed to offset by a fixed number of iterations. So my expectation and the reason why I opened the issue was jump(next(mt19937)) should always be the same as next(jump(mt19937)) - or at least we might somehow be able to find the same sequence close enough as we should jump very close in both cases. If it is not the case it seems problematic as it looks like the 2**128 skipped iterations are not true so I cannot safely rely on a jump if we don't know where it jumps.
Actually I found this issue while implementing a jump for mersenne twister on a JavaScript library I am maintaining. I already had a working implementation of the mersenne generator but needed a reference one for the jump to compare my results to it. So I compared it to the jump of numpy. In the set of tests, I use on my generators one of them - based on property based testing (see hypothesis for Python) - checks that: jump(next(generator)) == next(jump(generator)). And here is how I found the problem.
There was a problem hiding this comment.
🤔
In this case I'd maybe call mt19937_gen before jumping. As in https://github.com/numpy/numpy/blob/28f9cbc5e5510cd780b9d6a18a8e8bc6b45a13d7/numpy/random/src/mt19937/mt19937.h#L27-L43
Reseting pos to 0 when pos is N means that the final generator after jump will be: 0 + 2**128
While if pos was N-2 it would be: N - 2 + 2**128
While if pos was N-1 it would be: N - 1 + 2**128
There was a problem hiding this comment.
I think the best we can do here is to have fidelity with the original jump implementation. Going off in other directions seems unnecessarily risky.
There was a problem hiding this comment.
I think I agree, except some ease of mind, I honestly do not know why there would be a use-case to generate the same identical stream after different order of operation? The only thing is could be to document it, but since this is the reference implementation...
There was a problem hiding this comment.
I totally agree that if the problem comes from the reference implementation then it is another issue.
What I was saying is more that given the documentation of this method, as a user I expect that calling jump will offset the generator by 2**128 iterations which should be the same as calling next 2**128 times in a row. Saying that next(jump) !== jump(next) somehow points that this is not the case 🤔
There was a problem hiding this comment.
@bashtage maybe we can just tweak the docs a little to be clear that it only jumps about 2**128 states, 2**128 states from the 0 offset postion (whatever it is)?
There was a problem hiding this comment.
People still need a mental model of what's going on. From what you've added, I'd still expect next(jumped()) == jumped(next()) to hold true. If that's not true, then what are the guarantees that we can rely on to reason about our programs?
There was a problem hiding this comment.
My guess is that the jump is for the whole state and ignores the offset for the next item. @bashtage Is that the case? If so, it should be possible to work around that.
There was a problem hiding this comment.
I'm going to try and contact the author to see if he has any notes to share
@bashtage Did you succeed into getting in touch with the author of the paper? I'd really be interested to know the reason why jump seems to jump at very different locations based on how many times we offset the rng before jumping 🤔
As a reminder my original issue was:
from numpy.random import MT19937
rng = MT19937(42)
rng.random_raw() # value given state offset of 0
rng.random_raw() # value given state offset of 1
rng = rng.jumped()
rng.random_raw() # value given state offset of 2**128 +2
# Got: 2382575482
rng = MT19937(42)
rng.random_raw() # value given state offset of 0
rng = rng.jumped()
rng.random_raw() # value given state offset of 2**128 +1
rng.random_raw() # value given state offset of 2**128 +2
# Got: 2001582899It seems that the current version of numpy (1.19.4) still has a similiar behaviour. Which seems unexpected given what jump should do 🤔
There was a problem hiding this comment.
@bashtage Just bringing up the point of dubzzz again: Did you hear back anything of the original author of the jump code paper?
There was a problem hiding this comment.
I recently revisited this issue on pure-rand side with the help of Claude. It is still a work-in-progress but it somehow inspired itself from the numpy implementation but made it (in JavaScript) with the jump>next is equivalent to next>jump. I'll will have to re-read it carefully to get why it suddenly worked but given the tests being executed it seems that it made it through.
PR here.
I'm gonna keep you posted when I understand the diff with its implementation and mine (which was translated from numpy)
There was a problem hiding this comment.
Makes sense and is easily readable! Do you think it makes sense to refer to the commit (5055c43) or a maybe rather version number (1.18.1 ?) of randomgen, too?
|
Looks like the jumped code is LE only. BE will need (painful) investigation. |
|
Should we issue a warning/error on big endian machines until it is fixed? |
|
@charris The bug was in my use of hashlib, not in the C code. CI is passing now. |
|
CircleCI is complaining about a missing reference. |
Use the original loop order instead of an inverted order closes numpy#15394
Refactor polynomial to be unsigned long array Remove unused code Fix md5 calculation on BE
Add note detailing the changes to MT19937 jumped
Remove unused file containing the old polynomial representation.
Clarify the method used and the source of the code
8f46630 to
95309c7
Compare
Fix indent on reference and remove text that may be incorrect.
|
Passing now. I'll put this in tomorrow to allow others to comment if they wish. |
| References | ||
| ---------- | ||
| .. [1] Matsumoto, M, Generating multiple disjoint streams of | ||
| pseudorandom number sequences. Accessed on: May 6, 2020. |
There was a problem hiding this comment.
This may be related to regexp for renaming references in numpydoc... looking into it now
There was a problem hiding this comment.
hmm doesn't seem like it, at least not a problem with the section I looked at:
|
Thanks Kevin. |
Implement MT19937 jump-ahead by 2^128 steps using Horner's method on the characteristic polynomial in GF(2). Key changes: - Switch next() from batch-twist to lazy gen_next model so that jump and next use the same recurrence, guaranteeing commutativity - Add 624-element jump polynomial (2^128 step) from Hiroshima University's jump-ahead algorithm - Implement jump() via Horner's method with genNext/addState helpers - Add reference value tests and property-based tests (noOrderNextJump, changeSelfWithJump, noChangeOnClonedWithJump) References: - Hiroshima University MT jump: http://www.math.sci.hiroshima-u.ac.jp/m-mat/MT/JUMP/ - numpy MT19937 jump (corrected): numpy/numpy#16153 - Previous attempt: #12 https://claude.ai/code/session_01Jaz8u7FqjJQUkipr8LPbfr
Use the original loop order instead of an inverted order
closes #15394