Skip to content

BUG: Correct loop order in MT19937 jump#16153

Merged
charris merged 6 commits intonumpy:masterfrom
bashtage:fix-mt19937-jump
May 13, 2020
Merged

BUG: Correct loop order in MT19937 jump#16153
charris merged 6 commits intonumpy:masterfrom
bashtage:fix-mt19937-jump

Conversation

@bashtage
Copy link
Contributor

@bashtage bashtage commented May 4, 2020

Use the original loop order instead of an inverted order

closes #15394

@seberg
Copy link
Member

seberg commented May 4, 2020

Thanks Kevin! @fzeiser and @dubzzz could you have a quick look whether this makes sense to you, and gives the expected jump values? Not that I don't trust that, but always good to have a second pair of eyes, and you are more knowledgeable.

@bashtage I guess this should have a release note due to the changed stream if jump is used?

@seberg seberg added 00 - Bug 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes component: numpy.random labels May 4, 2020
@bashtage
Copy link
Contributor Author

bashtage commented May 4, 2020

If anyone is interested, the code to produce the test values is in this commit.

Copy link

@dubzzz dubzzz May 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably add another test covering the issue I originally opened:

    { # You already have this one
        "seed": 0,
        "steps": 10,
        "initial": {"key_md5": "64eaf265d2203179fb5ffb73380cd589", "pos":  9},
        "jumped": {"key_md5": "8cb7b061136efceef5217a9ce2cc9a5a", "pos": 598},
    },
    { # But don't have this one
        "seed": 0,
        "steps": 11,
        "initial": {"key_md5": "64eaf265d2203179fb5ffb73380cd589", "pos": 10},
        "jumped": {"key_md5": "8cb7b061136efceef5217a9ce2cc9a5a", "pos": 599},
    },

The added test ensures that we have the following behaviour:
jump(next(mt19937)) == next(jump(mt19937)) - or something close

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only guarantee that I think we should try and make is that the code is faithful to the original author's code. Whether this is true or not I think is a different issue.

Copy link
Contributor

@fzeiser fzeiser May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correctly, the test you propose will not work (in the current implementation / original authors source code) / tried it out and it didn't work:

jump(next(mt19937)) == next(jump(mt19937)) - or something close

However, if this is the case, then I don't think that this pull request closes #15394, does it?
Or put it the other way around: With the current state of the PR, shouldn't the documentation be adopted too?

Current documentation:

The state of the returned big generator is jumped as-if 2**(128 * jumps) random numbers have been generated.

On this note:

The only guarantee that I think we should try and make is that the code is faithful to the original author's code. Whether this is true or not I think is a different issue.

To what extend is this the policy? I totally understand that you may not have the time to check that all implementation are "true" - but still I think it's worrisome if the random generator behaves in a way we don't understand. So what should the documentation of jump say? If I exaggerate a little: the original author claims that it is as if 2*(128 * jumps) random numbers have been generated; but when we test it, that it not true`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note: Shall we implement the same type of test

jump(next(mt19937)) == next(jump(mt19937)) - or something close

in the tests for PCG64? There it should pass; but nice to have this ensured for the future, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on what the point of #15394 is. If it is to check the implementation, then I think this PR closes that. If it is to make jumped work in a way different from the original code, then it does not. I don't think the latter is realistic.

Copy link

@dubzzz dubzzz May 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original point when opening the issue was the following:

As a user of the library I saw that the jump was supposed to offset by a fixed number of iterations. So my expectation and the reason why I opened the issue was jump(next(mt19937)) should always be the same as next(jump(mt19937)) - or at least we might somehow be able to find the same sequence close enough as we should jump very close in both cases. If it is not the case it seems problematic as it looks like the 2**128 skipped iterations are not true so I cannot safely rely on a jump if we don't know where it jumps.

Actually I found this issue while implementing a jump for mersenne twister on a JavaScript library I am maintaining. I already had a working implementation of the mersenne generator but needed a reference one for the jump to compare my results to it. So I compared it to the jump of numpy. In the set of tests, I use on my generators one of them - based on property based testing (see hypothesis for Python) - checks that: jump(next(generator)) == next(jump(generator)). And here is how I found the problem.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

In this case I'd maybe call mt19937_gen before jumping. As in https://github.com/numpy/numpy/blob/28f9cbc5e5510cd780b9d6a18a8e8bc6b45a13d7/numpy/random/src/mt19937/mt19937.h#L27-L43

Reseting pos to 0 when pos is N means that the final generator after jump will be: 0 + 2**128
While if pos was N-2 it would be: N - 2 + 2**128
While if pos was N-1 it would be: N - 1 + 2**128

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best we can do here is to have fidelity with the original jump implementation. Going off in other directions seems unnecessarily risky.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree, except some ease of mind, I honestly do not know why there would be a use-case to generate the same identical stream after different order of operation? The only thing is could be to document it, but since this is the reference implementation...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree that if the problem comes from the reference implementation then it is another issue.

What I was saying is more that given the documentation of this method, as a user I expect that calling jump will offset the generator by 2**128 iterations which should be the same as calling next 2**128 times in a row. Saying that next(jump) !== jump(next) somehow points that this is not the case 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bashtage maybe we can just tweak the docs a little to be clear that it only jumps about 2**128 states, 2**128 states from the 0 offset postion (whatever it is)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People still need a mental model of what's going on. From what you've added, I'd still expect next(jumped()) == jumped(next()) to hold true. If that's not true, then what are the guarantees that we can rely on to reason about our programs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that the jump is for the whole state and ignores the offset for the next item. @bashtage Is that the case? If so, it should be possible to work around that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to try and contact the author to see if he has any notes to share

@bashtage Did you succeed into getting in touch with the author of the paper? I'd really be interested to know the reason why jump seems to jump at very different locations based on how many times we offset the rng before jumping 🤔

As a reminder my original issue was:

from numpy.random import MT19937

rng = MT19937(42)
rng.random_raw()   # value given state offset of 0
rng.random_raw()   # value given state offset of 1
rng = rng.jumped()
rng.random_raw()   # value given state offset of 2**128 +2
# Got: 2382575482

rng = MT19937(42)
rng.random_raw()   # value given state offset of 0
rng = rng.jumped()
rng.random_raw()   # value given state offset of 2**128 +1
rng.random_raw()   # value given state offset of 2**128 +2
# Got: 2001582899

It seems that the current version of numpy (1.19.4) still has a similiar behaviour. Which seems unexpected given what jump should do 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bashtage Just bringing up the point of dubzzz again: Did you hear back anything of the original author of the jump code paper?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recently revisited this issue on pure-rand side with the help of Claude. It is still a work-in-progress but it somehow inspired itself from the numpy implementation but made it (in JavaScript) with the jump>next is equivalent to next>jump. I'll will have to re-read it carefully to get why it suddenly worked but given the tests being executed it seems that it made it through.

PR here.

I'm gonna keep you posted when I understand the diff with its implementation and mine (which was translated from numpy)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense and is easily readable! Do you think it makes sense to refer to the commit (5055c43) or a maybe rather version number (1.18.1 ?) of randomgen, too?

@seberg seberg added this to the 1.19.0 release milestone May 4, 2020
@bashtage
Copy link
Contributor Author

bashtage commented May 4, 2020

Looks like the jumped code is LE only. BE will need (painful) investigation.

@charris
Copy link
Member

charris commented May 4, 2020

Should we issue a warning/error on big endian machines until it is fixed?

@bashtage
Copy link
Contributor Author

bashtage commented May 5, 2020

@charris The bug was in my use of hashlib, not in the C code. CI is passing now.

@seberg seberg removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label May 5, 2020
@charris
Copy link
Member

charris commented May 12, 2020

CircleCI is complaining about a missing reference.

@charris charris closed this May 12, 2020
@charris charris reopened this May 12, 2020
bashtage added 5 commits May 12, 2020 22:39
Use the original loop order instead of an inverted order

closes numpy#15394
Refactor polynomial to be unsigned long array
Remove unused code
Fix md5 calculation on BE
Add note detailing the changes to MT19937 jumped
Remove unused file containing the old polynomial representation.
Clarify the method used and the source of the code
@bashtage bashtage force-pushed the fix-mt19937-jump branch from 8f46630 to 95309c7 Compare May 12, 2020 21:39
Fix indent on reference and remove text that may be incorrect.
@charris
Copy link
Member

charris commented May 12, 2020

Passing now. I'll put this in tomorrow to allow others to comment if they wish.

References
----------
.. [1] Matsumoto, M, Generating multiple disjoint streams of
pseudorandom number sequences. Accessed on: May 6, 2020.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that the the title should be in quotes, but I don't know if that makes a difference. I'm not familiar with the citation format we use. @mattip @rkern thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be related to regexp for renaming references in numpydoc... looking into it now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm doesn't seem like it, at least not a problem with the section I looked at:

https://github.com/numpy/numpydoc/blob/b4c5fd17e2b85c2416a5e586933eee353b58bf7c/numpydoc/numpydoc.py#L43-L67

@charris charris merged commit 77410e2 into numpy:master May 13, 2020
@charris
Copy link
Member

charris commented May 13, 2020

Thanks Kevin.

@bashtage bashtage deleted the fix-mt19937-jump branch May 28, 2020 21:44
dubzzz pushed a commit to dubzzz/pure-rand that referenced this pull request Mar 4, 2026
Implement MT19937 jump-ahead by 2^128 steps using Horner's method
on the characteristic polynomial in GF(2).

Key changes:
- Switch next() from batch-twist to lazy gen_next model so that
  jump and next use the same recurrence, guaranteeing commutativity
- Add 624-element jump polynomial (2^128 step) from Hiroshima
  University's jump-ahead algorithm
- Implement jump() via Horner's method with genNext/addState helpers
- Add reference value tests and property-based tests
  (noOrderNextJump, changeSelfWithJump, noChangeOnClonedWithJump)

References:
- Hiroshima University MT jump: http://www.math.sci.hiroshima-u.ac.jp/m-mat/MT/JUMP/
- numpy MT19937 jump (corrected): numpy/numpy#16153
- Previous attempt: #12

https://claude.ai/code/session_01Jaz8u7FqjJQUkipr8LPbfr
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MT19937 jump does not seem to offset correctly

7 participants