BUG: Get full precision for 32 bit floating point random values. by WarrenWeckesser · Pull Request #20314 · numpy/numpy

WarrenWeckesser · 2021-11-06T07:01:50Z

The formula to convert a 32 bit random integer to a random float32,

(next_uint32(bitgen_state) >> 9) * (1.0f / 8388608.0f)

shifts by one bit too many, resulting in uniform float32 samples always
having a 0 in the least significant bit. The formula is corrected to

(next_uint32(bitgen_state) >> 8) * (1.0f / 16777216.0f)

Occurrences of the incorrect formula in numpy/random/tests/test_direct.py
were also corrected.

Closes gh-17478.
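To make the bit counting concrete, here is a minimal NumPy sketch that emulates both conversions on an array of raw 32 bit integers and inspects the lowest significand bit of each result. The helper names `to_float32_old` and `to_float32_new` are hypothetical, and the raw integers come from `Generator.integers` rather than from `next_uint32`, so this illustrates the arithmetic only, not NumPy's C code path.

```python
import numpy as np


def to_float32_old(u):
    # Old formula: keep the top 23 bits, scale by 2**-23.
    return (u >> np.uint32(9)).astype(np.float32) * np.float32(1.0 / 8388608.0)


def to_float32_new(u):
    # Corrected formula: keep the top 24 bits, scale by 2**-24.
    return (u >> np.uint32(8)).astype(np.float32) * np.float32(1.0 / 16777216.0)


# Stand-in for a stream of next_uint32() outputs (assumption: any source of
# uniformly distributed uint32 values is good enough for this check).
rng = np.random.default_rng(12345)
u = rng.integers(0, 2**32, size=100_000, dtype=np.uint32)

old = to_float32_old(u)
new = to_float32_new(u)

# Reinterpret the float32 bit patterns as integers and mask the lowest bit.
old_lsb = old.view(np.uint32) & np.uint32(1)
new_lsb = new.view(np.uint32) & np.uint32(1)

print("old formula, any sample with LSB set:", bool(old_lsb.any()))
print("new formula, any sample with LSB set:", bool(new_lsb.any()))
```

A float32 has a 24 bit significand (23 stored bits plus the implicit leading bit), so a 23 bit integer scaled by a power of two can never set the lowest significand bit, while a 24 bit integer can.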

seberg · 2021-11-08T18:26:17Z

@bashtage or @rkern the changes look good. Could you make a quick call with respect to only modifying the new API and whether this should have a release note?

bashtage · 2021-11-08T18:30:54Z

The change is safe. Whether it requires a release note depends only on whether you think the least significant bit in a 32 bit float is worth one. Probably best to be safe.

WarrenWeckesser · 2021-11-08T21:38:43Z

I added a release note about the change.

WarrenWeckesser · 2021-11-10T23:59:38Z

Here's a script that demonstrates the changes in the variates that can occur, and shows why a release note is warranted.

Script to print random samples

import numpy as np


print(f"numpy version {np.__version__}")
print()

seed = 98765432109
print(f"seed: {seed}")
print()

print("rng.random")
rng = np.random.default_rng(seed)
x = rng.random(12, dtype=np.float32)
print("first 12 samples:")
print(x)
print()

n = 500_000_000

print("rng.standard_exponential")
rng = np.random.default_rng(seed)
x = rng.standard_exponential(size=n, dtype=np.float32)
x1 = x[-1]
x = rng.standard_exponential(size=n, dtype=np.float32)
x2 = x[-1]
print(f"last sample of {n:10d}:", x1)
print(f"last sample of {2*n:10d}:", x2)

print()

k = 2.5
print(f"rng.standard_gamma, k={k}")
rng = np.random.default_rng(seed)
x = rng.standard_gamma(k, size=n, dtype=np.float32)
print("first 14 samples:")
print(x[:14])
print(f"last sample of {n}:", x[-1])

The output for the current main development branch:

numpy version 1.22.0.dev0+1733.g8dbd507fb

seed: 98765432109

rng.random
first 12 samples:
[0.754159   0.72002673 0.00234556 0.49236786 0.16807711 0.845093
 0.06266415 0.48290312 0.80823255 0.9720112  0.01573467 0.9534826 ]

rng.standard_exponential
last sample of  500000000: 0.3884329
last sample of 1000000000: 2.664465

rng.standard_gamma, k=2.5
first 14 samples:
[2.9261618 2.168996  2.3661127 2.06449   4.889103  2.145251  2.651206
 2.109355  1.3617952 2.1510322 1.2934842 0.8435856 5.8445168 2.0458326]
last sample of 500000000: 0.46255943

The output for this pull request:

numpy version 1.22.0.dev0+1688.ge5af24d51

seed: 98765432109

rng.random
first 12 samples:
[0.754159   0.7200268  0.00234556 0.49236786 0.16807711 0.845093
 0.06266421 0.48290312 0.80823255 0.97201127 0.01573473 0.9534826 ]

rng.standard_exponential
last sample of  500000000: 0.3884329
last sample of 1000000000: 0.71304655

rng.standard_gamma, k=2.5
first 14 samples:
[2.9261618 2.168996  2.3661127 2.06449   4.889103  2.145251  2.651206
 2.109355  1.3617952 2.1510322 1.2934842 0.8435856 5.8445168 2.0458326]
last sample of 500000000: 4.6183786

You can see the small variation in the ULP of the output of rng.random.
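As a rough check on the size of that variation, the two printed `rng.random` arrays above can be compared directly. This is a sketch that works from the printed values, which are rounded to eight decimals, so the differences are approximate; they are expressed in units of 2**-24, the step contributed by the extra random bit.

```python
import numpy as np

# Values copied from the two outputs above (main branch, then this PR).
main = np.array([0.754159, 0.72002673, 0.00234556, 0.49236786,
                 0.16807711, 0.845093, 0.06266415, 0.48290312,
                 0.80823255, 0.9720112, 0.01573467, 0.9534826])
pr = np.array([0.754159, 0.7200268, 0.00234556, 0.49236786,
               0.16807711, 0.845093, 0.06266421, 0.48290312,
               0.80823255, 0.97201127, 0.01573473, 0.9534826])

# Express each difference as a (rounded) multiple of 2**-24.
steps = np.rint((pr - main) * 2**24).astype(int)
print(steps)
```

Each sample either is unchanged or moves up by about 2**-24. That is one ULP for values in [0.5, 1), and several ULPs for smaller values, where float32 has finer spacing.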

The outputs for rng.standard_exponential and rng.standard_gamma show no differences at first, but eventually, the difference in the ULP of the values generated by next_float causes a different branch to be taken in their iterative algorithms, resulting in large changes in the stream of variates.
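The mechanism can be sketched with a toy rejection sampler. This is not NumPy's implementation of either distribution; the function `rejection_sample`, the threshold, and the hand-picked raw integers are all hypothetical, chosen so that one raw value lands exactly on the accept/reject boundary under the new formula and just below it under the old one.

```python
import numpy as np


def to_float32_old(u):
    # Old formula: top 23 bits of the raw integer.
    return np.float32(u >> 9) * np.float32(1.0 / 8388608.0)


def to_float32_new(u):
    # Corrected formula: top 24 bits of the raw integer.
    return np.float32(u >> 8) * np.float32(1.0 / 16777216.0)


def rejection_sample(raw, convert, threshold, count):
    """Toy sampler: return the first `count` converted values below `threshold`."""
    out = []
    it = iter(raw)
    while len(out) < count:
        x = convert(next(it))
        if x < threshold:  # rejected values consume a raw integer but emit nothing
            out.append(x)
    return out


# Hand-picked raw stream (assumption, for illustration only). The second value
# converts to 0.5 with the old formula and to 0.5 + 2**-24 with the new one.
raw = [0x20000000, 0x80000100, 0x40000000, 0x10000000, 0x60000000]
threshold = np.float32(0.5) + np.float32(2.0 ** -24)

old_samples = rejection_sample(raw, to_float32_old, threshold, 3)
new_samples = rejection_sample(raw, to_float32_new, threshold, 3)
print("old:", old_samples)
print("new:", new_samples)
```

With the old formula the second raw value is accepted, while with the new formula it is rejected and the sampler consumes an extra raw integer. From that point on the two output streams are offset from each other, which is the kind of large change seen in the last samples above.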

seberg · 2021-11-12T18:48:11Z

Thanks @WarrenWeckesser! As far as I can see, this only affects the new API, since this is dtype=np.float32 and the old API does not support this for all functions. So there are no stream-compat concerns.

However, since the streams do change I am removing the backport candidate label. Please just re-add if you disagree!

bashtage · 2021-11-12T20:08:20Z

This seems to be nearly an enhancement, even though it is also a bug. While the intent was clearly to provide the maximum number of random bits, the nature of the bug only resulted in slightly less random values in most plausible scenarios.

- numpy/numpy#20314 - numba/numba#7754

The formula to convert a 32 bit random integer to a random float32, (((rng)->next_uint32((rng)->state) >> 9) * (1.0f / 8388608.0f)) shifts by one bit too many, resulting in uniform float32 samples always having a 0 in the least significant bit. The formula is corrected to (((rng)->next_uint32((rng)->state) >> 8) * (1.0f / 16777216.0f)) See numpy/numpy#20314 for more details.

WarrenWeckesser added 00 - Bug component: numpy.random labels Nov 6, 2021

WarrenWeckesser force-pushed the float32-rand-unused-bit branch from d754442 to 4b9e569 November 6, 2021 07:21

charris added the 09 - Backport-Candidate PRs tagged should be backported label Nov 6, 2021

DOC: Add release note about the fix for 32 bit float random variates.

e5af24d

seberg merged commit 1995e2c into numpy:main Nov 12, 2021

seberg removed the 09 - Backport-Candidate PRs tagged should be backported label Nov 12, 2021

WarrenWeckesser deleted the float32-rand-unused-bit branch November 12, 2021 19:42

davemfish mentioned this pull request Jan 3, 2022

numpy 1.22 compatibility natcap/invest#796

Closed

ahirner added a commit to MoonVision/moonbox-docker that referenced this pull request Apr 16, 2022

downgrade numpy to 1.21 for downstream compat

b917fbd

- numpy/numpy#20314 - numba/numba#7754

stuartarchibald mentioned this pull request May 17, 2022

Support for Numpy BitGenerators PR#1 - Core Generator Support numba/numba#8031

Merged

zoj613 mentioned this pull request Jun 17, 2022

MAINT: Get full precision for 32 bit floating point random values. zoj613/polyagamma#111

Closed

zoj613 mentioned this pull request Jun 22, 2022

MAINT: Get full precision for 32 bit floating point random values. zoj613/polyagamma#117

Merged

alecandido mentioned this pull request Mar 17, 2023

Evolve n3fit with eko 0.12 NNPDF/nnpdf#1694

Merged

aegkmq mentioned this pull request Jul 27, 2023

Replace the rand() with a portable rng karpathy/llama2.c#138

Merged

seberg merged 2 commits into numpy:main from WarrenWeckesser:float32-rand-unused-bit

4 participants
