BUG: random: biased samples from integers() with 8 or 16 bit dtype. #14777

WarrenWeckesser · 2019-10-25T02:08:12Z

When an 8 or 16 bit dtype was given to the integers() method of the
Generator class, the resulting sample was biased. The problem was
the lines of the form

const uint8_t threshold = -rng_excl % rng_excl;

in the implementations of Lemire's method, in the C file
distributions.c. The intent was to compute

(UINT8_MAX+1 - rng_excl) % rng_excl

However, when the type of rng_excl has integer conversion rank lower
than a C int (which is almost certainly the case for the 8 and 16
bit types), the terms in the expression -rng_excl % rng_excl are
promoted to int. The negation then yields a negative int that is an
exact multiple of rng_excl, so the result of the calculation is
always 0.
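
To illustrate the promotion issue described above, here is a minimal sketch. It is not the code from distributions.c, and the function names are made up for this example. It assumes rng_excl is nonzero and that int is wider than 16 bits but no wider than 32 bits, as on common platforms.

```c
#include <stdint.h>

/* Threshold as spelled before the fix. Both operands are promoted to int,
   so -rng_excl is a negative int and the remainder is 0. */
static inline uint8_t threshold_buggy_u8(uint8_t rng_excl) {
    return (uint8_t)(-rng_excl % rng_excl);
}

/* Intended value, 2^8 mod rng_excl, written out explicitly. */
static inline uint8_t threshold_intended_u8(uint8_t rng_excl) {
    return (uint8_t)((UINT8_MAX + 1 - rng_excl) % rng_excl);
}

/* For comparison: with a 32-bit type (assuming it is not promoted to a
   wider int), the negation wraps modulo 2^32 and the original spelling
   gives 2^32 mod rng_excl. */
static inline uint32_t threshold_u32(uint32_t rng_excl) {
    return -rng_excl % rng_excl;
}
```

For rng_excl = 6, the buggy 8-bit version returns 0, while the intended value is 256 mod 6 = 4. The 32-bit version returns 2^32 mod 6, which is also 4.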

The fix is to make the expression explicit, and write it as

const uint8_t threshold = (UINT8_MAX - rng) % rng_excl;

rng is used, because rng_excl is simply rng + 1; by using rng, we
only need the constant UINT#_MAX (UINT8_MAX, UINT16_MAX, and so on),
without the extra +1.

For consistency, I made the same change for all the data types
(8, 16, 32 and 64 bit).
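
The effect of the threshold can be checked exhaustively for the 8 bit case. The following is a rough sketch of the multiply-and-shift step of Lemire's method with rejection, not the actual NumPy implementation. The helper names and the counting harness are hypothetical, and rng is assumed to be less than UINT8_MAX.

```c
#include <stdint.h>

/* Threshold as spelled in the fix: equals 2^8 mod (rng + 1). */
static inline uint8_t lemire_threshold_u8(uint8_t rng) {
    const uint8_t rng_excl = (uint8_t)(rng + 1); /* assumes rng < UINT8_MAX */
    return (uint8_t)((UINT8_MAX - rng) % rng_excl);
}

/* Feed every possible raw byte through the multiply-and-shift step and
   count how often each output in [0, rng] is accepted.
   counts[] must have room for rng + 1 entries. */
static inline void count_outputs_u8(uint8_t rng, uint8_t threshold,
                                    unsigned counts[]) {
    const uint16_t rng_excl = (uint16_t)(rng + 1);
    for (unsigned i = 0; i <= rng; i++) {
        counts[i] = 0;
    }
    for (unsigned raw = 0; raw <= UINT8_MAX; raw++) {
        const uint16_t m = (uint16_t)(raw * rng_excl);
        const uint8_t leftover = (uint8_t)(m & 0xFF);
        if (leftover < threshold) {
            continue; /* rejected; a real sampler would draw again */
        }
        counts[m >> 8]++;
    }
}
```

With rng = 5 and a threshold of 0 (the buggy value), the six outputs are hit 43, 43, 42, 43, 43 and 42 times, which is the bias. With the corrected threshold of 4, four raw values are rejected and every output is hit exactly 42 times.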

Closes gh-14774.

charris · 2019-10-25T02:40:32Z

Isn't (UINT8_MAX - rng) the (8 bit) complement of rng? Likewise, (UINT8_MAX+1 - rng_excl) looks like the two's complement of rng_excl.

WarrenWeckesser · 2019-10-25T02:51:48Z

@charris, yes. There are probably several other ways the expression could be spelled.
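
As a sketch of the alternative spellings discussed in the two comments above, the following hypothetical helper compares the expression used in the fix with versions based on the one's complement and the two's complement. Each complement is truncated back to 8 bits before the remainder is taken. This is only an illustration, not code from the PR.

```c
#include <stdint.h>

/* Returns 1 if the three spellings agree for every rng in
   [0, UINT8_MAX - 1], and 0 otherwise. */
static inline int spellings_agree_u8(void) {
    for (unsigned r = 0; r < UINT8_MAX; r++) {
        const uint8_t rng = (uint8_t)r;
        const uint8_t rng_excl = (uint8_t)(rng + 1);
        /* Spelling used in the fix. */
        const uint8_t a = (uint8_t)((UINT8_MAX - rng) % rng_excl);
        /* One's complement of rng, cast back to 8 bits. */
        const uint8_t b = (uint8_t)((uint8_t)~rng % rng_excl);
        /* Two's complement of rng_excl, cast back to 8 bits. */
        const uint8_t c = (uint8_t)((uint8_t)(-rng_excl) % rng_excl);
        if (a != b || a != c) {
            return 0;
        }
    }
    return 1;
}
```

The explicit casts matter: without them the complemented value stays a negative int after promotion, which is the same pitfall that caused the original bug.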

bashtage · 2019-10-25T07:58:57Z

LGTM. Should be backported to 1.17.

mattip · 2019-10-25T08:22:51Z

LGTM and there is even a test. Thanks @WarrenWeckesser for the quick fix.

mattip · 2019-10-25T08:37:16Z

whoops, release note needed

charris · 2019-10-25T12:45:11Z

I'll do the backport.

WarrenWeckesser · 2019-10-25T14:27:37Z

@mattip: PR with release note is #14782

WarrenWeckesser mentioned this pull request Oct 25, 2019

Bias of random.integers() with int8 dtype #14774

Closed

WarrenWeckesser added 00 - Bug component: numpy.random labels Oct 25, 2019

charris added this to the 1.17.4 release. milestone Oct 25, 2019

charris added the 09 - Backport-Candidate PRs tagged should be backported label Oct 25, 2019

mattip merged commit 142f291 into numpy:master Oct 25, 2019

charris mentioned this pull request Oct 25, 2019

BUG: random: biased samples from integers() with 8 or 16 bit dtype. #14781

Merged

charris removed the 09 - Backport-Candidate PRs tagged should be backported label Oct 25, 2019

charris removed this from the 1.17.4 release. milestone Oct 25, 2019

WarrenWeckesser added this to the 1.18.0 release milestone Oct 25, 2019

WarrenWeckesser deleted the bug-lemire branch October 25, 2019 14:27

WarrenWeckesser mentioned this pull request Apr 24, 2020

default_rng.integers(2**32) always return 0 #16066

Closed

iyanmv mentioned this pull request Nov 18, 2021

Use new numpy.random.default_rng() mhostetter/galois#204

Merged
