Skip to content

Conversation

@WarrenWeckesser
Copy link
Member

When an 8 or 16 bit dtype was given to the integers() method of the
Generator class, the resulting sample was biased. The problem was
the lines of the form

const uint8_t threshold = -rng_excl % rng_excl;

in the implementations of Lemire's method, in the C file
distributions.c. The intent was to compute
(UINT8_MAX+1 - rng_excl) % rng_excl
However, when the type of rng_excl has integer conversion rank lower
than a C int (which is almost certainly the case for the 8 and 16
bit types), the terms in the expression -rng_excl % rng_excl are
promoted to int, and the result of the calculation is always 0.

The fix is to make the expression explicit, and write it as

const uint8_t threshold = (UINT8_MAX - rng) % rng_excl;

rng is used, because rng_excl is simply rng + 1; by using rng, we
we only need the constant UINT#_MAX, without the extra +1.

For consistency, I made the same change for all the data types
(8, 16, 32 and 64 bit).

Closes gh-14774.

When an 8 or 16 bit dtype was given to the integers() method of the
Generator class, the resulting sample was biased.  The problem was
the lines of the form

    const uint8_t threshold = -rng_excl % rng_excl;

in the implementations of Lemire's method, in the C file
distributions.c.  The intent was to compute
    (UINT8_MAX+1 - rng_excl) % rng_excl
However, when the type of rng_excl has integer conversion rank lower
than a C int (which is almost certainly the case for the 8 and 16
bit types), the terms in the expression -rng_excl % rng_excl are
promoted to int, and the result of the calculation is always 0.

The fix is to make the expression explicit, and write it as

    const uint8_t threshold = (UINT8_MAX - rng) % rng_excl;

rng is used, because rng_excl is simply rng + 1; by using rng, we
we only need the constant UINT#_MAX, without the extra +1.

For consistency, I made the same change for all the data types
(8, 16, 32 and 64 bit).

Closes numpygh-14774.
@charris
Copy link
Member

charris commented Oct 25, 2019

Isn't (UINT8_MAX - rng) the (8 bit) complement of rng? Likewise, (UINT8_MAX+1 - rng_excl) looks like the two's complement of rng_excl.

@WarrenWeckesser
Copy link
Member Author

@charris, yes. There are probably several other ways the expression could be spelled.

@bashtage
Copy link
Contributor

LGTM. Should be backported to 1.17.

@mattip
Copy link
Member

mattip commented Oct 25, 2019

LGTM and there is even a test. Thanks @WarrenWeckesser for the quick fix.

@mattip mattip merged commit 142f291 into numpy:master Oct 25, 2019
@mattip
Copy link
Member

mattip commented Oct 25, 2019

whoops, release note needed

@charris
Copy link
Member

charris commented Oct 25, 2019

I'll do the backport.

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Oct 25, 2019
@charris charris removed this from the 1.17.4 release. milestone Oct 25, 2019
@WarrenWeckesser WarrenWeckesser added this to the 1.18.0 release milestone Oct 25, 2019
@WarrenWeckesser
Copy link
Member Author

@mattip: PR with release note is #14782

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bias of random.integers() with int8 dtype

4 participants