Skip to content

Use np.random.Generator to generate random numbers#9005

Closed
erayaslan wants to merge 12 commits intodask:mainfrom
erayaslan:da-np-generator
Closed

Use np.random.Generator to generate random numbers#9005
erayaslan wants to merge 12 commits intodask:mainfrom
erayaslan:da-np-generator

Conversation

@erayaslan
Copy link
Copy Markdown
Contributor

First pass at using np.random.Generator instead of deprecated np.random.RandomState class. Keep backward compatibility and make deprecation and switching over easy. Please let me know if an alternative approach is preferable.

Had a quick look and it should be relatively easy to change internal RandomState usage to Generator

np.random.Generator is the replacement for np.random.RandomState which
is effectively frozen. Use Generator by default and keep RandomState for
now for backward compatibility.

Fixes (dask#8790)
do not assume that random has only one argument (aside from chunks).
case in point: np.random.Generator breaks this assumption.
and so does not raise the exception. Use standard_cauchy instead
and test permutations with both RandomState and Generator classes
as it is no longer available with np.random.Generator. Use
da.random.integers instead
@GPUtester
Copy link
Copy Markdown
Collaborator

Can one of the admins verify this patch?

@github-actions github-actions bot added array documentation Improve or add to documentation labels Apr 30, 2022
old da.random.random signature:
  random.RandomState.random_sample(size=None)

new da.random.random signature:
  random.Generator.random(size=None, dtype=np.float64, out=None)

we cannot continue using random_sample as it is removed from new
generators.  in other words, chunks needs to be specified and not
assumed to be the second argument
@erayaslan
Copy link
Copy Markdown
Contributor Author

1/
Regarding commits b5eed09 and bcd00d9:

numpy.random.RandomState had random_sample - which we aliased to random - which had the following signature:

random.RandomState.random_sample(size=None)

random_sample is removed in numpy.random.Generator and is replaced with random which has the following signature:

random.Generator.random(size=None, dtype=np.float64, out=None)

so, da.random.random(5,5) used to work with RandomState but errors out with Generator. Instead we need to use da.random.random(5, chunks=5)

In other words, this is a breaking change. Are we allowed to do that?

2/
windows tests fail dask/array/tests/test_random.py::test_choice with:

[gw2] win32 -- Python 3.10.4 C:\Miniconda3\envs\test-environment\python.exe

def test_choice():
    np_dtype = np.random.choice(1, size=()).dtype
    size = (10, 3)
    chunks = 4
    x = da.random.choice(3, size=size, chunks=chunks)
    assert x.dtype == np_dtype
    assert x.shape == size
    res = x.compute()
    assert res.dtype == np_dtype

E AssertionError: assert dtype('int64') == dtype('int32')

I do not have easy access to a windows environment and it looks like a known windows weirdness. How should we proceed?

@erayaslan
Copy link
Copy Markdown
Contributor Author

erayaslan commented May 1, 2022

so all tests pass but

1/ All platforms:
da.random.random(5, 5) used to work but raises TypeError now. Use da.random.random(5, chunks=5) instead

2/ Windows Only:
da.random.choice() dtype changed from int32 to int64

Is this acceptable?

@quasiben
Copy link
Copy Markdown
Member

quasiben commented May 2, 2022

add to allowlist

Copy link
Copy Markdown
Member

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on! I think it's broadly fine to change behavior, but if possible we try to deprecate or at least raise an informative error.

@erayaslan
Copy link
Copy Markdown
Contributor Author

continued on PR #9038. Closing

@erayaslan erayaslan closed this May 5, 2022
@erayaslan erayaslan deleted the da-np-generator branch February 17, 2023 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

array documentation Improve or add to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add dask.array.default_rng

4 participants