Unknown chunk slicing - raise informative error #9285

Merged
jsignell merged 4 commits into dask:main from ncclementi:unknown-chunk-slicing
Jul 20, 2022
Conversation

@ncclementi (Member) commented Jul 16, 2022

Thank you @jsignell for all the help and suggestions to get this one unlocked : )

cc: @pavithraes @ian-r-rose

@ncclementi ncclementi marked this pull request as ready for review July 16, 2022 22:53

def test_slicing_and_unknown_chunks():
a = da.ones((10, 5), chunks=5)
a._chunks = ((np.nan, np.nan), (5,))
@jsignell (Member) commented Jul 18, 2022

I think @ian-r-rose suggested arriving at empty chunks naturally rather than overwriting them like this. Something like:

Suggested change
a._chunks = ((np.nan, np.nan), (5,))
a = a[a > 20]

@ncclementi (Member, Author) replied:

I forgot to post why I didn't do this.

This example doesn't seem to result in the kind of unknown chunks that triggers the issue we are trying to resolve.
I tried it and it had no problems. There are certain situations where we have nan in the chunks, but we have ways of solving those. I'm not sure what the right example is that triggers the type of unknown chunks that causes this problem.

It seems we do this overwriting of the chunks in other places, see line 935 of this bit:
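For context, here is a minimal sketch (my own, not code from this PR) of how boolean indexing produces nan chunk sizes, which dask can usually still handle:

```python
import numpy as np
import dask.array as da

a = da.ones((10, 5), chunks=5)
# Boolean indexing flattens the array; the number of selected elements
# per block is not known until compute time, so chunk sizes become nan.
b = a[a > 20]
assert b.ndim == 1
assert all(np.isnan(c) for c in b.chunks[0])
```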

@pytest.mark.parametrize(
    "chunks",
    [
        ((1, 1, 1, 1), (np.nan,), (np.nan,)),
        pytest.param(
            ((np.nan, np.nan, np.nan, np.nan), (500,), (500,)),
            marks=pytest.mark.xfail(reason="https://github.com/dask/dask/issues/6586"),
        ),
    ],
)
def test_getitem_avoids_large_chunks_missing(chunks):
    # We cannot apply the "avoid large chunks" optimization when
    # the chunks have unknown sizes.
    with dask.config.set({"array.slicing.split-large-chunks": True}):
        a = np.arange(4 * 500 * 500).reshape(4, 500, 500)
        arr = da.from_array(a, chunks=(1, 500, 500))
        arr._chunks = chunks
        indexer = [0, 1] + [2] * 100 + [3]
        expected = a[indexer]
        result = arr[indexer]
        assert_eq(result, expected)
In fact, I was wondering if we should remove the xfail for this case.
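As an aside (my own sketch, not part of this PR): when chunk sizes genuinely are unknown, `Array.compute_chunk_sizes` can materialize them before slicing, which is the workaround the informative error points users toward:

```python
import dask.array as da

a = da.ones((10, 5), chunks=5)
b = a[a > 0]             # all-True mask, but chunk sizes are still nan
b.compute_chunk_sizes()  # computes the real sizes and stores them in place
# Two source blocks of 5x5 = 25 elements each survive the mask
assert b.chunks == ((25, 25),)
assert b.shape == (50,)
```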

@jsignell (Member) replied:

Ah, ok. It's certainly not a big deal if it doesn't just work. Thanks for the explanation.

@jsignell jsignell merged commit ef3eb9e into dask:main Jul 20, 2022
@pavithraes (Member) commented:

🎉

@ncclementi ncclementi deleted the unknown-chunk-slicing branch August 31, 2022 21:16
Successfully merging this pull request may close these issues.

IndexError when slicing dask Array with unknown chunk sizes.

3 participants