Informative error if chunks are unknown while slicing#9188
Informative error if chunks are unknown while slicing#9188pavithraes wants to merge 1 commit intodask:mainfrom
Conversation
|
It looks like this change is causing bunch of other tests to fail. I think some of them are not to hard to fix, because they are marked as xfail due to the problem this PR is solving. But I'm a bit confused about certain cases where it seems we are ok, having nan chunks ? For example the test Is this the expected behavior? because if that's the case then the fix we added here breaks this. As any appearance of nan in self.chunks will raise a ValueError now. |
ian-r-rose
left a comment
There was a problem hiding this comment.
Thanks @pavithraes! The fix looks good to me, I just have a suggestion to improve the test
|
|
||
| def test_slicing_and_unknown_chunks(): | ||
| a = da.ones((10, 5), chunks=5) | ||
| a._chunks = ((np.nan, np.nan), (5,)) |
There was a problem hiding this comment.
Rather than setting the chunks to nan, I think it would be better to write some user code that we know results in unknown chunks (like slicing with a boolean):
a = da.random.uniform(size=10, chunks=2)
a = a[a > 0.5]There was a problem hiding this comment.
This example doesn't seem to result in unknown chunks, I tried this and it has no problems. There are certain situations where we have nan in the chunks but we have ways of solving this. I'm not sure what's the right example that triggers the type of unknown chunks that causes this problem.
|
We have multiple situations where we were able to check for but now this check will fail with the fix on this PR. So test like the one below fail, because the ValueError raises one step before the test_index_array_with_array_1d I think we might need to discuss how to fix all these tests, and what we expect to work or not. Basically with this change any time that there is an unknown chunk, it'll raise as soon as we do any operation that involves |
|
After chatting with Julia for a bit, it seems that finding the right place where to raise the error is not quite possible, or at least we weren't able to sort this out. We have multiple situations where dask can do its magic even though there are nan values in the chunks but not always. It seems the best situation, in this case, is to catch the exception in the compute, and give a more informative error there. Because I don't have push access to this PR, I continued the work in #9285 |
Thank you, @ncclementi! I'll close this one then. :) |
Thanks to @ncclementi and @phobson who figured out the exact solution!
pre-commit run --all-files