use shard shape when available in to_zarr #12105
Unit Test Results: See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 9 files ±0, 9 suites ±0, 3h 9m 11s ⏱️ -2m 19s. Results for commit 2ea63bb. ± Comparison against base commit 287f149. ♻️ This comment has been updated with latest results. |
|
Thanks. I think it's worth documenting this behavior, since this increases the number of tasks and |
Does it? We are rechunking today to the chunk shape given by the |
|
Nevermind. I think I just got the relationship between chunks and shards backwards (again) :/ |
jacobtomlinson
left a comment
Thanks @d-v-b and thanks for the review @TomAugspurger
|
I do think @TomAugspurger is right that we need to document this behavior. Perhaps the best solution would be for dask to link to a dedicated "using zarr with parallel computing" section of the Zarr docs. I don't think we have one written yet. |
|
something for a future PR |
|
Makes sense, thanks. |
This PR alters the behavior of `to_zarr` to use the `shards` attribute, when available, as the dask chunk shape when rechunking in `to_zarr`. `to_zarr` currently rechunks to the chunk shape of the zarr array. Attempting to concurrently write chunks for a sharded array will lead to data loss and is not a safe default. For zarr v3 arrays that use sharding, the `shards` attribute, not `chunks`, is the safe default, hence the changes in this PR.
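The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `pick_write_chunks` is a hypothetical helper name, and the arrays are plain stand-in objects exposing only `chunks` and `shards`, on the assumption that `shards` is `None` (or absent) when the array is not sharded.

```python
from types import SimpleNamespace


def pick_write_chunks(z):
    """Return the block shape dask should rechunk to before writing into ``z``.

    Hypothetical helper: prefer the shard shape when the array is sharded,
    otherwise fall back to the regular chunk shape.
    """
    # Assumption: ``shards`` is None or missing for unsharded arrays.
    shards = getattr(z, "shards", None)
    return tuple(shards) if shards is not None else tuple(z.chunks)


# Stand-ins for zarr arrays; only the two attributes used above are modelled.
unsharded = SimpleNamespace(chunks=(100, 100), shards=None)
sharded = SimpleNamespace(chunks=(100, 100), shards=(1000, 1000))

print(pick_write_chunks(unsharded))  # (100, 100)
print(pick_write_chunks(sharded))    # (1000, 1000)
```

With a real dask array, the returned shape would presumably be passed to `rechunk` before storing, so that each dask task writes whole shards rather than several tasks writing chunks inside the same shard.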