Skip to content

use integer multiple of shard shape when rechunking in to_zarr#12106

Merged
dcherian merged 9 commits intodask:mainfrom
d-v-b:fix/to-zarr-sharding-2
Oct 29, 2025
Merged

use integer multiple of shard shape when rechunking in to_zarr#12106
dcherian merged 9 commits intodask:mainfrom
d-v-b:fix/to-zarr-sharding-2

Conversation

@d-v-b
Copy link
Copy Markdown
Member

@d-v-b d-v-b commented Oct 23, 2025

Improves on #12105 by using the shard shape as the previous_chunks parameter instead of chunks. This results in rechunking with with larger chunks that are still shard-aligned. Thanks to @dcherian for reminding me that auto_chunks nearly always concatenates the previous_chunks argument.

But not always. This change brings the possibility of shard-misaligned chunks if the global configuration declares a memory limit smaller than the shard size. What should we do when there's competition between the global config and shard shape of the zarr array? Would an exception be appropriate here, prompting the user to change the config?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Oct 23, 2025

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

      9 files  ± 0        9 suites  ±0   3h 9m 45s ⏱️ + 4m 47s
 18 130 tests + 3   16 915 ✅ + 3   1 215 💤 ±0  0 ❌ ±0 
162 398 runs  +27  150 305 ✅ +23  12 093 💤 +4  0 ❌ ±0 

Results for commit 1b6ccd6. ± Comparison against base commit 06e75c7.

This pull request removes 1 and adds 4 tests. Note that renamed tests count towards both.
dask.array.tests.test_array_core ‑ test_zarr_chunk_shards_mismatch_warns
dask.array.tests.test_array_core ‑ test_zarr_risky_shards_warns
dask.array.tests.test_array_core ‑ test_zarr_to_zarr_shards[None]
dask.array.tests.test_array_core ‑ test_zarr_to_zarr_shards[all]
dask.array.tests.test_array_core ‑ test_zarr_to_zarr_shards[half]

♻️ This comment has been updated with latest results.

@dcherian
Copy link
Copy Markdown
Collaborator

What should we do when there's competition between the global config and shard shape of the zarr array?

I think a performance warning would be ok. There's just always a tradeoff with chunks="auto"...

@jacobtomlinson
Copy link
Copy Markdown
Member

@dcherian really appreciate your reviews here. I had a chat with other Dask maintainers and we would like to give you contributor access to this repo. Please feel empowered to review and merge PRs like this one where you have expertise.

@dcherian
Copy link
Copy Markdown
Collaborator

Thanks @jacobtomlinson

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
d-v-b and others added 2 commits October 24, 2025 16:46
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
@d-v-b
Copy link
Copy Markdown
Member Author

d-v-b commented Oct 29, 2025

do we need anything else here?

@dcherian
Copy link
Copy Markdown
Collaborator

can you solve the merge conflict please?

@d-v-b
Copy link
Copy Markdown
Member Author

d-v-b commented Oct 29, 2025

can you solve the merge conflict please?

done!

@dcherian
Copy link
Copy Markdown
Collaborator

I think this is right; certainly it is an improvement to write safety. So let's merge.

@dcherian dcherian merged commit 9915a79 into dask:main Oct 29, 2025
23 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants