Conversation
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

9 files ±0   9 suites ±0   3h 12m 44s ⏱️ -2m 36s

Results for commit 8b636de. ± Comparison against base commit 0743e08.

♻️ This comment has been updated with latest results.
jacobtomlinson
left a comment
This seems reasonable to me. Looks like there are some conflicts to be resolved.
dask/array/core.py (outdated):

```python
        and (x.shards is not None)
        and chunks == "auto"
    ):
        chunks = x.shards
```
@d-v-b There is an equivalent change to Xarray that would be good to make: https://github.com/pydata/xarray/blob/19f2973528df7e423aae85184f7c65bf7de9cb2e/xarray/backends/zarr.py#L858
TomAugspurger
left a comment
One question, and we'll want to document this.
dask/array/core.py (outdated):

```python
        and (x.shards is not None)
        and chunks == "auto"
    ):
        chunks = x.shards
```
Why are we assigning this to chunks rather than previous_chunks? I'd expect us to handle it the same as .chunks on a zarr v2 array.
`previous_chunks` is a suggestion to `normalize_chunks`, which might be ignored in favor of satisfying other constraints, like staying below a memory target. But for sharded arrays, my judgment is that the shard shape defines a safe default value, which is a bit stricter than just a suggestion. Any multiple of the shard shape would work as well, but I don't think there's a way to tell `normalize_chunks` "only use multiples of this chunk shape"?
Doesn't `normalize_chunks` determine a multiple of `previous_chunks` for "auto"?
Most of the time it does, but it prioritizes the memory limit parameter over keeping chunks contiguous:

```python
import numpy as np
from dask.array.core import normalize_chunks

chunks = normalize_chunks(
    "auto", shape=(300,), previous_chunks=(100,), limit="13B", dtype=np.uint8
)
# chunks are split to fit under the memory limit
assert chunks == ((*((13,) * 23), 1),)
```
That being said, I think using `previous_chunks = shards` is still a better default, so I made that change.
Should we guard against the possibility that the chunk memory limit (which I think is fetched from the global config if limit=None) results in splitting shards? I don't think there's currently a way to say "no limit", but we could set the limit to the memory size of a single shard.
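The guard floated above could be sketched roughly as follows. This is an illustration, not the PR's actual code: the shapes and dtype are made up, and the only idea being demonstrated is passing the memory size of one shard as the `limit` so that `normalize_chunks` has no reason to split below the shard shape.

```python
import math

import numpy as np
from dask.array.core import normalize_chunks

# Illustrative values standing in for a sharded Zarr v3 array.
shape = (300,)
shards = (100,)
dtype = np.dtype(np.uint8)

# Memory footprint of a single shard, used as the chunking limit so the
# global config limit cannot force chunks smaller than one shard.
shard_nbytes = math.prod(shards) * dtype.itemsize

chunks = normalize_chunks(
    "auto",
    shape=shape,
    previous_chunks=shards,
    limit=shard_nbytes,
    dtype=dtype,
)
# Every chunk should stay shard-aligned under this limit.
assert all(c % shards[0] == 0 for c in chunks[0])
```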
Is there anything else we need to do here?

I'm happy for @dcherian to hit merge here if he has no further feedback.

Let's do it!

Thanks @dcherian!
Zarr V3 added the `shards` attribute to Zarr arrays. When the `shards` attribute is not `None`, it defines the smallest unit of the array that is safe to write concurrently (outside of exceptional circumstances). Thus for these Zarr arrays the default chunk size of a dask array should be the shard shape (or an integer multiple of the shard shape).

This PR changes the behavior of `from_array` to check for a `shards` attribute, and if it is present, uses that attribute as the `chunks` parameter if `chunks` was previously set to `"auto"`. This ensures that Zarr arrays have a default dask chunking that is safe for concurrent writes.
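The check described above can be distilled into a small standalone sketch. The helper name `default_chunks` and the fake array class are hypothetical, purely for illustration; the real logic lives inside dask's `from_array`:

```python
import numpy as np


def default_chunks(x, chunks):
    # Hypothetical distillation of the check this PR adds: a sharded
    # Zarr v3 array opened with chunks="auto" defaults to its shard
    # shape, the smallest unit that is safe to write concurrently.
    shards = getattr(x, "shards", None)
    if shards is not None and chunks == "auto":
        return shards
    return chunks


class FakeShardedArray:
    # Stand-in for a Zarr v3 array with sharding enabled.
    shards = (100, 100)


# Sharded arrays get shard-shaped default chunks.
assert default_chunks(FakeShardedArray(), "auto") == (100, 100)
# Arrays without a shards attribute are unaffected.
assert default_chunks(np.ones((10, 10)), "auto") == "auto"
```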