feat: bounded size auto sharding via zarr.config#3574
feat: bounded size auto sharding via zarr.config#3574d-v-b merged 11 commits intozarr-developers:mainfrom
auto sharding via zarr.config#3574Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3574 +/- ##
==========================================
+ Coverage 61.86% 61.90% +0.03%
==========================================
Files 85 85
Lines 10137 10150 +13
==========================================
+ Hits 6271 6283 +12
- Misses 3866 3867 +1
🚀 New features to boost your workflow:
|
|
Not sure I fully grasp what is going on with codecov, it looks like the tests run and assert the things they should and use the funcionality promised. |
|
In SpatialData we could also have a use case for this. Many users are not as experienced with the whole concept of sharding and chunking, so having the option to provide a maximum size with some explanation in documentation would be very beneficial. |
|
I was curious how @jhamman Is this true? Would |
|
@ilan-gold I think that is correct at the moment. See also pydata/xarray#9947. In dask I am working on having the interface just provide a |
tests/test_config.py
Outdated
| "array": { | ||
| "order": "C", | ||
| "write_empty_chunks": False, | ||
| "max_bytes_per_shard_for_auto_sharding": None, |
There was a problem hiding this comment.
can we find a denser name for this parameter? Something like max_shard_size_bytes?
There was a problem hiding this comment.
Also perhaps something more accurate? Like target_shard_size_bytes? It's not meant as a bound but a target
d-v-b
left a comment
There was a problem hiding this comment.
2 small requests but otherwise this looks great, thanks @ilan-gold
…old/zarr-python into ig/chunk_grid_auto_config_setting
|
@d-v-b What is the take on the fact that this is pre-compression? Something I just realized messing around with this. Should we document that more clearly? |
Maybe just indicate in the docs that this setting governs the in-memory size? Also, how will this handle data types that don't have a computable size per element (like variable-length bytes)? |
It takes in the same arguments as zarr-python/src/zarr/core/array.py Lines 4678 to 4680 in 7d8105f The docs say for chunking: I could add an explanation of the heuristic for both sharding and chunking a bit here, or at least specify what to expect based on data type. |
closes #3568
TODO:
docs/user-guide/*.mdchanges/