Don't write empty chunks to icechunk #745
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff (main vs. #745)
Coverage: 87.55% vs. 87.55%
Files: 35 vs. 35
Lines: 1848 vs. 1848
Hits: 1618 vs. 1618
Misses: 230 vs. 230
FYI I'm starting to question this choice. It would be nice if Zarr/Icechunk had a better way to denote chunks consisting only of fill values than the absence of a chunk reference (i.e., we should make this explicit rather than implicit). I've asked whether the core Zarr spec or its extensions formally define the expectations for "missing" chunks in #Zarr > specification for the meaning of missing chunks.
I agree that Zarr conflating uninitialized chunks with chunks containing only fill values is really annoying. But there is nothing we can do here unless the Zarr format gains a way to differentiate?
Does icechunk have a better way to differentiate? I tried reading through the IC spec but could not tell how that's handled. It seems like pure fill value chunks could fit as a specific type of chunk ref though, even before there's a pure Zarr solution.
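To make the idea concrete, here is a minimal sketch of what "a specific type of chunk ref" could look like. Everything in it is hypothetical: `StoredRef`, `FillRef`, `describe`, and the manifest layout are illustrative names only, and they are not taken from the Icechunk spec or any Zarr extension.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class StoredRef:
    """Hypothetical reference to chunk bytes stored somewhere."""
    location: str
    offset: int
    length: int


@dataclass(frozen=True)
class FillRef:
    """Hypothetical marker: the chunk was written and holds only fill values."""


ChunkRef = Union[StoredRef, FillRef]


def describe(ref: Optional[ChunkRef]) -> str:
    """Classify a manifest entry; None means no reference exists at all."""
    if ref is None:
        return "uninitialized"
    if isinstance(ref, FillRef):
        return "all fill values"
    return "stored data"


# Hypothetical manifest: chunk (2,) has no entry at all.
manifest = {
    (0,): StoredRef(location="example/location", offset=0, length=4096),
    (1,): FillRef(),
}
states = [describe(manifest.get(key)) for key in [(0,), (1,), (2,)]]
```

With an explicit marker like this, "written as fill" and "never written" become distinct states in the manifest instead of both being an absent key.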
How would this work exactly? It's pretty important that creating a Zarr array only requires writing a single metadata document. This entails that the entire array (based on just 1 metadata document) must be consistently readable, even though no data has been written. This means that there must be some specification for what's inside the array, and that can only be contained in the metadata document. That value must be consistent with the data type for the array. This leads pretty directly to the |
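The behaviour described here can be illustrated with a toy model. This is a sketch in plain Python, not the Zarr or Icechunk API; `ToyArray` and its fields are made-up names, and the array is one-dimensional for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class ToyArray:
    """Toy stand-in for a chunked array: one metadata record plus a chunk map."""
    shape: int          # 1-D length, for simplicity
    chunk_size: int
    fill_value: float
    chunks: dict = field(default_factory=dict)

    def read_chunk(self, index: int) -> list:
        # A chunk with no stored entry is read back as pure fill values.
        stored = self.chunks.get(index)
        if stored is None:
            return [self.fill_value] * self.chunk_size
        return list(stored)


# Creating the array records metadata only; no chunk exists yet.
arr = ToyArray(shape=8, chunk_size=4, fill_value=0.0)
never_written = arr.read_chunk(0)

# Explicitly storing a chunk of fill values gives an identical read.
arr.chunks[1] = [0.0] * 4
written_as_fill = arr.read_chunk(1)

# From the values alone, a reader cannot tell the two cases apart.
indistinguishable = never_written == written_as_fill
```

The array is fully readable from its metadata alone, which is the property argued for above, and the same mechanism is what makes an unwritten chunk look identical to one that was written with only fill values.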
In Zarr, the point of If this is a concern, it would be wise to choose a
FWIW in netCDF this is the historical meaning of
I agree. You could return uninitialized memory so |
AFAICT some substantial fraction of the Zarr community uses it to indicate "valid but background value", hence the historical choice of 0 as default fill value (!!!!)
yes, this is very common in life sciences, where images are often intensities, and 0 is on the lower end
agreed, and we could make this even easier on the data type side by offering nullable numeric data types (e.g.,
This should fix the problem exposed in #740
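For readers unfamiliar with the idea in the title, a rough sketch of "don't write empty chunks" follows. It is not the code in this PR, which is not shown in this conversation; `is_all_fill`, `write_nonempty_chunks`, and the dict-based store are assumptions made for illustration.

```python
import math


def is_all_fill(chunk, fill_value):
    """Return True if every element equals the fill value (NaN matches NaN)."""
    if isinstance(fill_value, float) and math.isnan(fill_value):
        # NaN != NaN under ==, so a NaN fill value needs an explicit check.
        return all(isinstance(v, float) and math.isnan(v) for v in chunk)
    return all(v == fill_value for v in chunk)


def write_nonempty_chunks(chunks, fill_value, store):
    """Write only chunks that contain something other than the fill value."""
    written = []
    for key, chunk in chunks.items():
        if is_all_fill(chunk, fill_value):
            continue  # skip: a reader gets fill values for a missing chunk anyway
        store[key] = list(chunk)
        written.append(key)
    return written


store = {}
written = write_nonempty_chunks({"c/0": [0, 0, 0], "c/1": [0, 5, 0]}, 0, store)
nan_chunk_is_empty = is_all_fill([float("nan")] * 2, float("nan"))
```

Skipped chunks read back as fill values, so the data a reader sees is unchanged, but the store can no longer distinguish them from chunks that were never written, which is the trade-off discussed in the comments.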
docs/releases.rst
New functions/methods are listed in api.rst
New functionality has documentation