Skip to content

Raise if missing virtual chunk containers#774

Merged
TomNicholas merged 29 commits intozarr-developers:mainfrom
TomNicholas:raise_if_missing_virtual_chunk_containers
Aug 14, 2025
Merged

Raise if missing virtual chunk containers#774
TomNicholas merged 29 commits intozarr-developers:mainfrom
TomNicholas:raise_if_missing_virtual_chunk_containers

Conversation

@TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented Aug 13, 2025

Closes #763 using suggestion in #763 (comment), and supercedes #767.

@TomNicholas TomNicholas added the Icechunk 🧊 Relates to Icechunk library / spec label Aug 13, 2025
Comment on lines +378 to +393
assert diff_is_empty(diff), diff


def diff_is_empty(diff: "Diff") -> bool:
# TODO would be nicer to implement __bool__ on icechunk's Diff class
return not any(
[
bool(diff.deleted_arrays),
bool(diff.deleted_groups),
bool(diff.new_arrays),
bool(diff.new_groups),
bool(diff.updated_arrays),
bool(diff.updated_chunks),
bool(diff.updated_groups),
]
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paraseba if Diff had this as a new method

class Diff:
    @property
    def __bool__(self) -> bool:
        """Returns true if there are any changes whatsoever."""
        return any(
            [
                bool(diff.deleted_arrays),
                bool(diff.deleted_groups),
                bool(diff.new_arrays),
                bool(diff.new_groups),
                bool(diff.updated_arrays),
                bool(diff.updated_chunks),
                bool(diff.updated_groups),
            ]
        )

then I could simplify my check to just

Suggested change
assert diff_is_empty(diff), diff
def diff_is_empty(diff: "Diff") -> bool:
# TODO would be nicer to implement __bool__ on icechunk's Diff class
return not any(
[
bool(diff.deleted_arrays),
bool(diff.deleted_groups),
bool(diff.new_arrays),
bool(diff.new_groups),
bool(diff.updated_arrays),
bool(diff.updated_chunks),
bool(diff.updated_groups),
]
)
assert not diff, diff

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request tracked upstream in earth-mover/icechunk#1165

virtual_datasets = [pair[1] for pair in paths_and_virtual_datasets]

store_path = StorePath(store, path="" if at_root else tree.relative_to(vdt))
validate_virtual_chunk_containers(store.session.config, virtual_datasets)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line requires an unreleased change in icechunk: earth-mover/icechunk#1164

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise you can't do session.config

@TomNicholas TomNicholas added the test-upstream Run the upstream tests on this PR label Aug 13, 2025
)


def validate_virtual_chunk_containers(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda feel like I should be able to just call some public Icechunk API here to do this instead of doing it myself. Like there is currently no way to validate a virtual reference without then immediately setting it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised earth-mover/icechunk#1167 (comment) upstream to track this idea.

@TomNicholas
Copy link
Member Author

The upstream tests are now passing, which means that oncee Icechunk is released (see #774 (comment)) this can be merged (and then this PR would need to pin to that new IC release).

@TomNicholas
Copy link
Member Author

IC was just released

@TomNicholas TomNicholas merged commit 9261537 into zarr-developers:main Aug 14, 2025
13 checks passed
@TomNicholas TomNicholas deleted the raise_if_missing_virtual_chunk_containers branch August 14, 2025 13:08
@maxrjones
Copy link
Member

@TomNicholas I think this should also be listed as a breaking change in the release notes, since we're using True as the default. It would be helpful to include a short description of why this default change is needed.

@TomNicholas
Copy link
Member Author

since we're using True as the default.

We were already using True as the default - the breaking change is that we're actually enforcing it!

But I agree that this effectively constitutes a breaking change - I will add it now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Icechunk 🧊 Relates to Icechunk library / spec test-upstream Run the upstream tests on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Decoding of virtual chunks from Icechunk fails when creating an array with an ArrayBytesCodec serializer.

3 participants