basic support for v2 and v3 groups#1590
Conversation
|
Hello @jhamman! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2023-12-06 15:13:06 UTC |
jhamman
left a comment
There was a problem hiding this comment.
A few thoughts below 👇
| attributes: Dict[str, Any] = field(factory=dict) | ||
| zarr_format: Literal[3] = 3 | ||
| node_type: Literal["group"] = "group" | ||
| zarr_format: Literal[2, 3] = 3 # field(default=3, validator=validators.in_([2, 3])) |
There was a problem hiding this comment.
@normanrz - I tried this and failed! Happy to return to it later.
zarr/v3/abc/group.py
Outdated
|
|
||
| @abstractmethod | ||
| def __setitem__(self, key: str, value: Union[SyncArray, "SyncGroup"]) -> None: | ||
| """get child""" |
There was a problem hiding this comment.
| """get child""" |
Proposing that we do away with Group.__setitem__ all together. It really doesn't make sense to include in the api.
zarr/v3/abc/group.py
Outdated
| @abstractmethod | ||
| def group_keys(self) -> AsyncIterator[str]: | ||
| """iterate over child group keys""" | ||
| ... |
There was a problem hiding this comment.
all of these iterators need listable stores before they can be built.
| @frozen | ||
| class SyncConfiguration: | ||
| concurrency: Optional[int] = None | ||
| asyncio_loop: Optional[AbstractEventLoop] = None |
There was a problem hiding this comment.
this was a half hearted attempt at rethinking the config topic. I expect this to go away soon.
| def __setitem__(self, key, value): | ||
| raise NotImplementedError |
There was a problem hiding this comment.
| def __setitem__(self, key, value): | |
| raise NotImplementedError |
zarr/v3/group.py
Outdated
| return self._sync(self._async_group.nchildren) | ||
|
|
||
| @property | ||
| def children(self) -> List[Array, "Group"]: |
There was a problem hiding this comment.
some exploratory work is needed to figure out how to make this (and similar methods) a proper iterator/generator instead of a list. The tricky bit is that the async method is returning an AsyncIterator and we need to synchronize that into a regular Iterator.
zarr/v3/abc/group.py
Outdated
There was a problem hiding this comment.
Now that there is only 1 Group and AsyncGroup. Do we actually want to keep the Group ABC?
There was a problem hiding this comment.
@d-v-b and I were just talking about that last night. I think the answer is NO. For ABCs, I we probably only need the Store and Codec base classes.
| # TODO: consider trying to autodiscover the zarr-format here | ||
| if zarr_format == 3: | ||
| # V3 groups are comprised of a zarr.json object | ||
| # (it is optional in the case of implicit groups) |
There was a problem hiding this comment.
Are implicit groups really a thing? I don't think they show up in the v3 spec.
There was a problem hiding this comment.
They do show up in the spec (in quite a few places actually)! https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#explicit-vs-implicit-groups
| if self.zarr_format == 3: | ||
| zarr_json_bytes = await (store_path / ZARR_JSON).get_async() | ||
| if zarr_json_bytes is None: | ||
| # implicit group? |
There was a problem hiding this comment.
This is a bit dangerous because we have no good way of telling whether this path even exists.
There was a problem hiding this comment.
I agree and we should return to this. For now I'm going to put a log statement there and we can address this later.
| store_path, zarr_json, runtime_configuration=self.runtime_configuration | ||
| ) | ||
| elif self.zarr_format == 2: | ||
| # Q: how do we like optimistically fetching .zgroup, .zarray, and .zattrs? |
There was a problem hiding this comment.
I think the overfetching is fine because it is happening in parallel.
There was a problem hiding this comment.
I was thinking about this issue the other day. One possible behavior when you "open" anything in Zarr would be to call the store.tree(depth=1) method, fetching all existing metadata docs at the root level and one level below.
- removed abcs for groups/arrays - improved return types in group.py - warn (temporarily) when an implicit group is found - add attributes.py with Attributes class
888551a to
84b497f
Compare
| elif self.zarr_format == 2: | ||
| return { | ||
| ZGROUP_JSON: self.zarr_format, | ||
| ZATTRS_JSON: json.dumps(self.attributes).encode(), |
There was a problem hiding this comment.
how would custom JSON serialization of user attributes plug in here? e.g., i don't think numpy dtype objects serialize to JSON easily. Maybe to_bytes can take a attrs_serializer kwarg or something.
| def create_array(self, name: str, **kwargs) -> Array: | ||
| return Array(self._sync(self._async_group.create_array(name, **kwargs))) | ||
|
|
||
| def empty(self, **kwargs) -> Array: |
There was a problem hiding this comment.
thoughts on ultimately putting these routines in a dedicated namespace?
| try: | ||
| if self.store == other.store and self.path == other.path: | ||
| return True | ||
| except Exception: |
There was a problem hiding this comment.
we can probably be a bit more specific about the exception type here
This begins the work of harmonizing the V2 and V3 group APIs from Zarrita with the Zarr-Python Group API.
I'll leave some comments inline.
TODO: