Skip to content

feat: nested group in ManifestStore + HDFParser#790

Merged
maxrjones merged 18 commits intozarr-developers:mainfrom
ilan-gold:ig/nested_h5_group
Nov 11, 2025
Merged

feat: nested group in ManifestStore + HDFParser#790
maxrjones merged 18 commits intozarr-developers:mainfrom
ilan-gold:ig/nested_h5_group

Conversation

@ilan-gold
Copy link
Contributor

@ilan-gold ilan-gold commented Aug 28, 2025

Some potential TODOs:

  1. More complex h5 objects from currently parsed file formats that fit the nested paradigm and would otherwise have not worked
  2. Along the lines of 1., maybe bring in anndata as an optional dep to test the behavior of this more deeply I think this raises separate issues because anndata has semantics around object dtype that would need to be handled by a parser, I think
  3. More pathological examples and/or other places where this feature might not work. I'm not too familiar with this library/space so would be very appreciative of some investigative directions! Here are a few potential ones:
  • open_virtual_dataset seems to return an xarray.Dataset but I think in the case where things are nested, it should return a DataTree maybe? or error out? construct_virtual_dataset and thus ManifestStore.to_virtual_dataset suffer the same issue

  • Can the non-hdf5 formats handle nested structures, like netCDF or fits?

  • Closes hdf parser limited to level-0 h5py.Dataset #664

  • Tests added

  • Tests passing

  • Full type hint coverage

  • Changes are documented in docs/releases.rst

  • New functions/methods are listed in api.rst

  • New functionality has documentation

@TomNicholas
Copy link
Member

open_virtual_dataset seems to return an xarray.Dataset but I think in the case where things are nested, it should return a DataTree maybe? or error out? construct_virtual_dataset and thus ManifestStore.to_virtual_dataset suffer the same issue

Sorry I think I should have linked to #84. What we want is to add open_virtual_datatree and ManifestStore.to_virtual_datatree, and have .to_virtual_dataset/tree implemented on a ManifestStore class which supports nested groups (that's what this PR adds, which is great).

Can the non-hdf5 formats handle nested structures, like netCDF or fits?

NetCDF4 is HDF5 in a trenchcoat 🙂 NetCDF3 cannot handle groups. TIFF can. FITS IDK.

@ilan-gold
Copy link
Contributor Author

ilan-gold commented Sep 7, 2025

TIFF can

Good point, but I don't see any tiff tests at the moment. I'd be happy to add them if you want. But then I would need to edit the Tiff parser so perhaps this PR should be limited to the store + hdf5

Sorry I think I should have linked to #84.

Hmm, it seems adding a check for subgroups caused tests to fail. It looks like opening a group with a subgroup actually does work but just creates a dataset object at the root group if present, ignoring the other subkeys. Judging by the tests, this is intentional. So I can remove the check then that I added here. I guess the vibes well with having a new API in open_virtual_datatree

EDIT: done!

@codecov
Copy link

codecov bot commented Sep 7, 2025

Codecov Report

❌ Patch coverage is 94.11765% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 87.78%. Comparing base (c1db925) to head (fc0e999).

Files with missing lines Patch % Lines
virtualizarr/manifests/store.py 92.30% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #790      +/-   ##
==========================================
+ Coverage   87.75%   87.78%   +0.03%     
==========================================
  Files          35       35              
  Lines        1886     1891       +5     
==========================================
+ Hits         1655     1660       +5     
  Misses        231      231              
Files with missing lines Coverage Δ
virtualizarr/manifests/group.py 93.47% <100.00%> (+1.81%) ⬆️
virtualizarr/parsers/hdf/hdf.py 95.41% <100.00%> (+0.07%) ⬆️
virtualizarr/xarray.py 85.43% <ø> (ø)
virtualizarr/manifests/store.py 87.39% <92.30%> (-0.33%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ilan-gold ilan-gold changed the title feat: nested group feat: nested group in ManifestStore + HDFParser Sep 7, 2025
@ilan-gold ilan-gold marked this pull request as ready for review September 7, 2025 17:50
@TomNicholas
Copy link
Member

Are you waiting on a review here @ilan-gold ?

@ilan-gold
Copy link
Contributor Author

Yes, I just assumed the PR was "approved" because it was put in v3.0.0, milestone, but I'm ready to respond to any requested changes :)

Copy link
Member

@maxrjones maxrjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make a couple small changes to support Zarr-Python's reliance on getting back None if a key doesn't exist for distinguishing between different node types, but that issue also exists in main so I support merging as-is.

I tried out the store changes in a local development of open_virtual_datatree and in virtual_tiff (virtual-zarr/virtual-tiff#54) and it worked great.

I don't have much experience with the HDF parser, so @TomNicholas or @sharkinsspatial may want to take another look at that before merging.

Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com>
else:
_groups: Mapping[str, ManifestGroup] = {} if groups is None else groups

_groups: Mapping[str, ManifestGroup] = {} if groups is None else groups
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if #796 should be dealt with first...


"""

# TODO: Remove private API `._group`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to do this in a follow-up?

@maxrjones maxrjones merged commit 7bf1b9e into zarr-developers:main Nov 11, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hdf parser limited to level-0 h5py.Dataset

4 participants