Adding open_groups to BackendEntryPointEngine, NetCDF4BackendEntrypoint, and H5netcdfBackendEntrypoint#9243
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
|
I know we touched on this a bit during our meeting on Tuesday but I think tests for Let me know if I have that right and what you all think. EDIT: EDIT 2: |
|
Sounds good @eni-awowale You may even be able to use basically the same groups test for multiple backends by adding it to
Having an included file to test against is a good idea, we just want it to be small.
Where are these currently? We don't want to add files to the main repo - preferring to put them here https://github.com/pydata/xarray-data |
So, instead of adding the same type of tests to @TomNicholas
Okay, great there might already be some some sample files in there with groups that I can use! |
I didn't even know those existed... @max-sixty I know you have thought about example datasets for testing purposes - do you have an opinion on whether new files for testing should go in that directory or the separate repository? |
We want tests to be runable without a network connection. If we need new files for testing, which I agree would be a good idea in this case, please add them here. Just keep them as small as possible (the current test files are all under 10 KB in size, most under 1 KB). |
xarray-data is for sample/tutorial datasets, which can be much larger (enough data to make an interesting plot) than what we use for test data. |
…reate netcdf4 file, on the fly
| invalid_netcdf=None, | ||
| phony_dims=None, | ||
| decode_vlen_strings=True, | ||
| driver=None, | ||
| driver_kwds=None, |
There was a problem hiding this comment.
These shouldn't be here right? They should all fall under `**kwargs``
There was a problem hiding this comment.
Or maybe they should be in the specific backend but not in common.py?
There was a problem hiding this comment.
Yeah, so these were added from PR #9199 for adding back the backend specific keyword arguments. I pulled this into my branch after it was merged to main. But they are not in common.py , they are consolidated as **kwargs in common.py.
| ) | ||
| # Check for a group and make it a parent if it exists | ||
| if group: | ||
| parent = NodePath("/") / NodePath(group) |
| @@ -837,6 +837,43 @@ def open_datatree( | |||
| return backend.open_datatree(filename_or_obj, **kwargs) | |||
There was a problem hiding this comment.
We could have a default implementation here that calls open_groups, i.e.
| return backend.open_datatree(filename_or_obj, **kwargs) | |
| groups_dict = backend.open_datatree(filename_or_obj, **kwargs) | |
| return DataTree.from_dict(groups_dict) |
The idea being that then backend developers don't actually have to implement open_datatree if they have implemented open_groups...
This was sort of discussed here (@keewis) #7437 (comment), but this seems like an rabbit hole that should be left for a future PR.
There was a problem hiding this comment.
not really, I was arguing that given any one of open_dataarray, open_dataset and open_datatree allows us to provide (somewhat inefficient) default implementations for the others. However, open_groups has a much closer relationship to open_datatree, so I think having a default implementation for open_datatree is fine (we just need to make sure that a backend that provides neither open_groups nor open_datatree doesn't complain about open_groups not existing if you called open_datatree).
So yeah, this might become a rabbit hole.
There was a problem hiding this comment.
Okay I see. That seems related, but also like a totally optional convenience feature that we should defer to later.
|
Hi folks! I think this PR is a couple commits away from merging but they are a couple things to sort out.
|
Edit:
Maybe the typing crowd can help? cc @max-sixty, @Illviljan, @headtr1ck |
It's this type of problem: https://stackoverflow.com/questions/73603289/why-doesnt-parameter-type-dictstr-unionstr-int-accept-value-of-type-di |
xarray/core/datatree.py
Outdated
| d_cast = cast(dict, d) | ||
| root_data = d_cast.pop("/", None) |
There was a problem hiding this comment.
Mypy is correct here, a Mapping does not include a .pop and ignoring the typing errors doesn't solve the bug.
xarray uses Mappings frequently, for example xr.core.utils.FrozenDict(dict(a=3)).pop("a", None) so it's a real issue.
Either you explicitly convert it to dict i.e. d_cast = dict(d) or refactor the code to not use the .pop since I'm not so sure it's needed.
There was a problem hiding this comment.
Thanks, I updated it to explicitly covert the type to a dict. The mypy3.9 tests are still passing but the CI Mypy check seems to be returning the same error as before explicitly converting it to a dict.
|
@TomNicholas and @keewis this issue we just moved over #9336 seems to be related to the latest batch of test failures. |
|
Yay all checks are passing! Does someone want to give this a quick look before merging? |
| def open_groups_as_dict( | ||
| self, | ||
| filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, | ||
| **kwargs: Any, |
There was a problem hiding this comment.
Maybe we should not make the same mistake as with open_dataset and prevent liskov errors.
| **kwargs: Any, |
If the abstract method supports any kwargs, so must all subclass implementations, which is not what we want.
There was a problem hiding this comment.
@headtr1ck from my understanding I think the **kwargs were added back to fix this issue #9135
There was a problem hiding this comment.
Hmmm, Not sure.
But since this is the same problem in all other backend methods, I'm fine with leaving it as it is (and possibly change it in a future PR all together).
There was a problem hiding this comment.
Sounds good we can revisit this on another PR.
| decode_vlen_strings=True, | ||
| driver=None, | ||
| driver_kwds=None, | ||
| **kwargs, |
There was a problem hiding this comment.
This should be obsolete as well, when you remove it from the abstract method.
| persist=False, | ||
| lock=None, | ||
| autoclose=False, | ||
| **kwargs, |
* main: Improve error message for missing coordinate index (pydata#9370) Add flaky to TestNetCDF4ViaDaskData (pydata#9373) Make chunk manager an option in `set_options` (pydata#9362) Revise (pydata#9371) Remove duplicate word from docs (pydata#9367) Adding open_groups to BackendEntryPointEngine, NetCDF4BackendEntrypoint, and H5netcdfBackendEntrypoint (pydata#9243)
* main: (214 commits) Adds copy parameter to __array__ for numpy 2.0 (pydata#9393) `numpy 2` compatibility in the `pydap` backend (pydata#9391) pyarrow dependency added to doc environment (pydata#9394) Extend padding functionalities (pydata#9353) refactor GroupBy internals (pydata#9389) Combine `UnsignedIntegerCoder` and `CFMaskCoder` (pydata#9274) passing missing parameters to ZarrStore.open_store when opening a datatree (pydata#9377) Fix tests on big-endian systems (pydata#9380) Improve error message on `ds['x', 'y']` (pydata#9375) Improve error message for missing coordinate index (pydata#9370) Add flaky to TestNetCDF4ViaDaskData (pydata#9373) Make chunk manager an option in `set_options` (pydata#9362) Revise (pydata#9371) Remove duplicate word from docs (pydata#9367) Adding open_groups to BackendEntryPointEngine, NetCDF4BackendEntrypoint, and H5netcdfBackendEntrypoint (pydata#9243) Revise (pydata#9366) Fix rechunking to a frequency with empty bins. (pydata#9364) whats-new entry for dropping python 3.9 (pydata#9359) drop support for `python=3.9` (pydata#8937) Revise (pydata#9357) ...
whats-new.rstapi.rst@TomNicholas, @shoyer, @owenlittlejohns, and @flamingbear