DataTree: fetch inherited coords on demand#9187
DataTree: fetch inherited coords on demand#9187TomNicholas wants to merge 9 commits intopydata:mainfrom
Conversation
…not rely on attribute-like access
| _children: dict[str, DataTree] | ||
| _attrs: dict[Hashable, Any] | None | ||
| _cache: dict[str, Any] | ||
| _coord_names: set[Hashable] | ||
| _dims: dict[Hashable, int] | ||
| _local_coord_names: set[Hashable] | ||
| _inherited_coord_names: set[Hashable] | ||
| _local_dims: dict[Hashable, int] | ||
| _encoding: dict[Hashable, Any] | None | ||
| _close: Callable[[], None] | None | ||
| _indexes: dict[Hashable, Index] | ||
| _variables: dict[Hashable, Variable] | ||
| _local_indexes: dict[Hashable, Index] | ||
| _local_variables: dict[Hashable, Variable] |
There was a problem hiding this comment.
The only new internal attribute used here is _inherited_coord_names. Some of the others are renamed just to clarify that they only hold the information local to this node.
| # set tree attributes | ||
| self._children = {} | ||
| self._parent = None | ||
| self._set_node_data(data) |
| def _post_attach_recursively(self: DataTree, parent: DataTree) -> None: | ||
| for k in parent._coord_names: | ||
| if k not in self._variables: | ||
| self._inherited_coord_names.add(k) | ||
|
|
||
| for child in self._children.values(): | ||
| child._post_attach_recursively(self) |
There was a problem hiding this comment.
After alignment has been checked, we build the list of inheritable coordinate names.
| def _get_inherited_coord_var(self: DataTree, key: str) -> Variable: | ||
| for node in self.parents: | ||
| if key in node._local_coord_names: | ||
| return node._local_variables[key] | ||
|
|
||
| raise Exception( | ||
| "should never get here - means we didn't do our alignment / construction properly" | ||
| ) |
There was a problem hiding this comment.
This method, along with the various ._* properties below, is what substitutes for all the ChainMap stuff in #9063.
Here we take the approach that we iterate up the tree to go fetch the inherited coordinate variable only when we need it. (As opposed to pre-fetching all of them and storing all of them on each child node under a ChainMap.)
There was a problem hiding this comment.
This function is basically a very literal way to do the "search by proximity" in the CF conventions.
| @property | ||
| def _inherited_variables(self: DataTree) -> Mapping[Hashable, Variable]: | ||
| return { | ||
| k: self._get_inherited_coord_var(k) for k in self._inherited_coord_names | ||
| } |
There was a problem hiding this comment.
We build these lists of inherited variables/coords/dims/indexes dynamically, instead of trying to save them as static properties.
| self._variables if inherited else self._local_variables, | ||
| self._coord_names if inherited else self._local_coord_names, | ||
| self._dims if inherited else self._local_dims, | ||
| self._attrs, | ||
| self._indexes, | ||
| self._indexes if inherited else self._local_indexes, |
There was a problem hiding this comment.
The .to_dataset method becomes really simple, because the internal data model of the DataTree node is a close as possible to that of a Dataset.
| # nodes should include inherited dimensions | ||
| assert dt["b"].sizes == {"x": 2, "y": 1} | ||
| assert dt["c"].sizes == {"x": 2, "y": 3} |
There was a problem hiding this comment.
@shoyer why would I expect these dimensions to be present if the parent node only has them on data variables, not coordinates?
| with pytest.raises(ValueError, match=expected_msg): | ||
| DataTree(data=xr.Dataset(coords={"x": [1]}), children={"b": b}) | ||
|
|
||
| def test_inconsistent_grandchild_indexes(self): |
There was a problem hiding this comment.
These two grandchild tests don't work because of some as-yet unsolved bug in my implementation.
| @property | ||
| def _item_sources(self) -> Iterable[Mapping[Any, Any]]: | ||
| """Places to look-up items for key-completion""" | ||
| yield self.data_vars |
There was a problem hiding this comment.
I got horrible recursion errors from the attribute-like access method, and I just took the shortcut of deleting it for now.
|
Closing as replaced by #9063 |
This takes the new tests from #9063 and makes (most of) them pass using a different implementation. I didn't get to the IO part but this should be enough to show you what I was imagining.
cc @shoyer
@keewis @owenlittlejohns @flamingbear @eni-awowale I'm interested if any of you see advantages in either mine or @shoyer's approach here (or don't care)
whats-new.rstapi.rst