Coordinates.dims: only include dimensions found in coordinate variables. #10609
benbovy wants to merge 3 commits into pydata:main
Exclude the dimension(s) present in the wrapped Dataset / DataArray / DataTree object that do not have any coordinate variable.
This function is used internally for Xarray -> Pandas conversion, where we still want dimensions without coordinates to be converted to an index.
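To make the intended rule concrete, here is a minimal sketch in plain Python. It is not xarray's implementation: the helper name `coordinate_dims` and the dict-based description of coordinate variables are assumptions made purely for illustration.

```python
def coordinate_dims(object_dims, coord_var_dims):
    """Keep only the dimensions that appear in at least one coordinate variable.

    object_dims: dimension names of the wrapped object (hypothetical input).
    coord_var_dims: mapping of coordinate name -> tuple of its dimension names.
    """
    # Collect every dimension name used by any coordinate variable.
    used = {dim for dims in coord_var_dims.values() for dim in dims}
    # Preserve the dimension order of the wrapped object.
    return tuple(dim for dim in object_dims if dim in used)


# A 2-D object with dims ("x", "y") and a single coordinate along "x":
# "y" has no coordinate variable, so it is excluded.
print(coordinate_dims(("x", "y"), {"x": ("x",)}))  # ('x',)
```

With no coordinate variables at all, the same helper returns an empty tuple.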
TomNicholas left a comment:
The test failures here look real?
self.mda["level_1"] = ("x", np.arange(4))
self.mda.coords["level_1"] = ("x", np.arange(4))

def test_coords_dims(self) -> None:
Suggested change:
# https://github.com/pydata/xarray/issues/9466
def test_coords_dims(self) -> None:
"b": np.dtype("int64"),
}

def test_dims(self) -> None:
Should probably also add a test here similar to the one you used for DataArray.
This is caused by this line (xarray/xarray/core/dataarray.py, line 453 in 6e9414d), where DataArray dimensions, if not explicitly set via the constructor argument, may ultimately be set from [...]

This is why this PR may be considered both a bug fix and a breaking change. Apparently quite a few tests (and maybe third-party code?) rely on the dimensions of the object wrapped by the Coordinates proxy object (passed via the [...]

On the [...]

>>> import numpy as np
>>> import xarray as xr
>>> da = xr.DataArray([[1, 2], [3, 4]], coords={"x": [0, 1]}, dims=("x", "y"))
>>> values = np.array([[10, 20], [30, 40]])

Creating a new DataArray by directly passing the Coordinates proxy object (success, dimension name "y" magically discovered from [...]):

>>> coords = da.coords
>>> xr.DataArray(values, coords=coords)
<xarray.DataArray (x: 2, y: 2)> Size: 32B
array([[10, 20],
       [30, 40]])
Coordinates:
  * x        (x) int64 16B 0 1
Dimensions without coordinates: y

Creating a new DataArray by passing a new "standalone" Coordinates object (error, the coordinates have only one "x" dimension):

>>> coords = xr.Coordinates(da.coords)
>>> xr.DataArray(values, coords=coords)
ValueError: different number of dimensions on data and dims: 2 vs 1

This seems very confusing to me and I think that we should deprecate the succeeding behavior above.
Two comments:
Dataset.coords.dims, DataArray.coords.dims and DataTree.coords.dims now all calculate the dimensions on the fly (no cache), which may have an impact on performance. Not sure those properties are heavily used, though (not many reports apart from "ds.coords.dims can display dims not present in the coordinate vars", #9466).
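The trade-off of computing dimensions on the fly can be illustrated with a small stand-alone sketch. `CoordsProxySketch` is a hypothetical class written for this comment, not xarray's proxy, and it assumes coordinate variables are stored as a plain dict of name -> (dims, values).

```python
class CoordsProxySketch:
    """Hypothetical stand-in for a coordinates proxy; not xarray's class."""

    def __init__(self, variables):
        # Mapping of coordinate name -> (tuple of dimension names, values).
        self._variables = variables

    @property
    def dims(self):
        # Recomputed on every access: the cost grows with the number of
        # coordinate variables, but the result never goes stale.
        seen = {}
        for var_dims, _values in self._variables.values():
            for dim in var_dims:
                seen.setdefault(dim, None)  # dict keeps first-seen order
        return tuple(seen)


variables = {"x": (("x",), [0, 1])}
proxy = CoordsProxySketch(variables)
print(proxy.dims)  # ('x',)

# Adding a coordinate variable is reflected immediately, with no cache to reset.
variables["lat"] = (("y", "x"), [[0, 1], [2, 3]])
print(proxy.dims)  # ('x', 'y')
```

In this sketch each access walks every coordinate variable once, so the overhead only matters if the property is read in a hot loop.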