Description
What happened:
A project of mine suddenly broke with:
ValueError: Object has inconsistent chunks along dimension row. This can be fixed by calling unify_chunks().
where previously it had worked.
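For reference, the error itself is easy to provoke in isolation, even though the route my project takes to it is not. The following is a minimal sketch, assuming a hypothetical toy dataset (the variable names `a` and `b` are invented) whose two variables share the row dimension but are chunked differently. It shows the error being raised when the dataset's chunks are inspected, and that unify_chunks() makes the variables agree again. It does not reproduce the sortby behaviour described below.

```python
import dask.array as da
import xarray as xr

# Hypothetical toy dataset: two variables on the same "row" dimension,
# deliberately given different chunk sizes.
ds = xr.Dataset(
    {
        "a": (("row",), da.zeros(6, chunks=3)),
        "b": (("row",), da.zeros(6, chunks=2)),
    }
)

# Inspecting the dataset-level chunks triggers the consistency check.
try:
    ds.chunks
    error = None
except ValueError as exc:
    error = str(exc)

print(error)

# unify_chunks() rechunks the variables so that they share chunk boundaries.
unified = ds.unify_chunks()
row_chunks = unified.chunks["row"]
print(row_chunks)
```

Note that unify_chunks() may split the data into more, smaller chunks than either variable had originally, so it is a workaround rather than a way to preserve a deliberate single-chunk layout.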
What you expected to happen:
There should have been no change.
Minimal Complete Verifiable Example:
This is very difficult to reproduce. I have tried, but the error is not triggered for relatively simple xarray.Dataset objects. In my code, the Datasets in question are the result of multiple concatenation, selection and chunking operations. Instead, I will demonstrate the change in behaviour, in the hope that someone more knowledgeable has some intuition for what has gone wrong.
dask==2.25.0
I have a dataset, foo, with a number of different variables, most indexed by row. I will focus on one variable, FLAG, to demonstrate the change in behaviour. This is what FLAG looks like prior to a foo.sortby("row") call. Note that there is only a single chunk (this is intentional).
<xarray.DataArray 'FLAG' (row: 40710, chan: 1024, corr: 4)>
dask.array<rechunk-merge, shape=(40710, 1024, 4), dtype=bool, chunksize=(40710, 1024, 4), chunktype=numpy.ndarray>
Coordinates:
* row (row) int64 462991 462993 462994 462996 ... 505074 505075 505076
Dimensions without coordinates: chan, corr
After the foo.sortby("row") call:
<xarray.DataArray 'FLAG' (row: 40710, chan: 1024, corr: 4)>
dask.array<getitem, shape=(40710, 1024, 4), dtype=bool, chunksize=(40710, 1024, 4), chunktype=numpy.ndarray>
Coordinates:
* row (row) int64 462991 462993 462994 462996 ... 505076 505077 505078
Dimensions without coordinates: chan, corr
Note that the chunksize is unchanged.
dask==2.26.0
Repeating exactly the same experiment, prior to the call:
<xarray.DataArray 'FLAG' (row: 40710, chan: 1024, corr: 4)>
dask.array<rechunk-merge, shape=(40710, 1024, 4), dtype=bool, chunksize=(40710, 1024, 4), chunktype=numpy.ndarray>
Coordinates:
* row (row) int64 462991 462993 462994 462996 ... 505074 505075 505076
Dimensions without coordinates: chan, corr
After the foo.sortby("row") call:
<xarray.DataArray 'FLAG' (row: 40710, chan: 1024, corr: 4)>
dask.array<getitem, shape=(40710, 1024, 4), dtype=bool, chunksize=(20355, 1024, 4), chunktype=numpy.ndarray>
Coordinates:
* row (row) int64 462991 462993 462994 462996 ... 505076 505077 505078
Dimensions without coordinates: chan, corr
Note the change in the chunksize: along row it has dropped from 40710 to 20355, so FLAG is no longer a single chunk.
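A possible stopgap, sketched here on the assumption that a single chunk along row still fits in memory, is to rechunk explicitly after sorting. The toy dataset below is hypothetical and only stands in for foo; it is not guaranteed to show the chunk split on any particular dask version, but the explicit rechunk restores one chunk along row either way.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Hypothetical stand-in for foo: one boolean FLAG variable held in a
# single dask chunk, with an unsorted "row" coordinate.
rows = np.random.default_rng(0).permutation(10)
foo = xr.Dataset(
    {
        "FLAG": (
            ("row", "chan", "corr"),
            da.zeros((10, 4, 2), dtype=bool, chunks=(10, 4, 2)),
        )
    },
    coords={"row": rows},
)

sorted_foo = foo.sortby("row")
# Chunking after the sort may or may not match the input, depending on dask.
print(sorted_foo["FLAG"].data.chunks)

# Force a single chunk along "row" again; -1 means "the whole dimension".
rechunked = sorted_foo.chunk({"row": -1})
print(rechunked["FLAG"].data.chunks)
```

This only restores the layout after the fact and adds a rechunk step to the graph; it does not explain why the chunking changed.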
Anything else we need to know?:
I have seen similar behaviour when using xarray.Dataset.sel.
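To check which variables disagree after a sortby or sel call, a small helper along the following lines may be useful. This is a sketch only: `inconsistent_chunks` is a hypothetical name, not part of xarray, and the toy dataset is invented for illustration.

```python
from collections import defaultdict

import dask.array as da
import xarray as xr


def inconsistent_chunks(ds):
    """Return {dim: {chunks: [variable names]}} for every dimension whose
    dask-backed variables do not all share the same chunking."""
    seen = defaultdict(dict)
    for name, var in ds.variables.items():
        if var.chunks is None:  # not dask-backed (e.g. index coordinates)
            continue
        for dim, chunks in zip(var.dims, var.chunks):
            seen[dim].setdefault(chunks, []).append(name)
    return {dim: by_chunks for dim, by_chunks in seen.items() if len(by_chunks) > 1}


# Hypothetical toy dataset with mismatched chunking along "row".
ds = xr.Dataset(
    {
        "a": (("row",), da.zeros(6, chunks=3)),
        "b": (("row",), da.zeros(6, chunks=2)),
    }
)

report = inconsistent_chunks(ds)
print(report)
```

An empty result means every dask-backed variable agrees on the chunking of each dimension, in which case the dataset's chunks can be inspected without error.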
Environment:
dask==2.25.0
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0]
python-bits: 64
OS: Linux
OS-release: 5.3.0-7648-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.15.1
pandas: 1.1.2
numpy: 1.19.2
scipy: 1.5.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.25.0
distributed: 2.26.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 50.3.0
pip: 20.2.3
conda: None
pytest: 6.0.2
IPython: None
sphinx: None
dask==2.26.0
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0]
python-bits: 64
OS: Linux
OS-release: 5.3.0-7648-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.15.1
pandas: 1.1.2
numpy: 1.19.2
scipy: 1.5.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.26.0
distributed: 2.26.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 50.3.0
pip: 20.2.3
conda: None
pytest: 6.0.2
IPython: None
sphinx: None