Description
After (at least) a grouped collapse in which the time axis boundaries lie within a group (see the MRE below), attempting to access the underlying data array raises an error. Ultimately, the Dask compute operation expects a numpy-like array but instead encounters a cf.NetCDF4Array object, giving: AttributeError: 'NetCDF4Array' object has no attribute 'astype'. Did you mean: 'dtype'?
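The sketch below illustrates the failure mode in isolation. `LazyArray` is a hypothetical stand-in for a file-backed array such as cf.NetCDF4Array: it can be converted with `np.asarray` but has no `astype` method. `chunk_astype` is a hypothetical helper that mirrors the one-line `astype` function in dask/array/chunk.py shown in the traceback. It is not cf-python, cfdm or Dask code.

```python
import numpy as np


class LazyArray:
    """Hypothetical stand-in for a lazy, file-backed array.

    Like the object in the traceback, it has a dtype but no ``astype`` method.
    It can still be converted to numpy via ``np.asarray``.
    """

    def __init__(self, values):
        self._values = np.asarray(values)
        self.shape = self._values.shape
        self.dtype = self._values.dtype

    def __array__(self, dtype=None, copy=None):
        # Called by np.asarray(); returns the in-memory values
        return self._values if dtype is None else self._values.astype(dtype)


def chunk_astype(x, astype_dtype=None, **kwargs):
    # Mirrors the line in dask/array/chunk.py shown in the traceback
    return x.astype(astype_dtype, **kwargs)


lazy = LazyArray([271.5, 272.0, 273.25])

try:
    chunk_astype(lazy, astype_dtype="float32")
except AttributeError as exc:
    print("AttributeError:", exc)

# Converting the chunk to a numpy array first avoids the error
converted = chunk_astype(np.asarray(lazy), astype_dtype="float32")
print(converted.dtype)
```

In other words, the error appears whenever a chunk reaches the `astype` task still wrapped, rather than already converted to a numpy array.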
Note that I have done some pdb debugging on this (just getting the MRE was quite tricky), which I will summarise in a follow-on comment. It suggests that a concatenate operation may be a prerequisite for the bug to emerge. I will also investigate the Dask task graph, but have not yet done so because of an all-day meeting today and other release-related concerns yesterday.
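One way a concatenate step *could* matter is sketched below. This is a hypothesis only, not something the debugging has confirmed. Suppose a concatenation code path returns a lone input unchanged instead of converting it to numpy. In that case, a lazy wrapper can leak through to later tasks. Both `LazyArray` and `concatenate_pieces` are hypothetical illustrations, not cf-python, cfdm or Dask code.

```python
import numpy as np


class LazyArray:
    """Hypothetical stand-in for a file-backed array with no ``astype``."""

    def __init__(self, values):
        self._values = np.asarray(values)

    def __array__(self, dtype=None, copy=None):
        # np.asarray / np.concatenate use this to obtain real numpy data
        return self._values if dtype is None else self._values.astype(dtype)


def concatenate_pieces(pieces):
    """Hypothetical concatenate helper with a single-piece shortcut.

    np.concatenate converts every piece to numpy, but the shortcut hands
    back the lone piece untouched, so no conversion happens on that path.
    """
    if len(pieces) == 1:
        return pieces[0]
    return np.concatenate(pieces)


two_pieces = concatenate_pieces([LazyArray([1.0]), LazyArray([2.0])])
one_piece = concatenate_pieces([LazyArray([3.0])])

print(type(two_pieces).__name__)     # ndarray
print(type(one_piece).__name__)      # LazyArray
print(hasattr(one_piece, "astype"))  # False
```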
Traceback and field context
See the end for the error report. It is useful to note the form of the field and of the affected coordinate, hence the dump and print in the MRE below, which produce the following output:
----------------------------------------------------
Field: long_name=Sea surface temperature (ncvar%sst)
----------------------------------------------------
Conventions = 'CF-1.6'
_FillValue = np.int16(-32767)
history = '2023-07-26 18:11:43 GMT by grib_to_netcdf-2.25.1: /opt/ecmwf/mars-
client/bin/grib_to_netcdf.bin -S param -o /cache/data8/adaptor.mars.
internal-1690395052.5048163-16972-6-318f8028-8b02-478b-b30e-
fde96e4f8d57.nc /cache/tmp/318f8028-8b02-478b-b30e-fde96e4f8d57-
adaptor.mars.internal-1690392897.08971-16972-10-tmp.grib'
long_name = 'Sea surface temperature'
missing_value = np.int16(-32767)
units = 'K'
Data(long_name=time(996), long_name=latitude(721), long_name=longitude(1440)) = [[[271.4597342469336, ..., --]]] K
Domain Axis: long_name=latitude(721)
Domain Axis: long_name=longitude(1440)
Domain Axis: long_name=time(996)
Dimension coordinate: long_name=time
calendar = 'gregorian'
long_name = 'time'
units = 'hours since 1900-01-01 00:00:00.0'
Data(long_name=time(996)) = [1940-01-01 00:00:00, ..., 2022-12-01 00:00:00] gregorian
Dimension coordinate: long_name=latitude
long_name = 'latitude'
units = 'degrees_north'
Data(long_name=latitude(721)) = [90.0, ..., -90.0] degrees_north
Dimension coordinate: long_name=longitude
long_name = 'longitude'
units = 'degrees_east'
Data(long_name=longitude(1440)) = [0.0, ..., 359.75] degrees_east
Time cordinate is long_name=time(44) hours since 1900-01-01 00:00:00.0 gregorian
Traceback (most recent call last):
File "/home/slb93/git-repos/cf-python/docs/source/recipes/mre-astype-bug.py", line 13, in <module>
t.data.array # or t.array directly
^^^^^^^^^^^^
File "/home/slb93/miniconda3/envs/old-sphinx-cf-doc-build-only/lib/python3.11/site-packages/cfdm/data/data.py", line 2629, in array
a = self.compute().copy()
^^^^^^^^^^^^^^
File "/home/slb93/miniconda3/envs/old-sphinx-cf-doc-build-only/lib/python3.11/site-packages/cfdm/data/data.py", line 3917, in compute
a = dx.compute()
^^^^^^^^^^^^
File "/home/slb93/miniconda3/envs/old-sphinx-cf-doc-build-only/lib/python3.11/site-packages/dask/base.py", line 370, in compute
(result,) = compute(self, traverse=False, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/slb93/miniconda3/envs/old-sphinx-cf-doc-build-only/lib/python3.11/site-packages/dask/base.py", line 656, in compute
results = schedule(dsk, keys, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/slb93/miniconda3/envs/old-sphinx-cf-doc-build-only/lib/python3.11/site-packages/dask/array/chunk.py", line 279, in astype
return x.astype(astype_dtype, **kwargs)
^^^^^^^^
AttributeError: 'NetCDF4Array' object has no attribute 'astype'. Did you mean: 'dtype'?
MRE
Using an environment from cf.environment(paths=False), as printed below. In short, this uses the latest main branch of both cfdm and cf-python:
import cf
sst = cf.read("~/recipes/ERA5_monthly_averaged_SST.nc")[0] # sea surface temp
sst.dump()
am_max = sst.collapse("area: maximum")
am_max = am_max.subspace(T=cf.ge(cf.dt("1980-01-01")))
# Note that a cf.mam() grouped collapse works, but cf.djf() errors!
am_max_collapse = am_max.collapse("T: mean", group=cf.djf())
t = am_max_collapse.dimension_coordinate("long_name=time")
print("Time cordinate is", t)
t.data.array  # or t.array directly; the AttributeError is raised here
Note that if the group to collapse on is changed to another season, e.g. cf.mam() or cf.son(), the error does not appear! The time data of the sst field start in January and end in December, so both time axis boundaries lie within the DJF season. This is likely related to the problem.
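Counting the groups makes the boundary effect concrete. The sketch below rests on two assumptions:

- The MRE's subspace leaves one value per month from January 1980 to December 2022. The original field has 996 monthly times from 1940-01 to 2022-12.
- Each contiguous run of December, January and February forms one group. This is an assumption about how `group=cf.djf()` partitions the axis, not a statement of cf-python's implementation. `season_runs` is a hypothetical helper.

Under these assumptions, the first DJF group has only two months and the last only one, whereas every MAM group is complete. The resulting 44 DJF groups agree with the time(44) coordinate printed above.

```python
from itertools import groupby

# Monthly (year, month) time steps from Jan 1980 to Dec 2022
months = [(year, month) for year in range(1980, 2023) for month in range(1, 13)]


def season_runs(months, season_months):
    """Hypothetical helper: split the axis into contiguous runs of season months.

    Assumes each unbroken run of qualifying months forms one group and that
    months outside the season are not grouped.
    """
    runs = []
    for in_season, run in groupby(months, key=lambda ym: ym[1] in season_months):
        if in_season:
            runs.append(list(run))
    return runs


djf = season_runs(months, {12, 1, 2})
mam = season_runs(months, {3, 4, 5})

print(len(djf), [len(r) for r in (djf[0], djf[1], djf[-1])])  # 44 [2, 3, 1]
print(len(mam), {len(r) for r in mam})  # 43 {3}
```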
Environment, as tested on
Platform: Linux-6.6.65-1-MANJARO-x86_64-with-glibc2.41
HDF5 library: 1.14.2
netcdf library: 4.9.4-development
udunits2 library: /home/slb93/miniconda3/envs/old-sphinx-cf-doc-build-only/lib/libudunits2.so.0
esmpy/ESMF: 8.7.0
Python: 3.11.8
dask: 2025.3.0
netCDF4: 1.7.2
h5netcdf: 1.6.1
h5py: 3.13.0
s3fs: 2025.3.0
psutil: 7.0.0
packaging: 24.2
numpy: 2.2.4
scipy: 1.15.2
matplotlib: 3.8.4
cftime: 1.6.4.post1
cfunits: 3.3.7
cfplot: 3.3.0
cfdm: 1.12.0.0
cf: 3.17.0