Description
It is currently not possible to aggregate two DSG features of different lengths into a single DSG with a larger feature axis. E.g.
>>> a
<CF Field: precipitation_flux(cf_role=timeseries_id(2), ncdim%timeseries(9)) kg m-2 day-1>
>>> b
<CF Field: precipitation_flux(cf_role=timeseries_id(3), ncdim%timeseries(5)) kg m-2 day-1>
>>> cf.aggregate([a,b])
[<CF Field: precipitation_flux(cf_role=timeseries_id(2), ncdim%timeseries(9)) kg m-2 day-1>,
 <CF Field: precipitation_flux(cf_role=timeseries_id(3), ncdim%timeseries(5)) kg m-2 day-1>]

We might want to do this, because DSG features of different lengths can be stored in a single CF-netCDF data variable using a ragged array representation.
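For reference, a contiguous ragged array stores the elements of every feature back to back in one 1-D variable, with a separate count variable giving the length of each feature. The sketch below illustrates that layout with plain NumPy, assuming placeholder data shaped like the fields above (two features of length 9 and three of length 5); `get_feature` is a hypothetical helper, not part of cf-python.

```python
import numpy as np

# Placeholder data: two features of length 9 (like field a) and
# three features of length 5 (like field b).
features = [
    np.arange(9.0),
    np.arange(9.0) + 100,
    np.arange(5.0),
    np.arange(5.0) + 10,
    np.arange(5.0) + 20,
]

# Contiguous ragged array: all elements stored back to back in one 1-D
# array, plus a "count" array giving each feature's length.
data = np.concatenate(features)
count = np.array([len(f) for f in features])

# Index of each feature's first element in the flat array.
starts = np.concatenate(([0], np.cumsum(count)[:-1]))


def get_feature(i):
    """Hypothetical helper: recover feature i from the flat array."""
    return data[starts[i] : starts[i] + count[i]]


print(data.size)  # 33
print(count)      # [9 9 5 5 5]
```

No padding is stored: the flat array holds exactly the 33 real elements, and the count array is enough to recover each feature.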
However, if we could pad out the ncdim%timeseries axis of b with missing data, then the two fields would aggregate. This could be done with a new pad_missing method:
>>> # Pad out the 'ncdim%timeseries' axis with missing data:
>>> # 0 elements at the start of the axis and 4 elements at the end:
>>> b = b.pad_missing('ncdim%timeseries', (0, 4))
>>> c = cf.aggregate([a,b]) # Now this aggregates
>>> c
[<CF Field: precipitation_flux(cf_role=timeseries_id(5), ncdim%timeseries(9)) kg m-2 day-1>]
>>> # Compress the field
>>> c = c[0].compress('contiguous')
>>> # Write it to disk in a single CF-netCDF data variable *without* the extra padding
>>> cf.write(c, 'dsg.nc')

NumPy and Dask both have a pad function (numpy.pad and dask.array.pad) that supports many kinds of padding, but neither pads with missing data. Their API is also more general than what we need here. It may therefore be better to call our method pad_missing, to distinguish it from the more general pad, and a full cf-python pad method could always be implemented in the future if the need ever arose.
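For a plain masked array, padding with missing data can be expressed by padding the data with an arbitrary value and padding the mask with True. The sketch below assumes NumPy masked arrays; `pad_missing` here is a hypothetical standalone helper, not cf-python's API, and the eventual cf-python implementation may well differ. It then mimics the aggregation and contiguous compression from the session above.

```python
import numpy as np


def pad_missing(array, axis, pad_width):
    """Hypothetical helper: pad one axis of an array with masked elements.

    pad_width is (before, after): the number of missing elements to add
    at the start and at the end of the axis.
    """
    array = np.ma.asanyarray(array)
    widths = [(0, 0)] * array.ndim
    widths[axis] = pad_width
    # Pad the underlying data with an arbitrary value (it will be masked)...
    data = np.pad(np.ma.getdata(array), widths, mode="constant")
    # ...and pad the mask with True so that the new elements are missing.
    mask = np.pad(
        np.ma.getmaskarray(array), widths, mode="constant", constant_values=True
    )
    return np.ma.masked_array(data, mask=mask)


# Placeholder analogues of the fields: a has 2 features of length 9,
# b has 3 features of length 5.
a = np.ma.masked_array(np.ones((2, 9)))
b = np.ma.masked_array(np.full((3, 5), 2.0))

# 0 missing elements at the start of axis 1 and 4 at the end.
b = pad_missing(b, axis=1, pad_width=(0, 4))

# The shapes now agree along axis 1, so the features can be stacked.
c = np.ma.concatenate([a, b], axis=0)
print(c.shape)  # (5, 9)

# Contiguous compression drops the padding again. This relies on the
# missing elements being trailing, as they are here.
count = c.count(axis=1)  # non-missing elements per feature
flat = c.compressed()    # row-major, masked elements removed
print(count)  # [9 9 5 5 5]
```

Padding only at the end of the axis keeps each feature's real values contiguous, which is what makes the final compression lossless.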
PR to follow.