
New method pad_missing to support aggregation of DSGs #717

@davidhassell


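It is currently not possible to aggregate two discrete sampling geometry (DSG) fields whose features have different lengths into a single DSG field with a larger feature axis. For example, these two fields are returned unaggregated: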

>>> a
<CF Field: precipitation_flux(cf_role=timeseries_id(2), ncdim%timeseries(9)) kg m-2 day-1>
>>> b
<CF Field: precipitation_flux(cf_role=timeseries_id(3), ncdim%timeseries(5)) kg m-2 day-1>
>>> cf.aggregate([a,b])
[<CF Field: precipitation_flux(cf_role=timeseries_id(2), ncdim%timeseries(9)) kg m-2 day-1>,
 <CF Field: precipitation_flux(cf_role=timeseries_id(3), ncdim%timeseries(5)) kg m-2 day-1>]

This is something we might want to do, because we can store DSGs of different lengths in one CF-netCDF data variable using a ragged array representation.

However, if we could pad out the ncdim%timeseries axis of b with missing data so that it matches the length of the same axis in a, then the two fields would aggregate. A new pad_missing method would make this possible:

>>> # Pad out the 'ncdim%timeseries' axis with missing data:
>>> #   0 elements at the start of the axis and 4 elements at the end:
>>> b = b.pad_missing('ncdim%timeseries', (0, 4))
>>> c = cf.aggregate([a,b])  # Now this aggregates
>>> c
[<CF Field: precipitation_flux(cf_role=timeseries_id(5), ncdim%timeseries(9)) kg m-2 day-1>]
>>> # Compress the field
>>> c = c[0].compress('contiguous')
>>> # Write it to disk in a single CF-netCDF data variable *without* the extra padding
>>> cf.write(c, 'dsg.nc')
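
The reason the padding need not reach the file is that a contiguous ragged array stores only the real samples of each feature, back to back, plus a per-feature count. The following is a minimal NumPy sketch of that idea, not cf-python's compression code; the function name `to_contiguous_ragged` is hypothetical, and it assumes the missing values in each row are trailing padding only.

```python
import numpy as np


def to_contiguous_ragged(padded):
    """Flatten a padded 2-d masked array into (values, counts).

    Hypothetical helper for illustration only. Assumes that each row
    is one feature and that masked elements are trailing padding.
    """
    padded = np.ma.asarray(padded)
    # Number of unmasked (real) samples in each feature
    counts = padded.count(axis=1)
    # All unmasked samples in row-major order, i.e. feature by feature
    values = padded.compressed()
    return values, counts


# Two features of lengths 3 and 1, padded out to a common length of 3
padded = np.ma.masked_invalid([[1.0, 2.0, 3.0], [4.0, np.nan, np.nan]])
values, counts = to_contiguous_ragged(padded)
print(values)  # the four real samples, with no padding
print(counts)  # how many of those samples belong to each feature
```

The `counts` array plays the role of the count variable in the CF contiguous ragged array representation, which is what lets a reader split `values` back into features of different lengths.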

NumPy and Dask each have a pad function that supports many padding modes, but none of them pads with missing data, and their API is more general than we need here. It may therefore be better to call our method pad_missing, to distinguish it from the more general pad. A full cf-python pad could still be implemented in the future if the need arose.
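
To make the intended semantics concrete, here is a rough sketch of padding with missing data on a plain NumPy masked array. It is not the proposed cf-python implementation: the function name `pad_with_missing` and its signature are hypothetical, chosen only to mirror the `(axis, (before, after))` call shown earlier. It works by applying `numpy.pad` to the data and to the mask separately, so that every new element is masked.

```python
import numpy as np


def pad_with_missing(array, axis, pad_width):
    """Pad one axis of an array with masked (missing) elements.

    Hypothetical helper for illustration only.

    array: array-like or masked array
    axis: integer position of the axis to pad
    pad_width: (before, after) number of elements to add
    """
    array = np.ma.asarray(array)
    before, after = pad_width

    # Pad only the requested axis; leave all other axes unchanged
    widths = [(0, 0)] * array.ndim
    widths[axis] = (before, after)

    # Pad the data with a placeholder value (0), and pad the mask
    # with True so that the placeholder values are treated as missing
    data = np.pad(np.ma.getdata(array), widths, mode="constant")
    mask = np.pad(
        np.ma.getmaskarray(array),
        widths,
        mode="constant",
        constant_values=True,
    )
    return np.ma.array(data, mask=mask)


# 3 timeseries of 5 elements, padded to 9 elements each
b = np.arange(15.0).reshape(3, 5)
padded = pad_with_missing(b, 1, (0, 4))
print(padded.shape)
print(padded.count())  # number of non-missing elements is unchanged
```

Padding the mask with `True` is the step that `numpy.pad` alone does not provide, which is the gap a dedicated pad_missing method would fill.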

PR to follow.


Labels: enhancement (New feature or request)
