Change constructs access API to facilitate performance improvements #202

@davidhassell

Current situation

Access to named constructs is currently either by the "filter_by_*" Constructs methods, or by Field attributes:

>>> f = cf.example_field(0)
>>> f.dimension_coordinates
<Constructs: dimension_coordinate(3)>
>>> f.constructs.filter_by_type('dimension_coordinate')
<Constructs: dimension_coordinate(3)>

The "dimension_coordinates" alias is an attribute rather than a method because it was expected that it would often be followed by further selection options:

>>> f.dimension_coordinates('time')
<Constructs: dimension_coordinate(1)>
>>> f.dimension_coordinates.filter_by_property(units='degrees_east')
<Constructs: dimension_coordinate(1)>

This is fine, but comes with the slight confusion that the result of the attribute is a callable Constructs instance, which obfuscates the help:

>>> help(f.dimension_coordinates)
Help on Constructs in module cf.constructs object:

class Constructs(cfdm.constructs.Constructs)
 |  Constructs(auxiliary_coordinate=None, dimension_coordinate=None, domain_ancillary=None, field_ancillary=None, cell_measure=None, coordinate_reference=None, domain_axis=None, cell_method=None, source=None, copy=True, _use_data=True, _view=False, _ignore=())
 |
 |  A container for metadata constructs.
 ...  

Performance

It turns out that the code for accessing constructs is very slow, and must be improved.

A key part of this improvement relies on being able to work with Python dictionaries rather than dictionary-like Constructs objects.

To facilitate this whilst retaining the intuitive nature of the API, the least intrusive change is to turn what were attributes into methods, i.e. f.dimension_coordinates becomes f.dimension_coordinates(). This then allows keyword parameters that can change the behaviour when speed is an issue.
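As a rough illustration of the idea, the toy classes below show how a method (unlike an attribute) can accept a keyword that switches the return type to a plain dictionary. This is only a sketch: the `Field` and `Constructs` classes here are simplified stand-ins, and the `as_dict` keyword name is a hypothetical choice for this example, not the name used by cf-python.

```python
class Constructs(dict):
    """Toy dictionary-like container (a stand-in, not the real cf class)."""

    def filter_by_identity(self, identity):
        # Keep only the constructs whose identity matches
        return Constructs({k: v for k, v in self.items() if v == identity})


class Field:
    """Toy field holding three dimension coordinate constructs."""

    def __init__(self):
        self._dims = {
            "dimensioncoordinate0": "latitude",
            "dimensioncoordinate1": "longitude",
            "dimensioncoordinate2": "time",
        }

    def dimension_coordinates(self, *identities, as_dict=False):
        # `as_dict` is a hypothetical keyword invented for this sketch
        selected = {
            key: value
            for key, value in self._dims.items()
            if not identities or value in identities
        }
        if as_dict:
            # Fast path: skip building the dictionary-like container
            return selected
        return Constructs(selected)


f = Field()
print(type(f.dimension_coordinates()).__name__)
print(type(f.dimension_coordinates("time", as_dict=True)).__name__)
```

Because the keyword defaults to the existing behaviour, calls such as f.dimension_coordinates() and f.dimension_coordinates('time') keep returning the container, and only code that opts in gets the plain dictionary.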

Code breaking

This change will break any code that uses bare construct access attributes:

>>> # These won't work any more
>>> f.auxiliary_coordinates  
>>> f.coordinate_references
>>> f.coordinates  
>>> f.cell_measures
>>> f.dimension_coordinates
>>> f.domain_ancillaries
>>> f.domain_axes
>>> f.cell_methods
>>> f.field_ancillaries
>>> # These will work as before
>>> f.auxiliary_coordinates()
>>> f.coordinate_references()
>>> f.coordinates()
>>> f.cell_measures()
>>> f.dimension_coordinates()
>>> f.domain_ancillaries()
>>> f.domain_axes()
>>> f.cell_methods()
>>> f.field_ancillaries()
>>> # These will also work as before
>>> f.auxiliary_coordinates(x)
>>> f.coordinate_references(x)
>>> f.coordinates(x)
>>> f.cell_measures(x)
>>> f.dimension_coordinates(x)
>>> f.domain_ancillaries(x)
>>> f.domain_axes(x)
>>> f.cell_methods(x)
>>> f.field_ancillaries(x)

This is the only backwards incompatible change to the API. All other changes will not break existing code.
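To help find affected code, something like the following could be used to flag bare accesses in a Python source file. It is a heuristic sketch using only the standard library `ast` module: the function name `find_bare_accesses` is made up for this example, and it matches on attribute names alone, so it will also flag same-named attributes on objects that are not fields.

```python
import ast

# The construct access names listed above
CONSTRUCT_ACCESSORS = {
    "auxiliary_coordinates",
    "coordinate_references",
    "coordinates",
    "cell_measures",
    "dimension_coordinates",
    "domain_ancillaries",
    "domain_axes",
    "cell_methods",
    "field_ancillaries",
}


def find_bare_accesses(source):
    """Return sorted (line, name) pairs for accessors that are not called."""
    tree = ast.parse(source)
    # Attribute nodes that are called directly, e.g. f.coordinates(...)
    called = {
        id(node.func) for node in ast.walk(tree) if isinstance(node, ast.Call)
    }
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Attribute)
            and node.attr in CONSTRUCT_ACCESSORS
            and id(node) not in called
        ):
            hits.append((node.lineno, node.attr))
    return sorted(hits)


example = (
    "a = f.dimension_coordinates\n"
    "b = f.dimension_coordinates('time')\n"
    "c = f.domain_axes.filter_by_size(1)\n"
    "d = f.cell_methods()\n"
)
for line, name in find_bare_accesses(example):
    print(f"line {line}: bare access to {name}")
```

In the example, lines 1 and 3 are reported because the accessor is used without being called, whereas lines 2 and 4 already use the call form and need no change.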

With the new API, reading a file is, in one reproducible test, ~3 times faster:

>>> import cf, timeit
>>> f = cf.example_field(1)
>>> cf.write(f, 'tmp.nc')
>>> cf.__version__
'3.8.0'
>>> sum(timeit.repeat("cf.read('tmp.nc')", globals=globals(), repeat=100, number=1))/100
0.22618969043996912
>>> # New session, with the development version installed
>>> import cf, timeit
>>> cf.__version__
'3.9.0b1'
>>> sum(timeit.repeat("cf.read('tmp.nc')", globals=globals(), repeat=100, number=1))/100
0.07341453492001165
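For reference, the speed-up factor follows directly from the two mean read times reported above. The variable names are chosen just for this small calculation.

```python
# Mean seconds per cf.read call, copied from the timings above
old_mean = 0.22618969043996912  # cf 3.8.0
new_mean = 0.07341453492001165  # cf 3.9.0b1

speed_up = old_mean / new_mean
print(f"{speed_up:.1f} times faster")
```

The ratio comes out at about 3.1, consistent with the "~3 times" figure.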

Note that much of the improvement comes from unrelated changes (such as removing unnecessary __repr__ calls and unnecessary deep copies).

See NCAS-CMS/cfdm#130 for further details. In particular, the cfdm version of the above timing test gives a ~10 times speed-up. The reasons for the smaller improvement in cf-python will need investigating.

Edit: The smaller speed-up is understandable: it is due to extra checks being carried out in, e.g., set_construct. This is not to say that things can't be sped up more, but that is for another issue.

PR to follow ...

Labels: enhancement, performance