Current situation
Access to named constructs is currently either by the "filter_by_*" Constructs methods, or by Field attributes:
>>> f = cf.example_field(0)
>>> f.dimension_coordinates
<Constructs: dimension_coordinate(3)>
>>> f.constructs.filter_by_type('dimension_coordinate')
<Constructs: dimension_coordinate(3)>
The reason for the "dimension_coordinates" alias being an attribute rather than a method is that it was expected that it would often be followed by more options:
>>> f.dimension_coordinates('time')
<Constructs: dimension_coordinate(1)>
>>> f.dimension_coordinates.filter_by_property(units='degrees_east')
<Constructs: dimension_coordinate(1)>
This is fine, but comes with the slight confusion that the result of the attribute is a callable Constructs instance, which obfuscates the help:
>>> help(f.dimension_coordinates)
Help on Constructs in module cf.constructs object:
class Constructs(cfdm.constructs.Constructs)
| Constructs(auxiliary_coordinate=None, dimension_coordinate=None, domain_ancillary=None, field_ancillary=None, cell_measure=None, coordinate_reference=None, domain_axis=None, cell_method=None, source=None, copy=True, _use_data=True, _view=False, _ignore=())
|
| A container for metadata constructs.
...
Performance
It turns out that the code for accessing constructs is very slow, and must be improved.
A key part of this improvement relies on being able to work with Python dictionaries rather than dictionary-like Constructs objects.
To facilitate this whilst retaining the intuitive nature of the API, the least intrusive change is to make what were attributes into methods, i.e. f.dimension_coordinates becomes f.dimension_coordinates(). This then allows keyword parameters that can change the behaviour when speed is an issue.
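The shape of this change can be illustrated with a toy class. This is a sketch only, not the cf-python implementation: the names `ToyField` and `ToyConstructs` and the `todict` keyword are assumptions made for the example. It shows how a method, unlike an attribute, can accept a keyword that skips building the dictionary-like wrapper and returns a plain dict.

```python
class ToyConstructs:
    """Hypothetical stand-in for a dictionary-like Constructs object."""

    def __init__(self, items):
        self._items = dict(items)

    def __len__(self):
        return len(self._items)


class ToyField:
    """Hypothetical stand-in for a Field, holding {key: (type, identity)}."""

    def __init__(self, constructs):
        self._constructs = dict(constructs)

    def dimension_coordinates(self, *identities, todict=False):
        # Select using plain dict operations only
        out = {
            key: identity
            for key, (ctype, identity) in self._constructs.items()
            if ctype == "dimension_coordinate"
            and (not identities or identity in identities)
        }
        # 'todict' is an assumed keyword: True skips the wrapper entirely
        return out if todict else ToyConstructs(out)


f = ToyField(
    {
        "dimensioncoordinate0": ("dimension_coordinate", "latitude"),
        "dimensioncoordinate1": ("dimension_coordinate", "longitude"),
        "dimensioncoordinate2": ("dimension_coordinate", "time"),
        "cellmethod0": ("cell_method", "mean"),
    }
)

# Default call: behaves like the old attribute and returns the wrapper
print(len(f.dimension_coordinates()))
# Fast path: a plain dict, selected by identity
print(f.dimension_coordinates("time", todict=True))
```

The default call keeps the familiar behaviour, so only callers that care about speed need to opt in to the plain-dict form.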
Code breaking
This change will break any code that uses bare construct access attributes:
>>> # These won't work any more
>>> f.auxiliary_coordinates
>>> f.coordinate_references
>>> f.coordinates
>>> f.cell_measures
>>> f.dimension_coordinates
>>> f.domain_ancillaries
>>> f.domain_axes
>>> f.cell_methods
>>> f.field_ancillaries
>>> # These will work as before
>>> f.auxiliary_coordinates()
>>> f.coordinate_references()
>>> f.coordinates()
>>> f.cell_measures()
>>> f.dimension_coordinates()
>>> f.domain_ancillaries()
>>> f.domain_axes()
>>> f.cell_methods()
>>> f.field_ancillaries()
>>> # These will also work as before
>>> f.auxiliary_coordinates(x)
>>> f.coordinate_references(x)
>>> f.coordinates(x)
>>> f.cell_measures(x)
>>> f.dimension_coordinates(x)
>>> f.domain_ancillaries(x)
>>> f.domain_axes(x)
>>> f.cell_methods(x)
>>> f.field_ancillaries(x)
This is the only backwards-incompatible change to the API. All other changes will not break existing code.
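To find code that relies on the bare attributes, one option is a small scan with Python's standard `ast` module. The following is a rough sketch rather than a supported tool: the helper name `find_bare_accesses` is hypothetical, and the check is purely name-based, so it will also flag unrelated objects that happen to have an attribute with one of these names. Chained uses such as `f.dimension_coordinates.filter_by_property(...)` are reported too, since they also access the attribute without calling it.

```python
import ast

# Accessor names taken from the lists above
ACCESSORS = {
    "auxiliary_coordinates",
    "coordinate_references",
    "coordinates",
    "cell_measures",
    "dimension_coordinates",
    "domain_ancillaries",
    "domain_axes",
    "cell_methods",
    "field_ancillaries",
}


def find_bare_accesses(source):
    """Return sorted (line, name) pairs for accessors used without a call."""
    tree = ast.parse(source)
    # Attribute nodes that are called directly, e.g. f.domain_axes()
    called = {
        id(node.func) for node in ast.walk(tree) if isinstance(node, ast.Call)
    }
    return sorted(
        (node.lineno, node.attr)
        for node in ast.walk(tree)
        if isinstance(node, ast.Attribute)
        and node.attr in ACCESSORS
        and id(node) not in called
    )


example = (
    "a = f.dimension_coordinates\n"
    "b = f.dimension_coordinates('time')\n"
    "c = f.dimension_coordinates.filter_by_property(units='degrees_east')\n"
)
print(find_bare_accesses(example))
```

Each reported line can usually be fixed by adding parentheses after the accessor name, which works both before and after the change.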
With the new API, reading a file is, in one reproducible test, ~3 times faster:
>>> import cf, timeit
>>> f = cf.example_field(1)
>>> cf.write(f, 'tmp.nc')
>>> cf.__version__
3.8.0
>>> sum(timeit.repeat("cf.read('tmp.nc')", globals=globals(), repeat=100, number=1))/100
0.22618969043996912
>>> import cf, timeit
>>> cf.__version__
3.9.0b1
>>> sum(timeit.repeat("cf.read('tmp.nc')", globals=globals(), repeat=100, number=1))/100
0.07341453492001165
Note that much of the improvement comes from unrelated changes (such as removing unnecessary __repr__ calls and unnecessary deep copies).
See NCAS-CMS/cfdm#130 for further details. In particular, the cfdm version of the above timing test gives a ~10 times speed-up; the reasons for the smaller improvement in cf-python will need investigating.
Edit: The smaller speed-up is understandable and is due to extra checks being carried out in, e.g., set_construct. This is not to say that things can't be sped up more, but that is for another issue.
PR to follow ...