Description
Aspects of accessing PP and UM fields file data have sometimes been very slow, for quite a while. I had previously always assumed that this was a cf.aggregation issue, which it very much sometimes was! ... but I think aggregation now performs pretty well.
@theabro kindly raised a case of reading a CF field from a 16 GB PP file. The CF Field itself comprised 2040 (= 24 x 85) 2-d PP fields:
>>> print(f)
<CF Field: id%UM_m01s50i500_vn1300(time(24), atmosphere_hybrid_height_coordinate(85), latitude(144), longitude(192))>

Accessing the full data array with a = f.array is taking ~11,000 seconds - far too long!
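For reference, a timing of this kind can be reproduced along these lines (a minimal sketch: the file name is a placeholder and selecting the first field is purely illustrative; cf.read and Field.array are the standard cf-python entry points):

>>> import time
>>> import cf
>>> f = cf.read("16GB_file.pp")[0]  # placeholder path; pick the field shown above
>>> start = time.perf_counter()
>>> a = f.array                     # realise the full data as a numpy array
>>> print(f"f.array took {time.perf_counter() - start:.0f} seconds")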
Investigations showed that the reason for this was that the whole PP file was being parsed (i.e. all headers read and processed) for every 2-d PP field that contributes to the array, i.e. 2040 times in this case.
Stopping this parsing reduces the time taken to get the full array, on the same machine, to ~2 seconds (!). The entire 16 GB can be read from disk in ~3.5 minutes.
The size of the file per se is not the cause of the problem; rather it is the large number of individual lookup headers in the file: 162,888 in this case. For my small test cases with fewer than 5 PP fields, the slow down is invisible :(
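One way to avoid the repeated work (not necessarily how the forthcoming PR does it) is to cache the parsed lookup headers per file, so that the expensive full-file parse happens once rather than once per 2-d field. A minimal sketch, using hypothetical helper names rather than cf's real internals:

from functools import lru_cache

@lru_cache(maxsize=None)
def parse_all_lookups(filename):
    # Hypothetical: read and process every lookup header in the PP file.
    # With the cache, this runs once per file, not once per 2-d field.
    lookups = []
    # ... open filename, read each lookup header, record the byte offset
    # of its data record ...
    return tuple(lookups)

def read_2d_field(filename, index):
    # Hypothetical: read a single 2-d PP field using the cached headers.
    lookups = parse_all_lookups(filename)  # cached after the first call
    # ... seek straight to lookups[index] and read just that 2-d array ...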
Long overdue PR to follow.