
Memory issues despite lazy operations #1193

@schlunma

Description


Right now, I'm trying to analyze 25 years of monthly ERA5 data (variable ta). Since this dataset is 4D (time, height, lat, lon), the file sizes are pretty large. After concatenation, the cube's shape is (300, 37, 721, 1440), which corresponds to about 86 GiB assuming float64.
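As a quick sanity check on that number (8 bytes per float64 value):

# Sanity check of the ~86 GiB estimate
nbytes = 300 * 37 * 721 * 1440 * 8
print(nbytes / 2**30)  # ~85.9 GiB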

When using the following recipe to extract the zonal means

# ESMValTool
---
documentation:
  description: Test ERA5.

  authors:
    - schlund_manuel

  references:
    - acknow_project


preprocessors:

  mean:
    zonal_statistics:
      operator: mean


diagnostics:

  ta_obs:
    variables:
      ta:
        preprocessor: mean
        mip: Amon
        additional_datasets:
          - {dataset: ERA5, project: native6, type: reanaly, version: v1, tier: 3, start_year: 1990, end_year: 2014}
    scripts:
      null

on a node with 64 GiB memory, the process hangs indefinitely after reaching

2021-06-24 11:49:23,750 UTC [24261] INFO    Starting task ta_obs/ta in process [24261]
2021-06-24 11:49:23,843 UTC [24232] INFO    Progress: 1 tasks running, 0 tasks waiting for ancestors, 0/1 done

I'm fairly sure this is due to memory issues. As the following memory samples (taken every second) show, memory usage slowly ramps up until the RAM is full and then suddenly drops. I guess the OS kills the process once the entire RAM is filled.

            total       used       free     shared    buffers     cached
Mem:           62G       2.7G        59G        80K       6.5M       653M
Mem:           62G       9.2G        53G        80K       6.5M       1.7G
Mem:           62G       9.4G        53G        80K       6.5M       1.8G
Mem:           62G        16G        46G        80K       6.5M       3.0G
Mem:           62G        15G        47G        80K       6.5M       3.0G
Mem:           62G        22G        39G        80K       6.5M       4.2G
Mem:           62G        22G        40G        80K       6.5M       4.2G
Mem:           62G        29G        32G        80K       6.5M       5.4G
Mem:           62G        28G        34G        80K       6.5M       5.5G
Mem:           62G        36G        26G        80K       6.5M       6.6G
Mem:           62G        35G        27G        80K       6.5M       6.8G
Mem:           62G        42G        20G        80K       6.5M       7.9G
Mem:           62G        41G        20G        80K       6.5M       8.1G
Mem:           62G        48G        14G        80K       6.5M       9.2G
Mem:           62G        48G        14G        80K       6.5M       9.2G
Mem:           62G        53G       9.4G        80K       6.5M        10G
Mem:           62G        56G       6.7G        80K       6.5M        10G
Mem:           62G        59G       3.1G        80K       6.5M        11G
Mem:           62G        61G       811M        80K       1.6M       9.2G
Mem:           62G        58G       4.4G        80K       1.6M       9.4G
Mem:           62G        62G       193M        80K       1.6M       9.5G
Mem:           62G        62G       690M        80K       1.6M       8.4G
Mem:           62G        62G       165M        80K       1.5M       6.5G
Mem:           62G        61G       980M        80K       1.5M       6.2G
Mem:           62G        62G       335M        80K       1.5M       5.4G
Mem:           62G        62G       175M        80K       1.5M       3.4G
Mem:           62G        62G       171M        80K       1.5M       2.5G
Mem:           62G        62G       177M        80K       1.5M       2.3G
Mem:           62G        62G       166M        80K       1.5M       1.0G
Mem:           62G        62G       165M        80K       1.5M       701M
Mem:           62G        62G       165M        80K       1.5M       124M
Mem:           62G        62G       165M        80K       1.1M        23M
Mem:           62G        62G       165M        80K       996K        19M
Mem:           62G        62G       165M        80K       956K        18M
Mem:           62G        62G       166M        80K       856K        24M
Mem:           62G        62G       165M        80K       724K       2.2M
Mem:           62G        40G        22G        80K       920K        11M
Mem:           62G        19G        43G        80K       1.5M        13M
Mem:           62G       2.2G        60G        80K       1.5M        14M
Mem:           62G       1.8G        60G        80K       1.5M        14M
Mem:           62G       1.8G        60G        80K       1.5M        14M

The zonal_statistics preprocessor that is used here should be 100% lazy:

cube = cube.collapsed('longitude', operation)
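A quick check (file path and variable name are just placeholders) confirms that this collapse by itself keeps the data lazy:

import iris
import iris.analysis

# Placeholder file/variable, only to illustrate the laziness check
cube = iris.load_cube('era5_ta.nc', 'air_temperature')
print(cube.has_lazy_data())  # True: the data is a dask array, nothing loaded yet

zonal_mean = cube.collapsed('longitude', iris.analysis.MEAN)
print(zonal_mean.has_lazy_data())  # still True: the mean only adds to the dask graph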

So I'm really not sure what's going on here. I would have expected a node with 64 GiB of memory to be able to handle 86 GiB of lazy data. I think this is somehow related to dask, since adding a cube.lazy_data().mean(axis=3).compute() after the line above triggers exactly the same behavior. At this point, the (lazy) array looks like this (which seems reasonable to me):

dask.array<concatenate, shape=(300, 37, 721, 1440), dtype=float32, chunksize=(1, 12, 721, 1440), chunktype=numpy.MaskedArray>
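The individual chunks are small, which is why this layout looks reasonable to me:

# Per-chunk size for the chunking shown above (float32 = 4 bytes per value)
chunk_bytes = 1 * 12 * 721 * 1440 * 4
print(chunk_bytes / 2**20)  # ~47.5 MiB per chunk
print(300 * 4)              # 1200 chunks in total (300 time chunks x 4 level chunks)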

For comparison, setting up a random dask array with the same shape and chunking and performing the same operation works perfectly well:

>>> import dask.array as da
>>> x = da.ma.masked_greater(da.random.normal(size=(300, 37, 721, 1440), chunks=(1, 12, 721, 1440)), 0.0)
>>> print(x)
dask.array<masked_greater, shape=(300, 37, 721, 1440), dtype=float64, chunksize=(1, 12, 721, 1440), chunktype=numpy.MaskedArray>
>>> x.mean(axis=3).compute() # works perfectly well with max. ~15 GiB memory

@ESMValGroup/esmvaltool-coreteam Has anyone had similar experiences? I know this is not strictly an ESMValTool issue, but maybe there is something we can configure in dask that helps here? Apart from using a larger node, there is currently no way to evaluate this dataset.
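One thing I could imagine trying (an untested sketch, the exact settings are guesses) is constraining the scheduler dask uses for the computation, so that fewer chunks are held in memory at once:

import dask

# Untested sketch: limit the threaded scheduler's pool size ...
with dask.config.set(scheduler='threads', num_workers=2):
    result = cube.lazy_data().mean(axis=3).compute()

# ... or run single-threaded for debugging (slow, but minimal parallelism overhead)
result = cube.lazy_data().mean(axis=3).compute(scheduler='synchronous')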

Labels

    iris (Related to the Iris package), question (Further information is requested)
