Description
Right now, I'm trying to analyze 25 years of monthly ERA5 data (variable ta). Since this dataset is 4D (time, height, lat, lon), the file sizes are pretty large. After concatenation, the cube's shape is (300, 37, 721, 1440), which corresponds to about 86 GiB assuming float64.
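For reference, that estimate follows directly from the shape:

```python
# 300 months x 37 pressure levels x 721 x 1440 grid points, 8 bytes per value (float64)
nbytes = 300 * 37 * 721 * 1440 * 8
print(nbytes / 2**30)  # ~85.9 GiB
```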
When using the following recipe to extract the zonal means
```yaml
# ESMValTool
---
documentation:
  description: Test ERA5.

  authors:
    - schlund_manuel

  references:
    - acknow_project

preprocessors:
  mean:
    zonal_statistics:
      operator: mean

diagnostics:
  ta_obs:
    variables:
      ta:
        preprocessor: mean
        mip: Amon
        additional_datasets:
          - {dataset: ERA5, project: native6, type: reanaly, version: v1, tier: 3, start_year: 1990, end_year: 2014}
    scripts: null
```

on a node with 64 GiB memory, the process hangs indefinitely after reaching

```
2021-06-24 11:49:23,750 UTC [24261] INFO Starting task ta_obs/ta in process [24261]
2021-06-24 11:49:23,843 UTC [24232] INFO Progress: 1 tasks running, 0 tasks waiting for ancestors, 0/1 done
```
I'm fairly sure this is due to memory issues. As the following memory samples (taken every second) show, the memory usage slowly ramps up to the maximum and then suddenly drops. I guess the OS kills the process once the entire RAM is filled up.
```
total used free shared buffers cached
Mem: 62G 2.7G 59G 80K 6.5M 653M
Mem: 62G 9.2G 53G 80K 6.5M 1.7G
Mem: 62G 9.4G 53G 80K 6.5M 1.8G
Mem: 62G 16G 46G 80K 6.5M 3.0G
Mem: 62G 15G 47G 80K 6.5M 3.0G
Mem: 62G 22G 39G 80K 6.5M 4.2G
Mem: 62G 22G 40G 80K 6.5M 4.2G
Mem: 62G 29G 32G 80K 6.5M 5.4G
Mem: 62G 28G 34G 80K 6.5M 5.5G
Mem: 62G 36G 26G 80K 6.5M 6.6G
Mem: 62G 35G 27G 80K 6.5M 6.8G
Mem: 62G 42G 20G 80K 6.5M 7.9G
Mem: 62G 41G 20G 80K 6.5M 8.1G
Mem: 62G 48G 14G 80K 6.5M 9.2G
Mem: 62G 48G 14G 80K 6.5M 9.2G
Mem: 62G 53G 9.4G 80K 6.5M 10G
Mem: 62G 56G 6.7G 80K 6.5M 10G
Mem: 62G 59G 3.1G 80K 6.5M 11G
Mem: 62G 61G 811M 80K 1.6M 9.2G
Mem: 62G 58G 4.4G 80K 1.6M 9.4G
Mem: 62G 62G 193M 80K 1.6M 9.5G
Mem: 62G 62G 690M 80K 1.6M 8.4G
Mem: 62G 62G 165M 80K 1.5M 6.5G
Mem: 62G 61G 980M 80K 1.5M 6.2G
Mem: 62G 62G 335M 80K 1.5M 5.4G
Mem: 62G 62G 175M 80K 1.5M 3.4G
Mem: 62G 62G 171M 80K 1.5M 2.5G
Mem: 62G 62G 177M 80K 1.5M 2.3G
Mem: 62G 62G 166M 80K 1.5M 1.0G
Mem: 62G 62G 165M 80K 1.5M 701M
Mem: 62G 62G 165M 80K 1.5M 124M
Mem: 62G 62G 165M 80K 1.1M 23M
Mem: 62G 62G 165M 80K 996K 19M
Mem: 62G 62G 165M 80K 956K 18M
Mem: 62G 62G 166M 80K 856K 24M
Mem: 62G 62G 165M 80K 724K 2.2M
Mem: 62G 40G 22G 80K 920K 11M
Mem: 62G 19G 43G 80K 1.5M 13M
Mem: 62G 2.2G 60G 80K 1.5M 14M
Mem: 62G 1.8G 60G 80K 1.5M 14M
Mem: 62G 1.8G 60G 80K 1.5M 14M
```
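(The table above is simply the output of the Linux `free` tool sampled once per second; a rough Python equivalent, using psutil purely for illustration, would be:)

```python
# Rough sketch of sampling system-wide memory once per second (psutil is only
# used here for illustration; the table above was produced with `free`).
import time
import psutil

while True:
    mem = psutil.virtual_memory()
    print(f"used={mem.used / 2**30:5.1f}G  free={mem.free / 2**30:5.1f}G  "
          f"cached={mem.cached / 2**30:5.1f}G")
    time.sleep(1)
```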
The zonal_statistics preprocessor that is used here should be 100% lazy:
ESMValCore/esmvalcore/preprocessor/_area.py, line 142 at f24575b:

```python
cube = cube.collapsed('longitude', operation)
```
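As far as I can tell, the collapse itself really does stay lazy. Here is a minimal check with iris (the file name is just a placeholder for the concatenated ERA5 data; for `operator: mean` the `operation` above corresponds to `iris.analysis.MEAN`):

```python
import iris
import iris.analysis

# Placeholder file name standing in for the concatenated ERA5 cube.
cube = iris.load_cube('era5_ta.nc')
print(cube.has_lazy_data())  # True: nothing has been read into memory yet

zonal_mean = cube.collapsed('longitude', iris.analysis.MEAN)
print(zonal_mean.has_lazy_data())  # still True: collapsed() keeps the data lazy
```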
So I'm really not sure what's going on here. I would expect a node with 64 GiB of memory to be able to handle 86 GiB of lazy data. I think this is somehow related to dask, as adding a `cube.lazy_data().mean(axis=3).compute()` to the line above triggers the exact same behavior. At this point, the (lazy) array looks like this (which looks reasonable to me):

```
dask.array<concatenate, shape=(300, 37, 721, 1440), dtype=float32, chunksize=(1, 12, 721, 1440), chunktype=numpy.MaskedArray>
```

For comparison, setting up a random array with the same shape and chunks in dask and doing the same operation works perfectly well:
```python
>>> import dask.array as da
>>> x = da.ma.masked_greater(da.random.normal(size=(300, 37, 721, 1440), chunks=(1, 12, 721, 1440)), 0.0)
>>> print(x)
dask.array<masked_greater, shape=(300, 37, 721, 1440), dtype=float64, chunksize=(1, 12, 721, 1440), chunktype=numpy.MaskedArray>
>>> x.mean(axis=3).compute()  # works perfectly well with max. ~15 GiB memory
```

@ESMValGroup/esmvaltool-coreteam Has anyone had similar experiences? I know this is not strictly an ESMValTool issue, but maybe there is something we can configure in dask that helps here? Apart from using a larger node, there is currently no way to evaluate this dataset.
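One thing that might be worth trying (untested on my side, so this is only a sketch) is to put explicit limits on dask's parallelism and memory instead of relying on the defaults, e.g. via the distributed scheduler:

```python
# Untested sketch: run the computation under the distributed scheduler with an
# explicit per-worker memory budget, so that data gets spilled to disk instead
# of filling up the whole node. The numbers are placeholders for a 64 GiB node.
from dask.distributed import Client

client = Client(n_workers=4, threads_per_worker=2, memory_limit='12GB')
# ... then run the recipe / trigger cube.collapsed(...).data as usual
```

Alternatively, forcing the single-threaded scheduler with `dask.config.set(scheduler='synchronous')` might at least show whether the parallel materialization of many chunks is what exhausts the memory.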