open_dataset is not thread-safe #4100

@paulkernfeld

👋 Hi, great library! I've been trying to use xarray from a Flask server and have encountered frequent segfaults when loading a file.

MCVE Code Sample

Putting together this MCVE took a bit of time, but it was a good exercise for me. I enjoyed learning more about the MCVE philosophy! 😊 The only caveat is that threading issues like this are hard to reproduce reliably, so I can't guarantee that the bug will show up on every run. If you have any suggestions for improving reproducibility, please let me know!

import threading
import xarray as xr


SAVED_FILE_NAME = "saved.nc"

# Modifying these items may change the likelihood of hitting a segfault
N_ELEMENTS = 100
N_THREADS = 2

if __name__ == '__main__':
    xr.Dataset({'foo': ('x', range(N_ELEMENTS))}).to_netcdf(SAVED_FILE_NAME)

    threads = [
        threading.Thread(target=lambda: xr.load_dataset(SAVED_FILE_NAME, engine="netcdf4"))
        for _ in range(N_THREADS)
    ]
    for thread in threads:
        thread.start()

    for thread in threads:
        thread.join()

    print("No segfault!")
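Since the crash depends on thread timing, one way to raise the odds of hitting it might be to release all threads at the same instant and repeat the attempt many times. The sketch below is only an illustration of that idea: `run_simultaneously`, `n_threads`, and `n_rounds` are hypothetical names, not part of xarray, and it has not been verified to make this particular bug more reproducible.

```python
import threading


def run_simultaneously(target, n_threads, n_rounds=1):
    """Call `target` from `n_threads` threads released together, `n_rounds` times.

    Returns the list of Python exceptions raised by `target`.
    A segfault cannot be caught here; it still kills the whole process.
    """
    errors = []

    def worker(barrier):
        barrier.wait()  # block until every thread of this round is ready
        try:
            target()
        except Exception as exc:
            errors.append(exc)  # list.append is atomic under the GIL

    for _ in range(n_rounds):
        barrier = threading.Barrier(n_threads)
        threads = [
            threading.Thread(target=worker, args=(barrier,))
            for _ in range(n_threads)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()

    return errors
```

With the MCVE above, this could be called as `run_simultaneously(lambda: xr.load_dataset(SAVED_FILE_NAME, engine="netcdf4"), N_THREADS, n_rounds=100)`.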

Expected Output

The program prints "No segfault!" and exits successfully.
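A possible stopgap, assuming the crash comes from two threads entering the file-loading path at the same time (an assumption, not something confirmed here), is to serialize the calls behind a single lock. The decorator below is a minimal sketch; `serialized` and `_LOAD_LOCK` are hypothetical names and not part of xarray.

```python
import functools
import threading

# One process-wide lock shared by every wrapped function (hypothetical helper).
_LOAD_LOCK = threading.Lock()


def serialized(func):
    """Return a wrapper that lets only one thread at a time run `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _LOAD_LOCK:
            return func(*args, **kwargs)

    return wrapper
```

Usage would look like `safe_load = serialized(xr.load_dataset)`, with each thread calling `safe_load(SAVED_FILE_NAME, engine="netcdf4")`. This only protects calls made through the wrapper, and it trades away any parallelism in loading. Because `open_dataset` reads lazily, later reads on a dataset it returns would happen outside the lock.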

Problem Description

The program sometimes segfaults. When running with the Python fault handler enabled, I often get output that looks like this:

Fatal Python error: Segmentation fault

Thread 0x0000700002c2f000 (most recent call first):
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 204 in _acquire_with_cache_info
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 186 in acquire_context
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 112 in __enter__
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 362 in _acquire
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 211 in _acquire_with_cache_info
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 192 in acquire_context

Current thread 0x0000700004a41000 (most recent call first):
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 204 in _acquire_with_cache_info
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 186 in acquire_context
  File "/usrzsh: segmentation fault  ./xarray-segfault
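For anyone trying to capture a similar traceback, the standard-library `faulthandler` module can be switched on from code. This is a minimal sketch; `enable_crash_tracebacks` is a hypothetical helper name, and the exact way the output above was produced is not stated in this report.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Dump the Python traceback of every thread to `stream` on a fatal signal.

    `stream` must be a file object with a real file descriptor;
    it defaults to sys.stderr.
    """
    if stream is None:
        stream = sys.stderr
    # Covers SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    faulthandler.enable(file=stream, all_threads=True)
```

Calling `enable_crash_tracebacks()` at the top of the script should have the same effect as running it with `python -X faulthandler` or setting the `PYTHONFAULTHANDLER` environment variable.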

Versions

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.4 (default, Sep 7 2019, 18:27:02)
[Clang 10.0.1 (clang-1001.0.46.4)]
python-bits: 64
OS: Darwin
OS-release: 19.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.3

xarray: 0.15.1
pandas: 1.0.3
numpy: 1.18.4
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 46.4.0
pip: 20.1.1
conda: None
pytest: None
IPython: None
sphinx: None
