
BUG: blas_fpe_check() deadlocks on glibc < 2.34 when numpy is imported from within dlopen (MKL + OpenMP) #31284

Description

@isuruf

Describe the issue:

blas_fpe_check() in numpy/__init__.py (added in #30102, backported to 2.3.5) executes ones((20,20)) @ ones((20,20)) at import time. When numpy is imported from within a dlopen() context — e.g. a C extension's static constructor calling PyImport_ImportModule("numpy") — this causes a deadlock on glibc < 2.34 with MKL BLAS.

Mechanism:

  1. Python loads a C extension via dlopen() — glibc acquires dl_load_lock
  2. The extension's static constructor calls PyImport_ImportModule("numpy")
  3. numpy's __init__.py runs blas_fpe_check() → x @ x → cblas_dgemm
  4. MKL dispatches dgemm via OpenMP, spawning worker threads
  5. Worker threads call mkl_serv_load_fun() → dlsym() → tries to acquire dl_load_lock
  6. Deadlock: main thread holds dl_load_lock and waits at __kmp_join_barrier for workers; workers block on dl_load_lock held by main thread
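
The lock cycle in steps 1 to 6 can be reproduced in miniature with Python's threading module. This is only a toy model under stated assumptions: `load_lock` stands in for dl_load_lock, the worker thread for an OpenMP worker calling dlsym(), and the timed wait for __kmp_join_barrier. The function name `simulate` is hypothetical, and nothing here touches glibc or MKL.

```python
import threading


def simulate(hold_lock_while_waiting: bool, timeout: float = 0.5) -> bool:
    """Return True if the worker finished within `timeout`, False if it stayed blocked."""
    load_lock = threading.Lock()      # stand-in for glibc's dl_load_lock
    worker_done = threading.Event()

    def worker() -> None:
        # Stand-in for an OpenMP worker calling mkl_serv_load_fun() -> dlsym().
        with load_lock:
            worker_done.set()

    thread = threading.Thread(target=worker, daemon=True)
    if hold_lock_while_waiting:
        with load_lock:               # "dlopen() in progress" on the main thread
            thread.start()
            # Stand-in for __kmp_join_barrier; the real barrier has no timeout.
            finished = worker_done.wait(timeout)
    else:
        thread.start()
        finished = worker_done.wait(timeout)
    thread.join()                     # the lock is free by now, so this returns
    return finished


if __name__ == "__main__":
    print("numpy imported inside dlopen:", simulate(True))
    print("numpy imported before dlopen:", simulate(False))
```

The first call should report that the worker never finished while the lock was held. The second corresponds to the case where no loader lock is held during the matmul. The timeout exists only so the demonstration terminates; in the real hang nothing ever times out.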

On glibc ≥ 2.34, dlopen/dlsym use a recursive lock (dl_load_lock2) so this doesn't deadlock. On glibc < 2.34 (RHEL 8, CentOS 8, Amazon Linux 2, etc.), the lock is non-reentrant.
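
To check whether a given machine falls in the affected range, a small standard-library sketch is enough. The helper names `parse_version` and `glibc_is_affected` are hypothetical, and the 2.34 threshold is simply the one quoted in this report.

```python
import platform


def parse_version(text: str) -> tuple[int, int]:
    """Turn a version string such as '2.28' into (2, 28)."""
    parts = text.split(".")
    minor = int(parts[1]) if len(parts) > 1 else 0
    return int(parts[0]), minor


def glibc_is_affected(libc_name: str, libc_version: str) -> bool:
    """True for glibc older than 2.34, the threshold quoted in this report."""
    if libc_name != "glibc" or not libc_version:
        return False
    return parse_version(libc_version) < (2, 34)


if __name__ == "__main__":
    # platform.libc_ver() returns e.g. ('glibc', '2.28'); it may return
    # empty strings on non-glibc systems.
    name, version = platform.libc_ver()
    print(name, version, "affected:", glibc_is_affected(name, version))
```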

Prior to #30102, no BLAS computation happened at import time, so this was not an issue.

Reproduce the code example:

# Create environment with MKL-backed numpy and any C extension that
# imports numpy from a static constructor (csp does this):
conda create -n test python=3.11 csp "blas=*=mkl" "numpy=2.3.5"



# repro.py — deadlocks on glibc < 2.34
import importlib.util
import os
import sysconfig

site_packages = sysconfig.get_path("platlib")
so_path = os.path.join(site_packages, "csp", "lib", "_cspimpl.so")

# Loading _cspimpl.so via dlopen triggers its C++ static constructors,
# which call PyImport_ImportModule("numpy"). This worked before 2.3.5.
spec = importlib.util.spec_from_file_location("_cspimpl", so_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)  # hangs forever
print("OK")


Workaround: pre-import numpy before loading the extension:

import numpy  # ensures blas_fpe_check runs outside dlopen context
import csp     # now safe
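
For applications that load several such extensions, the same workaround can be wrapped in a helper that runs once at startup. This is a minimal sketch; `preimport` is a hypothetical name, and it only helps when called from ordinary Python code before any dlopen() of the extension begins.

```python
import importlib
import sys


def preimport(names=("numpy",)):
    """Import each module up front and return the names that were newly loaded."""
    newly_loaded = []
    for name in names:
        if name not in sys.modules:
            # The import-time BLAS check runs here, outside any dlopen() call.
            importlib.import_module(name)
            newly_loaded.append(name)
    return newly_loaded
```

Calling `preimport()` at the top of the entry-point script, before `import csp`, has the same effect as the two-line workaround above.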

Error message:

From GDB

Main thread (waiting for OpenMP workers):

#0  pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  __kmp_suspend_64 () from libiomp5.so
#2  __kmp_wait_template () from libiomp5.so
#3  __kmp_hyper_barrier_gather () from libiomp5.so
#4  __kmp_join_barrier () from libiomp5.so
#5  __kmp_internal_join () from libiomp5.so
#6  __kmp_join_call () from libiomp5.so
#7  __kmpc_fork_call () from libiomp5.so
#8  mkl_blas_dgemm_omp_driver_v1 () from libmkl_intel_thread.so.2
#9  mkl_blas.dgemm () from libmkl_gf_lp64.so.2
#10 cblas_dgemm () from libmkl_gf_lp64.so.2
#11 DOUBLE_matmul_matrixmatrix () from _multiarray_umath.so   <-- x @ x
...
#36 import_find_and_load (abs_name='numpy')                   <-- PyImport_ImportModule


Worker threads (blocked on dl_load_lock):

#0  __lll_lock_wait () from /lib64/libpthread.so.0
#1  pthread_mutex_lock () from /lib64/libpthread.so.0
#2  dlsym () from /lib64/libdl.so.2
#3  mkl_serv_load_fun () from libmkl_core.so.2
#4  mkl_blas_dgemm_mscale () from libmkl_core.so.2
#5  mkl_blas_dgemm_omp_driver_v1.extracted () from libmkl_intel_thread.so.2

Python and NumPy Versions:

  • numpy 2.3.5 / 2.4.3 (conda-forge, MKL variant via blas=*=mkl)
  • Python 3.11.15
  • glibc 2.28 (RHEL 8.10, kernel 4.18.0-553)
  • MKL 2025.3.1, LLVM OpenMP 22.1.3
  • x86_64

Bisect results:

| numpy version | blas_fpe_check | Deadlocks on glibc 2.28 + MKL? |
| --- | --- | --- |
| 2.2.6 | No | No |
| 2.3.0 – 2.3.4 | No | No |
| 2.3.5 | Yes (backport) | Yes |
| 2.4.0 – 2.4.3 | Yes | Yes |

Runtime Environment:

No response

How does this issue affect you or how did you find it:

Suggested fix:

Guard blas_fpe_check() to only run on platforms where it's needed (ARM/Apple Silicon with Accelerate), or set MKL_NUM_THREADS=1 / OMP_NUM_THREADS=1 for the duration of the check to prevent MKL from spawning worker threads:

def blas_fpe_check():
    with errstate(all='raise'):
        x = ones((20, 20))
        try:
            # Avoid spawning OpenMP threads during import, which deadlocks
            # on glibc < 2.34 if numpy is loaded from within dlopen().
            import os
            old = os.environ.get("MKL_NUM_THREADS")
            os.environ["MKL_NUM_THREADS"] = "1"
            try:
                x @ x
            finally:
                if old is None:
                    os.environ.pop("MKL_NUM_THREADS", None)
                else:
                    os.environ["MKL_NUM_THREADS"] = old
        except FloatingPointError:
            ...
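
The other option, guarding the check by platform, could look roughly like the sketch below. The function name `should_run_blas_fpe_check` is hypothetical, and the condition (macOS on Apple Silicon only) is an assumption taken from the wording of this report, not from numpy's source.

```python
import platform


def should_run_blas_fpe_check(system: str, machine: str) -> bool:
    """Assumed policy: only macOS on Apple Silicon needs the import-time check."""
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    # On the RHEL 8 x86_64 machine from this report this would print False,
    # so no BLAS call (and no OpenMP thread) would be made during import.
    print(should_run_blas_fpe_check(platform.system(), platform.machine()))
```

Taking the system and machine strings as parameters keeps the policy testable without running on each platform.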
