
BUG: os.environ manipulation during import can lead to race condition import failures #30627

@gpshead

Description


Describe the issue:

This one is twisty, so bear with me; I don't quite know where to start, but I do have it figured out. The cleanup code in numpy's import path:

    finally:
        for envkey in env_added:
            del os.environ[envkey]

In particular, it does an unconditional del os.environ[XXX], which can fail (with a KeyError) if another thread modified os.environ while this numpy import was happening. That leaves the numpy import in an incomplete state until importlib finishes unwinding the failed import. "Hilarity" ensues if anything accesses numpy in this state. It is intended to be impossible for anything to get a reference to the incomplete numpy module... but, rabbit hole: mine sank deeper and led to the creation of python/cpython#143650 and a CPython bugfix, so it can happen.
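A narrower mitigation than the os.putenv idea asked for below would be to make the cleanup tolerate a key that has already vanished. The sketch below is an assumption, not numpy's code: `cleanup_added_env` is a hypothetical helper name, and `env_added` mirrors the list in the quoted snippet.

```python
import contextlib
import os


def cleanup_added_env(env_added):
    """Hypothetical helper: remove variables we added, tolerating races.

    If another thread (or a fixture restoring an os.environ snapshot) has
    already removed a key, the KeyError from `del` is swallowed instead of
    escaping from the middle of the import.
    """
    for envkey in env_added:
        with contextlib.suppress(KeyError):
            del os.environ[envkey]
```

This only stops the KeyError from breaking the import. It can still delete a value that someone else set in the meantime, and it does nothing about the other thread's own view of os.environ being inconsistent.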

Workaround: import numpy before spawning any thread that would otherwise trigger that import.
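A minimal sketch of that workaround, assuming you control the code that starts the worker thread: import the module eagerly on the current thread first, so the worker later finds it fully initialized in `sys.modules` instead of running (and racing) its import-time code. `start_after_preimport` and `run_plugin_work` are hypothetical names for illustration.

```python
import importlib
import threading


def start_after_preimport(module_names, target, *args):
    """Hypothetical helper: import modules here, then start a worker thread.

    Because the imports complete before the thread exists, the worker's own
    `import` statements just look the modules up in sys.modules.
    """
    for name in module_names:
        importlib.import_module(name)
    worker = threading.Thread(target=target, args=args)
    worker.start()
    return worker
```

Usage would look like `start_after_preimport(["numpy"], run_plugin_work)`. In a test suite the equivalent is simply a top-level `import numpy` that runs before any code that spawns threads.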

Why is modifying os.environ tricky? It's a global, and in this case test fixtures and tests of all forms like to mock or monkeypatch it, either in its entirety or by saving and restoring its contents. That, racing with an import that expects exclusive, consistent access to os.environ, is non-deterministic.

If it hurts, "don't do that" is a valid answer here, but sadly it applies equally to all sides. 😅

This is the real ask for this unusual issue:

Is there a better way to plumb those *BLAS_MAIN_FREE process environment settings, which (right?) only matter at extension module load and init time, into the .multiarray import, presumably so that native extension modules see them more directly? I realize this is hard, and perhaps fundamental to how BLAS native code "works". But numpy could instead call os.putenv(key, "1"), and os.unsetenv(key) in the cleanup, to be more direct and not involve Python's os.environ proxy. I'm hoping no Python-level code needs to see these temporary environment variable changes in the higher-level os.environ proxy dict, right? (Ever hopeful?)
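A sketch of what the os.putenv / os.unsetenv approach could look like, written as a hypothetical context manager (`c_level_env_defaults` is an illustrative name, not a numpy API). The Python docs state that calls to os.putenv() do not update os.environ, so Python code that snapshots or restores os.environ never sees these keys, while native code calling getenv() and child processes do.

```python
import contextlib
import os


@contextlib.contextmanager
def c_level_env_defaults(keys, value="1"):
    """Hypothetical helper: set env vars for native code only, temporarily.

    os.putenv()/os.unsetenv() change the C-level process environment but
    leave the os.environ mapping untouched.
    """
    # Like the existing code, only set keys the user has not already set,
    # as far as os.environ knows (it may not reflect other putenv() calls).
    added = [key for key in keys if key not in os.environ]
    for key in added:
        os.putenv(key, value)
    try:
        yield added
    finally:
        for key in added:
            # POSIX unsetenv() treats a missing name as success.
            os.unsetenv(key)
```

Usage would be along the lines of wrapping the extension import in `with c_level_env_defaults(["OPENBLAS_MAIN_FREE"]): ...` (key name shown for illustration). This removes the os.environ KeyError race, but the C-level environment itself is still not thread-safe: POSIX does not require setenv/unsetenv/getenv to be safe against concurrent modification.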

Reproduce the code example:

Hard to reproduce since it's concurrency-related, but this does appear to have been the root cause, and it will remain one until CPython's own bugfix is released and widely deployed (which will be slow).
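The real race needs unlucky thread scheduling, but the bad interleaving itself can be replayed deterministically on one thread. This sketch is illustrative only: `DEMO_KEY` stands in for the *BLAS_MAIN_FREE variables, and the "fixture" step models a save/restore of os.environ like the test fixtures described above.

```python
import os

DEMO_KEY = "DEMO_BLAS_MAIN_FREE"  # hypothetical stand-in for *BLAS_MAIN_FREE


def replay_env_race():
    """Replay one bad interleaving of an import and a fixture, in order.

    Returns the exception raised by the import's cleanup, or None.
    """
    os.environ.pop(DEMO_KEY, None)

    # Fixture (thread B): snapshot the environment before the test runs.
    snapshot = dict(os.environ)

    # Import (thread A): set the variable only if absent, and remember that.
    env_added = []
    if DEMO_KEY not in os.environ:
        os.environ[DEMO_KEY] = "1"
        env_added.append(DEMO_KEY)

    # Fixture (thread B): restore the snapshot by dropping keys added since.
    for key in list(os.environ):
        if key not in snapshot:
            del os.environ[key]

    # Import (thread A): the unconditional cleanup quoted above now fails.
    try:
        for envkey in env_added:
            del os.environ[envkey]
    except KeyError as exc:
        return exc
    return None
```

Calling `replay_env_race()` returns a KeyError for `DEMO_KEY`: the same exception the unconditional `del` raises in the middle of a real import when a fixture's restore lands between numpy's set and its cleanup.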

Python and NumPy Versions:

Originally observed on Python 3.13.11 and numpy 2.3.5

How does this issue affect you or how did you find it:

  1. We had a pytest plugin that spawned a thread that transitively wound up triggering a numpy import.
  2. When that plugin was enabled while running pytest on code that otherwise has nothing to do with numpy (and thus never imports numpy itself), the bug showed up as flakiness during post-test teardown. Our conftest.py fixture imported something during teardown that used numpy and had a dataclass with a numpy.ndarray type annotation, and occasionally the numpy module would be incomplete, raising a mysterious AttributeError.

Thankfully, this appears to happen only in wacky CI test scenarios. Code that actually uses numpy outside of tests tends to import it up front, rather than deferring the import until later, sometimes from two threads at once while another diddles with the environment.
