
BUG: Multithreaded scaling bottlenecks on free-threaded Python #30494

@ngoldbaum

Description


This StackOverflow question appears to have identified two real scaling bottlenecks inside NumPy.

I built a free-threaded debug build of Python 3.15 (3.15t) from source using pyenv:

$ CFLAGS=-O2 pyenv install 3.15t-dev -k -g
...
$ pyenv global 3.15t-dev-debug
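Before benchmarking, it is worth confirming that the interpreter pyenv selected really is a free-threaded build with the GIL disabled at runtime. A minimal sketch using `sysconfig` and `sys._is_gil_enabled()` (the latter is a private helper available on Python 3.13+; the fallback for older versions is an assumption of this sketch):

```python
import sys
import sysconfig


def free_threading_status() -> dict:
    """Report whether this interpreter is a free-threaded build and whether the GIL is off."""
    # Py_GIL_DISABLED is 1 for free-threaded ("t") builds, 0 or None otherwise.
    built_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on Python 3.13+; older versions always run with a GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return {
        "built_free_threaded": built_free_threaded,
        "gil_enabled": is_gil_enabled(),
    }


if __name__ == "__main__":
    print(sys.version)
    print(free_threading_status())
```

Running the same check after `import numpy` shows whether importing the freshly built NumPy re-enabled the GIL, which CPython does for extension modules that do not declare free-threading support.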

Then, in an up-to-date clone of the NumPy repo:

python -m pip install . -v -Cbuilddir=build -C'setup-args=-Dbuildtype=debugoptimized'

Finally, run this script, which must be named mtmp.py: https://gist.github.com/ngoldbaum/ff6428e2510e991247eba4c809dcc8ac. It is identical to the script posted on StackOverflow, except that some code is commented out.

On my M3 MacBook Pro, the script prints the following to stdout:

Inner loops 10, multithreading  time: 6.68 sec, result sum: 717434683.1879175
Inner loops 10, multiprocessing time: 4.86 sec, result sum: 717434683.1879175
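In these numbers the threaded run takes about 1.37 times as long as the multiprocessing run (6.68 / 4.86), even though on a free-threaded build threads are not serialized by a GIL. The gist is not reproduced here; as a hedged sketch of the same kind of comparison, the harness below times a hypothetical pure-Python `workload` on one thread and on several, and checks that the result sums agree, as the script's output does.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def workload(seed: int, inner_loops: int = 10) -> float:
    """Hypothetical CPU-bound task standing in for the per-worker work in mtmp.py."""
    total = 0.0
    for i in range(inner_loops * 10_000):
        total += ((seed + i) % 97) * 0.5
    return total


def time_threads(n_workers: int, n_tasks: int) -> tuple[float, float]:
    """Run n_tasks copies of workload on n_workers threads; return (seconds, result sum)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in task order, so the sum is deterministic.
        total = sum(pool.map(workload, range(n_tasks)))
    return time.perf_counter() - start, total


if __name__ == "__main__":
    serial_time, serial_sum = time_threads(1, 8)
    threaded_time, threaded_sum = time_threads(8, 8)
    # Equal sums confirm the threaded run did the same work as the serial one.
    assert serial_sum == threaded_sum
    print(f"1 thread:  {serial_time:.2f} s")
    print(f"8 threads: {threaded_time:.2f} s, speedup {serial_time / threaded_time:.2f}x")
```

On a free-threaded build without contention, the 8-thread run should finish noticeably faster than the 1-thread run on a multi-core machine; on a default (GIL) build the two timings will be similar. Using `ProcessPoolExecutor` for the multiprocessing side of the comparison would additionally require a module-level worker and an `if __name__ == "__main__":` guard.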

If I instead run the script under samply, with Python's perf support enabled:

PYTHONPERFSUPPORT=1 samply record $(pyenv which python) mtmp.py

I get a profile like this one: https://share.firefox.dev/3Lde1Lm. It shows two distinct scaling bottlenecks. First, there seems to be a contended mutex inside tracemalloc.

Second, the critical section I added to the array creation routines in #29394 is causing some contention in this case.
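For the first bottleneck, one hedged first diagnostic (not something the profile itself shows) is to confirm from inside the script whether tracemalloc is tracing at all, since tracing can also be switched on from outside via the `PYTHONTRACEMALLOC` environment variable or `-X tracemalloc`. A minimal sketch using only the public `tracemalloc` API:

```python
import tracemalloc


def tracemalloc_state() -> dict:
    """Report whether tracemalloc is tracing and, if so, its current memory usage."""
    tracing = tracemalloc.is_tracing()
    state = {"tracing": tracing}
    if tracing:
        # get_traced_memory() returns (current, peak) traced sizes in bytes.
        current, peak = tracemalloc.get_traced_memory()
        state.update(current_bytes=current, peak_bytes=peak)
    return state


if __name__ == "__main__":
    print(tracemalloc_state())
```

If `tracing` is `False` while the profile still shows the tracemalloc mutex, the lock is being taken even without active tracing, which would point at the C-level allocation-tracking path rather than at anything the script does.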
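The second bottleneck follows a general pattern: a single lock held around a short, frequently executed step serializes every thread that reaches it. The pure-Python analogy below only illustrates that effect; it does not reproduce NumPy's C-level critical section from #29394, and `make_array_stub` is a hypothetical stand-in for array creation.

```python
import threading
import time


def make_array_stub(n: int) -> list:
    """Hypothetical stand-in for the work done while creating a small array."""
    return [0.0] * n


def create_many(n_arrays: int, lock) -> int:
    """Create n_arrays stubs, holding `lock` around each creation if one is given."""
    created = 0
    for _ in range(n_arrays):
        if lock is not None:
            # Every thread funnels through the same lock: a single contention point.
            with lock:
                make_array_stub(64)
        else:
            make_array_stub(64)
        created += 1
    return created


def run(n_threads: int, n_arrays: int, use_shared_lock: bool) -> tuple[float, int]:
    """Run create_many on n_threads threads; return (seconds, total stubs created)."""
    lock = threading.Lock() if use_shared_lock else None
    counts = [0] * n_threads

    def worker(idx: int) -> None:
        counts[idx] = create_many(n_arrays, lock)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, sum(counts)


if __name__ == "__main__":
    for shared in (False, True):
        elapsed, total = run(8, 100_000, shared)
        print(f"shared lock={shared}: {elapsed:.2f} s for {total} stubs")
```

On a free-threaded build, the unlocked run should tend to scale with the number of cores while the shared-lock run serializes the stub calls; on a GIL build both runs serialize. Exact timings will vary with the machine.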

Metadata


Labels: 00 - Bug; 39 - free-threading (PRs and issues related to support for free-threading CPython, a.k.a. no-GIL, PEP 703)
