DOC: Add python-level parallelism tips to the docs #30597

@ngoldbaum

Issue with current documentation:

Right now the docs don't really have any suggestions on how to write performant code with NumPy. With the free-threaded build, there's increasing pressure to support multithreaded use, but both multiprocessing and multithreading have been used with NumPy in the real world for a long time.

I guess we could also tell people "just use dask!", and maybe these docs should suggest dask, but I also think documenting how to work with the standard library parallelism primitives is important.

Also, while working on #30494 we've been assembling some knowledge about ways to improve performance under multithreading. We need a place to add these tips in a way that's visible to users.

Idea or request for content:

A new section in the user guide about how to use multiple CPU cores to work with NumPy arrays. It probably makes sense to cover both multithreading and multiprocessing.
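As a rough illustration of the kind of example such a section could contain, here is a minimal multithreading sketch using only the standard library's `concurrent.futures`. The names `process_chunk` and `parallel_sqrt`, the choice of `np.sqrt` as the workload, and the default of four workers are all hypothetical and not taken from any existing NumPy docs. Each worker reads one slice of the input and writes to its own disjoint slice of a preallocated output array, so the threads never touch the same memory.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def process_chunk(src, dst):
    # Hypothetical per-chunk workload: write the result straight into
    # this worker's own slice of the output array.
    np.sqrt(src, out=dst)


def parallel_sqrt(arr, n_workers=4):
    out = np.empty_like(arr)
    # Chunk boundaries computed with plain integer arithmetic.
    bounds = [len(arr) * i // n_workers for i in range(n_workers + 1)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            # Basic slices are views, so no data is copied here.
            pool.submit(process_chunk, arr[lo:hi], out[lo:hi])
            for lo, hi in zip(bounds[:-1], bounds[1:])
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return out


values = np.arange(1_000_000, dtype=np.float64)
result = parallel_sqrt(values)
```

Whether this actually runs faster than a single `np.sqrt` call depends on the array size, the operation, and the interpreter build, so any documented example would need to come with measurements.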

This can include both general guidance for writing performant code and tips for micro-optimizations like the one I point out here: #30514 (comment).

I think it should probably also include general tips on writing parallel code: minimize shared state, avoid contention around shared state when using threads, and avoid heavy pickle overhead when using multiprocessing.
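To make the pickle point concrete, here is a small sketch under stated assumptions. It compares how many bytes would be pickled when an array is passed to a worker process directly versus when the data sits in a `multiprocessing.shared_memory` block and only a small descriptor is sent. The descriptor layout `(name, shape, dtype)` is a hypothetical convention, and to keep the example short the "worker side" attaches to the block from the same process instead of spawning a real worker.

```python
import pickle
from multiprocessing import shared_memory

import numpy as np

data = np.arange(1_000_000, dtype=np.float64)

# Passing the array itself to a worker means pickling every byte of it.
full_payload = len(pickle.dumps(data))

# Alternative: copy the data once into a shared memory block...
shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
shared[:] = data

# ...and send only a small descriptor (hypothetical layout).
descriptor = (shm.name, data.shape, data.dtype.str)
small_payload = len(pickle.dumps(descriptor))

# "Worker side": attach by name and wrap the buffer without copying.
name, shape, dtype = pickle.loads(pickle.dumps(descriptor))
attached = shared_memory.SharedMemory(name=name)
view = np.ndarray(shape, dtype=dtype, buffer=attached.buf)
total = float(view.sum())

# Drop the NumPy views before closing, otherwise close() raises BufferError.
del view
attached.close()
del shared
shm.close()
shm.unlink()  # free the block once no process needs it anymore
```

The trade-off is that shared memory is itself shared state, so workers that write to it need the same care as threads do, for example by giving each worker a disjoint slice.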

Labels: 04 - Documentation, 39 - free-threading (PRs and issues related to support for free-threading CPython, a.k.a. no-GIL, PEP 703)
