
Allow parallel start NUMA binding#161576

Closed
pdesupinski wants to merge 3 commits into main from
feature/parallel-numa-binding

Conversation

@pdesupinski
Contributor

@pdesupinski pdesupinski commented Aug 27, 2025

Context

In #161183, we added NUMA-binding support for `Callable` entrypoints to `elastic_launch`.

However, we would raise an exception if the subprocesses were spawned in parallel via `ThreadPoolExecutor`, an option configurable via the `TORCH_MP_PARALLEL_START` environment variable (see diff).

The logic here was that `os.sched_setaffinity`, which we used to set CPU affinities, is [per process](https://docs.python.org/3/library/os.html#os.sched_setaffinity), so there could be a race condition during a parallel start:

> Restrict the process with PID *pid* (or the current process if zero) to a set of CPUs. *mask* is an iterable of integers representing the set of CPUs to which the process should be restricted.

But on further reading, the Linux man page says [`sched_setaffinity` is per *thread*](https://man7.org/linux/man-pages/man2/sched_setaffinity.2.html). As it turns out, the Python doc's per-process wording is misleading.

I [verified that `sched_setaffinity` only affects the calling thread, not the entire calling process.](https://gist.github.com/pdesupinski/7e2de3cbe5bb48d489f257b83ccddf07)
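
A minimal reproduction sketch of that check (Linux-only, assuming at least two usable CPUs) might look like the following. It pins the main thread, has a worker thread pin itself elsewhere, and confirms the main thread's mask is untouched:

```python
import os
import threading

# Sketch: despite the per-process wording in the Python docs,
# os.sched_setaffinity(0, ...) changes the affinity mask of the
# calling *thread* only (Linux sched_setaffinity(2) semantics).

def main():
    available = sorted(os.sched_getaffinity(0))
    if len(available) < 2:
        print("need at least 2 usable CPUs for this demo")
        return
    cpu_a, cpu_b = available[0], available[1]

    os.sched_setaffinity(0, {cpu_a})  # pin the main thread to cpu_a

    def worker():
        # New threads inherit the creator's mask, so we start on {cpu_a};
        # now pin only this worker thread to cpu_b.
        os.sched_setaffinity(0, {cpu_b})
        assert os.sched_getaffinity(0) == {cpu_b}

    t = threading.Thread(target=worker)
    t.start()
    t.join()

    # If the call were truly per-process, the main thread's mask would
    # now be {cpu_b}; instead it is still {cpu_a}.
    assert os.sched_getaffinity(0) == {cpu_a}
    print("sched_setaffinity affected only the worker thread")

if __name__ == "__main__":
    main()
```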

The upshot is that we actually *can* safely use the inheritance trick from #161183 even with parallel start, since the setting will be inherited from the calling thread, and `os.sched_setaffinity` only affects the calling thread.
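
As a hypothetical sketch of why the inheritance trick stays safe under a parallel start (Linux-only; the helper names here are illustrative, not torchelastic's actual code): each spawning thread pins itself, and the subprocess it forks inherits that thread's mask rather than racing on a process-wide one.

```python
import multiprocessing as mp
import os
from concurrent.futures import ThreadPoolExecutor

def report_affinity(q):
    # Runs in the child process: report the inherited affinity mask.
    q.put(sorted(os.sched_getaffinity(0)))

def start_pinned_child(cpu):
    # Affects only this spawning thread, so concurrent callers
    # in other threads cannot clobber each other's masks.
    os.sched_setaffinity(0, {cpu})
    ctx = mp.get_context("fork")
    q = ctx.SimpleQueue()
    p = ctx.Process(target=report_affinity, args=(q,))
    p.start()
    inherited = q.get()
    p.join()
    return inherited

if __name__ == "__main__":
    cpus = sorted(os.sched_getaffinity(0))[:2]
    with ThreadPoolExecutor(max_workers=len(cpus)) as pool:
        results = list(pool.map(start_pinned_child, cpus))
    for cpu, inherited in zip(cpus, results):
        assert inherited == [cpu]  # each child got its spawner's mask
    print("each child inherited its own spawning thread's mask")
```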

This PR

Remove restrictions against parallel start for NUMA binding.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela

@pytorch-bot

pytorch-bot bot commented Aug 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161576

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please review it.

✅ No Failures

As of commit d1c4381 with merge base 443452c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (torchelastic) labels Aug 27, 2025
@pdesupinski pdesupinski added the topic: not user facing topic category label Aug 27, 2025
@pdesupinski pdesupinski requested a review from d4l3k August 27, 2025 00:23
@pdesupinski pdesupinski added the suppress-bc-linter Suppresses the failures of API backward-compatibility linter (Lint/bc_linter) label Aug 27, 2025
@pdesupinski pdesupinski marked this pull request as ready for review August 27, 2025 00:24
@pdesupinski pdesupinski requested a review from albanD as a code owner August 27, 2025 00:24
@facebook-github-bot
Contributor

@pdesupinski has imported this pull request. If you are a Meta employee, you can view this in D81092716.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 27, 2025
Member

@d4l3k d4l3k left a comment


LGTM LGTM LGTM! gogogo

@facebook-github-bot
Contributor

@pdesupinski has imported this pull request. If you are a Meta employee, you can view this in D81092716.

@pdesupinski pdesupinski force-pushed the feature/parallel-numa-binding branch from f839eab to d1c4381 Compare August 27, 2025 19:23
@facebook-github-bot
Contributor

@pdesupinski has imported this pull request. If you are a Meta employee, you can view this in D81092716.

@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
Pull Request resolved: pytorch#161576
Approved by: https://github.com/d4l3k
@github-actions github-actions bot deleted the feature/parallel-numa-binding branch September 27, 2025 02:07

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), Merged, oncall: distributed (Add this issue/PR to distributed oncall triage queue), release notes: distributed (torchelastic), suppress-bc-linter (Suppresses the failures of API backward-compatibility linter (Lint/bc_linter)), topic: not user facing (topic category)
