Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161576
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV — there is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures as of commit d1c4381 with merge base 443452c.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pdesupinski has imported this pull request. If you are a Meta employee, you can view this in D81092716.
Force-pushed: d3d9dfb → 5f42fc3
Force-pushed: 5f42fc3 → 30489bb
Force-pushed: 30489bb → f839eab
Force-pushed: f839eab → d1c4381
@pytorchbot merge
(Initiating merge automatically since Phabricator Diff has merged)

Merge started
Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
# Context

In pytorch#161183, we added NUMA-binding support for `Callable` entrypoints to `elastic_launch`. However, we would raise an exception if the subprocesses were to be spawned in parallel via `ThreadPoolExecutor`, an option configurable via the `TORCH_MP_PARALLEL_START` environment variable (see diff).

The logic here was that `os.sched_setaffinity`, which we used to set CPU affinities, is [per process](https://docs.python.org/3/library/os.html#os.sched_setaffinity), so there could be a race condition during a parallel start:

> Restrict the process with PID pid (or the current process if zero) to a set of CPUs. mask is an iterable of integers representing the set of CPUs to which the process should be restricted.

But on further reading, the Linux docs say [`sched_setaffinity` is per *thread*](https://man7.org/linux/man-pages/man2/sched_setaffinity.2.html). As it turns out, the Python doc's wording is misleading. I [verified that `sched_setaffinity` only affects the calling thread, not the entire calling process.](https://gist.github.com/pdesupinski/7e2de3cbe5bb48d489f257b83ccddf07)

The upshot is that we actually *can* safely use the inheritance trick from pytorch#161183 even with parallel start: each spawned subprocess inherits its affinity from the calling thread, and `os.sched_setaffinity` only affects that thread.

# This PR

Remove the restrictions against parallel start for NUMA binding.

Pull Request resolved: pytorch#161576
Approved by: https://github.com/d4l3k
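The per-thread behavior can be checked with a minimal sketch (not the PR's actual code), assuming a Linux host where `os.sched_setaffinity(0, ...)` targets the calling thread. Several pool workers concurrently pin themselves to distinct CPUs; because the mask is per thread, the concurrent calls do not race and the main thread's mask is left untouched:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pin_and_report(cpu):
    # On Linux, PID 0 here means the *calling thread*, despite the
    # Python docs describing the call as per-process.
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


def demo():
    # CPUs currently available to the main thread.
    cpus = sorted(os.sched_getaffinity(0))
    # Pin each task's thread to one CPU, concurrently -- the scenario
    # that TORCH_MP_PARALLEL_START-style parallel start would create.
    with ThreadPoolExecutor(max_workers=len(cpus)) as ex:
        results = list(ex.map(pin_and_report, cpus))
    # Each task saw exactly the CPU it pinned itself to, and the main
    # thread's mask is unchanged: no cross-thread interference.
    return cpus, results


if __name__ == "__main__":
    print(demo())
```

If `sched_setaffinity` were truly per process, concurrent workers would clobber one another's masks and each result would be whichever set was written last; the per-thread semantics are what make the inheritance trick safe under parallel start.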
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela