Default ThreadPool size to number of physical cores #125963
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125963
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures
As of commit be2399a with merge base bbe68a1 (NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@janeyx99 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Will this work properly for partial CPU allocations on SLURM clusters?
This might be a no-op concern, but I was also curious how a change like this affects the performance of distributed jobs.
I can only hope for it :)
I'm just worried because there might be, say, 16 logical cores with only 4 available to the job, meaning that the cores become over-subscribed.
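For context, here is a quick way to see that mismatch on Linux (a sketch, not part of this PR): `os.cpu_count()` reports every logical processor on the machine, while `os.sched_getaffinity(0)` reflects the affinity/cgroup mask that a partial allocation (e.g. from SLURM) actually grants the process:

```python
import os

# Total logical processors visible on the machine (ignores cgroups/affinity).
total_logical = os.cpu_count()

# CPUs this process is actually allowed to run on (Linux-only call); a SLURM
# allocation of 4 cores on a 16-core node would report 4 here.
allowed = len(os.sched_getaffinity(0))

print(f"logical: {total_logical}, allowed by affinity mask: {allowed}")

# Sizing a thread pool from total_logical over-subscribes the job
# whenever allowed < total_logical.
```

A thread-pool default derived only from the machine-wide count cannot see the allocation, which is the over-subscription scenario described above.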
@Skylion007 that is the exact type of problem we're attempting to fix: "processors" usually means hardware threads, while "cores" means actual physical cores (which is what we intend). As written, @malfet's PR is strictly an improvement on that front compared to the previous behavior.
LGTM - tested on both a hyperthreaded and a non-hyperthreaded multicore machine. It gives the correct thread count now.
Hi @gajjanag, can you please clarify if you meant you were getting incorrect counts earlier? Thanks!
Yes, I was getting incorrect counts before this patch (e.g. a 2-socket Intel machine with 56 cores per socket was giving a thread count of 224, but it now gives the correct 112).
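The 224-vs-112 numbers above can be reproduced without any library: on Linux, counting the unique (physical id, core id) pairs in `/proc/cpuinfo` yields physical cores, while `os.cpu_count()` yields logical processors. This is only an illustrative sketch (the `physical_core_count` helper is hypothetical, not part of PyTorch), and the `/proc/cpuinfo` fields are Linux/x86-specific:

```python
import os

def physical_core_count():
    """Count unique (physical id, core id) pairs in /proc/cpuinfo.

    Linux-only sketch. With HyperThreading/SMT enabled this is typically
    half of os.cpu_count(), which reports logical processors (e.g. 112
    physical cores vs 224 logical on a 2-socket, 56-core-per-socket box).
    """
    cores = set()
    physical_id = None
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("physical id"):
                physical_id = line.split(":")[1].strip()
            elif line.startswith("core id"):
                cores.add((physical_id, line.split(":")[1].strip()))
    # Fall back to the logical count if the fields are absent
    # (e.g. some VMs and non-x86 platforms omit them).
    return len(cores) or os.cpu_count()

print(physical_core_count(), "physical vs", os.cpu_count(), "logical")
```

This mirrors what a correct physical-core query should report; the pre-patch behavior corresponded to using the logical count.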
Thanks for confirming, @gajjanag! That hasn't been my experience with HyperThreading enabled (without this patch) -
We need to chat about this. Perhaps cpuinfo does not work correctly on your system, but it should have returned the number of logical cores rather than physical ones.
@pytorchbot rebase |
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Successfully rebased 230082a to be2399a.
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA: 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge failed. Reason: 2 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
@pytorchbot merge -i |
Merge started. Your change will be merged while ignoring the following 3 checks: pull / linux-focal-cuda12.4-py3.10-gcc9 / build, pull / linux-focal-cuda12.4-py3.10-gcc9-sm86 / build, Meta Internal-Only Changes Check. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
TODO: Some benchmarks
Pull Request resolved: pytorch#125963
Approved by: https://github.com/janeyx99, https://github.com/Skylion007, https://github.com/gajjanag, https://github.com/jgong5

