Lazily initialise thread local num_threads value#37461
Lazily initialise thread local num_threads value#37461peterbell10 wants to merge 5 commits intopytorch:masterfrom
Conversation
|
generally LG, here we make sure that if thread uses parallel api then init_num_threads is called but I thought the original issue was that a user created thread reverted openmp settings back to the default, if user code does not run the code from ATen/Parallel* then it will still be using default OpenMP settings could we also then add a pybind binding for init_num_threads, so that users can at least explicitly call it? |
|
Okay, added Based on the existence of |
💊 CI failures summary and remediationsAs of commit 601d192 (more details on the Dr. CI page):
ci.pytorch.org: 1 failedThis comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions on the GitHub issue tracker. This comment has been revised 21 times. |
1d62fc7 to
49fba19
Compare
|
you can probably push the button ready for review to publish the PR |
torch/csrc/Module.cpp
Outdated
There was a problem hiding this comment.
I think we have a better way to create a binding, and this one is deprecated, we moved to pybind, example of how to add a binding in one line of code: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/init.cpp#L57
There was a problem hiding this comment.
When I used pybind11, add_docstr failed because the function already had an (empty) docstring. Obviously I can just define the docstring in Module.cpp but then the doc strings aren't all in one place. What do you think?
There was a problem hiding this comment.
Moved to pybind, can you take another look @ilia-cher?
There was a problem hiding this comment.
oh, pragma omp creeped back into the codebase, we'll need to change this to parallel_for, but we don't have to do this in this PR
aten/src/ATen/Parallel.h
Outdated
There was a problem hiding this comment.
I'm a bit concerned about multiple compilation units including this function in the header, could we move the implementation into the .cpp ? e.g. ATen/ParallelCommon.cpp
There was a problem hiding this comment.
My intention is that the cheap boolean check will be inlined into the caller and in the overwhelmingly likely case, there is no function call overhead.
There was a problem hiding this comment.
Also, it's marked inline so duplicates in multiple compilation units will be deduplicated at link time.
|
overall LG, thanks! left some minor comments |
49fba19 to
c520098
Compare
c520098 to
601d192
Compare
|
will check today, thanks! |
facebook-github-bot
left a comment
There was a problem hiding this comment.
@ilia-cher has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
|
||
| namespace internal { | ||
|
|
||
| // Initialise num_threads lazily at first parallel call |
There was a problem hiding this comment.
It would be good to have more explicit instructions about when to use this. AFAICT, you have to call this before every omp pragma?
There was a problem hiding this comment.
unfortunately there are are some new pragmas that got into the codebase, I specifically killed all of them earlier and used at::parallel_for, but I'll follow up with a diff to clean them up
In general, we do call at::init_num_threads in places we control (i.e. except explicit user created threads); this PR is more a safe guard to make sure the original settings are respected if user creates a new thread themselves and user uses at::paralllel_for
So, technically after this PR the only case we don't cover is when user explicitly creates their own thread and user doesn't use at::parallel_for (and e.g. uses OpenMP directly), but in this case we can't really do much anyways
facebook-github-bot
left a comment
There was a problem hiding this comment.
@ilia-cher has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
@ilia-cher merged this pull request in 5137827. |
Fixes #37259, fixes #20156
This lazily calls
at::init_num_threadsonce for each thread by adding a call tolazy_init_num_threadsinat::parallel_forandat::parallel_reduce.If this solution is okay, then we should add the same to guard other places that might use MKL or OpenMP.