Lazily initialise thread local num_threads value by peterbell10 · Pull Request #37461 · pytorch/pytorch

peterbell10 · 2020-04-28T21:19:17Z

Fixes #37259, fixes #20156

This lazily calls at::init_num_threads once for each thread by adding a call to lazy_init_num_threads in at::parallel_for and at::parallel_reduce.
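A minimal sketch of the lazy-initialisation idea described above, assuming a `thread_local` flag per thread. The names `sketch::init_num_threads`, `sketch::lazy_init_num_threads` and `sketch::parallel_for` are hypothetical stand-ins chosen to echo the PR. They are not ATen's code, and this `parallel_for` simply runs the body serially. It only shows that the cheap flag check makes initialisation happen once per calling thread, including threads the library did not create.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-ins for the functions discussed in this PR.
// This is NOT ATen's implementation; it only models the lazy-init idea.
namespace sketch {

// Counts how many times the (expensive) per-thread initialisation ran.
std::atomic<int> init_calls{0};

// Per-thread flag: has this thread been initialised yet?
thread_local bool thread_initialised = false;

// Stand-in for at::init_num_threads(): a real version would re-apply the
// configured thread count to OpenMP/MKL for the calling thread.
void init_num_threads() { init_calls.fetch_add(1); }

// Stand-in for the lazy guard: cheap flag check, init at most once per thread.
inline void lazy_init_num_threads() {
  if (!thread_initialised) {
    init_num_threads();
    thread_initialised = true;
  }
}

// Simplified parallel_for: runs f(begin, end) serially after the guard.
template <typename F>
void parallel_for(int64_t begin, int64_t end, const F& f) {
  lazy_init_num_threads();
  if (begin < end) {
    f(begin, end);
  }
}

}  // namespace sketch
```

Calling `sketch::parallel_for` repeatedly on one thread triggers the initialisation once; the first call from a newly created `std::thread` triggers it once more for that thread.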

If this solution is okay, we should add the same guard to other places that might use MKL or OpenMP.

ilia-cher · 2020-04-28T21:26:16Z

generally LG: here we make sure that if a thread uses the parallel API, init_num_threads is called

but I thought the original issue was that a user-created thread reverted to the default OpenMP settings; if user code does not run the code from ATen/Parallel*, it will still be using the default OpenMP settings

could we also then add a pybind binding for init_num_threads, so that users can at least explicitly call it?
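The concern above can be modelled with a toy per-thread setting: a value chosen on one thread is not inherited by a thread created later, which falls back to the default until something re-applies the configured value. Everything in namespace `model` (including the default of 8) is hypothetical and only loosely analogous to OpenMP's per-thread thread-count setting; no OpenMP calls are made.

```cpp
#include <cassert>
#include <thread>

// Toy model of a per-thread runtime setting. Hypothetical names; no OpenMP.
namespace model {

constexpr int kDefaultThreads = 8;                  // hypothetical library default
int configured_threads = kDefaultThreads;           // process-wide user choice
thread_local int active_threads = kDefaultThreads;  // what this thread uses

// Like a set_num_threads call: records the choice, applies it on this thread only.
void set_num_threads(int n) {
  configured_threads = n;
  active_threads = n;
}

// Like an explicit init call: re-applies the configured value on this thread.
void init_num_threads() { active_threads = configured_threads; }

// Returns the value a freshly created thread ends up using.
int value_seen_by_new_thread(bool call_init) {
  int seen = 0;
  std::thread t([&] {
    if (call_init) init_num_threads();
    seen = active_threads;
  });
  t.join();
  return seen;
}

}  // namespace model
```

After `model::set_num_threads(2)`, a new thread still sees 8 unless it calls `model::init_num_threads()` first; an explicit binding like the one requested here plays that role for user-created threads.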

torch/csrc/Module.cpp

peterbell10 · 2020-04-28T22:48:18Z

Okay, added torch.init_num_threads. I think I've also now guarded all OpenMP parallel regions.

Based on the existence of mkl_set_num_threads_local, I'm thinking that mkl_set_num_threads isn't thread-local, so we don't need to add lazy init before calling MKL functions. So, I think that's everything?

dr-ci · 2020-04-28T23:09:19Z

💊 CI failures summary and remediations

As of commit 601d192 (more details on the Dr. CI page):

1/1 failures possibly* introduced in this PR
- 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed

Failed: pr/py3.6-clang7-rocmdeb-ubuntu16.04

This comment was automatically generated by Dr. CI.

ilia-cher · 2020-04-29T19:52:53Z

you can probably press the "Ready for review" button to publish the PR

ilia-cher · 2020-04-29T20:36:50Z

torch/csrc/Module.cpp

I think we have a better way to create a binding; this one is deprecated and we moved to pybind. Here is an example of how to add a binding in one line of code: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/init.cpp#L57

When I used pybind11, add_docstr failed because the function already had an (empty) docstring. Obviously I can just define the docstring in Module.cpp, but then the docstrings aren't all in one place. What do you think?

Moved to pybind, can you take another look @ilia-cher?

ilia-cher · 2020-04-29T20:36:55Z

aten/src/ATen/native/AdaptiveAveragePooling3d.cpp

oh, #pragma omp crept back into the codebase; we'll need to change this to parallel_for, but we don't have to do that in this PR
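For reference, a rough sketch of what replacing a raw `#pragma omp parallel for` loop with a `parallel_for`-style call looks like. `mini_parallel_for` and `double_all` are hypothetical. The former only mirrors the `(begin, end, grain_size, f)` shape of `at::parallel_for` using `std::thread`, and does none of ATen's thread-pool management. Routing loops through one entry point is what lets a single guard such as `lazy_init_num_threads` cover them; a raw pragma bypasses it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical mini parallel_for: splits [begin, end) into chunks of at least
// grain_size and runs f(chunk_begin, chunk_end) on separate std::threads.
template <typename F>
void mini_parallel_for(int64_t begin, int64_t end, int64_t grain_size,
                       const F& f, int64_t max_threads = 4) {
  if (begin >= end) return;
  const int64_t n = end - begin;
  grain_size = std::max<int64_t>(grain_size, 1);
  // Number of chunks is limited by both the grain size and the thread cap.
  const int64_t num_chunks =
      std::min(max_threads, (n + grain_size - 1) / grain_size);
  if (num_chunks <= 1) {
    f(begin, end);  // too little work to be worth spawning threads
    return;
  }
  const int64_t chunk = (n + num_chunks - 1) / num_chunks;
  std::vector<std::thread> workers;
  for (int64_t b = begin; b < end; b += chunk) {
    const int64_t e = std::min(b + chunk, end);
    workers.emplace_back([&f, b, e] { f(b, e); });
  }
  for (auto& w : workers) w.join();
}

// Before (raw OpenMP, bypassing any per-thread init guard):
//   #pragma omp parallel for
//   for (int64_t i = 0; i < n; ++i) out[i] = in[i] * 2;
// After: the loop body receives a [b, e) sub-range instead of a single index.
inline void double_all(const std::vector<float>& in, std::vector<float>& out) {
  out.resize(in.size());
  mini_parallel_for(0, static_cast<int64_t>(in.size()), /*grain_size=*/1024,
                    [&](int64_t b, int64_t e) {
                      for (int64_t i = b; i < e; ++i) out[i] = in[i] * 2.0f;
                    });
}
```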

ilia-cher · 2020-04-29T20:36:58Z

aten/src/ATen/Parallel.h

I'm a bit concerned about multiple compilation units including this function from the header; could we move the implementation into the .cpp? e.g. ATen/ParallelCommon.cpp

My intention is that the cheap boolean check will be inlined into the caller and in the overwhelmingly likely case, there is no function call overhead.

Also, it's marked inline so duplicates in multiple compilation units will be deduplicated at link time.
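A sketch of a middle ground between the two positions, under the assumption that only the flag check needs to live in the header: the `inline` function keeps the per-call cost to a thread-local boolean test, while the expensive initialisation stays out of line in a `.cpp`. The names in `sketch_hdr` are hypothetical, and both "files" are shown in one block so it compiles on its own.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// ---- would live in a header ----
namespace sketch_hdr {

// Defined out of line (in a .cpp); reached at most once per thread.
void init_num_threads_slow_path();

// `inline` lets every translation unit include this definition. Because it has
// external linkage, its local `done` is one variable (one copy per thread)
// shared by all translation units, not one per TU.
inline void lazy_init_num_threads() {
  thread_local bool done = false;
  if (!done) {  // cheap check, inlined into each caller
    init_num_threads_slow_path();
    done = true;
  }
}

}  // namespace sketch_hdr

// ---- would live in a .cpp file ----
namespace sketch_hdr {

std::atomic<int> slow_path_calls{0};  // instrumentation for this sketch only

void init_num_threads_slow_path() {
  // A real implementation would configure OpenMP/MKL for this thread here.
  slow_path_calls.fetch_add(1);
}

}  // namespace sketch_hdr
```

Calling `sketch_hdr::lazy_init_num_threads()` many times on one thread reaches the slow path once; each new thread reaches it once more.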

ilia-cher · 2020-04-29T20:37:56Z

overall LG, thanks! left some minor comments

ilia-cher · 2020-05-07T19:04:04Z

will check today, thanks!

facebook-github-bot

@ilia-cher has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang · 2020-05-08T21:06:39Z

aten/src/ATen/Parallel.h


+namespace internal {
+
+// Initialise num_threads lazily at first parallel call


It would be good to have more explicit instructions about when to use this. AFAICT, you have to call this before every omp pragma?

unfortunately, some new pragmas got into the codebase; I specifically removed all of them earlier in favor of at::parallel_for, but I'll follow up with a diff to clean them up

In general, we do call at::init_num_threads in places we control (i.e. everywhere except explicitly user-created threads); this PR is more of a safeguard to make sure the original settings are respected if the user creates a new thread themselves and then uses at::parallel_for.
So, technically, after this PR the only case we don't cover is when the user explicitly creates their own thread and doesn't use at::parallel_for (e.g. uses OpenMP directly), but in that case we can't really do much anyway.

facebook-github-bot · 2020-05-12T00:16:59Z

@ilia-cher merged this pull request in 5137827.

peterbell10 added the open source label Apr 28, 2020

peterbell10 requested review from ezyang and ilia-cher April 28, 2020 21:24

peterbell10 mentioned this pull request Apr 28, 2020

Bad performance with python threads #37259

Closed

ilia-cher reviewed Apr 28, 2020

torch/csrc/Module.cpp

peterbell10 force-pushed the lazy_init_num_threads branch from 1d62fc7 to 49fba19 April 29, 2020 15:07

peterbell10 marked this pull request as ready for review April 29, 2020 20:09

peterbell10 changed the title from "WIP: Lazily initialise thread local num_threads value" to "Lazily initialise thread local num_threads value" Apr 29, 2020

ilia-cher reviewed Apr 29, 2020

ngimel added the triaged label May 5, 2020

peterbell10 added 4 commits May 7, 2020 17:56

Lazily initialise thread local num_threads value

78e9d37

Add torch.init_num_threads to python API

e4299af

Add lazy_init to remaining openmp calls

5d8e35f

Register init_num_threads in relevent places

f54bc25

peterbell10 force-pushed the lazy_init_num_threads branch from 49fba19 to c520098 May 7, 2020 16:58

Bind init_num_threads with pybind11

601d192

peterbell10 force-pushed the lazy_init_num_threads branch from c520098 to 601d192 May 7, 2020 17:14

ilia-cher approved these changes May 8, 2020

facebook-github-bot reviewed May 8, 2020

ezyang reviewed May 8, 2020

ezyang approved these changes May 8, 2020

facebook-github-bot reviewed May 11, 2020

facebook-github-bot closed this in 5137827 May 11, 2020

facebook-github-bot added the merged label May 12, 2020

mruberry added the Merged label Oct 28, 2020

malfet mentioned this pull request Nov 23, 2022

Remove TORCH_API from inline at::internal::lazy_init_num_thread #89511

Closed

6 participants