Skip to content

[Reland] Include AT_PARALLEL_OPENMP/AT_PARALLEL_NATIVE/AT_PARALLEL_NATIVE_TBB to ATen/Config.h#39881

Closed
pbelevich wants to merge 2 commits intogh/pbelevich/132/basefrom
gh/pbelevich/132/head
Closed

[Reland] Include AT_PARALLEL_OPENMP/AT_PARALLEL_NATIVE/AT_PARALLEL_NATIVE_TBB to ATen/Config.h#39881
pbelevich wants to merge 2 commits intogh/pbelevich/132/basefrom
gh/pbelevich/132/head

Conversation

@pbelevich
Copy link
Contributor

@pbelevich pbelevich commented Jun 11, 2020

Fixes #39471

Reland of #39612

@zou3519 :

So the conclusion is: your PR may have just changed the compilation speeds of everything so that the clobbering (see Pytorch OSS CI thread) is noticeable

Should be fixed by #39863

Stack from ghstack:

Differential Revision: D22008317

…TIVE_TBB to ATen/Config.h

[ghstack-poisoned]
@dr-ci
Copy link

dr-ci bot commented Jun 11, 2020

💊 CI failures summary and remediations

As of commit d06178e (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 9 times.

…PARALLEL_NATIVE_TBB to ATen/Config.h"

Differential Revision: [D22008317](https://our.internmc.facebook.com/intern/diff/D22008317)

[ghstack-poisoned]
pbelevich added a commit that referenced this pull request Jun 12, 2020
…TIVE_TBB to ATen/Config.h

ghstack-source-id: 52e279b
Pull Request resolved: #39881
@pbelevich pbelevich requested review from ezyang, malfet and zou3519 June 12, 2020 14:00
@ezyang
Copy link
Contributor

ezyang commented Jun 12, 2020

I didn't follow the revert; what broke, and how does this fix it?

@pbelevich
Copy link
Contributor Author

@ezyang yesterday morning conversation about docker images between you and @zou3519:

@zou3519 :

So the conclusion is: your PR may have just changed the compilation speeds of everything so that the clobbering (see Pytorch OSS CI thread) is noticeable

Should be fixed by #39863

@facebook-github-bot
Copy link
Contributor

@pbelevich merged this pull request in e62d655.

@facebook-github-bot
Copy link
Contributor

@pbelevich merged this pull request in e62d655.

@pbelevich
Copy link
Contributor Author

@ezyang
pytorch extensions require including torch/extension.h which includes all.h which includes utils.h which includes Parallel.h.
So you can call for instance parallel_for() from your extension, but it will fail because of Parallel.h has only declarations and definitions are conditionally included Parallel.h(see last lines of Parallel.h).
I tried to solve it #39612 and #39881 by including Config.h into Parallel.h
But if Pytorch is built with TBB it provides Config.h that has AT_PARALLEL_NATIVE_TBB=1(see #3961 or #39881) and it means that if you include
torch/extension.h which transitively includes Parallel.h which transitively includes tbb.h which is not available! Now the question is: should we provide TBB(and other libs) headers if PyTorch was built with it? Or exclude Parallel.h as a transitive dependency of torch/extension.h ?

@ezyang
Copy link
Contributor

ezyang commented Jun 15, 2020

@ilia-cher may know better, but assuming you were right in #39471 and if you compile PyTorch with TBB you MUST use TBB implementation of Parallel.h, then I would certainly expect that tbb.h should be available in one way or another if you want to use the Parallel.h functionality.

Now, IIRC, we don't ever distribute binaries that are linked against tbb, so if I understand correctly, your problems here are just because of the TBB CI job. In a perfect universe, we'd have a proper distribution story for TBB, but this would require you to solve a lot more problems (because we'd need to say where tbb is actually coming from; are we relying on a third-party conda package, or distributing our own copy ourselves? What happens with mkl?) So it doesn't seem like the right call is to try to fix the distribution problem for TBB, unless you want to solve this problem for its own sake.

But it also seems likely that you will run into problems with removing Parallel.h from the dependency chain of torch/extension.h, as this would technically be a BC-breaking change, making functionality that was previously implicitly available no longer available. And it's not unreasonable to think that third party C++ extensions will be using the functionality of Parallel.h, since it provides functionality that would be generally useful if you were writing a kernel.

So, if I were in this situation, I'd probably look for some surgical fix for the relevant CI configuration, rather than a general fix. But that is just an opinion.

@ilia-cher
Copy link
Contributor

ilia-cher commented Jun 15, 2020

"if you compile PyTorch with TBB you MUST use TBB implementation of Parallel.h, then I would certainly expect that tbb.h should be available in one way or another if you want to use the Parallel.h functionality."

That's correct, if you set TBB option then we have to use TBB version of Parallel.h and we do use TBB submodule, is this different now?
And yes this option exists for users who want/can build PyTorch themselves.

Are we still running TBB build in CI (master)? @pbelevich @ezyang

@pbelevich
Copy link
Contributor Author

pbelevich commented Jun 15, 2020

Are we still running TBB build in CI (master)? @pbelevich @ezyang

yes, here is the example of tbb/tbb.h: No such file or directory when I tried to land #39881

@ezyang this is my "surgical fix"

pbelevich added a commit that referenced this pull request Jun 15, 2020
…L_NATIVE/AT_PARALLEL_NATIVE_TBB to ATen/Config.h"

Fixes #39471

Reland of #39612 and #39881


Differential Revision: [D22050656](https://our.internmc.facebook.com/intern/diff/D22050656)

[ghstack-poisoned]
pbelevich added a commit that referenced this pull request Jun 16, 2020
…L_NATIVE/AT_PARALLEL_NATIVE_TBB to ATen/Config.h"

Fixes #39471

Reland of #39612 and #39881


Differential Revision: [D22050656](https://our.internmc.facebook.com/intern/diff/D22050656)

[ghstack-poisoned]
facebook-github-bot pushed a commit that referenced this pull request Jun 16, 2020
…_PARALLEL_NATIVE_TBB to ATen/Config.h (#40045)

Summary:
Pull Request resolved: #40045

Fixes #39471

Reland of #39612 and #39881

Test Plan: Imported from OSS

Differential Revision: D22050656

Pulled By: pbelevich

fbshipit-source-id: 274bc0733c37ef6c87deb3344bb49ca9107e257b
@facebook-github-bot facebook-github-bot deleted the gh/pbelevich/132/head branch June 16, 2020 15:19
@ezyang
Copy link
Contributor

ezyang commented Jun 16, 2020

That's correct, if you set TBB option then we have to use TBB version of Parallel.h and we do use TBB submodule, is this different now?
And yes this option exists for users who want/can build PyTorch themselves.

Essentially, what I am saying is that the commits that added TBB to PyTorch were incomplete: a complete set of diffs also needs to arrange for the correct TBB headers to be made available to C++ extensions. Since we don't seem to actually distribute official TBB binaries for anyone, it seems reasonable enough to skip doing this today, but if you ever want to actually have people use TBB for real, this is one of the things that still needs to get fixed.

pbelevich added a commit that referenced this pull request Jun 16, 2020
…AT_PARALLEL_NATIVE_TBB to ATen/Config.h"

Fixes #39471

Reland of #39612 #39881 and #40045


Differential Revision: [D22076711](https://our.internmc.facebook.com/intern/diff/D22076711)

[ghstack-poisoned]
pbelevich added a commit that referenced this pull request Jun 16, 2020
…AT_PARALLEL_NATIVE_TBB to ATen/Config.h"

Fixes #39471

Reland of #39612 #39881 and #40045


Differential Revision: [D22076711](https://our.internmc.facebook.com/intern/diff/D22076711)

[ghstack-poisoned]
pbelevich added a commit that referenced this pull request Jun 16, 2020
…AT_PARALLEL_NATIVE_TBB to ATen/Config.h"

Fixes #39471

Reland of #39612 #39881 and #40045


Differential Revision: [D22076711](https://our.internmc.facebook.com/intern/diff/D22076711)

[ghstack-poisoned]
pbelevich added a commit that referenced this pull request Jun 16, 2020
…AT_PARALLEL_NATIVE_TBB to ATen/Config.h"

Fixes #39471

Reland of #39612 #39881 and #40045


Differential Revision: [D22076711](https://our.internmc.facebook.com/intern/diff/D22076711)

[ghstack-poisoned]
@ilia-cher
Copy link
Contributor

@ezyang sure we can add headers, though regarding your last point, I'd want to note that people (companies) actually do use TBB builds of pytorch (I can provide more detail)

@ezyang
Copy link
Contributor

ezyang commented Jun 18, 2020

Yeah, in this case I'm specifically talking about our OSS distributions, e.g., the wheels and conda packages.

facebook-github-bot pushed a commit that referenced this pull request Jun 19, 2020
…_NATIVE_TBB to ATen/Config.h (#40211)

Summary:
Fixes #39471

Reland of #39612 #39881 #40045 #40122

Proof: [green TBB test](https://app.circleci.com/pipelines/github/pytorch/pytorch/182769/workflows/ae9f4f7a-791a-49df-9625-e2f0a51e70e7/jobs/5910591/steps)
Pull Request resolved: #40211

Reviewed By: malfet

Differential Revision: D22128537

Pulled By: pbelevich

fbshipit-source-id: 98c589405daafc2c81f76e1d5c1aef5e57065351
xwang233 pushed a commit to xwang233/pytorch that referenced this pull request Jun 20, 2020
…TIVE_TBB to ATen/Config.h (pytorch#39881)

Summary: Pull Request resolved: pytorch#39881

Test Plan: Imported from OSS

Differential Revision: D22008317

Pulled By: pbelevich

fbshipit-source-id: b25714d8643cf584bb3331d70e44f4df06c1b615
xwang233 pushed a commit to xwang233/pytorch that referenced this pull request Jun 20, 2020
…_PARALLEL_NATIVE_TBB to ATen/Config.h (pytorch#40045)

Summary:
Pull Request resolved: pytorch#40045

Fixes pytorch#39471

Reland of pytorch#39612 and pytorch#39881

Test Plan: Imported from OSS

Differential Revision: D22050656

Pulled By: pbelevich

fbshipit-source-id: 274bc0733c37ef6c87deb3344bb49ca9107e257b
xwang233 pushed a commit to xwang233/pytorch that referenced this pull request Jun 20, 2020
…ARALLEL_NATIVE_TBB to ATen/Config.h (pytorch#40211)

Summary:
Fixes pytorch#39471

Reland of pytorch#39612 pytorch#39881 pytorch#40045 pytorch#40122

Proof: [green TBB test](https://app.circleci.com/pipelines/github/pytorch/pytorch/182769/workflows/ae9f4f7a-791a-49df-9625-e2f0a51e70e7/jobs/5910591/steps)
Pull Request resolved: pytorch#40211

Reviewed By: malfet

Differential Revision: D22128537

Pulled By: pbelevich

fbshipit-source-id: 98c589405daafc2c81f76e1d5c1aef5e57065351
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants