Extend DispatchStub to support CUDA dispatch #9664
Closed
colesbury wants to merge 2 commits into pytorch:master
Conversation
Contributor
facebook-github-bot left a comment:
@colesbury has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Force-pushed from f21d099 to a75ff47.
This is a modification of the strategy from pytorch#8919 and pytorch#9579.

Previously, the CPU architecture-specific kernels self-registered with the DispatchStub. When linking as part of a static library, this requires passing the --whole-archive flag to the linker to ensure that the object files for the kernels are included. Caffe2 and TensorFlow use that strategy. We ran into issues with --whole-archive blowing up the binary size of some downstream projects at Facebook.

This PR avoids --whole-archive for CPU kernels. The downside is that the generic code needs to be aware of whether kernels are compiled with AVX and with AVX2 (via HAVE_AVX_CPU_DEFINITION and HAVE_AVX2_CPU_DEFINITION). The CUDA kernels still self-register with DispatchStub, because the CPU library cannot know whether the CUDA library will be available at runtime.

There are a few major changes to DispatchStub:

- The environment variable ATEN_CPU_CAPABILITY overrides the CPU capability detection code (previously ATEN_DISABLE_AVX/AVX2).
- DispatchStub is defined in the generic native code instead of in the CPU_CAPABILITY_DEFAULT kernel.
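To make the mechanism concrete, here is a minimal, self-contained C++ sketch of the dispatch scheme the description outlines. All names below (RegisterCUDADispatch, add_kernel_default, add_stub, the exact member layout) are illustrative assumptions, not the actual ATen code; the real implementation lives in ATen's DispatchStub.h.

```cpp
// dispatch_stub_sketch.cpp -- illustrative sketch only, not the ATen source.
// Build (default only):  g++ -std=c++14 dispatch_stub_sketch.cpp
// Build with AVX2 path:  g++ -std=c++14 -DHAVE_AVX2_CPU_DEFINITION dispatch_stub_sketch.cpp
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <utility>

enum class DeviceType { CPU, CUDA };
enum class CPUCapability { DEFAULT, AVX, AVX2 };

// ATEN_CPU_CAPABILITY overrides detection (previously ATEN_DISABLE_AVX/AVX2).
// The real code falls back to CPUID-based detection; this sketch just
// returns DEFAULT when the variable is unset.
inline CPUCapability get_cpu_capability() {
  if (const char* env = std::getenv("ATEN_CPU_CAPABILITY")) {
    if (std::strcmp(env, "avx2") == 0) return CPUCapability::AVX2;
    if (std::strcmp(env, "avx") == 0)  return CPUCapability::AVX;
    return CPUCapability::DEFAULT;
  }
  return CPUCapability::DEFAULT;
}

template <typename FnPtr>
struct DispatchStub {
  // CPU kernel pointers are referenced directly by the generic code,
  // guarded by compile-time macros, so the linker keeps the kernel
  // object files without --whole-archive.
  FnPtr DEFAULT = nullptr;
#ifdef HAVE_AVX_CPU_DEFINITION
  FnPtr AVX = nullptr;
#endif
#ifdef HAVE_AVX2_CPU_DEFINITION
  FnPtr AVX2 = nullptr;
#endif
  // The CUDA kernel still self-registers at load time, because the CPU
  // library cannot know whether the CUDA library is present at runtime.
  FnPtr cuda_dispatch_ptr = nullptr;

  template <typename... Args>
  auto operator()(DeviceType device, Args&&... args) {
    FnPtr fn = nullptr;
    if (device == DeviceType::CUDA) {
      fn = cuda_dispatch_ptr;
    } else {
      switch (get_cpu_capability()) {
#ifdef HAVE_AVX2_CPU_DEFINITION
        case CPUCapability::AVX2: fn = AVX2; break;
#endif
#ifdef HAVE_AVX_CPU_DEFINITION
        case CPUCapability::AVX: fn = AVX; break;
#endif
        default: break;
      }
      if (fn == nullptr) fn = DEFAULT;  // fall back to the generic kernel
    }
    if (fn == nullptr) throw std::runtime_error("no kernel registered");
    return fn(std::forward<Args>(args)...);
  }
};

// Hypothetical helper a CUDA library would use in a static initializer
// to register its kernel with an existing stub.
template <typename FnPtr>
struct RegisterCUDADispatch {
  RegisterCUDADispatch(DispatchStub<FnPtr>& stub, FnPtr fn) {
    stub.cuda_dispatch_ptr = fn;
  }
};

// Illustrative op: a scalar "add" with only a default CPU kernel.
using BinaryFn = float (*)(float, float);
static float add_kernel_default(float a, float b) { return a + b; }
static DispatchStub<BinaryFn> add_stub;

int main() {
  add_stub.DEFAULT = &add_kernel_default;  // wired up by the generic code
  return add_stub(DeviceType::CPU, 1.0f, 2.0f) == 3.0f ? 0 : 1;
}
```

Note how the AVX/AVX2 members exist only when the corresponding HAVE_*_CPU_DEFINITION macro is set, which is exactly why the generic code must know at compile time which kernels were built, while the CUDA pointer is assigned at load time by whichever CUDA library happens to be linked.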
Force-pushed from a75ff47 to 52152a8.
Collaborator:
@pytorchbot retest this please
Contributor:
Build error looks real:
Contributor
facebook-github-bot left a comment:
@colesbury has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
zdevito pushed a commit to zdevito/ATen that referenced this pull request on Jul 23, 2018:
Pull Request resolved: pytorch/pytorch#9664
Differential Revision: D8943350
Pulled By: colesbury
fbshipit-source-id: 329229b0ee9ff94fc001b960287814bd734096ef
jramseyer pushed a commit to jramseyer/pytorch that referenced this pull request on Jul 30, 2018.
goodlux pushed a commit to goodlux/pytorch that referenced this pull request on Aug 15, 2018.