cuda: update default PTX behaviour when CUDA_ARCH_BIN is unset #24131
Merged
asmorkalov merged 1 commit into opencv:4.x on Sep 14, 2023
cuda: add default PTX when CUDA_ARCH_BIN is missing
…CH_PTX to be passed in isolation
ee154ea to 358e306
@asmorkalov I have updated the PR.
Title changed from "cuda: add default PTX when CUDA_ARCH_BIN is missing" to "cuda: update default PTX behaviour when CUDA_ARCH_BIN is unset"
Currently, when `CUDA_ARCH_BIN` isn't specified, CMake attempts to build .cu files for all the architectures it knows about, generating a list of architectures supported by the current CUDA Toolkit (see #17432). The idea is that if a user doesn't specify the arch then, to be on the safe side, it should generate code for all architectures for maximum compatibility. Additionally, if a user just wants to specify a PTX architecture, they have to pass `CUDA_ARCH_BIN=` (incurring the overhead of calling `ocv_filter_available_architecture` on all known architectures) in addition to `CUDA_ARCH_PTX=<TARGET ARCH>` to CMake to avoid generating binary code for all supported architectures as well.

This PR proposes that:

- If `CUDA_ARCH_BIN` and `CUDA_ARCH_PTX` are not specified, then in addition to binary code for all compute capabilities, PTX code for the highest supported architecture is also generated. This would ensure that the CUDA device code will also run on GPUs newer than the installed CUDA Toolkit, increasing the level of compatibility further.
- If `CUDA_ARCH_PTX` is passed in isolation, only PTX code for the desired CC is generated.

@tomoaki0705
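The two proposed rules can be modelled as a small decision function. The following is only an illustrative Python sketch of the behaviour described above, not the CMake code from this PR: the function name `resolve_cuda_targets`, the `CudaTargets` type, and the `toolkit_archs` parameter (standing in for the list of architectures the installed CUDA Toolkit supports, assumed non-empty) are all hypothetical. The pass-through handling of an explicitly set `CUDA_ARCH_BIN` is likewise an assumption, since the description does not cover that case.

```python
from typing import List, NamedTuple, Optional


class CudaTargets(NamedTuple):
    """Hypothetical result type: architectures for binary and PTX code."""
    arch_bin: List[str]
    arch_ptx: List[str]


def _split(value: str) -> List[str]:
    # Accept space-, comma- or semicolon-separated lists such as "7.5 8.6".
    return value.replace(",", " ").replace(";", " ").split()


def _version_key(arch: str):
    # Compare "10.0" and "8.6" numerically rather than as strings.
    return tuple(int(part) for part in arch.split("."))


def resolve_cuda_targets(
    arch_bin: Optional[str],
    arch_ptx: Optional[str],
    toolkit_archs: List[str],
) -> CudaTargets:
    """Model of the proposed defaults; None means the option was not passed."""
    if arch_bin is None and arch_ptx is None:
        # Rule 1: binary code for every supported architecture, plus PTX
        # for the highest one so newer GPUs can still JIT-compile it.
        highest = max(toolkit_archs, key=_version_key)
        return CudaTargets(list(toolkit_archs), [highest])
    if arch_bin is None:
        # Rule 2: CUDA_ARCH_PTX passed in isolation, so PTX only.
        return CudaTargets([], _split(arch_ptx))
    # Assumption (not stated in the PR description): explicitly passed
    # values are used as given.
    return CudaTargets(_split(arch_bin), _split(arch_ptx or ""))
```

Under this model, calling `resolve_cuda_targets(None, "8.6", archs)` yields no binary targets and a single PTX target, which corresponds to passing only `CUDA_ARCH_PTX=8.6` to CMake without the previously required empty `CUDA_ARCH_BIN=`.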
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.