Skip to content

cuda4dnn: catch common build mistakes early and better diagnostics#17788

Merged
opencv-pushbot merged 2 commits intoopencv:masterfrom
YashasSamaga:cuda4dnn-nice-build
Jul 15, 2020
Merged

cuda4dnn: catch common build mistakes early and better diagnostics#17788
opencv-pushbot merged 2 commits intoopencv:masterfrom
YashasSamaga:cuda4dnn-nice-build

Conversation

@YashasSamaga
Copy link
Copy Markdown
Contributor

@YashasSamaga YashasSamaga commented Jul 9, 2020

  • throws errors if the user marks OPENCV_DNN_CUDA but cuDNN, cuBLAS or CUDA dependencies haven't been resolved
  • check if device supports fp16 before using DNN_TARGET_CUDA_FP16
  • check if device is compatible before use (device cc applicable from the list in CUDA_ARCH_BIN or CUDA_ARCH_PTX )
  • check if a valid device is selected before init
  • reduce the chances of future bugs by setting streams in initCUDABackend (see cuda4dnn(concat): fix stream not being set for concat wrappers after fusion  #17359)
  • check CUDART version with the version used in build
  • check cuDNN version with the version used in build
  • ensure that cuDNN's CUDART version matches with the CUDART version being used

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under OpenCV (BSD) License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
allow_multiple_commits=1
force_builders=Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu-cuda:18.04

@YashasSamaga
Copy link
Copy Markdown
Contributor Author

YashasSamaga commented Jul 9, 2020

Would it be possible to store the CUDA version and cuDNN version somewhere so that this can be verified at runtime? This mismatch often causes runtime errors.

cvconfig.h stores HAVE_CUDA and HAVE_CUDNN but it doesn't store the versions.

@alalek
Copy link
Copy Markdown
Member

alalek commented Jul 9, 2020

I believe this change is fine (errors can be caused by non-default variable value).

@YashasSamaga YashasSamaga force-pushed the cuda4dnn-nice-build branch from 7422ba5 to ee7dcef Compare July 11, 2020 17:51
@YashasSamaga YashasSamaga force-pushed the cuda4dnn-nice-build branch from eece025 to 1949056 Compare July 13, 2020 16:05
@YashasSamaga YashasSamaga marked this pull request as ready for review July 13, 2020 16:18
@asmorkalov asmorkalov self-assigned this Jul 15, 2020
@asmorkalov asmorkalov self-requested a review July 15, 2020 12:02
Copy link
Copy Markdown
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@asmorkalov
Copy link
Copy Markdown
Contributor

I tested the patch with Ubuntu 18.04 and CUDA 10.2. There is some other CuBLAS and NPP search issue in CMake, but it's not relevant to DNN.

@opencv-pushbot opencv-pushbot merged commit cd0f038 into opencv:master Jul 15, 2020

auto d2h_stream = cuda4dnn::csl::Stream(true); // stream for background D2H data transfers
cudaInfo = std::unique_ptr<CudaInfo_t>(new CudaInfo_t(std::move(context), std::move(d2h_stream)));
cuda4dnn::checkVersions();
Copy link
Copy Markdown
Contributor Author

@YashasSamaga YashasSamaga Jul 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Massive facepalm. This is a very embarrassing mistake. If a version mismatch occurs, exceptions will be thrown even before it reaches this line. This makes this version check mechanism completely useless.

It's not a bug that breaks things. It just makes this version check useless.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move the check to proper place with another PR then?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am planning to do that. This time I am trying to test by making lots of containers with broken builds. So this will take some time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants