
Do not accept FP16 target on old, incompatible Nvidia cards#21462

Closed
JulienMaille wants to merge 1 commit into opencv:4.x from JulienMaille:patch-2

Conversation

@JulienMaille
Contributor

Fix for #21461

@JulienMaille JulienMaille force-pushed the patch-2 branch 2 times, most recently from 29c4661 to 05598d4 Compare January 17, 2022 09:14
@YashasSamaga
Contributor

I think there is a corner case that is missed with this patch (if it's going into 3.4) if OpenCL devices can be changed at runtime (I am not sure if changing devices at runtime is officially supported). See this comment: #21461 (comment)

@JulienMaille JulienMaille changed the base branch from 3.4 to 4.x January 17, 2022 09:17
@JulienMaille JulienMaille changed the title from "Do not report for FP16 compatibility on old Nvidia cards" to "Do not accept FP16 target on old, incompatible Nvidia cards" Jan 17, 2022
@JulienMaille
Contributor Author

JulienMaille commented Jan 17, 2022

@alalek @YashasSamaga I have updated my PR to better suit your approach

@JulienMaille JulienMaille requested a review from alalek January 17, 2022 10:13
```cpp
{
#ifdef HAVE_CUDA
    if (!cuda4dnn::doesDeviceSupportFP16())
        impl->preferableTarget = DNN_TARGET_CUDA;
```
Contributor

@YashasSamaga YashasSamaga Jan 17, 2022


I am unable to understand this fix. This check is already done in initCUDABackend and it also sends an error message. This check here will render that check useless. What is this check doing?

Contributor Author


This will auto-correct the target when setPreferableTarget(TARGET_CUDA_FP16) is called. A lot can happen after initCUDABackend, and there is no guarantee that preferableTarget has not changed by the time Net::forward() is called. Or am I missing something?

Contributor Author


@YashasSamaga Ok, I agree this renders your other check useless (and I can see that I don't get the auto-correct warning message when I add my fix). However, it does fix a corner case: it no longer complains about a missing convolution implementation for half precision.

Contributor


Changing the backend/target resets the network, which forces a reinitialization. I think there is a bug elsewhere, probably along the lines you're hinting at: maybe something does happen before the FP16->FP32 switch in initCUDABackend. This might fix the problem for now, but I think it's better to find the root cause than to hide it.

```cpp
{
#ifdef HAVE_CUDA
    if (!cuda4dnn::doesDeviceSupportFP16())
        impl->preferableTarget = DNN_TARGET_CUDA;
```
Member


@YashasSamaga Are you OK with proposed change?

Contributor Author

@JulienMaille JulienMaille Jan 21, 2022


I don't think he is; my change patches the bug, but it is closer to a workaround than a proper fix.

Contributor

@YashasSamaga YashasSamaga Jan 22, 2022


The issue here is that this will invoke the CUDA API before the diagnostic checks (on the version, compatibility, etc.) happen in initCUDABackend.

I think the bug, as @JulienMaille pointed out in another comment, is that the target for layers is set even before initCUDABackend is called. So changing the target inside initCUDABackend is useless.

One option would be to completely reinitialize the net object on fallback or redo the diagnostic checks in the setPreferableTarget before the FP16 check.

@asmorkalov
Contributor

@JulienMaille Do you have any progress with the patch? Please pay attention to the merge conflicts.

@JulienMaille
Contributor Author

@YashasSamaga told me he was working on a cleaner fix, so I have not revisited this patch recently.

@asmorkalov
Contributor

@YashasSamaga Do you have any update on the alternative solution?

@YashasSamaga
Contributor

I am quite rusty on the internals now, but I have a hypothesis. I think that the targets for the layers are set before initCUDABackend is called, and changing the target in the Net object inside initCUDABackend won't automatically change the target in the layers. Therefore, even though the net target is changed, the initCUDA code, which relies on Layer::preferableTarget, will initialize with the original FP16 target.

I think re-initializing (Net::setPreferableBackend calls clear() on backend change which forces reinit) the network after the switch should fix the problem.

@asmorkalov asmorkalov removed this from the 4.8.0 milestone May 5, 2023
@Jamim
Contributor

Jamim commented Jul 7, 2024

Hello @alalek and @asmorkalov,

Would you mind reviewing an alternative fix for that issue?

Thanks in advance!

@asmorkalov
Contributor

Closed in favor of #25880.

@asmorkalov asmorkalov closed this Jul 9, 2024
@JulienMaille JulienMaille deleted the patch-2 branch July 9, 2024 09:04
asmorkalov pushed a commit that referenced this pull request Jul 10, 2024
Fix CUDA for old GPUs without FP16 support #25880

Fixes #21461

~This is a build-time solution that reflects https://github.com/opencv/opencv/blob/4.10.0/modules/dnn/src/cuda4dnn/init.hpp#L68-L82.~
~We shouldn't add an invalid target while building with `CUDA_ARCH_BIN` < 53.~
~_(please see [this discussion](#25880 (comment)))_~

This is a run-time solution that basically reverts [these lines](d0fe6ad#diff-757c5ab6ddf2f99cdd09f851e3cf17abff203aff4107d908c7ad3d0466f39604L245-R245).

I've debugged these changes, [coupled with other fixes](gentoo/gentoo#37479), on [Gentoo Linux](https://www.gentoo.org/) and [related tests passed](https://github.com/user-attachments/files/16135391/opencv-4.10.0.20240708-224733.log.gz) on my laptop with `GeForce GTX 960M`.

Alternative solution:
  - #21462

_Best regards!_

### Pull Request Readiness Checklist

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] `n/a` There is accuracy test, performance test and test data in opencv_extra repository, if applicable
- [ ] `n/a` The feature is well documented and sample code can be built with the project CMake

Labels

category: dnn, category: gpu/cuda (contrib)
