CUDA: Handle fusion of conv+eltwise in case of multi-output node (i.e. Split)#27326

Merged
asmorkalov merged 1 commit into opencv:4.x from dkurt:handle_multi_output_eltwise_fusion
May 19, 2025

dkurt commented May 17, 2025

Pull Request Readiness Checklist

Enables YOLO11n with the CUDA backend.

resolves #26566

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

dkurt changed the title from "Handle fusion of conv+eltwise in case of multi-output node (i.e. Split)" to "CUDA: Handle fusion of conv+eltwise in case of multi-output node (i.e. Split)" May 17, 2025

dkurt commented May 17, 2025

import numpy as np
import cv2 as cv

# Reference run on the CPU target
net = cv.dnn.readNet("yolo11n.onnx")

inp = np.random.rand(1, 3, 640, 640).astype(np.float32)
net.setInput(inp)
net.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)
ref = net.forward()

# Same model and input on the CUDA backend and target
net = cv.dnn.readNet("yolo11n.onnx")

net.setInput(inp)
net.setPreferableBackend(cv.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv.dnn.DNN_TARGET_CUDA)
out = net.forward()

# Compare shapes and the maximum absolute difference between the two outputs
print("ref shape", ref.shape)
print("out shape", out.shape)
print("diff", np.max(np.abs(ref - out)))

Output:

ref shape (1, 84, 8400)
out shape (1, 84, 8400)
diff 0.0029296875

dkurt added this to the 4.12.0 milestone May 17, 2025
dkurt requested a review from asmorkalov May 17, 2025 11:08
asmorkalov self-assigned this May 19, 2025
asmorkalov merged commit 9d2d927 into opencv:4.x May 19, 2025
103 of 109 checks passed
dkurt deleted the handle_multi_output_eltwise_fusion branch May 19, 2025 07:48
asmorkalov mentioned this pull request May 27, 2025

Development

Successfully merging this pull request may close these issues.

YOLO11 models do not operate with the CUDA and CUDA FP16 targets (OpenCV 4.10.0)
