YOLOv8 bounding box dimensions are 0 when running detection on an ONNX model with OpenCV CUDA build #23977

### System Information

**Training (python)**
python 3.8
ultralytics 8.0.128
opencv-python 4.8.0.74
torch 1.8.0+cu111
onnx 1.14.0
onnxsim 0.4.33

**Inference (c++)**
opencv 4.8.0
cuda 11.2.2
cudnn 8.1.1.33

### Detailed description

I'm loading a simple YOLOv8 model exported to ONNX for object detection. I have tested and confirmed that both the model and the code work correctly when OpenCV is built without CUDA. When running inference with a CUDA build, however, the resulting bounding box coordinates and sizes are always 0, while the scores are correct. I'm not getting any errors either.
Note that I've used the same CUDA build to run many other models successfully (YOLOv5 and YOLOv7 included). Perhaps the issue is an incompatibility between YOLOv8 and these CUDA or cuDNN versions, which I can't easily upgrade for a number of reasons.
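
One way to narrow this down might be to run the same blob through a CPU build and the CUDA build and compare the raw outputs channel by channel. The sketch below is only an illustration: `maxAbsDiffPerChannel` is a hypothetical helper, not an OpenCV function, and it assumes both outputs have already been copied into flat `float` vectors laid out as `[channels x anchors]` (for example 84 x 8400 for an 80-class model at 640x640 input).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest absolute difference between two network
// outputs, reported once per channel. Both buffers are assumed to hold a
// row-major [channels x anchors] tensor. Returns an empty vector if the
// buffer sizes do not match the given shape.
std::vector<float> maxAbsDiffPerChannel(const std::vector<float>& cpuOut,
                                        const std::vector<float>& cudaOut,
                                        std::size_t channels,
                                        std::size_t anchors)
{
    if (cpuOut.size() != channels * anchors || cudaOut.size() != cpuOut.size())
        return {};

    std::vector<float> diff(channels, 0.0f);
    for (std::size_t c = 0; c < channels; ++c)
    {
        for (std::size_t n = 0; n < anchors; ++n)
        {
            const std::size_t i = c * anchors + n;
            diff[c] = std::max(diff[c], std::fabs(cpuOut[i] - cudaOut[i]));
        }
    }
    return diff;
}
```

If the behaviour described above holds, I would expect channels 0 to 3 (cx, cy, w, h) to show large differences while the class score channels stay close to zero.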

### Steps to reproduce

The model I tested was created as follows:
```python
from ultralytics import YOLO
model = YOLO('yolov8n.pt')
model.train(data='coco128.yaml', epochs=3, device=0, workers=1)
model.export(format='onnx', simplify=True, opset=12)
```
A simplified version of the inference code:
```cpp
// setup
cv::Mat frame; // net input image
cv::Size frameShape; // net input shape
std::string modelFile; // exported onnx model

cv::dnn::Net net = cv::dnn::readNet(modelFile);
net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
cv::dnn::Net* net_p = new cv::dnn::Net(net);

// inference
cv::Mat blob = cv::dnn::blobFromImage(frame, 1. / 255., frameShape, cv::Scalar(), true, false, CV_32F);
net_p->setInput(blob);
std::vector<cv::Mat> prob;
net_p->forward(prob, net_p->getUnconnectedOutLayersNames());

int prob_rows = prob[0].size[2];
int prob_cols = prob[0].size[1];
prob[0] = prob[0].reshape(1, prob_cols);
cv::transpose(prob[0], prob[0]);

float ratioh = (float)frame.rows / frameShape.height;
float ratiow = (float)frame.cols / frameShape.width;
float* data_p = (float*)prob[0].data;

std::vector<float> confidences;
std::vector<cv::Rect> boxes;
std::vector<int> classIds;
float confThreshold = 0.25f;
float nmsThreshold = 0.45f;

for (int n = 0; n < prob_rows; n++)
{
	cv::Mat scores = prob[0].row(n).colRange(4, prob_cols); // the scores are ok
	cv::Point classIdPoint;
	double maxClassScore;
	cv::minMaxLoc(scores, 0, &maxClassScore, 0, &classIdPoint);
	if (maxClassScore >= confThreshold)
	{
		// the issue is observed here with a cuda build
		float cx = data_p[0] * ratiow; // data_p[0] = 0
		float cy = data_p[1] * ratioh; // data_p[1] = 0
		float w = data_p[2] * ratiow; // data_p[2] = 0
		float h = data_p[3] * ratioh; // data_p[3] = 0

		int left = int(cx - 0.5f * w);
		int top = int(cy - 0.5f * h);

		confidences.push_back((float)maxClassScore);
		boxes.push_back(cv::Rect(left, top, (int)(w), (int)(h)));
		classIds.push_back(classIdPoint.x);
	}
	data_p += prob_cols;
}

std::vector<int> indices;
cv::dnn::NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
// visualize detection boxes here

// cleanup
delete net_p;
```
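
To rule out the post-processing itself, the box arithmetic from the loop above can be checked without OpenCV. The following is a minimal sketch, not the code I'm actually running: `Box` and `decodeBox` are hypothetical names, and the raw output is assumed to be a flat `[channels x anchors]` buffer whose first four channels are cx, cy, w and h. It indexes that buffer directly, so no reshape or transpose is involved.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical plain struct standing in for cv::Rect.
struct Box
{
    int left;
    int top;
    int width;
    int height;
};

// Decode candidate `n` from a channel-major [channels x anchors] buffer.
// Channel 0..3 are assumed to be cx, cy, w, h in network input pixels;
// ratiow / ratioh rescale them to the original frame size.
Box decodeBox(const std::vector<float>& out,
              std::size_t anchors,
              std::size_t n,
              float ratiow,
              float ratioh)
{
    const float cx = out[0 * anchors + n] * ratiow;
    const float cy = out[1 * anchors + n] * ratioh;
    const float w  = out[2 * anchors + n] * ratiow;
    const float h  = out[3 * anchors + n] * ratioh;

    // Convert centre/size to top-left/size, truncating as in the loop above.
    return Box{ int(cx - 0.5f * w), int(cy - 0.5f * h), int(w), int(h) };
}
```

Feeding this the output of a CPU build and of the CUDA build for the same frame should make it clear whether the zeros are already present in the raw tensor or are introduced later.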

### Issue submission checklist

- [X] I report the issue, it's not a question
- [X] I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
- [X] I updated to the latest OpenCV version and the issue is still there
- [X] There is reproducer code and related data files (videos, images, onnx, etc)
