MatMul layer crashes with CUDA backend

### System Information

OpenCV version: 5.x
Operating System / Platform: Ubuntu 20.04
Compiler & compiler version: GCC 9.4.0

### Detailed description

When the input to MatMul has shape `[batch_size, 1, input_dim]`, the weight matrix has shape `[input_dim, hidden_dim]`, and `batch_size` is large (128 in this case), inference fails with the error below. If `batch_size` is swapped with the singleton dimension (1) of the input, so that the input shape is `[1, batch_size, input_dim]`, the multiplication works fine. The number of elements is the same in both cases, so the memory required should in theory be the same, but `malloc` fails in the first case.

I have attached the ONNX model and reproducer below. 

```
[ INFO:0@0.015] global onnx_importer.cpp:821 populateNet DNN/ONNX: loading ONNX v9 model produced by ''. Number of nodes = 2, initializers = 0, inputs = 1, outputs = 1
[ INFO:0@0.015] global onnx_importer.cpp:714 parseOperatorSet DNN/ONNX: ONNX opset version = 19
[ INFO:0@0.016] global onnx_importer.cpp:992 handleNode DNN/ONNX: processing node with 0 inputs and 1 outputs: [Constant]:(onnx_node!n0) from domain='ai.onnx'
[ INFO:0@0.019] global onnx_importer.cpp:992 handleNode DNN/ONNX: processing node with 2 inputs and 1 outputs: [MatMul]:(onnx_node!n1) from domain='ai.onnx'
layer name: return_val
preferableBackend is CUDA
[ INFO:0@0.164] global op_cuda.cpp:81 initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "_input" of type __NetInputLayer__
malloc(): corrupted top size
[1]    1574745 abort (core dumped) 
```
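For reference, the two input layouts described above can be compared on the CPU with plain NumPy. This is only a sketch of the expected shapes, not of the OpenCV CUDA code path; `hidden_dim = 16` is an assumed placeholder (the actual value is stored in the attached model), and `matmul_output_shapes` is a hypothetical helper name.

```python
import numpy as np


def matmul_output_shapes(batch_size, input_dim, hidden_dim):
    """Return the MatMul output shapes for the failing and working layouts."""
    weight = np.ones((input_dim, hidden_dim), dtype=np.float32)

    # Layout that crashes with the CUDA backend: [batch_size, 1, input_dim]
    failing = np.ones((batch_size, 1, input_dim), dtype=np.float32)
    # Layout that works: [1, batch_size, input_dim]
    working = np.ones((1, batch_size, input_dim), dtype=np.float32)

    # Both layouts hold the same number of elements.
    assert failing.size == working.size

    return np.matmul(failing, weight).shape, np.matmul(working, weight).shape


if __name__ == "__main__":
    # hidden_dim=16 is an assumption made for illustration only.
    print(matmul_output_shapes(batch_size=128, input_dim=384, hidden_dim=16))
```

Both products are well defined and produce the same number of output elements, so the difference in behavior is presumably specific to the CUDA implementation rather than to the operation itself.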

### Steps to reproduce

```python
import cv2 as cv
import numpy as np

if __name__ == "__main__":
    net = cv.dnn.readNet("./matmul_cuda.onnx")

    net.setPreferableBackend(cv.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv.dnn.DNN_TARGET_CUDA)

    batch_size = 128
    input_size = 384

    # Failing layout: [batch_size, 1, input_dim].
    # With shape (1, batch_size, input_size) the multiplication works.
    inp = np.ones((batch_size, 1, input_size), dtype=np.float32)

    net.setInput(inp)
    out = net.forward()
    print(out.shape)

    # Print the target assigned to each layer.
    for name in net.getLayerNames():
        layer = net.getLayer(name)
        print(layer.preferableTarget)
```
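Until the crash is fixed, one possible workaround is to feed the input in the working layout and swap the axes back afterwards. The sketch below assumes the input always has a singleton second axis; `to_working_layout` and `from_working_layout` are hypothetical helper names, not part of OpenCV, and the approach has only been reasoned about from the behavior described in this report.

```python
import numpy as np


def to_working_layout(inp):
    """Convert [batch_size, 1, input_dim] to [1, batch_size, input_dim]."""
    assert inp.ndim == 3 and inp.shape[1] == 1, "expected [batch, 1, dim]"
    # ascontiguousarray guarantees a C-contiguous buffer for the consumer.
    return np.ascontiguousarray(np.swapaxes(inp, 0, 1))


def from_working_layout(out):
    """Convert [1, batch_size, dim] back to [batch_size, 1, dim]."""
    assert out.ndim == 3 and out.shape[0] == 1, "expected [1, batch, dim]"
    return np.ascontiguousarray(np.swapaxes(out, 0, 1))


if __name__ == "__main__":
    inp = np.ones((128, 1, 384), dtype=np.float32)
    swapped = to_working_layout(inp)
    print(swapped.shape)
    print(from_working_layout(swapped).shape)
```

In the reproducer, `net.setInput(to_working_layout(inp))` followed by `from_working_layout(net.forward())` would then replace the direct calls.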

### Issue submission checklist

- [X] I report the issue, it's not a question
- [X] I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
- [X] I updated to the latest OpenCV version and the issue is still there
- [X] There is reproducer code and related data files (videos, images, onnx, etc)
