Closed
Labels
bug, category: dnn, category: gpu/cuda (contrib) (OpenCV 4.0+: moved to opencv_contrib)
Description
System Information
OpenCV version: 5.x
Operating System / Platform: Ubuntu 20.04
Compiler & compiler version: GCC 9.4.0
Detailed description
When the input to MatMul has shape [batch_size, 1, input_dim], the weight matrix has shape [input_dim, hidden_dim], and batch_size is large (128 in this case), inference fails with the error below. If batch_size is swapped with the singleton dimension (1) of the input, so that the input shape is [1, batch_size, input_dim], the multiplication works fine. The number of elements is the same in both cases, so the memory required should in theory be identical, yet malloc fails in the first case.
I have attached the ONNX model and reproducer below.
[ INFO:0@0.015] global onnx_importer.cpp:821 populateNet DNN/ONNX: loading ONNX v9 model produced by ''. Number of nodes = 2, initializers = 0, inputs = 1, outputs = 1
[ INFO:0@0.015] global onnx_importer.cpp:714 parseOperatorSet DNN/ONNX: ONNX opset version = 19
[ INFO:0@0.016] global onnx_importer.cpp:992 handleNode DNN/ONNX: processing node with 0 inputs and 1 outputs: [Constant]:(onnx_node!n0) from domain='ai.onnx'
[ INFO:0@0.019] global onnx_importer.cpp:992 handleNode DNN/ONNX: processing node with 2 inputs and 1 outputs: [MatMul]:(onnx_node!n1) from domain='ai.onnx'
layer name: return_val
preferableBackend is CUDA
[ INFO:0@0.164] global op_cuda.cpp:81 initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "_input" of type __NetInputLayer__
malloc(): corrupted top size
[1] 1574745 abort (core dumped)
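To make the shape claim in the description concrete, here is a minimal NumPy sketch of the two input layouts. It does not touch OpenCV or the CUDA backend; it only illustrates that both layouts hold the same number of elements and that a reference MatMul yields the same values up to a reshape. The value of `hidden_dim` and the all-ones weight are assumptions made for illustration, since the issue does not state the weights stored in the attached model.

```python
import numpy as np

batch_size, input_dim = 128, 384
hidden_dim = 16  # hypothetical value; the issue does not state hidden_dim

# Stand-in weight matrix of shape [input_dim, hidden_dim] (assumed contents).
weight = np.ones((input_dim, hidden_dim), dtype=np.float32)

# Layout reported to crash on the CUDA backend: [batch_size, 1, input_dim].
failing_layout = np.ones((batch_size, 1, input_dim), dtype=np.float32)
# Layout reported to work: [1, batch_size, input_dim].
working_layout = failing_layout.reshape(1, batch_size, input_dim)

# NumPy broadcasts the 2-D weight over the leading dimension in both cases.
out_failing = np.matmul(failing_layout, weight)
out_working = np.matmul(working_layout, weight)

print(failing_layout.size == working_layout.size)
print(out_failing.shape, out_working.shape)
print(np.array_equal(out_failing.reshape(out_working.shape), out_working))
```

Since the reference results agree, a difference in behaviour between the two layouts would point at how the backend handles the leading dimension rather than at the amount of data involved.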
Steps to reproduce
import cv2 as cv
import numpy as np

if __name__ == "__main__":
    net = cv.dnn.readNet("./matmul_cuda.onnx")
    net.setPreferableBackend(cv.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv.dnn.DNN_TARGET_CUDA)

    batch_size = 128
    input_size = 384
    inp = np.ones((batch_size, 1, input_size), dtype=np.float32)

    net.setInput(inp)
    out = net.forward()
    print(out.shape)

    layerNames = net.getLayerNames()
    for layer in layerNames:
        l = net.getLayer(layer)
        print(l.preferableTarget)
Issue submission checklist
- I report the issue, it's not a question
- I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
- I updated to the latest OpenCV version and the issue is still there
- There is reproducer code and related data files (videos, images, onnx, etc)