Concat operator fusion leads to inconsistent inference results between CPU and GPU. #24721
Description
System Information
OpenCV version: 4.8.0 / newest 4.x
Operating System / Platform: Ubuntu 20.04
CUDA: 11.8
Graphics card: NVIDIA GeForce RTX 3090
Detailed description
I found that the issue is caused by this line; when I commented it out, my program produced consistent results.
While finding this workaround, I observed the following. After fusion with the Concat operator, the results of the Mul and Sigmoid operators are stored in one contiguous block of memory. However, in some cases Mul cannot use the CUDA backend; for example, when Mul falls back to the default backend, its result is stored in host memory while the Sigmoid result is stored in device memory. Because Mul and Sigmoid indirectly call setHostDirty or setDeviceDirty after their computations complete, only one of these flags finally takes effect on the contiguous memory, so subsequent operations see either the Mul result or the Sigmoid result, but not both.
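The dirty-flag interaction described above can be sketched in plain Python. This is a toy model, not OpenCV's actual internals — the class and method names here are illustrative: two fused layers share one contiguous output buffer with separate host and device copies, each layer marks only its own copy dirty, and a final one-direction synchronization discards the other layer's result.

```python
import numpy as np

class FusedBuffer:
    """Toy model of a fused output tensor with a host and a device copy.
    Illustrative only; not OpenCV's real memory manager."""
    def __init__(self, size):
        self.host = np.zeros(size)
        self.device = np.zeros(size)
        self.host_dirty = False
        self.device_dirty = False

    def write_host(self, sl, values):
        # e.g. Mul running on the default (CPU) backend
        self.host[sl] = values
        self.host_dirty = True             # analogous to setHostDirty()

    def write_device(self, sl, values):
        # e.g. Sigmoid running on the CUDA backend
        self.device[sl] = values
        self.device_dirty = True           # analogous to setDeviceDirty()

    def sync(self):
        # Only ONE direction wins: whichever dirty flag is honored
        # overwrites the whole contiguous buffer, clobbering the half
        # that was written on the other side.
        if self.device_dirty:
            self.host[:] = self.device     # device -> host drops Mul's data
        elif self.host_dirty:
            self.device[:] = self.host

buf = FusedBuffer(8)
buf.write_host(slice(0, 4), 1.0)     # "Mul" result lands in host memory
buf.write_device(slice(4, 8), 2.0)   # "Sigmoid" result lands in device memory
buf.sync()
print(buf.host)  # Mul's half is gone: [0. 0. 0. 0. 2. 2. 2. 2.]
```

After `sync()`, the half written through the host copy has been silently overwritten with stale zeros, which mirrors why only one of the two fused operators' results survives.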
I think my solution is only temporary. Is there a way to fix it permanently?
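For the reproducer below, one way to confirm that only the Mul half of the concatenated output diverges is to compare the two channel halves separately. This is a hypothetical helper in pure NumPy (`split=4` matches the model's four Mul channels); the synthetic demo data stands in for the real CPU/CUDA outputs:

```python
import numpy as np

def diverging_halves(cpu_out, cuda_out, split=4, atol=1e-4):
    """Compare the Mul half (channels [:split]) and the Sigmoid half
    (channels [split:]) of a (1, 2*split, N) concatenated output."""
    mul_ok = bool(np.allclose(cpu_out[:, :split], cuda_out[:, :split], atol=atol))
    sig_ok = bool(np.allclose(cpu_out[:, split:], cuda_out[:, split:], atol=atol))
    return mul_ok, sig_ok

# Synthetic demo: the Sigmoid half matches, but the Mul half was
# overwritten with stale data on one backend.
cpu = np.zeros((1, 8, 10))
cuda = cpu.copy()
cuda[:, :4] += 1.0   # pretend the Mul result never reached the device
print(diverging_halves(cpu, cuda))  # (False, True)
```

A heavier-handed mitigation, untested here, would be `net.enableFusion(False)` on the `cv2.dnn.Net`, which should keep the Mul and Sigmoid outputs in separate buffers at the cost of losing fusion everywhere.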
Steps to reproduce
Export the ONNX model:

```python
import torch
import torch.nn as nn

class MulMat(nn.Module):
    def __init__(self):
        super(MulMat, self).__init__()
        self.yy = torch.randn(1, 8400)
        self.xx = torch.randn(1, 4, 8400)
        self.xxx = torch.randn(1, 4, 8400)

    def forward(self, x):
        x1 = x + self.xx
        x2 = x + self.xxx
        x3 = x2 * self.yy
        x4 = torch.sigmoid(x1)
        x5 = torch.cat((x3, x4), 1)
        return x5

m = MulMat()
torch.onnx.export(m,
                  (torch.randn(1, 4, 8400)),
                  'mulmat.onnx',
                  export_params=True,
                  opset_version=11,
                  input_names=['input0'],
                  output_names=['output0'],
                  )
```

Run OpenCV with the CPU and CUDA backends:
```python
import cv2.dnn
import numpy as np

def main(onnx_model):
    # Load the ONNX model on the default (CPU) backend
    model: cv2.dnn.Net = cv2.dnn.readNetFromONNX(onnx_model)

    # Load the same model on the CUDA backend
    model_cuda: cv2.dnn.Net = cv2.dnn.readNetFromONNX(onnx_model)
    model_cuda.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    model_cuda.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

    np.random.seed(13)
    blob = np.random.randint(0, 255, (1, 4, 8400), dtype=np.uint8) * 1.0 / 255
    # print(blob.shape)
    model.setInput(blob)
    model_cuda.setInput(blob)

    # Perform inference
    outputs = model.forward()
    outputs_cuda = model_cuda.forward()

    print("The results of CPU")
    print(outputs)
    print("\nThe results of CUDA")
    print(outputs_cuda)

    r = np.allclose(outputs[0], outputs_cuda[0], atol=1e-4)
    if r:
        print("CPU and CUDA results are the same")
    else:
        print("CPU and CUDA results are not the same")

if __name__ == '__main__':
    main('mulmat.onnx')
```

Issue submission checklist
- [x] I report the issue; it's not a question
- [x] I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc. and have not found any solution
- [x] I updated to the latest OpenCV version and the issue is still there
- [x] There is reproducer code and related data files (videos, images, onnx, etc.)
