🔎 Search before asking
🐛 Bug (description)
Inference with an ONNX model on the GPU behaves strangely. Running the model repeatedly on the CPU works fine, but on the GPU the first inference is correct while the second and every subsequent run produce extreme output values, on the order of ±several quadrillion.

The first and second raw model outputs are shown above and below for comparison; the third, fourth, and fifth runs all produce exactly the same absurd results as the second.
🏃♂️ Environment
OS: Windows 11
Environment: conda
python: 3.10.14
paddlepaddle-gpu: 3.0.0rc1
onnxruntime-gpu: 1.21.1
nvidia-cuda-runtime-cu12: 12.3.101
nvidia-cudnn-cu12: 9.1.1.17
🌰 Minimal Reproducible Example
Model loading and inference class:
```python
import os

import numpy as np
import onnxruntime as ort


class OrtBase:
    def __init__(self, onnx_path):
        sess_options = ort.SessionOptions()
        sess_options.intra_op_num_threads = int(os.environ.get("CPU_CORE_NUM", 2))
        sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
        providers = [
            ("CUDAExecutionProvider", {
                "device_id": 0,
                "arena_extend_strategy": "kSameAsRequested",
                "gpu_mem_limit": 5 * 1024 * 1024 * 1024,
                "cudnn_conv_algo_search": "HEURISTIC",
                "do_copy_in_default_stream": True,
            }),
            "CPUExecutionProvider",
        ]
        # Variants already tried, with the same result:
        # providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
        # self.session = ort.InferenceSession(path_or_bytes=onnx_path)
        # self.session = ort.InferenceSession(path_or_bytes=onnx_path, providers=providers)
        # self.session = ort.InferenceSession(path_or_bytes=onnx_path, sess_options=sess_options)
        self.session = ort.InferenceSession(path_or_bytes=onnx_path, sess_options=sess_options, providers=providers)
        self.input_names = [node.name for node in self.session.get_inputs()]
        self.output_names = [node.name for node in self.session.get_outputs()]
        print("input_names: {}".format(self.input_names))
        print("output_names: {}".format(self.output_names))

    async def forward(self, inputs, *args, **kwargs):
        """Run the session on `inputs`.

        Expected preprocessing (done by the caller)::

            image_tensor = image.transpose(2, 0, 1)     # HWC -> CHW
            image_tensor = image_tensor[np.newaxis, :]  # add batch dim

        :param inputs: a single array, or a sequence of arrays matching the model inputs
        :return: list of output arrays
        """
        if len(self.input_names) == 1:
            input_feed = {self.input_names[0]: inputs}
        else:
            assert len(self.input_names) == len(inputs)
            input_feed = dict(zip(self.input_names, inputs))
        return self.session.run(self.output_names, input_feed=input_feed)
```
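One thing worth ruling out on the CUDA provider side: `transpose(2, 0, 1)` in the preprocessing above returns a strided view, not a C-contiguous copy, and forcing inputs contiguous before `session.run` is a cheap defensive step when GPU results go wrong. This is only a hypothesis for this bug, not a confirmed cause; the sketch below (NumPy only, no model needed) shows how to check and fix contiguity:

```python
import numpy as np

# A dummy HWC image, standing in for the real preprocessed input.
image = np.random.rand(224, 224, 3).astype(np.float32)

chw = image.transpose(2, 0, 1)                    # HWC -> CHW: a strided view, not a copy
print(chw.flags["C_CONTIGUOUS"])                  # False

batch = np.ascontiguousarray(chw[np.newaxis, :])  # force a contiguous copy
print(batch.flags["C_CONTIGUOUS"])                # True
print(batch.shape)                                # (1, 3, 224, 224)
```

If the GPU outputs become stable after adding `np.ascontiguousarray`, the problem is in how the input buffer is copied to the device rather than in the model itself.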
Inference:
```python
    async def single_predict(self, img, *args, **kwargs):
        data = {'image': img[0]}
        inputs = transform(data, self.preprocess_op)
        image = np.array(inputs)
        outputs = await self.model.forward(image)
        post_result = self.postprocess_op(outputs[0])
        return post_result
```
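To narrow the problem down, it helps to confirm mechanically that the divergence really starts at the second run on an identical input. The helper below is a hypothetical diagnostic (the name `check_run_stability` and its interface are mine, not part of onnxruntime); it works with any callable that wraps `session.run`:

```python
import numpy as np


def check_run_stability(run_fn, inputs, n_runs=5, atol=1e-4):
    """Call run_fn(inputs) n_runs times with the same input; return the index
    of the first run whose output diverges from run 0, or None if all agree."""
    reference = np.asarray(run_fn(inputs))
    for i in range(1, n_runs):
        out = np.asarray(run_fn(inputs))
        if not np.allclose(reference, out, atol=atol):
            return i
    return None
```

For the session above the call would look roughly like `check_run_stability(lambda x: session.run(output_names, {input_name: x})[0], image)`. If the CPU provider returns `None` while the CUDA provider returns `1`, the model and preprocessing are consistent and the fault lies in the CUDA execution path.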