
Failed to create TensorRT engine during conversion process for classification model with TensorRT backend #156

@del-zhenwu

Description



Describe the bug

Converting the MMClassification ResNet-18 (CIFAR-10) model with the TensorRT FP16 dynamic-shape config fails during engine creation. The torch2onnx step completes successfully, but onnx2tensorrt aborts with "AssertionError: Failed to create TensorRT engine". The underlying TensorRT error is "[ltWrapper.cpp::setupHeuristic::327] Error Code 2: Internal Error (Assertion cublasStatus == CUBLAS_STATUS_SUCCESS failed.)".

Reproduction

  1. What command or script did you run?
    deploy cmd:
tools/deploy.py configs/mmcls/classification_tensorrt-fp16_dynamic-224x224-224x224.py /tmp/mmclassification//configs/resnet/resnet18_8xb16_cifar10.py /opt/mmdeploy/openmmlab-ci/e2e/mmdeploy/resnet18_b16x8_cifar10_20210528-bd6371c8.pth /tmp/mmclassification/demo/demo.JPEG --work-dir resnet18_b16x8_cifar10_20210528-bd6371c8.pth --show --device cuda:0
test_convert.py::TestConvertors::test_cls_convert[config_cpt0] 
---------------------------------------------------------------------------------------------------------------------------------- live log call ----------------------------------------------------------------------------------------------------------------------------------
2022-02-14 15:35:06 [   DEBUG] Starting new HTTPS connection (1): download.openmmlab.com:443 (connectionpool.py:943)
2022-02-14 15:35:07 [   DEBUG] https://download.openmmlab.com:443 "GET /mmclassification/v0/resnet/resnet18_b16x8_cifar10_20210528-bd6371c8.pth HTTP/1.1" 200 44779833 (connectionpool.py:442)
2022-02-14 15:35:13 [   DEBUG] resnet18_b16x8_cifar10_20210528-bd6371c8.pth (util.py:75)
2022-02-14 15:35:13 [    INFO] resnet18_b16x8_cifar10_20210528-bd6371c8.pth (test_convert.py:47)
2022-02-14 15:35:13 [   DEBUG] tools/deploy.py configs/mmcls/classification_tensorrt-fp16_dynamic-224x224-224x224.py /tmp/mmclassification//configs/resnet/resnet18_8xb16_cifar10.py /opt/mmdeploy/openmmlab-ci/e2e/mmdeploy/resnet18_b16x8_cifar10_20210528-bd6371c8.pth /tmp/mmclassification/demo/demo.JPEG --work-dir resnet18_b16x8_cifar10_20210528-bd6371c8.pth --show --device cuda:0 (util.py:18)
2022-02-14 15:35:13 [    INFO] cup shell execute cd /opt/mmdeploy && python tools/deploy.py configs/mmcls/classification_tensorrt-fp16_dynamic-224x224-224x224.py /tmp/mmclassification//configs/resnet/resnet18_8xb16_cifar10.py /opt/mmdeploy/openmmlab-ci/e2e/mmdeploy/resnet18_b16x8_cifar10_20210528-bd6371c8.pth /tmp/mmclassification/demo/demo.JPEG --work-dir resnet18_b16x8_cifar10_20210528-bd6371c8.pth --show --device cuda:0 with script ['/bin/sh', '/tmp/cup.shell.2f30c837-5702-4168-a1dd-cfeaa72b843fg7tcn_n2'] (oper.py:559)
2022-02-14 15:36:17 [   DEBUG] Command output (returncode 0), stdout:
load checkpoint from local path: /opt/mmdeploy/openmmlab-ci/e2e/mmdeploy/resnet18_b16x8_cifar10_20210528-bd6371c8.pth
[2022-02-14 15:35:34.614] [mmdeploy] [info] Register 'DirectoryModel'
[2022-02-14 15:35:37.119] [mmdeploy] [info] Register 'DirectoryModel'
Traceback (most recent call last):
  File "/opt/mmdeploy/mmdeploy/utils/utils.py", line 36, in target_wrapper
    result = target(*args, **kwargs)
  File "/opt/mmdeploy/mmdeploy/backend/tensorrt/onnx2tensorrt.py", line 72, in onnx2tensorrt
    device_id=device_id)
  File "/opt/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 116, in create_trt_engine
    assert engine is not None, 'Failed to create TensorRT engine'
AssertionError: Failed to create TensorRT engine

stderr:
2022-02-14 15:35:17,173 - mmdeploy - INFO - torch2onnx start.
2022-02-14 15:35:34,390 - mmdeploy - INFO - torch2onnx success.
2022-02-14 15:35:34,646 - mmdeploy - INFO - onnx2tensorrt of resnet18_b16x8_cifar10_20210528-bd6371c8.pth/end2end.onnx start.
2022-02-14 15:35:37,170 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /opt/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +226, GPU +0, now: CPU 288, GPU 481 (MiB)
[TensorRT] WARNING: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] INFO: [MemUsageSnapshot] Builder begin: CPU 372 MiB, GPU 481 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +68, now: CPU 530, GPU 549 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +132, GPU +86, now: CPU 662, GPU 635 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 972, GPU 743 (MiB)
[TensorRT] ERROR: 2: [ltWrapper.cpp::setupHeuristic::327] Error Code 2: Internal Error (Assertion cublasStatus == CUBLAS_STATUS_SUCCESS failed.)
2022-02-14:15:36:16,root ERROR    [utils.py:41] Failed to create TensorRT engine
2022-02-14 15:36:17,154 - mmdeploy - ERROR - onnx2tensorrt of resnet18_b16x8_cifar10_20210528-bd6371c8.pth/end2end.onnx failed.
(util.py:21)

  2. Did you make any modifications to the code or config? Do you understand what you modified?

Environment

2022-02-14 15:37:08,495 - mmdeploy - INFO - **********Environmental information**********
2022-02-14 15:37:10,850 - mmdeploy - INFO - sys.platform: linux
2022-02-14 15:37:10,851 - mmdeploy - INFO - Python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
2022-02-14 15:37:10,851 - mmdeploy - INFO - CUDA available: True
2022-02-14 15:37:10,851 - mmdeploy - INFO - GPU 0: Tesla V100-PCIE-16GB
2022-02-14 15:37:10,851 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2022-02-14 15:37:10,851 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 10.2, V10.2.89
2022-02-14 15:37:10,851 - mmdeploy - INFO - GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
2022-02-14 15:37:10,851 - mmdeploy - INFO - PyTorch: 1.9.0
2022-02-14 15:37:10,852 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.2-Product Build 20210312 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

2022-02-14 15:37:10,852 - mmdeploy - INFO - TorchVision: 0.10.0
2022-02-14 15:37:10,852 - mmdeploy - INFO - OpenCV: 4.5.4
2022-02-14 15:37:10,852 - mmdeploy - INFO - MMCV: 1.4.0
2022-02-14 15:37:10,852 - mmdeploy - INFO - MMCV Compiler: GCC 7.5
2022-02-14 15:37:10,852 - mmdeploy - INFO - MMCV CUDA Compiler: 10.2
2022-02-14 15:37:10,852 - mmdeploy - INFO - MMDeployment: 0.1.0+af13086
2022-02-14 15:37:10,852 - mmdeploy - INFO - 

2022-02-14 15:37:10,852 - mmdeploy - INFO - **********Backend information**********
[2022-02-14 15:37:11.261] [mmdeploy] [info] Register 'DirectoryModel'
2022-02-14 15:37:11,297 - mmdeploy - INFO - onnxruntime: 1.10.0 ops_is_avaliable : True
2022-02-14 15:37:11,298 - mmdeploy - INFO - tensorrt: 8.0.3.4 ops_is_avaliable : True
2022-02-14 15:37:11,299 - mmdeploy - INFO - ncnn: None ops_is_avaliable : False
2022-02-14 15:37:11,301 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-02-14 15:37:11,303 - mmdeploy - INFO - openvino_is_avaliable: True
2022-02-14 15:37:11,304 - mmdeploy - INFO - 

2022-02-14 15:37:11,304 - mmdeploy - INFO - **********Codebase information**********
2022-02-14 15:37:11,305 - mmdeploy - INFO - mmcls: 0.19.0
2022-02-14 15:37:11,306 - mmdeploy - INFO - mmdet: 2.20.0
2022-02-14 15:37:11,315 - mmdeploy - INFO - mmedit: 0.12.0
2022-02-14 15:37:11,320 - mmdeploy - INFO - mmocr: 0.4.1
2022-02-14 15:37:11,327 - mmdeploy - INFO - mmseg: 0.21.1

Error traceback

The full traceback is included in the reproduction log above: the conversion fails at the "engine is not None" assertion in mmdeploy/backend/tensorrt/utils.py, line 116, raised from onnx2tensorrt.

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
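Since the error originates in ltWrapper.cpp (cuBLASLt), one workaround worth trying is to rebuild the engine with cuBLASLt excluded from TensorRT's tactic sources, which TensorRT 8 supports via set_tactic_sources. The sketch below is a hypothetical, minimal repro outside of mmdeploy's code path, not the project's actual implementation; the ONNX path and workspace size are placeholders.

```python
def tactic_source_mask(*sources):
    # Pure bit arithmetic (OR together 1 << enum value), kept separate so
    # it can be sanity-checked without TensorRT installed.
    mask = 0
    for s in sources:
        mask |= 1 << int(s)
    return mask


def build_engine_without_cublaslt(onnx_path, workspace=1 << 30):
    # Lazy import: TensorRT is only needed when actually building.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, 'rb') as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.max_workspace_size = workspace
    config.set_flag(trt.BuilderFlag.FP16)
    # Drop cuBLAS_LT from the tactic sources (keep cuBLAS and cuDNN) to
    # sidestep the "cublasStatus == CUBLAS_STATUS_SUCCESS" assertion.
    config.set_tactic_sources(
        tactic_source_mask(trt.TacticSource.CUBLAS, trt.TacticSource.CUDNN))
    return builder.build_engine(network, config)
```

If this succeeds on end2end.onnx, it would point at a cuBLASLt issue in the CUDA 10.2 / TensorRT 8.0.3.4 combination rather than at the exported model itself.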
