Failed to create TensorRT engine during conversion process for classification model with TensorRT backend #156
Closed
Description
Describe the bug
Converting a ResNet-18 (CIFAR-10) classification model with the TensorRT backend fails during engine creation: torch2onnx succeeds, but onnx2tensorrt aborts with "AssertionError: Failed to create TensorRT engine" after TensorRT reports "Error Code 2: Internal Error (Assertion cublasStatus == CUBLAS_STATUS_SUCCESS failed.)".
Reproduction
- What command or script did you run?
deploy cmd:
tools/deploy.py configs/mmcls/classification_tensorrt-fp16_dynamic-224x224-224x224.py /tmp/mmclassification//configs/resnet/resnet18_8xb16_cifar10.py /opt/mmdeploy/openmmlab-ci/e2e/mmdeploy/resnet18_b16x8_cifar10_20210528-bd6371c8.pth /tmp/mmclassification/demo/demo.JPEG --work-dir resnet18_b16x8_cifar10_20210528-bd6371c8.pth --show --device cuda:0
test_convert.py::TestConvertors::test_cls_convert[config_cpt0]
---------------------------------------------------------------------------------------------------------------------------------- live log call ----------------------------------------------------------------------------------------------------------------------------------
2022-02-14 15:35:06 [ DEBUG] Starting new HTTPS connection (1): download.openmmlab.com:443 (connectionpool.py:943)
2022-02-14 15:35:07 [ DEBUG] https://download.openmmlab.com:443 "GET /mmclassification/v0/resnet/resnet18_b16x8_cifar10_20210528-bd6371c8.pth HTTP/1.1" 200 44779833 (connectionpool.py:442)
2022-02-14 15:35:13 [ DEBUG] resnet18_b16x8_cifar10_20210528-bd6371c8.pth (util.py:75)
2022-02-14 15:35:13 [ INFO] resnet18_b16x8_cifar10_20210528-bd6371c8.pth (test_convert.py:47)
2022-02-14 15:35:13 [ DEBUG] tools/deploy.py configs/mmcls/classification_tensorrt-fp16_dynamic-224x224-224x224.py /tmp/mmclassification//configs/resnet/resnet18_8xb16_cifar10.py /opt/mmdeploy/openmmlab-ci/e2e/mmdeploy/resnet18_b16x8_cifar10_20210528-bd6371c8.pth /tmp/mmclassification/demo/demo.JPEG --work-dir resnet18_b16x8_cifar10_20210528-bd6371c8.pth --show --device cuda:0 (util.py:18)
2022-02-14 15:35:13 [ INFO] cup shell execute cd /opt/mmdeploy && python tools/deploy.py configs/mmcls/classification_tensorrt-fp16_dynamic-224x224-224x224.py /tmp/mmclassification//configs/resnet/resnet18_8xb16_cifar10.py /opt/mmdeploy/openmmlab-ci/e2e/mmdeploy/resnet18_b16x8_cifar10_20210528-bd6371c8.pth /tmp/mmclassification/demo/demo.JPEG --work-dir resnet18_b16x8_cifar10_20210528-bd6371c8.pth --show --device cuda:0 with script ['/bin/sh', '/tmp/cup.shell.2f30c837-5702-4168-a1dd-cfeaa72b843fg7tcn_n2'] (oper.py:559)
2022-02-14 15:36:17 [ DEBUG] subprocess result (util.py:21):

stdout:
load checkpoint from local path: /opt/mmdeploy/openmmlab-ci/e2e/mmdeploy/resnet18_b16x8_cifar10_20210528-bd6371c8.pth
[2022-02-14 15:35:34.614] [mmdeploy] [info] Register 'DirectoryModel'
[2022-02-14 15:35:37.119] [mmdeploy] [info] Register 'DirectoryModel'
Traceback (most recent call last):
  File "/opt/mmdeploy/mmdeploy/utils/utils.py", line 36, in target_wrapper
    result = target(*args, **kwargs)
  File "/opt/mmdeploy/mmdeploy/backend/tensorrt/onnx2tensorrt.py", line 72, in onnx2tensorrt
    device_id=device_id)
  File "/opt/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 116, in create_trt_engine
    assert engine is not None, 'Failed to create TensorRT engine'
AssertionError: Failed to create TensorRT engine

stderr:
2022-02-14 15:35:17,173 - mmdeploy - INFO - torch2onnx start.
2022-02-14 15:35:34,390 - mmdeploy - INFO - torch2onnx success.
2022-02-14 15:35:34,646 - mmdeploy - INFO - onnx2tensorrt of resnet18_b16x8_cifar10_20210528-bd6371c8.pth/end2end.onnx start.
2022-02-14 15:35:37,170 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /opt/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +226, GPU +0, now: CPU 288, GPU 481 (MiB)
[TensorRT] WARNING: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] INFO: [MemUsageSnapshot] Builder begin: CPU 372 MiB, GPU 481 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +68, now: CPU 530, GPU 549 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +132, GPU +86, now: CPU 662, GPU 635 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 972, GPU 743 (MiB)
[TensorRT] ERROR: 2: [ltWrapper.cpp::setupHeuristic::327] Error Code 2: Internal Error (Assertion cublasStatus == CUBLAS_STATUS_SUCCESS failed.)
2022-02-14:15:36:16,root ERROR [utils.py:41] Failed to create TensorRT engine
2022-02-14 15:36:17,154 - mmdeploy - ERROR - onnx2tensorrt of resnet18_b16x8_cifar10_20210528-bd6371c8.pth/end2end.onnx failed.

returncode: 0
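One detail worth noting in the log above: the subprocess result reports returncode 0 even though onnx2tensorrt failed, so a test harness that only checks the exit code will miss the failure. A minimal sketch of scanning the captured stderr for error lines instead (the helper name and log excerpt are illustrative, not part of mmdeploy):

```python
import re

def find_error_lines(stderr: str) -> list:
    """Return lines that look like TensorRT or mmdeploy error reports."""
    pattern = re.compile(r"\[TensorRT\] ERROR|mmdeploy - ERROR")
    return [line for line in stderr.splitlines() if pattern.search(line)]

# Excerpt of the captured stderr from the failing run above.
captured_stderr = (
    "[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead\n"
    "[TensorRT] ERROR: 2: [ltWrapper.cpp::setupHeuristic::327] Error Code 2: "
    "Internal Error (Assertion cublasStatus == CUBLAS_STATUS_SUCCESS failed.)\n"
    "2022-02-14 15:36:17,154 - mmdeploy - ERROR - onnx2tensorrt of end2end.onnx failed.\n"
)
print(find_error_lines(captured_stderr))  # two error lines despite returncode 0
```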
- Did you make any modifications on the code or config? Did you understand what you have modified?
Environment
2022-02-14 15:37:08,495 - mmdeploy - INFO - **********Environmental information**********
2022-02-14 15:37:10,850 - mmdeploy - INFO - sys.platform: linux
2022-02-14 15:37:10,851 - mmdeploy - INFO - Python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
2022-02-14 15:37:10,851 - mmdeploy - INFO - CUDA available: True
2022-02-14 15:37:10,851 - mmdeploy - INFO - GPU 0: Tesla V100-PCIE-16GB
2022-02-14 15:37:10,851 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2022-02-14 15:37:10,851 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 10.2, V10.2.89
2022-02-14 15:37:10,851 - mmdeploy - INFO - GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
2022-02-14 15:37:10,851 - mmdeploy - INFO - PyTorch: 1.9.0
2022-02-14 15:37:10,852 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2021.2-Product Build 20210312 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
2022-02-14 15:37:10,852 - mmdeploy - INFO - TorchVision: 0.10.0
2022-02-14 15:37:10,852 - mmdeploy - INFO - OpenCV: 4.5.4
2022-02-14 15:37:10,852 - mmdeploy - INFO - MMCV: 1.4.0
2022-02-14 15:37:10,852 - mmdeploy - INFO - MMCV Compiler: GCC 7.5
2022-02-14 15:37:10,852 - mmdeploy - INFO - MMCV CUDA Compiler: 10.2
2022-02-14 15:37:10,852 - mmdeploy - INFO - MMDeployment: 0.1.0+af13086
2022-02-14 15:37:10,852 - mmdeploy - INFO -
2022-02-14 15:37:10,852 - mmdeploy - INFO - **********Backend information**********
[2022-02-14 15:37:11.261] [mmdeploy] [info] Register 'DirectoryModel'
2022-02-14 15:37:11,297 - mmdeploy - INFO - onnxruntime: 1.10.0 ops_is_avaliable : True
2022-02-14 15:37:11,298 - mmdeploy - INFO - tensorrt: 8.0.3.4 ops_is_avaliable : True
2022-02-14 15:37:11,299 - mmdeploy - INFO - ncnn: None ops_is_avaliable : False
2022-02-14 15:37:11,301 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-02-14 15:37:11,303 - mmdeploy - INFO - openvino_is_avaliable: True
2022-02-14 15:37:11,304 - mmdeploy - INFO -
2022-02-14 15:37:11,304 - mmdeploy - INFO - **********Codebase information**********
2022-02-14 15:37:11,305 - mmdeploy - INFO - mmcls: 0.19.0
2022-02-14 15:37:11,306 - mmdeploy - INFO - mmdet: 2.20.0
2022-02-14 15:37:11,315 - mmdeploy - INFO - mmedit: 0.12.0
2022-02-14 15:37:11,320 - mmdeploy - INFO - mmocr: 0.4.1
2022-02-14 15:37:11,327 - mmdeploy - INFO - mmseg: 0.21.1
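One environment detail worth cross-checking against the stderr above: TensorRT warned it was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0. A small, illustrative check for flagging such mismatches (not part of mmdeploy; the version strings are taken from the warning in the log):

```python
def parse_version(v: str) -> tuple:
    """Turn a dotted version string like '8.2.1' into (8, 2, 1)."""
    return tuple(int(part) for part in v.split("."))

# Values from the "[TensorRT] WARNING" line in the log above.
linked_cudnn, loaded_cudnn = "8.2.1", "8.2.0"
mismatch = parse_version(linked_cudnn) != parse_version(loaded_cudnn)
print(mismatch)  # True: the loaded cuDNN is older than the one TensorRT was built against
```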
Error traceback
If applicable, paste the error traceback here.
The AssertionError traceback ("Failed to create TensorRT engine") is included in the reproduction log above.
Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
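I have not confirmed the root cause, but the failure happens inside TensorRT's cuBLASLt heuristic setup (ltWrapper.cpp::setupHeuristic), right after the cuDNN version mismatch warning. One workaround sometimes tried with TensorRT 8.0.x is to exclude cuBLASLt from the builder's tactic sources so that code path is never reached. A minimal sketch (the helper name is mine, and this is not mmdeploy's actual code path; guarded so it runs without TensorRT installed):

```python
# Sketch of a possible workaround, not a confirmed fix: keep only the plain
# cuBLAS tactic source so the builder avoids the cuBLASLt heuristic setup.
# Assumes the TensorRT >= 8.0 Python API.
try:
    import tensorrt as trt
except ImportError:
    trt = None  # TensorRT unavailable; sketch only.

def restrict_tactic_sources(config) -> bool:
    """Limit a trt.IBuilderConfig to the plain cuBLAS tactic source."""
    if trt is None:
        return False
    config.set_tactic_sources(1 << int(trt.TacticSource.CUBLAS))
    return True
```

If disabling cuBLASLt makes the conversion succeed, that would point at the cuBLASLt/CUDA library combination in this image rather than at the model conversion itself.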