TensorRT int8 model test failed when batch size>1 #795

@DarrenCheung

Description

Network: yolov3
Config file: configs/yolo/yolov3_d53_mstrain-416_273e_coco.py

We used "configs/mmdet/detection/detection_tensorrt-int8_dynamic-64x64-608x608.py" from mmdeploy to quantize the model to TensorRT int8. No errors occurred until we ran ./mmdeploy/tools/test.py to test the model.

Errors:

2022-07-22 07:34:54,879 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /opt/conda/lib/python3.8/site-packages/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
2022-07-22 07:34:54,879 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /opt/conda/lib/python3.8/site-packages/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
NOTE! Installing ujson may make loading annotations faster.
loading annotations into memory...
Done (t=0.61s)
creating index...
index created!
[                                                  ] 0/5000, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "tools/test.py", line 138, in <module>
    main()
  File "tools/test.py", line 130, in main
    outputs = task_processor.single_gpu_test(model, data_loader, args.show,
  File "/opt/conda/lib/python3.8/site-packages/mmdeploy/codebase/base/task.py", line 138, in single_gpu_test
    return self.codebase_class.single_gpu_test(model, data_loader, show,
  File "/opt/conda/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/deploy/mmdetection.py", line 142, in single_gpu_test
    outputs = single_gpu_test(model, data_loader, show, out_dir, **kwargs)
  File "/workspace/algorithm/tensorrt_infer/yolov3/mmdetection/mmdet/apis/test.py", line 26, in single_gpu_test
    for i, data in enumerate(data_loader):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 431, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/opt/conda/lib/python3.8/site-packages/mmcv/parallel/collate.py", line 82, in collate
    return {
  File "/opt/conda/lib/python3.8/site-packages/mmcv/parallel/collate.py", line 83, in <dictcomp>
    key: collate([d[key] for d in batch], samples_per_gpu)
  File "/opt/conda/lib/python3.8/site-packages/mmcv/parallel/collate.py", line 80, in collate
    return [collate(samples, samples_per_gpu) for samples in transposed]
  File "/opt/conda/lib/python3.8/site-packages/mmcv/parallel/collate.py", line 80, in <listcomp>
    return [collate(samples, samples_per_gpu) for samples in transposed]
  File "/opt/conda/lib/python3.8/site-packages/mmcv/parallel/collate.py", line 87, in collate
    return default_collate(batch)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 320, 416] at entry 0 and [3, 128, 416] at entry 1
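
The final RuntimeError is raised in the collate step: the two preprocessed images in the batch have different heights (320 vs. 128), and stacking samples into one batch tensor requires identical shapes. Below is a minimal, standard-library-only sketch of that check, using the two shapes from the message above. `stackable` and `padded_shape` are hypothetical helper names chosen for illustration; they are not mmcv or PyTorch functions.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def stackable(shapes: Sequence[Shape]) -> bool:
    """Return True if all samples share one shape, as stacking requires."""
    return len(set(shapes)) <= 1


def padded_shape(shapes: Sequence[Shape]) -> Shape:
    """Return the per-dimension maximum: the smallest common padded shape."""
    return tuple(max(dims) for dims in zip(*shapes))


# Shapes taken from the error message (entry 0 and entry 1 of the batch).
batch = [(3, 320, 416), (3, 128, 416)]

print(stackable(batch))     # False: heights 320 and 128 differ
print(padded_shape(batch))  # (3, 320, 416)
```

Padding every sample to the per-dimension maximum is one way to make such a batch stackable. Whether the test pipeline here should pad or resize to a fixed size is a separate choice that this sketch does not settle.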

For multi-batch input, we modified "./mmdeploy/configs/mmdet/detection/detection_tensorrt-int8_dynamic-64x64-608x608.py" as follows:

backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[2, 3, 64, 64],
                    opt_shape=[2, 3, 608, 608],
                    max_shape=[2, 3, 608, 608])))
    ])

Environment:

2022-07-22 07:36:54,164 - mmdeploy - INFO - 

2022-07-22 07:36:54,164 - mmdeploy - INFO - **********Environmental information**********
2022-07-22 07:36:54,422 - mmdeploy - INFO - sys.platform: linux
2022-07-22 07:36:54,423 - mmdeploy - INFO - Python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05) [GCC 9.3.0]
2022-07-22 07:36:54,423 - mmdeploy - INFO - CUDA available: True
2022-07-22 07:36:54,423 - mmdeploy - INFO - GPU 0,1: A10
2022-07-22 07:36:54,423 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2022-07-22 07:36:54,423 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.4, V11.4.120
2022-07-22 07:36:54,423 - mmdeploy - INFO - GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
2022-07-22 07:36:54,423 - mmdeploy - INFO - PyTorch: 1.10.0a0+3fd9dcf
2022-07-22 07:36:54,423 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash N/A)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.4
  - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
  - CuDNN 8.2.4
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.4, CUDNN_VERSION=8.2.4, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS=-fno-gnu-unique -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

2022-07-22 07:36:54,423 - mmdeploy - INFO - TorchVision: 0.11.0a0
2022-07-22 07:36:54,423 - mmdeploy - INFO - OpenCV: 3.4.11
2022-07-22 07:36:54,423 - mmdeploy - INFO - MMCV: 1.6.0
2022-07-22 07:36:54,423 - mmdeploy - INFO - MMCV Compiler: GCC 9.3
2022-07-22 07:36:54,424 - mmdeploy - INFO - MMCV CUDA Compiler: 11.4
2022-07-22 07:36:54,424 - mmdeploy - INFO - MMDeploy: 0.6.0+6e58c3c
2022-07-22 07:36:54,424 - mmdeploy - INFO - 

2022-07-22 07:36:54,424 - mmdeploy - INFO - **********Backend information**********
2022-07-22 07:36:54,802 - mmdeploy - INFO - onnxruntime: 1.11.1 ops_is_avaliable : False
2022-07-22 07:36:54,824 - mmdeploy - INFO - tensorrt: 8.0.3.0   ops_is_avaliable : True
2022-07-22 07:36:54,837 - mmdeploy - INFO - ncnn: None  ops_is_avaliable : False
2022-07-22 07:36:54,838 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-07-22 07:36:54,838 - mmdeploy - INFO - openvino_is_avaliable: False
2022-07-22 07:36:54,838 - mmdeploy - INFO - 

2022-07-22 07:36:54,839 - mmdeploy - INFO - **********Codebase information**********
2022-07-22 07:36:54,842 - mmdeploy - INFO - mmdet:      2.25.0
2022-07-22 07:36:54,842 - mmdeploy - INFO - mmseg:      None
2022-07-22 07:36:54,842 - mmdeploy - INFO - mmcls:      None
2022-07-22 07:36:54,842 - mmdeploy - INFO - mmocr:      None
2022-07-22 07:36:54,842 - mmdeploy - INFO - mmedit:     None
2022-07-22 07:36:54,842 - mmdeploy - INFO - mmdet3d:    None
2022-07-22 07:36:54,842 - mmdeploy - INFO - mmpose:     None
2022-07-22 07:36:54,842 - mmdeploy - INFO - mmrotate:   None

How can we solve this problem?
Thanks.
