test_pt_onnx_trt fails after removing pycuda.autoinit in #49980 #51105

@ptrblck

Description

🐛 Bug

In #49980 "unused" imports were removed, which also removed import pycuda.autoinit.
If I understand the docs correctly, the import itself sets up the CUDA context, which is presumably why the tool used flagged it as unused.
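A minimal, runnable sketch (not pycuda itself) of the pattern at play: an import whose only job is a side effect. As with pycuda.autoinit, the module name is never referenced after the import, so static "unused import" checks flag the line even though removing it breaks the setup it performs. The module name autoinit_demo and the DEMO_CONTEXT_READY flag are made up for illustration.

```python
import os
import sys
import tempfile

# Write a tiny module whose import performs "context setup" as a side effect,
# mimicking what pycuda.autoinit does for the CUDA context.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "autoinit_demo.py"), "w") as f:
    f.write("import os\nos.environ['DEMO_CONTEXT_READY'] = '1'\n")

sys.path.insert(0, tmp)
import autoinit_demo  # noqa: F401  -- looks unused, but the import did the setup

# The side effect happened even though autoinit_demo is never used by name.
print(os.environ.get("DEMO_CONTEXT_READY"))
```

Deleting the seemingly unused import line would leave DEMO_CONTEXT_READY unset, just as deleting import pycuda.autoinit leaves no active CUDA context for the test.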

Currently the test breaks with:

======================================================================
ERROR: test_alexnet (__main__.Test_PT_ONNX_TRT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/pytorch/pytorch/caffe2/python/trt/test_pt_onnx_trt.py", line 112, in test_alexnet
    self._test_model("alexnet", (3, 227, 227))
  File "/opt/pytorch/pytorch/caffe2/python/trt/test_pt_onnx_trt.py", line 91, in _test_model
    h_input, d_input, h_output, d_output, stream = allocate_buffers(engine)
  File "/opt/pytorch/pytorch/caffe2/python/trt/test_pt_onnx_trt.py", line 30, in allocate_buffers
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)),
pycuda._driver.LogicError: explicit_context_dependent failed: invalid device context - no currently active context?

======================================================================
ERROR: test_googlenet (__main__.Test_PT_ONNX_TRT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/pytorch/pytorch/caffe2/python/trt/test_pt_onnx_trt.py", line 138, in test_googlenet
    self._test_model("googlenet")
  File "/opt/pytorch/pytorch/caffe2/python/trt/test_pt_onnx_trt.py", line 91, in _test_model
    h_input, d_input, h_output, d_output, stream = allocate_buffers(engine)
  File "/opt/pytorch/pytorch/caffe2/python/trt/test_pt_onnx_trt.py", line 30, in allocate_buffers
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)),
pycuda._driver.LogicError: explicit_context_dependent failed: invalid device context - no currently active context?

[...]

for all models.

To Reproduce

# install pycuda for the test
pip install pycuda

# build the OSS release of the ONNX parser on top of the just-installed TensorRT
git clone https://github.com/onnx/onnx-tensorrt.git
cd onnx-tensorrt/
git submodule update --init --recursive
mkdir build
cd build
cmake ..
CPLUS_INCLUDE_PATH=/usr/local/cuda/include make -j$(nproc)
cp libnvonnxparser.so* /usr/lib/x86_64-linux-gnu
cd ../../
rm -rf onnx-tensorrt/

python caffe2/python/trt/test_pt_onnx_trt.py

Environment

PyTorch version: 1.8.0a0+52ea372
Is debug build: False
CUDA used to build PyTorch: 11.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.19.3

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: A100-SXM4-40GB
GPU 1: A100-SXM4-40GB
GPU 2: A100-SXM4-40GB
GPU 3: A100-SXM4-40GB
GPU 4: A100-SXM4-40GB
GPU 5: A100-SXM4-40GB
GPU 6: A100-SXM4-40GB
GPU 7: A100-SXM4-40GB

Nvidia driver version: 460.32.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] pytorch-transformers==1.1.0
[pip3] torch==1.8.0a0+52ea372
[pip3] torchtext==0.9.0a0
[pip3] torchvision==0.9.0a0
[conda] magma-cuda110             2.5.2                         5    local
[conda] mkl                       2019.4                      243
[conda] mkl-include               2019.4                      243
[conda] nomkl                     3.0                           0
[conda] numpy                     1.19.2           py38h6163131_0
[conda] numpy-base                1.19.2           py38h75fe3a5_0
[conda] pytorch-transformers      1.1.0                    pypi_0    pypi
[conda] torch                     1.8.0a0+52ea372          pypi_0    pypi
[conda] torchtext                 0.9.0a0                  pypi_0    pypi
[conda] torchvision               0.9.0a0                  pypi_0    pypi

CC @r-barnes

I'll create a PR to add this import back, as that fixes the issue locally, but let me know if this is not the right approach (I'm not deeply familiar with pycuda).
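For reference, the re-add would likely be a one-line change; a `# noqa: F401` comment should keep flake8/autoflake-style tools from flagging it as unused again. The exact neighboring import line in test_pt_onnx_trt.py is assumed here based on the `cuda.` calls in the traceback:

```diff
--- a/caffe2/python/trt/test_pt_onnx_trt.py
+++ b/caffe2/python/trt/test_pt_onnx_trt.py
+import pycuda.autoinit  # noqa: F401  (the import itself creates the CUDA context)
 import pycuda.driver as cuda
```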

cc @houseroad @spandantiwari @lara-hdr @BowenBao @neginraoof

    Labels

    module: onnx - Related to torch.onnx
    module: testing - Issues related to the torch.testing module (not tests)
    triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
