
[CUDA] RuntimeBuilder.Preprocess() causes subsequent CUDA function calls to fail #598

@lzhangzz


What are the problems? (screenshots or detailed error messages)

For some models (e.g. YOLOX-s and DBNet-r18; others such as ResNet-18 are fine), subsequent CUDA function calls or kernel launches may fail after a runtime has been created with RuntimeBuilder.

I first got the CUDA invalid argument error when testing ppl.nn with mmdeploy's test.py. It occurred after runtime creation and before inference, while copying data from host to device. Later I hit the same problem when testing with mmdeploy's SDK.

After digging around for a while, I found that the simplest way to reproduce the problem is with pplnn.py:

Insert the following code

import torch
t = torch.Tensor([[1,1],[1,1]]).cuda()

at
https://github.com/openppl-public/ppl.nn/blob/1ae5d95f3ee49b3e582564cc004443931fbe2f7a/tools/pplnn.py#L564
and then run

python pplnn.py --use-cuda --onnx-model model.onnx --in-shape 1_3_640_640 --quick-select

which gives

INFO: PPLNN version: [0.8.0], commit: [02418bb57bef2d888b57d44589a599080cb806d9]
[INFO][2022-07-06 22:23:06.057][utils.cc:456] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2022-07-06 22:23:06.067][opt_graph.cc:324] added 1020 new bridge kernels
[INFO][2022-07-06 22:23:06.223][opt_graph.cc:581] deleted 990 bridge kernels
Traceback (most recent call last):
  File "pplnn.py", line 567, in <module>
    t = torch.Tensor([[1,1],[1,1]]).cuda()
RuntimeError: CUDA error: invalid argument
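
To narrow down which step leaves CUDA in a bad state, the same small allocation can be run at several points in pplnn.py (for example before and after runtime creation) and the results compared. The following is only a minimal sketch under stated assumptions: `probe` and `cuda_roundtrip` are hypothetical helper names, not part of ppl.nn or mmdeploy, and the placement of the calls is up to whoever is debugging.

```python
from typing import Callable, Tuple


def probe(label: str, op: Callable[[], object]) -> Tuple[str, bool, str]:
    """Run one small operation and report whether it raised RuntimeError.

    Hypothetical helper: PyTorch reports CUDA failures as RuntimeError,
    so that is the only exception type caught here.
    """
    try:
        op()
    except RuntimeError as exc:
        return (label, False, str(exc))
    return (label, True, "")


def cuda_roundtrip() -> None:
    """The allocation from the report, followed by a device synchronize."""
    import torch  # imported lazily so probe() itself does not need PyTorch

    torch.Tensor([[1, 1], [1, 1]]).cuda()
    torch.cuda.synchronize()


# Assumed usage inside pplnn.py (placement is illustrative):
#   print(probe("before runtime creation", cuda_roundtrip))
#   ... existing runtime creation code ...
#   print(probe("after runtime creation", cuda_roundtrip))
```

If the first probe succeeds and the second fails, that is consistent with the failure being introduced during runtime creation rather than by PyTorch's own initialization.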

Which version (commit id or tag) of ppl.nn is used?

02418bb

What's the operating system ppl.nn runs on?

Ubuntu 18.04

What's the compiler and its version?

GCC-7.5, CUDA-11.1

What are the commands used to build ppl.nn?

cmake .. \
    -DCMAKE_INSTALL_PREFIX=/workspace/ppl.nn/install \
    -DPPLNN_ENABLE_PYTHON_API=ON \
    -DPPLNN_USE_X86_64=ON \
    -DPPLNN_USE_CUDA=ON \
    -DPPL_USE_X86_AVX512=OFF \
    -DPPLNN_ENABLE_CUDA_JIT=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_CUDA_ARCHITECTURES=75
