I have a machine with two different GPUs (one RTX and one Titan V), and tasks often fail to run on it. The failure occurs mostly on the GPU with id=1.

The same task runs successfully on a different machine, or on the GPU with id=0.

The full stack trace is the following:
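
To check whether the failure follows the physical card rather than the index, the task could be launched in a child process that sees only one GPU. A minimal sketch, assuming the task lives in a standalone script (the name `task.py` below is hypothetical) and relying only on the standard CUDA environment variables `CUDA_VISIBLE_DEVICES` and `CUDA_DEVICE_ORDER`:

```python
import os


def env_for_gpu(gpu_id, base_env=None):
    """Return a copy of the environment that exposes only one GPU."""
    env = dict(os.environ if base_env is None else base_env)
    # Enumerate devices in PCI bus order so the ids match nvidia-smi.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Hide every GPU except the requested one; inside the child process
    # the selected card then appears as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env
```

It would be used as `subprocess.call([sys.executable, "task.py"], env=env_for_gpu(1))`. Because the child sees a single device, any code inside it should address the card as device 0.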

  File "cupy/core/core.pyx", line 1689, in cupy.core.core.ndarray.__setitem__
  File "cupy/core/core.pyx", line 3598, in cupy.core.core._scatter_op
  File "cupy/core/_kernel.pyx", line 828, in cupy.core._kernel.ufunc.__call__
  File "cupy/util.pyx", line 48, in cupy.util.memoize.decorator.ret
  File "cupy/core/_kernel.pyx", line 617, in cupy.core._kernel._get_ufunc_kernel
  File "cupy/core/_kernel.pyx", line 51, in cupy.core._kernel._get_simple_elementwise_kernel
  File "cupy/core/carray.pxi", line 164, in cupy.core.core.compile_with_cache
  File "[miniconda]/envs/[env_name]/lib/python3.5/site-packages/cupy/cuda/compiler.py", line 161, in compile_with_cache
    mod.load(cubin)
  File "cupy/cuda/function.pyx", line 181, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 183, in cupy.cuda.function.Module.load
  File "cupy/cuda/driver.pyx", line 185, in cupy.cuda.driver.moduleLoadData
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid

My setup is the following:

chainer                   5.2.0                     <pip>
chainercv                 0.12.0                    <pip>
cupy-cuda100              5.2.0                     <pip>

The same problem shows up in chainer 5.3 (I created a new conda environment from scratch).

I believe that this is somehow connected with multi-threading, but I could not find how to turn it off in cupy or how to avoid the problem altogether.

Some possibly irrelevant information: the failure is not deterministic. On GPU id=1, roughly eight runs out of ten fail with the error above.
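
Since the failure is intermittent, it may help to measure how often it happens before and after any change. A rough sketch, assuming the task can be started as a command line that exits with a non-zero status on error; the helper name `count_failures` is made up for illustration:

```python
import subprocess


def count_failures(cmd, trials=10, env=None):
    """Run cmd in a fresh process `trials` times; count non-zero exits."""
    failures = 0
    for _ in range(trials):
        # A new process per trial, so no in-process state carries over.
        if subprocess.call(cmd, env=env) != 0:
            failures += 1
    return failures
```

Calling it with the command that starts the task and `trials=10` gives a figure that can be compared directly with the "eight out of ten" observed here.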

Any ideas?

  • Does GPU id=1 point to the RTX? Could you try again using v5.4.0? Commented Apr 5, 2019 at 4:47
  • No, GPU id=1 points to the Titan X. Indeed, the problem is solved with 5.4. Could you please describe what the issue was, for future reference? Commented Apr 13, 2019 at 7:02
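
For anyone hitting the same error, a quick check that the installed version is at least the one reported to work in the comments above can be sketched as follows. It assumes plain numeric version strings such as `5.2.0` (pre-release suffixes are not handled), and both helper names are illustrative:

```python
def version_tuple(version):
    """Turn '5.2.0' into (5, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(installed, minimum="5.4.0"):
    """True if `installed` is the same as or newer than `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)
```

With CuPy installed, `is_at_least(cupy.__version__)` would report whether the environment is already on 5.4.0 or later.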
