I have a machine with two different GPUs (an RTX and a Titan V), and tasks often fail to run on it. The failures occur almost exclusively on the GPU with id=1: the same task runs successfully on a different machine, or on the GPU with id=0.
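For context, I pin each task to one card via `CUDA_VISIBLE_DEVICES` before CuPy is imported — a sketch of my setup, where `run_task` is a placeholder for the real workload:

```python
import os

# Restrict CUDA to the second physical card (enumerated as id=1 by default).
# This must be set before CuPy/CUDA is first initialised in the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

def run_task():
    # Placeholder for the real workload. With the mask above, CuPy sees
    # only one device, exposed inside the process as device 0.
    import cupy  # imported lazily so the environment mask takes effect
    with cupy.cuda.Device(0):
        x = cupy.zeros(10)
        x[0] = 1.0  # an item assignment like the one in the traceback below
        return x
```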
The full traceback is:
File "cupy/core/core.pyx", line 1689, in cupy.core.core.ndarray.__setitem__
File "cupy/core/core.pyx", line 3598, in cupy.core.core._scatter_op
File "cupy/core/_kernel.pyx", line 828, in cupy.core._kernel.ufunc.__call__
File "cupy/util.pyx", line 48, in cupy.util.memoize.decorator.ret
File "cupy/core/_kernel.pyx", line 617, in cupy.core._kernel._get_ufunc_kernel
File "cupy/core/_kernel.pyx", line 51, in cupy.core._kernel._get_simple_elementwise_kernel
File "cupy/core/carray.pxi", line 164, in cupy.core.core.compile_with_cache
File "[miniconda]/envs/[env_name]/lib/python3.5/site-packages/cupy/cuda/compiler.py", line 161, in compile_with_cache
mod.load(cubin)
File "cupy/cuda/function.pyx", line 181, in cupy.cuda.function.Module.load
File "cupy/cuda/function.pyx", line 183, in cupy.cuda.function.Module.load
File "cupy/cuda/driver.pyx", line 185, in cupy.cuda.driver.moduleLoadData
File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid
My setup is the following:
chainer 5.2.0 <pip>
chainercv 0.12.0 <pip>
cupy-cuda100 5.2.0 <pip>
The same problem shows up with chainer 5.3 (I created a new conda environment from scratch).
I suspect this is somehow connected to multi-threading, but I could not find a way to disable it in CuPy, or to avoid the problem altogether.
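One mitigation I experimented with is serialising the calls that trigger CuPy's kernel compilation behind a process-wide lock, so that only one thread compiles/loads a module at a time. A sketch, with a dummy workload standing in for the actual CuPy calls:

```python
import threading

# Process-wide lock: only one thread at a time may enter the code path
# that triggers CuPy's JIT compilation / module load.
_compile_lock = threading.Lock()

results = []

def run_task(i):
    # Placeholder for the CuPy operation that fails (e.g. the __setitem__
    # in the traceback); here we just record which task ran.
    with _compile_lock:
        results.append(i)

threads = [threading.Thread(target=run_task, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # all 8 tasks completed exactly once
```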
Some possibly relevant information: the failure is stochastic. On GPU id=1, the task fails with the above error roughly eight times out of ten.
Any ideas?
Does GPU id=1 point to the RTX? Could you try again with v5.4.0?
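To double-check the enumeration, `nvidia-smi --query-gpu=index,name --format=csv,noheader` prints the index-to-card mapping. A tiny parser over a hard-coded sample of that output (the sample lines are illustrative, not from the reporter's machine; note also that CUDA's device ids can differ from nvidia-smi's ordering unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set):

```python
# Illustrative sample of:
#   nvidia-smi --query-gpu=index,name --format=csv,noheader
sample = """0, TITAN V
1, GeForce RTX 2080 Ti"""

# Build {index: card name} from the CSV lines.
mapping = {}
for line in sample.splitlines():
    index, name = line.split(",", 1)
    mapping[int(index)] = name.strip()

print(mapping[1])  # the card at index 1 in this sample
```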