When using CuPy with the memory pool from a multi-threaded app, it sometimes fails to launch a kernel (CUDADriverError: CUDA_ERROR_INVALID_CONTEXT: invalid device context). I think this is because the CUDA Driver API (used to launch the kernel) is called without a context established on the host thread.
Here is a simple script to reproduce it:
import chainer  # Enable memory pool; without this line the issue does not reproduce.
import cupy
import threading

def run(size):
    # Uncomment the following line to explicitly establish a CUDA context
    # on the current host thread:
    # cupy.cuda.runtime.free(0)
    print(cupy.arange(size, dtype=int))

size = 1024

# Run in the main thread; this is OK.
# CuPy mallocs memory via the Runtime API, then launches the kernel with the Driver API.
run(size)

# Run in another thread; this fails.
# The spawned thread tries to launch the kernel without establishing a context,
# because the Runtime API is not used (the memory block acquired in the
# previous run is reused from the pool).
t = threading.Thread(target=run, args=(size,))
t.start()
t.join()
As commented in the above code, I could work around the problem by calling a harmless Runtime API function, e.g., cupy.cuda.runtime.free(0), to explicitly establish a context on the host thread.
It would be great if CuPy could take care of this use case, but documenting the behavior may be enough.
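For illustration, here is a minimal sketch of how a library could guarantee the initialization runs once per host thread, using threading.local. The names establish_context, ensure_context, and launch_kernel are hypothetical stand-ins (establish_context represents the harmless Runtime API call such as cupy.cuda.runtime.free(0)); this is not CuPy's actual internal API.

```python
import threading

_tls = threading.local()
init_count = 0  # For illustration only: counts per-thread initializations.

def establish_context():
    # Hypothetical stand-in for a harmless CUDA Runtime API call
    # (e.g. cupy.cuda.runtime.free(0)) that establishes the context.
    global init_count
    init_count += 1

def ensure_context():
    # Run the context initialization at most once per host thread.
    if not getattr(_tls, "initialized", False):
        establish_context()
        _tls.initialized = True

def launch_kernel():
    ensure_context()
    # ... the Driver API kernel launch would follow here ...

# Two calls on the same thread initialize only once; a new thread
# triggers its own initialization.
launch_kernel()
launch_kernel()
t = threading.Thread(target=launch_kernel)
t.start()
t.join()
print(init_count)  # 2: once for the main thread, once for the worker.
```

A guard like this at every kernel-launch entry point would make the pool-reuse path safe without requiring users to call a Runtime API function themselves.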