Using CuPy with memory pool from multi-threaded application #72

@kmaehashi

When CuPy is used with a memory pool from a multi-threaded application, kernel launches sometimes fail (CUDADriverError: CUDA_ERROR_INVALID_CONTEXT: invalid device context). I think this is because the CUDA Driver API (used to launch the kernel) is called before a context has been established on the host thread.

Here is a simple script to reproduce the problem:

import chainer  # Enable memory pool; without this line the issue does not reproduce.
import cupy
import threading

def run(size):
    # Uncomment the following line to explicitly establish CUDA context
    # on the current host thread:
    #cupy.cuda.runtime.free(0)
    print(cupy.arange(size, dtype=int))

size = 1024

# Run in main thread; this is OK.
# CuPy mallocs memory via Runtime API, then launches kernel with Driver API.
run(size)

# Run in another thread; this fails.
# The executed thread tries to launch kernel without establishing context,
# as Runtime API is not used (memory block acquired in the previous run is
# reused from pool.)
t = threading.Thread(target=run, args=(size,))
t.start()
t.join()

As noted in the comments in the code above, I could work around the problem by calling a harmless Runtime API function, e.g., cupy.cuda.runtime.free(0), to explicitly establish a context on the host thread.
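To avoid repeating that call in every thread target, the workaround could be wrapped in a small helper. The following is only a sketch: ensure_context, _establish_context, _tls and the init parameter are hypothetical names introduced for illustration and are not part of CuPy. The only CuPy call used is cupy.cuda.runtime.free(0) from the script above, and the sketch assumes that one such call per host thread is sufficient.

```python
import functools
import threading

# Hypothetical per-thread flag: records whether this host thread has
# already made the Runtime API call that establishes a context.
_tls = threading.local()


def _establish_context():
    # Workaround from the report: a harmless Runtime API call.
    # Imported lazily so that the helper itself does not require a GPU.
    import cupy
    cupy.cuda.runtime.free(0)


def ensure_context(func, init=_establish_context):
    """Wrap func so that init() runs once per host thread before func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not getattr(_tls, "ready", False):
            init()
            _tls.ready = True
        return func(*args, **kwargs)
    return wrapper
```

With such a helper, the thread in the reproduction script would be created as threading.Thread(target=ensure_context(run), args=(size,)), leaving run itself unchanged.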

It would be great if CuPy could handle this use case itself, but documenting the behavior may be enough.
