Conversation
|
Can you say more on which piece of the eager initialization causes issues? |
|
Basically, the following code paradigm no longer works:

    no call into ROCM API before this
    fork process using multiprocessing
    use ROCM API

The above code sequence would work for CUDA and OpenCL because their state is lazily initialized. This could prevent some possible usages. In our case, we need to fork a new process to isolate each RPC session. Some deep learning libraries might start by using Python's multiprocessing to start new processes to load data (which doesn't need the GPU) and then use the GPU API; the eager initialization prevents us from doing that. |
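For concreteness, the sequence above might look roughly like this in Python. This is only a sketch: `use_gpu_api` is a hypothetical placeholder for whatever the first ROCm/HIP call in the child would be, not a real ROCm binding, and the `fork` start method is assumed to be available (POSIX only).

```python
import multiprocessing as mp


def use_gpu_api():
    """Hypothetical stand-in for the first ROCm/HIP call made in the child."""
    return "gpu work done"


def worker(queue):
    # Step 3: the GPU API is touched for the first time inside the child.
    queue.put(use_gpu_api())


def run_in_forked_child():
    # Step 1: the parent has made no GPU call before this point.
    ctx = mp.get_context("fork")  # assumes a POSIX platform
    queue = ctx.Queue()
    # Step 2: fork a new process using multiprocessing.
    child = ctx.Process(target=worker, args=(queue,))
    child.start()
    result = queue.get(timeout=10)
    child.join()
    return result, child.exitcode


if __name__ == "__main__":
    print(run_in_forked_child())
```

With a lazily initialized runtime the child is the first process to set up GPU state, so it gets a clean copy. With eager initialization the state already exists in the parent before the fork, which is the case reported here.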
|
@bensander I am not sure which piece it is, but it should be part of the HSA runtime. You should be able to reproduce this by crafting a program that forks at the beginning of the program and then uses the ROCm API. |
|
@adityaatluri My guess is that the eager initialization is due to the use of a static global variable or something similar, instead of a static function that lazily initializes things on the first call. |
|
Got it - we have seen this behavior in a few other situations, but it seems to be coming up more often here. IIRC the issue was related to mapping queues into user-space memory, which breaks fork. We had a couple of solutions; let me follow up and see what we can do. |
|
How can I reproduce the issue? |
cc @adityaatluri
The eager initialization of ROCm (specifically hip_hcc) might cause other problems, basically making it incompatible with multiprocessing in Python.