[APP] ROCM RPC by tqchen · Pull Request #1155 · apache/tvm

tqchen · 2018-05-11T03:09:11Z

cc @adityaatluri

The eager initialization of ROCm (specifically hip_hcc) might cause other problems: it basically makes ROCm incompatible with multiprocessing in Python.

bensander · 2018-05-11T03:46:04Z

Can you say more on which piece of the eager initialization causes issues?

tqchen · 2018-05-11T03:50:29Z

Basically, the following code pattern no longer works:

1. Make no call into the ROCm API before this point.
2. Fork a process using multiprocessing.
3. Use the ROCm API.

The above sequence works for CUDA and OpenCL because their state is initialized lazily, so eager initialization rules out some legitimate usages. In our case, we need to fork a new process to isolate each RPC session. Some deep learning libraries also use Python's multiprocessing to start worker processes that load data (which don't need the GPU) and only then use the GPU API; eager initialization prevents that as well.

tqchen · 2018-05-11T03:52:03Z

@bensander I am not sure which piece it is, but it should be part of the HSA runtime. You should be able to reproduce this with a program that forks at the very beginning and then uses the ROCm API.
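A reproducer along the lines suggested here could be driven by a small harness that forks first and only then exercises the runtime in the child. The sketch below is illustrative only: `run_after_fork` is a hypothetical helper, and `fake_ok` / `fake_crash` are placeholders for code that would actually call into ROCm (not shown). The harness separates a clean exit, a crash (non-zero or negative exit code), and a hang.

```python
import multiprocessing as mp
import os


def run_after_fork(use_gpu_api, timeout=30.0):
    """Fork first, then call use_gpu_api() in the child.

    Nothing GPU-related should have run in the parent before this call.
    Returns "ok", "hang", or "error (exit code N)"; a negative N means the
    child was killed by signal -N (e.g. -11 for SIGSEGV).
    """
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=use_gpu_api)
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        # Still running after the timeout: treat it as a hang and clean up.
        proc.terminate()
        proc.join()
        return "hang"
    if proc.exitcode == 0:
        return "ok"
    return f"error (exit code {proc.exitcode})"


def fake_ok():
    """Placeholder for a runtime call that succeeds."""


def fake_crash():
    """Placeholder for a runtime call that dies inside the child."""
    os._exit(3)


if __name__ == "__main__":
    print(run_after_fork(fake_ok))     # ok
    print(run_after_fork(fake_crash))  # error (exit code 3)
```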

tqchen · 2018-05-11T03:57:37Z

@adityaatluri My guess is that the eager initialization comes from a static global variable or something similar, rather than a static function that lazily initializes state on its first call.
ref https://isocpp.org/wiki/faq/ctors#static-init-order

tqchen · 2018-05-11T04:03:37Z

https://github.com/dmlc/tvm/blob/master/apps/rocm_rpc/README.md

bensander · 2018-05-11T05:33:17Z

Got it - we have seen this behavior in a few other situations but seems it is becoming more popular here. IIRC the issue was related to mapping queues into user-space memory which breaks fork. We had a couple solutions, let me follow up and see what we can do.
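The thread does not say which solutions were considered. Purely as a generic Python illustration, and not AMD's fix, one common pattern for state that must not be shared across `fork` is to discard it in the child so it is re-created lazily there. `get_state`, `_drop_inherited_state`, and `demo` are hypothetical names; `os.register_at_fork` is the standard-library hook (POSIX, Python 3.7+) that runs the reset.

```python
import multiprocessing as mp
import os

# Hypothetical process-local runtime state (think: queues mapped into this
# process's address space, which a forked child must not reuse).
_state = None


def get_state():
    """Lazily create state owned by the current process."""
    global _state
    if _state is None:
        _state = {"owner_pid": os.getpid()}
    return _state


def _drop_inherited_state():
    # Runs in the child right after fork: forget the parent's state so the
    # child re-initializes its own on first use.
    global _state
    _state = None


os.register_at_fork(after_in_child=_drop_inherited_state)


def _child(queue):
    queue.put((os.getpid(), get_state()["owner_pid"]))


def demo():
    parent_owner = get_state()["owner_pid"]  # parent initializes before forking
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    proc = ctx.Process(target=_child, args=(queue,))
    proc.start()
    child_pid, child_owner = queue.get(timeout=30)
    proc.join()
    return parent_owner, child_pid, child_owner


if __name__ == "__main__":
    parent_owner, child_pid, child_owner = demo()
    print(child_owner == child_pid != parent_owner)  # True
```

Real driver state such as mapped queues cannot be repaired by merely dropping a Python reference; the sketch only shows where such a reset hook would run.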

aditya4d · 2018-05-21T19:50:38Z

How can I reproduce the issue?

[APP] ROCM RPC

4b31a2a

tqchen merged commit 7233ce1 into apache:master May 11, 2018

tqchen added a commit to tqchen/tvm that referenced this pull request Jul 6, 2018

[APP] ROCM RPC (apache#1155)

396393c

sergei-mironov pushed a commit to sergei-mironov/tvm that referenced this pull request Aug 8, 2018

[APP] ROCM RPC (apache#1155)

e2a22f0

masahi mentioned this pull request Sep 25, 2018

[WIP] De-blacklist test_multiprocessing. ROCm/pytorch#211

Closed

tqchen merged 1 commit into apache:master from tqchen:master