Skip to content

Move stream to thread local#13080

Closed
dzhulgakov wants to merge 1 commit intopytorch:masterfrom
dzhulgakov:export-D10525134
Closed

Move stream to thread local#13080
dzhulgakov wants to merge 1 commit intopytorch:masterfrom
dzhulgakov:export-D10525134

Conversation

@dzhulgakov
Copy link
Copy Markdown
Collaborator

Summary:
This is the first step to untangle this logic:

  • moves stream id to thread local mechanically
  • relies on the fact that the value of thread local is valid in conjunction with CUDAContext only until the next SwitchToDevice is called - we should move to proper RAII in the following diffs

Follow up diffs are going to move more stuff outside of CUDAContext (by making gpu_id thread local too) and simplify the CopyFrom.

The only expected change in behavior is that before CopyFrom would do copy on stream logical id 0 if the context was created on the fly and now it'd do so on the current stream. Since it'd block explicitly, I don't think it matters much.

Also, observers were semi-broken by waiting on the potentially wrong stream. It can be fixed later - I renamed the method to avoid abuse.

Differential Revision: D10525134

Summary:
This is the first step to untangle this logic:
- moves stream id to thread local mechanically
- relies on the fact that the value of thread local is valid in conjunction with CUDAContext only until the next SwitchToDevice is called - we should move to proper RAII in the following diffs

Follow up diffs are going to move more stuff outside of CUDAContext (by making gpu_id thread local too) and simplify the CopyFrom.

The only expected change in behavior is that before CopyFrom would do copy on stream logical id 0 if the context was created on the fly and now it'd do so on the current stream. Since it'd block explicitly, I don't think it matters much.

Also, observers were semi-broken by waiting on the potentially wrong stream. It can be fixed later - I renamed the method to avoid abuse.

Differential Revision: D10525134

fbshipit-source-id: 7810c458cc6beef94f255fe11a044252cc38f4f9
Copy link
Copy Markdown
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is very helpful.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 26, 2018
Summary:
Pull Request resolved: pytorch/pytorch#13080

This is the first step to untangle this logic:
- moves stream id to thread local mechanically
- relies on the fact that the value of thread local is valid in conjunction with CUDAContext only until the next SwitchToDevice is called - we should move to proper RAII in the following diffs

Follow up diffs are going to move more stuff outside of CUDAContext (by making gpu_id thread local too) and simplify the CopyFrom.

The only expected change in behavior is that before CopyFrom would do copy on stream logical id 0 if the context was created on the fly and now it'd do so on the current stream. Since it'd block explicitly, I don't think it matters much.

Also, observers were semi-broken by waiting on the potentially wrong stream. It can be fixed later - I renamed the method to avoid abuse.

Reviewed By: ezyang

Differential Revision: D10525134

fbshipit-source-id: 5d495a21490bebe060a76389f1b47bdf12cbc59e
@ezyang
Copy link
Copy Markdown
Contributor

ezyang commented Oct 26, 2018

This diff broke Caffe2 ROCm tests: https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/629//console

Might just be a failure to apply this code change to the HIP code as well.

CC @iotamudelta @bddppq

@bddppq
Copy link
Copy Markdown
Contributor

bddppq commented Oct 26, 2018

@ezyang Yep the context_gpu code change need to be applied to context_hip as well (the test is automatically hipified but the context_* source code is not). I'm working on hipifying caffe2/core in #13148, I just rebased my PR to include this and triggered a test run here https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang7-rocmdeb-ubuntu16.04-trigger-test/3525/, let's see the failure will go away.

New run with #gpu checks https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang7-rocmdeb-ubuntu16.04-trigger-test/3609/

auto before_stream = context_outer.cuda_stream();

// try to mess up current device
CUDAContext context_different_device(1);

This comment was marked as off-topic.

@ezyang ezyang added the merged label Jun 25, 2019
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#13080

This is the first step to untangle this logic:
- moves stream id to thread local mechanically
- relies on the fact that the value of thread local is valid in conjunction with CUDAContext only until the next SwitchToDevice is called - we should move to proper RAII in the following diffs

Follow up diffs are going to move more stuff outside of CUDAContext (by making gpu_id thread local too) and simplify the CopyFrom.

The only expected change in behavior is that before CopyFrom would do copy on stream logical id 0 if the context was created on the fly and now it'd do so on the current stream. Since it'd block explicitly, I don't think it matters much.

Also, observers were semi-broken by waiting on the potentially wrong stream. It can be fixed later - I renamed the method to avoid abuse.

Reviewed By: ezyang

Differential Revision: D10525134

fbshipit-source-id: 5d495a21490bebe060a76389f1b47bdf12cbc59e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants