[CUDA][Green Context] Expose green context streams #171116
eqy wants to merge 14 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/171116
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (4 Unrelated Failures) As of commit 38dca67 with merge base 5e30b70. FLAKY: the following jobs failed but were likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -97,6 +100,7 @@ GreenContext::GreenContext(uint32_t device_id, uint32_t num_sms) {
green_ctx_ = std::exchange(other.green_ctx_, nullptr);
You should `#ifdef` the entire move ctor and have these all be member initializers. Also, that way the TORCH_CHECK error would properly give a stack trace instead of just terminating.
CUgreenCtx green_ctx_ = nullptr;
CUcontext context_ = nullptr;
cudaStream_t parent_stream_ = nullptr;
CUstream green_ctx_stream_;
Shouldn't this also be nullptr initialized?
auto default_stream = c10::cuda::getDefaultCUDAStream();
ev.block(default_stream);
c10::cuda::setCurrentCUDAStream(default_stream);
auto green_ctx_stream = c10::cuda::getStreamFromExternal(green_ctx_stream_, device_id_);
Can we create `green_ctx_stream_` as a `CUDAStream` so you can directly use it here? nbd if not.
We can just revert to using the default stream in this case given the comment below; there's no real reason this has to be the same stream as returned by `Stream()`.
CUDAStream GreenContext::Stream() {
#if HAS_CUDA_GREEN_CONTEXT()
return c10::cuda::getStreamFromExternal(green_ctx_stream_, device_id_);
Ugh, this limits users to just one stream per green context? People are used to writing `s1 = torch.cuda.Stream(); s2 = torch.cuda.Stream()`; if `ctx.Stream()` has drastically different behavior, this will be confusing. Also, is there a real reason for this?
I don't think so, if we make the tracking the user's responsibility.
CUstream green_ctx_side_stream;
C10_CUDA_DRIVER_CHECK(c10::cuda::DriverAPI::get()->cuGreenCtxStreamCreate_(
    &green_ctx_side_stream, green_ctx_, CU_STREAM_NON_BLOCKING, 0));
// implies we leak side-streams, but this has precedent in e.g., c10/cuda/CUDAStream.cpp
Not really: CUDAStream.cpp creates a fixed number of streams, and getStreamFromExternal implies that the external library that created the stream can also destroy it. But here we can potentially create and leak an unbounded number of streams, because it's very common to have code that just creates and "destroys" streams like there's no tomorrow.
Can we instead go the CUDAStream.cpp route: precreate a fixed number of streams and dole them out as needed?
@pytorchmergebot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased 1743baa to 51bbc95.
CUcontext context_ = nullptr;
cudaStream_t parent_stream_ = nullptr;
std::array<CUstream, kStreamPerGreenContextPool> green_ctx_streams_;
int32_t curr_stream_idx_ = -1;
@pytorchmergebot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Successfully rebased 6ba6d3f to 38dca67.
@pytorchmergebot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 job has failed, first few of them are: Meta Internal-Only Changes Check. Details for Dev Infra team: raised by workflow job.
@pytorchmergebot merge -i

Merge started. Your change will be merged while ignoring the following 5 checks: Limited CI on H100 / linux-jammy-cuda12_8-py3_10-gcc11-sm90-FA3-ABI-stable-test / test, trunk / linux-jammy-rocm-py3.10 / test (distributed, 2, 3, linux.rocm.gpu.gfx942.4), trunk / linux-jammy-rocm-py3.10 / test (distributed, 3, 3, linux.rocm.gpu.gfx942.4), trunk / linux-jammy-rocm-py3.10 / test (distributed, 1, 3, linux.rocm.gpu.gfx942.4), Meta Internal-Only Changes Check. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…71116)" This reverts commit 5ecb35e. Reverted pytorch#171116 on behalf of https://github.com/jeanschmidt because it breaks internal builds; see D90148243 ([comment](pytorch#171116 (comment))).
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.

@pytorchbot merge -f "merge keeps timing out"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Also uses a non-default stream in the green context, as passing around a default (null) stream seems sketchy. The set/pop-context APIs still use the default stream.
cc @ptrblck @msaroufim @jerryzh168 @tinglvv @nWEIdia