[user-streams] Add stream support to scheduler#165505
mlazos wants to merge 23 commits into gh/mlazos/44/base
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165505
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (3 Unrelated Failures)
As of commit b9a8960 with merge base d2d12ef:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy kadeng muchulee8 amjames chauhang aakhundov coconutruben jataylo chenyang78 [ghstack-poisoned]
ghstack-source-id: 66a91d4 Pull Request resolved: pytorch/pytorch#165505
@claude, review this PR
Claude finished @desertfire's task in 4m 28s
PR Review: [user-streams] Add stream support to scheduler
This PR adds stream context tracking and switching infrastructure to the
Overall Assessment
The design is clean. The scheduler tracks only which stream context it's currently emitting into and generates
Issues
1. Typo: "plaining" should be "planning" (
    # Don't generate ctx switching. Memory plaining code (e.g., delete buffers) on current
Should be "planning" (or perhaps "cleanup" would be more descriptive here).
2.
3. When
Minor Observations
Summary
The PR looks good structurally. The main items to address:
No blocking concerns beyond the typo.
torch/_inductor/scheduler.py (outdated)
    """Code-gen to enter the Stream context assigned to node."""
    assert not isinstance(node, NopKernelSchedulerNode)
    # pyrefly: ignore[missing-attribute]
    node_stream = self.node_to_stream[node]
Is there a PR ordering issue here? node_to_stream is defined in the next PR. (Thanks to Claude for pointing this out.)
Yeah let me fix this
The PR's commit msg doesn't match its content.
Summary
Add CUDA stream context management to the inductor scheduler's codegen phase. This is the foundational infrastructure for multi-stream code generation: it tracks which stream is currently active and generates the appropriate with torch.cuda.stream(...) enter/exit wrappers around kernels.
Key pieces:
- _current_stream_ctx field tracks the active EnterCudaStreamContextLine during codegen
- current_stream_idx / current_stream_name properties expose the current stream state (also used by the wrapper's nesting guard)
- generate_stream_ctx_switching() is the main entry point called per-node during codegen: it compares the node's assigned stream to the current stream and emits enter/exit code only when switching is needed. NopKernelSchedulerNodes inherit the previous stream context since they generate no kernel code.
This commit adds the codegen-time stream management; the stream assignment logic (node_to_stream, _populate_stream_assignments) and the codegen callsite are in the next commit in the stack.
Test plan
- Tests are in the third commit of the stack (test/inductor/test_user_streams.py)
ghstack-source-id: 3fb569a Pull Request resolved: pytorch/pytorch#165505
ghstack-source-id: 746cc10 Pull Request resolved: pytorch/pytorch#165505
Starting merge as part of PR stack under #174223
…74223)
Read the 'custom.stream' FX metadata to determine which stream each scheduler node should run on. This enables the inductor scheduler to:
1. Populate stream assignments BEFORE fusion to prevent fusing nodes across stream boundaries
2. Check stream assignments in can_fuse() to block cross-stream fusion
3. Handle stream context switching in codegen with proper device guard nesting
The stream metadata is set by dynamo when tracing torch.cuda.stream() context managers.
Pull Request resolved: #174223
Approved by: https://github.com/shunting314
ghstack dependencies: #165505
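The first two steps in that commit message (populate stream assignments before fusion, then refuse to fuse across streams) can be illustrated with a small standalone sketch. This is not Inductor's code: ToyFxNode, populate_stream_assignments, can_fuse and DEFAULT_STREAM are hypothetical names, and the assumption that 'custom.stream' is stored as meta["custom"]["stream"] is mine, not something the PR text confirms.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

DEFAULT_STREAM = 0  # assumption: nodes without stream metadata run on stream 0


@dataclass
class ToyFxNode:
    """Hypothetical stand-in for a node carrying FX-style metadata."""
    name: str
    meta: Dict[str, Any] = field(default_factory=dict)


def populate_stream_assignments(nodes: List[ToyFxNode]) -> Dict[str, int]:
    """Map each node name to a stream index before any fusion decision is made.

    Assumes the 'custom.stream' metadata lives at meta["custom"]["stream"].
    """
    return {
        n.name: n.meta.get("custom", {}).get("stream", DEFAULT_STREAM)
        for n in nodes
    }


def can_fuse(a: ToyFxNode, b: ToyFxNode, node_to_stream: Dict[str, int]) -> bool:
    """Allow fusion only when both nodes are assigned to the same stream."""
    return node_to_stream[a.name] == node_to_stream[b.name]


x = ToyFxNode("x")  # no metadata, so it falls back to the default stream
y = ToyFxNode("y", meta={"custom": {"stream": 1}})
z = ToyFxNode("z", meta={"custom": {"stream": 1}})

assignments = populate_stream_assignments([x, y, z])
same_stream = can_fuse(y, z, assignments)   # both on stream 1
cross_stream = can_fuse(x, y, assignments)  # stream 0 vs. stream 1
```

Computing the mapping up front is what lets the fusion check stay a plain dictionary lookup; the real scheduler applies many other fusion criteria that this sketch leaves out.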
Add comprehensive tests for user stream support in inductor:
- Stream utility tests (pool, context manager, naming)
- Event factory tests (creation, ordering, hashing)
- Wrapper codegen tests (stream context enter/exit)
- Compile tests for stream semantics preservation
The compile tests verify that torch.compile() correctly handles:
- Stream context managers
- Event record/wait operations
- Multi-stream synchronization patterns
- Fusion behavior within and across streams
Test assertions check for generated code patterns that may appear as either custom ops (record_event/wait_event) or method calls.
Pull Request resolved: #174224
Approved by: https://github.com/shunting314
ghstack dependencies: #165505, #174223
Summary
Add CUDA stream context management to the inductor scheduler's codegen phase. This is the
foundational infrastructure for multi-stream code generation: it tracks which stream is currently
active and generates the appropriate with torch.cuda.stream(...) enter/exit wrappers around
kernels.
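Key pieces:
- _current_stream_ctx field tracks the active EnterCudaStreamContextLine during codegen
- current_stream_idx / current_stream_name properties expose the current stream state (also used by the wrapper's nesting guard)
- generate_stream_ctx_switching() is the main entry point called per-node during codegen: it compares the node's assigned stream to the current stream and emits enter/exit code only when switching is needed. NopKernelSchedulerNodes inherit the previous stream context since they generate no kernel code.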
This commit adds the codegen-time stream management; the stream assignment logic (node_to_stream,
_populate_stream_assignments) and the codegen callsite are in the next commit in the stack.
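The switching rule described above (emit enter/exit code only when the node's stream differs from the current one, and let no-op nodes inherit the current context) can be sketched as a toy model. Everything here is illustrative: ToyNode, ToyStreamCodegen, switch_for and the emitted line formats are hypothetical and do not reproduce the scheduler's actual methods or generated wrapper code. Treating index 0 as "default stream, no context open" is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List

DEFAULT_STREAM = 0  # assumption: index 0 means "default stream, no context open"


@dataclass
class ToyNode:
    """Hypothetical stand-in for a scheduler node."""
    name: str
    is_nop: bool = False  # plays the role of NopKernelSchedulerNode


@dataclass
class ToyStreamCodegen:
    """Toy model of per-node stream context switching; not Inductor's API."""
    node_to_stream: Dict[str, int]
    lines: List[str] = field(default_factory=list)
    current_stream_idx: int = DEFAULT_STREAM

    def switch_for(self, node: ToyNode) -> None:
        """Emit exit/enter lines only if the node's stream differs from the current one."""
        if node.is_nop:
            return  # no kernel code is generated, so keep the previous context
        target = self.node_to_stream.get(node.name, DEFAULT_STREAM)
        if target == self.current_stream_idx:
            return  # already in the right context, nothing to emit
        if self.current_stream_idx != DEFAULT_STREAM:
            self.lines.append(f"# exit stream{self.current_stream_idx} context")
        if target != DEFAULT_STREAM:
            self.lines.append(f"with torch.cuda.stream(stream{target}):")
        self.current_stream_idx = target

    def codegen(self, nodes: List[ToyNode]) -> List[str]:
        for node in nodes:
            self.switch_for(node)
            if node.is_nop:
                continue
            indent = "    " if self.current_stream_idx != DEFAULT_STREAM else ""
            self.lines.append(f"{indent}{node.name}()")
        if self.current_stream_idx != DEFAULT_STREAM:
            # Close any context that is still open after the last node.
            self.lines.append(f"# exit stream{self.current_stream_idx} context")
            self.current_stream_idx = DEFAULT_STREAM
        return self.lines


nodes = [
    ToyNode("a"),
    ToyNode("b"),
    ToyNode("n", is_nop=True),
    ToyNode("c"),
    ToyNode("d"),
    ToyNode("e"),
]
gen = ToyStreamCodegen(node_to_stream={"b": 1, "c": 1, "d": 2})
emitted = gen.codegen(nodes)
```

In this run, b and c share one `with` block because the no-op node between them does not trigger a switch, while d forces an exit from stream 1 and an entry into stream 2.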
Test plan
- Tests are in the third commit of the stack (test/inductor/test_user_streams.py)
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @chenyang78