[user-streams] Enable FX metadata stream annotations in scheduler #174223
mlazos wants to merge 19 commits into gh/mlazos/93/base
Read the 'custom.stream' FX metadata to determine which stream each scheduler node should run on. This enables the inductor scheduler to:
1. Populate stream assignments BEFORE fusion to prevent fusing nodes across stream boundaries
2. Check stream assignments in can_fuse() to block cross-stream fusion
3. Handle stream context switching in codegen with proper device guard nesting

The stream metadata is set by dynamo when tracing torch.cuda.stream() context managers.
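To make steps 1 and 2 concrete, here is a minimal pure-Python sketch of the idea, not the PR's actual code. `FxNode`, `SchedNode`, `stream_of`, `assign_streams`, and this standalone `can_fuse` are hypothetical stand-ins. The layout `meta["custom"]["stream"]`, the integer stream indices, and index 0 meaning the default stream are assumptions made for illustration.

```python
from dataclasses import dataclass, field

DEFAULT_STREAM = 0  # assumption: index 0 stands for the default stream


@dataclass
class FxNode:
    """Hypothetical stand-in for an FX node; only `meta` is modeled."""
    name: str
    meta: dict = field(default_factory=dict)


@dataclass
class SchedNode:
    """Hypothetical stand-in for an inductor scheduler node."""
    name: str
    origins: list


def stream_of(node):
    """Return the first stream annotation found on the node's FX origins."""
    for fx_node in node.origins:
        custom_meta = fx_node.meta.get("custom", {})
        if "stream" in custom_meta:
            return custom_meta["stream"]
    return DEFAULT_STREAM


def assign_streams(nodes):
    """Build the node -> stream map before any fusion decision is made."""
    return {n.name: stream_of(n) for n in nodes}


def can_fuse(a, b, node_to_stream):
    """Refuse to fuse two nodes that were assigned to different streams."""
    return node_to_stream[a.name] == node_to_stream[b.name]


a = SchedNode("a", [FxNode("add", {"custom": {"stream": 1}})])
b = SchedNode("b", [FxNode("mul", {"custom": {"stream": 1}})])
c = SchedNode("c", [FxNode("relu")])  # no annotation: default stream
node_to_stream = assign_streams([a, b, c])
print(can_fuse(a, b, node_to_stream), can_fuse(a, c, node_to_stream))
```

Building the map once, before fusion runs, means the fusion check is a plain dictionary lookup and never has to re-read FX metadata.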
Dr. CI: ✅ You can merge normally! (3 Unrelated Failures) as of commit 2f618c0 with merge base d2d12ef. One job each was classified as FLAKY (likely due to flakiness present on trunk), BROKEN TRUNK (failure also present on the merge base), and UNSTABLE (marked as unstable, possibly due to flakiness on trunk).
eellison left a comment:
Can you say more about how we guarantee the meta stream is correct? E.g., do we need to add support to FakeTensorUpdater? Do we need to make register_replacement, replace_by_example, etc., propagate stream assignment?
At one point we discussed a fork stream op that would take inputs and produce them on the output stream. Do you have any graphs I could look at?
Yeah, I have a draft of the fork/join stuff, but wanted to experiment with this first to see how complex it would be to propagate. This impl is a lot simpler, and I wanted to let Ads experiment with this first. I do remember we discussed this; do you think propagation would be complicated? With the frontier stuff, people have been relying on meta quite a bit, especially for the regional inductor work, so I was thinking it was solid.

I think that's reasonable, but I do think we should re-propagate, or have other safety mechanisms. Any graph pattern today will lose the stream annotation.

@claude review this pr
Claude finished @mlazos's task in 3m 19s.

Review: [user-streams] Enable FX metadata stream annotations in scheduler

Overall the approach is sound: reading …

1. … Similarly, … The codegen loop at line 7282 iterates over …

2. … At lines 7272-7274:

```python
unique_streams = OrderedSet(self.node_to_stream.values())
num_streams = max(unique_streams) + 1 if unique_streams else 1
```

This computes …

3. Origin iteration takes only the first stream annotation

At lines 3315-3326, when iterating over FX node origins, you break on the first origin that has a …

```python
for fx_node in origins:
    ...
    if "stream" in custom_meta:
        ...
        break
```

If a scheduler node has multiple origins with different stream annotations, this silently picks one. This could happen if an operation's inputs come from different streams. A diagnostic warning or assertion that all origins agree on the stream assignment would make debugging easier.

4. …

5. Stream context exit ordering in device guard cleanup

At lines 7261-7263:

```python
# Exit stream context before exiting device guard
if self.current_stream_idx is not None:
    self.generate_stream_ctx_exit()
```

This is correct: you need to exit the stream context before exiting the device guard. But there's no corresponding stream context exit when the codegen loop finishes (after the last node). If the last node is on a non-default stream, does the stream context get properly closed? I'd check whether there's cleanup logic at the end of the …

Summary: The main concern is issue #1: fused nodes won't be in …
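The diagnostic suggested in item 3 could look roughly like the sketch below. It is an illustration only: `resolve_stream` is a hypothetical helper, the origins are modeled as plain metadata dictionaries, and the `meta["custom"]["stream"]` layout and default index 0 are assumptions.

```python
import warnings


def resolve_stream(origin_metas, default=0):
    """Pick a stream from FX-origin metadata, warning if origins disagree."""
    seen = []  # distinct annotations, in first-seen order
    for meta in origin_metas:
        custom_meta = meta.get("custom", {})
        if "stream" in custom_meta and custom_meta["stream"] not in seen:
            seen.append(custom_meta["stream"])
    if not seen:
        return default  # no origin carries an annotation
    if len(seen) > 1:
        warnings.warn(
            f"origins disagree on stream assignment: {seen}; using {seen[0]}"
        )
    return seen[0]


agreeing = [{"custom": {"stream": 1}}, {"custom": {"stream": 1}}, {}]
conflicting = [{"custom": {"stream": 1}}, {"custom": {"stream": 2}}]
print(resolve_stream(agreeing))
```

This keeps the first-annotation behavior but makes a conflict visible. Replacing the warning with an assertion would turn the same check into a hard failure.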
```python
self.buff_to_stream[buf] = stream_idx

# Check if we have any nodes on non-default streams
self._multi_stream_nodes = any(
```
self._multi_stream_nodes = next_stream_idx > 1 ?
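Whether the suggested shortcut matches the `any(...)` scan depends on how `next_stream_idx` is maintained, which this excerpt does not show. The sketch below assumes the counter starts at 1 (index 0 reserved for the default stream) and is incremented only when a new non-default stream is first seen. `assign_indices` is a hypothetical helper written for this illustration.

```python
def assign_indices(annotations):
    """Map per-node stream annotations (None = default) to dense indices."""
    stream_to_idx = {}
    next_stream_idx = 1  # assumption: index 0 is reserved for the default stream
    node_to_stream = []
    for ann in annotations:
        if ann is None:
            node_to_stream.append(0)
            continue
        if ann not in stream_to_idx:
            stream_to_idx[ann] = next_stream_idx
            next_stream_idx += 1
        node_to_stream.append(stream_to_idx[ann])
    return node_to_stream, next_stream_idx


# Under the assumption above, both formulations agree on every input.
for anns in ([None, None], [None, "s1"], ["s1", "s2", "s1"], []):
    node_to_stream, next_stream_idx = assign_indices(anns)
    assert any(idx != 0 for idx in node_to_stream) == (next_stream_idx > 1)
```

If the counter were advanced for any other reason, for example when the default stream is registered explicitly, the two expressions could diverge.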
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours).
Add comprehensive tests for user stream support in inductor:
- Stream utility tests (pool, context manager, naming)
- Event factory tests (creation, ordering, hashing)
- Wrapper codegen tests (stream context enter/exit)
- Compile tests for stream semantics preservation

The compile tests verify that torch.compile() correctly handles:
- Stream context managers
- Event record/wait operations
- Multi-stream synchronization patterns
- Fusion behavior within and across streams

Test assertions check for generated code patterns that may appear as either custom ops (record_event/wait_event) or method calls.

Pull Request resolved: #174224
Approved by: https://github.com/shunting314
ghstack dependencies: #165505, #174223
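As a rough picture of what a stream context enter/exit test checks, the sketch below uses a hypothetical `StreamCtxEmitter` rather than inductor's wrapper codegen, whose API is not shown here. The emitted strings, the `(kernel_name, stream_idx)` schedule format, and index 0 as the default stream are all assumptions. It also closes a context left open by the last node, which is the case the review asks about.

```python
class StreamCtxEmitter:
    """Hypothetical emitter; not the real inductor wrapper codegen."""

    def __init__(self):
        self.lines = []
        self.current_stream_idx = None  # None means the default stream

    def enter(self, idx):
        self.lines.append(f"with torch.cuda.stream(stream{idx}):")
        self.current_stream_idx = idx

    def exit(self):
        self.lines.append(f"# exit stream{self.current_stream_idx}")
        self.current_stream_idx = None

    def emit(self, schedule):
        """schedule: list of (kernel_name, stream_idx); 0 = default stream."""
        for kernel, idx in schedule:
            target = idx if idx != 0 else None
            if target != self.current_stream_idx:
                if self.current_stream_idx is not None:
                    self.exit()
                if target is not None:
                    self.enter(target)
            indent = "    " if self.current_stream_idx is not None else ""
            self.lines.append(f"{indent}{kernel}.run()")
        if self.current_stream_idx is not None:
            self.exit()  # close a context left open by the last node
        return self.lines


def test_stream_ctx_enter_exit_balanced():
    lines = StreamCtxEmitter().emit([("k0", 0), ("k1", 1), ("k2", 1), ("k3", 2)])
    enters = [line for line in lines if line.startswith("with ")]
    exits = [line for line in lines if line.startswith("# exit")]
    assert len(enters) == len(exits) == 2
    assert lines[-1] == "# exit stream2"


test_stream_ctx_enter_exit_balanced()
```

Asserting on the emitted text mirrors the pattern-based assertions described above: the test only inspects generated lines and never launches a kernel.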
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo