Support stream capture of event record and wait nodes in cuda graphs by galv · Pull Request #155372 · pytorch/pytorch

galv · 2025-06-06T21:58:19Z

These are created by the user passing cudaEventRecordExternal and
cudaEventWaitExternal to cudaEventRecordWithFlags() and
cudaStreamWaitEvent() respectively.

We do this by allowing the user to specify external=True when
constructing a torch.cuda.Event().

If external=False, the cudaEventRecord and cudaStreamWaitEvent APIs
have a different meaning, described here:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cross-stream-dependencies-and-events

In short, they will be used to express fork and join operations in
the graph if external=False.
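
As a rough illustration of the fork and join case (a sketch, not code taken from this PR): with default, non-external events, recording an event on the capturing stream and waiting on it from a second stream pulls that second stream into the capture, and a second record/wait pair merges it back. The function name, stream names, and tensor values below are illustrative only, and the function returns None on machines without PyTorch or a CUDA device.

```python
try:
    import torch
except ImportError:  # lets the sketch import on machines without PyTorch
    torch = None


def capture_fork_join():
    """Capture a two-branch graph; return the replayed result, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    side = torch.cuda.Stream()
    fork = torch.cuda.Event()  # external=False is the default
    join = torch.cuda.Event()
    x = torch.ones(4, device="cuda")
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        main = torch.cuda.current_stream()  # the capturing stream
        a = x + 1
        fork.record(main)
        side.wait_event(fork)  # fork: the side stream joins the capture
        with torch.cuda.stream(side):
            b = a * 2
            join.record(side)
        c = a * 3
        main.wait_event(join)  # join: the side branch merges back
        out = b + c
    graph.replay()
    torch.cuda.synchronize()
    return out.cpu().tolist()
```

Here the events never exist as nodes in the graph; they only shape its dependency edges, which is the behavior that external=True changes.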

External events can be used for expressing a fine-grained dependency
on the outcome of some nodes in a cuda graph (rather than all
nodes). They can also be used for timing parts of a cuda graph's
execution, rather than timing the entire graph's execution.
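
A minimal sketch of the timing use case, assuming a CUDA (non-ROCm) build of PyTorch that includes this change. The function name and the choice of operations are illustrative; the only API taken from this PR is the external=True constructor argument. The function returns None when the example cannot run on the current machine.

```python
try:
    import torch
except ImportError:  # lets the sketch import on machines without PyTorch
    torch = None


def time_part_of_graph():
    """Return the elapsed ms of the timed region, or None if it cannot run here."""
    if torch is None or not torch.cuda.is_available() or torch.version.hip is not None:
        return None  # no CUDA device, or ROCm (external events are rejected there)
    try:
        start = torch.cuda.Event(enable_timing=True, external=True)
        end = torch.cuda.Event(enable_timing=True, external=True)
    except TypeError:
        return None  # this PyTorch build predates the external flag
    x = torch.randn(1 << 20, device="cuda")
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        y = x * 2.0       # not timed
        start.record()    # captured as an event record node
        z = torch.sin(y)  # the region being timed
        end.record()      # captured as an event record node
        w = z + 1.0       # not timed
    graph.replay()
    torch.cuda.synchronize()
    return start.elapsed_time(end)
```

Because the two events are recorded by nodes inside the graph, the measured interval covers only the work between them on each replay, not the whole graph launch.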

Finishes #146145

I'm a dummy and don't know how to use ghstack at this time. The first commit is a bug fix for _CudaKernel, which would previously always launch work on the NULL stream, rather than the user-passed stream.

cc @mcarilli @ezyang @eellison @penguinwu @BoyuanFeng

pytorch-bot · 2025-06-06T21:58:23Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155372

✅ No Failures

As of commit 6aaeb54 with merge base bf7e290:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

torch/cuda/_utils.py

aten/src/ATen/cuda/CUDAEvent.h

nmacchioni · 2025-06-09T19:21:37Z

Thanks for taking this over! This will unlock some very interesting possibilities for autotuning

test/test_cuda.py

ngimel

Cool, thanks so much!

ngimel · 2025-06-17T00:00:25Z

test/test_cuda.py

+    @unittest.skipIf(
+        not TEST_CUDA_GRAPH or TEST_WITH_ROCM,
+        "CUDA >= 11.0 required for external events in cuda graphs. rocm does not support external events",
+    )


Delete this? CUDA is always >= 11.0, and you are already skipping ROCm.

galv · 2025-06-17

Sure. I ask that we defer removal of the CUDA >= 11.0 checks in the rest of the cuda graphs test code and implementation code for another PR, though. I'm scared of making PRs touch too much separate code.

ngimel · 2025-06-17

Yeah, sure, but in this case it seems like a no-brainer, and it's a new test.

galv · 2025-06-17T02:29:43Z

I made a small commit addressing #155372 (comment), but CI was passing, so this PR should be good to go once it passes again!

ngimel · 2025-06-17T03:58:37Z

Lint error is real, when fixed feel free to merge

galv · 2025-06-17T17:43:40Z

@pytorchbot merge

pytorchmergebot · 2025-06-17T17:45:27Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

galv requested review from ngimel and nmacchioni June 6, 2025 21:58

galv requested review from eqy and syed-ahmed as code owners June 6, 2025 21:58

pytorchbot added the open source label Jun 6, 2025

galv added the module: cuda graphs and release notes: cuda labels Jun 6, 2025

galv commented Jun 6, 2025

torch/cuda/_utils.py (outdated, resolved)

mikaylagawarecki added the triaged label Jun 9, 2025

nmacchioni reviewed Jun 9, 2025

aten/src/ATen/cuda/CUDAEvent.h (outdated, resolved)

galv requested review from jeffdaily and jithunnair-amd as code owners June 11, 2025 16:10

galv force-pushed the dgalvez/finish-external-events-3 branch from 5237974 to 744f7b2 Compare June 12, 2025 19:53

ngimel reviewed Jun 13, 2025

test/test_cuda.py (outdated, resolved)

ngimel reviewed Jun 13, 2025

test/test_cuda.py (outdated, resolved)

ngimel approved these changes Jun 13, 2025

galv added 7 commits June 16, 2025 09:04

Support calling _CudaKernel with pinned cpu memory tensors and on the…

5eee24f

… right stream. Correctly pass stream argument to cuLaunchKernel. Previously, all kernels launched via _CudaKernel would use the NULL stream. Whoops.

hip mappings

c6b5b90

Revert "hip mappings"

1643ad5

This reverts commit 5237974a9bc3ec105b46fe2a18904a95ca3a414e. CI fails with errors like: `error: use of undeclared identifier 'hipEventWaitExternal'` Apparently the rocm version used in CI is not new enough.

Error when people try to use external events with rocm.

a5ad491

Minor fixes.

89f4df5

fixup

0f15a29

galv force-pushed the dgalvez/finish-external-events-3 branch from ed53211 to 0f15a29 Compare June 16, 2025 16:07

galv added 2 commits June 16, 2025 09:08

fixup

7f1c22a

Run test only if gpu supports compute capability>=7.0

07e13da

ngimel reviewed Jun 17, 2025

Remove redundant check.

5d5e85c

Format documentation nicely with descriptive names for URLs.

1df8c34

galv added 2 commits June 16, 2025 21:29

Repress line-too-long warning correctly.

ebf0afa

Keep noqa out of the doc string itself.

6aaeb54

pytorch-bot bot added the ciflow/trunk label Jun 17, 2025

pytorchmergebot added the merging label Jun 17, 2025

pytorchmergebot added the Merged label Jun 17, 2025

pytorchmergebot closed this in 4c0aa37 Jun 17, 2025

pytorchmergebot removed the merging label Jun 17, 2025

galv mentioned this pull request Jul 17, 2025

[CUDA] Reuse blocks with record_stream during CUDA Graph capture in the CUDACachingAllocator #158352

Closed

eee4017 mentioned this pull request Feb 28, 2026

[CUDA] Free deferred record_stream blocks at graph capture end #175817

Closed

galv wants to merge 13 commits into pytorch:main from galv:dgalvez/finish-external-events-3
