[HIP] Fix hipEventQuery/hipEventSynchronize stream capture mode restrictions #3176
Merged
JeniferC99 merged 1 commit into release/rocm-rel-7.2 on Feb 13, 2026
mangupta approved these changes Feb 12, 2026
Collaborator: manual psdb triggered: http://rocm-ci.amd.com/job/compute-psdb-rel-7.2/350
pianpwk pushed a commit to pytorch/pytorch that referenced this pull request Mar 17, 2026

This PR introduces a workaround for the HIP runtime bug (#177309) where `hipEventQuery` from a non-capturing thread invalidates graph captures on other threads, even in `THREAD_LOCAL` mode (ROCm/rocm-systems#3176). The NCCL/RCCL watchdog's polling queries hit this.

### Code Changes

#### `ProcessGroupNCCL.cpp`

- `queryEventWithRocmWatchdogCaptureWorkaround()` wraps `CUDAEvent::query()` logic:
  - Watchdog calling during active capture: skips the query, returns false (not ready)
  - Otherwise queries normally, but catches `hipErrorCapturedEvent` / `hipErrorStreamCaptureUnsupported` from the watchdog and maps them to "not ready" for race conditions
- `RocmWatchdogEventQueryContextGuard` thread-local guard set in `runLoop()` so the skip path only activates on the watchdog; main-thread `wait()`/`isCompleted()` unchanged
- Timeout checks gated on `!is_graph_capture_active()` to avoid false positives while queries are skipped

#### `CUDAGraph.cpp/h`

- `is_graph_capture_active()` reads the existing `_currently_capturing_graphs` map under its mutex
- `capture_end()` erases the map entry before `AT_CUDA_CHECK` so the watchdog never sees stale state on error paths

All `#ifdef USE_ROCM`. TODO to remove once the HIP runtime fix ships.

Pull Request resolved: #176251
Approved by: https://github.com/jeffdaily, https://github.com/ngimel
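The control flow described in that commit message can be sketched in plain C++ without HIP or PyTorch. This is only a minimal model under stated assumptions: `g_active_captures`, `isGraphCaptureActiveModel`, `WatchdogContextGuardModel`, `CaptureError`, and `queryEventModel` are hypothetical stand-ins for `_currently_capturing_graphs`, `is_graph_capture_active()`, `RocmWatchdogEventQueryContextGuard`, the two HIP capture errors, and `queryEventWithRocmWatchdogCaptureWorkaround()`. The real PyTorch code is not reproduced here and may differ.

```cpp
#include <atomic>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for the capture registry that
// is_graph_capture_active() is described as reading.
std::atomic<int> g_active_captures{0};

bool isGraphCaptureActiveModel() { return g_active_captures.load() > 0; }

// Thread-local marker: true only on a thread that declared itself the watchdog.
thread_local bool t_is_watchdog = false;

// Hypothetical RAII guard modeled on the described thread-local guard.
class WatchdogContextGuardModel {
 public:
  WatchdogContextGuardModel() : prev_(t_is_watchdog) { t_is_watchdog = true; }
  ~WatchdogContextGuardModel() { t_is_watchdog = prev_; }
  WatchdogContextGuardModel(const WatchdogContextGuardModel&) = delete;
  WatchdogContextGuardModel& operator=(const WatchdogContextGuardModel&) = delete;

 private:
  bool prev_;
};

// Hypothetical exception standing in for a query that failed with
// hipErrorCapturedEvent or hipErrorStreamCaptureUnsupported.
struct CaptureError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Returns true if the event is complete, false if "not ready".
// rawQuery is an assumed callback that performs the real event query.
bool queryEventModel(const std::function<bool()>& rawQuery) {
  // Watchdog during an active capture: skip the query entirely.
  if (t_is_watchdog && isGraphCaptureActiveModel()) {
    return false;
  }
  try {
    return rawQuery();
  } catch (const CaptureError&) {
    // A capture may have started between the check and the query.
    // Only the watchdog maps this race to "not ready".
    if (t_is_watchdog) {
      return false;
    }
    throw;
  }
}
```

In this model, a thread that never constructs the guard keeps the unmodified query behavior, including propagated errors, which mirrors the statement that main-thread `wait()`/`isCompleted()` are unchanged.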
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 27, 2026
Same commit message as the pytorch/pytorch commit above (Pull Request resolved: pytorch#176251), cherry picked from commit 5ae3a6f.
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Same commit message as the pytorch/pytorch commit above (Pull Request resolved: pytorch#176251).
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
Same commit message as the pytorch/pytorch commit above (Pull Request resolved: pytorch#176251).
Motivation
This fix addresses an issue where hipEventQuery and hipEventSynchronize incorrectly returned hipErrorStreamCaptureUnsupported when called from a thread that had switched to RELAXED or THREAD_LOCAL capture mode while another thread had a GLOBAL capture active.
Technical Details
- Added checkEventCaptureRestrictions() helper function to properly handle stream capture mode restrictions for event operations
- RELAXED mode threads can now query/sync events during cross-thread captures
- THREAD_LOCAL mode threads skip cross-thread GLOBAL capture checks
- Events recorded during capture still correctly return hipErrorCapturedEvent
- Added comprehensive unit tests for all capture mode combinations
This fix is critical for PyTorch/RCCL integration where watchdog threads need to call hipEventQuery while another thread has GLOBAL capture active.
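The rules listed above can be read as a small decision table. The sketch below models that table in standalone C++ so it compiles without the HIP runtime. It is an assumption-laden illustration, not the body of checkEventCaptureRestrictions(): the enum names, the `EventQueryContext` fields, and the function `checkEventCaptureRestrictionsModel` are all hypothetical, and the handling of a thread's own active capture follows the general capture-mode semantics rather than anything stated in this PR.

```cpp
// Hypothetical mirror of the three stream capture modes.
enum class CaptureMode { Global, ThreadLocal, Relaxed };

// Hypothetical mirror of the relevant return codes
// (hipSuccess, hipErrorCapturedEvent, hipErrorStreamCaptureUnsupported).
enum class Status { Success, CapturedEvent, StreamCaptureUnsupported };

// Assumed inputs to the decision; the real helper's parameters are not shown in the PR text.
struct EventQueryContext {
  CaptureMode callerMode;            // current capture mode of the calling thread
  bool eventRecordedInCapture;       // event was recorded inside a capturing stream
  bool callerHasActiveCapture;       // calling thread has its own non-relaxed capture
  bool otherThreadHasGlobalCapture;  // some other thread has a GLOBAL capture active
};

constexpr Status checkEventCaptureRestrictionsModel(const EventQueryContext& c) {
  // Captured events are rejected regardless of the caller's mode.
  if (c.eventRecordedInCapture) {
    return Status::CapturedEvent;
  }
  switch (c.callerMode) {
    case CaptureMode::Relaxed:
      // RELAXED threads may query/sync during cross-thread captures.
      return Status::Success;
    case CaptureMode::ThreadLocal:
      // THREAD_LOCAL threads ignore GLOBAL captures on other threads.
      return c.callerHasActiveCapture ? Status::StreamCaptureUnsupported
                                      : Status::Success;
    case CaptureMode::Global:
      // GLOBAL threads are restricted by any relevant capture.
      return (c.callerHasActiveCapture || c.otherThreadHasGlobalCapture)
                 ? Status::StreamCaptureUnsupported
                 : Status::Success;
  }
  return Status::Success;
}

// The motivating case: a watchdog thread in RELAXED mode queries an ordinary
// event while another thread has a GLOBAL capture active.
static_assert(checkEventCaptureRestrictionsModel(
                  {CaptureMode::Relaxed, false, false, true}) == Status::Success,
              "relaxed watchdog query should succeed");
static_assert(checkEventCaptureRestrictionsModel(
                  {CaptureMode::Relaxed, true, false, true}) == Status::CapturedEvent,
              "captured events are still rejected");
```

The model checks the captured-event case first so that relaxing the cross-thread restriction never hides hipErrorCapturedEvent, matching the fourth bullet above.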
JIRA ID
SWDEV-579185
Test Plan
NA
Test Result
NA