[1/2] Introduce at::accelerator::Graph as a unified Graph interface#171269
guangyey wants to merge 20 commits into gh/guangyey/265/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/171269
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (2 Unrelated Failures) As of commit 62ab0c3 with merge base f72a552. (FLAKY: the following jobs failed but were likely due to flakiness present on trunk.)
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks @eellison
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m 'This is currently breaking some internal build, I need to revert and reland this' -c ghfirst
@pytorchbot successfully started a revert job. Check the current status here.
Revert "[1/2] Introduce at::accelerator::Graph as a unified Graph interface (#171269)". This reverts commit 2afd3c1. Reverted #171269 on behalf of https://github.com/huydhn due to: This is currently breaking some internal build, I need to revert and reland this.
@guangyey your PR has been successfully reverted.
@guangyey Please help do a rebase and reland this change.
Try to reland.
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
ghstack-source-id: d3b7f3f
Pull Request resolved: pytorch/pytorch#171269
Pull Request resolved: pytorch#171269. Approved by: https://github.com/EikanWang, https://github.com/eellison
Stack from ghstack (oldest at bottom):
# Motivation

The original goal was to generalize `CUDAGraph` and share implementations and logic across different backends, as mentioned in #158827. However, after further offline discussions, we decided to take a more incremental approach: start by defining a unified interface, while allowing each backend to maintain its own implementation. This avoids premature coupling and addresses backend-specific concerns.

This PR introduces `GraphImplInterface`, a lightweight, backend-agnostic interface that defines a unified API for graph capture and replay. Each backend (e.g., `CUDA`, `XPU`, `PrivateUse1`) provides its own implementation and registers it via `REGISTER_GRAPH_IMPL`.

On top of this interface, we provide a unified graph API, `at::accelerator::Graph`, which transparently maps to:

- `CUDAGraph` on CUDA
- `XPUGraph` on XPU
- the corresponding implementations for other backends (including `PrivateUse1`)

This design establishes a common abstraction layer while preserving backend autonomy, and lays the groundwork for future sharing of logic once the interface and use cases have stabilized.
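To make the shape of this design concrete, here is a minimal standalone sketch of the pattern described above: a backend-agnostic interface, a registry of per-backend factories, and a unified front-end that forwards to whichever implementation is registered for the device. This is not the actual ATen code; the `DeviceType` enum, the registry map, and the exact method set are simplified assumptions for illustration (the method names mirror the familiar capture/replay vocabulary of `CUDAGraph`).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for ATen's device types; the real code would key
// the registry on c10::DeviceType.
enum class DeviceType { CUDA, XPU, PrivateUse1 };

// Backend-agnostic interface for graph capture and replay, mirroring the
// role of GraphImplInterface in the PR (the method set is illustrative).
struct GraphImplInterface {
  virtual ~GraphImplInterface() = default;
  virtual void capture_begin() = 0;
  virtual void capture_end() = 0;
  virtual void replay() = 0;
};

// Simplified registry: each backend registers a factory for its impl.
using GraphImplFactory = std::function<std::unique_ptr<GraphImplInterface>()>;

std::map<DeviceType, GraphImplFactory>& graph_impl_registry() {
  static std::map<DeviceType, GraphImplFactory> registry;
  return registry;
}

// Unified front-end: plays the role of at::accelerator::Graph, transparently
// mapping to the backend implementation registered for the device.
class Graph {
 public:
  explicit Graph(DeviceType device) {
    auto it = graph_impl_registry().find(device);
    if (it == graph_impl_registry().end()) {
      throw std::runtime_error("no graph impl registered for this backend");
    }
    impl_ = it->second();
  }
  void capture_begin() { impl_->capture_begin(); }
  void capture_end() { impl_->capture_end(); }
  void replay() { impl_->replay(); }

 private:
  std::unique_ptr<GraphImplInterface> impl_;
};

// Toy backend impl standing in for a real CUDAGraph implementation.
struct ToyCudaGraphImpl : GraphImplInterface {
  void capture_begin() override {}
  void capture_end() override {}
  void replay() override {}
};

// Registration at static-initialization time, so Graph(DeviceType::CUDA)
// resolves without any explicit setup by the caller.
static const bool toy_cuda_registered = [] {
  graph_impl_registry()[DeviceType::CUDA] = [] {
    return std::unique_ptr<GraphImplInterface>(new ToyCudaGraphImpl());
  };
  return true;
}();
```

The key property is that user code only ever names `Graph`; which concrete implementation runs is decided entirely by what the backend registered, which is what lets the same front-end cover CUDA, XPU, and out-of-tree devices.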
An additional benefit is that, for `CUDA` and `XPU`, the backend-specific graph types (e.g., `cuda::CUDAGraph` and `xpu::XPUGraph`) can share the same underlying implementation as `accelerator::Graph` on each backend, avoiding code duplication and ensuring consistent behavior. For `PrivateUse1`, `accelerator::Graph` can be supported with minimal effort by reusing the existing `PU1Graph` implementation.

cc @albanD @eellison @EikanWang
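A macro like `REGISTER_GRAPH_IMPL` typically works by expanding to a static object whose initializer inserts the backend's factory into the registry at program start-up. The sketch below shows how an out-of-tree `PrivateUse1` backend could plug in an existing graph type this way. The macro body and the supporting registry scaffolding are guesses at the mechanism for illustration, not the actual ATen definitions, and `PU1GraphImpl` is a hypothetical stand-in for a backend's existing `PU1Graph`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>

// Minimal re-declaration of the registry pieces so this sketch compiles on
// its own; in the real tree these would live in ATen alongside the macro.
enum class DeviceType { CUDA, XPU, PrivateUse1 };

struct GraphImplInterface {
  virtual ~GraphImplInterface() = default;
  virtual void capture_begin() = 0;
  virtual void capture_end() = 0;
  virtual void replay() = 0;
};

using GraphImplFactory = std::function<std::unique_ptr<GraphImplInterface>()>;

std::map<DeviceType, GraphImplFactory>& graph_impl_registry() {
  static std::map<DeviceType, GraphImplFactory> registry;
  return registry;
}

// A REGISTER_GRAPH_IMPL-style macro (illustrative definition): expands to a
// static bool whose initializer runs at load time and inserts the factory.
#define REGISTER_GRAPH_IMPL(device, ImplType)                       \
  static const bool registered_##ImplType = [] {                    \
    graph_impl_registry()[device] = [] {                            \
      return std::unique_ptr<GraphImplInterface>(new ImplType());   \
    };                                                              \
    return true;                                                    \
  }()

// Hypothetical adapter wrapping an out-of-tree backend's existing graph
// implementation (the PU1Graph of the PR description) to the interface.
struct PU1GraphImpl : GraphImplInterface {
  bool captured = false;
  int replays = 0;
  void capture_begin() override {}
  void capture_end() override { captured = true; }
  void replay() override { ++replays; }
};

// One line in the backend's own translation unit is all the integration
// the unified front-end needs.
REGISTER_GRAPH_IMPL(DeviceType::PrivateUse1, PU1GraphImpl);
```

This registration-at-load-time pattern is what keeps the coupling one-directional: ATen never needs to name the backend's types, which is why `PrivateUse1` backends can adopt the unified API with minimal effort.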