Add option to tweak inductor stride settings for user-defined triton kernels#135530
Add option to tweak inductor stride settings for user-defined triton kernels#135530zou3519 wants to merge 2 commits intogh/zou3519/1066/basefrom
Conversation
…kernels Previously, Inductor was allowed to modify the stride/storage_offset (layout) for inputs to user-defined triton kernels. This can cause silent incorrectness because most triton kernels are written for a specific striding pattern (usually contiguous). This PR adds a config to allow the user to choose Inductor's behavior on this. The options are: - "flexible_layout" (default): Inductor can modify the layout for inputs to user-defined triton kernels as much as it wants. - "needs_fixed_stride_order": Inductor must preserve the stride order (when compared to tracing) for inputs to user-defined triton kernels. This matches our handling for custom operators. In the future, we'll want a "needs_exact_strides" option (this is the safest option). Test Plan: - new test [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135530
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure)As of commit 448c377 with merge base 5a9ac83 ( FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
…kernels Previously, Inductor was allowed to modify the stride/storage_offset (layout) for inputs to user-defined triton kernels. This can cause silent incorrectness because most triton kernels are written for a specific striding pattern (usually contiguous). This PR adds a config to allow the user to choose Inductor's behavior on this. The options are: - "flexible_layout" (default): Inductor can modify the layout for inputs to user-defined triton kernels as much as it wants. - "needs_fixed_stride_order": Inductor must preserve the stride order (when compared to tracing) for inputs to user-defined triton kernels. This matches our handling for custom operators. In the future, we'll want a "needs_exact_strides" option (this is the safest option). Test Plan: - new test ghstack-source-id: a67d19e Pull Request resolved: #135530
| debug("user_defined_triton_kernel_layout_constraints") | ||
| if ( | ||
| config.triton_kernel_default_layout_constraint | ||
| == "needs_fixed_stride_order" |
There was a problem hiding this comment.
Will strings needs_fixed_stride_order and flexible_layout be used in other places? If so, can we avoid hardcoding?
There was a problem hiding this comment.
I don't think so, I'll turn it into an enum if I end up needing it more. I wasn't sure if enums were valid Inductor config options (there's a limited set of types it accepts).
…ned triton kernels" Previously, Inductor was allowed to modify the stride/storage_offset (layout) for inputs to user-defined triton kernels. This can cause silent incorrectness because most triton kernels are written for a specific striding pattern (usually contiguous). This PR adds a config to allow the user to choose Inductor's behavior on this. The options are: - "flexible_layout" (default): Inductor can modify the layout for inputs to user-defined triton kernels as much as it wants. - "needs_fixed_stride_order": Inductor must preserve the stride order (when compared to tracing) for inputs to user-defined triton kernels. This matches our handling for custom operators. In the future, we'll want a "needs_exact_strides" option (this is the safest option). Test Plan: - new test cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
…kernels Previously, Inductor was allowed to modify the stride/storage_offset (layout) for inputs to user-defined triton kernels. This can cause silent incorrectness because most triton kernels are written for a specific striding pattern (usually contiguous). This PR adds a config to allow the user to choose Inductor's behavior on this. The options are: - "flexible_layout" (default): Inductor can modify the layout for inputs to user-defined triton kernels as much as it wants. - "needs_fixed_stride_order": Inductor must preserve the stride order (when compared to tracing) for inputs to user-defined triton kernels. This matches our handling for custom operators. In the future, we'll want a "needs_exact_strides" option (this is the safest option). Test Plan: - new test ghstack-source-id: 7c7bdd2 Pull Request resolved: #135530
eellison
left a comment
There was a problem hiding this comment.
Do we still want stride order as the api now that we support matching exact strides ?
We haven't put the exact strides logic in yet. The API should be {flexible_layout, match_stride_order, match_exact_strides}, and the ultimate default we want is match_exact_strides. We're moving there but "match_stride_order" is a good middle ground. |
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…ntroduced in #135530 [ghstack-poisoned]
…rder" (#135581) This is to match the default layout constraint for custom operators. By default, Inductor should match the stride order of inputs to a triton kernel. Test Plan: - existing tests Pull Request resolved: #135581 Approved by: https://github.com/eellison ghstack dependencies: #135530
…ntroduced in #135530 (#135656) Pull Request resolved: #135656 Approved by: https://github.com/EikanWang, https://github.com/zou3519
…kernels (pytorch#135530) Previously, Inductor was allowed to modify the stride/storage_offset (layout) for inputs to user-defined triton kernels. This can cause silent incorrectness because most triton kernels are written for a specific striding pattern (usually contiguous). This PR adds a config to allow the user to choose Inductor's behavior on this. The options are: - "flexible_layout" (default): Inductor can modify the layout for inputs to user-defined triton kernels as much as it wants. - "needs_fixed_stride_order": Inductor must preserve the stride order (when compared to tracing) for inputs to user-defined triton kernels. This matches our handling for custom operators. In the future, we'll want a "needs_exact_strides" option (this is the safest option). Test Plan: - new test Pull Request resolved: pytorch#135530 Approved by: https://github.com/FindHao, https://github.com/oulgen
…rder" (pytorch#135581) This is to match the default layout constraint for custom operators. By default, Inductor should match the stride order of inputs to a triton kernel. Test Plan: - existing tests Pull Request resolved: pytorch#135581 Approved by: https://github.com/eellison ghstack dependencies: pytorch#135530
…ntroduced in pytorch#135530 (pytorch#135656) Pull Request resolved: pytorch#135656 Approved by: https://github.com/EikanWang, https://github.com/zou3519
Stack from ghstack (oldest at bottom):
Previously, Inductor was allowed to modify the stride/storage_offset
(layout) for inputs to user-defined triton kernels. This can cause
silent incorrectness because most triton kernels are written for a
specific striding pattern (usually contiguous).
This PR adds a config to allow the user to choose Inductor's behavior on
this. The options are:
to user-defined triton kernels as much as it wants.
(when compared to tracing) for inputs to user-defined triton kernels.
This matches our handling for custom operators. In the future, we'll
want a "needs_exact_strides" option (this is the safest option).
Test Plan:
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang