Support generic dynamic shape with padding #160997

Closed
nandesuka wants to merge 1 commit into pytorch:main from nandesuka:export-D80468808
Conversation

@nandesuka
Contributor

@nandesuka nandesuka commented Aug 19, 2025

Summary:
Inductor has the following configurations:

config.comprehensive_padding
config.padding_alignment_bytes
config.padding_stride_threshold

With static shapes, enabling these three options makes Inductor generate code for FlexibleLayout tensors that pads each stride up to a multiple of config.padding_alignment_bytes, for strides above config.padding_stride_threshold. When dynamic shapes are enabled, no padding is done today.
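As an illustrative sketch (not Inductor's actual implementation), this padding amounts to rounding each accumulated stride up to the next multiple of the alignment once it crosses the threshold; `align` here is in elements rather than bytes, and the helper names are hypothetical:

```python
def ceildiv(a: int, b: int) -> int:
    # Ceiling division on non-negative ints, without floats.
    return -(a // -b)

def padded_contiguous_strides(shape, align=8, threshold=0):
    """Contiguous strides for `shape`, with each non-innermost stride
    rounded up to a multiple of `align` once it exceeds `threshold`."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        raw = strides[i + 1] * shape[i + 1]
        strides[i] = ceildiv(raw, align) * align if raw > threshold else raw
    return strides

# A (4, 3, 10) tensor: the innermost extent 10 pads to 16, so the
# strides become (48, 16, 1) instead of the unpadded (30, 10, 1).
```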
This PR introduces the following configuration, which lets the user request padded strides even for dynamic-shape operations. It is opt-in so we don't change the previous behaviour of not padding dynamic-shape use cases. config.padding_stride_threshold does not apply here, since the stride values are dynamic.

config.pad_dynamic_shapes

In addition, a new mode, "python_slow", has been added for launch grid calculation; it achieves the ceildiv behaviour that is generally applicable to integer division. This is done to prevent test regressions and to make wrapper_fxir codegen more generic.
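The ceildiv behaviour in question can be sketched as follows; this is a hedged illustration of the arithmetic, not the actual wrapper_fxir codegen, and the function names are hypothetical:

```python
def ceildiv(numel: int, block: int) -> int:
    # Exact ceiling division for arbitrarily large positive ints:
    # negate, floor-divide, negate again (no float round-off).
    return -(numel // -block)

def launch_grid(xnumel: int, xblock: int) -> tuple:
    # One program per block of `xblock` elements covering `xnumel`.
    return (ceildiv(xnumel, xblock), 1, 1)

# For block > 0, ceildiv(n, block) == (n + block - 1) // block.
```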

Test Plan:
CI

Rollback Plan:

Differential Revision: D80468808

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos

@pytorch-bot

pytorch-bot bot commented Aug 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160997

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 3f1027b with merge base e4bd0ff:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D80468808

@nandesuka
Contributor Author

@pytorchbot label "release notes: inductor"

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Aug 20, 2025

@nandesuka nandesuka requested a review from jansel August 20, 2025 20:29
def replace_floor_div(expr: sympy.Expr) -> sympy.Expr:
"""
Converts floor(x / c) to x // c.
Converts -x / c or to (x + c - 1) / c

@blaine-rister Aug 22, 2025

Is this a typo?

)
pad_channels_last = False

# Control if we will do padding on dynamic shapes

@blaine-rister Aug 22, 2025

Is this to avoid performance/memory usage regressions on existing tests? I guess another option would be to set comprehensive_padding=False on those tests, but that could be a more invasive change. I'll defer to the other reviewers on this.


@nandesuka (Contributor Author)

Yup, that is the motivation. It seems there are models which have dynamic shapes with comprehensive_padding enabled but don't produce padded output today. This flag lets us keep that behaviour to prevent perf/memory regressions.

) or (isinstance(stride, sympy.Expr) and config.pad_dynamic_shapes)
new_strides[idx] = stride
if require_padding:
new_strides[idx] = ceildiv(stride, align) * align

@blaine-rister Aug 22, 2025

What kind of expression do we get for 3D tensors? It might be good to add a test case for that. I'm wondering if sympy is able to remove the extra ceildiv's on outer strides.


@nandesuka Aug 22, 2025

It looks something like this:

(8*s48*(((s87 + 7)//8)), 8*(((s87 + 7)//8)), 1)
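A quick sanity check of that expression with plain Python floor division (concrete sizes are illustrative; symbol names are taken from the expression above):

```python
def padded_strides(s48: int, s87: int) -> tuple:
    # (8*s48*((s87 + 7)//8), 8*((s87 + 7)//8), 1) from the comment above
    pad = 8 * ((s87 + 7) // 8)  # innermost extent rounded up to a multiple of 8
    return (s48 * pad, pad, 1)

# e.g. s48=3, s87=10: the 10 pads to 16, giving strides (48, 16, 1);
# the outer stride is 8-aligned by construction, so no extra ceildiv appears.
```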


@blaine-rister Aug 27, 2025

That's good. It seems like it only has one ceildiv, with the others being optimized out. This is what I was hoping to see.


@blaine-rister left a comment

Nice PR! This mostly LGTM. I left a few nits and a question about 3d testing.

pytorch-bot bot pushed a commit that referenced this pull request Aug 25, 2025

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Aug 25, 2025
nandesuka added a commit to nandesuka/pytorch that referenced this pull request Aug 25, 2025
pytorch-bot bot pushed a commit that referenced this pull request Aug 25, 2025


nandesuka added a commit to nandesuka/pytorch that referenced this pull request Sep 2, 2025

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Sep 2, 2025

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Sep 2, 2025

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Sep 2, 2025

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Sep 2, 2025

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Sep 2, 2025


@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here
