Support generic dynamic shape with padding #160997

Closed
nandesuka wants to merge 1 commit into pytorch:main from nandesuka:export-D80468808
Conversation

@nandesuka
Contributor

@nandesuka nandesuka commented Aug 19, 2025

Summary:
Inductor has the following configurations:

config.comprehensive_padding
config.padding_alignment_bytes
config.padding_stride_threshold

With static shapes, enabling these three options makes Inductor generate code for FlexibleLayout tensors that pads each stride up to a multiple of config.padding_alignment_bytes, for strides above config.padding_stride_threshold. When dynamic shapes are enabled, no padding is done today.
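As an illustrative sketch (not Inductor's actual implementation), this padding amounts to rounding each accumulated stride up to the next multiple of the alignment once it crosses the threshold; `align` here is in elements rather than bytes, and the helper names are hypothetical:

```python
def ceildiv(a: int, b: int) -> int:
    # Ceiling division on non-negative ints, without floats.
    return -(a // -b)

def padded_contiguous_strides(shape, align=8, threshold=0):
    """Contiguous strides for `shape`, with each non-innermost stride
    rounded up to a multiple of `align` once it exceeds `threshold`."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        raw = strides[i + 1] * shape[i + 1]
        strides[i] = ceildiv(raw, align) * align if raw > threshold else raw
    return strides

# A (4, 3, 10) tensor: the innermost extent 10 pads to 16, so the
# strides become (48, 16, 1) instead of the unpadded (30, 10, 1).
```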
This PR introduces the following configuration, which lets the user request padded strides even for dynamic-shape operations. It is opt-in so we don't change the previous behaviour of not padding dynamic-shape use cases. config.padding_stride_threshold does not apply here, since the stride values are dynamic.

config.pad_dynamic_shapes

In addition, a new mode, "python_slow", has been added for launch grid calculation; it achieves the ceildiv behaviour that is generally applicable to integer division. This is done to prevent test regressions and to make wrapper_fxir codegen more generic.
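The ceildiv behaviour in question can be sketched as follows; this is a hedged illustration of the arithmetic, not the actual wrapper_fxir codegen, and the function names are hypothetical:

```python
def ceildiv(numel: int, block: int) -> int:
    # Exact ceiling division for arbitrarily large positive ints:
    # negate, floor-divide, negate again (no float round-off).
    return -(numel // -block)

def launch_grid(xnumel: int, xblock: int) -> tuple:
    # One program per block of `xblock` elements covering `xnumel`.
    return (ceildiv(xnumel, xblock), 1, 1)

# For block > 0, ceildiv(n, block) == (n + block - 1) // block.
```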

Test Plan:
CI

Rollback Plan:

Differential Revision: D80468808

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos

@pytorch-bot

pytorch-bot bot commented Aug 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160997

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 3f1027b with merge base e4bd0ff:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D80468808

@nandesuka
Contributor Author

@pytorchbot label "release notes: inductor"

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Aug 20, 2025

@nandesuka nandesuka requested a review from jansel August 20, 2025 20:29
def replace_floor_div(expr: sympy.Expr) -> sympy.Expr:
"""
Converts floor(x / c) to x // c.
Converts -x / c or to (x + c - 1) / c

@blaine-rister Aug 22, 2025

Is this a typo?

)
pad_channels_last = False

# Control if we will do padding on dynamic shapes

@blaine-rister Aug 22, 2025

Is this to avoid performance/memory usage regressions on existing tests? I guess another option would be to set comprehensive_padding=False on those tests, but that could be a more invasive change. I'll defer to the other reviewers on this.


@nandesuka (Contributor Author)

Yup, that is the motivation. It seems there are models which have dynamic shapes with comprehensive_padding enabled but don't produce padded output today. This flag lets us keep that behaviour to prevent perf/memory regressions.

) or (isinstance(stride, sympy.Expr) and config.pad_dynamic_shapes)
new_strides[idx] = stride
if require_padding:
new_strides[idx] = ceildiv(stride, align) * align

@blaine-rister Aug 22, 2025

What kind of expression do we get for 3D tensors? It might be good to add a test case for that. I'm wondering if sympy is able to remove the extra ceildiv's on outer strides.


@nandesuka Aug 22, 2025

It looks something like this:

(8*s48*(((s87 + 7)//8)), 8*(((s87 + 7)//8)), 1)
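A quick sanity check of that expression with plain Python floor division (concrete sizes are illustrative; symbol names are taken from the expression above):

```python
def padded_strides(s48: int, s87: int) -> tuple:
    # (8*s48*((s87 + 7)//8), 8*((s87 + 7)//8), 1) from the comment above
    pad = 8 * ((s87 + 7) // 8)  # innermost extent rounded up to a multiple of 8
    return (s48 * pad, pad, 1)

# e.g. s48=3, s87=10: the 10 pads to 16, giving strides (48, 16, 1);
# the outer stride is 8-aligned by construction, so no extra ceildiv appears.
```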


@blaine-rister Aug 27, 2025

That's good. It seems like it only has one ceildiv, with the others being optimized out. This is what I was hoping to see.


@blaine-rister left a comment

Nice PR! This mostly LGTM. I left a few nits and a question about 3d testing.

pytorch-bot bot pushed a commit that referenced this pull request Aug 25, 2025

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Aug 25, 2025
nandesuka added a commit to nandesuka/pytorch that referenced this pull request Aug 25, 2025
pytorch-bot bot pushed a commit that referenced this pull request Aug 25, 2025


nandesuka added a commit to nandesuka/pytorch that referenced this pull request Sep 2, 2025

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Sep 2, 2025

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Sep 2, 2025

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Sep 2, 2025

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Sep 2, 2025

nandesuka added a commit to nandesuka/pytorch that referenced this pull request Sep 2, 2025


@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here
