Enable output padding when only outermost dim is dynamic #159404
nandesuka wants to merge 1 commit into pytorch:main
Conversation
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159404. As of commit 7095895 with merge base ee9f8ba: ❌ 26 new failures, 2 unrelated failures. The broken-trunk jobs also failed on the merge base; 👉 rebase onto the `viable/strict` branch to avoid them. (This comment was generated automatically by Dr. CI and updates every 15 minutes.)
This pull request was exported from Phabricator. Differential Revision: D79146886
Summary: When the shape of the output tensor has a dynamic outermost dim, the stride can still be padded to conform to the configured alignment if specified by the padding config.

Test Plan: CI

Differential Revision: D79146886
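For context, a minimal sketch of the inductor padding knobs this refers to; the option names come from `torch/_inductor/config.py` (and `config.padding_stride_threshold` appears in the review below), but treat the exact values as illustrative assumptions rather than recommended settings:

```python
# Sketch: inductor config knobs that govern stride padding
# (values are illustrative assumptions, not defaults).
from torch._inductor import config as inductor_config

inductor_config.comprehensive_padding = True     # allow padding of intermediate strides
inductor_config.padding_alignment_bytes = 128    # alignment target for padded strides
inductor_config.padding_stride_threshold = 1024  # only pad strides larger than this
inductor_config.pad_outputs = True               # extend padding to graph outputs too
```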
eellison left a comment:

Nice generalization, but one comment.
Also, there's nothing preventing us from padding dynamic strides. If you look at the sdpa lowering in lowerings.py, we pad the inputs to dynamically aligned strides.
The only thing to think about is the padding-stride-threshold heuristic in the lowering.
torch/_inductor/ir.py (outdated diff):

```diff
 if not all(
     isinstance(s, (int, sympy.Integer))
-    for s in itertools.chain(in_strides, size)
+    for s in itertools.chain(in_strides, size[1:])
```
The stride that stays non-dynamic even when the corresponding dimension is dynamic belongs to the least dense dimension, not necessarily the last dimension. It's the last dimension only when the tensor is contiguous.
Would it be sufficient to just check strides here? 🤔
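To make that concrete, a small illustration (hypothetical, not code from the PR): for a contiguous tensor, no stride depends on the outermost dim's size, and that property survives a transpose even though the dynamic dim is then no longer `size[0]`.

```python
import torch

# Stand-in for the dynamic case: think of dim 0 as a symbolic batch size s0.
x = torch.empty(8, 128, 64)
assert x.stride() == (8192, 64, 1)  # 8192 = 128 * 64 never involves dim 0's size

# After a transpose, the "dynamic" dim is no longer at position 0, yet every
# stride is still independent of it: a size[1:] check would wrongly reject
# this layout, while a stride check would accept it.
y = x.transpose(0, 1)               # shape (128, 8, 64)
assert y.stride() == (64, 8192, 1)
```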
Is there a good way to determine which dim is the outermost dim in the shape? Perhaps...
What about checking `isinstance(s, (int, sympy.Integer))` to see if the strides are static, then picking the least one? A good test case would be add -> transpose fusion.
To generalize this to symbolic strides, I have a hunch we could sort them by `V.graph.sizevars.statically_known_leq`. For example, if `min(s0, s1) >= 1`, then `1 <= s0 <= s0 * s1`. But if we want a strict ordering, we may need to assume `min(s0, s1) > 1`, or substitute all the symbols with 2. (This makes a pretty strong assumption about what expressions strides can come from.) If this is false, then the strides will end up being equal anyway, so it may not matter which one we pick.
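A quick sketch of that substitute-with-2 trick (a hypothetical helper, not inductor code), assuming strides are sympy expressions in positive integer symbols:

```python
import sympy

s0, s1 = sympy.symbols("s0 s1", positive=True, integer=True)
strides = [s0 * s1, 1, s0]

def order_key(stride):
    # Substituting 2 for every free symbol gives a total order that matches the
    # symbolic order whenever each symbol is > 1; if some symbol equals 1, the
    # tied strides are equal anyway, so the choice between them doesn't matter.
    expr = sympy.sympify(stride)
    return expr.subs({sym: 2 for sym in expr.free_symbols})

print(sorted(strides, key=order_key))  # [1, s0, s0*s1]
```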
Do we even need to check the shapes here? If we check the strides, wouldn't that encompass the above check?
Yup, I think you're right: we just need to check strides. I can take a look at sdpa in a follow-up. We should probably generalize the padding to all dimensions having dynamic shapes, if possible.
@eellison @nandesuka I took another look at the existing code. I'm wondering if this comment is up to date. It seems like it calls `get_stride_order` to determine the least stride, and this uses size hints under the hood when it sees dynamic shapes. So static shapes may be too strict a requirement, although we might still require backed symints.
I'm wondering if the existing logic can generalize to dynamic shapes/strides if we relax checks like

```python
if stride > config.padding_stride_threshold and stride % align != 0:
```

to something like this?

```python
if V.graph.sizevars.statically_known_geq(stride, config.padding_stride_threshold):
```

Similarly, dynamic strides could be computed with sympy:

```python
stride = ceildiv(stride, align) * align
```

becomes

```python
stride = CeilDiv(stride, align) * align
```

That being said, the dynamic stride formulae could be pretty complex, so maybe it's more practical to only pad when `isinstance(stride, (int, sympy.Integer))`.

Alternatively, if we want to fully support dynamic-shape padding, there's a trick which could simplify the formulae: the least stride is always 1, and the second-least stride is `CeilDiv(stride, align) * align`. Then all the other strides satisfy `stride[k] = shape[k - 1] * stride[k - 1]`. So it seems like we really only need a complex formula for the second-least stride. Does that seem right?
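As a sanity check of that trick, a sketch (assuming a contiguous layout with at least two dims; `sympy.ceiling` stands in for inductor's symbolic `CeilDiv`):

```python
import sympy

def pad_strides(sizes, align):
    # Least stride stays 1; the second-least stride gets the ceiling formula;
    # every outer stride is just size[k] * (next-inner padded stride).
    align = sympy.Integer(align)
    strides = [sympy.Integer(1)]
    strides.insert(0, sympy.ceiling(sizes[-1] / align) * align)
    for size in reversed(sizes[1:-1]):
        strides.insert(0, size * strides[0])
    return strides

s0 = sympy.Symbol("s0", positive=True, integer=True)
print(pad_strides([s0, 128, 60], 32))  # [8192, 64, 1]
```

Note that with a dynamic outermost dim and static inner dims, every padded stride comes out static, which is exactly the case this PR enables.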
For this PR, would it make sense to pad only static strides, then do the dynamic shapes in a follow-up?
Makes sense to me. The static inner stride case comes up a lot in practice with things like dynamic batch size.
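For reference, a usage sketch of that case (`torch._dynamo.mark_dynamic` is the real API; the function and the CUDA device are placeholders):

```python
import torch

@torch.compile
def f(x):
    return torch.relu(x) * 2

x = torch.randn(8, 128, 60, device="cuda")
torch._dynamo.mark_dynamic(x, 0)  # only the outermost (batch) dim is dynamic
out = f(x)  # inner strides stay static ints, so the output can still be padded
```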
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)

Merge failed. Reason: This PR needs a `release notes:` label; if it is not user facing, please add the `topic: not user facing` label instead. To add a label, comment to pytorchbot, for example `@pytorchbot label "release notes: inductor"`. (Raised by workflow job; details for the Dev Infra team.)
@pytorchbot label "release notes: inductor"
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"

This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
@pytorchbot successfully started a revert job. Check the current status here.
@nandesuka your PR has been successfully reverted.
Abandoning in favour of: #160997
Summary: When the shape of the output tensor has a dynamic outermost dim, the stride can still be padded to conform to the configured alignment if required.

Test Plan: CI

Reviewed By: blaine-rister, eellison

Differential Revision: D79146886
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos