Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed by ezyang · Pull Request #164939 · pytorch/pytorch

ezyang · 2025-10-08T15:14:15Z

Stack from ghstack (oldest at bottom):

-> Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed #164939

This fixes AOTAutograd's rms_norm not being bitwise equivalent to
eager mode, by avoiding a decomposition. You can still force the
decomposition by putting it in the dispatch table, but if eager mode
wouldn't have decomposed (because it dispatched to the fused kernel),
we now preserve the fused call by default.
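The decision rule described above can be sketched as a toy model in plain Python. Everything here is hypothetical and is not PyTorch's API: `ToyOp`, `eager_would_decompose`, `should_decompose`, and the `decomp_table` dict (standing in for the dispatch/decomposition table) are illustration-only names. The toy also assumes that "eager would decompose" means "the op has a CompositeImplicitAutograd decomposition and no dedicated kernel for the key". Real dispatch involves many more keys and cases.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Mapping, Optional


@dataclass(frozen=True)
class ToyOp:
    """Hypothetical stand-in for an operator; not a PyTorch class."""
    name: str
    # Dispatch keys that have a dedicated (e.g. fused) kernel in this toy model.
    kernels: FrozenSet[str]
    # CompositeImplicitAutograd (CIA) decomposition, if the op has one.
    cia_decomp: Optional[Callable] = None


def eager_would_decompose(op: ToyOp, key: str = "Autograd") -> bool:
    # Toy assumption: eager runs the CIA decomposition only when no
    # dedicated kernel is registered for this key.
    return op.cia_decomp is not None and key not in op.kernels


def should_decompose(op: ToyOp, decomp_table: Mapping[str, Callable],
                     key: str = "Autograd") -> bool:
    # 1. An explicit entry in the decomposition table forces decomposition.
    if op.name in decomp_table:
        return True
    # 2. Otherwise mirror eager: keep the op intact if eager would have.
    return eager_would_decompose(op, key)


if __name__ == "__main__":
    fused = ToyOp("toy_fused_norm", frozenset({"Autograd", "CUDA"}),
                  cia_decomp=lambda x: x)
    cia_only = ToyOp("toy_cia_only", frozenset(), cia_decomp=lambda x: x)
    print(should_decompose(fused, {}))                                  # False
    print(should_decompose(fused, {"toy_fused_norm": fused.cia_decomp}))  # True
    print(should_decompose(cia_only, {}))                               # True
```

The point of the sketch is the ordering of the two checks. An explicit table entry wins. Otherwise, tracing follows whatever eager would have done, so a fused call survives into the graph.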

This largely reverts #103275 for view ops. This means that in inference mode we could hit the wrong C++ kernel; if this occurs we should just SymInt'ify the C++ kernel.

Another neat side effect of this change is that Inductor's generated kernels for rms_norm now have rms_norm in their name.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

cc @EikanWang @jgong5 @wenzhe-nrv

[ghstack-poisoned]

pytorch-bot · 2025-10-08T15:14:20Z

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164939

❌ 1 New Failure, 4 Pending, 2 Unrelated Failures

As of commit 7c9ddc1 with merge base 4f8a986:

NEW FAILURE - The following job has failed:

inductor / inductor-test / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu)
shufflenet_v2_x1_0

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

trunk / macos-py3-arm64 / test (mps, 1, 1, macos-m1-14) (detected as infra flaky with no log or failing log classifier)
trunk / macos-py3-arm64 / test (mps, 1, 1, macos-m2-15) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…'t have decomposed This fixes AOTAutograd rms_norm not being bitwise equivalent to eager, because it avoids a decomposition. You can force the decomposition by having the decomposition in the dispatch table, but if eager mode wouldn't have decomposed (because it went to the fused one), we now default to preserving the fused call by default. Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: b35ae76 Pull-Request: #164939

bdhirsh · 2025-10-08T15:17:11Z

c10/core/DispatchKeySet.cpp

-    DispatchKeySet{DispatchKey::NestedTensor} |
-    // Functionalize should always reuse CompositeImplicit decomps.
-    DispatchKeySet{DispatchKey::Functionalize};
+    DispatchKeySet{DispatchKey::NestedTensor};


I could imagine this wobbling some tests.

Specifically, with the full set of PR changes we can rely on Python functionalization decomposing CIA (CompositeImplicitAutograd) ops, but if you are only running C++ functionalization, we will no longer decompose CIA ops. This might wobble tests?

It might also be a problem if you are running C++-only functionalization and you have a CIA decomp that desugars into mutations.

You were indeed right.

bdhirsh

Stamp to unblock; sounds good to me if tests pass.


pytorchmergebot · 2025-10-10T23:36:33Z

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / inductor-test / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu)


ezyang · 2025-10-11T01:02:01Z

@pytorchbot merge -f "unrelated problems"

pytorchmergebot · 2025-10-11T01:03:37Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


…n in AOTDispatcher" I'm cleaning this PR up as a proper way of disabling functionalization via config in AOTDispatcher. I removed the non-functionalization related changes from the original version: (1) preventing proxy mode (and functionalization) from incorrectly decomposing CIA ops (Ed has a PR for it here: #164939) (2) preventing python-dispatcher-based decomps above autograd from running. I'm not doing this for now, will likely do it in a followup cc ezyang EikanWang jgong5 wenzhe-nrv [ghstack-poisoned]
