
Support calling torch.compile inside non-strict export#164171

Closed
tugsbayasgalan wants to merge 6 commits into gh/tugsbayasgalan/42/base from gh/tugsbayasgalan/42/head

Conversation

@tugsbayasgalan
Contributor

@tugsbayasgalan tugsbayasgalan commented Sep 29, 2025

Stack from ghstack (oldest at bottom):

So this fixes at least two issues:

  1. When we invoke the inductor backend, we apply pre-grad passes that try to find the correct fake mode to use. In the nested case, we run into a clash when there is a closure variable in the inductor region, because non-strict export will have fakified this variable beforehand while the inner torch.compile creates a fresh fake mode. This is not a problem in regular torch.compile because the inner torch.compile gets ignored. I don't know if we are supposed to inherit the fake mode from the parent context in this case, but we can avoid the problem by defaulting to the eager backend, which is fine here because the point of export is to capture aten operators. Going through inductor would mean losing the inner torch.compile ops.
  2. There are custom torch function modes in export that track the number of torch functions executed, and an inner compile doesn't work because this mode state changes and causes a guard failure. I noticed that torch.cond fixes this problem by carefully stashing the torch function mode and deferring it to the backend, so the correct thing to do here is to reuse the torch.cond implementation unconditionally.

The things I did to fix the above:

  1. Always default to the eager backend when compile is invoked inside export. I needed to turn the way torch.cond sets up a fresh tracing env into a util that can be shared.
  2. The previous eager backend for torch.cond was wrong because the context managers didn't actually persist until the backend was invoked.
  3. torch.cond used to only disable the TorchFunctionMetadata tf mode and stash it for later, but in fact we should do this for both TorchFunctionMetadata and PreDispatchTorchFunctionMode.

With the above fixes, we are able to export flex attention.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela

@pytorch-bot

pytorch-bot bot commented Sep 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164171

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 85323e8 with merge base bac0f28:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

tugsbayasgalan added a commit that referenced this pull request Sep 29, 2025
tugsbayasgalan added a commit that referenced this pull request Sep 30, 2025
tugsbayasgalan added a commit that referenced this pull request Sep 30, 2025
@tugsbayasgalan
Contributor Author

@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 30, 2025

# Create wrapper that always uses eager backend during export
def export_wrapped_fn(*args, **kwargs):
    with setup_compilation_env(remove_pre_dispatch_tf_mode=False) as backend:
Contributor Author


@ydwu4 I need this on to capture vmap ops at pre-dispatch level inside torch.compile region. I feel we also want this for cond as well? But didn't make the change here to avoid behavior difference. Let me know what you think.

Contributor

@ydwu4 ydwu4 Sep 30, 2025


I think the correct way of doing it is to follow how we handle _temp_remove_metadata_torch_function_mode: it first pops the mode before dynamo tracing, and when dynamo finishes tracing and starts to execute the customized backend (i.e. the backend that make_eager_backend_with_torch_function_mode produces), it restores the mode so that non-strict sees the torch function mode again.

If we take this approach, my mental model would be that 1. vmap will be preserved in the dynamo-traced graph, and 2. when dynamo finishes compilation and non-strict starts to trace the graph, it will see the vmap operations, and since we restore the pre_dispatch torch function mode, we'll be able to trace the vmap operations in the non-strict export graph.

So ideally, we 1. remove the remove_pre_dispatch_tf_mode flag, and 2. merge _temp_remove_metadata_torch_function_mode and _temp_remove_pre_dispatch_tf_mode and create a unified backend that can recover all modes that were popped before dynamo tracing.

The reason we need to remove the pre_dispatch torch function mode for cond is that there are side effects created during tracing of the ops (e.g. enter_autocast_nodes was mutated).

Contributor Author


Done! I did have to change how the backend is implemented, though. Basically we need to return another callback to actually persist the modes when the gm is executed. Previously it was never actually running the modes.
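A minimal sketch of that fix, under assumed names (neither `broken_backend` nor `fixed_backend` is real PyTorch code): the broken version enters the context managers inside the backend call itself, so they have already exited by the time the graph module runs; the fixed version returns a callback that enters them around execution.

```python
import contextlib

def broken_backend(gm, ctx_factories):
    # WRONG: the contexts are entered and exited inside the backend
    # call itself, so they are no longer active when gm runs later.
    with contextlib.ExitStack() as stack:
        for factory in ctx_factories:
            stack.enter_context(factory())
    return gm

def fixed_backend(gm, ctx_factories):
    # RIGHT: return a callback that enters the contexts around the
    # actual execution of the graph module.
    def run(*args, **kwargs):
        with contextlib.ExitStack() as stack:
            for factory in ctx_factories:
                stack.enter_context(factory())
            return gm(*args, **kwargs)
    return run
```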



Differential Revision: [D83569143](https://our.internmc.facebook.com/intern/diff/D83569143)
tugsbayasgalan added a commit that referenced this pull request Oct 1, 2025
@tugsbayasgalan
Contributor Author

@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@tugsbayasgalan tugsbayasgalan requested a review from ydwu4 October 1, 2025 02:05
tugsbayasgalan added a commit that referenced this pull request Oct 1, 2025
@tugsbayasgalan
Contributor Author

@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

):
    cos: "f32[2, 2]" = torch.ops.aten.cos.default(arg0_1);  arg0_1 = None

    _set_grad_enabled = torch._C._set_grad_enabled(True);  _set_grad_enabled = None
Contributor Author


This is actually pretty tricky. The reason we saw this op in the first place is that the PreDispatchTorchFunction mode was still active while running through dynamo code; as a result, we ended up proxying dynamo's global-state restoration logic. In the new world, we disable tf modes when running through dynamo, so we don't see this anymore. This is fine because export has its own global-state restoration logic, and it seems wrong to have these in the graph anyway. cc: @ydwu4

Contributor

@ydwu4 ydwu4 left a comment


Looks good!

@ezyang
Contributor

ezyang commented Oct 1, 2025

OK... so the PR description says what bugs you are fixing... but what exactly are you doing in the PR?


with (
    _set_compilation_env(),
    torch._dynamo.utils.disable_cache_limit(),
Contributor


?

Contributor Author


This is just a codemod change.

Contributor

@ydwu4 ydwu4 Oct 2, 2025


Hi Ed, is the question why we're doing a bunch of environment patching here?

The temp-remove-function-mode change is to unblock lazos on the "dynamo inlining torch function mode" work, where hops saw state mutations inside the inlined torch function mode.

What we did is 1. pop the mode before dynamo tracing so dynamo captures a graph without the torch function mode, then 2. create a "patched eager" backend that restores the popped function modes and executes the dynamo-captured graph. This way, export/aot can still trigger the torch function modes when dispatching operators in the "patched eager" backend. Do you see any problems with this workaround? Any suggestions on how we can improve the situation?

Contributor


if it's a preexisting issue, I don't have any smart ideas here lol

@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@pytorchmergebot
Collaborator

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x 012887a6d69eeeeae41f85a8b7d0141af6651bba returned non-zero exit code 1

Auto-merging test/export/test_export.py
Auto-merging test/functorch/test_aotdispatch.py
Auto-merging torch/__init__.py
CONFLICT (content): Merge conflict in torch/__init__.py
Auto-merging torch/_higher_order_ops/cond.py
error: could not apply 012887a6d69... Support calling torch.compile inside non-strict export
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Details for Dev Infra team: raised by workflow job.

@tugsbayasgalan
Contributor Author

OK... so the PR description says what bugs you are fixing... but what exactly are you doing in the PR?

updated!

@tugsbayasgalan tugsbayasgalan requested a review from ezyang October 2, 2025 15:21
tugsbayasgalan added a commit that referenced this pull request Oct 2, 2025
@tugsbayasgalan
Contributor Author

@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@tugsbayasgalan
Contributor Author

@pytorchbot merge -f "Landed internally"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
Pull Request resolved: pytorch#164171
Approved by: https://github.com/ydwu4
@github-actions github-actions bot deleted the gh/tugsbayasgalan/42/head branch November 3, 2025 02:17