[dynamo] Fix torch.compile crash with TorchFunctionMode that has mutable state#177095
mlazos wants to merge 6 commits into gh/mlazos/144/base
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/177095
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 8 Unrelated Failures
As of commit 281adc2 with merge base 1fd1814:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…ble state

Previously, `torch_function_mode_stack_state_mgr` only cleared the C-level mode stack during `trace_frame` (via `preserve_global_state`). This meant compilation infrastructure running outside tracing (guard building, global state cleanup) would trigger real `__torch_function__` dispatch, mutating mode state (e.g. incrementing a counter) and causing the compile-time guard verification to fail with "Guard failed on the same frame it was created".

This change moves the mode stack save/clear/restore up to `compile_inner` so modes are off the C stack for the entire compilation pipeline. For guard building, modes are temporarily restored so guard expressions can reference them, but `DisableTorchFunction` prevents dispatch during construction.

Co-authored-by: Claude <noreply@anthropic.com>
ghstack-source-id: 1fbed6e
Pull-Request: #177095
…ble state Previously, `torch_function_mode_stack_state_mgr` only cleared the C-level mode stack during `trace_frame` (via `preserve_global_state`). This meant compilation infrastructure running outside tracing — guard building, global state cleanup — would trigger real `__torch_function__` dispatch, mutating mode state (e.g. incrementing a counter) and causing the compile-time guard verification to fail with "Guard failed on the same frame it was created". This change moves the mode stack save/clear/restore up to `compile_inner` so modes are off the C stack for the entire compilation pipeline. For guard building, modes are temporarily restored so guard expressions can reference them, but `DisableTorchFunction` prevents dispatch during construction. Co-authored-by: Claude <noreply@anthropic.com> ghstack-source-id: a1db80c Pull-Request: #177095
Lucaskabela left a comment:
@claude summarize what gaps might exist with this implementation?
@claude are the test failures related to this PR?
Claude finished @Lucaskabela's task in 4m 48s.
Test Failure Analysis
Neither test failure is related to this PR. Here's the evidence:
1.
2.
Both failures are pre-existing issues on the base branch (no changes to either test file in this PR).
Also responding to the inline review comments separately.
@claude why is lowering the context call here safe? Don't we risk missing some compilation with context we need?
Claude finished @Lucaskabela's task in 2m 0s.
Why lowering the context manager to
Lucaskabela left a comment:
LGTM - claude comments make sense, code is tested, and focused
@pytorchmergebot merge -i
Merge started
Your change will be merged while ignoring the following 4 checks: pull / linux-jammy-py3.14-clang15 / test (crossref, 1, 2, lf.linux.2xlarge), pull / linux-jammy-py3.10-gcc11 / test (distributed, 1, 2, lf.linux.2xlarge), inductor / unit-test / inductor-test / test (inductor, 1, 2, linux.g5.4xlarge.nvidia.gpu), inductor / inductor-cpu-test / test (cpu_inductor_torchbench, 1, 2, linux.2xlarge.amx, unstable)
Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed
Reason: 2 jobs have failed, first few of them are: linux-aarch64 / linux-jammy-aarch64-py3.10 / test (default, 3, 3, lf.linux.arm64.m7g.4xlarge), linux-aarch64 / linux-jammy-aarch64-py3.10 / test (default, 3, 3, lf.linux.arm64.m8g.4xlarge)
Details for Dev Infra team: Raised by workflow job
…ble state Previously, `torch_function_mode_stack_state_mgr` only cleared the C-level mode stack during `trace_frame` (via `preserve_global_state`). This meant compilation infrastructure running outside tracing — guard building, global state cleanup — would trigger real `__torch_function__` dispatch, mutating mode state (e.g. incrementing a counter) and causing the compile-time guard verification to fail with "Guard failed on the same frame it was created". This change moves the mode stack save/clear/restore up to `compile_inner` so modes are off the C stack for the entire compilation pipeline. For guard building, modes are temporarily restored so guard expressions can reference them, but `DisableTorchFunction` prevents dispatch during construction. Co-authored-by: Claude <noreply@anthropic.com> ghstack-source-id: c04d8df Pull-Request: #177095
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed
Reason: 1 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
…ble state Previously, `torch_function_mode_stack_state_mgr` only cleared the C-level mode stack during `trace_frame` (via `preserve_global_state`). This meant compilation infrastructure running outside tracing — guard building, global state cleanup — would trigger real `__torch_function__` dispatch, mutating mode state (e.g. incrementing a counter) and causing the compile-time guard verification to fail with "Guard failed on the same frame it was created". This change moves the mode stack save/clear/restore up to `compile_inner` so modes are off the C stack for the entire compilation pipeline. For guard building, modes are temporarily restored so guard expressions can reference them, but `DisableTorchFunction` prevents dispatch during construction. Co-authored-by: Claude <noreply@anthropic.com> ghstack-source-id: 3aa4f10 Pull-Request: #177095
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed
Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-cuda13.0-py3.10-gcc11 / test (default, 2, 5, linux.g6.4xlarge.experimental.nvidia.gpu)
Details for Dev Infra team: Raised by workflow job
…ble state Previously, `torch_function_mode_stack_state_mgr` only cleared the C-level mode stack during `trace_frame` (via `preserve_global_state`). This meant compilation infrastructure running outside tracing — guard building, global state cleanup — would trigger real `__torch_function__` dispatch, mutating mode state (e.g. incrementing a counter) and causing the compile-time guard verification to fail with "Guard failed on the same frame it was created". This change moves the mode stack save/clear/restore up to `compile_inner` so modes are off the C stack for the entire compilation pipeline. For guard building, modes are temporarily restored so guard expressions can reference them, but `DisableTorchFunction` prevents dispatch during construction. Co-authored-by: Claude <noreply@anthropic.com> ghstack-source-id: b14b28d Pull-Request: #177095
@pytorchbot merge -f "unrelated failures"
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
…ble state (pytorch#177095) Fixes pytorch#172088 Previously, `torch_function_mode_stack_state_mgr` only cleared the C-level mode stack during `trace_frame` (via `preserve_global_state`). This meant compilation infrastructure running outside tracing — guard building, global state cleanup — would trigger real `__torch_function__` dispatch, mutating mode state (e.g. incrementing a counter) and causing the compile-time guard verification to fail with "Guard failed on the same frame it was created". This change moves the mode stack save/clear/restore up to `compile_inner` so modes are off the C stack for the entire compilation pipeline. For guard building, modes are temporarily restored so guard expressions can reference them, but `DisableTorchFunction` prevents dispatch during construction. Co-authored-by: Claude <noreply@anthropic.com> Pull Request resolved: pytorch#177095 Approved by: https://github.com/Lucaskabela
…has mutable state (pytorch#177095)" This reverts commit a65094f. Reverted pytorch#177095 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](pytorch#177095 (comment)))
…ble state (pytorch#177095) Fixes pytorch#172088 Previously, `torch_function_mode_stack_state_mgr` only cleared the C-level mode stack during `trace_frame` (via `preserve_global_state`). This meant compilation infrastructure running outside tracing — guard building, global state cleanup — would trigger real `__torch_function__` dispatch, mutating mode state (e.g. incrementing a counter) and causing the compile-time guard verification to fail with "Guard failed on the same frame it was created". This change moves the mode stack save/clear/restore up to `compile_inner` so modes are off the C stack for the entire compilation pipeline. For guard building, modes are temporarily restored so guard expressions can reference them, but `DisableTorchFunction` prevents dispatch during construction. Co-authored-by: Claude <noreply@anthropic.com> Pull Request resolved: pytorch#177095 Approved by: https://github.com/Lucaskabela, https://github.com/williamwen42
Fixes #172088
Stack from ghstack (oldest at bottom):
Previously, `torch_function_mode_stack_state_mgr` only cleared the C-level mode stack during `trace_frame` (via `preserve_global_state`). This meant compilation infrastructure running outside tracing (guard building, global state cleanup) would trigger real `__torch_function__` dispatch, mutating mode state (e.g. incrementing a counter) and causing the compile-time guard verification to fail with "Guard failed on the same frame it was created".
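To make the failure mode concrete, here is a minimal toy model in plain Python. It does not use Dynamo or `torch` at all: `MODE_STACK`, `CountingMode`, `dispatch`, and `compile_frame_buggy` are hypothetical stand-ins invented for illustration, and the real guard machinery is considerably more involved. The sketch only shows how a dispatch that happens after state was recorded can invalidate a guard on the very frame that produced it.

```python
# Toy model only: none of these names are Dynamo or PyTorch APIs.
MODE_STACK = []  # stands in for the C-level torch function mode stack


class CountingMode:
    """Hypothetical mode whose state changes on every dispatch."""

    def __init__(self):
        self.calls = 0


def dispatch():
    """Stand-in for any op that would reach __torch_function__."""
    for mode in MODE_STACK:
        mode.calls += 1


def compile_frame_buggy(mode):
    """Record mode state, then run infrastructure with the mode still active."""
    expected_calls = mode.calls  # value observed while "tracing"
    dispatch()                   # guard building / cleanup dispatches for real
    return lambda: mode.calls == expected_calls


mode = CountingMode()
MODE_STACK.append(mode)
guard = compile_frame_buggy(mode)

# The guard is checked right after compilation and no longer holds,
# because the mode's counter moved from 0 to 1 in between.
guard_holds = guard()  # False
```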
This change moves the mode stack save/clear/restore up to `compile_inner` so modes are off the C stack for the entire compilation pipeline. For guard building, modes are temporarily restored so guard expressions can reference them, but `DisableTorchFunction` prevents dispatch during construction.

Co-authored-by: Claude <noreply@anthropic.com>
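The save/clear/restore pattern described above can be sketched with a plain-Python toy model. This is an assumption-laden illustration, not the code in this PR: `MODE_STACK`, `STATE`, `modes_off_stack`, `restored_for_guards`, and `compile_inner_sketch` are hypothetical names, and the `dispatch_enabled` flag merely plays the role that `DisableTorchFunction` plays during guard construction.

```python
from contextlib import contextmanager

# Toy model only: none of these names are Dynamo or PyTorch APIs.
MODE_STACK = []                     # stands in for the C-level mode stack
STATE = {"dispatch_enabled": True}  # stands in for DisableTorchFunction


class CountingMode:
    """Hypothetical mode whose state changes on every dispatch."""

    def __init__(self):
        self.calls = 0


def dispatch():
    """Stand-in for any op that would reach __torch_function__."""
    if not STATE["dispatch_enabled"]:
        return
    for mode in MODE_STACK:
        mode.calls += 1


@contextmanager
def modes_off_stack():
    """Save the stack, clear it for the whole compile, restore on exit."""
    saved = list(MODE_STACK)
    MODE_STACK.clear()
    try:
        yield saved
    finally:
        MODE_STACK[:] = saved


@contextmanager
def restored_for_guards(saved):
    """Make modes visible to guard code while suppressing dispatch."""
    previous = STATE["dispatch_enabled"]
    MODE_STACK[:] = saved
    STATE["dispatch_enabled"] = False
    try:
        yield
    finally:
        STATE["dispatch_enabled"] = previous
        MODE_STACK.clear()


def compile_inner_sketch():
    with modes_off_stack() as saved:
        expected_calls = saved[0].calls
        dispatch()  # infrastructure work: stack is empty, nothing mutates
        with restored_for_guards(saved):
            dispatch()                # suppressed during guard construction
            referenced = MODE_STACK[0]  # guard can still reference the mode
            guard = lambda: referenced.calls == expected_calls
        dispatch()  # more infrastructure work: stack is empty again
    return guard


mode = CountingMode()
MODE_STACK.append(mode)
guard = compile_inner_sketch()
guard_holds = guard()                   # True: the counter never moved
stack_restored = MODE_STACK == [mode]   # True: the mode is active again
```

In this toy, the mode's counter stays at 0 through the whole "compile", and dispatch resumes normally once the outer context manager exits.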
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo