[dynamo] Add subgraph reuse for invoke_subgraph by anijain2305 · Pull Request #176478 · pytorch/pytorch

anijain2305 · 2026-03-04T19:44:31Z

Stack from ghstack (oldest at bottom):

Add an auto-caching mechanism for invoke_subgraph that avoids
re-tracing subgraphs when the same function is called again with
compatible inputs. On the first trace, a fingerprint (input tags,
tensor metadata, guards, traced_sources) is saved. On subsequent
calls, the cache is checked and if a match is found, the subgraph is
stamped out directly without re-tracing.
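
The caching flow described above can be illustrated with a toy model. This is a minimal sketch and not the PR's implementation: `Fingerprint`, `ReuseCache`, `get_or_trace`, and `fake_trace` are hypothetical names, the fingerprint here holds only input tags and tensor metadata (guards and traced_sources are left out), and "tracing" is replaced by a stub that returns a string.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical, reduced stand-in for the data saved on the first trace."""
    input_tags: Tuple[str, ...]        # e.g. ("tensor", "constant")
    tensor_metadata: Tuple[Any, ...]   # e.g. ((shape, dtype), ...)


@dataclass
class ReuseCache:
    """Toy cache: for each function, a list of (fingerprint, traced subgraph)."""
    entries: Dict[Callable, List[Tuple[Fingerprint, Any]]] = field(default_factory=dict)
    trace_count: int = 0

    def get_or_trace(self, fn: Callable, fingerprint: Fingerprint,
                     trace: Callable[[Callable], Any]) -> Any:
        # Cache check: reuse the saved subgraph when a fingerprint matches.
        for saved, subgraph in self.entries.get(fn, []):
            if saved == fingerprint:
                return subgraph
        # Miss: trace once and remember the result with its fingerprint.
        self.trace_count += 1
        subgraph = trace(fn)
        self.entries.setdefault(fn, []).append((fingerprint, subgraph))
        return subgraph


def layer(x):
    return x + 1


def fake_trace(fn: Callable) -> str:
    # Assumption: a string stands in for a real traced subgraph.
    return f"graph<{fn.__name__}>"


cache = ReuseCache()
fp_a = Fingerprint(("tensor",), (((4, 4), "float32"),))
fp_b = Fingerprint(("tensor",), (((8, 4), "float32"),))

first = cache.get_or_trace(layer, fp_a, fake_trace)   # traces
second = cache.get_or_trace(layer, fp_a, fake_trace)  # reuses the first result
third = cache.get_or_trace(layer, fp_b, fake_trace)   # different metadata, traces again
```

In this sketch only two of the three calls trace, because the second call presents the same fingerprint as the first.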

Authored with Claude.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo

pytorch-bot · 2026-03-04T19:44:35Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176478

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❌ 11 New Failures, 41 Pending, 2 Unrelated Failures

As of commit e1e0b20 with merge base b633f26:

NEW FAILURES - The following jobs have failed:

Lint / lintrunner-pyrefly-partial / linux-job (gh)
    >>> Lint for torch/_dynamo/variables/higher_order_ops.py:
pull / linux-jammy-py3.10-clang15 / test (crossref, 2, 2, lf.linux.2xlarge) (gh)
    test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_different_strides_in_backward
pull / linux-jammy-py3.10-clang15 / test (default, 2, 5, lf.linux.4xlarge) (gh)
    test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCPU::test_hops_in_bwd_invoke_subgraph_simple_cpu_float32
pull / linux-jammy-py3.10-clang15 / test (default, 3, 5, lf.linux.4xlarge) (gh)
    test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_different_strides_in_backward
pull / linux-jammy-py3.10-clang18-asan / test (default, 4, 7, lf.linux.4xlarge) (gh)
    test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_different_strides_in_backward
pull / linux-jammy-py3.10-gcc11 / test (default, 3, 5, lf.linux.2xlarge) (gh)
    test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCPU::test_hops_in_bwd_invoke_subgraph_simple_cpu_float32
pull / linux-jammy-py3.14-clang15 / test (crossref, 2, 2, lf.linux.2xlarge) (gh)
    test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_different_strides_in_backward
pull / linux-jammy-py3.14-clang15 / test (default, 2, 5, lf.linux.4xlarge) (gh)
    test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCPU::test_hops_in_bwd_invoke_subgraph_simple_cpu_float32
pull / linux-jammy-py3.14-clang15 / test (default, 3, 5, lf.linux.4xlarge) (gh)
    test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_different_strides_in_backward
pull / linux-jammy-py3.14t-clang15 / test (crossref, 2, 2, lf.linux.2xlarge) (gh)
    test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_different_strides_in_backward
pull / linux-jammy-py3.14t-clang15 / test (default, 3, 5, lf.linux.4xlarge) (gh)
    test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_different_strides_in_backward

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / linux-jammy-py3.10-gcc11 / test (default, 4, 5, lf.linux.2xlarge) (gh) (detected as infra flaky with no log or failing log classifier)
pull / linux-jammy-py3.14t-clang15 / test (default, 2, 5, lf.linux.4xlarge) (gh) (detected as infra flaky with no log or failing log classifier)

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot · 2026-03-04T19:44:38Z

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

anijain2305 · 2026-03-04T19:54:48Z

torch/_dynamo/variables/higher_order_ops.py

+    It is possible that a subgraph is morally reusable but does not fall
+    into the limited support that Dynamo has today. Current limitations:
+      - The subgraph must not have side effects.
+      - No variable accessed by the subgraph may have been mutated.


No sourceful variable accessed by the subgraph is mutated. Add a line explaining why this is important.
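
The comment above leaves the "why" open. As a general illustration of the hazard, and not a statement of Dynamo's actual reasoning, the toy below shows how reusing something traced before a mutation can silently use a stale value. `trace_scale` and `state` are hypothetical names, and "tracing" is modeled as reading a value once and baking it into a closure.

```python
from typing import Callable, Dict


def trace_scale(state: Dict[str, int]) -> Callable[[int], int]:
    """Pretend tracing: the captured value is read once, at trace time."""
    scale = state["scale"]

    def graph(x: int) -> int:
        # The traced "graph" multiplies by the value seen during tracing.
        return x * scale

    return graph


state = {"scale": 2}
cached_graph = trace_scale(state)  # traced while scale == 2

state["scale"] = 3                 # mutation after the first trace

stale = cached_graph(5)            # reuse: still multiplies by 2
fresh = trace_scale(state)(5)      # re-trace: multiplies by 3
```

Under these assumptions the reused result and the re-traced result disagree, which is the kind of mismatch a "no mutation" restriction would rule out.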

anijain2305 · 2026-03-04T19:55:32Z

torch/_dynamo/variables/higher_order_ops.py

+      - The subgraph must not have side effects.
+      - No variable accessed by the subgraph may have been mutated.
+      - Output must be a single tensor, or a tuple/list of plain tensors.
+      - All flattened inputs must be one of: tensor, symnode, constant,


For sourceless or flattened inputs, we rely on the pytree_spec and tags to do the checking, so only the accepted types are supported.
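
A rough sketch of what checking by pytree spec and tags could look like follows. Everything here is an assumption made for illustration: `FakeTensor`, `flatten`, `tag_of`, and `structural_key` are hypothetical helpers written in plain Python, only two tags ("tensor" and "constant") are modeled, and the real set of accepted types is whatever the PR's docstring lists.

```python
from typing import Any, List, Optional, Tuple


class FakeTensor:
    """Hypothetical stand-in for a tensor; only the shape is kept."""

    def __init__(self, shape: Tuple[int, ...]) -> None:
        self.shape = tuple(shape)


def flatten(obj: Any) -> Tuple[List[Any], Any]:
    """Flatten nested lists/tuples into leaves plus a spec describing the nesting."""
    if isinstance(obj, (list, tuple)):
        leaves: List[Any] = []
        child_specs = []
        for item in obj:
            item_leaves, item_spec = flatten(item)
            leaves.extend(item_leaves)
            child_specs.append(item_spec)
        return leaves, (type(obj).__name__, tuple(child_specs))
    return [obj], "*"  # "*" marks a leaf position


def tag_of(leaf: Any) -> Optional[str]:
    """Return a tag for supported leaf types, or None for anything else."""
    if isinstance(leaf, FakeTensor):
        return "tensor"
    if isinstance(leaf, (int, float, bool, str, type(None))):
        return "constant"
    return None


def structural_key(args: Any) -> Optional[Tuple[Any, Tuple[str, ...]]]:
    """Return (spec, tags), or None when a leaf has an unsupported type."""
    leaves, spec = flatten(args)
    tags = []
    for leaf in leaves:
        tag = tag_of(leaf)
        if tag is None:
            return None  # unsupported type: this call is not eligible for reuse
        tags.append(tag)
    return spec, tuple(tags)


same_a = structural_key((FakeTensor((2,)), 3))
same_b = structural_key((FakeTensor((5,)), 7))
as_list = structural_key([FakeTensor((2,)), 3])
unsupported = structural_key((object(),))
```

In this sketch two calls with the same nesting and leaf kinds produce the same key, a list in place of a tuple produces a different key, and an unsupported leaf type yields `None`. Comparing values and tensor metadata would be a separate step.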

anijain2305 · 2026-03-04T20:00:00Z

torch/_dynamo/variables/higher_order_ops.py

+    example_value: Any,
+    condition: "InvokeSubgraphReuseCondition",
+) -> None:
+    from torch._guards import InvokeSubgraphCache, InvokeSubgraphReuseEntry


Add docstring here.

anijain2305 · 2026-03-04T21:42:54Z

torch/_dynamo/variables/higher_order_ops.py

+    Two-phase check:
+    (1) Verify that intermediates (tensor metadata, symnode types, constant
+        values) match the cached input_checks — these are lightweight
+        structural comparisons that don't require source resolution.


(2) Check for mutation on the remapped vars.
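
The two phases discussed above could be sketched as follows. This is an illustrative model under stated assumptions, not the PR's code: `CachedEntry`, `is_reusable`, and `remap` are hypothetical, sources are modeled as plain strings, and each input check is a simple predicate.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Sequence, Set


@dataclass(frozen=True)
class CachedEntry:
    """Hypothetical cache entry: per-input checks plus sources read while tracing."""
    input_checks: Sequence[Callable[[Any], bool]]
    traced_sources: FrozenSet[str]


def is_reusable(entry: CachedEntry, flat_inputs: Sequence[Any],
                remap: Dict[str, str], mutated_sources: Set[str]) -> bool:
    # Phase 1: lightweight structural comparison of the flattened inputs.
    if len(flat_inputs) != len(entry.input_checks):
        return False
    if not all(check(value) for check, value in zip(entry.input_checks, flat_inputs)):
        return False
    # Phase 2: remap the cached sources to the current call, then reject
    # reuse if any remapped source has been mutated.
    remapped = {remap.get(source, source) for source in entry.traced_sources}
    return remapped.isdisjoint(mutated_sources)


entry = CachedEntry(
    input_checks=(lambda v: isinstance(v, int), lambda v: v == "relu"),
    traced_sources=frozenset({"layers[0].weight"}),
)
remap = {"layers[0].weight": "layers[1].weight"}

ok = is_reusable(entry, [3, "relu"], remap, set())
blocked = is_reusable(entry, [3, "relu"], remap, {"layers[1].weight"})
mismatch = is_reusable(entry, [3, "gelu"], remap, set())
```

Running the cheap structural checks first means the source remapping and mutation lookup only happen for candidates that already match structurally.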

anijain2305 requested review from aorenste and zou3519 as code owners March 4, 2026 19:44

anijain2305 mentioned this pull request Mar 4, 2026

[dynamo] Fix is_from_source to match at intermediate chain levels #176452

Closed

This was referenced Mar 4, 2026

[dynamo] Track side effect stack on SubgraphTracer #176453

Closed

[dynamo] Record traced_sources on SubgraphTracer #176459

Closed

[dynamo] Track mutated_sources on SideEffects for precise mutation detection #176477

Closed

pytorch-bot bot added ciflow/inductor module: dynamo labels Mar 4, 2026

anijain2305 mentioned this pull request Mar 4, 2026

[dynamo] Add inline_invoke_subgraph post-tracing pass #176082

Closed

anijain2305 mentioned this pull request Mar 4, 2026

[dynamo] Add GUARD_VALUE_DISPATCH table for guard evaluation #176033

Closed

anijain2305 mentioned this pull request Mar 5, 2026

[fx] Fix quadratic name generation in _NamespaceBase.create_name #176515

Closed

anijain2305 closed this Mar 5, 2026

anijain2305 wants to merge 9 commits into gh/anijain2305/1061/base from gh/anijain2305/1061/head
