Normalize placeholder names in AOTAutogradCache#157916
jamesjwu wants to merge 16 commits into gh/jamesjwu/172/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157916
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 unrelated failure.) As of commit d13df66 with merge base 1f57e0e. UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
@contextlib.contextmanager
def normalize_placeholder_names(gm: torch.fx.GraphModule):
```
FWIW idk if this is the best way to implement this function: I had to hack in a few extra changes to each node like changing the target and used_names in the namespace so that we properly reconstruct the dynamo graph each time.
I wish it was easier than this, but I can't think of an easier way that isn't considerably slower, i.e. copying the entire graph. Tests seem to pass with this approach, though.
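The normalization idea itself can be sketched without FX at all: walk the placeholders in order and map each non-symbolic name to a canonical positional name, leaving symbolic names alone. This is a minimal, torch-free sketch; the function and naming scheme (`argN`) are illustrative, not taken from the PR:

```python
def normalize_names(placeholders):
    """Map arbitrary placeholder names to canonical positional names.

    `placeholders` is a list of (name, is_symbolic) pairs. Symbolic
    placeholders keep their original names (their hashing is already
    consistent); everything else becomes arg0, arg1, ... in order.
    """
    mapping = {}
    i = 0
    for name, is_symbolic in placeholders:
        if is_symbolic:
            mapping[name] = name  # leave symbolic names untouched
        else:
            mapping[name] = f"arg{i}"
            i += 1
    return mapping

# Two graphs that differ only in placeholder names normalize identically:
a = normalize_names([("L_x_", False), ("s21", True), ("L_y_", False)])
b = normalize_names([("L_foo_", False), ("s21", True), ("L_bar_", False)])
assert list(a.values()) == list(b.values())  # ['arg0', 's21', 'arg1']
```

The key property is that the output names depend only on position and symbolic-ness, never on the dynamo-assigned source names.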
```python
old_placeholder_names = [
    str(n.target)
    for n in gm.graph.nodes
    if n.op == "placeholder" and n.type != torch.SymInt
]
```
@bobrenjc93 I'm pretty sure this isn't the way to check if the graph node is symbolic. Is there a preferred way? I don't want to rename the symbolic placeholders because symbolic hashing already makes sure they're consistent.
```python
"""\
class GraphModule(torch.nn.Module):
    def forward(self, L_inputs_ : list, s69: "Sym(s21)", L_sizes_0_: "f32[0, s21]"):
        l_inputs_ = L_inputs_
```
cc @anijain2305 , this is what I mean: the current implementation of normalize_placeholder_for_gm() will remove some of these unnecessary local variable name changes. I can't figure out a clean way to preserve these (from what I can tell, unnecessary) name changes
I bit the bullet and decided to just make a copy of the graph module instead of the hacky thing I was doing before. This way I can make edits to the new gm without affecting the old one at all, which greatly simplifies the code. The cost here is an extra copy, but until I see evidence that this is going to affect compile times I think it's fine. (For low-overhead scenarios like precompile, we're not calculating any cache keys anyway.)
Ok, I figured out a cleaner way to do it that does not modify the graph: we just have to be careful about preserving the old namespace when renaming nodes. I think this new context manager is idempotent and also cleaner, while not requiring us to clone the entire graph. It obviously still needs to do some weird things like preserving the old used_names set, but I think it's much more reasonable now.
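The rename-and-restore shape of that context manager can be sketched in plain Python. A toy `Namespace` class stands in for the FX graph namespace here; the class and function names are illustrative, not the PR's actual implementation:

```python
from contextlib import contextmanager
from copy import copy

class Namespace:
    """Toy stand-in for an FX graph's name namespace."""
    def __init__(self, used_names):
        self._used_names = set(used_names)

@contextmanager
def normalized(ns, renames):
    """Temporarily apply `renames` inside the namespace, then restore
    the saved used-name set on exit so re-entering is idempotent."""
    old_used_names = copy(ns._used_names)
    try:
        for old, new in renames.items():
            ns._used_names.discard(old)
            ns._used_names.add(new)
        yield ns
    finally:
        # Restore the original namespace; no graph copy was needed.
        ns._used_names = old_used_names

ns = Namespace({"L_x_", "L_y_"})
with normalized(ns, {"L_x_": "arg0", "L_y_": "arg1"}):
    assert ns._used_names == {"arg0", "arg1"}
assert ns._used_names == {"L_x_", "L_y_"}  # restored on exit
```

Because the old state is captured before any mutation and restored in `finally`, nesting or re-entering the manager always sees the same starting namespace.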
```python
old_placeholder_names = []
old_used_names = copy(gm.graph._graph_namespace._used_names)
i = 0
for n in gm.graph.find_nodes(op="placeholder", sort=True):
```
will this noticeably impact perf for models that have a lot of parameters?
If this is too expensive then we are gonna need to do some tradeoff analysis. If vLLM is the only situation that benefits from this, vLLM can just manually do the input name normalization if needed.
I'll kick off a benchmark to confirm, but I feel like the cost of looking for placeholder nodes is relatively cheap.
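That intuition is easy to sanity-check outside of PyTorch: a linear scan over even a very large node list is cheap relative to a compile. A rough, torch-free sketch (the node counts are made up for illustration):

```python
import time

# Simulate a graph with many parameters: (op, name) tuples stand in
# for FX nodes; 10k placeholders among 110k total nodes.
nodes = [("placeholder", f"p{i}") for i in range(10_000)] + \
        [("call_function", f"n{i}") for i in range(100_000)]

start = time.perf_counter()
placeholders = [name for op, name in nodes if op == "placeholder"]
elapsed = time.perf_counter() - start

assert len(placeholders) == 10_000
# elapsed is typically on the order of milliseconds, which is
# negligible next to a cache lookup or a compile.
```

This only bounds the scan itself; the real cost question in the thread is whether the renaming bookkeeping adds measurable overhead, which is what the benchmark run answers.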
Seems fine on our OSS benchmarks. I will land it with is_fbcode() false for now, so I can test it internally as well; I'm having some trouble with ghimport. For vLLM, they can set that config flag on for now until it's on by default internally.
```python
bundled_autograd_cache: bool = False

# Whether or not to normalize placeholder names in graphs
# from dynamo in AOTAutogradCache
```
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
This PR adds a pass to sanitize_gm_for_cache which normalizes all placeholder names across input dynamo graphs to AOTAutogradCache. This is safe because nothing underneath AOTAutograd uses the node names on the original dynamo graph: AOTAutograd re-traces with its own nodes, and guards are in terms of original sources rather than placeholder names.
Note that the dynamo output graphs shown in tlparse will not reflect this change, because they are logged before this sanitization step runs. The AOTAutograd outputs also will not change, because AOTAutograd's own traced graphs don't use the original placeholders of the dynamo graph. Thus, this change is essentially a no-op from everyone's perspective except for cache key checks.
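The cache-key effect can be illustrated with a toy hash; this is a deliberately simplified stand-in (string replacement plus SHA-256), not how AOTAutogradCache actually computes its keys:

```python
import hashlib

def cache_key(graph_src: str) -> str:
    """Toy cache key: hash of the graph's source text."""
    return hashlib.sha256(graph_src.encode()).hexdigest()[:12]

def normalize(graph_src: str, renames: dict) -> str:
    """Toy normalization: textually rewrite placeholder names."""
    for old, new in renames.items():
        graph_src = graph_src.replace(old, new)
    return graph_src

# Two graphs identical except for dynamo-assigned placeholder names:
g1 = "def forward(self, L_x_): return L_x_ + 1"
g2 = "def forward(self, L_foo_): return L_foo_ + 1"
assert cache_key(g1) != cache_key(g2)  # names alone break cache hits

n1 = normalize(g1, {"L_x_": "arg0"})
n2 = normalize(g2, {"L_foo_": "arg0"})
assert cache_key(n1) == cache_key(n2)  # normalized graphs share a key
```

This is exactly the failure mode the PR targets: semantically identical graphs missing the cache purely because of source-derived variable names.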
Fixes #157792
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov