Skip to content

Normalize placeholder names in AOTAutogradCache#157916

Closed
jamesjwu wants to merge 16 commits intogh/jamesjwu/172/basefrom
gh/jamesjwu/172/head
Closed

Normalize placeholder names in AOTAutogradCache#157916
jamesjwu wants to merge 16 commits intogh/jamesjwu/172/basefrom
gh/jamesjwu/172/head

Conversation

@jamesjwu
Copy link
Copy Markdown
Contributor

@jamesjwu jamesjwu commented Jul 9, 2025

Stack from ghstack (oldest at bottom):

This PR adds a pass to sanitize_gm_for_cache which normalizes all placeholder names across input dynamo graphs to AOTAutogradCache. This is safe because nothing underneath AOTAutograd uses the node names on the
original dynamo graph: AOTAutograd re-traces with its own nodes, and guards are
in terms of original sources rather than placeholder names.

Note that the dynamo output graphs traced by tlparse will not show this change because it's done before this sanitization step. The aot autograd outputs also will not change because AOTAutograd's own traced graphs don't use the original placeholders of the dynamo graph. Thus, this change is essentially a no-op from everyone's perspective except for cache key checks.

Fixes #157792

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

[ghstack-poisoned]
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot bot commented Jul 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157916

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit d13df66 with merge base 1f57e0e (image):

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request Jul 9, 2025
@jamesjwu jamesjwu added topic: not user facing topic category ciflow/trunk Trigger trunk jobs on your pull request labels Jul 9, 2025
[ghstack-poisoned]
@jamesjwu jamesjwu requested review from oulgen and zou3519 July 9, 2025 16:26
@jamesjwu jamesjwu marked this pull request as ready for review July 9, 2025 16:26
@jamesjwu jamesjwu requested a review from bdhirsh as a code owner July 9, 2025 16:26


@contextlib.contextmanager
def normalize_placeholder_names(gm: torch.fx.GraphModule):
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW idk if this is the best way to implement this function: I had to hack in a few extra changes to each node like changing the target and used_names in the namespace so that we properly reconstruct the dynamo graph each time.

I wish it was easier than this, but I can't think of an easier way that isn't considerably slower, i.e. copying the entire graph. Tests seem to pass with this approach, though.

[ghstack-poisoned]
old_placeholder_names = [
str(n.target)
for n in gm.graph.nodes
if n.op == "placeholder" and n.type != torch.SymInt
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobrenjc93 I'm pretty sure this isn't the way to check if the graph node is symbolic. Is there a preferred way? I don't want to rename the symbolic placeholders because symbolic hashing already makes sure they're consistent.

[ghstack-poisoned]
[ghstack-poisoned]
@jamesjwu jamesjwu requested a review from anijain2305 July 10, 2025 00:26
[ghstack-poisoned]
"""\
class GraphModule(torch.nn.Module):
def forward(self, L_inputs_ : list, s69: "Sym(s21)", L_sizes_0_: "f32[0, s21]"):
l_inputs_ = L_inputs_
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @anijain2305 , this is what I mean: the current implementation of normalize_placeholder_for_gm() will remove some of these unnecessary local variable name changes. I can't figure out a clean way to preserve these (from what I can tell, unnecessary) name changes

[ghstack-poisoned]
@jamesjwu
Copy link
Copy Markdown
Contributor Author

I bit the bullet and decided to just make a copy of the graph module instead of the hacky thing I was doing before. This way I can make edits to the new gm without afffecting the old one at all, which greatly simplifies the code. The cost here is an extra copy, but until I see evidence that this is going to affect compile times I think it's fine.

(For low overhead scenarios like precompile, we're not calculating any cache keys anyway)

[ghstack-poisoned]
[ghstack-poisoned]
@jamesjwu
Copy link
Copy Markdown
Contributor Author

Ok I figured out a cleaner way to do it that does not modify the graph: we just have to be careful about preserving the old namespace when renaming nodes. I think this new context manager is idempotent and also cleaner, while not requiring us to clone the entire graph.

It obviously still needs to do some weird things like preserving the old used_names set, but I think it's much more reasonable now.

[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request Jul 10, 2025
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request Jul 10, 2025
old_placeholder_names = []
old_used_names = copy(gm.graph._graph_namespace._used_names)
i = 0
for n in gm.graph.find_nodes(op="placeholder", sort=True):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this noticeably impact perf for models that have a lot of parameters?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is too expensive then we are gonna need to do some tradeoff analysis. If vLLM is the only situation that benefits from this, vLLM can just manually do the input name normalization if needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll kick off a benchmark to confirm, but I feel like the cost of looking for placeholder nodes is relatively cheap.

[ghstack-poisoned]
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request Jul 13, 2025
@jamesjwu
Copy link
Copy Markdown
Contributor Author

@jamesjwu jamesjwu requested a review from zou3519 July 13, 2025 19:52
bundled_autograd_cache: bool = False

# Whether or not to normalize placeholder names in graphs
# from dynaom in AOTAutogradCache
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: dynamo

@jamesjwu
Copy link
Copy Markdown
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@github-actions github-actions bot deleted the gh/jamesjwu/172/head branch August 14, 2025 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants