Normalize placeholder names in AOTAutogradCache#157916
jamesjwu wants to merge 16 commits into gh/jamesjwu/172/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157916
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 unrelated failure.) As of commit d13df66 with merge base 1f57e0e. UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
@contextlib.contextmanager
def normalize_placeholder_names(gm: torch.fx.GraphModule):
```
FWIW idk if this is the best way to implement this function: I had to hack in a few extra changes to each node like changing the target and used_names in the namespace so that we properly reconstruct the dynamo graph each time.
I wish it was easier than this, but I can't think of an easier way that isn't considerably slower, i.e. copying the entire graph. Tests seem to pass with this approach, though.
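The normalization idea itself can be sketched without FX at all: walk the placeholders in order and map each non-symbolic name to a canonical positional name, leaving symbolic names alone. This is a minimal, torch-free sketch; the function and naming scheme (`argN`) are illustrative, not taken from the PR:

```python
def normalize_names(placeholders):
    """Map arbitrary placeholder names to canonical positional names.

    `placeholders` is a list of (name, is_symbolic) pairs. Symbolic
    placeholders keep their original names (their hashing is already
    consistent); everything else becomes arg0, arg1, ... in order.
    """
    mapping = {}
    i = 0
    for name, is_symbolic in placeholders:
        if is_symbolic:
            mapping[name] = name  # leave symbolic names untouched
        else:
            mapping[name] = f"arg{i}"
            i += 1
    return mapping

# Two graphs that differ only in placeholder names normalize identically:
a = normalize_names([("L_x_", False), ("s21", True), ("L_y_", False)])
b = normalize_names([("L_foo_", False), ("s21", True), ("L_bar_", False)])
assert list(a.values()) == list(b.values())  # ['arg0', 's21', 'arg1']
```

The key property is that the output names depend only on position and symbolic-ness, never on the dynamo-assigned source names.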
```python
old_placeholder_names = [
    str(n.target)
    for n in gm.graph.nodes
    if n.op == "placeholder" and n.type != torch.SymInt
]
```
@bobrenjc93 I'm pretty sure this isn't the way to check if the graph node is symbolic. Is there a preferred way? I don't want to rename the symbolic placeholders because symbolic hashing already makes sure they're consistent.
```python
"""\
class GraphModule(torch.nn.Module):
    def forward(self, L_inputs_ : list, s69: "Sym(s21)", L_sizes_0_: "f32[0, s21]"):
        l_inputs_ = L_inputs_
```
cc @anijain2305 , this is what I mean: the current implementation of normalize_placeholder_for_gm() will remove some of these unnecessary local variable name changes. I can't figure out a clean way to preserve these (from what I can tell, unnecessary) name changes
I bit the bullet and decided to just make a copy of the graph module instead of the hacky thing I was doing before. This way I can make edits to the new gm without affecting the old one at all, which greatly simplifies the code. The cost here is an extra copy, but until I see evidence that this is going to affect compile times I think it's fine. (For low-overhead scenarios like precompile, we're not calculating any cache keys anyway.)
Ok, I figured out a cleaner way to do it that does not modify the graph: we just have to be careful about preserving the old namespace when renaming nodes. I think this new context manager is idempotent and also cleaner, while not requiring us to clone the entire graph. It obviously still needs to do some weird things like preserving the old used_names set, but I think it's much more reasonable now.
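The rename-and-restore shape of that context manager can be sketched in plain Python. A toy `Namespace` class stands in for the FX graph namespace here; the class and function names are illustrative, not the PR's actual implementation:

```python
from contextlib import contextmanager
from copy import copy

class Namespace:
    """Toy stand-in for an FX graph's name namespace."""
    def __init__(self, used_names):
        self._used_names = set(used_names)

@contextmanager
def normalized(ns, renames):
    """Temporarily apply `renames` inside the namespace, then restore
    the saved used-name set on exit so re-entering is idempotent."""
    old_used_names = copy(ns._used_names)
    try:
        for old, new in renames.items():
            ns._used_names.discard(old)
            ns._used_names.add(new)
        yield ns
    finally:
        # Restore the original namespace; no graph copy was needed.
        ns._used_names = old_used_names

ns = Namespace({"L_x_", "L_y_"})
with normalized(ns, {"L_x_": "arg0", "L_y_": "arg1"}):
    assert ns._used_names == {"arg0", "arg1"}
assert ns._used_names == {"L_x_", "L_y_"}  # restored on exit
```

Because the old state is captured before any mutation and restored in `finally`, nesting or re-entering the manager always sees the same starting namespace.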
```python
old_placeholder_names = []
old_used_names = copy(gm.graph._graph_namespace._used_names)
i = 0
for n in gm.graph.find_nodes(op="placeholder", sort=True):
```
will this noticeably impact perf for models that have a lot of parameters?
If this is too expensive then we are gonna need to do some tradeoff analysis. If vLLM is the only situation that benefits from this, vLLM can just manually do the input name normalization if needed.
I'll kick off a benchmark to confirm, but I feel like the cost of looking for placeholder nodes is relatively cheap.
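That intuition is easy to sanity-check outside of PyTorch: a linear scan over even a very large node list is cheap relative to a compile. A rough, torch-free sketch (the node counts are made up for illustration):

```python
import time

# Simulate a graph with many parameters: (op, name) tuples stand in
# for FX nodes; 10k placeholders among 110k total nodes.
nodes = [("placeholder", f"p{i}") for i in range(10_000)] + \
        [("call_function", f"n{i}") for i in range(100_000)]

start = time.perf_counter()
placeholders = [name for op, name in nodes if op == "placeholder"]
elapsed = time.perf_counter() - start

assert len(placeholders) == 10_000
# elapsed is typically on the order of milliseconds, which is
# negligible next to a cache lookup or a compile.
```

This only bounds the scan itself; the real cost question in the thread is whether the renaming bookkeeping adds measurable overhead, which is what the benchmark run answers.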
Seems fine on our OSS benchmarks. I will land it with is_fbcode() false for now, so I can test it internally as well; I'm having some trouble with ghimport. For vLLM, they can set that config flag on for now until it's on by default internally.
```python
bundled_autograd_cache: bool = False

# Whether or not to normalize placeholder names in graphs
# from dynamo in AOTAutogradCache
```
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
This PR adds a pass to sanitize_gm_for_cache which normalizes all placeholder names across input dynamo graphs to AOTAutogradCache. This is safe because nothing underneath AOTAutograd uses the node names on the original dynamo graph: AOTAutograd re-traces with its own nodes, and guards are in terms of original sources rather than placeholder names.
Note that the dynamo output graphs shown in tlparse will not reflect this change, because they are logged before this sanitization step runs. The AOTAutograd outputs also will not change, because AOTAutograd's own traced graphs don't use the original placeholders of the dynamo graph. Thus, this change is essentially a no-op from everyone's perspective except for cache key checks.
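The cache-key effect can be illustrated with a toy hash; this is a deliberately simplified stand-in (string replacement plus SHA-256), not how AOTAutogradCache actually computes its keys:

```python
import hashlib

def cache_key(graph_src: str) -> str:
    """Toy cache key: hash of the graph's source text."""
    return hashlib.sha256(graph_src.encode()).hexdigest()[:12]

def normalize(graph_src: str, renames: dict) -> str:
    """Toy normalization: textually rewrite placeholder names."""
    for old, new in renames.items():
        graph_src = graph_src.replace(old, new)
    return graph_src

# Two graphs identical except for dynamo-assigned placeholder names:
g1 = "def forward(self, L_x_): return L_x_ + 1"
g2 = "def forward(self, L_foo_): return L_foo_ + 1"
assert cache_key(g1) != cache_key(g2)  # names alone break cache hits

n1 = normalize(g1, {"L_x_": "arg0"})
n2 = normalize(g2, {"L_foo_": "arg0"})
assert cache_key(n1) == cache_key(n2)  # normalized graphs share a key
```

This is exactly the failure mode the PR targets: semantically identical graphs missing the cache purely because of source-derived variable names.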
Fixes #157792
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov