Support AOT Autograd level Caching #125958
Status: Open
Labels: module: compile-time (Compilation mechanism or time spent in (re)compilation, tracing, startup), oncall: pt2, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
🚀 The feature, motivation and pitch
torch.compile can take on the order of seconds to compile a decently sized model like Llama2 7B with an aot_autograd-enabled backend. Note that this only includes the dynamo + aot_autograd time, not the backend compiler (e.g. Inductor) compilation time. It would be ideal if dynamo could cache the torch.compile work to speed up development time.

We (PyTorch/XLA) are trying to integrate with vLLM. @WoosukKwon reports that in vLLM's warm-up phase, it needs to pre-compile ~30 different input shape combinations. PyTorch/XLA does not support dynamic shapes today, so torch.compile keeps recompiling the model code, which slows down development speed (@WoosukKwon needs to wait 10 minutes before the warm-up is finished). PyTorch/XLA already caches the XLA compilation, but torch.compile itself is pretty expensive. A minimal sketch of the recompilation pattern is below.
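To make the cost concrete, here is a minimal sketch of the warm-up behavior described above; the model, batch sizes, and the aot_eager backend are illustrative stand-ins, not the actual vLLM or PyTorch/XLA code:

```python
import torch
import torch._dynamo

# Raise dynamo's recompile limit so all warm-up shapes actually compile;
# at the default limit dynamo would fall back to eager after a few shapes.
torch._dynamo.config.cache_size_limit = 64

model = torch.nn.Sequential(
    torch.nn.Linear(128, 128),
    torch.nn.ReLU(),
)

# "aot_eager" runs dynamo + aot_autograd with no backend compiler, which
# isolates exactly the overhead this issue is about. dynamic=False mirrors
# PyTorch/XLA's static-shape constraint.
compiled = torch.compile(model, backend="aot_eager", dynamic=False)

# Every new batch size is a new static shape, so each iteration pays the
# full dynamo + aot_autograd tracing cost again -- ~30 compiles in the
# vLLM warm-up case.
for batch_size in range(1, 31):
    compiled(torch.randn(batch_size, 128))
```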
Alternatives

Reduce torch.compile time for a model where only the batch dimension changes, e.g. via dynamic shapes as sketched below.
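On backends that do support dynamic shapes, marking the batch dimension dynamic already avoids per-shape retracing; the ask here is comparable behavior (or cached traces) for static-shape backends like PyTorch/XLA. A sketch, with an illustrative model and shapes:

```python
import torch
import torch._dynamo

model = torch.nn.Linear(128, 128)
compiled = torch.compile(model, backend="aot_eager")

x = torch.randn(4, 128)
# Mark dim 0 (the batch dimension) as dynamic so dynamo traces one graph
# with a symbolic batch size instead of one graph per concrete shape.
torch._dynamo.mark_dynamic(x, 0)
compiled(x)

# Later batch sizes hit the same cached graph -- no dynamo/aot_autograd
# retrace. PyTorch/XLA cannot use this today because it lacks dynamic
# shape support, hence the request for caching instead.
compiled(torch.randn(16, 128))
```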
Additional context
cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @yanboliang