
Support AOT Autograd level Caching #125958

@JackCaoG

Description


🚀 The feature, motivation and pitch

torch.compile can take on the order of seconds to compile a decently sized model like Llama2 7B with an aot_autograd-enabled backend. Note that I only include the dynamo + aot_autograd time; this does not include the backend compiler (like inductor) compilation time. It would be ideal if dynamo could cache the torch.compile artifacts to speed up development time.

We (PyTorch/XLA) are trying to integrate with vLLM. @WoosukKwon reports that in the warm-up phase of vLLM, it needs to pre-compile ~30 different input shape combinations. PyTorch/XLA does not support dynamic shapes today, so torch.compile keeps recompiling the model code, which slows down development speed (@WoosukKwon needs to wait for 10 minutes before warm-up is finished). PyTorch/XLA already caches the XLA compilation, but torch.compile itself is pretty expensive.
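To make the requested behavior concrete, here is a minimal sketch (hypothetical helper, not an existing PyTorch API) of a compile cache keyed by input shapes. It illustrates why caching at the aot_autograd level would help the vLLM warm-up scenario: repeated shapes would reuse an artifact instead of paying the full dynamo + aot_autograd cost again.

```python
# Hypothetical sketch: cache "compiled" artifacts per input-shape tuple,
# so only genuinely new shapes trigger the expensive compilation path.
from typing import Callable, Dict, Tuple


class ShapeKeyedCompileCache:
    """Caches compiled artifacts, one per distinct input-shape tuple."""

    def __init__(self, compile_fn: Callable):
        self.compile_fn = compile_fn  # stand-in for dynamo + aot_autograd
        self.cache: Dict[Tuple[int, ...], Callable] = {}
        self.compile_count = 0        # counts real (cache-miss) compilations

    def __call__(self, shape: Tuple[int, ...]) -> Callable:
        if shape not in self.cache:
            self.compile_count += 1
            self.cache[shape] = self.compile_fn(shape)
        return self.cache[shape]


def expensive_compile(shape: Tuple[int, ...]) -> Callable:
    # Stand-in for the real compilation pipeline.
    return lambda: f"compiled for {shape}"


cache = ShapeKeyedCompileCache(expensive_compile)
# Warm-up sweep over batch sizes; repeated shapes hit the cache.
for batch in (1, 2, 4, 1, 2, 4):
    cache((batch, 4096))
```

With a persistent on-disk version of this idea, the ~30 warm-up shapes would only pay the compilation cost once per shape on the first run, and not at all on subsequent runs.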

Alternatives

Reduce torch.compile time for a model when only the batch dimension changes.

Additional context

cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @yanboliang


    Labels

    module: compile-time (Compilation mechanism or time spent in (re)compilation, tracing, startup)
    oncall: pt2
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
