
Support AOT Autograd level Caching #125958

@JackCaoG

Description


🚀 The feature, motivation and pitch

torch.compile can take on the order of seconds to compile a decently sized model like Llama2 7B with an aot_autograd-enabled backend. Note that I only include the dynamo + aot_autograd time; this does not include the backend compiler (like inductor) compilation time. It would be ideal if dynamo could cache the torch.compile artifacts to speed up development time.

We (PyTorch/XLA) are trying to integrate with vLLM. @WoosukKwon reports that in the warm-up phase of vLLM, it needs to pre-compile ~30 different input shape combinations. PyTorch/XLA does not support dynamic shapes today, so torch.compile keeps recompiling the model code, which slows down development speed (@WoosukKwon needs to wait for 10 minutes before warm-up is finished). PyTorch/XLA already caches the XLA compilation, but torch.compile itself is pretty expensive.
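To make the requested behavior concrete, here is a minimal sketch (hypothetical helper, not an existing PyTorch API) of a compile cache keyed by input shapes. It illustrates why caching at the aot_autograd level would help the vLLM warm-up scenario: repeated shapes would reuse an artifact instead of paying the full dynamo + aot_autograd cost again.

```python
# Hypothetical sketch: cache "compiled" artifacts per input-shape tuple,
# so only genuinely new shapes trigger the expensive compilation path.
from typing import Callable, Dict, Tuple


class ShapeKeyedCompileCache:
    """Caches compiled artifacts, one per distinct input-shape tuple."""

    def __init__(self, compile_fn: Callable):
        self.compile_fn = compile_fn  # stand-in for dynamo + aot_autograd
        self.cache: Dict[Tuple[int, ...], Callable] = {}
        self.compile_count = 0        # counts real (cache-miss) compilations

    def __call__(self, shape: Tuple[int, ...]) -> Callable:
        if shape not in self.cache:
            self.compile_count += 1
            self.cache[shape] = self.compile_fn(shape)
        return self.cache[shape]


def expensive_compile(shape: Tuple[int, ...]) -> Callable:
    # Stand-in for the real compilation pipeline.
    return lambda: f"compiled for {shape}"


cache = ShapeKeyedCompileCache(expensive_compile)
# Warm-up sweep over batch sizes; repeated shapes hit the cache.
for batch in (1, 2, 4, 1, 2, 4):
    cache((batch, 4096))
```

With a persistent on-disk version of this idea, the ~30 warm-up shapes would only pay the compilation cost once per shape on the first run, and not at all on subsequent runs.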

Alternatives

Reduce torch.compile time for a model when only the batch dimension changes.

Additional context

cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @yanboliang


    Labels

    module: compile-time (Compilation mechanism or time spent in (re)compilation, tracing, startup)
    oncall: pt2
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
