Dynamo persistent cache real-time look-up #7614

@wonjoo-wj

Description

🚀 Feature

As described in pytorch/pytorch#125958, we are integrating with vLLM on TPUs. During vLLM's warm-up phase, it needs to pre-compile ~30 different input shape combinations. PyTorch/XLA does not support dynamic shapes today, so torch.compile keeps recompiling the model code, which slows development (waiting ~10 minutes for warm-up to finish). PyTorch/XLA already caches the XLA compilation, but torch.compile itself is pretty expensive.
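To make the cost concrete, here is a toy illustration (not the actual vLLM or PyTorch/XLA code) of a static-shape compiler: like torch.compile with a shape-specialized backend, it must compile once for every distinct input shape seen during warm-up. All names here are illustrative stand-ins.

```python
compile_count = 0
compiled = {}  # shape tuple -> compiled "binary"

def compile_for_shape(shape):
    """Stand-in for an expensive torch.compile + XLA compilation."""
    global compile_count
    compile_count += 1
    return f"binary-for-{shape}"

def run(shape):
    # Every previously unseen shape forces a full recompilation.
    if shape not in compiled:
        compiled[shape] = compile_for_shape(shape)
    return compiled[shape]

# Warm-up over many (batch, seq_len) combinations, as vLLM does:
warmup_shapes = [(b, s) for b in (1, 2, 4, 8, 16)
                 for s in (128, 256, 512, 1024, 2048, 4096)]
for shape in warmup_shapes:
    run(shape)
print(compile_count)  # 30 distinct shapes -> 30 compilations
```

With 30 shape combinations and a multi-second compile each, the warm-up time adds up to the minutes observed above; the cache proposed below aims to pay that cost only once.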

This feature request proposes achieving a similar effect to dynamic shapes via persistent caching and real-time look-up of the compiled program.

Details

To do this, at a high level, we need the following:

  • Turn on dynamo's dynamic-shape mode, so that dynamo starts passing inputs with dynamic shapes to PyTorch/XLA
  • PyTorch/XLA can then check whether this shape has already been compiled in XLA
  • If it has, we can map each input shape to its corresponding compiled binary
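The look-up step above could be sketched roughly as follows. This is a hypothetical illustration, not an existing PyTorch/XLA API: the names `PersistentShapeCache`, `get_program`, and the JSON on-disk format are all assumptions for the sake of the example; a real implementation would serialize actual XLA executables.

```python
import json
from pathlib import Path

class PersistentShapeCache:
    """Hypothetical on-disk cache keyed by input shape."""

    def __init__(self, cache_dir):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _key(self, shape):
        return "x".join(map(str, shape))  # e.g. (8, 512) -> "8x512"

    def lookup(self, shape):
        # Real-time look-up: return the persisted program on a hit.
        path = self.dir / f"{self._key(shape)}.json"
        if path.exists():
            return json.loads(path.read_text())
        return None

    def store(self, shape, program):
        (self.dir / f"{self._key(shape)}.json").write_text(json.dumps(program))

def get_program(cache, shape, compile_program):
    """Map an input shape to its compiled binary, compiling only on a miss."""
    prog = cache.lookup(shape)
    if prog is None:
        prog = compile_program(shape)  # expensive path, taken once per shape
        cache.store(shape, prog)
    return prog
```

Because the cache is persisted to disk, a later process (e.g. a restarted vLLM server) could skip torch.compile entirely for shapes compiled in an earlier run, which is the effect this proposal is after.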

Open questions

cc @JackCaoG @WoosukKwon
