Dynamo persistent cache real-time look-up #7614

@wonjoo-wj

Description

🚀 Feature

As described in pytorch/pytorch#125958, we are integrating with vLLM on TPUs. During vLLM's warm-up phase, it needs to pre-compile ~30 different input shape combinations. PyTorch/XLA does not support dynamic shapes today, so torch.compile keeps recompiling the model code, which slows development (waiting ~10 minutes for warm-up to finish). PyTorch/XLA already caches the XLA compilation, but torch.compile itself is pretty expensive.
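To make the cost concrete, here is a toy illustration (not the actual vLLM or PyTorch/XLA code) of a static-shape compiler: like torch.compile with a shape-specialized backend, it must compile once for every distinct input shape seen during warm-up. All names here are illustrative stand-ins.

```python
compile_count = 0
compiled = {}  # shape tuple -> compiled "binary"

def compile_for_shape(shape):
    """Stand-in for an expensive torch.compile + XLA compilation."""
    global compile_count
    compile_count += 1
    return f"binary-for-{shape}"

def run(shape):
    # Every previously unseen shape forces a full recompilation.
    if shape not in compiled:
        compiled[shape] = compile_for_shape(shape)
    return compiled[shape]

# Warm-up over many (batch, seq_len) combinations, as vLLM does:
warmup_shapes = [(b, s) for b in (1, 2, 4, 8, 16)
                 for s in (128, 256, 512, 1024, 2048, 4096)]
for shape in warmup_shapes:
    run(shape)
print(compile_count)  # 30 distinct shapes -> 30 compilations
```

With 30 shape combinations and a multi-second compile each, the warm-up time adds up to the minutes observed above; the cache proposed below aims to pay that cost only once.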

This feature request proposes achieving a similar effect to dynamic shapes via persistent caching and real-time look-up of the compiled program.

Details

To do this, at a high level, we need the following:

  • Turn on dynamo's dynamic-shape mode, so that dynamo starts passing inputs with dynamic shapes to PyTorch/XLA
  • PyTorch/XLA can then check whether this shape has already been compiled in XLA
  • If it has, we can map each input shape to its corresponding compiled binary
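The look-up step above could be sketched roughly as follows. This is a hypothetical illustration, not an existing PyTorch/XLA API: the names `PersistentShapeCache`, `get_program`, and the JSON on-disk format are all assumptions for the sake of the example; a real implementation would serialize actual XLA executables.

```python
import json
from pathlib import Path

class PersistentShapeCache:
    """Hypothetical on-disk cache keyed by input shape."""

    def __init__(self, cache_dir):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _key(self, shape):
        return "x".join(map(str, shape))  # e.g. (8, 512) -> "8x512"

    def lookup(self, shape):
        # Real-time look-up: return the persisted program on a hit.
        path = self.dir / f"{self._key(shape)}.json"
        if path.exists():
            return json.loads(path.read_text())
        return None

    def store(self, shape, program):
        (self.dir / f"{self._key(shape)}.json").write_text(json.dumps(program))

def get_program(cache, shape, compile_program):
    """Map an input shape to its compiled binary, compiling only on a miss."""
    prog = cache.lookup(shape)
    if prog is None:
        prog = compile_program(shape)  # expensive path, taken once per shape
        cache.store(shape, prog)
    return prog
```

Because the cache is persisted to disk, a later process (e.g. a restarted vLLM server) could skip torch.compile entirely for shapes compiled in an earlier run, which is the effect this proposal is after.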

Open questions

cc @JackCaoG @WoosukKwon
