🚀 Feature
Support torch.distributed.scatter and torch.distributed.gather using the XLA backend. They are currently not implemented.
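A minimal sketch of the desired usage, i.e. the calls that currently fail under the XLA backend. This is illustrative only; the init_method and launch details (e.g. via torchrun or torch_xla's launcher) are assumptions and may differ by version.

```python
import torch
import torch.distributed as dist
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_backend  # registers the "xla" process-group backend

def main():
    dist.init_process_group("xla", init_method="xla://")
    device = xm.xla_device()
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Scatter: rank 0 splits a tensor into one shard per rank.
    shard = torch.empty(4, device=device)
    scatter_list = (
        [torch.arange(4, dtype=torch.float32, device=device) + 4 * r
         for r in range(world_size)]
        if rank == 0 else None
    )
    dist.scatter(shard, scatter_list, src=0)  # not implemented for the XLA backend today

    # Gather: collect the per-rank shards back on rank 0.
    gather_list = (
        [torch.empty(4, device=device) for _ in range(world_size)]
        if rank == 0 else None
    )
    dist.gather(shard, gather_list, dst=0)  # not implemented for the XLA backend today

if __name__ == "__main__":
    main()
```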
Motivation
(Copied from a discussion thread) The fastsafetensors library uses torch.distributed to distribute/shuffle weights across GPU devices on the same host when using tensor parallelism. This loader library was introduced into vLLM last month. I'm trying to understand whether the same approach can be used to speed up model loading in vLLM, in conjunction with XLA caching.