🚀 Feature
Support torch.distributed.scatter and torch.distributed.gather using the XLA backend. They are currently not implemented.
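A minimal sketch of the desired usage, i.e. the calls that currently fail under the XLA backend. This is illustrative only; the init_method and launch details (e.g. via torchrun or torch_xla's launcher) are assumptions and may differ by version.

```python
import torch
import torch.distributed as dist
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_backend  # registers the "xla" process-group backend

def main():
    dist.init_process_group("xla", init_method="xla://")
    device = xm.xla_device()
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Scatter: rank 0 splits a tensor into one shard per rank.
    shard = torch.empty(4, device=device)
    scatter_list = (
        [torch.arange(4, dtype=torch.float32, device=device) + 4 * r
         for r in range(world_size)]
        if rank == 0 else None
    )
    dist.scatter(shard, scatter_list, src=0)  # not implemented for the XLA backend today

    # Gather: collect the per-rank shards back on rank 0.
    gather_list = (
        [torch.empty(4, device=device) for _ in range(world_size)]
        if rank == 0 else None
    )
    dist.gather(shard, gather_list, dst=0)  # not implemented for the XLA backend today

if __name__ == "__main__":
    main()
```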
Motivation
(Copied from a discussion thread) The fastsafetensors library uses torch.distributed to distribute/shuffle weights across GPU devices on the same host when using tensor parallelism. This loader library was introduced into vLLM last month. I'm trying to understand whether the same approach can be used to speed up model loading in vLLM, in conjunction with XLA caching.