
Introduce apply_xla_patch_to_nn_linear and test that in a scan #8739

Merged
tengyifei merged 1 commit into master from yifeit/workaround-einsum
Feb 25, 2025
Conversation

@tengyifei
Collaborator

To propagate sharding annotations under 2D sharding, linear layers should be implemented with einsum instead of transposes/reshapes. Additionally, they need to continue to function inside scan/scan_layers.

For this to work we need three pieces:

  • I added an `apply_xla_patch_to_nn_linear` function to replace the implementation of `nn.Linear` with einsum (calling `XLAPatchedLinear`).
  • The `XLAPatchedLinear` implementation should be wrapped in a torch custom op. That's because AOTAutograd, used by scan, will decompose all einsums into transposes/reshapes unless we use `@custom_op` to mark a function as opaque to AOTAutograd.
  • Even after wrapping it with `@custom_op`, the einsum is still decomposed into transposes/reshapes due to #8713 ("torch.einsum is incorrectly decomposed when wrapped inside a custom op"). That's a PyTorch bug/limitation. To work around this, I added a `_xla_einsum` C++ function that directly builds an einsum given XLA tensors, skipping over any PyTorch dispatcher complexity.
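The decomposition problem the bullets describe can be illustrated with plain NumPy (an illustrative assumption; the PR itself operates on XLA tensors): the einsum form and its transpose/reshape decomposition compute identical values, but only the einsum form keeps the contraction as a single op whose non-contracting dims survive, which is what sharding-annotation propagation relies on.

```python
import numpy as np

# Hypothetical shapes: batch 2, sequence 3, in_features 4, out_features 5.
x = np.random.rand(2, 3, 4)
w = np.random.rand(5, 4)  # nn.Linear-style (out_features, in_features)

# Einsum form: a single op; the non-contracting dims (2, 3) pass through intact.
y_einsum = np.einsum('...n,mn->...m', x, w)

# What a decomposition into reshapes/matmul looks like: the leading (2, 3)
# dims get flattened to 6 before the matmul, then restored afterwards.
y_decomposed = (x.reshape(-1, 4) @ w.T).reshape(2, 3, 5)

assert np.allclose(y_einsum, y_decomposed)
print(y_einsum.shape)  # (2, 3, 5)
```

The values match either way; the difference is purely in what the compiler sees, which is why preventing AOTAutograd from rewriting the einsum matters.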

Added a test that demonstrates how `nn.Linear` layers by default flatten any non-contracting dims, and how we can avoid that with `apply_xla_patch_to_nn_linear`.
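The patching approach can be sketched in pure Python (a toy stand-in, not the real `torch_xla` API: the `Linear` class, `einsum_forward`, and `apply_patch` below are hypothetical illustrations of rebinding a linear layer's forward to an einsum, which is conceptually what `apply_xla_patch_to_nn_linear` does via `XLAPatchedLinear`):

```python
import numpy as np

class Linear:
    """Toy stand-in for nn.Linear (illustrative only)."""
    def __init__(self, in_f, out_f):
        self.weight = np.random.rand(out_f, in_f)

    def forward(self, x):
        # Default path: flatten non-contracting dims, matmul, restore shape.
        lead = x.shape[:-1]
        return (x.reshape(-1, x.shape[-1]) @ self.weight.T).reshape(*lead, -1)

def einsum_forward(self, x):
    # Patched path: a single einsum, analogous to what XLAPatchedLinear
    # does for XLA tensors.
    return np.einsum('...n,mn->...m', x, self.weight)

def apply_patch(module):
    """Sketch of the patch: rebind forward on every Linear in a module tree."""
    for child in getattr(module, 'children', lambda: [])():
        apply_patch(child)
    if isinstance(module, Linear):
        module.forward = einsum_forward.__get__(module, Linear)
    return module

layer = apply_patch(Linear(4, 5))
print(layer.forward(np.random.rand(2, 3, 4)).shape)  # (2, 3, 5)
```

In the real implementation the payoff is not the output shape, which is the same either way, but that the traced graph contains an einsum node instead of reshape/transpose/matmul, so SPMD sharding annotations on the batch and sequence dims can propagate through it.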

@tengyifei tengyifei force-pushed the yifeit/workaround-einsum branch from c53acea to 7d834f7 Compare February 24, 2025 22:56
@tengyifei tengyifei changed the title Support einsum layers in a scan Introduce apply_xla_patch_to_nn_linear and test that in a scan Feb 24, 2025
@tengyifei tengyifei marked this pull request as ready for review February 24, 2025 23:22
@tengyifei tengyifei force-pushed the yifeit/workaround-einsum branch from 7d834f7 to 2cf50c8 Compare February 24, 2025 23:48
Comment thread torch_xla/distributed/spmd/xla_sharding.py
Member

@zpcore zpcore left a comment


LGTM!

@tengyifei tengyifei merged commit 6f020aa into master Feb 25, 2025
Collaborator

@pgmoka pgmoka left a comment


LGTM
