[DTensor][XLA] support XLA backend in distribute_module API #121355

Closed
yeounoh wants to merge 40 commits into pytorch:main from yeounoh:xla_distribute_module_auto

Conversation

@yeounoh
Contributor

@yeounoh yeounoh commented Mar 6, 2024

yeounoh and others added 30 commits July 8, 2022 02:10
@yeounoh yeounoh force-pushed the xla_distribute_module_auto branch from eb86cba to 5d0c751 on March 7, 2024 17:30
@yeounoh yeounoh self-assigned this Mar 7, 2024
@yeounoh yeounoh force-pushed the xla_distribute_module_auto branch from 5d0c751 to 61416f1 on March 7, 2024 17:55
@yeounoh
Contributor Author

yeounoh commented Mar 7, 2024

This depends on pytorch/xla#6683. @wanchaol, should we skip the distribute_module test here and just rely on the downstream test? I remember you suggested that in the refactoring PR. If we skip it here, this PR doesn't need to be blocked on pytorch/xla#6683.

I am fine keeping it as a safeguard, to ensure that we can call the xla_distribute_module API via distribute_module here.
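The routing being discussed can be sketched roughly as follows. This is a simplified, hypothetical rendition of the dispatch this PR adds to distribute_module (the real code lives in torch/distributed/_tensor/api.py and calls into torch_xla); the stub functions and the minimal DeviceMesh stand-in below are illustrative, not the actual implementations.

```python
from dataclasses import dataclass


@dataclass
class DeviceMesh:
    # Minimal stand-in for torch.distributed.device_mesh.DeviceMesh;
    # only the device_type attribute matters for this sketch.
    device_type: str


def xla_distribute_module(module, device_mesh, partition_fn=None):
    # Stub standing in for torch_xla's xla_distribute_module entry point.
    return f"xla-sharded:{module}"


def native_distribute_module(module, device_mesh, partition_fn=None):
    # Stub standing in for the existing native DTensor code path.
    return f"dtensor-sharded:{module}"


def distribute_module(module, device_mesh, partition_fn=None):
    # Route to the XLA backend when the mesh is an XLA mesh,
    # otherwise fall through to the native DTensor path.
    if device_mesh.device_type == "xla":
        return xla_distribute_module(module, device_mesh, partition_fn)
    return native_distribute_module(module, device_mesh, partition_fn)
```

Under this sketch, calling distribute_module with an "xla" mesh exercises the XLA path, which is what the safeguard test would verify.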

@yeounoh yeounoh requested a review from a team as a code owner March 7, 2024 18:33
@yeounoh yeounoh requested a review from JackCaoG March 7, 2024 19:38
Comment thread on torch/distributed/_tensor/api.py (outdated)
@yeounoh yeounoh force-pushed the xla_distribute_module_auto branch from 54676ba to 45a6342 on March 7, 2024 23:54
Collaborator

@wanchaol wanchaol left a comment

LGTM, thanks for addressing the comments!

@yeounoh
Contributor Author

yeounoh commented Mar 8, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 8, 2024
@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label.
If your changes are user facing and intended to be part of the release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example:
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: raised by workflow job

@yeounoh yeounoh force-pushed the xla_distribute_module_auto branch from 44a5043 to 0923943 on March 8, 2024 06:45
@yeounoh
Contributor Author

yeounoh commented Mar 8, 2024

Not updating the XLA pin, since the latest one seems to have a PJRT GPU plugin build issue... I hope the original one in main is intact. cc @vanbasten23

@yeounoh
Contributor Author

yeounoh commented Mar 8, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@yeounoh
Contributor Author

yeounoh commented Mar 9, 2024

This is also related to pytorch/xla#6322


Labels

ciflow/inductor
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
module: dtensor (distributed tensor tag)
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
open source
release notes: distributed (dtensor) (release notes category)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants