
Add process group documentation for SPMD #6469

Merged
jonb377 merged 1 commit into master from jonbolin/pg on Feb 5, 2024



@jonb377 (Collaborator) commented Feb 5, 2024

As pointed out in #6465, our documentation does not discuss how to initialize the process group in SPMD execution mode.

A process group is required for distributed checkpointing and can be used with various other torch.distributed APIs. In SPMD, we don't allow process groups on the XLA backend, since the compiler is responsible for controlling the on-device collectives.
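A minimal sketch of what such an initialization could look like, assuming a CPU backend such as `gloo` stands in for the disallowed `xla` backend and that the `xla://` init method is used for rendezvous on TPU hosts. The helper names `choose_backend` and `init_spmd_process_group` are hypothetical and exist only for illustration; the merged documentation is the authoritative reference for the recommended setup.

```python
def choose_backend(spmd_enabled: bool, requested: str = "gloo") -> str:
    """Return a process-group backend name that is valid for the given mode.

    Hypothetical helper: it only encodes the rule stated above, namely that
    the XLA backend cannot back a process group in SPMD mode.
    """
    if spmd_enabled and requested == "xla":
        raise ValueError(
            "The 'xla' backend is not allowed for process groups in SPMD mode; "
            "use a CPU backend such as 'gloo' instead."
        )
    return requested


def init_spmd_process_group(backend: str = "gloo") -> None:
    """Enable SPMD and create a process group (assumes torch and torch_xla)."""
    # Imports are deferred so this module loads without torch/torch_xla installed.
    import torch.distributed as dist
    # Assumption: importing this module registers the `xla://` init method.
    import torch_xla.distributed.xla_backend  # noqa: F401
    import torch_xla.runtime as xr

    xr.use_spmd()
    # Assumption: `xla://` discovers rank and world size on TPU hosts.
    dist.init_process_group(choose_backend(True, backend), init_method="xla://")
```

With a group created this way, `torch.distributed` utilities such as distributed checkpointing can run over the CPU backend, while the compiler keeps control of the on-device collectives.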

@jonb377 requested a review from yeounoh February 5, 2024 19:10
@jonb377 self-assigned this Feb 5, 2024
@yeounoh (Contributor) commented Feb 5, 2024

cc @vanbasten23, you might have already done this: did we add the SPMD + GPU documentation/section as well?

@yeounoh (Contributor) left a comment

LGTM thanks!

@jonb377 merged commit 732a1c7 into master Feb 5, 2024
@jonb377 deleted the jonbolin/pg branch February 5, 2024 21:22
@vanbasten23 (Collaborator) commented

> cc @vanbasten23, you might have already done this: did we add the SPMD + GPU documentation/section as well?

Good call. I haven't done that yet but let me add some tmr.

amithrm pushed a commit to amithrm/xla that referenced this pull request Mar 1, 2024

3 participants