Revamp SPMD guide #8807

Merged
tengyifei merged 3 commits into master from yifeit/document-virtual-mesh on Mar 14, 2025

@tengyifei
Collaborator

I set out to document some device ID assignment tricks, then realized that explaining them first requires explaining what a device mesh is. The existing documentation is very poor, so this PR adds an in-depth guide.

While going over the APIs involved, I noticed that the API documentation and type annotations also had problems, so I fixed those as well.

@tengyifei tengyifei force-pushed the yifeit/document-virtual-mesh branch 5 times, most recently from f359bce to 9b50d9e Compare March 7, 2025 09:10
@tengyifei tengyifei requested review from bhavya01 and lsy323 March 7, 2025 18:25
@tengyifei tengyifei marked this pull request as ready for review March 7, 2025 18:40
@tengyifei tengyifei force-pushed the yifeit/document-virtual-mesh branch from 9b50d9e to 4bb3685 Compare March 7, 2025 18:44
Collaborator

@yaoshiang yaoshiang left a comment

Approved with some optional nits. Great work on cleaning this up!

Comment thread torch_xla/distributed/spmd/xla_sharding.py
Comment thread torch_xla/distributed/spmd/xla_sharding.py Outdated
Comment thread docs/source/perf/spmd_basic.md Outdated
Comment thread docs/source/perf/spmd_basic.md Outdated
Comment thread docs/source/perf/spmd_basic.md
Comment thread torch_xla/distributed/spmd/xla_sharding.py
Comment thread torch_xla/distributed/spmd/xla_sharding.py
Comment thread docs/source/perf/spmd_basic.md
Comment thread docs/source/perf/spmd_basic.md Outdated
@tengyifei tengyifei force-pushed the yifeit/document-virtual-mesh branch from 4bb3685 to 4a6b5ae Compare March 11, 2025 22:40
Collaborator

@mikegre-google mikegre-google left a comment

Much better descriptions. I have a few questions and suggestions.

Comment thread docs/source/perf/spmd_basic.md Outdated
Comment thread docs/source/perf/spmd_basic.md Outdated
Comment thread docs/source/perf/spmd_basic.md Outdated
Comment thread docs/source/perf/spmd_basic.md Outdated
Comment thread docs/source/perf/spmd_basic.md Outdated
Comment thread torch_xla/distributed/spmd/xla_sharding.py Outdated
Comment thread torch_xla/distributed/spmd/xla_sharding.py Outdated
Comment thread torch_xla/distributed/spmd/xla_sharding.py Outdated
Comment thread torch_xla/distributed/spmd/xla_sharding.py Outdated
Comment thread torch_xla/distributed/spmd/xla_sharding.py
@tengyifei tengyifei force-pushed the yifeit/document-virtual-mesh branch from 4a6b5ae to bcd4129 Compare March 14, 2025 06:56
@tengyifei tengyifei force-pushed the yifeit/document-virtual-mesh branch from bcd4129 to 69ee526 Compare March 14, 2025 07:01
@mikegre-google
Collaborator

Looks good. Only one further suggestion...

Collaborator

@mikegre-google mikegre-google left a comment

Looks good!

@tengyifei tengyifei merged commit 5beb651 into master Mar 14, 2025
@bhavya01
Collaborator

Thanks for making the change. The reference to the JAX guide will be very helpful!

6 participants