
[SPMD] Add debug of SPMD for single/multi host #5742

Merged
ManfeiBai merged 53 commits into master from newmastercheckcommitoct19 on Dec 5, 2023

@ManfeiBai (Collaborator) commented Oct 26, 2023

Add SPMD debugging support for single- and multi-device setups to visualize sharded tensors.

@ManfeiBai changed the title from "[debug] add debug for FSDP for single device" to "[debug] add debug for SPMD for single device" on Oct 26, 2023
@yeounoh self-requested a review on October 26, 2023 17:17
@yeounoh added the "distributed" label (SPMD and other distributed things.) on Oct 26, 2023
@ManfeiBai force-pushed the newmastercheckcommitoct19 branch from ba40a9b to ef482ea on October 26, 2023 17:45
@yeounoh (Contributor) commented Oct 27, 2023

Hi @ManfeiBai, could you also move the file to torch_xla/experimental/spmd/? You can create a new dir; I will refactor in the upcoming release.

@ManfeiBai (Collaborator, Author) replied:

> Hi @ManfeiBai, could you also move the file to torch_xla/experimental/spmd/? You can create a new dir; I will refactor in the upcoming release.

Thanks, I created torch_xla/experimental/spmd/ and moved the file there.

@ManfeiBai marked this pull request as ready for review on November 8, 2023 19:38
@ManfeiBai changed the title from "[debug] add debug for SPMD for single device" to "Add debug of SPMD for single device" on Nov 8, 2023
@ManfeiBai marked this pull request as draft on November 8, 2023 19:41
Comment thread torch_xla/distributed/spmd/test_debugging.py Outdated
@ManfeiBai marked this pull request as ready for review on November 26, 2023 22:49
Comment thread torch_xla/distributed/spmd/__init__.py Outdated
Comment thread test/spmd/test_spmd_debugging.py Outdated
Comment thread test/spmd/test_spmd_debugging.py Outdated
Comment thread test/spmd/test_spmd_debugging.py Outdated
Comment thread test/spmd/test_spmd_debugging.py Outdated
Comment thread test/spmd/test_spmd_debugging.py Outdated
Comment thread test/spmd/test_spmd_debugging.py Outdated
Comment thread test/spmd/test_spmd_debugging.py Outdated
Comment thread torch_xla/distributed/spmd/debugging.py Outdated
@ManfeiBai changed the title from "Add debug of SPMD for single device" to "Add debug of SPMD for single host" on Nov 28, 2023
Comment thread torch_xla/distributed/spmd/debugging.py Outdated
Comment thread torch_xla/distributed/spmd/debugging.py Outdated
Comment thread torch_xla/distributed/spmd/debugging.py Outdated
@yeounoh (Contributor) left a comment

Left comments, thanks @ManfeiBai

@ManfeiBai force-pushed the newmastercheckcommitoct19 branch 2 times, most recently from c83a2a5 to 281685b, on November 29, 2023 00:25
@ManfeiBai requested a review from yeounoh on December 4, 2023 02:18
@ManfeiBai mentioned this pull request on Dec 4, 2023
Comment thread torch_xla/distributed/spmd/__init__.py Outdated
@yeounoh (Contributor) left a comment


Labels

backport_2.2, distributed (SPMD and other distributed things.)

3 participants