Skip to content

Allow MpDeviceLoader to shard dictionaries of tensor #8202

Merged
bhavya01 merged 4 commits intomasterfrom
mp_loader_1
Oct 2, 2024
Merged

Allow MpDeviceLoader to shard dictionaries of tensor #8202
bhavya01 merged 4 commits intomasterfrom
mp_loader_1

Conversation

@bhavya01
Copy link
Copy Markdown
Collaborator

@bhavya01 bhavya01 commented Oct 1, 2024

Reland #7912

The new code is only used when input_sharding is a dictionary of ShardingSpec. So, it should be backward compatible. Before this PR, it is expected to be a ShardingSpec

@bhavya01 bhavya01 self-assigned this Oct 1, 2024
@bhavya01 bhavya01 added the tpuci label Oct 1, 2024
@bhavya01 bhavya01 requested a review from JackCaoG October 1, 2024 20:46
@bhavya01
Copy link
Copy Markdown
Collaborator Author

bhavya01 commented Oct 1, 2024

@ManfeiBai: I will cherry pick this into 2.5 release.
@zpcore: This should make SD2 script compatible with future nighty versions.

@JackCaoG
Copy link
Copy Markdown
Collaborator

JackCaoG commented Oct 2, 2024

I will take a look today

Comment thread docs/spmd_advanced.md Outdated
Comment thread torch_xla/distributed/parallel_loader.py Outdated
Copy link
Copy Markdown
Collaborator

@JackCaoG JackCaoG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still prefer to separate the error handling from loading. iPutting an exception into the data_queue makes logic a bit messy IMO. I will approve to unblock, I can do a follow up here.

@bhavya01
Copy link
Copy Markdown
Collaborator Author

bhavya01 commented Oct 2, 2024

The test failure is not related to this PR. Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants