fix eager mode spmd module loading with fsdpv2 #7631

Merged
JackCaoG merged 1 commit into master from JackCaoG/eager_fsdpv2
Jul 4, 2024

@JackCaoG JackCaoG commented Jul 4, 2024

The problem is that module.to is always an empty followed by a to_copy. In eager mode, empty is an expand and is evaluated directly. However, this evaluation results in a replicated sharding spec. In the fsdpv2 code, we then incorrectly believe this tensor was intentionally replicated, so fsdpv2 won't shard it.
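
To illustrate the failure mode described above, here is a minimal pure-Python sketch. It does not use torch_xla and is not the code changed in this PR: `ToyParam`, `eager_module_to`, `naive_should_shard`, `intent_aware_should_shard`, and the `"{replicated}"` string are all hypothetical stand-ins, and tracking a `user_annotated` flag is just one assumed way to tell an incidental replicated spec apart from an intentional one.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed textual form of a replicated sharding spec (illustration only).
REPLICATED = "{replicated}"


@dataclass
class ToyParam:
    """Hypothetical stand-in for a parameter that carries a sharding spec."""
    name: str
    sharding_spec: Optional[str] = None  # None means "never annotated"
    user_annotated: bool = False         # True only after an explicit annotation


def mark_sharding(param: ToyParam, spec: str) -> None:
    """Toy explicit annotation: records the spec and that the user asked for it."""
    param.sharding_spec = spec
    param.user_annotated = True


def eager_module_to(param: ToyParam) -> ToyParam:
    """Toy model of module.to in eager mode.

    The new tensor is evaluated immediately and, since nobody annotated it,
    it comes back with a replicated spec.
    """
    return ToyParam(param.name, sharding_spec=REPLICATED, user_annotated=False)


def naive_should_shard(param: ToyParam) -> bool:
    """Treats any existing spec as user intent, so it skips the eager tensor."""
    return param.sharding_spec is None


def intent_aware_should_shard(param: ToyParam) -> bool:
    """Skips a parameter only when the spec was set explicitly."""
    return not param.user_annotated


# A weight moved with the toy module.to, and one the user pinned as replicated.
weight = eager_module_to(ToyParam("linear.weight"))
pinned = ToyParam("embedding.weight")
mark_sharding(pinned, REPLICATED)
```

In this toy model, `naive_should_shard(weight)` is false even though nobody asked for replication, which mirrors the bug; `intent_aware_should_shard` shards `weight` while still leaving `pinned` alone.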

@JackCaoG JackCaoG added the distributed (SPMD and other distributed things) and eager (PyTorch/XLA eager-mode) labels Jul 4, 2024
@JackCaoG JackCaoG requested a review from alanwaketan July 4, 2024 00:19

@alanwaketan alanwaketan left a comment

LGTM.

@JackCaoG JackCaoG merged commit 10a5130 into master Jul 4, 2024
bhavya01 pushed a commit that referenced this pull request Jul 15, 2024