
[FSDP][4/N] Refactor func to share state/init handle attrs#90871

Closed
awgu wants to merge 13 commits into gh/awgu/276/base from gh/awgu/276/head

Conversation

Collaborator

@awgu awgu commented Dec 14, 2022

Stack from ghstack:

For `limit_all_gathers`, if we do not enforce that all FSDP instances have the same value, then the semantics guaranteed by the `bool` can be violated: it could be as if none of them set the value to `True`.

For use_orig_params, optimizer state dict assumes that the value is the same for all FSDP instances.
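To illustrate the `limit_all_gathers` point, here is a minimal standalone sketch (not FSDP's actual implementation; `Limiter`, `start`, and `max_inflight` are hypothetical names). The cap on in-flight all-gathers only holds if every instance respects it; instances that opt out can push the in-flight count past the cap, as if no one had set the flag:

```python
# Illustrative sketch only: models why `limit_all_gathers` must be homogeneous.
# `Limiter`, `start`, and `max_inflight` are hypothetical names, not FSDP APIs.

class Limiter:
    def __init__(self, max_inflight: int):
        self.max_inflight = max_inflight
        self.inflight = 0   # all-gathers currently issued
        self.peak = 0       # highest in-flight count observed
        self.deferred = 0   # all-gathers that waited for a free slot

    def start(self, limit: bool):
        if limit and self.inflight >= self.max_inflight:
            self.deferred += 1  # a limiting instance waits for a slot
            return
        self.inflight += 1      # a non-limiting instance just issues it
        self.peak = max(self.peak, self.inflight)

# Every instance limits: the cap of 2 holds.
lim = Limiter(max_inflight=2)
for flag in [True, True, True, True]:
    lim.start(flag)
print(lim.peak)  # 2

# Two instances opt out: the cap is blown past, as if nobody limited.
lim = Limiter(max_inflight=2)
for flag in [True, True, False, False]:
    lim.start(flag)
print(lim.peak)  # 4
```

The second run shows the failure mode described above: the two non-limiting instances raise the peak to 4 even though the limiting instances dutifully waited.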

@pytorch-bot

pytorch-bot bot commented Dec 14, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90871

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 94ed76e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@awgu awgu marked this pull request as ready for review December 15, 2022 22:43
@awgu awgu requested a review from mrshenli as a code owner December 15, 2022 22:43
    return state
for attr_name, attr_values in attr_name_to_values.items():
    if len(attr_values) != 1:
        raise ValueError(
Contributor

I assume this is for the manual wrapping case, as this can never happen with an auto wrap policy?

Collaborator Author

Correct!

Comment on lines +52 to +55
"backward_prefetch",
"forward_prefetch",
"_use_orig_params",
"limit_all_gathers",
Contributor

Why do these need to be homogeneous? No need to change this; I just wonder if there is any hard requirement for this behavior.

Collaborator Author

This is a good point. I should have written out the explanation to prove it to myself as well. After refreshing my understanding of the implementation, `backward_prefetch` and `forward_prefetch` actually do not need to be homogeneous. However, I believe the other two do.

For `limit_all_gathers`, if we do not enforce that all FSDP instances have the same value, then the semantics guaranteed by the `bool` can be violated: it could be as if none of them set the value to `True`.

For use_orig_params, optimizer state dict assumes that the value is the same for all FSDP instances.

Collaborator Author

I will remove the check for backward_prefetch and forward_prefetch.

awgu pushed a commit to awgu/pytorch that referenced this pull request Dec 17, 2022
awgu pushed a commit to awgu/pytorch that referenced this pull request Dec 19, 2022
@awgu awgu added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 20, 2022
@facebook-github-bot
Contributor

This pull request has been merged in da9af98.


Labels

ciflow/trunk · Merged · release notes: distributed (fsdp) · topic: not user facing


3 participants