[FSDP][4/N] Refactor func to share state/init handle attrs #90871
awgu wants to merge 13 commits into gh/awgu/276/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90871
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 94ed76e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 86dd8f3 Pull Request resolved: pytorch#90871
        return state
    ...
    for attr_name, attr_values in attr_name_to_values.items():
        if len(attr_values) != 1:
            raise ValueError(
I assume this is for the manual wrapping case, as this will never happen with the auto wrap policy?
        "backward_prefetch",
        "forward_prefetch",
        "_use_orig_params",
        "limit_all_gathers",
Why do these need to be homogeneous? No need to change this, just wondering if there is any hard requirement for this behavior.
This is a good point. I should have written out the explanation to prove it to myself as well. Refreshing my understanding of the implementation: `backward_prefetch` and `forward_prefetch` actually do not need to be homogeneous. However, I believe the other two do.
For `limit_all_gathers`, if we do not enforce that all instances have the same value, then the semantics guaranteed by the `bool` can be violated entirely. It could be as if none of them set that value to `True`.
For `use_orig_params`, the optimizer state dict assumes that the value is the same for all FSDP instances.
I will remove the check for `backward_prefetch` and `forward_prefetch`.
This pull request has been merged in da9af98.
Stack from ghstack:

- #91044 [FSDP][7/N] Support `replicate` in `fully_shard`
- #90959 [FSDP][6/N] Add note explaining idioms for `_FSDPState` traversal
- #90874 [FSDP][5/N] Add manual "wrapping" support for `fully_shard`

For `limit_all_gathers`, if we do not enforce that all instances have the same value, then the semantics guaranteed by the `bool` can be violated entirely. It could be as if none of them set that value to `True`. For `use_orig_params`, the optimizer state dict assumes that the value is the same for all FSDP instances.