[FSDP][5/N] Add manual "wrapping" support for fully_shard #90874
awgu wants to merge 13 commits into gh/awgu/279/base

Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90874
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 1924517.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This is not ready for review. [ghstack-poisoned]
```diff
             flat_param = handle.flat_param
-            if flat_param not in self.flat_param_to_prefixed_param_names:
+            if flat_param not in self.param_to_fqn:
                 continue
```
when is a handle invalid?
I traced through the history of my PRs, and it looks like this check was added arbitrarily. However, reasoning about it now, I think the check is important so that execution does not crash for `use_orig_params=True`: `self.param_to_fqn` is constructed via `_get_param_to_fqns(root_module)`, and for `use_orig_params=True`, `_get_param_to_fqns` does not include any `FlatParameter`s since they are not registered.
I think `_exec_order_utils.py` needs to be revisited for `use_orig_params=True`.
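To make the guard's purpose concrete, here is a minimal standalone sketch (all names are hypothetical, not the FSDP code): a lookup table built only from registered parameters will not contain an unregistered flat parameter, so without the `continue` guard the lookup would raise a `KeyError`.

```python
# Hypothetical illustration: `registered` plays the role of param_to_fqn,
# which is built only from *registered* parameters. An unregistered
# "flat_param" entry is absent, so membership must be checked first.
registered = {"weight": "layer.weight"}   # param -> fully qualified name
handles = ["weight", "flat_param"]        # "flat_param" was never registered

fqns = []
for h in handles:
    if h not in registered:
        continue  # skip unregistered params instead of crashing
    fqns.append(registered[h])
```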
```diff
-        states = [state] if _is_composable(state) else _get_fsdp_states(state)
-        for state in states:
+        for state in _get_fsdp_states(module):
```
hmm, is this intentional to abuse the var name state?
and on line 887, is it a bug? I assume your intention is to set it on the original state var instead of the one used in the loop?
Oh man. Thanks for the great catch. Let me fix this.
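The bug caught above is the classic loop-variable shadowing pattern. A minimal standalone illustration (toy names, not the FSDP code): once the loop reuses the name `state`, a later assignment mutates the last loop element instead of the original variable.

```python
class State:
    """Toy stand-in for an FSDP state object."""
    def __init__(self, name):
        self.name = name
        self.flag = False

root = State("root")
children = [State("a"), State("b")]

def buggy(state):
    # The loop rebinds the name `state`, shadowing the argument.
    for state in children:
        pass
    # This now mutates the *last child*, not the original argument.
    state.flag = True

buggy(root)
```
After the call, `root.flag` is still `False` while `children[-1].flag` is `True`, which is exactly the kind of silent misassignment the reviewer flagged.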
This PR adds manual "wrapping" support for `fully_shard`. For example, for

```
fully_shard(mod.sub)
fully_shard(mod)
```

`mod.sub` and `mod` will share the same FSDP data structures. To have parity with wrapper FSDP, this PR only checks support for when each manual application of `fully_shard` passes `policy=None`. Hybrid auto / manual wrapping is not in scope for this PR since it is not supported for wrapper FSDP either. I can follow up to either add support properly or raise an error early.

[ghstack-poisoned]
ghstack-source-id: 4e1e0b4 Pull Request resolved: pytorch#90874
ghstack-source-id: 03633b9 Pull Request resolved: pytorch#90874
This pull request has been merged in 32fde53.
ghstack-source-id: 97a14bc Pull Request resolved: pytorch#90874
Stack from ghstack:

- #91044 [FSDP][7/N] Support `replicate` in `fully_shard`
- #90959 [FSDP][6/N] Add note explaining idioms for `_FSDPState` traversal
- #90874 [FSDP][5/N] Add manual "wrapping" support for `fully_shard`

This PR adds manual "wrapping" support for `fully_shard`. For example, for

```
fully_shard(mod.sub)
fully_shard(mod)
```

`mod.sub` and `mod` will share the same FSDP data structures. To have parity with wrapper FSDP, this PR only checks support for when each manual application of `fully_shard` passes `policy=None`. Hybrid auto / manual wrapping is not in scope for this PR since it is not supported for wrapper FSDP either. I can follow up to either add support properly or raise an error early.