[DCP] Avoid in-place update and deepcopy during dedupe #149320
saumishr wants to merge 1 commit into pytorch:main from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149320
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 Unrelated Failures) As of commit 35c4f4a with merge base 6c7d841. UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D71245218
nit:
-plan_to_item_indexes: list[set[MetadataIndex]] = [set(item.index for item in plan.items) for plan in all_plans]
+plan_to_item_indices: list[set[MetadataIndex]] = [set(item.index for item in plan.items) for plan in all_plans]
Force-pushed from 4f984ef to d199cad
Force-pushed from d199cad to f21e63d
Summary: Pull Request resolved: pytorch#149320

Avoid in-place update and deepcopy during dedupe. Deepcopy becomes prohibitively expensive for models with a huge number of FQNs; this manifested in the Ads 2K experiment as well. Here are the results from the TextRay model in Mitra:

#### Control job with the deepcopy regression
- https://fburl.com/scuba/dai_modelstore/ac5so0i0
- First save: ~24.8s
- Global step latency: ~7-8s
- https://fburl.com/scuba/pytorch_dcp_logging/kdmwiemk

#### Test job with the new fix to avoid deepcopy
- https://fburl.com/scuba/dai_modelstore/aqi93x3a
- First save: ~21s
- Global step latency: ~2s
- https://fburl.com/scuba/pytorch_dcp_logging/w7muzr84

Test Plan:
```
buck test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/distributed/checkpoint:test_planner
```
https://www.internalfb.com/intern/testinfra/testrun/3940649945104822

Reviewed By: MeetVadakkanchery

Differential Revision: D71245218
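The pattern the summary describes can be sketched as follows. Instead of deep-copying every plan and deleting duplicate items in place, track per-plan index sets and build new plan objects containing only the deduplicated items. The names below (`SavePlan`, `WriteItem`, `MetadataIndex`, `dedup_save_plans`) are simplified stand-ins modeled on `torch.distributed.checkpoint` types, not the PR's exact code:

```python
from dataclasses import dataclass, replace
from typing import NamedTuple


class MetadataIndex(NamedTuple):
    fqn: str  # fully-qualified name of the tensor/object


@dataclass(frozen=True)
class WriteItem:
    index: MetadataIndex


@dataclass(frozen=True)
class SavePlan:
    items: tuple[WriteItem, ...]


def dedup_save_plans(all_plans: list[SavePlan]) -> list[SavePlan]:
    # One index set per plan -- this is the cheap bookkeeping that
    # replaces deep-copying entire plans.
    plan_to_item_indices: list[set[MetadataIndex]] = [
        {item.index for item in plan.items} for plan in all_plans
    ]
    seen: set[MetadataIndex] = set()
    for keep in plan_to_item_indices:
        keep -= seen   # drop indices already claimed by an earlier plan
        seen |= keep   # claim the remainder for this plan
    # Build fresh plans with dataclasses.replace: no deepcopy, and the
    # caller's original plans are never mutated.
    return [
        replace(plan, items=tuple(it for it in plan.items if it.index in keep))
        for plan, keep in zip(all_plans, plan_to_item_indices)
    ]
```

With two plans that both contain FQN `"b"`, the first plan keeps it and the second plan's copy is dropped, while the input plans remain unchanged.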
Force-pushed from 407ae55 to 072a3f8
Force-pushed from 072a3f8 to bb30ade
Force-pushed from bb30ade to 008915c
Force-pushed from 008915c to 7c3abf3
Force-pushed from 7c3abf3 to 621a7ec
Force-pushed from 621a7ec to 35c4f4a
@pytorchbot merge -i
(Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)
Merge started. Your change will be merged while ignoring the following 2 checks: pull / cuda12.4-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu), pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, linux.2xlarge). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
cc @LucasLLC @pradeepfn @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o