[1/n] adding torch.distributed.run option to provide destination for event logging #154644
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154644
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 43e7862 with merge base 4d57644 ( UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D75183591
…vent logging (#154644)
Summary:
Pull Request resolved: #154644
**Problem Statement**
Currently, torch distributed elastic does not provide an option to specify the destination for event logging from torch.distributed.run.
*Recording events to the default destination:* https://fburl.com/code/7f9b0993
The default destination is "null".
***Solution***
Add an option to torch.distributed.run to specify event_logging_destination. The default value is "null", which is the current default, so it won't affect users unless they specify it via the command line.
Test Plan:
```
buck test //caffe2/test/distributed:distributed_run_test
```
https://www.internalfb.com/mlhub/pipelines/runs/mast/f738408681-TrainingApplication_torch_distributed_run_3?job_attempt=0&version=0&tab=execution_details&env=PRODUCTION
Differential Revision: D75183591
…vent logging (#154644) (#155268)
Summary:
**Problem Statement**
Currently, torch distributed elastic does not provide an option to specify the destination for event logging from torch.distributed.run.
*Recording events to the default destination:* https://fburl.com/code/7f9b0993
The default destination is "null".
***Solution***
Add an option to torch.distributed.run to specify event_logging_destination. The default value is "null", which is the current default, so it won't affect users unless they specify it via the command line.
Test Plan:
https://www.internalfb.com/mlhub/pipelines/runs/mast/f738408681-TrainingApplication_torch_distributed_run_3?job_attempt=0&version=0&tab=execution_details&env=PRODUCTION
Rollback Plan:
Reviewed By: kiukchung
Differential Revision: D75183591
Pull Request resolved: #155268
Approved by: https://github.com/d4l3k
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as
Summary:
Problem Statement
Currently, torch distributed elastic does not provide an option to specify the destination for event logging from torch.distributed.run.
Recording events to the default destination: https://fburl.com/code/7f9b0993
The default destination is "null".
Solution
Add an option to torch.distributed.run to specify event_logging_destination. The default value is "null", which is the current default, so it won't affect users unless they specify it via the command line.
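The following is a minimal sketch of how such an option could be wired up with `argparse`. It assumes a flag named `--event-logging-destination`, which is hypothetical: the summary only names the setting `event_logging_destination`. It also uses a hypothetical `LaunchConfigSketch` dataclass as a stand-in for the launcher's config object. Because the default is "null", invocations that omit the flag keep the current behavior.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LaunchConfigSketch:
    # Hypothetical stand-in for the launcher's config; only the field
    # relevant to this change is shown. "null" preserves current behavior.
    event_logging_destination: str = "null"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="launcher-sketch")
    # Hypothetical flag name derived from `event_logging_destination`;
    # both dash and underscore spellings map to the same attribute.
    parser.add_argument(
        "--event-logging-destination",
        "--event_logging_destination",
        dest="event_logging_destination",
        type=str,
        default="null",
        help='Event logging destination (default: "null", which discards events).',
    )
    return parser


def config_from_args(argv: Optional[List[str]] = None) -> LaunchConfigSketch:
    """Parse CLI arguments and copy the destination into the config object."""
    args = build_parser().parse_args(argv)
    return LaunchConfigSketch(event_logging_destination=args.event_logging_destination)
```

Keeping "null" as the default is the design choice that makes the change opt-in. Only callers that pass the flag see different event-logging behavior.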
Test Plan:
https://www.internalfb.com/mlhub/pipelines/runs/mast/f738408681-TrainingApplication_torch_distributed_run_3?job_attempt=0&version=0&tab=execution_details&env=PRODUCTION
Differential Revision: D75183591
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta