[1/n]adding torch.distributed.run option to provide destination for event logging (#154644)#155268
[1/n]adding torch.distributed.run option to provide destination for event logging (#154644)#155268aschhabra wants to merge 1 commit intopytorch:mainfrom
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155268
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (5 Unrelated Failures)As of commit 23a8be9 with merge base 7ae7c14 ( FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
This pull request was exported from Phabricator. Differential Revision: D75183591 |
|
This pull request was exported from Phabricator. Differential Revision: D75183591 |
71e6b66 to
0dd64f5
Compare
…vent logging (pytorch#155268) Summary: **Problem Statement** Currently, torch distributed elastic does not support to an option specify destination for event logging from torch.distributed.run. *recording events to default destination:* https://fburl.com/code/7f9b0993 The default destination is "null". ***Solution*** adding option in torch.destributed.run to specify event_logging_destination. The default value will be "null" which is current default so it won;t affect users unless the specify it via command line. Test Plan: ``` buck test //caffe2/test/distributed:distributed_run_test ``` https://www.internalfb.com/mlhub/pipelines/runs/mast/f738408681-TrainingApplication_torch_distributed_run_3?job_attempt=0&version=0&tab=execution_details&env=PRODUCTION Rollback Plan: Reviewed By: kiukchung Differential Revision: D75183591
0dd64f5 to
23a8be9
Compare
|
This pull request was exported from Phabricator. Differential Revision: D75183591 |
|
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged) |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Summary:
Problem Statement
Currently, torch distributed elastic does not support to an option specify destination for event logging from torch.distributed.run.
recording events to default destination: https://fburl.com/code/7f9b0993
The default destination is "null".
Solution
adding option in torch.destributed.run to specify event_logging_destination. The default value will be "null" which is current default so it won;t affect users unless the specify it via command line.
Test Plan:
https://www.internalfb.com/mlhub/pipelines/runs/mast/f738408681-TrainingApplication_torch_distributed_run_3?job_attempt=0&version=0&tab=execution_details&env=PRODUCTION
Rollback Plan:
Reviewed By: kiukchung
Differential Revision: D75183591
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k