Use torchrun for dynamo/distributed.py #89149
wconstab wants to merge 1 commit into gh/wconstab/39/base from
Conversation
Mainly wanted to confirm torchrun works fine with dynamo/ddp, but it is also a better system than manually launching processes.

New run commands
------------
single process:
python benchmarks/dynamo/distributed.py [args]
multi-gpu (e.g. 2 gpu on one host):
torchrun --nproc_per_node 2 benchmarks/dynamo/distributed.py [args]

[ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89149
✅ No Failures as of commit a1f8de9. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
    os.environ["MASTER_PORT"] = "12355"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    # set defaults in case torchrun isn't used; no idea why the if is needed, but it hangs torchrun otherwise
    if not os.getenv("MASTER_ADDR"):
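The guarded defaulting above can be expressed more directly with `os.environ.setdefault`, which only writes a key when it is absent. A minimal sketch (the helper name `set_rendezvous_defaults` is mine, not from the PR):

```python
import os

def set_rendezvous_defaults():
    # torchrun exports MASTER_ADDR/MASTER_PORT (plus RANK and WORLD_SIZE)
    # before the script starts; fall back to single-host defaults only
    # when they are missing, e.g. for a plain `python script.py` launch.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "12355")
```

Under torchrun, the values it exported are left untouched; under a bare `python` launch, the single-host defaults are filled in.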
why do u need the check here but not for RANK and WORLD_SIZE?
i literally have no idea. I think i shouldn't need the check, but without it torchrun was hanging on the call to init process group. i printed the env strings before/after my setting them and they were apparently the same..
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
    if not os.getenv("MASTER_ADDR"):
        os.environ["MASTER_ADDR"] = os.getenv("MASTER_ADDR", "localhost")
    if not os.getenv("MASTER_PORT"):
        os.environ["MASTER_PORT"] = os.getenv("MASETER_PORT", "12355")
typo in the os.getenv "MASETER_PORT" ?
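The typo turns out to be harmless in practice: the inner `os.getenv` only runs when the guard has already found `MASTER_PORT` unset, so the misspelled key simply makes the `"12355"` fallback fire. A sketch of a tighter form that drops the redundant re-read (the helper name `default_master_port` is mine, not from the PR):

```python
import os

def default_master_port(port="12355"):
    # The guard already tells us MASTER_PORT is unset, so re-reading it
    # (let alone the misspelled "MASETER_PORT") is redundant: assign the
    # fallback directly.
    if not os.getenv("MASTER_PORT"):
        os.environ["MASTER_PORT"] = port
```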
Mainly wanted to confirm torchrun works fine with dynamo/ddp, but it is also a better system than manually launching processes.

Partially addresses issue pytorch#1779

New run commands
------------
single process:
python benchmarks/dynamo/distributed.py [args]
multi-gpu (e.g. 2 gpu on one host):
torchrun --nproc_per_node 2 benchmarks/dynamo/distributed.py [args]

Pull Request resolved: pytorch#89149
Approved by: https://github.com/aazzolini
Stack from ghstack (oldest at bottom):
Mainly wanted to confirm torchrun works fine with dynamo/ddp,
but it is also a better system than manually launching processes.
Partially addresses issue #1779
New run commands
single process:
python benchmarks/dynamo/distributed.py [args]
multi-gpu (e.g. 2 gpu on one host):
torchrun --nproc_per_node 2 benchmarks/dynamo/distributed.py [args]
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire