
Use torchrun for dynamo/distributed.py#89149

Closed
wconstab wants to merge 1 commit into gh/wconstab/39/base from gh/wconstab/39/head

Conversation


@wconstab wconstab commented Nov 16, 2022

Stack from ghstack (oldest at bottom):

Mainly I wanted to confirm that torchrun works fine with dynamo/ddp, but it is also a better system than launching processes manually.

Partially addresses issue #1779

New run commands

Single process:
    python benchmarks/dynamo/distributed.py [args]

Multi-GPU (e.g. 2 GPUs on one host):
    torchrun --nproc_per_node 2 benchmarks/dynamo/distributed.py [args]
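For context, torchrun exports the standard rendezvous environment variables to each worker it spawns, which is why the same script can run under both commands above. A minimal sketch of how a script launched either way might read them (the fallback values here are illustrative, not code from this PR):

```python
import os

# torchrun sets RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, and
# MASTER_PORT for every worker it launches; reading them with
# single-process fallbacks keeps the script runnable without torchrun.
rank = int(os.getenv("RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))
local_rank = int(os.getenv("LOCAL_RANK", "0"))
```

Run as `python script.py`, the fallbacks make it behave as a single-process job; run under `torchrun --nproc_per_node 2`, each worker sees its own rank.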

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire


pytorch-bot bot commented Nov 16, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89149

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a1f8de9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

    os.environ["MASTER_PORT"] = "12355"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    # set defaults in case torchrun isn't used; no idea why the if is needed, but it hangs torchrun otherwise
    if not os.getenv("MASTER_ADDR"):
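The guarded-default pattern in the diff can be sketched in isolation as follows (a standalone illustration whose defaults mirror the diff; it is not the benchmark script itself):

```python
import os

# Assign a default only when the variable is absent, so any values
# already exported by torchrun take precedence over the fallbacks.
if not os.getenv("MASTER_ADDR"):
    os.environ["MASTER_ADDR"] = "localhost"
if not os.getenv("MASTER_PORT"):
    os.environ["MASTER_PORT"] = "12355"
```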
Contributor:

Why do you need the check here but not for RANK and WORLD_SIZE?

Contributor Author (wconstab):

I literally have no idea. I think I shouldn't need the check, but without it torchrun was hanging on the call to init_process_group. I printed the env strings before and after setting them and they were apparently the same.
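An equivalent, more compact idiom for the same guard is `os.environ.setdefault`, which writes the fallback only when the key is missing and never overwrites a value torchrun already exported (this is an alternative sketch, not the code merged in this PR):

```python
import os

# setdefault returns the existing value if the key is present,
# otherwise stores and returns the supplied default.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "12355")
```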

@wconstab

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Nov 16, 2022
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

    if not os.getenv("MASTER_ADDR"):
        os.environ["MASTER_ADDR"] = os.getenv("MASTER_ADDR", "localhost")
    if not os.getenv("MASTER_PORT"):
        os.environ["MASTER_PORT"] = os.getenv("MASETER_PORT", "12355")
Contributor:

Typo in the os.getenv: "MASETER_PORT"?
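The effect of the misspelled key can be shown directly: "MASETER_PORT" is never set anywhere, so os.getenv always returns the supplied fallback, ignoring any real MASTER_PORT (the port value below is illustrative, not from the PR):

```python
import os

# Pretend torchrun exported a real port, then look up the misspelled key:
os.environ["MASTER_PORT"] = "29500"
value = os.getenv("MASETER_PORT", "12355")  # misspelled key -> fallback wins
```

In the merged code the outer `if not os.getenv("MASTER_PORT")` guard means this branch only runs when MASTER_PORT is unset, so the fallback "12355" is what was intended anyway; the misspelling is harmless in effect but still worth fixing.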

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Pull Request resolved: pytorch#89149
Approved by: https://github.com/aazzolini
@facebook-github-bot facebook-github-bot deleted the gh/wconstab/39/head branch June 8, 2023 19:17