[dynamo] Add FakeProcessGroup support for fx_graph_runnable with distributed collectives#157162

Closed
skarjala wants to merge 14 commits into gh/skarjala/10/base from gh/skarjala/10/head

Conversation

@skarjala
Contributor

skarjala commented Jun 27, 2025

Stack from ghstack (oldest at bottom):

Summary:

  • Modified generate_compiler_repro_string() to automatically detect distributed operations and inject FakeProcessGroup setup code
  • Added distributed collective tests in test/dynamo/test_fx_graph_runnable.py that use the FakeProcessGroup API to exercise distributed collective operations
  • Generated fx_graph_runnable code now runs standalone when it contains distributed operations

Example generated fx_graph_runnable file:
os.environ['TORCHINDUCTOR_CACHE_DIR'] = '/var/folders/fd/kcv8m1kn0lqgxz42wvgr46sc0000gn/T/torchinductor_skarjala'

import torch
from torch import tensor, device
import torch.fx as fx
from torch._dynamo.testing import rand_strided
from math import inf
import torch._inductor.inductor_prims
import torch.distributed as dist
from torch.testing._internal.distributed.fake_pg import FakeStore

import torch._dynamo.config
import torch._inductor.config
import torch._functorch.config
import torch.fx.experimental._config


torch._functorch.config.functionalize_rng_ops = False
torch._functorch.config.fake_tensor_allow_unsafe_data_ptr_access = True
torch._functorch.config.unlift_effect_tokens = True



isolate_fails_code_str = None




# torch version: 2.9.0a0+gitf23d314
# torch cuda version: None
# torch git version: f23d31463ca452918e23063409a2bdc55efc0d46


# torch.cuda.is_available()==False, no GPU info collected

from torch.nn import *
class Repro(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()

    
    
    def forward(self, arg0_1):
        all_reduce = torch.ops._c10d_functional.all_reduce.default(arg0_1, 'sum', '0')
        wait_tensor = torch.ops._c10d_functional.wait_tensor.default(all_reduce);  all_reduce = None
        mul = torch.ops.aten.mul.Tensor(wait_tensor, 2)
        copy_ = torch.ops.aten.copy_.default(arg0_1, wait_tensor);  arg0_1 = wait_tensor = copy_ = None
        return (mul,)
        
def load_args(reader):
    buf0 = reader.storage(None, 64)
    reader.tensor(buf0, (4, 4), is_leaf=True)  # arg0_1
load_args._version = 0
mod = Repro()
if __name__ == '__main__':
    from torch._dynamo.repro.after_aot import run_repro
    # Initialize FakeProcessGroup for distributed operations
    store = FakeStore()
    dist.init_process_group(
        backend="fake",
        rank=0,
        world_size=2,
        store=store,
    )
    with torch.no_grad():
        run_repro(mod, load_args, accuracy=False, command='run', save_dir=None, tracing_mode='real', check_str=None)
        # To run it separately, do
        # mod, args = run_repro(mod, load_args, accuracy=False, command='get_args', save_dir=None, tracing_mode='real', check_str=None)
        # mod(*args)
    dist.destroy_process_group()
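The detect-and-inject behavior described in the summary can be sketched as follows. This is an illustrative outline only, not the PR's actual implementation; all names here (`has_distributed_ops`, `DISTRIBUTED_OP_MARKERS`, `maybe_inject_fake_pg`) are hypothetical.

```python
# Illustrative sketch (not the PR's actual code): detect whether a generated
# repro source string references distributed collectives, and if so prepend
# the FakeProcessGroup setup. All names are hypothetical.
DISTRIBUTED_OP_MARKERS = (
    "torch.ops._c10d_functional",  # functional collectives (all_reduce, ...)
    "torch.distributed",
)

def has_distributed_ops(repro_src: str) -> bool:
    # A plain substring scan over the generated source suffices here,
    # since the codegen emits fully qualified op names.
    return any(marker in repro_src for marker in DISTRIBUTED_OP_MARKERS)

FAKE_PG_SETUP = (
    "store = FakeStore()\n"
    "dist.init_process_group(backend='fake', rank=0, world_size=2, store=store)\n"
)

def maybe_inject_fake_pg(repro_src: str) -> str:
    # Only repros that actually use collectives pay the setup cost.
    if has_distributed_ops(repro_src):
        return FAKE_PG_SETUP + repro_src
    return repro_src
```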

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames

@pytorch-bot

pytorch-bot bot commented Jun 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157162

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 Cancelled Jobs

As of commit 544ce40 with merge base 178fe7a:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

skarjala added a commit that referenced this pull request Jun 27, 2025
@skarjala skarjala changed the title Add FakeProcessGroup support for fx_graph_runnable with distributed collectives [dynamo] Add FakeProcessGroup support for fx_graph_runnable with distributed collectives Jun 27, 2025
@skarjala skarjala requested review from StrongerXi, bdhirsh and xmfan June 27, 2025 21:43
skarjala added a commit that referenced this pull request Jul 1, 2025
@skarjala skarjala requested a review from bdhirsh July 1, 2025 22:32
skarjala added a commit that referenced this pull request Jul 2, 2025
skarjala added a commit that referenced this pull request Jul 3, 2025
@skarjala skarjala marked this pull request as draft July 3, 2025 18:58
skarjala added a commit that referenced this pull request Jul 3, 2025
@skarjala skarjala marked this pull request as ready for review July 3, 2025 21:40
@skarjala skarjala marked this pull request as draft July 8, 2025 22:10
@skarjala skarjala marked this pull request as ready for review July 8, 2025 22:26
Comment on lines +13 to +15
from torch.distributed._tensor import DeviceMesh, DTensor, Replicate, Shard
from torch.testing._internal.common_utils import IS_FBCODE, IS_SANDCASTLE
from torch.testing._internal.distributed.fake_pg import FakeStore
Member:

There will probably be some failed tests; the distributed imports other than `torch.distributed` itself must be gated behind an availability check:

if torch.distributed.is_available():
  from torch.distributed._tensor import DeviceMesh, DTensor, Replicate, Shard
  from torch.testing._internal.distributed.fake_pg import FakeStore
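The gating the reviewer suggests pairs the conditional import with skipping dependent tests. Here is a stdlib-only sketch: `HAS_DISTRIBUTED` is a hard-coded stand-in for `torch.distributed.is_available()` so the example runs anywhere, and the class name is hypothetical.

```python
# Stdlib-only sketch of the suggested gating: distributed-dependent tests are
# skipped, not failed, when the feature is unavailable. HAS_DISTRIBUTED is a
# stand-in for torch.distributed.is_available(), hard-coded False here.
import unittest

HAS_DISTRIBUTED = False  # stand-in for torch.distributed.is_available()

if HAS_DISTRIBUTED:
    # Gated import: only resolved when distributed support exists.
    from torch.testing._internal.distributed.fake_pg import FakeStore

@unittest.skipIf(not HAS_DISTRIBUTED, "requires torch.distributed")
class DistributedCollectiveTests(unittest.TestCase):
    def test_fake_pg_roundtrip(self):
        # Would construct a FakeStore and run collectives here.
        self.assertTrue(HAS_DISTRIBUTED)

# Running the suite records one skip and zero failures.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DistributedCollectiveTests)
result = unittest.TestResult()
suite.run(result)
```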

fd.write(
    "import torch.distributed as dist\n"
    "from torch.testing._internal.distributed.fake_pg import FakeStore\n"
)
Member:

Could you include a generated fx_graph_runnable in the PR summary? Let's make sure the imports still stay at the top of the file, with the others.

Contributor Author:

Fixed


# Add distributed cleanup after run_repro
if has_distributed_ops:
    fd.write("dist.destroy_process_group()\n")
Member:

Also, I'd like to double-check against the generated fx_graph_runnable file: is there no indentation for this line?

Contributor Author:

Fixed.

# Add distributed cleanup after run_repro if needed
if has_distributed_ops:
fd.write("dist.destroy_process_group()\n")
fd.write(" \n dist.destroy_process_group()\n")
Member:

nit: take a look at the codegen'd fx graph runnable:

with torch.no_grad():
        run_repro(mod, load_args, accuracy=False, command='run', save_dir=None, tracing_mode='real', check_str=None)
    
    dist.destroy_process_group()
        # To run it separately, do 
        # mod, args = run_repro(mod, load_args, accuracy=False, command='get_args', save_dir=None, tracing_mode='real', check_str=None)
        # mod(*args)

See how the comment below the run_repro is indented? the original intent is likely for people to uncomment those lines if they need them. But now with your added destroy process group, uncommenting those lines would error. Those lines need to run under the no_grad context, so I'd recommend you to move the destroy process group after those comment lines
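The placement the reviewer asks for can be sketched as a small codegen helper (the function name is hypothetical): emit `dist.destroy_process_group()` after the commented-out "run it separately" lines and at the `with` statement's own indent, so uncommenting those lines keeps them inside the `no_grad()` block.

```python
# Hypothetical sketch of the requested fix: cleanup is emitted last, outside
# the commented block, at the outer indent level.
def emit_repro_tail(has_distributed_ops: bool) -> str:
    lines = [
        "    with torch.no_grad():",
        "        run_repro(mod, load_args, accuracy=False, command='run', save_dir=None, tracing_mode='real', check_str=None)",
        "        # To run it separately, do",
        "        # mod, args = run_repro(mod, load_args, accuracy=False, command='get_args', save_dir=None, tracing_mode='real', check_str=None)",
        "        # mod(*args)",
    ]
    if has_distributed_ops:
        # Cleanup goes after the comment lines, at the `with` indent.
        lines.append("    dist.destroy_process_group()")
    return "\n".join(lines) + "\n"
```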

from torch.distributed._tensor import DeviceMesh, DTensor, Replicate, Shard
from torch.testing._internal.distributed.fake_pg import FakeStore
else:
# Define dummy classes if distributed is not available
Member:

Tests using these classes should be skipped when distributed is not available.

@skarjala skarjala mentioned this pull request Jul 9, 2025
skarjala added 4 commits July 9, 2025 10:02
Comment on lines +448 to +450
)

fd.write(
Member:

This seems unnecessary.

@skarjala
Contributor Author

@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 10, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 5 checks: pull / linux-jammy-py3.9-clang12 / build, pull / linux-jammy-cuda12.8-py3.10-gcc11-build-distributed / build, pull / linux-jammy-py3-clang12-executorch / build, pull / before-test / target-determination, inductor-rocm / rocm-py3.10-inductor / test (inductor, 1, 2, linux.rocm.gpu.2)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pytorchmergebot pushed a commit that referenced this pull request Jul 22, 2025
@github-actions github-actions bot deleted the gh/skarjala/10/head branch August 10, 2025 02:20

Labels

ciflow/inductor, ciflow/trunk, Merged, module: dynamo, release notes: fx

4 participants