Skip to content

[coor-targets] Accept ProcessGroup directly in functional collective op schemas#172795

Closed
aorenste wants to merge 13 commits intogh/aorenste/184/basefrom
gh/aorenste/184/head
Closed

[coor-targets] Accept ProcessGroup directly in functional collective op schemas#172795
aorenste wants to merge 13 commits intogh/aorenste/184/basefrom
gh/aorenste/184/head

Conversation

@aorenste
Copy link
Copy Markdown
Contributor

@aorenste aorenste commented Jan 19, 2026

Change _c10d_functional and _dtensor op schemas from str group_name
to Any group so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime Any by
checking whether the IValue is a string or capsule.

On the Python side, _resolve_group() replaces _resolve_group_name()
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning
ProcessGroups into constant strings.

Stack from ghstack (oldest at bottom):

…ring or ProcessGroup inputs

[ghstack-poisoned]
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot bot commented Jan 19, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/172795

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 1 Unrelated Failure

As of commit 2080c58 with merge base 79546f5 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

aorenste added a commit that referenced this pull request Jan 19, 2026
…ring or ProcessGroup inputs

ghstack-source-id: 24ef88b
Pull Request resolved: #172795
…ake both string or ProcessGroup inputs"

[ghstack-poisoned]
…ake both string or ProcessGroup inputs"

[ghstack-poisoned]
@aorenste aorenste added the suppress-bc-linter Suppresses the failures of API backward-compatibility linter (Lint/bc_linter) label Jan 22, 2026
@aorenste aorenste changed the title [coor-targets] Change functional collectives so they can take both string or ProcessGroup inputs [coor-targets] Accept ProcessGroup directly in functional collective op schemas Mar 10, 2026
@pytorch-bot pytorch-bot bot added the ciflow/torchtitan Run TorchTitan integration tests label Mar 10, 2026
…collective op schemas"

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning 
ProcessGroups into constant strings.




[ghstack-poisoned]
aorenste added a commit that referenced this pull request Mar 10, 2026
…op schemas

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

Authored with Claude.

ghstack-source-id: 527fedc
Pull Request resolved: #172795
…collective op schemas"

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning 
ProcessGroups into constant strings.




[ghstack-poisoned]
aorenste added a commit that referenced this pull request Mar 10, 2026
…op schemas

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

Authored with Claude.

ghstack-source-id: 04a2d91
Pull Request resolved: #172795
…collective op schemas"

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning 
ProcessGroups into constant strings.




[ghstack-poisoned]
aorenste added a commit that referenced this pull request Mar 11, 2026
…op schemas

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

Authored with Claude.

ghstack-source-id: b906563
Pull Request resolved: #172795
…collective op schemas"

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning 
ProcessGroups into constant strings.




[ghstack-poisoned]
aorenste added a commit that referenced this pull request Mar 12, 2026
…op schemas

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

Authored with Claude.

ghstack-source-id: 05d9dbf
Pull Request resolved: #172795
sandy-gags pushed a commit to sandy-gags/pytorch that referenced this pull request Mar 12, 2026
…ring or ProcessGroup inputs

ghstack-source-id: ae449aa
Pull Request resolved: pytorch/pytorch#172795
…collective op schemas"

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning 
ProcessGroups into constant strings.




[ghstack-poisoned]
aorenste added a commit that referenced this pull request Mar 13, 2026
…op schemas

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

Authored with Claude.

ghstack-source-id: 94fd9b1
Pull Request resolved: #172795
@pytorch-auto-revert
Copy link
Copy Markdown

@pytorchbot revert -m "Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable" -c autorevert

This PR is attributed to have caused regression in:

Please investigate and fix the issues.
@claude Can you please read this revert comment, follow the links and read the errors, to then give a brief diagnostics on the cause of the error? If you judge the error to be legitimate reason for a revert, please provide brief guidance on how the author could fix it.

@claude
Copy link
Copy Markdown

claude bot commented Mar 18, 2026

Claude finished @pytorch-auto-revert[bot]'s task in 11m 10s —— View job


Diagnosis of Revert for PR #172795

  • Read failing test code and changed files
  • Analyze the code paths affected by the PR
  • Diagnose root cause
  • Provide fix guidance

Failing Tests

All failures are in test/dynamo/test_fake_distributed.py, class TestFakeDistributedP2P:

  • test_compiled_isend_graph
  • test_compiled_fire_and_forget_isend_graph
  • test_compiled_irecv_graph
  • test_compiled_batch_isend_irecv_mixed_graph
  • test_compiled_p2p_interleave_graph

Root Cause

These tests use assertExpectedInline to compare the FX graph output against hardcoded expected strings. All five tests assert that the group argument in the traced graph is the string '0', e.g.:

# From test_compiled_isend_graph (line 213):
tensor: "f32[0]" = torch.ops._c10d_functional.isend(l_tensor_, 1, 0, '0')

The PR changed op schemas from str group_name to Any group across _c10d_functional ops. While the PR description mentions this was done for all functional collectives, the P2P ops (isend, irecv, batch_p2p_ops) were likely also affected.

The critical code path is:

  1. Dynamo intercepts dist.isend()/dist.irecv()/dist.batch_isend_irecv() via CollectiveFunctionRewriteVariable in torch/_dynamo/variables/functions.py:2571-2644
  2. Remapped functions (isend_inplace, irecv_inplace, batch_p2p_ops_inplace) are called during tracing
  3. These functions call _resolve_group_name(group) (line 1777, 1803, 1819) which always returns a string (e.g., '0')
  4. The string is passed to torch.ops._c10d_functional.isend(..., group_name)

The PR introduced _resolve_group() (line 1195) which returns either a ProcessGroup or a string, and the _group_or_group_name() function (line 1831) that also returns either type.

The most likely issue is that the PR changed something in how the P2P compilation path passes the group argument — either:

  1. The P2P ops were switched from _resolve_group_name() to _resolve_group(), which now returns a ProcessGroup object instead of a string '0'. When Dynamo traces this, the FX graph captures the ProcessGroup object differently than a string constant, causing all the assertExpectedInline checks to fail.

  2. The schema change from str to Any caused Dynamo's FX graph printer to represent the group argument differently (e.g., as an IValue rather than a quoted string).

Note: I was unable to access the actual CI logs (the jobs show conclusion: null, likely still running or logs not yet downloadable) or the full PR diff (the PR is closed and fetch commands require additional permissions). However, the diagnosis is high confidence based on code analysis.

Is the Revert Legitimate?

Yes — this is a legitimate revert. The PR broke 5 existing P2P compilation tests that were not updated to match the new graph output format. These tests verify the compiled graph structure for distributed P2P operations, and graph correctness is critical.

How to Fix

The author should:

  1. Update the expected graph strings in the 5 failing tests in test/dynamo/test_fake_distributed.py (lines 206-388) to match the new representation of the group argument after the schema change. If the group argument now appears as e.g. a ProcessGroup capsule reference instead of '0', the expected strings need to reflect that.

  2. Alternatively, if P2P ops should still pass string group names (as suggested by the review comments — the author mentioned P2P changes would come in a follow-up PR), ensure the P2P code paths (isend_inplace, irecv_inplace, batch_p2p_ops_inplace) continue to use _resolve_group_name() which always returns a string, rather than _resolve_group().

  3. Run the tests locally before re-landing:

    python test/dynamo/test_fake_distributed.py -k "TestFakeDistributedP2P"

    This will show the actual vs expected graph output, making it straightforward to update the expected strings.

The cleanest fix is likely option (2): keep P2P ops using _resolve_group_name() → string in this PR, and defer the P2P Any group changes to the follow-up PR as originally planned. This would avoid changing the traced graph representation for P2P ops.


@pytorchmergebot
Copy link
Copy Markdown
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

@aorenste your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Mar 18, 2026
…lective op schemas (#172795)"

This reverts commit 71352fa.

Reverted #172795 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](#172795 (comment)))
@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Mar 18, 2026
…collective op schemas"

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning 
ProcessGroups into constant strings.




[ghstack-poisoned]
…collective op schemas"

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning 
ProcessGroups into constant strings.




[ghstack-poisoned]
…collective op schemas"

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning 
ProcessGroups into constant strings.




[ghstack-poisoned]
@aorenste
Copy link
Copy Markdown
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Mar 19, 2026
ryanzhang22 pushed a commit to ryanzhang22/pytorch that referenced this pull request Mar 19, 2026
…op schemas (pytorch#172795)

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning
ProcessGroups into constant strings.

Pull Request resolved: pytorch#172795
Approved by: https://github.com/angelayi
ryanzhang22 pushed a commit to ryanzhang22/pytorch that referenced this pull request Mar 19, 2026
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…op schemas (pytorch#172795)

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning
ProcessGroups into constant strings.

Pull Request resolved: pytorch#172795
Approved by: https://github.com/angelayi
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…lective op schemas (pytorch#172795)"

This reverts commit 71352fa.

Reverted pytorch#172795 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](pytorch#172795 (comment)))
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…op schemas (pytorch#172795)

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning
ProcessGroups into constant strings.

Pull Request resolved: pytorch#172795
Approved by: https://github.com/angelayi
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
…op schemas (pytorch#172795)

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning
ProcessGroups into constant strings.

Pull Request resolved: pytorch#172795
Approved by: https://github.com/angelayi
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
…lective op schemas (pytorch#172795)"

This reverts commit 71352fa.

Reverted pytorch#172795 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](pytorch#172795 (comment)))
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
…op schemas (pytorch#172795)

Change _c10d_functional and _dtensor op schemas from `str group_name`
to `Any group` so callers can pass either a string name (resolved via
the group registry) or a ProcessGroup object directly, avoiding an
unnecessary registry round-trip.

Each C++ collective gets a ProcessGroup-typed overload (the real
implementation) and a string-typed overload (thin wrapper that resolves
and delegates). New dispatch functions handle the runtime `Any` by
checking whether the IValue is a string or capsule.

On the Python side, `_resolve_group()` replaces `_resolve_group_name()`
and returns ProcessGroup objects when possible. The AOTInductor proxy
executor gains AnyType argument handling.

This is intended to be used by compile-on-one-rank to avoid turning
ProcessGroups into constant strings.

Pull Request resolved: pytorch#172795
Approved by: https://github.com/angelayi
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-td Do not run TD on this PR ciflow/dtensor Run DTensor specific tests ciflow/inductor ciflow/torchtitan Run TorchTitan integration tests ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: distributed (c10d) release notes category release notes: distributed (dtensor) release notes category release notes: inductor (aoti) Reverted suppress-bc-linter Suppresses the failures of API backward-compatibility linter (Lint/bc_linter)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants