
Implement c10d::reduce_scatter_ (list API) in LocalTensorMode#175710

Open
ezyang wants to merge 5 commits into gh/ezyang/3274/base from gh/ezyang/3274/head

Conversation

@ezyang
Contributor

@ezyang ezyang commented Feb 25, 2026

Stack from ghstack (oldest at bottom):

The list-based `reduce_scatter_` op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.
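The semantics can be illustrated without any distributed setup. Below is a minimal plain-Python sketch (no torch; function and variable names are invented for illustration) of what a sum reduce-scatter with uneven splits computes: each rank contributes one chunk per destination rank, and rank r ends up with the elementwise sum of every rank's r-th chunk, where different destinations may receive chunks of different lengths.

```python
# Hypothetical sketch of reduce-scatter (sum) semantics with uneven splits.
# inputs_per_rank[src][dst] is the chunk rank `src` contributes toward rank `dst`.

def reduce_scatter_sum(inputs_per_rank):
    world_size = len(inputs_per_rank)
    outputs = []
    for dst in range(world_size):
        # Gather every rank's contribution destined for `dst`...
        chunks = [inputs_per_rank[src][dst] for src in range(world_size)]
        # ...and reduce them elementwise. Chunk lengths may differ per
        # destination: that is the uneven-split case the list API supports.
        outputs.append([sum(vals) for vals in zip(*chunks)])
    return outputs

# Two ranks; rank 0's output has 2 elements, rank 1's has only 1.
inputs = [
    [[1, 2], [3]],      # rank 0's input_list
    [[10, 20], [30]],   # rank 1's input_list
]
print(reduce_scatter_sum(inputs))  # [[11, 22], [33]]
```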

Authored with Claude.

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Feb 25, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175710

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Unrelated Failures

As of commit d1a08d4 with merge base c15f33d:

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Feb 25, 2026
The list-based reduce_scatter_ op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.

Authored with Claude.


ghstack-source-id: 7b294c1
Pull-Request: #175710
@pytorch-bot pytorch-bot bot added the release notes: distributed (c10d) release notes category label Feb 25, 2026
[ghstack-poisoned]
ezyang added a commit that referenced this pull request Feb 25, 2026
The list-based reduce_scatter_ op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.

Authored with Claude.


ghstack-source-id: 4300002
Pull-Request: #175710
@ezyang ezyang requested a review from dzmitry-huba March 3, 2026 15:12
[ghstack-poisoned]
ezyang added a commit that referenced this pull request Mar 4, 2026
The list-based reduce_scatter_ op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.

Authored with Claude.

ghstack-source-id: 8a10383
Pull-Request: #175710
@ezyang
Contributor Author

ezyang commented Mar 4, 2026

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 4, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@albanD albanD removed their request for review March 5, 2026 18:59
@ezyang
Contributor Author

ezyang commented Mar 11, 2026

@claude fix lint and trigger merge

@claude

claude bot commented Mar 11, 2026

Claude finished @ezyang's task in 2m 44s


Fixing lint and triggering merge

  • Investigate lint failure — F841: unused variable fake_pg on line 610 of test/distributed/test_local_tensor.py
  • Fix lint issue — removed the unused fake_pg variable (it was assigned but never passed to dist.reduce_scatter)
  • Push fix — committed and pushed to gh/ezyang/3274/head
  • Trigger merge — I'm unable to post comments on the PR to invoke @pytorchbot merge due to permissions. Please comment @pytorchbot merge to trigger the merge once CI passes.
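For context, F841 flags a local variable that is assigned but never used. A minimal illustration of the pattern (function names hypothetical, mirroring the `fake_pg` description above):

```python
# Illustration of ruff's F841 ("local variable assigned but never used").

def setup_before():
    fake_pg = object()  # assigned but never used -> ruff reports F841 here
    return "ready"

def setup_after():
    return "ready"      # the dead assignment is simply deleted
```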

@pytorch-bot pytorch-bot bot added the ciflow/torchtitan Run TorchTitan integration tests label Mar 11, 2026
@linux-foundation-easycla

linux-foundation-easycla bot commented Mar 11, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@ezyang ezyang force-pushed the gh/ezyang/3274/head branch from 636e102 to c6435b8 Compare March 12, 2026 14:51
[ghstack-poisoned]
ezyang added a commit that referenced this pull request Mar 12, 2026
The list-based reduce_scatter_ op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.

Authored with Claude.

ghstack-source-id: 7e81d41
Pull-Request: #175710
@ezyang
Contributor Author

ezyang commented Mar 12, 2026

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorch-auto-revert

@pytorchbot revert -m "Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable" -c autorevert

This PR is attributed to have caused regression in:

Please investigate and fix the issues.
@claude Can you please read this revert comment, follow the links, and read the errors, then give a brief diagnosis of the cause of the error? If you judge the error to be a legitimate reason for a revert, please provide brief guidance on how the author could fix it.

@claude

claude bot commented Mar 12, 2026

Claude finished @pytorch-auto-revert[bot]'s task in 1m 1s


Revert Diagnosis

  • Read CI failure logs from the lint job
  • Diagnose root cause
  • Assess if revert is legitimate and provide fix guidance

Root Cause

The lint failure is RUFF S101 ("Use of assert detected") in torch/distributed/_local_tensor/_c10d.py. The new reduce_scatter_ function uses bare assert statements on lines 619, 620, 628, 629, and 641, which violates the project's lint rules.
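A common rationale for banning bare assert in library code (stated here as general background, not taken from the lint log): CPython strips assert statements when run with `python -O`, so a bare assert silently stops checking, while an explicit raise always executes. A small sketch:

```python
# Bare assert vs. explicit raise. Under `python -O`, the assert in
# check_bare() is compiled away entirely; check_explicit() always validates.

def check_bare(x):
    assert x > 0, "x must be positive"   # vanishes under -O
    return x

def check_explicit(x):
    if x <= 0:
        raise AssertionError("x must be positive")  # survives -O
    return x
```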

Is the revert legitimate?

Yes, this is a legitimate revert. The lint failure is a real issue — the rest of the file consistently uses raise AssertionError(...) instead of bare assert statements. This is the established pattern throughout _c10d.py (there are 40+ instances of raise AssertionError in the file, and zero assert statements prior to this PR).

How to fix

Replace the bare assert statements with explicit raise AssertionError(...), matching the existing code style. Specifically, change:

# Lines 619-620
assert len(output_tensors) == 1
assert len(input_tensors) == 1

to:

if len(output_tensors) != 1:
    raise AssertionError
if len(input_tensors) != 1:
    raise AssertionError

Change:

# Line 628
assert isinstance(output_tensor, LocalTensor), "Output tensor must be a LocalTensor"

to:

if not isinstance(output_tensor, LocalTensor):
    raise AssertionError("Output tensor must be a LocalTensor")

Change:

# Lines 629-631
assert len(input_list) == len(ranks), (
    f"Number of input chunks ({len(input_list)}) must match number of ranks ({len(ranks)})"
)

to:

if len(input_list) != len(ranks):
    raise AssertionError(
        f"Number of input chunks ({len(input_list)}) must match number of ranks ({len(ranks)})"
    )

Change:

# Line 641
assert isinstance(chunk, LocalTensor), "Input chunk must be a LocalTensor"

to:

if not isinstance(chunk, LocalTensor):
    raise AssertionError("Input chunk must be a LocalTensor")

This matches the established pattern used by every other function in the file (e.g., _local_all_reduce_, _local_reduce_scatter_, _local_all_gather_, etc.).


@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Mar 12, 2026
…#175710)"

This reverts commit 90035af.

Reverted #175710 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](#175710 (comment)))
@pytorchmergebot
Collaborator

@ezyang your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Mar 12, 2026
[ghstack-poisoned]
ezyang added a commit that referenced this pull request Mar 12, 2026
The list-based reduce_scatter_ op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.

Authored with Claude.

ghstack-source-id: 55246e5
Pull-Request: #175710
@ezyang
Contributor Author

ezyang commented Mar 12, 2026

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 5, 6, linux.rocm.gpu.gfx950.1)

Details for Dev Infra team Raised by workflow job

EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…h#175710)

The list-based reduce_scatter_ op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.

Authored with Claude.
Pull Request resolved: pytorch#175710
Approved by: https://github.com/dzmitry-huba
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…pytorch#175710)"

This reverts commit 90035af.

Reverted pytorch#175710 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](pytorch#175710 (comment)))

Labels

ci-no-td Do not run TD on this PR ciflow/torchtitan Run TorchTitan integration tests ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: distributed (c10d) release notes category Reverted


3 participants