
Implement c10d::reduce_scatter_ (list API) in LocalTensorMode#175710

Open
ezyang wants to merge 5 commits into gh/ezyang/3274/base from gh/ezyang/3274/head

Conversation

@ezyang
Contributor

@ezyang ezyang commented Feb 25, 2026

Stack from ghstack (oldest at bottom):

The list-based `reduce_scatter_` op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.
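The semantics can be illustrated without any distributed setup. Below is a minimal plain-Python sketch (no torch; function and variable names are invented for illustration) of what a sum reduce-scatter with uneven splits computes: each rank contributes one chunk per destination rank, and rank r ends up with the elementwise sum of every rank's r-th chunk, where different destinations may receive chunks of different lengths.

```python
# Hypothetical sketch of reduce-scatter (sum) semantics with uneven splits.
# inputs_per_rank[src][dst] is the chunk rank `src` contributes toward rank `dst`.

def reduce_scatter_sum(inputs_per_rank):
    world_size = len(inputs_per_rank)
    outputs = []
    for dst in range(world_size):
        # Gather every rank's contribution destined for `dst`...
        chunks = [inputs_per_rank[src][dst] for src in range(world_size)]
        # ...and reduce them elementwise. Chunk lengths may differ per
        # destination: that is the uneven-split case the list API supports.
        outputs.append([sum(vals) for vals in zip(*chunks)])
    return outputs

# Two ranks; rank 0's output has 2 elements, rank 1's has only 1.
inputs = [
    [[1, 2], [3]],      # rank 0's input_list
    [[10, 20], [30]],   # rank 1's input_list
]
print(reduce_scatter_sum(inputs))  # [[11, 22], [33]]
```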

Authored with Claude.

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Feb 25, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175710

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Unrelated Failures

As of commit d1a08d4 with merge base c15f33d:

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Feb 25, 2026
The list-based reduce_scatter_ op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.

Authored with Claude.


ghstack-source-id: 7b294c1
Pull-Request: #175710
@pytorch-bot pytorch-bot bot added the release notes: distributed (c10d) release notes category label Feb 25, 2026
[ghstack-poisoned]
ezyang added a commit that referenced this pull request Feb 25, 2026
The list-based reduce_scatter_ op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.

Authored with Claude.


ghstack-source-id: 4300002
Pull-Request: #175710
@ezyang ezyang requested a review from dzmitry-huba March 3, 2026 15:12
[ghstack-poisoned]
ezyang added a commit that referenced this pull request Mar 4, 2026
The list-based reduce_scatter_ op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.

Authored with Claude.

ghstack-source-id: 8a10383
Pull-Request: #175710
@ezyang
Contributor Author

ezyang commented Mar 4, 2026

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 4, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@albanD albanD removed their request for review March 5, 2026 18:59
@ezyang
Contributor Author

ezyang commented Mar 11, 2026

@claude fix lint and trigger merge

@claude

claude bot commented Mar 11, 2026

Claude finished @ezyang's task in 2m 44s


Fixing lint and triggering merge

  • Investigate lint failure — F841: unused variable fake_pg on line 610 of test/distributed/test_local_tensor.py
  • Fix lint issue — removed the unused fake_pg variable (it was assigned but never passed to dist.reduce_scatter)
  • Push fix — committed and pushed to gh/ezyang/3274/head
  • Trigger merge — I'm unable to post comments on the PR to invoke @pytorchbot merge due to permissions. Please comment @pytorchbot merge to trigger the merge once CI passes.
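For context, F841 flags a local variable that is assigned but never used. A minimal illustration of the pattern (function names hypothetical, mirroring the `fake_pg` description above):

```python
# Illustration of ruff's F841 ("local variable assigned but never used").

def setup_before():
    fake_pg = object()  # assigned but never used -> ruff reports F841 here
    return "ready"

def setup_after():
    return "ready"      # the dead assignment is simply deleted
```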

@pytorch-bot pytorch-bot bot added the ciflow/torchtitan Run TorchTitan integration tests label Mar 11, 2026
@linux-foundation-easycla

linux-foundation-easycla bot commented Mar 11, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@ezyang ezyang force-pushed the gh/ezyang/3274/head branch from 636e102 to c6435b8 Compare March 12, 2026 14:51
[ghstack-poisoned]
ezyang added a commit that referenced this pull request Mar 12, 2026
The list-based reduce_scatter_ op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.

Authored with Claude.

ghstack-source-id: 7e81d41
Pull-Request: #175710
@ezyang
Contributor Author

ezyang commented Mar 12, 2026

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorch-auto-revert

@pytorchbot revert -m "Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable" -c autorevert

This PR is attributed to have caused regression in:

Please investigate and fix the issues.
@claude Can you please read this revert comment, follow the links, and read the errors, then give a brief diagnosis of the cause of the error? If you judge the error to be a legitimate reason for a revert, please provide brief guidance on how the author could fix it.

@claude

claude bot commented Mar 12, 2026

Claude finished @pytorch-auto-revert[bot]'s task in 1m 1s


Revert Diagnosis

  • Read CI failure logs from the lint job
  • Diagnose root cause
  • Assess if revert is legitimate and provide fix guidance

Root Cause

The lint failure is RUFF S101 ("Use of assert detected") in torch/distributed/_local_tensor/_c10d.py. The new reduce_scatter_ function uses bare assert statements on lines 619, 620, 628, 629, and 641, which violates the project's lint rules.
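A common rationale for banning bare assert in library code (stated here as general background, not taken from the lint log): CPython strips assert statements when run with `python -O`, so a bare assert silently stops checking, while an explicit raise always executes. A small sketch:

```python
# Bare assert vs. explicit raise. Under `python -O`, the assert in
# check_bare() is compiled away entirely; check_explicit() always validates.

def check_bare(x):
    assert x > 0, "x must be positive"   # vanishes under -O
    return x

def check_explicit(x):
    if x <= 0:
        raise AssertionError("x must be positive")  # survives -O
    return x
```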

Is the revert legitimate?

Yes, this is a legitimate revert. The lint failure is a real issue — the rest of the file consistently uses raise AssertionError(...) instead of bare assert statements. This is the established pattern throughout _c10d.py (there are 40+ instances of raise AssertionError in the file, and zero assert statements prior to this PR).

How to fix

Replace the bare assert statements with explicit raise AssertionError(...), matching the existing code style. Specifically, change:

# Lines 619-620
assert len(output_tensors) == 1
assert len(input_tensors) == 1

to:

if len(output_tensors) != 1:
    raise AssertionError
if len(input_tensors) != 1:
    raise AssertionError

Change:

# Line 628
assert isinstance(output_tensor, LocalTensor), "Output tensor must be a LocalTensor"

to:

if not isinstance(output_tensor, LocalTensor):
    raise AssertionError("Output tensor must be a LocalTensor")

Change:

# Lines 629-631
assert len(input_list) == len(ranks), (
    f"Number of input chunks ({len(input_list)}) must match number of ranks ({len(ranks)})"
)

to:

if len(input_list) != len(ranks):
    raise AssertionError(
        f"Number of input chunks ({len(input_list)}) must match number of ranks ({len(ranks)})"
    )

Change:

# Line 641
assert isinstance(chunk, LocalTensor), "Input chunk must be a LocalTensor"

to:

if not isinstance(chunk, LocalTensor):
    raise AssertionError("Input chunk must be a LocalTensor")

This matches the established pattern used by every other function in the file (e.g., _local_all_reduce_, _local_reduce_scatter_, _local_all_gather_, etc.).


@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Mar 12, 2026
…#175710)"

This reverts commit 90035af.

Reverted #175710 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](#175710 (comment)))
@pytorchmergebot
Collaborator

@ezyang your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Mar 12, 2026
[ghstack-poisoned]
ezyang added a commit that referenced this pull request Mar 12, 2026
The list-based reduce_scatter_ op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.

Authored with Claude.

ghstack-source-id: 55246e5
Pull-Request: #175710
@ezyang
Contributor Author

ezyang commented Mar 12, 2026

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 5, 6, linux.rocm.gpu.gfx950.1)

Details for Dev Infra team Raised by workflow job

EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…h#175710)

The list-based reduce_scatter_ op was the only reduce_scatter variant
missing from LocalTensorMode's dispatch table. This is the variant used
by `dist.reduce_scatter(output, input_list)`, which is the only API that
supports uneven split sizes across ranks.

Authored with Claude.
Pull Request resolved: pytorch#175710
Approved by: https://github.com/dzmitry-huba
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…pytorch#175710)"

This reverts commit 90035af.

Reverted pytorch#175710 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](pytorch#175710 (comment)))

Labels

ci-no-td Do not run TD on this PR ciflow/torchtitan Run TorchTitan integration tests ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: distributed (c10d) release notes category Reverted


3 participants