fix redistribute() handling for finding flattened device mesh dims under compile by bdhirsh · Pull Request #173873 · pytorch/pytorch

bdhirsh · 2026-01-30T00:29:21Z

Co-authored with claude. I noticed after #172610 that DTensor's new redistribute call that looks for flattened device meshes can crash under torch.compile/tracing. It looks like submesh = mesh[dim_names] will try to construct a fresh DeviceMesh, and ends up calling .item() (full stacktrace of the error below).

I'm not 100% familiar with the DeviceMesh API's, but claude seemed to find an alternative way to "look for an existing flattened device mesh" that didn't need to call .item

Stacktrace:

    output = redistribute_local_tensor(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 1452, in redistribute_local_tensor
    optimized_transform_infos = _optimize_transform_infos(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 475, in _optimize_transform_infos
    flattened, failure_reason = try_create_flattened(group)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 381, in try_create_flattened
    flattened_mesh = _get_flattened_mesh_by_layout(device_mesh, sorted_mesh_dims)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 189, in _get_flattened_mesh_by_layout
    submesh = mesh[dim_names]
              ~~~~^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 669, in __getitem__
    submesh = self._create_sub_mesh(sliced_mesh_layout, mesh_dim_names)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 758, in _create_sub_mesh
    res_submesh = DeviceMesh(
                  ^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 258, in __init__
    if self._layout.numel() != self.mesh.numel():
                               ^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 360, in mesh
    return self._get_mesh_tensor_from_full_mesh(full_mesh)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 349, in _get_mesh_tensor_from_full_mesh
    return full_mesh[my_coords[0, 0]]
           ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/fx/experimental/proxy_tensor.py", line 1625, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/_compile.py", line 54, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/_dynamo/eval_frame.py", line 1227, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/_subclasses/functional_tensor.py", line 625, in __torch_dispatch__
    outs_unwrapped = func._op_dk(
                     ^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/_compile.py", line 54, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/_dynamo/eval_frame.py", line 1227, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/utils/_stats.py", line 29, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/fx/experimental/proxy_tensor.py", line 1756, in __torch_dispatch__
    return proxy_call(self, func, self.pre_dispatch, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/hirsheybar/new2/pytorch/torch/fx/experimental/proxy_tensor.py", line 1139, in proxy_call
    raise RuntimeError(
torch._dynamo.exc.BackendCompilerFailed: backend='aot_eager' raised:
RuntimeError: It appears that you're trying to get value out of a tracing tensor with aten._local_scalar_dense.default - erroring out! It's likely that this is caused by data-dependent control flow or similar.  It may be possible to trace this with dynamic shapes; try setting tracing_mode='symbolic' in your make_fx call.

Stack from ghstack (oldest at bottom):

-> fix redistribute() handling for finding flattened device mesh dims under compile #173873

…der compile [ghstack-poisoned]

pytorch-bot · 2026-01-30T00:29:24Z

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

…der compile ghstack-source-id: f263abc Pull Request resolved: #173873

pytorch-bot · 2026-01-30T00:29:27Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/173873

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 7 Unrelated Failures

As of commit ee61515 with merge base 969986a ():

NEW FAILURE - The following job has failed:

pull / linux-jammy-py3.14t-clang15 / test (crossref, 1, 2, linux.2xlarge) (gh)
test/test_serialization.py::TestSerialization::test_serialization_4gb_file

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

inductor / inductor-test-cuda13 / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (similar failure)
shufflenet_v2_x1_0
inductor / unit-test / inductor-halide-test / test (inductor-halide, 1, 1, linux.12xlarge) (gh) (disabled by #150624 but the issue was closed recently and a rebase is needed to make it pass)
test/inductor/test_halide.py::HalideCpuTests::test_special_polygamma_cpu_halide
inductor / unit-test / inductor-test / test (inductor, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (disabled by #139828 but the issue was closed recently and a rebase is needed to make it pass)
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pca_lowrank_cuda_float32
trunk / linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, linux.g6.4xlarge.experimental.nvidia.gpu) (gh) (disabled by #118346 but the issue was closed recently and a rebase is needed to make it pass)
test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_triton_kernel_benchmark
trunk / linux-jammy-cuda13.0-py3.10-gcc11 / test (default, 2, 5, linux.g6.4xlarge.experimental.nvidia.gpu) (gh) (disabled by #118346 but the issue was closed recently and a rebase is needed to make it pass)
test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_triton_kernel_benchmark

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

trunk / linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 5, 5, linux.g6.4xlarge.experimental.nvidia.gpu) (gh) (trunk failure)
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_wrong_pytree
trunk / linux-jammy-cuda13.0-py3.10-gcc11 / test (default, 3, 5, linux.g6.4xlarge.experimental.nvidia.gpu) (gh) (trunk failure)
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_wrong_pytree

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wconstab · 2026-01-30T00:50:34Z

-    expected_layout = submesh._layout.coalesce()
+    # Compute expected layout WITHOUT creating a submesh (avoids tracing issues)
+    # _get_slice_mesh_layout does pure layout math, no tensor operations
+    sliced_layout = root_mesh._get_slice_mesh_layout(dim_names)


should this be mesh instead of root_mesh? (for the same reason as #173790)

…esh dims under compile" Co-authored with claude. I noticed after #172610 that DTensor's new redistribute call that looks for flattened device meshes can crash under torch.compile/tracing. It looks like `submesh = mesh[dim_names]` will try to construct a fresh DeviceMesh, and ends up calling `.item()` (full stacktrace of the error below). I'm not 100% familiar with the `DeviceMesh` API's, but claude seemed to find an alternative way to "look for an existing flattened device mesh" that didn't need to call `.item` Stacktrace: ``` output = redistribute_local_tensor( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 1452, in redistribute_local_tensor optimized_transform_infos = _optimize_transform_infos( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 475, in _optimize_transform_infos flattened, failure_reason = try_create_flattened(group) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 381, in try_create_flattened flattened_mesh = _get_flattened_mesh_by_layout(device_mesh, sorted_mesh_dims) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 189, in _get_flattened_mesh_by_layout submesh = mesh[dim_names] ~~~~^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 669, in __getitem__ submesh = self._create_sub_mesh(sliced_mesh_layout, mesh_dim_names) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 758, in _create_sub_mesh res_submesh = DeviceMesh( ^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 258, in __init__ if self._layout.numel() != self.mesh.numel(): ^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 360, in mesh return self._get_mesh_tensor_from_full_mesh(full_mesh) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 349, in _get_mesh_tensor_from_full_mesh return full_mesh[my_coords[0, 0]] ~~~~~~~~~^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/fx/experimental/proxy_tensor.py", line 1625, in __torch_function__ return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_compile.py", line 54, in inner return disable_fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_dynamo/eval_frame.py", line 1227, in _fn return fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_subclasses/functional_tensor.py", line 625, in __torch_dispatch__ outs_unwrapped = func._op_dk( ^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_compile.py", line 54, in inner return disable_fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_dynamo/eval_frame.py", line 1227, in _fn return fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/utils/_stats.py", line 29, in wrapper return fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/fx/experimental/proxy_tensor.py", line 1756, in __torch_dispatch__ return proxy_call(self, func, self.pre_dispatch, args, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/fx/experimental/proxy_tensor.py", line 1139, in proxy_call raise RuntimeError( torch._dynamo.exc.BackendCompilerFailed: backend='aot_eager' raised: RuntimeError: It appears that you're trying to get value out of a tracing tensor with aten._local_scalar_dense.default - erroring out! It's likely that this is caused by data-dependent control flow or similar. It may be possible to trace this with dynamic shapes; try setting tracing_mode='symbolic' in your make_fx call. ``` [ghstack-poisoned]

…der compile ghstack-source-id: 26553fb Pull Request resolved: #173873

wconstab

lgtm. also i am happy to help make an internal diff of this to land asap if this is blocking, i'm not sure it is. thanks for the fix!

fegin

@bdhirsh, @wconstab Does this mean that users are not allowed to create any DeviceMesh during forward/backward even if this DeviceMesh won't create a new PG?

…der compile Summary: internal-first land of #173873 Co-authored with claude. I noticed after #172610 that DTensor's new redistribute call that looks for flattened device meshes can crash under torch.compile/tracing. It looks like submesh = mesh[dim_names] will try to construct a fresh DeviceMesh, and ends up calling .item() (full stacktrace of the error below). I'm not 100% familiar with the DeviceMesh API's, but claude seemed to find an alternative way to "look for an existing flattened device mesh" that didn't need to call .item Stacktrace: output = redistribute_local_tensor( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 1452, in redistribute_local_tensor optimized_transform_infos = _optimize_transform_infos( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 475, in _optimize_transform_infos flattened, failure_reason = try_create_flattened(group) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 381, in try_create_flattened flattened_mesh = _get_flattened_mesh_by_layout(device_mesh, sorted_mesh_dims) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 189, in _get_flattened_mesh_by_layout submesh = mesh[dim_names] ~~~~^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 669, in __getitem__ submesh = self._create_sub_mesh(sliced_mesh_layout, mesh_dim_names) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 758, in _create_sub_mesh res_submesh = DeviceMesh( ^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 258, in __init__ if self._layout.numel() != self.mesh.numel(): ^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 360, in mesh return self._get_mesh_tensor_from_full_mesh(full_mesh) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 349, in _get_mesh_tensor_from_full_mesh return full_mesh[my_coords[0, 0]] ~~~~~~~~~^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/fx/experimental/proxy_tensor.py", line 1625, in __torch_function__ return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_compile.py", line 54, in inner return disable_fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_dynamo/eval_frame.py", line 1227, in _fn return fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_subclasses/functional_tensor.py", line 625, in __torch_dispatch__ outs_unwrapped = func._op_dk( ^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_compile.py", line 54, in inner return disable_fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_dynamo/eval_frame.py", line 1227, in _fn return fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/utils/_stats.py", line 29, in wrapper return fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/fx/experimental/proxy_tensor.py", line 1756, in __torch_dispatch__ return proxy_call(self, func, self.pre_dispatch, args, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/fx/experimental/proxy_tensor.py", line 1139, in proxy_call raise RuntimeError( torch._dynamo.exc.BackendCompilerFailed: backend='aot_eager' raised: RuntimeError: It appears that you're trying to get value out of a tracing tensor with aten._local_scalar_dense.default - erroring out! It's likely that this is caused by data-dependent control flow or similar. It may be possible to trace this with dynamic shapes; try setting tracing_mode='symbolic' in your make_fx call. Test Plan: python test/distributed/tensor/test_dtensor_compile.py -k test_compile_redistribute_flattened_mesh Differential Revision: D91852906

wconstab · 2026-01-30T04:46:36Z

No, it just means we need to work together to figure this out! I think your front end/backend proposal may help. We'll have to think through which APIs we want traced into the graph, and then figure out how to do it.

wconstab · 2026-01-30T04:47:11Z

@pytorchbot merge

pytorchmergebot · 2026-01-30T04:49:08Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

…der compile (pytorch#173873) Co-authored with claude. I noticed after pytorch#172610 that DTensor's new redistribute call that looks for flattened device meshes can crash under torch.compile/tracing. It looks like `submesh = mesh[dim_names]` will try to construct a fresh DeviceMesh, and ends up calling `.item()` (full stacktrace of the error below). I'm not 100% familiar with the `DeviceMesh` API's, but claude seemed to find an alternative way to "look for an existing flattened device mesh" that didn't need to call `.item` Stacktrace: ``` output = redistribute_local_tensor( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 1452, in redistribute_local_tensor optimized_transform_infos = _optimize_transform_infos( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 475, in _optimize_transform_infos flattened, failure_reason = try_create_flattened(group) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 381, in try_create_flattened flattened_mesh = _get_flattened_mesh_by_layout(device_mesh, sorted_mesh_dims) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/tensor/_redistribute.py", line 189, in _get_flattened_mesh_by_layout submesh = mesh[dim_names] ~~~~^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 669, in __getitem__ submesh = self._create_sub_mesh(sliced_mesh_layout, mesh_dim_names) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 758, in _create_sub_mesh res_submesh = DeviceMesh( ^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 258, in __init__ if self._layout.numel() != self.mesh.numel(): ^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 360, in mesh return self._get_mesh_tensor_from_full_mesh(full_mesh) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/distributed/device_mesh.py", line 349, in _get_mesh_tensor_from_full_mesh return full_mesh[my_coords[0, 0]] ~~~~~~~~~^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/fx/experimental/proxy_tensor.py", line 1625, in __torch_function__ return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_compile.py", line 54, in inner return disable_fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_dynamo/eval_frame.py", line 1227, in _fn return fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_subclasses/functional_tensor.py", line 625, in __torch_dispatch__ outs_unwrapped = func._op_dk( ^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_compile.py", line 54, in inner return disable_fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/_dynamo/eval_frame.py", line 1227, in _fn return fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/utils/_stats.py", line 29, in wrapper return fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/fx/experimental/proxy_tensor.py", line 1756, in __torch_dispatch__ return proxy_call(self, func, self.pre_dispatch, args, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/hirsheybar/new2/pytorch/torch/fx/experimental/proxy_tensor.py", line 1139, in proxy_call raise RuntimeError( torch._dynamo.exc.BackendCompilerFailed: backend='aot_eager' raised: RuntimeError: It appears that you're trying to get value out of a tracing tensor with aten._local_scalar_dense.default - erroring out! It's likely that this is caused by data-dependent control flow or similar. It may be possible to trace this with dynamic shapes; try setting tracing_mode='symbolic' in your make_fx call. ``` Pull Request resolved: pytorch#173873 Approved by: https://github.com/wconstab, https://github.com/fegin

facebook-github-bot · 2026-02-03T22:07:31Z

@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"

This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).)

pytorchmergebot · 2026-02-03T22:09:10Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

… dims under compile (#173873)" This reverts commit 2517bc4. Reverted #173873 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](#173873 (comment)))

pytorchmergebot · 2026-02-03T22:09:19Z

@bdhirsh your PR has been successfully reverted.

izaitsevfb · 2026-02-03T22:36:06Z

sorry for the churn, please feel free to rebase and reland

wconstab · 2026-02-06T04:10:51Z

I will take care of this

Summary: Reland of #172610 - includes fixes #173873 (credit bdhirsh) and #173790 (credit IvanKobzarev) Ensures that when possible (when such a flattened mesh exists), DTensor will find and use it to avoid more costly sequential comms, and particularly for reduce comms, also avoids the risk of different reduction orders causing divergent results. (See [this doc](https://docs.google.com/document/d/1hJsnodQmHfs1QosNgR39HZNiOOzfnZ6bnALqonDpcDs/edit?userstoinvite=rrathaur@redhat.com&sharingaction=manageaccess&role=reader&tab=t.0) for more info. Example: For a (2,2,2) mesh with dims (A,B,C) and placements when redistributing from (Psum, Replicate, Psum) -> (Replicate, Replicate, Replicate) - the original behavior would be 2 separate all_reduces. After this PR, if the user flattens dims A,C, this becomes one larger all_reduce. Compared with earlier attempt #172119, this PR - includes optimization for comms other than all_reduce - explicitly bans mixed partial types (Psum, Pmax) is not a valid placement, so we don't have to worry about optimizing around it - therefore uses a simpler implementation involving grouping adjacent transforminfos and then merging like kinds - Warns once per mesh shape for missing flattened meshes - Won't optimize reduce_scatters when they shard an uneven sized tensor dim Details/Limitations - all_to_all is never merged (left for possible future work, but not obvious how to do it in general) - reduce_scatter is only merged when the outermost partial shape is evenly divisible by the flattened mesh - otherwise, warns - reduce_scatter and all_gather are only merged when the shards are in left-to-right (ascending) order, since DeviceMesh only supports flattening in ascending order and the mesh ordering impacts correctness. - groups of like-kind collectives are NOT combined if they are not adjacent in the transform_info list - flattened device-meshes are not automatically created due to preference of explicit creation and ensuring torch.compile works, but warnings prompt the user to create them when it would help allow an optimization - DOES support merging mixed Partial (sum, avg) reductions, using the product of the avg dim sizes to scale after performing a sum reduction on the merged mesh. Refuses to merge any other combinations of mixed partials. Fixes #171916 Note: initial attempt used stable sort with a __lt__ method in TransformInfo comparing comm type key, but this was not correct because sorting a local (no-comm) operation like chunking before or after a comm operation on the same mesh time affects results. Differential Revision: D92540256

wconstab · 2026-02-09T23:14:23Z

squashed into #174630

Summary: Reland of #172610 - includes fixes #173873 (credit bdhirsh) and #173790 (credit IvanKobzarev) Ensures that when possible (when such a flattened mesh exists), DTensor will find and use it to avoid more costly sequential comms, and particularly for reduce comms, also avoids the risk of different reduction orders causing divergent results. (See [this doc](https://docs.google.com/document/d/1hJsnodQmHfs1QosNgR39HZNiOOzfnZ6bnALqonDpcDs/edit?userstoinvite=rrathaur@redhat.com&sharingaction=manageaccess&role=reader&tab=t.0) for more info. Example: For a (2,2,2) mesh with dims (A,B,C) and placements when redistributing from (Psum, Replicate, Psum) -> (Replicate, Replicate, Replicate) - the original behavior would be 2 separate all_reduces. After this PR, if the user flattens dims A,C, this becomes one larger all_reduce. Compared with earlier attempt #172119, this PR - includes optimization for comms other than all_reduce - explicitly bans mixed partial types (Psum, Pmax) is not a valid placement, so we don't have to worry about optimizing around it - therefore uses a simpler implementation involving grouping adjacent transforminfos and then merging like kinds - Warns once per mesh shape for missing flattened meshes - Won't optimize reduce_scatters when they shard an uneven sized tensor dim Details/Limitations - all_to_all is never merged (left for possible future work, but not obvious how to do it in general) - reduce_scatter is only merged when the outermost partial shape is evenly divisible by the flattened mesh - otherwise, warns - reduce_scatter and all_gather are only merged when the shards are in left-to-right (ascending) order, since DeviceMesh only supports flattening in ascending order and the mesh ordering impacts correctness. - groups of like-kind collectives are NOT combined if they are not adjacent in the transform_info list - flattened device-meshes are not automatically created due to preference of explicit creation and ensuring torch.compile works, but warnings prompt the user to create them when it would help allow an optimization - DOES support merging mixed Partial (sum, avg) reductions, using the product of the avg dim sizes to scale after performing a sum reduction on the merged mesh. Refuses to merge any other combinations of mixed partials. Fixes #171916 Note: initial attempt used stable sort with a __lt__ method in TransformInfo comparing comm type key, but this was not correct because sorting a local (no-comm) operation like chunking before or after a comm operation on the same mesh time affects results. Differential Revision: D92540256

@bdhirsh

Reland of #172610: same code as previous land except: - includes #173873 (credit @bdhirsh) - includes #173790 (credit @IvanKobzarev) - includes #173436 - adds disable contextmanager + test Ensures that when possible (when such a flattened mesh exists), DTensor will find and use it to avoid more costly sequential comms, and particularly for reduce comms, also avoids the risk of different reduction orders causing divergent results. (See [this doc](https://docs.google.com/document/d/1hJsnodQmHfs1QosNgR39HZNiOOzfnZ6bnALqonDpcDs/edit?userstoinvite=rrathaur@redhat.com&sharingaction=manageaccess&role=reader&tab=t.0) for more info. Example: For a (2,2,2) mesh with dims (A,B,C) and placements when redistributing from (Psum, Replicate, Psum) -> (Replicate, Replicate, Replicate) - the original behavior would be 2 separate all_reduces. After this PR, if the user flattens dims A,C, this becomes one larger all_reduce. Compared with earlier attempt #172119, this PR - includes optimization for comms other than all_reduce - explicitly bans mixed partial types (Psum, Pmax) is not a valid placement, so we don't have to worry about optimizing around it - therefore uses a simpler implementation involving grouping adjacent transforminfos and then merging like kinds - Warns once per mesh shape for missing flattened meshes - Won't optimize reduce_scatters when they shard an uneven sized tensor dim Details/Limitations - all_to_all is never merged (left for possible future work, but not obvious how to do it in general) - reduce_scatter is only merged when the outermost partial shape is evenly divisible by the flattened mesh - otherwise, warns - reduce_scatter and all_gather are only merged when the shards are in left-to-right (ascending) order, since DeviceMesh only supports flattening in ascending order and the mesh ordering impacts correctness. - groups of like-kind collectives are NOT combined if they are not adjacent in the transform_info list - flattened device-meshes are not automatically created due to preference of explicit creation and ensuring torch.compile works, but warnings prompt the user to create them when it would help allow an optimization - DOES support merging mixed Partial (sum, avg) reductions, using the product of the avg dim sizes to scale after performing a sum reduction on the merged mesh. Refuses to merge any other combinations of mixed partials. Fixes #171916 Note: initial attempt used stable sort with a __lt__ method in TransformInfo comparing comm type key, but this was not correct because sorting a local (no-comm) operation like chunking before or after a comm operation on the same mesh time affects results. Differential Revision: D92540256 Pull Request resolved: #174630 Approved by: https://github.com/zpcore

… dims under compile (pytorch#173873)" This reverts commit 2517bc4. Reverted pytorch#173873 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](pytorch#173873 (comment)))

fix redistribute() handling for finding flattened device mesh dims un…

2a2b481

…der compile [ghstack-poisoned]

bdhirsh added a commit that referenced this pull request Jan 30, 2026

fix redistribute() handling for finding flattened device mesh dims un…

453e15e

…der compile ghstack-source-id: f263abc Pull Request resolved: #173873

pytorch-bot Bot added ciflow/inductor release notes: distributed (dtensor) release notes category labels Jan 30, 2026

github-actions Bot requested review from SherlockNoMad, albanD, antoniojkim, ezyang and miladm January 30, 2026 00:29

bdhirsh requested a review from wconstab January 30, 2026 00:32

wconstab reviewed Jan 30, 2026

View reviewed changes

bdhirsh added a commit that referenced this pull request Jan 30, 2026

fix redistribute() handling for finding flattened device mesh dims un…

67d4e2b

…der compile ghstack-source-id: 26553fb Pull Request resolved: #173873

wconstab approved these changes Jan 30, 2026

View reviewed changes

fegin approved these changes Jan 30, 2026

View reviewed changes

bdhirsh mentioned this pull request Jan 30, 2026

fix redistribute() handling for finding flattened device mesh dims under compile #173882

Closed

pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 30, 2026

pytorchmergebot added the merging label Jan 30, 2026

pytorchmergebot added the Merged label Jan 30, 2026

pytorchmergebot closed this in 2517bc4 Jan 30, 2026

pytorchmergebot removed the merging label Jan 30, 2026

pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Feb 3, 2026

pytorchmergebot reopened this Feb 3, 2026

wconstab mentioned this pull request Feb 9, 2026

[DTensor] Optimize redistribute comms using flattened meshes #174630

Closed

wconstab closed this Feb 9, 2026

github-actions Bot deleted the gh/bdhirsh/699/head branch March 12, 2026 02:22

Conversation

bdhirsh commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot Bot commented Jan 30, 2026

This PR needs a release notes: label

Uh oh!

pytorch-bot Bot commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/173873

❌ 1 New Failure, 7 Unrelated Failures

Uh oh!

wconstab Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wconstab left a comment

Choose a reason for hiding this comment

Uh oh!

fegin left a comment

Choose a reason for hiding this comment

Uh oh!

wconstab commented Jan 30, 2026

Uh oh!

wconstab commented Jan 30, 2026

Uh oh!

pytorchmergebot commented Jan 30, 2026

Merge started

Uh oh!

facebook-github-bot commented Feb 3, 2026

Uh oh!

pytorchmergebot commented Feb 3, 2026

Uh oh!

pytorchmergebot commented Feb 3, 2026

Uh oh!

izaitsevfb commented Feb 3, 2026

Uh oh!

wconstab commented Feb 6, 2026

Uh oh!

wconstab commented Feb 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

bdhirsh commented Jan 30, 2026 •

edited

Loading

This PR needs a `release notes:` label

pytorch-bot Bot commented Jan 30, 2026 •

edited

Loading

wconstab Jan 30, 2026 •

edited

Loading