
[coor-slicing] Add SymInt support for DTensor mesh coordinate computation in PT2 #169552

Closed

aorenste wants to merge 15 commits into gh/aorenste/159/base from gh/aorenste/159/head

Conversation

@aorenste
Contributor

aorenste commented Dec 4, 2025

This change enables compile-on-one-rank for DTensor slicing by making mesh coordinate lookups symbolic-aware.

  1. New custom op `device_mesh::_runtime_compute_coordinate_on_dim` - An operator that computes mesh coordinates at runtime, allowing coordinate lookups to be deferred during tracing rather than baked in as constants (see the first sketch after this list).
  2. `DeviceMesh.sym_get_coordinate` - Now uses the custom op when in fake mode (tracing), lifting the rank map as a graph constant and deferring the actual coordinate computation to runtime.
  3. `Shard._select_split_tensor` - Extended to handle SymInt indices by using `torch.narrow` with symbolic start/length instead of list indexing, enabling symbolic tensor partitioning (see the second sketch after this list).
  4. `Shard.local_shard_size_and_offset` - Updated type hints to properly reflect SymInt return types.
  5. New config - Adds a config flag to enable compile_on_one_rank.
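
Below is a minimal, hypothetical sketch of the pattern items 1 and 2 describe, not the PR's actual code: the `demo_mesh` namespace, the schema, the argument names (`rank_map`, `dim_size`, `stride`), and the `sym_get_coordinate` wrapper are all illustrative assumptions. The key idea is that the eager kernel does the real lookup while the fake-mode kernel returns a fresh unbacked SymInt, so tracing records a runtime computation instead of baking in a constant:

```python
import torch
from torch._guards import detect_fake_mode  # private helper, used here for illustration

# Hypothetical schema: rank_map is the flattened mesh rank tensor;
# dim_size/stride describe the mesh dimension we want a coordinate on.
torch.library.define(
    "demo_mesh::_runtime_compute_coordinate_on_dim",
    "(Tensor rank_map, int rank, int dim_size, int stride) -> SymInt",
)

@torch.library.impl(
    "demo_mesh::_runtime_compute_coordinate_on_dim", "CompositeExplicitAutograd"
)
def _coordinate_eager(rank_map, rank, dim_size, stride):
    # Real computation: find this rank's linear index in the mesh, then
    # project it onto one mesh dimension with stride arithmetic.
    linear = int((rank_map == rank).nonzero()[0].item())
    return (linear // stride) % dim_size

@torch.library.register_fake("demo_mesh::_runtime_compute_coordinate_on_dim")
def _coordinate_fake(rank_map, rank, dim_size, stride):
    # Under tracing the coordinate is unknowable, so hand back a fresh
    # unbacked SymInt that stands in for the runtime result.
    return torch.library.get_ctx().new_dynamic_size()

def sym_get_coordinate(rank_map, rank, dim_size, stride):
    # In fake mode, defer to the custom op so the lookup stays in the graph;
    # eagerly, compute the concrete coordinate directly.
    if detect_fake_mode() is not None:
        return torch.ops.demo_mesh._runtime_compute_coordinate_on_dim(
            rank_map, rank, dim_size, stride
        )
    return _coordinate_eager(rank_map, rank, dim_size, stride)
```

Because the fake kernel produces an unbacked SymInt, downstream shape reasoning cannot assume any concrete value for the coordinate; that is the "unbacked symint stuff" raised in the review discussion below.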
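
Item 3's `torch.narrow` trick can be illustrated the same way. This `select_split` helper and its clamping are assumptions for the sketch, not the actual `Shard._select_split_tensor` code, and `idx` is assumed to be a valid shard index:

```python
import torch

def select_split(tensor: torch.Tensor, idx, split_size, dim: int = 0):
    # torch.narrow accepts SymInt start/length, so idx may be symbolic.
    # Plain list indexing, e.g. tensor.tensor_split(...)[idx], would force
    # idx to a concrete int and bake the chosen shard into the trace.
    start = idx * split_size
    # The last shard may be short when the dim is not evenly divisible.
    length = torch.sym_max(torch.sym_min(split_size, tensor.size(dim) - start), 0)
    return torch.narrow(tensor, dim, start, length)

x = torch.arange(10)
print(select_split(x, 1, 4))  # tensor([4, 5, 6, 7])
print(select_split(x, 2, 4))  # tensor([8, 9]) -- short final shard
```

Since `narrow` returns a view, the symbolic path keeps the same no-copy behavior as indexing into the eager splits.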

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo

aorenste mentioned this pull request Dec 4, 2025
@pytorch-bot

pytorch-bot commented Dec 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169552

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (9 Unrelated Failures)

As of commit f8a63b8 with merge base ccc09a8:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

aorenste added a commit that referenced this pull request Dec 4, 2025
ghstack-source-id: 62fda46
Pull Request resolved: #169552
aorenste added a commit that referenced this pull request Dec 5, 2025
ghstack-source-id: f9383e1
Pull Request resolved: #169552
aorenste added a commit that referenced this pull request Dec 8, 2025
ghstack-source-id: 9fcce86
Pull Request resolved: #169552
aorenste added a commit that referenced this pull request Dec 8, 2025
ghstack-source-id: e2829f8
Pull Request resolved: #169552
aorenste added a commit that referenced this pull request Dec 10, 2025
ghstack-source-id: 173e6e5
Pull Request resolved: #169552
aorenste changed the title from "WIP: [compile-one-rank] slicing" to "[coor-slicing] Add dynamic slicing" Jan 6, 2026
aorenste changed the title from "[coor-slicing] Add dynamic slicing" to "[coor-slicing] Add SymInt support for DTensor mesh coordinate computation in PT2" Jan 6, 2026
aorenste added the topic: not user facing and release notes: distributed (dtensor) labels and removed the topic: not user facing label Jan 6, 2026
aorenste added a commit that referenced this pull request Jan 7, 2026
ghstack-source-id: f8bef2f
Pull Request resolved: #169552
@ezyang
Contributor

ezyang commented Jan 8, 2026

Is it difficult to have tests at this stage for the PR stack? I feel you now have enough kit for tests

Contributor

ezyang left a comment

I don't consider the unbacked symint stuff blocking, but this is more a question for @laithsakka

@aorenste
Contributor Author

aorenste commented Jan 8, 2026

Is it difficult to have tests at this stage for the PR stack? I feel you now have enough kit for tests

It's reasonable to ask for tests at this stage. It's a little tricky because we still have the redistribute targets as non-CooR so I have to make sure the tests only rely on slicing - but I'll put something together.

aorenste added a commit that referenced this pull request Jan 8, 2026
ghstack-source-id: b110b06
Pull Request resolved: #169552
aorenste added a commit that referenced this pull request Jan 9, 2026
ghstack-source-id: 14647f4
Pull Request resolved: #169552
aorenste added a commit that referenced this pull request Jan 9, 2026
ghstack-source-id: cc37d97
Pull Request resolved: #169552
aorenste added a commit that referenced this pull request Jan 16, 2026
ghstack-source-id: 0c5a59d
Pull Request resolved: #169552
aorenste added a commit that referenced this pull request Jan 16, 2026
ghstack-source-id: 6529012
Pull Request resolved: #169552
aorenste added a commit that referenced this pull request Jan 16, 2026
ghstack-source-id: e3574bb
Pull Request resolved: #169552
aorenste added a commit that referenced this pull request Jan 16, 2026
ghstack-source-id: 7fc0002
Pull Request resolved: #169552
aorenste added the ciflow/trunk label (trigger trunk jobs on your pull request) Jan 16, 2026
aorenste added a commit that referenced this pull request Jan 17, 2026
ghstack-source-id: e09df08
Pull Request resolved: #169552
@aorenste
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

apakbin pushed a commit to apakbin/pytorch that referenced this pull request Jan 19, 2026

Pull Request resolved: pytorch#169552
Approved by: https://github.com/ezyang
github-actions bot deleted the gh/aorenste/159/head branch February 17, 2026 02:22