
Revert "[DTensor] Refactor strategy/rule registration into dedicated module (#168221)"#170615

Closed
wconstab wants to merge 6 commits intogh/wconstab/479/basefrom
gh/wconstab/479/head
Closed

Revert "[DTensor] Refactor strategy/rule registration into dedicated module (#168221)"#170615
wconstab wants to merge 6 commits intogh/wconstab/479/basefrom
gh/wconstab/479/head

Conversation

Contributor

@wconstab wconstab commented Dec 16, 2025

…module (#168221)"

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.

[ghstack-poisoned]

pytorch-bot Bot commented Dec 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/170615

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1f25975 with merge base 1984725:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wconstab added a commit that referenced this pull request Dec 16, 2025
…module (#168221)"

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.

ghstack-source-id: d68ea51
Pull Request resolved: #170615
pytorch-bot added the ci-no-td (Do not run TD on this PR) and ciflow/inductor labels Dec 16, 2025
… dedicated module (#168221)""

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.

[ghstack-poisoned]
wconstab added a commit that referenced this pull request Dec 16, 2025
…module (#168221)"

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.

ghstack-source-id: 8c00738
Pull Request resolved: #170615
wconstab added the release notes: distributed (dtensor) label Dec 16, 2025
… dedicated module (#168221)""

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.

[ghstack-poisoned]
wconstab added a commit that referenced this pull request Dec 17, 2025
…module (#168221)"

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.

ghstack-source-id: 29c372c
Pull Request resolved: #170615
… dedicated module (#168221)""

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.

[ghstack-poisoned]
wconstab added a commit that referenced this pull request Dec 17, 2025
…module (#168221)"

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.

ghstack-source-id: 97ee16a
Pull Request resolved: #170615
@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #167677

3 similar comments

… dedicated module (#168221)""

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.

[ghstack-poisoned]
@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #167677

pytorchmergebot pushed a commit that referenced this pull request Dec 17, 2025
### Motivation
Implementing sharding strategies for DTensor is currently too difficult; this slows progress toward full operator coverage and increases the likelihood of bugs in sharding rules. We also expect to add new Placement types (e.g. more complete support for StridedShard in the near term, and possibly others later), and the current formulation of sharding strategies does not scale to new placement types.

A primary reason it is so difficult to write sharding rules today is that they combine two things: (a) a mathematical description of which input/output shardings are correct for the op, and (b) premature runtime optimization to avoid considering too many combinations over an N-D mesh.

The proposal is to remove (b) and focus just on (a).

### tl;dr
1. Write sharding-prop rules in terms of a single mesh dim, using a 'placeholder' for generic sharding types,
e.g. the matmul rule returns:
```
[
   ShardPlaceholder(0), Replicate() -> ShardPlaceholder(0),
   Replicate(), ShardPlaceholder(1) -> ShardPlaceholder(1),
   ShardPlaceholder(1), ShardPlaceholder(0) -> Partial()
]
```
2. After registration, each rule is automatically expanded at runtime to cover the real sharding types discovered in the inputs, and a 'full replication' rule is added.
e.g. if the inputs are fully replicated, we drop the placeholder rules and use only
```
[
   Replicate(), Replicate() -> Replicate()
]
```
if 'Shard' is discovered in the inputs, we fill the placeholders like
```
[
   Shard(0), Replicate() -> Shard(0),
   Replicate(), Shard(1) -> Shard(1),
   Shard(1), Shard(0) -> Partial(),
   Replicate(), Replicate() -> Replicate()
]
```
if 'Shard' and 'StridedShard' are both discovered in the inputs, we expand to
```
[
   Shard(0), Replicate() -> Shard(0),
   Replicate(), Shard(1) -> Shard(1),
   Shard(1), Shard(0) -> Partial(),
   StridedShard(0), Replicate() -> StridedShard(0),
   Replicate(), StridedShard(1) -> StridedShard(1),
   StridedShard(1), StridedShard(0) -> Partial(),
   Replicate(), Replicate() -> Replicate()
]
```
3. After filling the placeholders, we expand to the N-D mesh and find the minimum-cost strategy (a toy sketch of this flow follows the list):
(a) full enumeration via itertools.product is implemented and gives exact parity with rules like 'einsum' today
(b) an optimized search, starting from the input placements and iterating in order of increasing cost until a min-cost solution is reached _without_ full enumeration - under development/prototyping
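
To make the flow above concrete, here is a minimal, self-contained sketch. It is illustrative only: the toy placement classes and the helper names (`SingleDimRule`, `fill_placeholders`, `expand_to_mesh`) are stand-ins for this writeup, not the actual DTensor APIs added by this stack.

```python
# Toy sketch of single-dim rules + expansion; not the real torch.distributed.tensor code.
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Replicate:
    pass


@dataclass(frozen=True)
class Partial:
    pass


@dataclass(frozen=True)
class Shard:
    dim: int


@dataclass(frozen=True)
class StridedShard:
    dim: int


@dataclass(frozen=True)
class ShardPlaceholder:
    # "some shard-like placement" along tensor dim `dim`; filled in at runtime
    dim: int


@dataclass(frozen=True)
class SingleDimRule:
    # (input placements) -> output placement, over a single mesh dim
    inputs: tuple
    output: object


def mm_single_dim_rules():
    """The matmul rule from step 1, written over one mesh dim."""
    P = ShardPlaceholder
    return [
        SingleDimRule((P(0), Replicate()), P(0)),
        SingleDimRule((Replicate(), P(1)), P(1)),
        SingleDimRule((P(1), P(0)), Partial()),
    ]


def fill_placeholders(rules, shard_types):
    """Step 2: instantiate placeholders once per shard type discovered in the inputs,
    then append the always-valid full-replication rule."""
    def subst(p, cls):
        return cls(p.dim) if isinstance(p, ShardPlaceholder) else p

    filled = [
        SingleDimRule(tuple(subst(i, cls) for i in r.inputs), subst(r.output, cls))
        for cls in shard_types
        for r in rules
    ]
    n_inputs = len(rules[0].inputs)
    filled.append(SingleDimRule((Replicate(),) * n_inputs, Replicate()))
    return filled


def expand_to_mesh(filled_rules, mesh_ndim):
    """Step 3(a): full enumeration - one single-dim rule is chosen per mesh dim."""
    return list(itertools.product(filled_rules, repeat=mesh_ndim))


if __name__ == "__main__":
    # Both Shard and StridedShard were discovered in the inputs.
    filled = fill_placeholders(mm_single_dim_rules(), shard_types=[Shard, StridedShard])
    candidates = expand_to_mesh(filled, mesh_ndim=2)
    print(len(filled), "single-dim options ->", len(candidates), "candidates on a 2-D mesh")
```

A real implementation would additionally attach a redistribution cost to each N-D candidate and pick the minimum, per step 3(b).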

### This PR
* defines a 'single_dim strategy' function and a ShardingPlaceholder
* adds a util for expanding a single_dim strategy into a regular strategy
* supports StridedShard automatically via ShardingPlaceholder expansion
* writes rules for mm and cat and uses unit tests to validate their expansion

### Next Steps (PR stack)
* Support pointwise and foreach ops in the single_dim infra
* Hook up single-dim strategies to sharding_prop (op registration)
* Start to use single_dim rules to replace existing rules
* Improve the runtime of searching the fully expanded strategy
* Explore using decomps together with single-dim rules to support more operators

Pull Request resolved: #167677
Approved by: https://github.com/weifengpy
ghstack dependencies: #170615

Co-authored-by: Pian Pawakapan <pianpwk@meta.com>
@jeanschmidt
Contributor

@pytorchbot revert -m "Required to revert #170030" -c ghfirst

@jeanschmidt
Contributor

@wconstab QQ: what is the use case for landing reverts as manual ghstack PRs instead of using the pytorchbot? Is there a particular workflow we don't fully support with pytorchbot commands?

weifengpy pushed a commit that referenced this pull request Dec 19, 2025
…module (#168221)"

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.

ghstack-source-id: b787988
Pull Request resolved: #170615
pytorchmergebot pushed a commit that referenced this pull request Dec 19, 2025
pytorchmergebot pushed a commit that referenced this pull request Dec 19, 2025
This PR adds the register_single_dim_strategy util and hooks it up to sharding_propagator. It also tests the registration.

Notes:
* I haven't yet decided how multiple registrations should be handled. I plan to make it an error to register twice for the same op, for either single_dim or regular strategies.
* I took the cleanest integration path in sharding_prop for now, reusing as much code as possible from the existing 'op_strategy' case. I may have to change this later when integrating find_min_cost.
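
For a rough flavor of what the registration could look like, here is a hypothetical sketch; the registry, decorator signature, and duplicate-registration error shown are assumptions based on the notes above, not the code in this PR:

```python
# Hypothetical registry sketch; not the actual sharding_propagator implementation.
from typing import Callable

_single_dim_strategies: dict[str, Callable] = {}


def register_single_dim_strategy(op_name: str):
    def decorator(fn: Callable) -> Callable:
        # Per the note above, registering twice for the same op would be an error.
        if op_name in _single_dim_strategies:
            raise RuntimeError(f"single-dim strategy already registered for {op_name}")
        _single_dim_strategies[op_name] = fn
        return fn
    return decorator


@register_single_dim_strategy("aten::mm")
def mm_single_dim_strategy():
    # Would return the per-mesh-dim rule list, as in the sketch earlier in this thread.
    ...
```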

Pull Request resolved: #170359
Approved by: https://github.com/weifengpy
ghstack dependencies: #170615, #167677

Co-authored-by: Pian Pawakapan <pianpwk@meta.com>
majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Dec 19, 2025
This reverts commit 32d0782.

Reverted pytorch#170359 on behalf of https://github.com/jeanschmidt due to Required to revert pytorch#167677 that is required to revert pytorch#170615 that is required to revert pytorch#170030 ([comment](pytorch#170359 (comment)))
majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Dec 19, 2025
This reverts commit c3e628e.

Reverted pytorch#167677 on behalf of https://github.com/jeanschmidt due to Required to revert pytorch#170615 that is required to revert pytorch#170030 ([comment](pytorch#167677 (comment)))
majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Dec 19, 2025
majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Dec 19, 2025
…module (pytorch#168221)" (pytorch#170615)

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.
Pull Request resolved: pytorch#170615
Approved by: https://github.com/wdvr, https://github.com/malfet
majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Dec 19, 2025
majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Dec 19, 2025
pytorchmergebot pushed a commit that referenced this pull request Dec 19, 2025
Enforce that tensor_meta is not None for new single-dim rules.

Allow tensor_meta to remain None for existing rules for now. We should consider asserting that tensor_meta is required in DTensorSpec in the future, but for now we just limit the bleeding.
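
The enforcement described amounts to a guard along these lines (a sketch only; the function name is made up, and `tensor_meta` is the DTensorSpec field named above):

```python
# Illustrative only - not the actual DTensorSpec/sharding_prop code.
def _require_tensor_meta_for_single_dim(spec) -> None:
    # New single-dim rules require tensor_meta; existing rules may still pass None for now.
    if spec.tensor_meta is None:
        raise AssertionError(
            "single-dim sharding rules require DTensorSpec.tensor_meta to be set"
        )
```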
Pull Request resolved: #170827
Approved by: https://github.com/dolpm
ghstack dependencies: #170615, #167677, #170359
xgz2 pushed a commit that referenced this pull request Dec 22, 2025
xgz2 pushed a commit that referenced this pull request Dec 22, 2025
xgz2 pushed a commit that referenced this pull request Dec 22, 2025
… module (#168221)" (#170615)

This reverts commit c65f67b.

Reverted #170615 on behalf of https://github.com/jeanschmidt due to Required to revert #170030 ([comment](#170615 (comment)))
xgz2 pushed a commit that referenced this pull request Dec 22, 2025
…module (#168221)" (#170615)

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.
Pull Request resolved: #170615
Approved by: https://github.com/wdvr, https://github.com/malfet
xgz2 pushed a commit that referenced this pull request Dec 22, 2025
xgz2 pushed a commit that referenced this pull request Dec 22, 2025
xgz2 pushed a commit that referenced this pull request Dec 22, 2025
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
…module (pytorch#168221)" (pytorch#170615)

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.
Pull Request resolved: pytorch#170615
Approved by: https://github.com/wdvr, https://github.com/malfet
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
…module (pytorch#168221)" (pytorch#170615)

This reverts commit cb3754f.

Reverting this change as it affects the import path of a publicly
used API.
Pull Request resolved: pytorch#170615
Approved by: https://github.com/wdvr, https://github.com/malfet
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
github-actions bot deleted the gh/wconstab/479/head branch January 18, 2026 02:21