[DTensor] enable single-dim strategy for addmm and baddbmm #172387
weifengpy wants to merge 12 commits into gh/weifengpy/51/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/172387
Note: Links to docs will display an error until the docs builds have been completed.
❌ 9 New Failures as of commit c1c550d with merge base 011e373.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
from torch.distributed.tensor._ops.utils import infer_broadcast_dims_map
...
mm_strategies = gen_single_dim_einsum_strategies(mm_equation)
self_meta = cast(TensorMeta, args_schema[0])  # bias
Why not do an assert isinstance here? We should use cast sparingly
good catch! I updated the PR to use assert
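For illustration, a minimal sketch of the assert-based pattern the thread settled on. The `bias_meta` wrapper and the `TensorMeta` import path are assumptions for the sake of a self-contained example, not the PR's actual code:

```python
from torch.distributed.tensor._dtensor_spec import TensorMeta  # assumed import path

def bias_meta(args_schema: tuple[object, ...]) -> TensorMeta:
    # cast(TensorMeta, ...) is a no-op at runtime; asserting the type fails
    # loudly on a schema mismatch and also narrows the type for checkers.
    self_meta = args_schema[0]  # bias
    assert isinstance(self_meta, TensorMeta), f"expected TensorMeta, got {type(self_meta)}"
    return self_meta
```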
broadcast_dims_map = infer_broadcast_dims_map(mm_out_shape, self_meta.shape)
...
# Add bias placement to each strategy
addmm_like_strategies: list[list[Placement | _ShardingPlaceholder]] = []
Brainstorming: would it be cleaner to add an option to gen_single_dim_einsum_strategies to insert an extra bias placement, rather than having to iteratively update the einsum strategies in a separate helper? (Maybe not, but wdyt?)
Totally reasonable! The logic is tighter now in gen_single_dim_einsum_strategies. I updated the PR for another review.
Hah, I was going to ask why we modify gen_single_dim_einsum_strategies instead of adding a new strategy that accepts bias on top of gen_single_dim_einsum_strategies. Looks like this solution has been challenged.
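For context, a rough sketch of the bookkeeping under discussion, assuming infer_broadcast_dims_map maps each output dim to the matching bias dim (or -1 where the bias was broadcast); `bias_placement_for` is a hypothetical helper for illustration, not the PR's actual code:

```python
from torch.distributed.tensor.placement_types import Placement, Replicate, Shard

def bias_placement_for(shard_dim: int, broadcast_dims_map: list[int]) -> Placement:
    # broadcast_dims_map[out_dim] holds the bias dim that lines up with that
    # output dim, or -1 when the bias dim was broadcast (missing or size 1).
    bias_dim = broadcast_dims_map[shard_dim]
    return Shard(bias_dim) if bias_dim >= 0 else Replicate()
```

For an addmm output of shape (M, N) with a 1-d bias of shape (N,), the map would be [-1, 0]: sharding the output on dim 0 replicates the bias, while sharding on dim 1 shards the bias on its only dim.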
return output_placement
...
if isinstance(output_placement, Partial):
    return Partial()
This isn't good. We should actually return output_placement so we also inherit its reduce op.
Good catch! I am cloning the placement now and added a test to catch Partial(avg).
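The reduce-op pitfall is easy to reproduce in isolation: Partial() defaults to reduce_op="sum", so rebuilding the placement drops an "avg" reduction. A minimal sketch:

```python
from torch.distributed.tensor.placement_types import Partial

output_placement = Partial("avg")

# Rebuilding the placement silently resets the reduce op to "sum" ...
assert Partial().reduce_op == "sum"
# ... while returning the incoming placement keeps "avg" intact.
assert output_placement.reduce_op == "avg"
```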
    return Partial()
elif isinstance(output_placement, Replicate):
    return Replicate()
elif isinstance(output_placement, _ShardingPlaceholder):
Seems like we only need the _ShardingPlaceholder case as an if, and then we can have an else that covers Replicate/Partial and returns output_placement.
It might be better practice to do something like output_placement.clone() so we aren't sharing references, but if it's a dataclass we can't mutate, then it's OK this way.
Good suggestion! I updated the PR to have a simpler if-else for _ShardingPlaceholder.
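A minimal sketch of that simplified shape; the `_ShardingPlaceholder` stand-in and the `resolve` helper are assumptions about the PR's internals, since the actual types aren't shown in this thread:

```python
from dataclasses import dataclass

from torch.distributed.tensor.placement_types import Placement, Shard

@dataclass(frozen=True)
class _ShardingPlaceholder:  # stand-in for the PR's internal placeholder type
    dim: int

def resolve(output_placement: Placement | _ShardingPlaceholder) -> Placement:
    # Only the placeholder needs rewriting into a concrete placement; the
    # Replicate/Partial branches collapse into the fallthrough, which returns
    # the placement itself so Partial keeps its reduce_op. DTensor placements
    # are frozen dataclasses, so sharing the reference is safe here.
    if isinstance(output_placement, _ShardingPlaceholder):
        return Shard(output_placement.dim)
    return output_placement
```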
@pytorchmergebot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed: trunk / linux-jammy-rocm-py3.10 / test (distributed, 3, 3, linux.rocm.gpu.gfx942.4). Raised by workflow job.
@pytorchmergebot merge -f "unrelated error"
Merge started: your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):