Skip to content

[DTensor] enable single-dim strategy for addmm and baddbmm#172387

Closed
weifengpy wants to merge 12 commits intogh/weifengpy/51/basefrom
gh/weifengpy/51/head
Closed

[DTensor] enable single-dim strategy for addmm and baddbmm#172387
weifengpy wants to merge 12 commits intogh/weifengpy/51/basefrom
gh/weifengpy/51/head

Conversation

@weifengpy
Copy link
Copy Markdown
Contributor

@weifengpy weifengpy commented Jan 13, 2026

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented Jan 13, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/172387

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures

As of commit c1c550d with merge base 011e373 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

weifengpy added a commit that referenced this pull request Jan 13, 2026
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 0575ba2
Pull Request resolved: #172387
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
weifengpy added a commit that referenced this pull request Jan 13, 2026
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 0d94f57
Pull Request resolved: #172387
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
weifengpy added a commit that referenced this pull request Jan 14, 2026
ghstack-source-id: 1c06652
Pull Request resolved: #172387
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
weifengpy added a commit that referenced this pull request Jan 14, 2026
ghstack-source-id: 978884c
Pull Request resolved: #172387
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
weifengpy added a commit that referenced this pull request Jan 14, 2026
ghstack-source-id: 3c89275
Pull Request resolved: #172387
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
@weifengpy weifengpy changed the title enable single-dim strategy for addmm [DTensor] enable single-dim strategy for addmm and baddbmm Jan 14, 2026
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
@weifengpy weifengpy requested review from wconstab and zpcore January 14, 2026 06:39
from torch.distributed.tensor._ops.utils import infer_broadcast_dims_map

mm_strategies = gen_single_dim_einsum_strategies(mm_equation)
self_meta = cast(TensorMeta, args_schema[0]) # bias
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do an assert isinstance here? We should use cast sparingly

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! I updated the PR to use assert

broadcast_dims_map = infer_broadcast_dims_map(mm_out_shape, self_meta.shape)

# Add bias placement to each strategy
addmm_like_strategies: list[list[Placement | _ShardingPlaceholder]] = []
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brainstorming- would it be cleaner to add an option to gen_single_dim_einsum_strategies to insert an extra bias placement, rather than having to iteratively update the einsum strategies in the separate helper? (maybe not, but wdyt?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally reasonable! the logic are tighter now in gen_single_dim_einsum_strategies. I updated the PR for another review

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, I was going to ask why we modify gen_single_dim_einsum_strategies instead of adding a new strategy to accept bias on top of gen_single_dim_einsum_strategies. Looks like this solution has been challenged.

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
return output_placement

if isinstance(output_placement, Partial):
return Partial()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't good. We should actually return output placement so we also inherit its reduce op

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! I am cloning placement now and added a test to catch Partial(avg)

return Partial()
elif isinstance(output_placement, Replicate):
return Replicate()
elif isinstance(output_placement, _ShardingPlaceholder):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we only need this case as if, and then we can have else that covers replicate/partial and returns output placement.

It might be better practice to do something like output placement.clone() so we aren't sharing references but if it's a dataclass we can't mutate then it's ok this way

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good sugestion! I updated the PR to have simpler if-else for _ShardingPlaceholder

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
Copy link
Copy Markdown
Member

@zpcore zpcore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

SergeyTyshkevich pushed a commit to SergeyTyshkevich/chart2 that referenced this pull request Jan 19, 2026
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 197020b
Pull Request resolved: pytorch/pytorch#172387
SergeyTyshkevich pushed a commit to SergeyTyshkevich/chart2 that referenced this pull request Jan 19, 2026
SergeyTyshkevich pushed a commit to SergeyTyshkevich/chart2 that referenced this pull request Jan 19, 2026
@weifengpy
Copy link
Copy Markdown
Contributor Author

@pytorchmergebot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 20, 2026
@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (distributed, 3, 3, linux.rocm.gpu.gfx942.4)

Details for Dev Infra team Raised by workflow job

@weifengpy
Copy link
Copy Markdown
Contributor Author

@pytorchmergebot merge -f "unrelated error"

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

This was referenced Jan 22, 2026
@github-actions github-actions Bot deleted the gh/weifengpy/51/head branch February 22, 2026 02:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: distributed (dtensor) release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants