
[DTensor] Register single-dim strategies for categorized pointwise ops#175795

Closed
anshul-si wants to merge 44 commits into gh/anshul-si/95/base from gh/anshul-si/95/head

Conversation

@anshul-si
Contributor

@anshul-si anshul-si commented Feb 25, 2026

Stack from ghstack (oldest at bottom):

Switch categorized pointwise ops (.default/._ variants) from
register_op_strategy to register_single_dim_strategy, using the
infrastructure (category lists, rule constants, factory functions)
introduced in the previous PR.

The old register_op_strategy registrations are kept alongside as
fallback for .out variants that will be migrated in a follow-up PR.

Authored with Claude.
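
As a rough illustration of the migration described above, the sketch below models two registration tables in plain Python. It does not use PyTorch: the table names, the `register` helper, and the `resolve` function are hypothetical, and the lookup order (new single-dim table first, old table as fallback) is an assumption based on the description, not the actual DTensor dispatch logic.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the two registries; the real DTensor tables and
# decorator signatures may look quite different.
single_dim_table: Dict[str, Callable[[str], str]] = {}
op_strategy_table: Dict[str, Callable[[str], str]] = {}


def register(table: Dict[str, Callable[[str], str]], op_names: List[str]):
    """Return a decorator that records `fn` under every name in `op_names`."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        for name in op_names:
            table[name] = fn
        return fn
    return decorator


# Old-style registration is kept for every variant, including .out.
@register(op_strategy_table, ["relu.default", "relu_.default", "relu.out"])
def legacy_pointwise_strategy(op: str) -> str:
    return f"op_strategy({op})"


# New-style registration covers only the functional and in-place variants.
@register(single_dim_table, ["relu.default", "relu_.default"])
def single_dim_pointwise_strategy(op: str) -> str:
    return f"single_dim_strategy({op})"


def resolve(op: str) -> str:
    """Assumed precedence: prefer the single-dim entry, else fall back."""
    fn = single_dim_table.get(op) or op_strategy_table.get(op)
    if fn is None:
        raise KeyError(f"no strategy registered for {op}")
    return fn(op)
```

Under these assumptions, `resolve("relu.default")` goes through the new table while `resolve("relu.out")` still reaches the old registration.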

@pytorch-bot

pytorch-bot bot commented Feb 25, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175795

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 5ac45c0 with merge base 4b5eef4 (image):

BROKEN TRUNK - The following job failed, but the failure was already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

anshul-si added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Feb 26, 2026
pytorchmergebot added the Reverted and ci-no-td (Do not run TD on this PR) labels on Mar 9, 2026
anshul-si added a commit that referenced this pull request Mar 9, 2026
ghstack-source-id: 02e7fb8
Pull Request resolved: #175795
anshul-si added a commit that referenced this pull request Mar 9, 2026
ghstack-source-id: 4b66558
Pull Request resolved: #175795
anshul-si added a commit that referenced this pull request Mar 10, 2026
ghstack-source-id: 375e46d
Pull Request resolved: #175795
pytorch-bot added the ciflow/torchtitan label (Run TorchTitan integration tests) on Mar 10, 2026
@laithsakka
Contributor

FYI this adds 7% regression
"dtensor_dispatch_inplace": 7.2298019160376

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@anshul-si
Contributor Author

@pytorchmergebot cancel

@pytorch-bot

pytorch-bot bot commented Mar 10, 2026

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'cancel' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'lint', 'fix-lint', 'apply-lint', 'cherry-pick')

usage: @pytorchbot [-h]
                   
                   {merge,revert,rebase,label,drci,lint,fix-lint,apply-lint,cherry-pick}
                   ...

Try @pytorchbot --help for more info.
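
The error format above looks like the output of an argparse-style subcommand parser. The sketch below reproduces that behavior with Python's standard `argparse`; the bot's real parser is not shown in this thread, so `COMMANDS`, `build_parser`, and `is_valid_command` are illustrative names, and the command list is copied from the error message.

```python
import argparse
from typing import List

# Command names copied from the bot's error message; everything else here
# is a hypothetical reconstruction, not the bot's actual source.
COMMANDS = [
    "merge", "revert", "rebase", "label", "drci",
    "lint", "fix-lint", "apply-lint", "cherry-pick",
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="@pytorchbot")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in COMMANDS:
        subparsers.add_parser(name)
    return parser


def is_valid_command(argv: List[str]) -> bool:
    """Return True if argv parses; argparse exits on an unknown subcommand."""
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:
        # argparse reports "invalid choice" on stderr and then exits.
        return False
```

With this parser, `cancel` is rejected simply because it is not among the registered subcommands, which matches the failure reported above.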

@anshul-si
Contributor Author

@pytorchbot --help

@pytorch-bot

pytorch-bot bot commented Mar 10, 2026

PyTorchBot Help

usage: @pytorchbot [-h]
                   
                   {merge,revert,rebase,label,drci,lint,fix-lint,apply-lint,cherry-pick}
                   ...

In order to invoke the bot on your PR, include a line that starts with
@pytorchbot anywhere in a comment. That line will form the command; no
multi-line commands are allowed. Some commands may be used on issues as specified below.

Example:
    Some extra context, blah blah, wow this PR looks awesome

    @pytorchbot merge

optional arguments:
  -h, --help            Show this help message and exit.

command:
  {merge,revert,rebase,label,drci,lint,fix-lint,apply-lint,cherry-pick}
    merge               Merge a PR
    revert              Revert a PR
    rebase              Rebase a PR
    label               Add label to a PR
    drci                Update Dr. CI
    lint                Apply lint fixes to a PR
    fix-lint            ==SUPPRESS==
    apply-lint          ==SUPPRESS==
    cherry-pick         Cherry pick a PR onto a release branch

Merge

usage: @pytorchbot merge [-f MESSAGE | -i] [-ic] [-r [{viable/strict,main}]]

Merge an accepted PR, subject to the rules in .github/merge_rules.json.
By default, this will wait for all required checks (lint, pull) to succeed before merging.

optional arguments:
  -f MESSAGE, --force MESSAGE
                        Merge without checking anything. This requires a reason for auditing purposes, for example:
                        @pytorchbot merge -f 'Minor update to fix lint. Expecting all PR tests to pass'
                        
                        Please use `-f` as last resort, prefer `--ignore-current` to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.
  -i, --ignore-current  Merge while ignoring the currently failing jobs.  Behaves like -f if there are no pending jobs.
  -ic                   Old flag for --ignore-current. Deprecated in favor of -i.
  -r [{viable/strict,main}], --rebase [{viable/strict,main}]
                        Rebase the PR to re run checks before merging.  Accepts viable/strict or main as branch options and will default to viable/strict if not specified.

Revert

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst,autorevert}

Revert a merged PR. This requires that you are a Meta employee.

Example:
  @pytorchbot revert -m="This is breaking tests on trunk. hud.pytorch.org/" -c=nosignal

optional arguments:
  -m MESSAGE, --message MESSAGE
                        The reason you are reverting, will be put in the commit message. Must be longer than 3 words.
  -c {nosignal,ignoredsignal,landrace,weird,ghfirst,autorevert}, --classification {nosignal,ignoredsignal,landrace,weird,ghfirst,autorevert}
                        A machine-friendly classification of the revert reason.

Rebase

usage: @pytorchbot rebase [-s | -b BRANCH]

Rebase a PR. Rebasing defaults to the stable viable/strict branch of pytorch.
Repeat contributors may use this command to rebase their PR.

optional arguments:
  -s, --stable          [DEPRECATED] Rebase onto viable/strict
  -b BRANCH, --branch BRANCH
                        Branch you would like to rebase to

Label

usage: @pytorchbot label labels [labels ...]

Adds label to a PR or Issue [Can be used on Issues]

positional arguments:
  labels  Labels to add to given Pull Request or Issue [Can be used on Issues]

Dr CI

usage: @pytorchbot drci 

Update Dr. CI. Updates the Dr. CI comment on the PR in case it's gotten out of sync with actual CI results.

Lint

usage: @pytorchbot lint 

Apply lint fixes to the PR. This will trigger a workflow that automatically
applies lint fixes and pushes them to the PR branch.

Aliases: @pytorchbot fix-lint, @pytorchbot apply-lint

cherry-pick

usage: @pytorchbot cherry-pick --onto ONTO [--fixes FIXES] -c
                               {regression,critical,fixnewfeature,docs,release}

Cherry pick a pull request onto a release branch for inclusion in a release

optional arguments:
  --onto ONTO, --into ONTO
                        Branch you would like to cherry pick onto (Example: release/2.1)
  --fixes FIXES         Link to the issue that your PR fixes (Example: https://github.com/pytorch/pytorch/issues/110666)
  -c {regression,critical,fixnewfeature,docs,release}, --classification {regression,critical,fixnewfeature,docs,release}
                        A machine-friendly classification of the cherry-pick reason.

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@anshul-si anshul-si removed the merging label Mar 10, 2026
@anshul-si
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@anshul-si
Contributor Author

anshul-si commented Mar 10, 2026

FYI this adds 7% regression "dtensor_dispatch_inplace": 7.2298019160376

The test case includes the warm-up step, which is slowed down by the lookup time for this strategy. Subsequent results are cached, so there shouldn't be an issue here. Will also has #175999, which will make the lookup more efficient.

wconstab added a commit that referenced this pull request Mar 10, 2026
The expected count for dtensor_dispatch_inplace (add_) regressed from
56530 to 58710 (~3.9%) after #175795 registered single-dim strategies
for categorized pointwise ops. The regression is on the cached dispatch
path and comes from two sources: an extra dict lookup in the C++
get_runtime_schema_info_for_op (~890 instructions), and a Python heap
layout difference in the cached OutputSharding object (~2860
instructions). Both are minor and not particularly worth fixing. While
the regression is within the 10% CI noise margin, it's better to reset
the counts so we still have our full 10% margin for the future.

Authored with Claude.

[ghstack-poisoned]
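
The percentage quoted in the commit message can be checked with a few lines of arithmetic. The sketch below uses the two instruction counts and the 10% margin stated above; the helper name and the headroom calculation are illustrative and are not part of the benchmark tooling.

```python
def regression_percent(old_count: int, new_count: int) -> float:
    """Relative change from old_count to new_count, in percent."""
    return (new_count - old_count) / old_count * 100.0


OLD_COUNT = 56530  # expected count before #175795
NEW_COUNT = 58710  # expected count after #175795
NOISE_MARGIN_PERCENT = 10.0  # CI noise margin stated in the commit message

pct = regression_percent(OLD_COUNT, NEW_COUNT)
within_margin = abs(pct) <= NOISE_MARGIN_PERCENT

# Headroom that would remain if the expected count were left at OLD_COUNT.
remaining_headroom = NOISE_MARGIN_PERCENT - pct

print(f"regression: {pct:.1f}%, within margin: {within_margin}")
print(f"headroom without a reset: {remaining_headroom:.1f} points")
```

Resetting the expected count to the new value restores the full 10% margin, which is the rationale given in the commit message.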
wconstab added a commit that referenced this pull request Mar 10, 2026
ghstack-source-id: 1e59629
Pull Request resolved: #177074
sandy-gags pushed a commit to sandy-gags/pytorch that referenced this pull request Mar 12, 2026
ghstack-source-id: 2b59bac
Pull Request resolved: pytorch/pytorch#175795

Labels

ci-no-td (Do not run TD on this PR)
ciflow/inductor
ciflow/torchtitan (Run TorchTitan integration tests)
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
release notes: distributed (dtensor) (release notes category)
Reverted


4 participants